Skip to content

AVM Bytecode Encoding


title: AVM Bytecode Encoding tags: - AVM green paper - bytecode - serialization


Agda Compilation Error

File: AVM/Bytecode.lagda.md

Agda compilation failed: Stdout: Checking AVM.Bytecode (/tmp/agda-mkdocs-fol06ug7/AVM/Bytecode.lagda.md). /tmp/agda-mkdocs-fol06ug7/AVM/Bytecode.lagda.md:25,88-97 (ℕ → TxId) !=< Set when checking that the expression freshTxId has type Set

Original Content

{-# OPTIONS --without-K --type-in-type --guardedness --exact-split #-}
open import Foundation.BasicTypes

Module Parameters

module AVM.Bytecode
    -- Core types
    (Val : Set)
    (ObjectId : Set)
    (freshObjectId :   ObjectId)

    -- Controller/distribution types
    (ControllerId : Set)
    (ControllerVersion : Set)

    -- Transaction types
    (TxId : Set)
    (freshTxId :   TxId)
  where

-- Import Objects module to get Object
open import AVM.Objects Val ObjectId freshObjectId ControllerId ControllerVersion TxId freshTxId
  using (Object)

-- Import Instruction module with Object
open import AVM.Instruction Val ObjectId freshObjectId ControllerId ControllerVersion TxId freshTxId Object

Motivation

While the typed Instr₀ representation provides type safety and enables formal reasoning, a practical AVM implementation requires a serializable bytecode format for storage, transmission, and execution. This module defines the bytecode encoding and decoding mechanisms that bridge the gap between the high-level typed instruction set and the low-level executable representation.

Bytecode Representation

The bytecode layer erases type information, representing instructions as opcodes with encoded arguments. Since messages are already represented as Val S-expressions, the bytecode layer directly uses this representation for instruction arguments.

Opcodes

Each instruction in the Instr₀ type is assigned a unique opcode identifier:

data Opcode : Set where
  -- Transactions (Instr₀)
  OP_BEGIN_TX        : Opcode  -- 0x01
  OP_COMMIT_TX       : Opcode  -- 0x02
  OP_ABORT_TX        : Opcode  -- 0x03

  -- Base object ops (Instr₀)
  OP_CREATE_OBJ      : Opcode  -- 0x10
  OP_DESTROY_OBJ     : Opcode  -- 0x11
  OP_CALL            : Opcode  -- 0x20
  OP_SCRY_META       : Opcode  -- 0x30
  OP_SCRY_DEEP       : Opcode  -- 0x31
  OP_REFLECT         : Opcode  -- 0x32
  OP_SELF            : Opcode  -- 0x33
  OP_INPUT           : Opcode  -- 0x34

  -- Pure functions (Instr₁)
  OP_PURE_CALL       : Opcode  -- 0x40
  OP_PURE_REGISTER   : Opcode  -- 0x41

  -- Distribution (Instr₂)
  OP_GET_CURR_CTRL   : Opcode  -- 0x50
  OP_GET_CTRL        : Opcode  -- 0x51
  OP_TELEPORT        : Opcode  -- 0x52
  OP_TRANSFER_OBJ    : Opcode  -- 0x53
  OP_GET_CURR_VER    : Opcode  -- 0x54
  OP_SWITCH_VER      : Opcode  -- 0x55

  -- Control flow (not yet specified)
  -- TODO: branching, jumps, stack/register ops

Bytecode Instructions

A bytecode instruction pairs an opcode with its arguments encoded as S-expressions:

data Bytecode : Set where
  BC : Opcode  List Val  Bytecode

Encoding Typed Instructions

We define an encoding function that transforms typed instructions into untyped bytecode, erasing return type information. Since Input and Output are aliases for Val, messages require no encoding.

First, we postulate encoders for the composite types that appear in instruction arguments:

postulate
  encodeTxId : TxId  Val
  encodeObjId : ObjectId  Val
  encodeMetaFilter : (ObjectMeta  Bool)  Val
  encodeDeepFilter : (Object  ObjectMeta  Bool)  Val
  encodeControllerId : ControllerId  Val
  encodeControllerVersion : ControllerVersion  Val
  encodeString : String  Val
  encodeValList : List Val  Val

The encoding function maps each typed instruction to its bytecode representation:

encode₀ : {S : Safety} {A : Set}  Instr₀ S A  Bytecode
encode₀ (Obj (createObj inp))    = BC OP_CREATE_OBJ (inp  [])
encode₀ (Obj (destroyObj oid))   = BC OP_DESTROY_OBJ (encodeObjId oid  [])
encode₀ (Obj (call oid inp))     = BC OP_CALL (encodeObjId oid  inp  [])
encode₀ (Introspect self)        = BC OP_SELF []
encode₀ (Introspect input)       = BC OP_INPUT []
encode₀ (Introspect (scryMeta f)) = BC OP_SCRY_META (encodeMetaFilter f  [])
encode₀ (Introspect (scryDeep f)) = BC OP_SCRY_DEEP (encodeDeepFilter f  [])
encode₀ (Introspect (reflect oid)) = BC OP_REFLECT (encodeObjId oid  [])

encode₁ : {S : Safety} {A : Set}  Instr₁ S A  Bytecode
encode₁ (Obj instr)             = encode₀ (Obj instr)
encode₁ (Introspect instr)      = encode₀ (Introspect instr)
encode₁ (Tx beginTx)            = BC OP_BEGIN_TX []
encode₁ (Tx (commitTx txid))    = BC OP_COMMIT_TX (encodeTxId txid  [])
encode₁ (Tx (abortTx txid))     = BC OP_ABORT_TX (encodeTxId txid  [])

encode₂ : {S : Safety} {A : Set}  Instr₂ S A  Bytecode
encode₂ (Obj instr)             = encode₀ (Obj instr)
encode₂ (Introspect instr)      = encode₀ (Introspect instr)
encode₂ (Tx instr)              = encode₁ (Tx instr)
encode₂ (Pure (callPure name args))     = BC OP_PURE_CALL (encodeString name  encodeValList args  [])
encode₂ (Pure (registerPure name f))    = BC OP_PURE_REGISTER (encodeString name  [])

Decoding with Runtime Type Checking

Decoding bytecode requires runtime validation since type information has been erased. The decoder must verify that arguments match the expected types for each opcode and restore the safety level information.

Decode Result Type

The decoding result includes the safety level and potential type errors:

data DecodeResult : Set where
  SafeOp₀    : {A : Set}  Instr₀ Safe A  DecodeResult
  UnsafeOp₀  : {A : Set}  Instr₀ Unsafe A  DecodeResult
  SafeOp₁    : {A : Set}  Instr₁ Safe A  DecodeResult
  UnsafeOp₁  : {A : Set}  Instr₁ Unsafe A  DecodeResult
  SafeOp₂    : {A : Set}  Instr₂ Safe A  DecodeResult
  UnsafeOp₂  : {A : Set}  Instr₂ Unsafe A  DecodeResult
  TypeError  : String  DecodeResult

Argument Decoders

We postulate decoders for composite argument types. These return Maybe values to handle decoding failures:

postulate
  decodeTxId : Val  Maybe TxId
  decodeObjId : Val  Maybe ObjectId
  decodeMetaFilter : Val  Maybe (ObjectMeta  Bool)
  decodeDeepFilter : Val  Maybe (Object  ObjectMeta  Bool)
  decodeControllerId : Val  Maybe ControllerId
  decodeControllerVersion : Val  Maybe ControllerVersion
  decodeString : Val  Maybe String
  decodeValList : Val  Maybe (List Val)

Bytecode Decoder (Instr₀)

The decoder validates both the opcode and the argument list structure. For each opcode, it checks that the correct number and type of arguments are provided:

decode₀ : Bytecode  DecodeResult
decode₀ (BC op args) with op | args
... | OP_BEGIN_TX    | [] = SafeOp₁ (Tx beginTx)
... | OP_BEGIN_TX    | (_  _) = TypeError "beginTx takes no arguments"
... | OP_COMMIT_TX   | [] = TypeError "commitTx requires txid argument"
... | OP_COMMIT_TX   | (v  []) = decodeCommit v
  where
    decodeCommit : Val  DecodeResult
    decodeCommit v with decodeTxId v
    ... | just txid = SafeOp₁ (Tx (commitTx txid))
    ... | nothing   = TypeError "invalid txid"
... | OP_COMMIT_TX   | (_  _  _) = TypeError "commitTx takes exactly one argument"
... | OP_ABORT_TX    | [] = TypeError "abortTx requires txid argument"
... | OP_ABORT_TX    | (v  []) = decodeAbort v
  where
    decodeAbort : Val  DecodeResult
    decodeAbort v with decodeTxId v
    ... | just txid = SafeOp₁ (Tx (abortTx txid))
    ... | nothing   = TypeError "invalid txid"
... | OP_ABORT_TX    | (_  _  _) = TypeError "abortTx takes exactly one argument"
... | OP_CREATE_OBJ  | [] = TypeError "createObj requires input argument"
... | OP_CREATE_OBJ  | (inp  []) = SafeOp₀ (Obj (createObj inp))
... | OP_CREATE_OBJ  | (_  _  _) = TypeError "createObj takes exactly one argument"
... | OP_DESTROY_OBJ | [] = TypeError "destroyObj requires objectid argument"
... | OP_DESTROY_OBJ | (v  []) = decodeDestroy v
  where
    decodeDestroy : Val  DecodeResult
    decodeDestroy v with decodeObjId v
    ... | just oid = SafeOp₀ (Obj (destroyObj oid))
    ... | nothing  = TypeError "invalid objectid"
... | OP_DESTROY_OBJ | (_  _  _) = TypeError "destroyObj takes exactly one argument"
... | OP_CALL        | [] = TypeError "call requires objectid and input arguments"
... | OP_CALL        | (_  []) = TypeError "call requires both objectid and input arguments"
... | OP_CALL        | (v₁  v₂  []) = decodeCall v₁ v₂
  where
    decodeCall : Val  Val  DecodeResult
    decodeCall v₁ v₂ with decodeObjId v₁
    ... | just oid = SafeOp₀ (Obj (call oid v₂))
    ... | nothing  = TypeError "invalid objectid in call"
... | OP_CALL        | (_  _  _  _) = TypeError "call takes exactly two arguments"
... | OP_SCRY_META   | [] = TypeError "scryMeta requires filter argument"
... | OP_SCRY_META   | (v  []) = decodeScryMeta v
  where
    decodeScryMeta : Val  DecodeResult
    decodeScryMeta v with decodeMetaFilter v
    ... | just f  = UnsafeOp₀ (Introspect (scryMeta f))
    ... | nothing = TypeError "invalid metadata filter"
... | OP_SCRY_META   | (_  _  _) = TypeError "scryMeta takes exactly one argument"
... | OP_SCRY_DEEP   | [] = TypeError "scryDeep requires filter argument"
... | OP_SCRY_DEEP   | (v  []) = decodeScryDeep v
  where
    decodeScryDeep : Val  DecodeResult
    decodeScryDeep v with decodeDeepFilter v
    ... | just f  = UnsafeOp₀ (Introspect (scryDeep f))
    ... | nothing = TypeError "invalid deep filter"
... | OP_SCRY_DEEP   | (_  _  _) = TypeError "scryDeep takes exactly one argument"
... | OP_REFLECT     | [] = TypeError "reflect requires objectid argument"
... | OP_REFLECT     | (v  []) = decodeReflect v
  where
    decodeReflect : Val  DecodeResult
    decodeReflect v with decodeObjId v
    ... | just oid = UnsafeOp₀ (Introspect (reflect oid))
    ... | nothing  = TypeError "invalid objectid"
... | OP_REFLECT     | (_  _  _) = TypeError "reflect takes exactly one argument"
... | OP_SELF        | [] = SafeOp₀ (Introspect self)
... | OP_SELF        | (_  _) = TypeError "self takes no arguments"
... | OP_INPUT       | [] = SafeOp₀ (Introspect input)
... | OP_INPUT       | (_  _) = TypeError "input takes no arguments"
... | _              | _       = TypeError "unsupported opcode in decode₀"

Program Serialization

Finally, we postulate a function that serializes an entire AVM program into a bytecode stream:

postulate
  serializeProgram₀ : {A : Set}  AVMProgram₀ A  List Bytecode
  serializeProgram₁ : {A : Set}  AVMProgram₁ A  List Bytecode
  serializeProgram₂ : {A : Set}  AVMProgram₂ A  List Bytecode

Limitations and Future Work

The bytecode representation targets the Instruction layers as follows:

  • Instr₀: Fully covered (base + transactions).
  • Instr₁: callPure and registerPure are represented, but function bodies are not serialized (registration is unsafe and environment-dependent).
  • Instr₂: Distribution instructions are represented. Semantic guards (e.g., teleport/switchVersion invalid during active transactions) are enforced by the interpreter, not the decoder.

Notable gaps include:

Control Flow Instructions. The conditional branches in programs like kudosTransfer require opcodes for branching, jumping, and labels that are not present in the current Opcode type. Without these, we cannot represent the if-then-else logic that appears throughout AVM programs.

Continuation Representation. Interaction trees use monadic bind >>= to sequence operations and capture continuations. The bytecode layer must encode these dependencies, ensuring that return values from one instruction are available to subsequent instructions. This requires either:

  • A stack-based architecture with push/pop operations
  • A register-based architecture with explicit register allocation
  • Continuation-passing style with explicit closures

Instructions like beginTx return values (e.g., TxId) that are referenced later (e.g., in commitTx txId). The bytecode VM needs a mechanism to bind these intermediate results—either through a stack, named registers, or an environment of local variables.

A complete bytecode specification would extend Opcode with additional instructions that handle control flow and continuations.

Bytecode Decoder (Instr₁)

decode₁ : Bytecode → DecodeResult decode₁ (BC op args) with op | args ... | OP_PURE_CALL | (vName ∷ vArgs ∷ []) with decodeString vName | decodeValList vArgs ... | _ | _ = TypeError "callPure expects (name : String, args : List Val)" ... | just name | just args = SafeOp₁ (pure (callPure name args)) ... | _ | _ = TypeError "invalid callPure arguments" ... | OP_PURE_CALL | _ = TypeError "callPure expects (name : String, args : List Val)" ... | OP_PURE_REGISTER | (vName ∷ []) with decodeString vName ... | _ | _ = TypeError "registerPure expects (name : String)" ... | just name = UnsafeOp₁ (pure (registerPure name (λ _ → nothing))) ... | nothing = TypeError "invalid registerPure name" ... | _ | _ = decode₀ (BC op args)

Bytecode Decoder (Instr₂)

decode₂ : Bytecode → DecodeResult decode₂ (BC op args) with op | args ... | OP_GET_CURR_CTRL | [] = SafeOp₂ (distrib getCurrentController) ... | OP_GET_CURR_CTRL | _ = TypeError "getCurrentController takes no arguments" ... | OP_GET_CTRL | (v ∷ []) with decodeObjId v ... | _ | _ = TypeError "getController expects (objectid)" ... | just oid = SafeOp₂ (distrib (getController oid)) ... | nothing = TypeError "invalid objectid" ... | OP_TELEPORT | (v ∷ []) with decodeControllerId v ... | _ | _ = TypeError "teleport expects (controllerId)" ... | just cid = SafeOp₂ (distrib (teleport cid)) ... | nothing = TypeError "invalid controllerId" ... | OP_TRANSFER_OBJ | (v₁ ∷ v₂ ∷ []) with decodeObjId v₁ | decodeControllerId v₂ ... | _ | _ = TypeError "transferObject expects (objectid, controllerId)" ... | just oid | just cid = SafeOp₂ (distrib (transferObject oid cid)) ... | _ | _ = TypeError "invalid transferObject arguments" ... | OP_GET_CURR_VER | [] = SafeOp₂ (distrib getCurrentVersion) ... | OP_GET_CURR_VER | _ = TypeError "getCurrentVersion takes no arguments" ... | OP_SWITCH_VER | (v ∷ []) with decodeControllerVersion v ... | _ | _ = TypeError "switchVersion expects (version)" ... | just ver = SafeOp₂ (distrib (switchVersion ver)) ... | nothing = TypeError "invalid version" ... | _ | _ = decode₁ (BC op args)