AVM Bytecode Encoding
title: AVM Bytecode Encoding tags: - AVM green paper - bytecode - serialization
Agda Compilation Error
File: AVM/Bytecode.lagda.md
Agda compilation failed: Stdout: Checking AVM.Bytecode (/tmp/agda-mkdocs-fol06ug7/AVM/Bytecode.lagda.md). /tmp/agda-mkdocs-fol06ug7/AVM/Bytecode.lagda.md:25,88-97 (ℕ → TxId) !=< Set when checking that the expression freshTxId has type Set
Original Content
{-# OPTIONS --without-K --type-in-type --guardedness --exact-split #-}
open import Foundation.BasicTypes
Module Parameters
module AVM.Bytecode
-- Core types
(Val : Set)
(ObjectId : Set)
(freshObjectId : ℕ → ObjectId)
-- Controller/distribution types
(ControllerId : Set)
(ControllerVersion : Set)
-- Transaction types
(TxId : Set)
(freshTxId : ℕ → TxId)
where
-- Import Objects module to get Object
open import AVM.Objects Val ObjectId freshObjectId ControllerId ControllerVersion TxId freshTxId
using (Object)
-- Import Instruction module with Object
open import AVM.Instruction Val ObjectId freshObjectId ControllerId ControllerVersion TxId freshTxId Object
Motivation
While the typed Instr₀ representation provides type safety and enables formal reasoning, a practical AVM implementation requires a serializable bytecode format for storage, transmission, and execution. This module defines the bytecode encoding and decoding mechanisms that bridge the gap between the high-level typed instruction set and the low-level executable representation.
Bytecode Representation
The bytecode layer erases type information, representing instructions as opcodes with encoded arguments. Since messages are already represented as Val S-expressions, the bytecode layer directly uses this representation for instruction arguments.
Opcodes
Each instruction in the Instr₀ type is assigned a unique opcode identifier:
data Opcode : Set where
-- Transactions (Instr₀)
OP_BEGIN_TX : Opcode -- 0x01
OP_COMMIT_TX : Opcode -- 0x02
OP_ABORT_TX : Opcode -- 0x03
-- Base object ops (Instr₀)
OP_CREATE_OBJ : Opcode -- 0x10
OP_DESTROY_OBJ : Opcode -- 0x11
OP_CALL : Opcode -- 0x20
OP_SCRY_META : Opcode -- 0x30
OP_SCRY_DEEP : Opcode -- 0x31
OP_REFLECT : Opcode -- 0x32
OP_SELF : Opcode -- 0x33
OP_INPUT : Opcode -- 0x34
-- Pure functions (Instr₁)
OP_PURE_CALL : Opcode -- 0x40
OP_PURE_REGISTER : Opcode -- 0x41
-- Distribution (Instr₂)
OP_GET_CURR_CTRL : Opcode -- 0x50
OP_GET_CTRL : Opcode -- 0x51
OP_TELEPORT : Opcode -- 0x52
OP_TRANSFER_OBJ : Opcode -- 0x53
OP_GET_CURR_VER : Opcode -- 0x54
OP_SWITCH_VER : Opcode -- 0x55
-- Control flow (not yet specified)
-- TODO: branching, jumps, stack/register ops
Bytecode Instructions
A bytecode instruction pairs an opcode with its arguments encoded as S-expressions:
data Bytecode : Set where
BC : Opcode → List Val → Bytecode
Encoding Typed Instructions
We define an encoding function that transforms typed instructions into untyped bytecode, erasing return type information. Since Input and Output are aliases for Val, messages require no encoding.
First, we postulate encoders for the composite types that appear in instruction arguments:
postulate
encodeTxId : TxId → Val
encodeObjId : ObjectId → Val
encodeMetaFilter : (ObjectMeta → Bool) → Val
encodeDeepFilter : (Object → ObjectMeta → Bool) → Val
encodeControllerId : ControllerId → Val
encodeControllerVersion : ControllerVersion → Val
encodeString : String → Val
encodeValList : List Val → Val
The encoding function maps each typed instruction to its bytecode representation:
encode₀ : {S : Safety} {A : Set} → Instr₀ S A → Bytecode
encode₀ (Obj (createObj inp)) = BC OP_CREATE_OBJ (inp ∷ [])
encode₀ (Obj (destroyObj oid)) = BC OP_DESTROY_OBJ (encodeObjId oid ∷ [])
encode₀ (Obj (call oid inp)) = BC OP_CALL (encodeObjId oid ∷ inp ∷ [])
encode₀ (Introspect self) = BC OP_SELF []
encode₀ (Introspect input) = BC OP_INPUT []
encode₀ (Introspect (scryMeta f)) = BC OP_SCRY_META (encodeMetaFilter f ∷ [])
encode₀ (Introspect (scryDeep f)) = BC OP_SCRY_DEEP (encodeDeepFilter f ∷ [])
encode₀ (Introspect (reflect oid)) = BC OP_REFLECT (encodeObjId oid ∷ [])
encode₁ : {S : Safety} {A : Set} → Instr₁ S A → Bytecode
encode₁ (Obj instr) = encode₀ (Obj instr)
encode₁ (Introspect instr) = encode₀ (Introspect instr)
encode₁ (Tx beginTx) = BC OP_BEGIN_TX []
encode₁ (Tx (commitTx txid)) = BC OP_COMMIT_TX (encodeTxId txid ∷ [])
encode₁ (Tx (abortTx txid)) = BC OP_ABORT_TX (encodeTxId txid ∷ [])
encode₂ : {S : Safety} {A : Set} → Instr₂ S A → Bytecode
encode₂ (Obj instr) = encode₀ (Obj instr)
encode₂ (Introspect instr) = encode₀ (Introspect instr)
encode₂ (Tx instr) = encode₁ (Tx instr)
encode₂ (Pure (callPure name args)) = BC OP_PURE_CALL (encodeString name ∷ encodeValList args ∷ [])
encode₂ (Pure (registerPure name f)) = BC OP_PURE_REGISTER (encodeString name ∷ [])
Decoding with Runtime Type Checking
Decoding bytecode requires runtime validation since type information has been erased. The decoder must verify that arguments match the expected types for each opcode and restore the safety level information.
Decode Result Type
The decoding result includes the safety level and potential type errors:
data DecodeResult : Set where
SafeOp₀ : {A : Set} → Instr₀ Safe A → DecodeResult
UnsafeOp₀ : {A : Set} → Instr₀ Unsafe A → DecodeResult
SafeOp₁ : {A : Set} → Instr₁ Safe A → DecodeResult
UnsafeOp₁ : {A : Set} → Instr₁ Unsafe A → DecodeResult
SafeOp₂ : {A : Set} → Instr₂ Safe A → DecodeResult
UnsafeOp₂ : {A : Set} → Instr₂ Unsafe A → DecodeResult
TypeError : String → DecodeResult
Argument Decoders
We postulate decoders for composite argument types. These return Maybe values to handle decoding failures:
postulate
decodeTxId : Val → Maybe TxId
decodeObjId : Val → Maybe ObjectId
decodeMetaFilter : Val → Maybe (ObjectMeta → Bool)
decodeDeepFilter : Val → Maybe (Object → ObjectMeta → Bool)
decodeControllerId : Val → Maybe ControllerId
decodeControllerVersion : Val → Maybe ControllerVersion
decodeString : Val → Maybe String
decodeValList : Val → Maybe (List Val)
Bytecode Decoder (Instr₀)
The decoder validates both the opcode and the argument list structure. For each opcode, it checks that the correct number and type of arguments are provided:
decode₀ : Bytecode → DecodeResult
decode₀ (BC op args) with op | args
... | OP_BEGIN_TX | [] = SafeOp₁ (Tx beginTx)
... | OP_BEGIN_TX | (_ ∷ _) = TypeError "beginTx takes no arguments"
... | OP_COMMIT_TX | [] = TypeError "commitTx requires txid argument"
... | OP_COMMIT_TX | (v ∷ []) = decodeCommit v
where
decodeCommit : Val → DecodeResult
decodeCommit v with decodeTxId v
... | just txid = SafeOp₁ (Tx (commitTx txid))
... | nothing = TypeError "invalid txid"
... | OP_COMMIT_TX | (_ ∷ _ ∷ _) = TypeError "commitTx takes exactly one argument"
... | OP_ABORT_TX | [] = TypeError "abortTx requires txid argument"
... | OP_ABORT_TX | (v ∷ []) = decodeAbort v
where
decodeAbort : Val → DecodeResult
decodeAbort v with decodeTxId v
... | just txid = SafeOp₁ (Tx (abortTx txid))
... | nothing = TypeError "invalid txid"
... | OP_ABORT_TX | (_ ∷ _ ∷ _) = TypeError "abortTx takes exactly one argument"
... | OP_CREATE_OBJ | [] = TypeError "createObj requires input argument"
... | OP_CREATE_OBJ | (inp ∷ []) = SafeOp₀ (Obj (createObj inp))
... | OP_CREATE_OBJ | (_ ∷ _ ∷ _) = TypeError "createObj takes exactly one argument"
... | OP_DESTROY_OBJ | [] = TypeError "destroyObj requires objectid argument"
... | OP_DESTROY_OBJ | (v ∷ []) = decodeDestroy v
where
decodeDestroy : Val → DecodeResult
decodeDestroy v with decodeObjId v
... | just oid = SafeOp₀ (Obj (destroyObj oid))
... | nothing = TypeError "invalid objectid"
... | OP_DESTROY_OBJ | (_ ∷ _ ∷ _) = TypeError "destroyObj takes exactly one argument"
... | OP_CALL | [] = TypeError "call requires objectid and input arguments"
... | OP_CALL | (_ ∷ []) = TypeError "call requires both objectid and input arguments"
... | OP_CALL | (v₁ ∷ v₂ ∷ []) = decodeCall v₁ v₂
where
decodeCall : Val → Val → DecodeResult
decodeCall v₁ v₂ with decodeObjId v₁
... | just oid = SafeOp₀ (Obj (call oid v₂))
... | nothing = TypeError "invalid objectid in call"
... | OP_CALL | (_ ∷ _ ∷ _ ∷ _) = TypeError "call takes exactly two arguments"
... | OP_SCRY_META | [] = TypeError "scryMeta requires filter argument"
... | OP_SCRY_META | (v ∷ []) = decodeScryMeta v
where
decodeScryMeta : Val → DecodeResult
decodeScryMeta v with decodeMetaFilter v
... | just f = UnsafeOp₀ (Introspect (scryMeta f))
... | nothing = TypeError "invalid metadata filter"
... | OP_SCRY_META | (_ ∷ _ ∷ _) = TypeError "scryMeta takes exactly one argument"
... | OP_SCRY_DEEP | [] = TypeError "scryDeep requires filter argument"
... | OP_SCRY_DEEP | (v ∷ []) = decodeScryDeep v
where
decodeScryDeep : Val → DecodeResult
decodeScryDeep v with decodeDeepFilter v
... | just f = UnsafeOp₀ (Introspect (scryDeep f))
... | nothing = TypeError "invalid deep filter"
... | OP_SCRY_DEEP | (_ ∷ _ ∷ _) = TypeError "scryDeep takes exactly one argument"
... | OP_REFLECT | [] = TypeError "reflect requires objectid argument"
... | OP_REFLECT | (v ∷ []) = decodeReflect v
where
decodeReflect : Val → DecodeResult
decodeReflect v with decodeObjId v
... | just oid = UnsafeOp₀ (Introspect (reflect oid))
... | nothing = TypeError "invalid objectid"
... | OP_REFLECT | (_ ∷ _ ∷ _) = TypeError "reflect takes exactly one argument"
... | OP_SELF | [] = SafeOp₀ (Introspect self)
... | OP_SELF | (_ ∷ _) = TypeError "self takes no arguments"
... | OP_INPUT | [] = SafeOp₀ (Introspect input)
... | OP_INPUT | (_ ∷ _) = TypeError "input takes no arguments"
... | _ | _ = TypeError "unsupported opcode in decode₀"
Program Serialization
Finally, we postulate a function that serializes an entire AVM program into a bytecode stream:
postulate
serializeProgram₀ : {A : Set} → AVMProgram₀ A → List Bytecode
serializeProgram₁ : {A : Set} → AVMProgram₁ A → List Bytecode
serializeProgram₂ : {A : Set} → AVMProgram₂ A → List Bytecode
Limitations and Future Work
The bytecode representation targets the Instruction layers as follows:
- Instr₀: Fully covered (base + transactions).
- Instr₁:
callPureandregisterPureare represented, but function bodies are not serialized (registration is unsafe and environment-dependent). - Instr₂: Distribution instructions are represented. Semantic guards (e.g., teleport/switchVersion invalid during active transactions) are enforced by the interpreter, not the decoder.
Notable gaps include:
Control Flow Instructions. The conditional branches in programs like kudosTransfer require opcodes for branching, jumping, and labels that are not present in the current Opcode type. Without these, we cannot represent the if-then-else logic that appears throughout AVM programs.
Continuation Representation. Interaction trees use monadic bind >>= to sequence operations and capture continuations. The bytecode layer must encode these dependencies, ensuring that return values from one instruction are available to subsequent instructions. This requires either:
- A stack-based architecture with push/pop operations
- A register-based architecture with explicit register allocation
- Continuation-passing style with explicit closures
Instructions like beginTx return values (e.g., TxId) that are referenced later (e.g., in commitTx txId). The bytecode VM needs a mechanism to bind these intermediate results—either through a stack, named registers, or an environment of local variables.
A complete bytecode specification would extend Opcode with additional instructions that handle control flow and continuations.
Bytecode Decoder (Instr₁)
decode₁ : Bytecode → DecodeResult decode₁ (BC op args) with op | args ... | OP_PURE_CALL | (vName ∷ vArgs ∷ []) with decodeString vName | decodeValList vArgs ... | _ | _ = TypeError "callPure expects (name : String, args : List Val)" ... | just name | just args = SafeOp₁ (pure (callPure name args)) ... | _ | _ = TypeError "invalid callPure arguments" ... | OP_PURE_CALL | _ = TypeError "callPure expects (name : String, args : List Val)" ... | OP_PURE_REGISTER | (vName ∷ []) with decodeString vName ... | _ | _ = TypeError "registerPure expects (name : String)" ... | just name = UnsafeOp₁ (pure (registerPure name (λ _ → nothing))) ... | nothing = TypeError "invalid registerPure name" ... | _ | _ = decode₀ (BC op args)
Bytecode Decoder (Instr₂)
decode₂ : Bytecode → DecodeResult decode₂ (BC op args) with op | args ... | OP_GET_CURR_CTRL | [] = SafeOp₂ (distrib getCurrentController) ... | OP_GET_CURR_CTRL | _ = TypeError "getCurrentController takes no arguments" ... | OP_GET_CTRL | (v ∷ []) with decodeObjId v ... | _ | _ = TypeError "getController expects (objectid)" ... | just oid = SafeOp₂ (distrib (getController oid)) ... | nothing = TypeError "invalid objectid" ... | OP_TELEPORT | (v ∷ []) with decodeControllerId v ... | _ | _ = TypeError "teleport expects (controllerId)" ... | just cid = SafeOp₂ (distrib (teleport cid)) ... | nothing = TypeError "invalid controllerId" ... | OP_TRANSFER_OBJ | (v₁ ∷ v₂ ∷ []) with decodeObjId v₁ | decodeControllerId v₂ ... | _ | _ = TypeError "transferObject expects (objectid, controllerId)" ... | just oid | just cid = SafeOp₂ (distrib (transferObject oid cid)) ... | _ | _ = TypeError "invalid transferObject arguments" ... | OP_GET_CURR_VER | [] = SafeOp₂ (distrib getCurrentVersion) ... | OP_GET_CURR_VER | _ = TypeError "getCurrentVersion takes no arguments" ... | OP_SWITCH_VER | (v ∷ []) with decodeControllerVersion v ... | _ | _ = TypeError "switchVersion expects (version)" ... | just ver = SafeOp₂ (distrib (switchVersion ver)) ... | nothing = TypeError "invalid version" ... | _ | _ = decode₁ (BC op args)