AVM Bytecode Encoding

title: AVM Bytecode Encoding tags: - AVM green paper - bytecode - serialization

view source: Bytecode.lagda.md

Agda Compilation Error

File: AVM/Bytecode.lagda.md

Agda compilation failed: Stdout: Checking AVM.Bytecode (/tmp/agda-mkdocs-fol06ug7/AVM/Bytecode.lagda.md). /tmp/agda-mkdocs-fol06ug7/AVM/Bytecode.lagda.md:25,88-97 (ℕ → TxId) !=< Set when checking that the expression freshTxId has type Set

Original Content

{-# OPTIONS --without-K --type-in-type --guardedness --exact-split #-}
open import Foundation.BasicTypes

Module Parameters

module AVM.Bytecode
    -- Core types
    (Val : Set)
    (ObjectId : Set)
    (freshObjectId : ℕ → ObjectId)

    -- Controller/distribution types
    (ControllerId : Set)
    (ControllerVersion : Set)

    -- Transaction types
    (TxId : Set)
    (freshTxId : ℕ → TxId)
  where

-- Import Objects module to get Object
open import AVM.Objects Val ObjectId freshObjectId ControllerId ControllerVersion TxId freshTxId
  using (Object)

-- Import Instruction module with Object
open import AVM.Instruction Val ObjectId freshObjectId ControllerId ControllerVersion TxId freshTxId Object

Motivation

While the typed Instr₀ representation provides type safety and enables formal reasoning, a practical AVM implementation requires a serializable bytecode format for storage, transmission, and execution. This module defines the bytecode encoding and decoding mechanisms that bridge the gap between the high-level typed instruction set and the low-level executable representation.

Bytecode Representation

The bytecode layer erases type information, representing instructions as opcodes with encoded arguments. Since messages are already represented as Val S-expressions, the bytecode layer directly uses this representation for instruction arguments.

Opcodes

Each instruction in the Instr₀ type is assigned a unique opcode identifier:

data Opcode : Set where
  -- Transactions (Instr₀)
  OP_BEGIN_TX        : Opcode  -- 0x01
  OP_COMMIT_TX       : Opcode  -- 0x02
  OP_ABORT_TX        : Opcode  -- 0x03

  -- Base object ops (Instr₀)
  OP_CREATE_OBJ      : Opcode  -- 0x10
  OP_DESTROY_OBJ     : Opcode  -- 0x11
  OP_CALL            : Opcode  -- 0x20
  OP_SCRY_META       : Opcode  -- 0x30
  OP_SCRY_DEEP       : Opcode  -- 0x31
  OP_REFLECT         : Opcode  -- 0x32
  OP_SELF            : Opcode  -- 0x33
  OP_INPUT           : Opcode  -- 0x34

  -- Pure functions (Instr₁)
  OP_PURE_CALL       : Opcode  -- 0x40
  OP_PURE_REGISTER   : Opcode  -- 0x41

  -- Distribution (Instr₂)
  OP_GET_CURR_CTRL   : Opcode  -- 0x50
  OP_GET_CTRL        : Opcode  -- 0x51
  OP_TELEPORT        : Opcode  -- 0x52
  OP_TRANSFER_OBJ    : Opcode  -- 0x53
  OP_GET_CURR_VER    : Opcode  -- 0x54
  OP_SWITCH_VER      : Opcode  -- 0x55

  -- Control flow (not yet specified)
  -- TODO: branching, jumps, stack/register ops

Bytecode Instructions

A bytecode instruction pairs an opcode with its arguments encoded as S-expressions:

data Bytecode : Set where
  BC : Opcode → List Val → Bytecode

Encoding Typed Instructions

We define an encoding function that transforms typed instructions into untyped bytecode, erasing return type information. Since Input and Output are aliases for Val, messages require no encoding.

First, we postulate encoders for the composite types that appear in instruction arguments:

postulate
  encodeTxId : TxId → Val
  encodeObjId : ObjectId → Val
  encodeMetaFilter : (ObjectMeta → Bool) → Val
  encodeDeepFilter : (Object → ObjectMeta → Bool) → Val
  encodeControllerId : ControllerId → Val
  encodeControllerVersion : ControllerVersion → Val
  encodeString : String → Val
  encodeValList : List Val → Val

The encoding function maps each typed instruction to its bytecode representation:

encode₀ : {S : Safety} {A : Set} → Instr₀ S A → Bytecode
encode₀ (Obj (createObj inp))    = BC OP_CREATE_OBJ (inp ∷ [])
encode₀ (Obj (destroyObj oid))   = BC OP_DESTROY_OBJ (encodeObjId oid ∷ [])
encode₀ (Obj (call oid inp))     = BC OP_CALL (encodeObjId oid ∷ inp ∷ [])
encode₀ (Introspect self)        = BC OP_SELF []
encode₀ (Introspect input)       = BC OP_INPUT []
encode₀ (Introspect (scryMeta f)) = BC OP_SCRY_META (encodeMetaFilter f ∷ [])
encode₀ (Introspect (scryDeep f)) = BC OP_SCRY_DEEP (encodeDeepFilter f ∷ [])
encode₀ (Introspect (reflect oid)) = BC OP_REFLECT (encodeObjId oid ∷ [])

encode₁ : {S : Safety} {A : Set} → Instr₁ S A → Bytecode
encode₁ (Obj instr)             = encode₀ (Obj instr)
encode₁ (Introspect instr)      = encode₀ (Introspect instr)
encode₁ (Tx beginTx)            = BC OP_BEGIN_TX []
encode₁ (Tx (commitTx txid))    = BC OP_COMMIT_TX (encodeTxId txid ∷ [])
encode₁ (Tx (abortTx txid))     = BC OP_ABORT_TX (encodeTxId txid ∷ [])

encode₂ : {S : Safety} {A : Set} → Instr₂ S A → Bytecode
encode₂ (Obj instr)             = encode₀ (Obj instr)
encode₂ (Introspect instr)      = encode₀ (Introspect instr)
encode₂ (Tx instr)              = encode₁ (Tx instr)
encode₂ (Pure (callPure name args))     = BC OP_PURE_CALL (encodeString name ∷ encodeValList args ∷ [])
encode₂ (Pure (registerPure name f))    = BC OP_PURE_REGISTER (encodeString name ∷ [])

Decoding with Runtime Type Checking

Decoding bytecode requires runtime validation since type information has been erased. The decoder must verify that arguments match the expected types for each opcode and restore the safety level information.

Decode Result Type

The decoding result includes the safety level and potential type errors:

data DecodeResult : Set where
  SafeOp₀    : {A : Set} → Instr₀ Safe A → DecodeResult
  UnsafeOp₀  : {A : Set} → Instr₀ Unsafe A → DecodeResult
  SafeOp₁    : {A : Set} → Instr₁ Safe A → DecodeResult
  UnsafeOp₁  : {A : Set} → Instr₁ Unsafe A → DecodeResult
  SafeOp₂    : {A : Set} → Instr₂ Safe A → DecodeResult
  UnsafeOp₂  : {A : Set} → Instr₂ Unsafe A → DecodeResult
  TypeError  : String → DecodeResult

Argument Decoders

We postulate decoders for composite argument types. These return Maybe values to handle decoding failures:

postulate
  decodeTxId : Val → Maybe TxId
  decodeObjId : Val → Maybe ObjectId
  decodeMetaFilter : Val → Maybe (ObjectMeta → Bool)
  decodeDeepFilter : Val → Maybe (Object → ObjectMeta → Bool)
  decodeControllerId : Val → Maybe ControllerId
  decodeControllerVersion : Val → Maybe ControllerVersion
  decodeString : Val → Maybe String
  decodeValList : Val → Maybe (List Val)

Bytecode Decoder (Instr₀)

The decoder validates both the opcode and the argument list structure. For each opcode, it checks that the correct number and type of arguments are provided:

decode₀ : Bytecode → DecodeResult
decode₀ (BC op args) with op | args
... | OP_BEGIN_TX    | [] = SafeOp₁ (Tx beginTx)
... | OP_BEGIN_TX    | (_ ∷ _) = TypeError "beginTx takes no arguments"
... | OP_COMMIT_TX   | [] = TypeError "commitTx requires txid argument"
... | OP_COMMIT_TX   | (v ∷ []) = decodeCommit v
  where
    decodeCommit : Val → DecodeResult
    decodeCommit v with decodeTxId v
    ... | just txid = SafeOp₁ (Tx (commitTx txid))
    ... | nothing   = TypeError "invalid txid"
... | OP_COMMIT_TX   | (_ ∷ _ ∷ _) = TypeError "commitTx takes exactly one argument"
... | OP_ABORT_TX    | [] = TypeError "abortTx requires txid argument"
... | OP_ABORT_TX    | (v ∷ []) = decodeAbort v
  where
    decodeAbort : Val → DecodeResult
    decodeAbort v with decodeTxId v
    ... | just txid = SafeOp₁ (Tx (abortTx txid))
    ... | nothing   = TypeError "invalid txid"
... | OP_ABORT_TX    | (_ ∷ _ ∷ _) = TypeError "abortTx takes exactly one argument"
... | OP_CREATE_OBJ  | [] = TypeError "createObj requires input argument"
... | OP_CREATE_OBJ  | (inp ∷ []) = SafeOp₀ (Obj (createObj inp))
... | OP_CREATE_OBJ  | (_ ∷ _ ∷ _) = TypeError "createObj takes exactly one argument"
... | OP_DESTROY_OBJ | [] = TypeError "destroyObj requires objectid argument"
... | OP_DESTROY_OBJ | (v ∷ []) = decodeDestroy v
  where
    decodeDestroy : Val → DecodeResult
    decodeDestroy v with decodeObjId v
    ... | just oid = SafeOp₀ (Obj (destroyObj oid))
    ... | nothing  = TypeError "invalid objectid"
... | OP_DESTROY_OBJ | (_ ∷ _ ∷ _) = TypeError "destroyObj takes exactly one argument"
... | OP_CALL        | [] = TypeError "call requires objectid and input arguments"
... | OP_CALL        | (_ ∷ []) = TypeError "call requires both objectid and input arguments"
... | OP_CALL        | (v₁ ∷ v₂ ∷ []) = decodeCall v₁ v₂
  where
    decodeCall : Val → Val → DecodeResult
    decodeCall v₁ v₂ with decodeObjId v₁
    ... | just oid = SafeOp₀ (Obj (call oid v₂))
    ... | nothing  = TypeError "invalid objectid in call"
... | OP_CALL        | (_ ∷ _ ∷ _ ∷ _) = TypeError "call takes exactly two arguments"
... | OP_SCRY_META   | [] = TypeError "scryMeta requires filter argument"
... | OP_SCRY_META   | (v ∷ []) = decodeScryMeta v
  where
    decodeScryMeta : Val → DecodeResult
    decodeScryMeta v with decodeMetaFilter v
    ... | just f  = UnsafeOp₀ (Introspect (scryMeta f))
    ... | nothing = TypeError "invalid metadata filter"
... | OP_SCRY_META   | (_ ∷ _ ∷ _) = TypeError "scryMeta takes exactly one argument"
... | OP_SCRY_DEEP   | [] = TypeError "scryDeep requires filter argument"
... | OP_SCRY_DEEP   | (v ∷ []) = decodeScryDeep v
  where
    decodeScryDeep : Val → DecodeResult
    decodeScryDeep v with decodeDeepFilter v
    ... | just f  = UnsafeOp₀ (Introspect (scryDeep f))
    ... | nothing = TypeError "invalid deep filter"
... | OP_SCRY_DEEP   | (_ ∷ _ ∷ _) = TypeError "scryDeep takes exactly one argument"
... | OP_REFLECT     | [] = TypeError "reflect requires objectid argument"
... | OP_REFLECT     | (v ∷ []) = decodeReflect v
  where
    decodeReflect : Val → DecodeResult
    decodeReflect v with decodeObjId v
    ... | just oid = UnsafeOp₀ (Introspect (reflect oid))
    ... | nothing  = TypeError "invalid objectid"
... | OP_REFLECT     | (_ ∷ _ ∷ _) = TypeError "reflect takes exactly one argument"
... | OP_SELF        | [] = SafeOp₀ (Introspect self)
... | OP_SELF        | (_ ∷ _) = TypeError "self takes no arguments"
... | OP_INPUT       | [] = SafeOp₀ (Introspect input)
... | OP_INPUT       | (_ ∷ _) = TypeError "input takes no arguments"
... | _              | _       = TypeError "unsupported opcode in decode₀"

Program Serialization

Finally, we postulate a function that serializes an entire AVM program into a bytecode stream:

postulate
  serializeProgram₀ : {A : Set} → AVMProgram₀ A → List Bytecode
  serializeProgram₁ : {A : Set} → AVMProgram₁ A → List Bytecode
  serializeProgram₂ : {A : Set} → AVMProgram₂ A → List Bytecode

Limitations and Future Work

The bytecode representation targets the Instruction layers as follows:

Instr₀: Fully covered (base + transactions).
Instr₁: callPure and registerPure are represented, but function bodies are not serialized (registration is unsafe and environment-dependent).
Instr₂: Distribution instructions are represented. Semantic guards (e.g., teleport/switchVersion invalid during active transactions) are enforced by the interpreter, not the decoder.

Notable gaps include:

Control Flow Instructions. The conditional branches in programs like kudosTransfer require opcodes for branching, jumping, and labels that are not present in the current Opcode type. Without these, we cannot represent the if-then-else logic that appears throughout AVM programs.

Continuation Representation. Interaction trees use monadic bind >>= to sequence operations and capture continuations. The bytecode layer must encode these dependencies, ensuring that return values from one instruction are available to subsequent instructions. This requires either:

A stack-based architecture with push/pop operations
A register-based architecture with explicit register allocation
Continuation-passing style with explicit closures

Instructions like beginTx return values (e.g., TxId) that are referenced later (e.g., in commitTx txId). The bytecode VM needs a mechanism to bind these intermediate results—either through a stack, named registers, or an environment of local variables.

A complete bytecode specification would extend Opcode with additional instructions that handle control flow and continuations.

Bytecode Decoder (Instr₁)

Bytecode Decoder (Instr₂)