ToyEVM

I wanted to understand the Ethereum Virtual Machine — not at the Solidity level, but at the opcode level. How does a JUMP actually work? Why does storage cost so much gas? What does a control flow graph look like for a while loop compiled to bytecode?

The only way I could answer these questions was to build one myself. These are my notes from that process: an EVM interpreter built from scratch in Rust, opcode by opcode, with zero external dependencies. It is a toy, obviously — but enough of one to run real Solidity contracts and get a fair understanding of what production implementations like revm do under the hood.

The material is organized into seven modules. You are welcome to follow along, work through the exercises, and build your own.

Why I chose to build an EVM

Reading the Yellow Paper is one thing. Implementing it is another. Building the interpreter forced me to confront every detail I would have glossed over: how carries propagate in 256-bit addition, why JUMPDEST validation is a security property, what the 63/64 gas rule actually prevents, and how a basic block analysis reveals structure hidden in flat bytecode.

Along the way I also learned Rust patterns I now use everywhere — newtypes for type safety, repr(u8) enums for jump table dispatch, trait objects for pluggable backends, and why Result<T, E> is the right way to model EVM errors.

What we build

| Module | Theme | What we implement | Solidity milestone |
|--------|-------|-------------------|--------------------|
| 1 | The Machine Exists | 256-bit integers, a bounded stack, byte-addressable memory | — |
| 2 | The Interpreter Loop | Opcode decoding, the fetch-decode-execute loop, gas metering | Return42 runs |
| 3 | State and Control Flow | Persistent storage (EIP-2929), JUMP/JUMPI, loops in bytecode | Store42, Loop, Add, Counter run |
| 4 | Contracts and Deployment | CREATE/CREATE2 address derivation, the Host trait, LOG events | EventEmitter runs |
| 5 | The Call Stack | CALL/DELEGATECALL/STATICCALL, the 63/64 gas rule, call frames | — |
| 6 | From Bytecode to IR | Basic blocks, control flow graphs, stack-height analysis, liveness | — |
| 7 | Optimization Passes | Constant folding, dead code elimination, speculative pre-execution, interpreter performance | — |

By the end, we have a working EVM interpreter that can execute real Solidity-compiled contracts, plus a compiler backend for bytecode analysis and optimization.

How each module works

Each module has three parts:

  1. Notes — what I learned about the concept, with code snippets pulled from the source files
  2. Exercises — how I tested my understanding, with cargo test commands and hints
  3. Deep Dive — review questions to test and deepen understanding

The exercises use a todo_exercise!() macro — every function signature is already in place, every test is already written. The work is to fill in the implementations until cargo test goes green.
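The book never spells out the macro's body up front (the real definition lives in src/lib.rs), but it is tiny. A plausible sketch, with the message format inferred from the failing-test output shown on the Getting Started page:

```rust
/// A sketch of what `todo_exercise!` could expand to: panic with a
/// fixed prefix plus the per-exercise hint, so failing tests show it.
macro_rules! todo_exercise {
    ($hint:expr) => {
        panic!("Exercise not yet implemented.\n  Hint: {}", $hint)
    };
}

// An illustrative stub (the exercise number here is made up).
fn unimplemented_exercise() -> u64 {
    todo_exercise!("Exercise 0.0 — an illustrative placeholder")
}

fn main() {
    // Calling a stub panics; that panic is what the test runner reports.
    let result = std::panic::catch_unwind(unimplemented_exercise);
    assert!(result.is_err());
}
```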

What you need

You should be comfortable with basic Rust — ownership, traits, enums, pattern matching, Result/Option. If you have completed 100 Exercises to Learn Rust or equivalent, you are ready. No prior Ethereum, blockchain, or compiler knowledge is required.

References

These are the sources I used throughout. You do not need to read them before starting — I introduce concepts as they come up — but they are valuable companions.

Ready? Head to the Getting Started page to set up the environment and begin Module 1.

Getting Started

Tools

| Tool | Required? | Install |
|------|-----------|---------|
| Rust (nightly) | Yes | rustup.rs |
| Foundry | Optional | getfoundry.sh |

Clone and set up

git clone https://github.com/jonaprieto/toyevm.git
cd toyevm

Start Module 1

The day-1-start branch has the project structure with stub files — all type definitions and method signatures are in place, but every function body is todo_exercise!(). The work is to fill them in.

git checkout day-1-start
git checkout -b my-day-1       # your personal working branch

Project structure

toyevm/
├── Cargo.lock
├── Cargo.toml          ← Rust project manifest (lib + binary)
├── src/
│   ├── lib.rs          ← crate root — declares modules + todo_exercise! macro
│   ├── main.rs         ← binary entry point
│   ├── memory.rs       ← ★ Exercise 1.7–1.8: implement Memory
│   ├── stack.rs        ← ★ Exercise 1.5–1.6: implement Stack
│   └── types/
│       ├── mod.rs      ← re-exports U256
│       └── u256.rs     ← ★ Exercise 1.1–1.4: implement U256 type
└── book/
    ├── book.toml
    └── src/
        ├── SUMMARY.md
        └── introduction.md

The files marked with ★ contain stubs like this:

pub fn wrapping_add(self, _rhs: Self) -> Self {
    todo_exercise!("Exercise 1.2 — add byte-by-byte from LSB to MSB, propagating carry")
}

Each stub has the correct signature — the work is replacing todo_exercise!(...) with a real implementation.

Run the tests

cargo test

Failing tests show hints:

---- types::u256::tests::ex_1_2_simple_add ----
thread panicked at 'Exercise not yet implemented.
  Hint: Exercise 1.2 — add byte-by-byte from LSB to MSB, propagating carry'

test result: FAILED. 4 passed; 33 failed; 0 ignored

Implement until all tests pass.

Compare with the reference

Once tests pass, see how your implementation compares to mine:

git diff day-1 -- src/

This shows the exact differences between your code and the reference solution. You might find a more elegant approach — or discover edge cases you had not considered.

Reading the reference without diffing

To peek at the full reference solution at any point: git show day-1:src/types/u256.rs (shows the file without switching branches)

Advance to the next module

git checkout -b my-day-2 my-day-1    # branch from your completed Module 1

For Modules 2–7, the exercise chapter provides stub code to paste into new files. The pattern is always:

  1. Create the stub files shown in the "Exercise Setup" section
  2. cargo test — new tests fail
  3. Implement until they pass
  4. git diff day-N -- src/ — compare to reference

How the branches work

Solution branches day-1 through day-7 are stacked snapshots — each contains everything from prior modules plus new code. Your personal my-day-N branches grow incrementally as you add modules. Use git diff day-N to compare any module.

Comparing your work — quick reference

| Module | After completing | Compare with |
|--------|------------------|--------------|
| 1 | all 37 tests pass | git diff day-1 -- src/types src/stack.rs src/memory.rs |
| 2 | all tests pass | git diff day-2 -- src/opcode.rs src/gas.rs src/interpreter.rs |
| 3 | all tests pass | git diff day-3 -- src/storage.rs src/context.rs src/interpreter.rs |
| 4 | all tests pass | git diff day-4 -- src/contract.rs src/host.rs |
| 5 | all tests pass | git diff day-5 -- src/call.rs |
| 6 | all tests pass | git diff day-6 -- src/compiler/ |
| 7 | all tests pass | git diff day-7 -- src/compiler/fold.rs src/compiler/dce.rs |

How each module works

Each module has three parts:

  1. Notes — what I learned about the concept, with Rust code pulled directly from the source files
  2. Exercises — how I tested my understanding, with cargo test commands and hints
  3. Deep Dive — review questions to test and deepen understanding

The workflow I followed:

Read the notes
    ↓
Look at the exercises
    ↓
Open the source file, read the tests
    ↓
Implement (or study the reference)
    ↓
cargo test — verify
    ↓
Work through the Deep Dive questions

Running Solidity examples

The repository includes a Foundry project in examples/ with six Solidity contracts. As you progress through the modules, the interpreter gains enough opcodes to execute them — each is a milestone that proves the EVM actually works.

| Contract | What it does | Runnable after |
|----------|--------------|----------------|
| Return42 | Returns 42 from a fallback() in assembly | Module 2 |
| Store42 | Writes 42 to storage, reads it back, returns it | Module 3 |
| Loop | Sums 1..10 using a for loop in assembly | Module 3 |
| Add | add(uint256,uint256) — ABI dispatch + checked arithmetic | Module 3 |
| Counter | increment() / get() — storage + function dispatch | Module 3 |
| EventEmitter | set(uint256) — emits a LOG2 event | Module 4 |

Integration tests in tests/solidity_integration.rs verify these milestones automatically:

cargo test --test solidity_integration

If you have Foundry installed, you can also inspect the bytecode directly:

cd examples
forge build
forge inspect Return42 deployedBytecode    # raw hex bytecode
forge inspect Add abi                       # function signatures

What is the EVM?

The Ethereum Virtual Machine (EVM) is a deterministic, stack-based virtual machine that executes smart contract bytecode. Every Ethereum node runs the same EVM implementation, and given the same input state and transaction, every node must produce the exact same output. This determinism is what makes decentralized consensus possible.

The mental model

I found it helpful to think of the EVM as a sandboxed calculator with three scratch spaces:

graph LR
    subgraph "EVM Machine State (μ)"
        PC["Program Counter"]
        S["Stack<br/>(1024 × U256)"]
        M["Memory<br/>(byte array, ephemeral)"]
    end
    subgraph "World State (σ)"
        ST["Storage<br/>(U256 → U256, persistent)"]
    end
    BC["Bytecode"] --> PC
    PC --> S
    S --> M
    S --> ST

  1. Stack — a LIFO array of 256-bit values, max depth 1024
  2. Memory — a byte-addressable, expandable byte array, wiped after each call
  3. Storage — a persistent key-value map (256-bit → 256-bit) tied to each contract account

The EVM reads bytecode — a sequence of single-byte opcodes, some followed by immediate data — and executes them one by one. The program counter (PC) advances linearly unless a JUMP or JUMPI instruction redirects it.
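In code, that cycle is just a loop with a match. Here is a deliberately tiny sketch over a two-opcode subset (PUSH1 and ADD, on u64 values rather than U256); it is not the Module 2 interpreter, only its shape:

```rust
// A minimal fetch-decode-execute loop. Opcode bytes match the real EVM:
// 0x00 = STOP, 0x01 = ADD, 0x60 = PUSH1 (one byte of immediate data).
fn run(bytecode: &[u8]) -> Vec<u64> {
    let mut stack: Vec<u64> = Vec::new();
    let mut pc = 0usize;
    while pc < bytecode.len() {
        match bytecode[pc] {
            0x00 => break, // STOP: halt execution
            0x01 => {
                // ADD: pop two operands, push the wrapping sum
                let a = stack.pop().expect("stack underflow");
                let b = stack.pop().expect("stack underflow");
                stack.push(a.wrapping_add(b));
                pc += 1;
            }
            0x60 => {
                // PUSH1: the next byte is immediate data, not an opcode
                stack.push(bytecode[pc + 1] as u64);
                pc += 2; // skip opcode + immediate
            }
            op => panic!("unknown opcode 0x{op:02x}"),
        }
    }
    stack
}

fn main() {
    // PUSH1 3, PUSH1 10, ADD, STOP
    let stack = run(&[0x60, 0x03, 0x60, 0x0A, 0x01, 0x00]);
    assert_eq!(stack, vec![13]);
}
```

Note how PUSH1 advances the PC by two: the immediate byte must never be decoded as an opcode, which is exactly why JUMPDEST validation matters later.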

Why 256-bit words?

One of the first things I wondered was why Ethereum chose 256-bit (32-byte) words. There are a few reasons:

  • Keccak-256 is the hashing algorithm used throughout Ethereum. Its output is 32 bytes, so stack values can hold a full hash.
  • Addresses are 20 bytes — they fit comfortably in a 32-byte word.
  • Cryptographic security: 256-bit keys provide 128-bit security against brute-force attacks, which is the standard target for modern cryptography.

Rust Pattern: Newtype

The newtype pattern (struct U256([u8; 32])) gives us type safety — you can't accidentally pass a raw byte array where a U256 is expected — plus a namespace for methods and operator overloading.

Yellow Paper notation

The Ethereum Yellow Paper uses Greek letters for the machine state:

| Symbol | Meaning | Our Rust type |
|--------|---------|---------------|
| σ (sigma) | World state (all accounts) | — (Module 3+) |
| μ (mu) | Machine state | — (the interpreter struct) |
| μ_s | Stack | Stack |
| μ_m | Memory | Memory |
| μ_g | Gas remaining | u64 (Module 2) |
| μ_pc | Program counter | usize (Module 2) |

The execution model is a pure function:

(σ, μ, I) → (σ', μ', output)

Where I is the execution environment (caller, value, bytecode, etc.). Given the same inputs, the outputs are always the same. No randomness, no system calls, no I/O.

What we build in Module 1

In Module 1 we implement the three core data structures without writing a single opcode. These are the substrate the interpreter will live in:

| Module | File | What it is |
|--------|------|------------|
| U256 | src/types/u256.rs | 256-bit unsigned integer |
| Stack | src/stack.rs | Bounded LIFO stack (max 1024) |
| Memory | src/memory.rs | Byte-addressable linear memory |

The U256 Type

Every value on the EVM stack is a 256-bit unsigned integer. I represent this as a newtype over [u8; 32] in big-endian byte order (most significant byte first).

Why big-endian?

The EVM specification defines stack values and storage keys in big-endian. When you PUSH32 0x00...01, byte index 0 is the most significant and byte index 31 is the least significant. Matching this convention internally means no byte-swapping is needed when reading from bytecode.

The newtype pattern

Rather than passing around raw [u8; 32] arrays, I wrap them in a struct:

#[derive(Clone, Copy, Debug, Default, Hash, PartialEq, Eq)]
pub struct U256([u8; 32]);

This gives me:

  • Type safety — you can't accidentally pass a random byte array where a U256 is expected
  • Method namespace — I can implement U256::wrapping_add, U256::checked_div, etc.
  • Operator overloading — impl Add for U256 lets me write a + b naturally

Constants

The three fundamental constants:

/// The zero value (additive identity).
pub const ZERO: Self = Self([0u8; 32]);

/// The value one (multiplicative identity).
pub const ONE: Self = {
    let mut bytes = [0u8; 32];
    bytes[31] = 1;
    Self(bytes)
};

/// The maximum value: 2^256 - 1.
pub const MAX: Self = Self([0xFF; 32]);

In byte layout (big-endian — byte 0 is the most significant):

U256 byte layout: ZERO, ONE, MAX

Wrapping arithmetic

EVM arithmetic wraps on overflow — MAX + 1 = 0. This is not a bug; it's the specification. The EVM has no concept of overflow traps or panics at the arithmetic level.

Here's a concrete example of how addition works byte by byte:

  a = U256::from(255u64)     → [..., 0x00, 0xFF]
  b = U256::from(1u64)       → [..., 0x00, 0x01]
                                          ------
  byte 31: 0xFF + 0x01 = 0x100 → result[31] = 0x00, carry = 1
  byte 30: 0x00 + 0x00 + 1   = 0x01  → result[30] = 0x01, carry = 0
  ...
  result = U256::from(256u64) → [..., 0x01, 0x00]

And overflow wrapping:

  a = U256::MAX               → [0xFF, 0xFF, ..., 0xFF]
  b = U256::ONE               → [0x00, 0x00, ..., 0x01]
                                                  ------
  Every byte carries: 0xFF + carry → 0x00, carry = 1
  Final carry is discarded.
  result = U256::ZERO          → [0x00, 0x00, ..., 0x00]

The implementation adds from the least significant byte (index 31) toward the most significant (index 0), propagating the carry:

/// Wrapping addition: (self + rhs) mod 2^256.
pub fn wrapping_add(self, rhs: Self) -> Self {
    let mut result = [0u8; 32];
    let mut carry: u16 = 0;
    // Add from least significant byte (index 31) to most significant (index 0).
    for i in (0..32).rev() {
        let sum = self.0[i] as u16 + rhs.0[i] as u16 + carry;
        result[i] = sum as u8;
        carry = sum >> 8;
    }
    // carry is discarded — wrapping semantics.
    Self(result)
}

Rust Pattern: Operator Overloading

By implementing impl std::ops::Add for U256, we can write a + b instead of a.wrapping_add(b). The + operator calls our wrapping_add under the hood. We do the same for Sub, Mul, BitAnd, BitOr, BitXor, and Not.
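A sketch of that delegation, using a trimmed-down copy of the newtype so the snippet stands alone:

```rust
use std::ops::Add;

// A trimmed-down stand-in for U256 so the snippet compiles on its own.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U256([u8; 32]);

impl U256 {
    pub const ZERO: Self = Self([0u8; 32]);

    /// Wrapping addition, as shown above: LSB-to-MSB with carry.
    pub fn wrapping_add(self, rhs: Self) -> Self {
        let mut result = [0u8; 32];
        let mut carry: u16 = 0;
        for i in (0..32).rev() {
            let sum = self.0[i] as u16 + rhs.0[i] as u16 + carry;
            result[i] = sum as u8;
            carry = sum >> 8;
        }
        Self(result)
    }
}

// The operator just delegates; wrapping_add stays the single source of truth.
impl Add for U256 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        self.wrapping_add(rhs)
    }
}

fn main() {
    let max = U256([0xFF; 32]);
    let mut one = [0u8; 32];
    one[31] = 1;
    // MAX + 1 wraps to zero, via the `+` operator.
    assert_eq!(max + U256(one), U256::ZERO);
}
```

Because every call site goes through `+`, there is exactly one place where the wrapping semantics live.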

Division by zero

In most languages, dividing by zero is undefined or panics. In the EVM, x / 0 = 0 for every x. This is explicit in the Yellow Paper. Our checked_div method returns U256::ZERO when the divisor is zero.

Pitfall: EVM Division Semantics

The name checked_div might suggest it returns Option<U256> — but in the EVM, division by zero is defined to return zero, not an error. We name it checked_div to distinguish it from Rust's panicking / operator, but it never fails.
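The semantics are easy to sketch on a native integer; the real method applies the same guard to U256:

```rust
// EVM division semantics sketched on u128 instead of U256:
// divide-by-zero is defined to return zero, never an error.
fn checked_div(a: u128, b: u128) -> u128 {
    if b == 0 { 0 } else { a / b }
}

fn main() {
    assert_eq!(checked_div(10, 3), 3);  // integer division truncates
    assert_eq!(checked_div(42, 0), 0);  // x / 0 = 0 by EVM definition
    assert_eq!(checked_div(0, 5), 0);
}
```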

From conversions

I implement From<u64>, From<u128>, and From<[u8; 32]> so we can write:

let a = U256::from(42u64);        // places 42 in the lowest bytes
let b = U256::from(u128::MAX);    // fills the lower 16 bytes with 0xFF
let c = U256::from([0u8; 32]);    // raw byte array

The From<u64> conversion puts the 8-byte big-endian representation into bytes 24..32:

U256::from(0xDEAD_BEEFu64) → [0x00 × 28, 0xDE, 0xAD, 0xBE, 0xEF]
                                          ↑ byte 28
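A sketch of what the From&lt;u64&gt; impl can look like, using u64::to_be_bytes to land the value in bytes 24..32 (the real impl may differ in minor details):

```rust
// Sketch: the 8-byte big-endian representation of a u64 lands in
// bytes 24..32 of the 32-byte array; everything above stays zero.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct U256([u8; 32]);

impl From<u64> for U256 {
    fn from(value: u64) -> Self {
        let mut bytes = [0u8; 32];
        bytes[24..32].copy_from_slice(&value.to_be_bytes());
        Self(bytes)
    }
}

fn main() {
    let x = U256::from(0xDEAD_BEEFu64);
    assert_eq!(x.0[28..32], [0xDE, 0xAD, 0xBE, 0xEF]); // low 4 bytes
    assert!(x.0[..28].iter().all(|&b| b == 0));        // upper bytes zero
}
```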

Exercises

  • 1.1 — Verify that U256::ZERO, U256::ONE, and U256::MAX have the correct byte patterns
  • 1.2 — Implement addition with wrapping overflow
  • 1.3 — Implement comparison (<, >, ==)
  • 1.4 — Implement From<u64>, From<u128>, From<[u8; 32]>, and to_u64() round-trip

Run: cargo test types::u256

The Stack

The first thing I needed was a stack — a Last-In-First-Out (LIFO) data structure holding up to 1024 values of type U256. Almost every opcode reads from or writes to the stack.

Why 1024?

The depth limit prevents stack-overflow attacks. Without it, a malicious contract could recurse or push indefinitely, consuming unbounded memory on every node. The limit of 1024 is a protocol constant — exceeding it causes the transaction to revert.

Operations

The stack supports four fundamental operations. Here is how I worked through each one with a concrete example.

push — place a value on top

push adds a value to the top of the stack. If the stack already has 1024 elements, it returns StackError::Overflow.

Before:  [ 10, 20 ]      ← 20 is on top
push(30)
After:   [ 10, 20, 30 ]  ← 30 is now on top

In the EVM, PUSH1 0x2A pushes the byte 0x2A (42) as a U256 onto the stack. Every arithmetic opcode pushes its result.

pop — remove and return the top value

pop removes the topmost value and returns it. If the stack is empty, it returns StackError::Underflow.

Before:  [ 10, 20, 30 ]  ← 30 is on top
v = pop()                 → v = 30
After:   [ 10, 20 ]      ← 20 is now on top

Most opcodes pop their operands. For example, ADD pops two values, adds them, and pushes the result:

Before:  [ 10, 3, 5 ]    ← 5 is on top
ADD: a = pop() → 5
     b = pop() → 3
     push(a + b) → push(8)
After:   [ 10, 8 ]       ← 8 is on top

Pitfall: Pop Order

The EVM pops the top element first. For SUB, this means pop(a), pop(b), push(a - b) — so PUSH 3, PUSH 10, SUB computes 10 - 3 = 7, not 3 - 10. The first value pushed is deeper; the second is on top and gets popped first.
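A plain Vec standing in for the stack makes the order concrete:

```rust
/// SUB as the EVM executes it: the top operand is popped first.
fn sub(stack: &mut Vec<i64>) {
    let a = stack.pop().unwrap(); // top of stack
    let b = stack.pop().unwrap(); // next element down
    stack.push(a - b);
}

fn main() {
    let mut stack = vec![3, 10]; // PUSH 3, then PUSH 10 (10 is on top)
    sub(&mut stack);
    assert_eq!(stack, vec![7]);  // 10 - 3, not 3 - 10
}
```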

swap — exchange the top with a deeper element

SWAP1 through SWAP16 exchange the top element with the element at depth N (1-indexed from the top, not counting the top itself).

Before:  [ 10, 20, 30 ]  ← 30 is on top
SWAP1 (depth=1): swap top with 1 below
After:   [ 10, 30, 20 ]  ← 20 is now on top, 30 moved down

A more complex example with SWAP2:

Before:  [ 10, 20, 30, 40 ]  ← 40 is on top
SWAP2 (depth=2): swap top with 2 below
After:   [ 10, 40, 30, 20 ]  ← 20 on top, 40 moved to position 2

dup — duplicate a stack element

DUP1 through DUP16 copy the element at depth N (1-indexed from the top) and push the copy on top. The original stays in place.

Before:  [ 10, 20, 30 ]  ← 30 is on top
DUP1 (depth=1): copy top element
After:   [ 10, 20, 30, 30 ]  ← copy of 30 on top

DUP2 copies the second element:

Before:  [ 10, 20, 30 ]  ← 30 is on top
DUP2 (depth=2): copy second-from-top
After:   [ 10, 20, 30, 20 ]  ← copy of 20 on top
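With a Vec-backed stack, both operations reduce to a few lines. The method names swap(n) and dup(n) here are illustrative, not necessarily the ones in stack.rs, and the depth checks that would return StackError are omitted:

```rust
// Illustrative swap/dup on a Vec-backed stack of u64 values.
// `n` is the 1-indexed depth below the top, matching SWAPn / DUPn.
pub struct Stack {
    data: Vec<u64>,
}

impl Stack {
    /// SWAPn: exchange the top with the element n below it.
    pub fn swap(&mut self, n: usize) {
        let top = self.data.len() - 1;
        self.data.swap(top, top - n);
    }

    /// DUPn: copy the element at depth n and push the copy on top.
    pub fn dup(&mut self, n: usize) {
        let value = self.data[self.data.len() - n];
        self.data.push(value);
    }
}

fn main() {
    // SWAP2: mirrors the example above.
    let mut s = Stack { data: vec![10, 20, 30, 40] };
    s.swap(2);
    assert_eq!(s.data, vec![10, 40, 30, 20]);

    // DUP2: mirrors the example above.
    let mut d = Stack { data: vec![10, 20, 30] };
    d.dup(2);
    assert_eq!(d.data, vec![10, 20, 30, 20]);
}
```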

Rust Pattern: Result for Fallible Operations

Every stack operation returns Result<T, StackError> instead of panicking. This models the EVM's behavior exactly — a stack underflow doesn't crash the VM, it reverts the transaction. Using Result forces the caller (the interpreter) to handle both paths explicitly.

Error types

There are exactly two failure modes:

/// Errors that can occur during stack operations.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StackError {
    /// Attempted to push onto a full stack (1024 elements).
    Overflow,
    /// Attempted to pop from or peek into an empty stack.
    Underflow,
}

Implementation

I use a Vec<U256> internally rather than a fixed [U256; 1024] array:

pub struct Stack {
    data: Vec<U256>,
}

Why Vec? I considered a fixed [U256; 1024] array, but:

  • Vec starts small (pre-allocate 64 slots) and grows as needed
  • A fixed [U256; 1024] array would allocate 32 KB on every stack creation, even for contracts that use 3 stack slots
  • The overflow check is a simple len >= 1024 comparison either way
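A sketch of the constructor this design implies (the 64-slot figure is the pre-allocation mentioned in the first bullet):

```rust
// Pre-allocate a small number of slots; the Vec grows on demand.
// The 1024 limit is enforced in push(), not by the allocation size.
pub struct Stack {
    data: Vec<u64>, // U256 in the real code; u64 keeps the sketch small
}

impl Stack {
    pub fn new() -> Self {
        Self { data: Vec::with_capacity(64) }
    }
}

fn main() {
    let s = Stack::new();
    assert_eq!(s.data.len(), 0);      // empty...
    assert!(s.data.capacity() >= 64); // ...but already pre-allocated
}
```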

The push/pop implementation:

pub fn push(&mut self, value: U256) -> Result<(), StackError> {
    if self.data.len() >= MAX_STACK_DEPTH {
        return Err(StackError::Overflow);
    }
    self.data.push(value);
    Ok(())
}

/// Pop the top value from the stack.
///
/// Returns `StackError::Underflow` if the stack is empty.
pub fn pop(&mut self) -> Result<U256, StackError> {
    self.data.pop().ok_or(StackError::Underflow)
}

Exercises

  • 1.5 — Push three values, pop them back in LIFO order; pop from empty returns Underflow
  • 1.6 — Push 1024 values successfully; the 1025th push returns Overflow

Run: cargo test stack

Memory

EVM memory is a byte-addressable, dynamically expanding linear array. Unlike storage (which persists between transactions), memory is ephemeral — it exists only for the duration of a single execution context. This was one of the distinctions I had to be careful to internalize early.

Expansion semantics

Memory starts empty. Any read or write beyond the current size causes automatic expansion:

  1. The new size is rounded up to the next 32-byte boundary (word-aligned)
  2. New bytes are initialized to zero
  3. Expansion has a gas cost (covered in Module 2): C_mem(a) = 3a + ⌊a² / 512⌋, where a is the word count after expansion

The quadratic term makes large memory allocations progressively expensive. This prevents contracts from allocating gigabytes of memory on every node for free.
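As a sketch, the total cost for a given number of active words, using the Yellow Paper constants (note that an individual expansion is charged the difference between the new and old totals):

```rust
// Total memory cost for `words` active 32-byte words:
// C_mem(a) = 3*a + floor(a*a / 512).
fn memory_cost(words: u64) -> u64 {
    3 * words + words * words / 512
}

fn main() {
    assert_eq!(memory_cost(1), 3);       // one word: linear term only
    assert_eq!(memory_cost(32), 98);     // 1 KB: 96 + 1024/512
    assert_eq!(memory_cost(1024), 5120); // 32 KB: the quadratic term bites
}
```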

Key operations

MSTORE — write 32 bytes to memory

MSTORE pops an offset and a value from the stack, then writes the 32-byte big-endian representation of the value at that offset in memory.

Before:  Stack [ ..., 0x2A, 0x00 ]  ← 0x00 (offset) is on top
MSTORE: pop offset (0x00), pop value (0x2A)
        write 32 bytes of 0x2A at memory[0x00..0x20]
After:   Stack [ ... ]
         Memory[0x00..0x20] = 0x000...002A

MLOAD — read 32 bytes from memory

MLOAD pops an offset, reads 32 bytes from memory, and pushes the result as a U256.

Before:  Stack [ ..., 0x00 ]  ← offset on top
MLOAD: pop offset (0x00), read memory[0x00..0x20]
After:   Stack [ ..., 0x2A ]  ← value loaded from memory

MSTORE8 — write a single byte to memory

MSTORE8 pops an offset and a value, then writes only the lowest byte of the value at the offset.

Before:  Stack [ ..., 0xFF, 0x05 ]  ← 0x05 (offset) is on top
MSTORE8: pop offset (0x05), pop value (0xFF)
         write byte 0xFF at memory[0x05]
After:   Stack [ ... ]
         Memory[0x05] = 0xFF

Note that MLOAD and MSTORE always read/write exactly 32 bytes, but the offset can be any byte position — it does not need to be word-aligned. Reading from offset 1 reads bytes [1, 33) — a quirk that caught me off guard at first.

Pitfall: Memory Offset Alignment

A common mistake is assuming MLOAD/MSTORE require 32-byte-aligned offsets. They don't — MLOAD(1) reads bytes [1, 33), overlapping two "words." This matters when reasoning about memory layout.

Memory layout visualization

After MSTORE(0x00, 42) and MSTORE(0x20, 7), memory looks like:

EVM memory layout

Each row is 32 bytes (one word). MSTORE writes a U256 value starting at the given byte offset.
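A sketch of the word-level operations, assuming a Vec&lt;u8&gt; backing store and the expand_to helper shown later in the exercises; store_word/load_word match the names the Exercise 1.7 hint refers to, though the real signatures may differ:

```rust
// Sketch of word-granularity memory ops on a Vec<u8> backing store.
pub struct Memory {
    data: Vec<u8>,
}

impl Memory {
    /// Grow to at least `size` bytes, rounded up to a 32-byte boundary.
    fn expand_to(&mut self, size: usize) {
        if size > self.data.len() {
            let new_size = (size + 31) & !31;
            self.data.resize(new_size, 0); // new bytes are zero
        }
    }

    /// MSTORE: write 32 bytes starting at `offset`.
    pub fn store_word(&mut self, offset: usize, word: [u8; 32]) {
        self.expand_to(offset + 32);
        self.data[offset..offset + 32].copy_from_slice(&word);
    }

    /// MLOAD: read 32 bytes starting at `offset` (expands too).
    pub fn load_word(&mut self, offset: usize) -> [u8; 32] {
        self.expand_to(offset + 32);
        let mut word = [0u8; 32];
        word.copy_from_slice(&self.data[offset..offset + 32]);
        word
    }
}

fn main() {
    let mut mem = Memory { data: Vec::new() };
    let mut forty_two = [0u8; 32];
    forty_two[31] = 42;
    mem.store_word(0, forty_two);
    assert_eq!(mem.load_word(0), forty_two);
    // Unaligned reads are legal: offset 1 reads bytes [1, 33),
    // so the 42 written at byte 31 shows up at word index 30.
    assert_eq!(mem.load_word(1)[30], 42);
}
```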

Yellow Paper Map

In the Yellow Paper, memory is μ_m and the active word count is μ_i. The memory size function is:

M(s, f, l) = s                          if l = 0
M(s, f, l) = max(s, ⌈(f + l) / 32⌉)    otherwise

Where f is the byte offset, l is the size in bytes, and s is the current word count μ_i. Our expansion_size method implements this.

Exercises

  • 1.7 — Store and load a U256 value; store individual bytes and verify they appear in the loaded word
  • 1.8 — Reading uninitialized memory returns zero; expansion always rounds to 32 bytes; large offsets expand correctly

Run: cargo test memory

Exercises — Module 1

Exercise Setup

Start from the stub branch:

git checkout day-1-start
git checkout -b my-day-1

The files src/types/u256.rs, src/stack.rs, and src/memory.rs contain stubs with todo_exercise!() bodies and all tests intact.

cargo test    # 33 tests fail with "Exercise not yet implemented"

Implement until all 37 tests pass, then compare:

git diff day-1 -- src/types src/stack.rs src/memory.rs

These exercises lock in the primitives you have just read about: U256, Stack, and Memory. For each exercise you will write or verify a small piece of code, run a focused test suite, and encounter a Rust pattern that you will see repeatedly throughout the rest of this tutorial.

How to work through these: read the exercise, then attempt an implementation before looking at the hint. The goal is productive struggle, not speed.


Exercise 1.1 — U256 Zero and One Constants

What to implement. Verify that U256::ZERO and U256::ONE have the exact bit patterns prescribed by the EVM: 32 zero bytes and 31 zero bytes followed by 0x01, respectively. Add assertions in a unit test inside src/types/u256.rs that inspect the raw byte array returned by to_bytes().

File: src/types/u256.rs

How to verify:

cargo test ex_1_1

The struct and the constants look like this in our codebase:

#[derive(Clone, Copy, Debug, Default, Hash, PartialEq, Eq)]
pub struct U256([u8; 32]);

/// The zero value (additive identity).
pub const ZERO: Self = Self([0u8; 32]);

/// The value one (multiplicative identity).
pub const ONE: Self = {
    let mut bytes = [0u8; 32];
    bytes[31] = 1;
    Self(bytes)
};

/// The maximum value: 2^256 - 1.
pub const MAX: Self = Self([0xFF; 32]);

Hint. Call .to_bytes() on each constant, then index the array directly — bytes[0] is the most significant byte, bytes[31] is the least significant. For ONE, only bytes[31] should be non-zero.

Rust pattern — const evaluation. U256::ZERO, U256::ONE, and U256::MAX are const items: the compiler evaluates them at compile time and embeds the result as read-only data in the binary. U256::ONE goes further and uses a block expression as its initialiser — the compiler permits let mut and mutation inside a const context, as long as nothing mutable escapes the block. This is how you build non-trivial constants without heap allocation or lazy initialisation.


Exercise 1.2 — U256 Wrapping Addition

What to implement. Write two test cases for wrapping_add:

  1. U256::from(2u64) + U256::from(3u64) should equal U256::from(5u64).
  2. U256::MAX + U256::ONE should equal U256::ZERO (overflow wraps to zero).

File: src/types/u256.rs

How to verify:

cargo test ex_1_2

The implementation you are testing:

/// Wrapping addition: (self + rhs) mod 2^256.
pub fn wrapping_add(self, rhs: Self) -> Self {
    let mut result = [0u8; 32];
    let mut carry: u16 = 0;
    // Add from least significant byte (index 31) to most significant (index 0).
    for i in (0..32).rev() {
        let sum = self.0[i] as u16 + rhs.0[i] as u16 + carry;
        result[i] = sum as u8;
        carry = sum >> 8;
    }
    // carry is discarded — wrapping semantics.
    Self(result)
}

Hint. Use U256::from(n as u64) to build small test values. To build U256::MAX directly, use the constant. Inspect the .to_bytes() result if an assertion fails — the 32-byte dump is the clearest debugging aid.

Rust pattern — operator overloading via impl Add. Our U256 does not have a built-in + operator; it is provided by impl std::ops::Add for U256. The compiler rewrites a + b into a.add(b). This means wrapping_add is the single source of truth: the Add impl just delegates to it. When you need wrapping semantics everywhere (as the EVM does), centralising the logic this way eliminates a whole class of accidental checked-addition bugs.


Exercise 1.3 — U256 Comparison

What to implement. Add test cases that verify <, >, and == on U256 values:

  • U256::ZERO < U256::ONE is true.
  • U256::MAX > U256::ONE is true.
  • U256::from(42u64) == U256::from(42u64) is true.
  • U256::from(1u64) != U256::from(2u64) is true.

File: src/types/u256.rs

How to verify:

cargo test ex_1_3

Hint. U256 derives PartialEq and Eq. For ordering it implements PartialOrd and Ord manually, doing a big-endian byte-by-byte lexicographic comparison. Write your tests using standard Rust comparison operators (<, >); you do not need to call any special method.

Rust pattern — deriving vs manual Ord impl. #[derive(PartialOrd, Ord)] is enough when the compiler-generated comparison (field-by-field, in declaration order) happens to be correct. For U256 it would be, because the inner [u8; 32] is already big-endian and Rust compares arrays lexicographically — which matches numeric order. Deriving Ord for a newtype only works when the inner type itself implements Ord. Knowing when to trust #[derive] and when to write a manual impl is a core Rust skill.


Exercise 1.4 — From Conversions

What to implement. Verify the three From impls that convert native integer types into U256:

  • U256::from(0u64) equals U256::ZERO.
  • U256::from(u64::MAX) has its eight least-significant bytes set and all upper bytes zero.
  • U256::from(1u128) equals U256::ONE.
  • U256::from([0xFF_u8; 32]) equals U256::MAX.

Write one test function per conversion.

File: src/types/u256.rs

How to verify:

cargo test ex_1_4

Hint. Use .to_bytes() to inspect the result. For u64::MAX, bytes 24–31 should all be 0xFF and bytes 0–23 should be 0x00. The [u8; 32] conversion is a no-op — it just wraps the array.

Rust pattern — the From/Into idiom. Implementing From<T> for U automatically gives you Into<U> for T for free via a blanket impl in the standard library. This means callers can write either U256::from(42u64) or 42u64.into() — the same underlying code runs. The convention in Rust is: implement From, get Into gratis. Never implement Into directly.
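Both call forms in action on a toy newtype (Wei here is purely illustrative, not a type from the codebase):

```rust
// Implement From; the standard library's blanket impl
// `impl<T, U: From<T>> Into<U> for T` provides .into() for free.
#[derive(Debug, PartialEq)]
struct Wei(u128);

impl From<u64> for Wei {
    fn from(value: u64) -> Self {
        Wei(value as u128)
    }
}

fn main() {
    let a = Wei::from(42u64); // explicit From
    let b: Wei = 42u64.into(); // same conversion via the blanket Into
    assert_eq!(a, b);
}
```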


Exercise 1.5 — Stack Push and Pop

What to implement. Write a test that:

  1. Creates a fresh Stack.
  2. Pushes three distinct U256 values (e.g., 1, 2, 3).
  3. Pops them and checks that the order is LIFO (3 first, then 2, then 1).
  4. Verifies that popping an empty stack returns Err(StackError::Underflow).

File: src/stack.rs

How to verify:

cargo test ex_1_5

The push and pop implementations:

pub fn push(&mut self, value: U256) -> Result<(), StackError> {
    if self.data.len() >= MAX_STACK_DEPTH {
        return Err(StackError::Overflow);
    }
    self.data.push(value);
    Ok(())
}

/// Pop the top value from the stack.
///
/// Returns `StackError::Underflow` if the stack is empty.
pub fn pop(&mut self) -> Result<U256, StackError> {
    self.data.pop().ok_or(StackError::Underflow)
}

Hint. Store the return values of each pop() call in a let binding and unwrap() them in the test — a panic on unwrap is the most readable failure message. For the underflow check, use assert_eq!(stack.pop(), Err(StackError::Underflow)).

Rust pattern — Result<T, E> for fallible operations. push and pop both return Result. Callers must handle the error path — the compiler will warn about an unused Result. Inside tests you can call .unwrap(), but in the interpreter you must match on the error and propagate it. This forces you to think about every failure mode at the point of use, not after the fact.


Exercise 1.6 — Stack Depth Limit

What to implement. Write a test that pushes exactly 1024 values onto the stack (all successes), then asserts that the 1025th push returns Err(StackError::Overflow).

File: src/stack.rs

How to verify:

cargo test ex_1_6

The Stack struct and error type:

pub struct Stack {
    data: Vec<U256>,
}

/// Errors that can occur during stack operations.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StackError {
    /// Attempted to push onto a full stack (1024 elements).
    Overflow,
    /// Attempted to pop from or peek into an empty stack.
    Underflow,
}

Hint. Use a for loop: for i in 0..1024 { stack.push(U256::from(i as u64)).unwrap(); }. After the loop, do not call unwrap() on the 1025th push — match it or use assert_eq! with the expected error variant.

Rust pattern — custom error types. StackError is a plain enum that derives Debug, Clone, PartialEq, and Eq. Deriving PartialEq is what lets you write assert_eq!(result, Err(StackError::Overflow)) — without it, you would need a manual match. Adding a Display impl would give a human-readable message for log output, and a std::error::Error impl would integrate with the broader Rust error ecosystem (Box<dyn Error>, anyhow, etc.).


Exercise 1.7 — Memory Load and Store

What to implement. Write a round-trip test:

  1. Create a Memory.
  2. Store U256::from(0xDEAD_BEEF_u64) at offset 0.
  3. Load a U256 back from offset 0 and assert it equals the stored value.
  4. Repeat at offset 32 (the second word) to confirm non-overlapping writes.

File: src/memory.rs

How to verify:

cargo test ex_1_7

Hint. Look at the store_word and load_word methods in memory.rs. Both take a usize offset. The store writes 32 bytes starting at that offset; the load reads 32 bytes and reconstructs a U256. If the second round-trip fails, check whether both stores triggered separate expansions.

Rust pattern — slice operations with copy_from_slice. copy_from_slice writes a source slice into a mutable destination slice of the same length. If the lengths differ, it panics at runtime. EVM memory operations always deal in 32-byte words, so slicing as &mut data[offset..offset+32] is always safe after expand_to has ensured the backing Vec is large enough. This pattern avoids unsafe pointer arithmetic while still being zero-copy.
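A sketch of the word store/load round-trip over a plain Vec<u8> — method names mirror the exercise text, but the backing structure is simplified and not the reference implementation:

```rust
struct Memory {
    data: Vec<u8>,
}

impl Memory {
    fn expand_to(&mut self, size: usize) {
        if size > self.data.len() {
            let new_size = (size + 31) & !31; // round up to a word boundary
            self.data.resize(new_size, 0);    // new bytes are zero
        }
    }

    fn store_word(&mut self, offset: usize, word: &[u8; 32]) {
        self.expand_to(offset + 32);
        // Both slices are exactly 32 bytes, so copy_from_slice cannot panic here.
        self.data[offset..offset + 32].copy_from_slice(word);
    }

    fn load_word(&mut self, offset: usize) -> [u8; 32] {
        self.expand_to(offset + 32);
        let mut out = [0u8; 32];
        out.copy_from_slice(&self.data[offset..offset + 32]);
        out
    }
}

fn main() {
    let mut mem = Memory { data: Vec::new() };
    let mut word = [0u8; 32];
    word[31] = 0xEF;
    mem.store_word(64, &word);                       // write the third word
    assert_eq!(mem.load_word(64), word);             // round-trip
    assert_eq!(mem.data.len(), 96);                  // expanded to three words
    assert!(mem.data[..64].iter().all(|&b| b == 0)); // implicit zero padding
}
```

Note that expand_to runs before every slice operation — that ordering is the invariant that keeps the indexing panic-free.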


Exercise 1.8 — Memory Expansion and Zero Padding

What to implement. Verify the expansion semantics:

  1. Start with an empty Memory (len() is 0).
  2. Write a U256 at offset 64 (the third word).
  3. Assert memory.len() is 96 (three 32-byte words: 0..32, 32..64, 64..96).
  4. Assert that bytes 0..64 are all zero — the two words before the write were implicitly zero-initialised.

File: src/memory.rs

How to verify:

cargo test ex_1_8

The expansion helper:

fn expand_to(&mut self, size: usize) {
    if size > self.data.len() {
        // Round up to next 32-byte word boundary.
        let new_size = (size + 31) & !31;
        self.data.resize(new_size, 0);
    }
}

Hint. After the write, call a method that exposes the raw byte slice (or iterate over memory.load_byte(i) for i in 0..64). The important property is that bytes you never wrote are zero — the EVM guarantees this.

Rust pattern — Vec::resize. Vec::resize(new_len, value) extends the vector to new_len elements, filling the new slots with value (here 0u8). If new_len is smaller than the current length it truncates. This single call replaces a manual loop that would push individual zeroes. It is the canonical way to zero-extend a byte buffer in Rust.


Yellow Paper Map

Yellow Paper symbol      Our type / field             Notes
σ (world state)          Not yet modelled             Introduced in Module 3
μ (machine state)        Fields of ExecutionContext   Combined in the interpreter
μ_s (stack)              Stack                        Exercises 1.5 and 1.6
μ_m (memory)             Memory                       Exercises 1.7 and 1.8
μ_g (gas)                u64 field                    Introduced in Module 2
μ_pc (program counter)   usize field                  Introduced in Module 2
Values in [0, 2²⁵⁶)      U256                         Exercises 1.1 – 1.4

All values in the range [0, 2²⁵⁶) correspond to U256. When the Yellow Paper writes "let x ≡ μ_s[0]" it means "pop the top of the stack into x". When it writes "μ_m[i]" it means "byte i in memory". These direct mappings are why we model the types so literally — the code reads like the spec.

Deep Dive — Module 1

Review questions for Module 1. Try answering before expanding.


Q: Why does Ethereum use 256-bit words instead of 64-bit?

Answer

Keccak-256 produces 32-byte hashes, and Ethereum uses Keccak pervasively — for storage keys, address derivation, and state root computation. A 256-bit word can hold a full hash without truncation. Additionally, 256-bit keys provide 128-bit security against brute-force preimage attacks, which is the standard cryptographic target.


Q: What happens when the EVM stack overflows?

Answer

If a contract pushes the 1025th element, the EVM triggers an exceptional halt — not a REVERT. All state changes are discarded and all remaining gas is consumed. This is a hard protocol limit (Yellow Paper §9.4.2), not a configurable parameter. Note the difference from REVERT, which refunds unused gas.


Q: Is EVM memory persistent between transactions?

Answer

No. Memory (μ_m) is ephemeral — it is allocated fresh for each execution context and destroyed when execution completes. Storage (σ[a]_s) is the persistent key-value map that survives between transactions. Confusing the two is a common mistake in technical discussions.


Q: What does it mean for EVM execution to be deterministic?

Answer

Given the same world state σ and the same transaction T, every EVM implementation must produce the exact same output state σ′, return data, gas usage, and logs. There is no randomness, no access to wall-clock time (TIMESTAMP comes from the block header, which is agreed upon by consensus), and no system calls. This determinism is what enables thousands of independent nodes to reach consensus on the state of a single chain.


Q: Why did you implement U256 as a newtype over [u8; 32] instead of using a library?

Answer

Pedagogical choice. In production (e.g., revm), you'd use a crate like ruint or alloy-primitives::U256 backed by [u64; 4] limbs for performance. But implementing from raw bytes teaches you exactly how big-endian layout works, how carries propagate in addition, and why wrapping is the correct semantics. You also learn the Rust newtype pattern, operator overloading via impl Add, and From/Into conversion idioms — all of which show up in real codebase work.

Bytecode and the Execution Loop

Today we bring the EVM to life. By the end of this chapter, I want our interpreter to execute arithmetic bytecode, track gas consumption, and return results — a complete fetch-decode-execute loop.

How bytecode works

EVM bytecode is a flat array of bytes. Each byte is either:

  • An opcode (a single-byte instruction), or
  • An immediate byte following a PUSH instruction

The program counter (μ_pc in the Yellow Paper) starts at 0 and advances linearly. When the interpreter encounters PUSH3, it reads the opcode byte, then consumes the next 3 bytes as the immediate value, advancing the PC by 4 in total (1 + 3).

Example bytecode: 60 03 60 05 01 00

Offset  Byte  Meaning
0x00    0x60  PUSH1
0x01    0x03    immediate: 3
0x02    0x60  PUSH1
0x03    0x05    immediate: 5
0x04    0x01  ADD
0x05    0x00  STOP

This pushes 3, pushes 5, adds them (result: 8 on stack), and stops.

The fetch-decode-execute loop

flowchart TD
    A["Fetch byte at PC"] --> B["Decode opcode"]
    B --> C{"Gas enough?"}
    C -- No --> OOG["OutOfGas error"]
    C -- Yes --> D["Deduct gas"]
    D --> E["Execute opcode"]
    E --> F{"STOP / RETURN?"}
    F -- No --> G["Advance PC"]
    G --> A
    F -- Yes --> H["Return result"]

I found that our interpreter follows the classic pattern:

loop {
    byte = bytecode[pc]
    opcode = decode(byte)         // Opcode::from_byte
    gas_remaining -= cost(opcode) // static gas deduction
    execute(opcode)               // match on opcode enum
}

The loop terminates when:

  • STOP is reached (return empty, success)
  • RETURN is reached (return memory slice, success)
  • REVERT is reached (return memory slice, reverted)
  • The PC goes past the end of bytecode (implicit STOP)
  • An error occurs (out of gas, invalid opcode, stack error)

Dispatch: match vs. HashMap

I went with a Rust match on a repr(u8) enum rather than a HashMap<u8, fn()>. Why?

  • The compiler generates a jump table — a single indexed memory access, O(1)
  • No heap allocation, no hashing overhead
  • The match is exhaustive — the compiler ensures every opcode variant is handled
  • In production EVMs like revm, dispatch is the hottest code path; every nanosecond matters

Beyond a simple enum: OpCodeInfo

Our Opcode enum maps each byte to a name, plus an immediate_size() method. A production EVM would attach more metadata per opcode — something like:

Field           Type   Purpose
name            &str   Disassembly output
immediate_size  u8     Bytes following the opcode (0 for all but PUSH)
min_stack       u8     Minimum stack depth to execute
stack_increase  u8     Net stack growth after execution
static_gas      u32    Base gas cost
dynamic_gas     bool   Whether runtime gas is also needed
index           u8     N for PUSHN/DUPN/SWAPN/LOGN, 0 otherwise

This OpCodeInfo struct lets the interpreter validate stack depth and deduct gas before dispatching — no need to check inside each handler. We keep it simple here, but this is the direction revm and evmone take.

PUSH1 through PUSH32

The PUSH family is special: each opcode is followed by 1-32 immediate bytes that form the value to push. I noticed the bytes are read in big-endian order and right-aligned into a 32-byte U256.

All PUSH opcodes cost the same gas (3, the "very low" tier). The immediate bytes are not separate instructions — they're data embedded in the bytecode stream.
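As a sketch of how those immediate bytes might be gathered into a right-aligned word — the helper name and the raw [u8; 32] representation are illustrative, not the book's actual API:

```rust
// Read the N immediate bytes of a PUSH instruction into a right-aligned,
// big-endian 32-byte word. `code` is the bytecode and `pc` points at the
// PUSH opcode itself.
fn read_push_immediate(code: &[u8], pc: usize, n: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    // Immediates start one byte past the opcode. If the bytecode is
    // truncated, the missing bytes read as zero (the EVM's rule).
    let start = pc + 1;
    for i in 0..n {
        let byte = code.get(start + i).copied().unwrap_or(0);
        // Right-align: the last immediate byte lands at index 31.
        word[32 - n + i] = byte;
    }
    word
}

fn main() {
    // PUSH3 0x01 0x02 0x03 at offset 0
    let code = [0x62, 0x01, 0x02, 0x03];
    let word = read_push_immediate(&code, 0, 3);
    assert_eq!(word[29..], [0x01, 0x02, 0x03]); // low 3 bytes hold the value
    assert!(word[..29].iter().all(|&b| b == 0)); // the rest is zero
}
```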

BYTE, SHL, SHR, and SAR — Bit-level access

These four opcodes sit in the 0x1A–0x1D range, right after the logical operators (AND, OR, XOR, NOT). They all cost 3 gas ("very low" tier) and are pure stack-to-stack operations with no side effects. Despite being easy to implement, they show up constantly in real compiled code — understanding them early pays off.

Why these matter

Solidity's function dispatch relies on SHR. Every external call begins with the ABI selector: the first 4 bytes of calldata. The compiler emits CALLDATALOAD to read 32 bytes, then SHR 224 to right-shift away the lower 28 bytes, isolating the 4-byte selector. If you don't implement SHR, you can't run even the simplest Solidity contract with more than one function.

SHL appears in ABI encoding, address masking (AND with the constant 2¹⁶⁰ − 1, often built via SHL), and packing multiple values into a single storage slot. BYTE is used when contracts inspect individual bytes of a word — for example, iterating over an address byte-by-byte in a checksum routine. SAR (arithmetic right shift) preserves the sign bit, which matters for signed integer division: x / 2 on a negative int256 compiles to SAR 1 rather than SHR 1.

BYTE — extract a single byte

BYTE pops an index i (top) and a word x (second), then pushes the i-th byte of x, counting from the most significant end. Indices 0–31 are valid; any i ≥ 32 pushes zero.

PUSH32 0xFF00...00   # byte 0 is 0xFF, bytes 1-31 are 0x00
PUSH1  0x00          # index 0
BYTE                 # → 0xFF

The Yellow Paper defines this as (x ÷ 2^(8·(31−i))) mod 256 for i < 32, and 0 otherwise. Notice the big-endian convention — byte 0 is the most significant, matching Ethereum's standard byte ordering.
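On a raw 32-byte big-endian word the whole operation is a bounds-checked array index — a sketch (the real interpreter would operate on its U256 type instead):

```rust
// BYTE semantics on a big-endian [u8; 32]: index 0 is the most
// significant byte; out-of-range indices yield zero.
fn op_byte(i: usize, x: &[u8; 32]) -> u8 {
    if i < 32 { x[i] } else { 0 }
}

fn main() {
    let mut word = [0u8; 32];
    word[0] = 0xFF; // byte 0 is 0xFF, bytes 1-31 are 0x00
    assert_eq!(op_byte(0, &word), 0xFF);
    assert_eq!(op_byte(1, &word), 0x00);
    assert_eq!(op_byte(32, &word), 0x00); // out of range → zero
}
```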

SHL — shift left

SHL pops a shift amount (top) and a value (second), then pushes value · 2^shift mod 2^256. Bits shifted past position 255 are discarded. A shift of 256 or more always yields zero.

PUSH1  0x01          # value: 1
PUSH1  0x08          # shift: 8 bits
SHL                  # → 0x100 (256)

SHR — logical shift right

SHR pops a shift amount (top) and a value (second), then pushes ⌊value ÷ 2^shift⌋. Vacated high bits are filled with zeros. A shift of 256 or more always yields zero.

PUSH2  0x0100        # value: 256
PUSH1  0x08          # shift: 8 bits
SHR                  # → 0x01 (1)

The selector-extraction pattern we mentioned looks like this in bytecode:

PUSH1 0x00
CALLDATALOAD         # load 32 bytes from calldata offset 0
PUSH1 0xE0           # 224 in decimal
SHR                  # isolate the top 4 bytes → function selector

SAR — arithmetic shift right

SAR works like SHR but preserves the sign bit. If the most significant bit of the value is 1 (a negative two's-complement number), the vacated high bits are filled with ones instead of zeros.

# -2 in two's complement is 0xFFFF...FFFE
PUSH32 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
PUSH1  0x01          # shift: 1 bit
SAR                  # → 0xFFFF...FFFF (which is -1)

With SHR, the same shift would produce 0x7FFF...FFFF — a large positive number, not −1. This is why the Solidity compiler picks SAR for signed division by powers of two.
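The sign-propagation trick is easiest to see at 64 bits, where Rust has a signed type to lean on — reinterpret the unsigned bit pattern as signed, use the built-in arithmetic shift, and reinterpret back. A 256-bit SAR applies the same idea with a manual mask, since U256 has no signed counterpart. A sketch under that assumption:

```rust
fn sar64(value: u64, shift: u32) -> u64 {
    if shift >= 64 {
        // Every bit becomes a copy of the sign bit.
        if value >> 63 == 1 { u64::MAX } else { 0 }
    } else {
        // i64's >> is arithmetic: vacated bits copy the sign bit.
        ((value as i64) >> shift) as u64
    }
}

fn main() {
    assert_eq!(sar64(0x80, 1), 0x40);             // positive: same as SHR
    assert_eq!(sar64(u64::MAX - 1, 1), u64::MAX); // -2 >> 1 == -1
    assert_eq!(sar64(u64::MAX, 200), u64::MAX);   // oversized shift, negative value
    assert_eq!(sar64(1, 200), 0);                 // oversized shift, positive value
}
```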

EIP-145: Bitwise Shifting

SHL, SHR, and SAR were added in the Constantinople upgrade (2019) via EIP-145. Before that, shifting required MUL/DIV by powers of two and EXP — far more expensive. The EIP benchmarked a 10x speedup for common bit operations. Every post-Constantinople EVM must support them.

Gas: The EVM's Resource Model

Gas is the EVM's mechanism for bounding computation. Every opcode has a cost, and every transaction starts with a finite gas budget. What I found surprising is how abruptly execution halts when gas runs out — there's no graceful winding down.

Why gas exists

Without gas, a contract could loop forever, and every Ethereum node would be stuck executing it. I think of gas as a pragmatic solution to the halting problem: you can run any computation you want, as long as you can pay for it.

Static vs. dynamic costs

Static costs are fixed per opcode:

Tier      Cost  Examples
Zero      0     STOP, JUMPDEST
Base      2     PUSH0, PC, MSIZE, GAS, POP
Very Low  3     ADD, SUB, LT, GT, EQ, AND, OR, NOT, BYTE, SHL, SHR, SAR, PUSH1-32, DUP, SWAP
Low       5     MUL, DIV, MOD, SIGNEXTEND
Mid       8     ADDMOD, MULMOD, JUMP
High      10    JUMPI

Dynamic costs depend on operand values:

  • Memory expansion: grows quadratically — 3a + ⌊a²/512⌋ for a size of a 32-byte words
  • EXP: base cost + 50 per byte in the exponent
  • SLOAD/SSTORE: cold vs. warm access (Module 3)

The memory expansion formula

For a memory size of a 32-byte words, the total cost is C_mem(a) = 3a + ⌊a²/512⌋. At small sizes this is essentially linear (3 gas per word), but the quadratic term kicks in at scale — I plotted a few values to feel it:

Words   Bytes     Gas cost
1       32        3
10      320       30
100     3,200     319
1000    32,000    4,953
10000   320,000   225,312

Large memory allocations end up prohibitively expensive — and that's entirely by design.
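The total cost for a given size can be computed directly; the expansion charge the interpreter actually deducts is the difference between the new total and the old one. A sketch that reproduces the table's values:

```rust
// Total memory cost for `words` 32-byte words: linear term plus
// the quadratic term floor(words² / 512).
fn memory_gas(words: u64) -> u64 {
    3 * words + words * words / 512
}

// Expansion charge when growing from `old_words` to `new_words`.
fn expansion_gas(old_words: u64, new_words: u64) -> u64 {
    memory_gas(new_words).saturating_sub(memory_gas(old_words))
}

fn main() {
    assert_eq!(memory_gas(1), 3);
    assert_eq!(memory_gas(10), 30);
    assert_eq!(memory_gas(100), 319);
    assert_eq!(memory_gas(1_000), 4_953);
    assert_eq!(memory_gas(10_000), 225_312);
    // Growing from 10 to 100 words costs the difference.
    assert_eq!(expansion_gas(10, 100), 289);
}
```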

Out-of-gas vs. REVERT

Both halt execution and discard state changes, but I had to work through how they differ in gas refunds:

                 Out-of-gas      REVERT
State changes    Discarded       Discarded
Remaining gas    All consumed    Refunded to caller
Return data      Empty           Can include error message

This distinction matters for smart contract design: I now appreciate why a well-written contract should REVERT with an error message rather than running out of gas silently.

Rust Pattern: Result<T, E>

We model execution outcomes as Result<ExecutionResult, EvmError>. A successful REVERT returns Ok(ExecutionResult { reverted: true, .. }) — it's not an Err, because the interpreter completed successfully; it's the contract that chose to revert.

Exercises

  • 2.1 — Decode known opcodes and reject unknown bytes
  • 2.2 — Verify PUSH immediate sizes
  • 2.3 — Execute ADD and verify the result
  • 2.4 — Verify exact gas accounting for a sequence of opcodes
  • 2.5 — MUL, SUB, DIV correctness
  • 2.6 — Division by zero returns 0
  • 2.7 — LT, GT, EQ produce 0 or 1
  • 2.8 — BYTE extracts individual bytes from a word
  • 2.9 — SHL and SHR shift operations
  • 2.10 — SAR preserves the sign bit
  • 2.11 — Out-of-gas detection
  • 2.12 — STOP returns empty output
  • 2.13 — RETURN copies a memory range to output

Run: cargo test opcode and cargo test interpreter

Exercises — Module 2

Exercise Setup

Create your working branch:

git checkout -b my-day-2 my-day-1    # branch from your completed Module 1

Create stub files — copy the type definitions and method signatures from the reference, replacing every function body with todo_exercise!():

  • src/opcode.rs — the Opcode enum with from_byte() and immediate_size()
  • src/gas.rs — gas cost constants and memory_gas_cost() / memory_expansion_cost()
  • src/interpreter.rs — the Interpreter struct and run() method

Add pub mod opcode; pub mod gas; pub mod interpreter; to src/lib.rs.

You can peek at the reference signatures with:

git show day-2:src/opcode.rs | head -160

Then implement until all tests pass and compare:

git diff day-2 -- src/opcode.rs src/gas.rs src/interpreter.rs

Module 2 introduced the fetch-decode-execute loop: how raw bytes become Opcode variants, how each opcode manipulates the stack, and how gas is consumed. These exercises walk the full pipeline from bytecode bytes to execution results.

Running the test suites. Two test filters cover most of Module 2:

cargo test opcode
cargo test interpreter

Run them together with cargo test opcode && cargo test interpreter to see the full picture.


Exercise 2.1 — Opcode Decoding

What to implement. Write tests that call Opcode::try_from(byte) (or the equivalent from_byte constructor) for several known opcodes and verify:

  • 0x00 → Ok(Opcode::STOP)
  • 0x01 → Ok(Opcode::ADD)
  • 0x60 → Ok(Opcode::PUSH1)
  • An unknown byte such as 0xEF → Err(…) (invalid opcode)

File: src/opcode.rs

How to verify:

cargo test ex_2_1

Hint. The Opcode enum is #[repr(u8)], so every variant is stored as its byte value. TryFrom<u8> uses an exhaustive match (or a transmute guarded by a validity check) to convert. Check what from_byte actually returns — it may be Option<Opcode> or Result<Opcode, InvalidOpcode>.

Rust pattern — repr(u8) enum + TryFrom. Marking an enum #[repr(u8)] tells the compiler to store each variant as a single byte with the assigned discriminant. This lets you cast between the enum and u8 in both directions — safely via TryFrom, or unsafely via transmute. The TryFrom impl is the safe boundary: it checks that the byte is a valid discriminant before constructing the variant. Never use raw transmute from untrusted bytes; always go through TryFrom.
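A minimal illustration of the pattern — a trimmed-down enum, not the book's full opcode set:

```rust
// Each variant's discriminant is its EVM byte value.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Op {
    Stop = 0x00,
    Add = 0x01,
    Push1 = 0x60,
}

impl TryFrom<u8> for Op {
    type Error = u8; // hand the invalid byte back to the caller

    fn try_from(byte: u8) -> Result<Self, u8> {
        match byte {
            0x00 => Ok(Op::Stop),
            0x01 => Ok(Op::Add),
            0x60 => Ok(Op::Push1),
            other => Err(other),
        }
    }
}

fn main() {
    assert_eq!(Op::try_from(0x01), Ok(Op::Add));
    assert_eq!(Op::try_from(0xEF), Err(0xEF)); // unknown byte is rejected
    assert_eq!(Op::Push1 as u8, 0x60);         // enum → byte is a plain cast
}
```

The enum-to-byte direction is always safe (a plain `as` cast); only the byte-to-enum direction needs the checked TryFrom boundary.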


Exercise 2.2 — PUSH Immediate Sizes

What to implement. Write tests that call an Opcode method (e.g., immediate_size() or push_size()) and verify:

  • Opcode::PUSH1.immediate_size() → 1
  • Opcode::PUSH2.immediate_size() → 2
  • Opcode::PUSH32.immediate_size() → 32
  • Opcode::ADD.immediate_size() → 0 (no immediate bytes)

File: src/opcode.rs

How to verify:

cargo test ex_2_2

Hint. PUSH opcodes run from PUSH1 (0x60) through PUSH32 (0x7F). The immediate size is (opcode_byte - 0x5F). Look for the method in opcode.rs that computes this; if it does not exist, add it.

Rust pattern — enum methods. An enum is just a type in Rust; you can hang impl blocks on it exactly as you would on a struct. Adding fn immediate_size(&self) -> usize to Opcode keeps the size logic co-located with the enum definition. The alternative — a freestanding fn push_size(op: Opcode) in the interpreter — scatters knowledge about opcodes across multiple files and makes refactoring harder.
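One way the size computation might look — a hypothetical method on a trimmed-down enum, using the (opcode_byte − 0x5F) arithmetic from the hint:

```rust
#[repr(u8)]
#[derive(Clone, Copy)]
enum Op {
    Add = 0x01,
    Push1 = 0x60,
    Push2 = 0x61,
    Push32 = 0x7F,
}

impl Op {
    // PUSH1..PUSH32 occupy 0x60..=0x7F, so the immediate size is the
    // byte value minus 0x5F; every other opcode has no immediates.
    fn immediate_size(self) -> usize {
        let b = self as u8;
        if (0x60..=0x7F).contains(&b) {
            (b - 0x5F) as usize
        } else {
            0
        }
    }
}

fn main() {
    assert_eq!(Op::Push1.immediate_size(), 1);
    assert_eq!(Op::Push2.immediate_size(), 2);
    assert_eq!(Op::Push32.immediate_size(), 32);
    assert_eq!(Op::Add.immediate_size(), 0);
}
```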


Exercise 2.3 — ADD Execution

What to implement. Assemble a minimal bytecode sequence by hand and run it through the interpreter:

PUSH1 0x03   # push 3
PUSH1 0x05   # push 5
ADD          # pop both, push 8
STOP

Assert that after execution the top of the stack is U256::from(8u64).

File: src/interpreter.rs (add a test at the bottom)

How to verify:

cargo test ex_2_3

Hint. Build the bytecode as vec![0x60, 0x03, 0x60, 0x05, 0x01, 0x00]. Construct an ExecutionContext (or whatever the interpreter entry point is) with this bytecode and a generous gas limit, then call run() or execute(). Inspect the stack after execution.

Rust pattern — the fetch-decode-execute loop. Every iteration of the main interpreter loop does three things: (1) read the byte at pc, (2) decode it into an Opcode, (3) execute the opcode's semantics. This separation is deliberate — fetch knows nothing about semantics, and the execute step does not touch pc directly. Keeping these responsibilities distinct makes each step testable in isolation and the whole loop easier to reason about.


Exercise 2.4 — Gas Accounting

What to implement. Using the same bytecode as Exercise 2.3, verify the exact gas consumed:

  • PUSH1 costs 3 gas (×2 = 6).
  • ADD costs 3 gas.
  • STOP costs 0 gas.
  • Total: 9 gas.

Assert that result.gas_used (or gas_limit - remaining_gas) equals 9.

File: src/interpreter.rs

How to verify:

cargo test ex_2_4

Hint. Look at what the interpreter returns — it is probably a struct with fields for the final stack state, remaining gas, output, and a status code. Use gas_limit - gas_remaining if there is no gas_used field directly.

Rust pattern — resource tracking. Gas is a u64 counter that the interpreter decrements before every opcode execution. Doing the subtraction before the operation (not after) is important: if the gas runs out mid-execution, the operation should not have happened. This "debit first, act second" pattern is the same pattern used for memory allocation, network quotas, and rate limiters in production Rust code.


Exercise 2.5 — MUL, SUB, DIV

What to implement. Write one test each for MUL, SUB, and DIV:

  • 6 * 7 should leave 42 on the stack.
  • 10 - 3 should leave 7.
  • 20 / 4 should leave 5.

Use PUSH1 + the opcode + STOP for each test.

File: src/interpreter.rs

How to verify:

cargo test ex_2_5

Hint. Remember stack order: PUSH1 a, PUSH1 b, SUB computes b - a (top-of-stack minus the item below it — the second item pushed minus the first). Check the Yellow Paper's stack diagram for each opcode to confirm operand order.

Rust pattern — exhaustive match. The interpreter's dispatch is a match opcode { … } with one arm per opcode. If you add a new Opcode variant, the compiler will refuse to compile until you add a corresponding arm — there is no silent default that drops the case. This exhaustiveness check is one of the most powerful safety properties of Rust enums. The alternative (if/else if chains or a HashMap<u8, fn>) would not give you this guarantee.


Exercise 2.6 — Division by Zero

What to implement. Verify the EVM's specified behaviour: n / 0 = 0 for any n. Write a test that executes PUSH1 0x05, PUSH1 0x00, DIV, STOP and asserts the top of the stack is U256::ZERO.

File: src/interpreter.rs

How to verify:

cargo test ex_2_6

Hint. The EVM specification (Yellow Paper, Appendix H) explicitly defines x ÷ 0 ≡ 0. Your U256::checked_div or the interpreter's DIV arm must handle this. If the test panics instead of returning zero, find the division implementation and add a zero-divisor guard.

Rust pattern — domain-specific semantics vs panics. Rust's built-in / operator panics on integer division by zero in both debug and release builds — unlike C, Rust never produces undefined behaviour for integer arithmetic. The EVM intentionally diverges from Rust: division by zero is not an error, it is a well-defined operation returning zero. This is a case where the correct Rust idiom is not to use the standard operator — instead, the interpreter handles the zero case explicitly, returning U256::ZERO before calling any division primitive.
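The guard can be a one-liner with checked_div — shown here on u64 for brevity; the same shape applies to a 256-bit type:

```rust
// EVM semantics: x ÷ 0 ≡ 0. checked_div returns None for a zero
// divisor, and unwrap_or(0) maps that to the EVM's defined result.
fn evm_div(a: u64, b: u64) -> u64 {
    a.checked_div(b).unwrap_or(0)
}

fn main() {
    assert_eq!(evm_div(20, 4), 5);
    assert_eq!(evm_div(5, 0), 0); // would panic with the bare `/` operator
}
```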


Exercise 2.7 — LT, GT, EQ

What to implement. Verify that comparison opcodes produce 0 or 1:

  • PUSH1 5, PUSH1 3, LT → top of stack is 1 (3 < 5, since 3 is on top).
  • PUSH1 3, PUSH1 5, LT → top of stack is 0 (5 is not < 3, since 5 is on top).
  • PUSH1 7, PUSH1 7, EQ → top of stack is 1.

Write at least one test for each of LT, GT, and EQ.

File: src/interpreter.rs

How to verify:

cargo test ex_2_7

Hint. Watch the operand order carefully. On the EVM, LT pops a (top), then b (second), and pushes 1 if a < b. Double-check which item is "top" in your test setup.

Rust pattern — boolean-to-integer conversion. In Rust, bool as u64 yields 1u64 for true and 0u64 for false. The interpreter can use this to convert a Rust boolean comparison directly into a U256: U256::from((a < b) as u64). This is more idiomatic than an if/else that manually returns U256::ONE or U256::ZERO, and the compiler lowers it to a branchless instruction sequence.


Exercise 2.8 — BYTE

What to implement. Write tests that verify BYTE extracts individual bytes from a 256-bit word using big-endian indexing:

  • Store 0xFF00...00 (byte 0 is 0xFF, rest are zero). BYTE with index 0 → 0xFF.
  • Same word, BYTE with index 1 → 0x00.
  • BYTE with index 32 (out of range) → 0x00.

File: src/interpreter.rs

How to verify:

cargo test ex_2_8

Hint. Remember BYTE uses big-endian convention: index 0 is the most significant byte (leftmost in hex), not the least significant. Build a word with PUSH32 where only one byte is non-zero, then verify BYTE at that position returns the expected value.

Rust pattern — byte extraction from U256. The U256 type's .to_bytes() returns a [u8; 32] in big-endian order. BYTE(i, x) is equivalent to x.to_bytes()[i] when i < 32, which makes the implementation a straightforward index into the byte array. For out-of-range indices, return zero — no need for error handling.


Exercise 2.9 — SHL and SHR

What to implement. Write tests for the shift opcodes:

  • SHL: PUSH1 1, PUSH1 8, SHL → 0x100 (1 shifted left by 8 = 256).
  • SHR: PUSH2 0x0100, PUSH1 8, SHR → 0x01 (256 shifted right by 8 = 1).
  • Shifting by 256 or more yields zero: PUSH1 1, PUSH2 0x0100, SHL → 0.

File: src/interpreter.rs

How to verify:

cargo test ex_2_9

Hint. Watch the stack order: the shift amount is on top, the value is below. SHL pops shift (top) then value (second) and pushes value << shift. For shifts ≥ 256, the result must be zero — not a panic. Use U256::checked_shl or guard with a range check.

Rust pattern — guarded arithmetic. Rust's primitive shift operators panic when the shift amount exceeds the bit width. For U256, shifting by 256+ is well-defined in the EVM (result is zero), so the interpreter must guard the operation: check the shift amount first, and only call the shift primitive when it's in range. This is the same "check then act" pattern used for division by zero in Exercise 2.6.
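The "check then act" shape, sketched on u64 (the same guard applies to a 256-bit type, with 256 as the width): Rust's primitive shift panics on an oversized amount in debug builds and masks the amount in release builds, so the EVM's "zero for shift ≥ width" rule needs an explicit guard.

```rust
fn evm_shl(value: u64, shift: u64) -> u64 {
    // Guard first: an oversized shift is well-defined (zero), not a panic.
    if shift >= 64 { 0 } else { value << shift }
}

fn evm_shr(value: u64, shift: u64) -> u64 {
    if shift >= 64 { 0 } else { value >> shift }
}

fn main() {
    assert_eq!(evm_shl(1, 8), 0x100);
    assert_eq!(evm_shr(0x100, 8), 1);
    assert_eq!(evm_shl(1, 256), 0); // oversized shift → zero, not a panic
}
```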


Exercise 2.10 — SAR (Arithmetic Shift Right)

What to implement. Verify that SAR preserves the sign bit:

  • Positive value: SAR on 0x80 (128) by 1 → 0x40 (64). Same as SHR.
  • Negative value (two's complement): SAR on U256::MAX (all bits set, i.e. −1) by 1 → U256::MAX (still −1, because the sign bit propagates).
  • SAR on 0xFFFF...FFFE (−2) by 1 → 0xFFFF...FFFF (−1).

File: src/interpreter.rs

How to verify:

cargo test ex_2_10

Hint. The sign bit is bit 255 (the most significant). If it's set, vacated bits after the shift must be filled with ones, not zeros. One approach: compute SHR, then OR in a mask of ones for the vacated positions when the sign bit is set. Another: treat the top bit separately and build the result from there.

Rust pattern — two's complement awareness. Rust's i128 and i64 types have built-in arithmetic right shift (>>), but U256 is unsigned — it has no notion of sign. The EVM's SAR interprets the 256-bit value as a signed two's complement integer, shifts, then stores the result back as unsigned. This is a common pattern in low-level code: the same bit pattern has different meanings depending on the operation applied to it.


Exercise 2.11 — Out of Gas

What to implement. Run the ADD sequence from Exercise 2.3 with gas_limit = 1. The sequence needs 9 gas; with only 1 available it should fail. Assert that execution returns an Err (or a halted result) indicating out-of-gas, and that the output is empty.

File: src/interpreter.rs

How to verify:

cargo test ex_2_11

Hint. Look at the ExecutionResult or equivalent type that run() returns. There should be a variant or field for OutOfGas. If execution returns Ok even when gas is exhausted, the gas check is missing from the dispatch loop.

Rust pattern — early return with Result. The gas check inside the interpreter loop is a guard clause:

if self.gas < cost {
    return Err(ExecutionError::OutOfGas);
}
self.gas -= cost;

Returning early keeps the happy path unindented and makes the error condition visible at the top of the block. This "guard clause" style is idiomatic Rust and avoids deep nesting that obscures the main logic.


Exercise 2.12 — STOP

What to implement. Execute a single-byte program [0x00] (just STOP) with a generous gas limit. Assert:

  • Execution succeeds (no error).
  • The output bytes are empty.
  • Zero gas was consumed.

File: src/interpreter.rs

How to verify:

cargo test ex_2_12

Hint. STOP does not pop anything, does not push anything, and costs 0 gas. Its only effect is to halt execution normally. Check the output field of the result and verify it is an empty Vec<u8> or slice.

Rust pattern — unit-like result variants. STOP is the simplest possible opcode: it exists only to terminate. In the interpreter's match, the STOP arm does nothing except set a "halted" flag or break out of the loop. In the result type, ExecutionStatus::Success with empty output is the unit-like success case — analogous to Ok(()) in the standard library. Having a dedicated variant for "halted normally" is cleaner than encoding it as a zero-length return.


Exercise 2.13 — RETURN

What to implement. Assemble a program that:

  1. Uses MSTORE to write a value into memory at offset 0.
  2. Executes PUSH1 0x20, PUSH1 0x00, RETURN to return the 32-byte word.
  3. Asserts the returned bytes match the stored value.

File: src/interpreter.rs

How to verify:

cargo test ex_2_13

Hint. RETURN takes two stack arguments: offset and length. It copies length bytes from memory starting at offset into the output buffer and halts. With offset 0 and length 32, the output should be exactly the 32 bytes you wrote with MSTORE.

Rust pattern — memory slicing. The RETURN implementation does something like output.extend_from_slice(&memory[offset..offset+length]). The key invariant is that expand_to(offset + length) must be called before the slice, or the index will panic. In safe Rust all slice bounds are checked at runtime, so a missing expansion becomes a panic rather than a buffer overread — a much friendlier failure mode than in C.


Milestone: Return42

After completing Module 2, your interpreter can execute the Return42 Solidity contract — a real compiled contract that stores 42 in memory and returns it. Run cargo test --test solidity_integration return42 to verify.

Yellow Paper Map

Yellow Paper symbol   Meaning                          Our representation
μ_g                   Gas remaining                    u64 field in the execution context
μ_pc                  Program counter                  usize field, incremented by the fetch step
δ                     Stack items consumed by opcode   Encoded in opcode documentation
α                     Stack items produced by opcode   Encoded in opcode documentation
w                     Current opcode                   Opcode enum variant

Section 9.4 of the Yellow Paper defines the state transition for each opcode as a tuple — the new gas, program counter, stack, and memory after execution. Our interpreter loop computes exactly this tuple on each iteration. When you read an opcode's formal specification, map each primed symbol (′) to the field you update in the Rust implementation.

The gas cost function C from §9.4.1 decomposes into a base cost (fixed per opcode) plus a dynamic component for memory expansion. Module 2 covers the base costs; the memory expansion formula appears in Module 3.

Deep Dive — Module 2

Review questions for Module 2. Try answering before expanding.


Q: Walk me through exactly what happens when the EVM executes an ADD instruction.

Answer

The interpreter reads byte 0x01 at the current PC and decodes it as the ADD opcode. It deducts the static gas cost (3 gas, "very low" tier) from the remaining gas — if insufficient gas remains, execution halts with an OutOfGas error. Then it pops two values from the stack; if the stack has fewer than 2 elements, it halts with a StackUnderflow error. It computes a + b with wrapping arithmetic (mod 2^256), pushes the result onto the stack, and advances the PC by 1.


Q: What is the gas cost of PUSH1 vs. PUSH32?

Answer

Both cost 3 gas. The immediate bytes (1 byte for PUSH1, 32 bytes for PUSH32) are data, not separate instructions — they're consumed by advancing the PC past them, but don't incur additional gas charges. I found this trips people up in technical discussions; it's easy to assume longer immediates cost more.


Q: How does out-of-gas differ from a REVERT?

Answer

Both discard all state changes made during the current call frame. The critical difference I kept coming back to is gas handling: out-of-gas consumes all remaining gas (the caller loses everything), while REVERT refunds unused gas to the caller. REVERT can also include return data (typically an ABI-encoded error message), whereas out-of-gas returns nothing.


Q: Why do we use a match on a repr(u8) enum rather than a HashMap for opcode dispatch?

Answer

Performance and correctness. The Rust compiler compiles the match into a jump table — a single indexed memory load, O(1) with zero allocation. A HashMap would require hashing, bucket lookup, and heap-allocated function pointers. When I thought about production EVMs where the dispatch loop executes billions of times, a few nanoseconds per dispatch really does matter. The exhaustiveness of match also appealed to me — the compiler verifies every opcode variant is handled, preventing "missing case" bugs that a HashMap would silently ignore.


Q: What happens when execution reaches the end of bytecode without encountering STOP?

Answer

This is treated as an implicit STOP — execution halts normally with empty return data. I was glad to find the EVM specification defines this explicitly. It's equivalent to the bytecode having a trailing 0x00 byte, and the gas consumed up to that point is still charged.

Storage and EIP-2929

Storage is the EVM's persistent memory. Unlike memory (μ_m in Yellow Paper notation), which is wiped after each call, storage (σ[a]_s) persists between transactions. I found it helpful to think of it as a contract's own key-value database — a mapping from 256-bit keys to 256-bit values, scoped per contract address.

SLOAD and SSTORE

SLOAD — read a value from storage

SLOAD pops a key from the stack and pushes the corresponding value from storage. I noticed that uninitialized slots return zero — no need to set a default.

Before:  Stack [ ..., 0x01 ]       ← key on top
         Storage { 0x01: 42 }
SLOAD: pop key (0x01), read storage[0x01]
After:   Stack [ ..., 42 ]         ← value from storage

SSTORE — write a value to storage

SSTORE pops a key and a value, then writes the value to storage at that key. Writing zero clears the slot — which is how Solidity "deletes" a mapping entry.

Before:  Stack [ ..., 42, 0x01 ]   ← key on top, value below
         Storage { }
SSTORE: pop key (0x01), pop value (42)
        write storage[0x01] = 42
After:   Stack [ ... ]
         Storage { 0x01: 42 }

Cold vs. Warm Access (EIP-2929)

Before the Berlin upgrade (EIP-2929), SLOAD cost 800 gas (since the Istanbul upgrade / EIP-1884; earlier it was 200, and originally 50). I was surprised to learn this was a security problem — an attacker could read many storage slots cheaply, while every node had to perform expensive disk I/O for each one.

EIP-2929 fixed this by introducing the accessed storage keys set:

Access type | SLOAD cost | Description
Cold        | 2100 gas   | First access to a slot in this transaction
Warm        | 100 gas    | Slot already accessed earlier in this transaction

The first time we touch a slot, the node must fetch it from disk (cold). After that, it's in memory (warm). The gas costs reflect this hardware reality.

EIP-2929: Breaking Change

EIP-2929 raised cold SLOAD costs ~2.6x (800 → 2100). Combined with the earlier EIP-1884 increase (200 → 800), contracts that hardcoded gas assumptions broke on mainnet. Always use the warm/cold model.

#![allow(unused)]
fn main() {
pub fn sload_gas(&self, key: &U256) -> u64 {
    if self.warm_slots.contains(key) {
        100   // WARM_STORAGE_READ_COST
    } else {
        2100  // COLD_SLOAD_COST
    }
}
}

SSTORE gas (simplified)

SSTORE has the most complex gas calculation I encountered in the EVM. Here's the simplified model we implement:

Transition              | Base cost
Zero → non-zero         | 20,000 gas (new slot allocation)
Non-zero → non-zero     | 2,900 gas (modification)
+ cold access surcharge | +2,100 gas

The full model (EIP-3529) also includes gas refunds for clearing storage, but I left that out here to keep things readable.
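A sketch of the simplified model from the table. The function name and signature are illustrative, and like the text it omits the no-op (same value) case and all refund logic:

```rust
// Simplified SSTORE gas: base cost from the transition table, plus the
// EIP-2929 cold surcharge on first access to the slot this transaction.
fn sstore_gas(current: u64, new: u64, is_cold: bool) -> u64 {
    let base = if current == 0 && new != 0 {
        20_000 // zero → non-zero: allocating a new slot
    } else {
        2_900  // modifying an existing slot
    };
    base + if is_cold { 2_100 } else { 0 }
}
```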

Exercises

  • 3.3 — Store a value, load it back, verify round-trip
  • 3.4 — Verify cold SLOAD costs 2100 and warm SLOAD costs 100

Run: cargo test storage and cargo test ex_3_3 and cargo test ex_3_4

Control Flow: JUMP, JUMPI, and JUMPDEST

The EVM has no structured control flow — no if, no for, no function calls at the bytecode level. I found this surprisingly elegant: everything compiles down to two jump instructions and a marker opcode.

JUMP — unconditional jump

JUMP pops a destination from the stack and sets the program counter to that offset. The destination must be a valid JUMPDEST — I'll explain why that matters below.

Before:  Stack [ ..., 0x0A ]     ← destination on top
         PC = 5
JUMP: pop destination (0x0A)
After:   Stack [ ... ]
         PC = 0x0A               ← execution continues at offset 10

JUMPI — conditional jump

JUMPI pops a destination and a condition. If the condition is nonzero, it jumps; otherwise, execution falls through to the next instruction. This is how all of Solidity's if statements reach us at the bytecode level.

Before:  Stack [ ..., 1, 0x0A ]  ← destination on top, condition below
         PC = 5
JUMPI: pop destination (0x0A), pop condition (1)
       condition ≠ 0 → jump
After:   Stack [ ... ]
         PC = 0x0A               ← branch taken
Before:  Stack [ ..., 0, 0x0A ]  ← destination on top, condition = 0
         PC = 5
JUMPI: pop destination (0x0A), pop condition (0)
       condition = 0 → fall through
After:   Stack [ ... ]
         PC = 6                  ← next instruction (JUMPI is one byte; fall-through advances PC by 1)
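The two traces above boil down to one PC update. A sketch (names are illustrative; `jumpdests` is the precomputed validity table built below, and `u128` stands in for `U256`):

```rust
#[derive(Debug, PartialEq)]
struct InvalidJump;

// Returns the next PC after a JUMPI at `pc`, or InvalidJump if the
// branch is taken to a byte that is not a valid JUMPDEST.
fn do_jumpi(
    pc: usize,
    dest: u128,
    cond: u128,
    jumpdests: &[bool],
) -> Result<usize, InvalidJump> {
    if cond == 0 {
        return Ok(pc + 1); // fall through: JUMPI itself is one byte
    }
    let d = dest as usize;
    if jumpdests.get(d).copied().unwrap_or(false) {
        Ok(d) // branch taken: PC jumps to the validated destination
    } else {
        Err(InvalidJump)
    }
}
```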

JUMPDEST — jump target marker

JUMPDEST is a no-op that costs 1 gas. Its only purpose is to mark a valid landing site for JUMP and JUMPI. I learned this the hard way: jumping to any byte that is not a JUMPDEST causes an InvalidJump error.

JUMPDEST validation

This is a critical security property. We can only jump to a byte that:

  1. Is the JUMPDEST opcode (0x5B), and
  2. Is not part of a PUSH immediate sequence

Without rule 2, an attacker could craft a PUSH32 with 0x5B embedded in its 32 immediate bytes, then jump into the middle of that PUSH — executing arbitrary code that the compiler never intended. That was one of those "oh no" moments when I read the spec.

We precompute valid JUMPDEST positions at interpreter startup:

#![allow(unused)]
fn main() {
fn compute_jumpdests(bytecode: &[u8]) -> Vec<bool> {
    let mut valid = vec![false; bytecode.len()];
    let mut i = 0;
    while i < bytecode.len() {
        if bytecode[i] == 0x5B { valid[i] = true; }
        // Skip PUSH immediate bytes
        if let Some(op) = Opcode::from_byte(bytecode[i]) {
            i += 1 + op.immediate_size();
        } else {
            i += 1;
        }
    }
    valid
}
}

A loop in bytecode

DUP and SWAP Families

DUP1..DUP16 duplicate the Nth stack element (counting from the top, 1-indexed) and push the copy. SWAP1..SWAP16 swap the top element with the (N+1)th element. Both cost 3 gas ("very low" tier).
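On a plain `Vec` stack (top = last element), both families are one-liners. A sketch with `u128` standing in for `U256`; `n` is 1-based, matching DUP1..DUP16 and SWAP1..SWAP16:

```rust
// DUPn: duplicate the nth element from the top and push the copy.
fn dup(stack: &mut Vec<u128>, n: usize) {
    let len = stack.len();
    let value = stack[len - n];
    stack.push(value);
}

// SWAPn: swap the top element with the (n+1)th element.
fn swap(stack: &mut [u128], n: usize) {
    let top = stack.len() - 1;
    stack.swap(top, top - n);
}
```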

Here's how a Solidity while (i < 5) { i++; } compiles to bytecode — I traced through this manually before running it:

 0: PUSH1 0        ; i = 0
 2: JUMPDEST       ; loop_start
 3: PUSH1 5        ; push limit
 5: DUP2           ; copy i to top
 6: LT             ; i < 5?
 7: PUSH1 18       ; loop_body address
 9: JUMPI          ; jump if true
10: PUSH1 0        ; offset for MSTORE
12: MSTORE         ; mem[0] = i
13: PUSH1 32       ; return length
15: PUSH1 0        ; return offset
17: RETURN         ; exit: return i
18: JUMPDEST       ; loop_body
19: PUSH1 1        ; push 1
21: ADD            ; i++
22: PUSH1 2        ; loop_start address
24: JUMP           ; back to top

This is Exercise 3.10 — the "wow moment" where a loop actually runs in our interpreter.

Context opcodes

At this point our interpreter has access to the full execution context (I in the Yellow Paper):

Opcode       | Returns                            | Yellow Paper
CALLER       | Address of the immediate caller    | I_s
ORIGIN       | Address of the original EOA sender | I_o
CALLVALUE    | Wei sent with this call            | I_v
CALLDATALOAD | 32 bytes of calldata at offset     | I_d
CALLDATASIZE | Length of calldata                 | |I_d|
ADDRESS      | Address of the executing contract  | I_a

The distinction between CALLER and ORIGIN tripped me up at first. CALLER changes with each nested call, but ORIGIN always points to the EOA that initiated the transaction — a subtle but important difference for security.

Exercises

  • 3.5 — Jump to a non-JUMPDEST byte returns InvalidJump
  • 3.6 — JUMP to a valid JUMPDEST works correctly
  • 3.7 — JUMPI with nonzero condition takes the branch
  • 3.8 — JUMPI with zero condition falls through
  • 3.9 — CALLDATALOAD reads 32 bytes from calldata
  • 3.10 — A complete loop in bytecode runs and returns the correct value

Run: cargo test ex_3_5 through cargo test ex_3_10

Exercises — Module 3

Exercise Setup

git checkout -b my-day-3 my-day-2

Create stub files with todo_exercise!() bodies:

  • src/storage.rsStorage struct with load(), store(), sload_gas()
  • src/context.rsAddress, ExecutionContext, ContextBuilder

Add pub mod storage; pub mod context; to src/lib.rs.

Extend src/interpreter.rs with new opcode match arms (SLOAD, SSTORE, CALLDATALOAD, etc.) — use todo_exercise!() in each arm initially.

Peek at signatures: git show day-3:src/storage.rs | head -100

Compare when done: git diff day-3 -- src/storage.rs src/context.rs src/interpreter.rs

Module 3 covers persistent storage, memory-expansion gas, control flow, and the execution environment passed into a call. These exercises are longer than Module 1 and Module 2: several require assembling multi-opcode bytecode sequences by hand. Work through them in order — each builds on the previous one.

Running the test suite:

cargo test storage
cargo test interpreter

Exercise 3.1 — MLOAD / MSTORE Round-Trip via the Interpreter

What to implement. Write an integration test that executes:

PUSH32 0xDEADBEEF_CAFEBABE_12345678_9ABCDEF0_DEADBEEF_CAFEBABE_12345678_9ABCDEF0
PUSH1  0x00      # offset = 0
MSTORE           # store word to memory[0..32]
PUSH1  0x00      # offset = 0
MLOAD            # load word from memory[0..32]
STOP

Assert that after execution the top of the stack equals the value you stored.

File: src/interpreter.rs

How to verify:

cargo test ex_3_1

Hint. PUSH32 is opcode 0x7F followed by 32 immediate bytes. Build your test bytecode as a Vec<u8> starting with [0x7F], then append the 32 bytes of your test value, then continue with the remaining opcodes. Use U256::to_bytes() to get the raw bytes of your test value.

Rust pattern — integration testing. A unit test checks one function in isolation; an integration test exercises a full pipeline. This test is an integration test: it touches the bytecode decoder, the opcode dispatcher, memory expansion, and the stack — all in one run. In Rust, integration tests live in the tests/ directory or in #[cfg(test)] blocks inside lib.rs, but here it is fine to keep them close to the code they test. The distinguishing feature is that the test does not mock anything — it runs the real interpreter on real bytecode.


Exercise 3.2 — Memory Expansion Gas

What to implement. Run two executions that each do a single MSTORE but at different offsets, and compare the gas consumed:

  1. MSTORE at offset 0 — expands memory from 0 to 32 bytes.
  2. MSTORE at offset 1024 — expands memory from 0 to 1056 bytes (33 words).

Assert that the second execution consumes more gas than the first, and that the difference matches the memory expansion formula C_mem(a) = 3a + ⌊a² / 512⌋, where a is the number of active 32-byte words.

File: src/interpreter.rs

How to verify:

cargo test ex_3_2

Hint. Compute the expected cost by hand:

  • 1 word: 3·1 + ⌊1² / 512⌋ = 3 gas.
  • 33 words: 3·33 + ⌊33² / 512⌋ = 99 + 2 = 101 gas. The memory expansion cost for a single MSTORE is new_cost - old_cost (where old_cost is 0 for an empty memory). Add the base cost of MSTORE (3 gas) and the PUSH1/PUSH32 costs to get the total.

Rust pattern — property-based reasoning. Instead of just checking one magic number, this exercise asks you to verify a relationship between inputs and outputs: larger offset → higher gas. This is the key idea behind property-based testing (libraries like proptest or quickcheck): rather than enumerating specific cases, you describe invariants that must hold for any input. Even when writing ordinary unit tests, thinking in terms of properties ("more memory always costs more gas") produces more robust tests than thinking in terms of specific values.


Exercise 3.3 — SSTORE / SLOAD Round-Trip

What to implement. Write a test that executes:

PUSH32 <value>
PUSH32 <key>
SSTORE         # storage[key] = value
PUSH32 <key>
SLOAD          # push storage[key]
STOP

Assert the top of the stack equals <value>.

File: src/interpreter.rs

How to verify:

cargo test ex_3_3

Hint. Use a simple key like U256::from(1u64) and a recognisable value like U256::from(0xCAFE_BABEu64). SSTORE pops the key first (top of stack), then the value (second item), so push the value before the key.

Rust pattern — HashMap as a storage model. Storage wraps a HashMap<U256, U256>. This is a faithful model of the EVM specification, which defines account storage as a mapping from 256-bit keys to 256-bit values (σ[a]_s in Yellow Paper notation). HashMap gives O(1) average-case reads and writes, just like a trie lookup in a real client. The difference is that our HashMap lives only for the duration of a test; a production EVM persists it to a Merkle Patricia Trie on disk.


Exercise 3.4 — Cold vs Warm SLOAD Cost

What to implement. Execute two consecutive SLOADs of the same storage slot within one execution context:

  • The first access is "cold" and should cost 2100 gas (EIP-2929).
  • The second access is "warm" and should cost 100 gas.

Assert the gas consumed after each SLOAD using gas_limit - remaining_gas.

File: src/interpreter.rs

How to verify:

cargo test ex_3_4

The gas function in storage:

#![allow(unused)]
fn main() {
    /// Get the gas cost for an SLOAD operation.
    pub fn sload_gas(&self, key: &U256) -> u64 {
        if self.warm_slots.contains(key) {
            WARM_STORAGE_READ_COST
        } else {
            COLD_SLOAD_COST
        }
    }
}

Hint. Structure your bytecode as: PUSH32 <key>, SLOAD, PUSH32 <key>, SLOAD, STOP. Snapshot gas_remaining after each SLOAD to compute the per-instruction cost. You may need to extend the ExecutionResult type to expose intermediate gas readings, or use a sufficiently large gap between gas_limit checkpoints.

Rust pattern — HashSet for tracking state. The Storage struct holds a HashSet<U256> called warm_slots. On the first SLOAD for a given key, contains returns false (cold access, 2100 gas) and insert adds the key. On the second access, contains returns true (warm access, 100 gas). HashSet gives O(1) contains/insert and models the "accessed addresses" set from EIP-2929 precisely. The set is reset between transactions — in our tests, creating a new Storage simulates a new transaction.
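A minimal sketch of that pattern, with `u128` standing in for `U256`. One design tweak worth knowing: `HashSet::insert` returns whether the key was new, so a single call can replace the contains-then-insert pair:

```rust
use std::collections::HashSet;

/// EIP-2929 warm/cold tracking in miniature. The set is reset per
/// transaction — a fresh AccessSet models a fresh transaction.
#[derive(Default)]
struct AccessSet {
    warm_slots: HashSet<u128>,
}

impl AccessSet {
    /// Charge and warm in one step: `insert` returns true exactly when
    /// the key was not yet present, i.e. this is the first (cold) access.
    fn sload_gas(&mut self, key: u128) -> u64 {
        if self.warm_slots.insert(key) { 2100 } else { 100 }
    }
}
```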


Exercise 3.5 — JUMPDEST Validation

What to implement. Write two tests:

  1. A program with a valid JUMPDEST at offset 3: PUSH1 0x03, JUMP, JUMPDEST, STOP — should succeed.
  2. A program that tries to jump to offset 1 (the immediate byte of PUSH1, not an actual JUMPDEST) — should return an error (InvalidJumpDest or similar).

File: src/interpreter.rs

How to verify:

cargo test ex_3_5

Hint. The interpreter pre-scans bytecode to build a Vec<bool> (or HashSet<usize>) of valid jump destinations before execution starts. Only bytes with opcode 0x5B (JUMPDEST) that are not inside a PUSH immediate are valid targets. Jumping into the middle of a PUSH32 immediate must fail even though the byte value there might happen to be 0x5B.

Rust pattern — precomputation with Vec<bool>. Scanning once at load time to build a jump-validity table is O(n) in bytecode size, and checking validity at runtime is then O(1) per JUMP. The alternative — re-scanning from the start on each JUMP — would be O(n) per jump and would make loops quadratic. A Vec<bool> indexed by program counter is the standard approach; bitvec is used in production clients for tighter memory usage.


Exercise 3.6 — JUMP (Unconditional)

What to implement. Assemble a program that uses JUMP to skip over a PUSH1 0xFF instruction:

offset 0:  PUSH1 0x05    # destination
offset 2:  JUMP          # jump to offset 5
offset 3:  PUSH1 0xFF    # skipped — never executes
offset 5:  JUMPDEST      # landing pad
offset 6:  STOP

Assert execution succeeds and that U256::from(0xFFu64) is not on the stack (proving the skipped instruction never ran).

File: src/interpreter.rs

How to verify:

cargo test ex_3_6

Hint. Build the bytecode as [0x60, 0x05, 0x56, 0x60, 0xFF, 0x5B, 0x00]. After execution the stack should be empty (JUMP consumed the destination value, the skipped PUSH never ran, and STOP produces nothing).

Rust pattern — program counter manipulation. After a JUMP, the interpreter sets pc = destination instead of pc += 1 (the normal increment). In a safe Rust implementation this is just self.pc = destination as usize; — no unsafe needed, because pc is a plain usize field. The key invariant is that the jump destination is validated before pc is updated, so an invalid jump never leaves pc pointing into the middle of an immediate.


Exercise 3.7 — JUMPI True Branch

What to implement. Assemble a program where JUMPI takes the branch because the condition is non-zero:

PUSH1 0x01     # condition = 1 (non-zero → take branch)
PUSH1 0x07     # destination
JUMPI
PUSH1 0xFF     # skipped
JUMPDEST       # offset must be correct — count bytes carefully
STOP

Assert execution succeeds and 0xFF is not on the stack.

File: src/interpreter.rs

How to verify:

cargo test ex_3_7

Hint. Count offsets carefully: PUSH1 0x01 occupies offsets 0–1, PUSH1 destination occupies offsets 2–3, JUMPI is at offset 4, PUSH1 0xFF is at offsets 5–6, and JUMPDEST is at offset 7. So the destination should be 7.

Rust pattern — conditional control flow. JUMPI is the EVM's only conditional branch. In the interpreter, it is implemented as: if condition != U256::ZERO { self.pc = dest } else { self.pc += 1 }. This maps directly to an if expression in Rust with no mutable temporaries. The pattern of "check a condition, update a single field, continue the loop" is the fundamental building block of all higher-level control flow in EVM bytecode (loops, if/else, function dispatch).


Exercise 3.8 — JUMPI False Branch (Fall-Through)

What to implement. Repeat the previous exercise but with a zero condition:

PUSH1 0x00     # condition = 0 (zero → fall through)
PUSH1 0x07     # destination (not taken)
JUMPI
PUSH1 0x42     # this DOES execute now
JUMPDEST
STOP

Assert execution succeeds and U256::from(0x42u64) is on the top of the stack.

File: src/interpreter.rs

How to verify:

cargo test ex_3_8

Hint. The bytecode is identical to Exercise 3.7 except the condition byte changes from 0x01 to 0x00. Reuse your offset calculation from Exercise 3.7. When JUMPI falls through, pc advances normally to the next instruction.

Rust pattern — testing both branches. A JUMPI has two observable behaviours: take the branch, or fall through. A test suite that only covers one branch gives false confidence. The Rust community has a strong culture of pairing positive tests (the happy path) with negative tests (the error or alternative path). For every conditional in your interpreter, ask: "Do I have a test where the condition is true and a test where it is false?" This discipline catches off-by-one errors in condition evaluation that a single test would miss.


Exercise 3.9 — CALLDATALOAD

What to implement. Construct an execution context with non-empty calldata — for example, the 32-byte value U256::from(0xBEEFu64) encoded as big-endian bytes. Execute:

PUSH1 0x00     # byte offset into calldata
CALLDATALOAD   # push calldata[0..32] as U256
STOP

Assert the top of the stack equals U256::from(0xBEEFu64).

File: src/interpreter.rs

How to verify:

cargo test ex_3_9

Hint. The execution environment struct (often called CallContext, Environment, or similar) has a calldata: Vec<u8> field. Populate it with U256::from(0xBEEFu64).to_bytes().to_vec() before running the interpreter. If calldata is shorter than 32 bytes past the offset, the remaining bytes are zero-padded.

Rust pattern — builder pattern for context. Constructing an execution context with many optional fields benefits from the builder pattern: a CallContextBuilder struct with setter methods, each returning Self, terminated by a .build() call. This avoids positional argument confusion (ExecutionContext::new(bytecode, calldata, caller, value, …)) where you might swap two arguments of the same type. Even if the codebase does not use a formal builder today, recognising the pattern prepares you to refactor toward it as the struct grows.
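A sketch of that builder shape. The field set is a minimal assumption (just three fields, with `u128` standing in for `U256`), not the book's full context struct:

```rust
#[derive(Default)]
struct CallContext {
    bytecode: Vec<u8>,
    calldata: Vec<u8>,
    value: u128,
}

// Each setter consumes and returns the builder, so calls chain;
// unset fields keep their Default values.
#[derive(Default)]
struct CallContextBuilder { ctx: CallContext }

impl CallContextBuilder {
    fn bytecode(mut self, b: Vec<u8>) -> Self { self.ctx.bytecode = b; self }
    fn calldata(mut self, d: Vec<u8>) -> Self { self.ctx.calldata = d; self }
    fn value(mut self, v: u128) -> Self { self.ctx.value = v; self }
    fn build(self) -> CallContext { self.ctx }
}
```

Named, chained setters make it impossible to swap `bytecode` and `calldata` the way positional `Vec<u8>` arguments could be.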


Exercise 3.10 — Loop in Bytecode (Capstone)

What to implement. Assemble a bytecode program that computes the sum 1 + 2 + 3 + 4 + 5 = 15 using a loop. The program should:

  1. Initialise a counter i = 1 and an accumulator sum = 0 in memory or on the stack.
  2. Loop: add i to sum, increment i, break when i > 5.
  3. Leave sum (= 15) on the top of the stack at STOP.

This is the hardest exercise in the book so far. Take it one step at a time: write out the intended bytecode in a comment, then encode the hex bytes.

File: src/interpreter.rs

How to verify:

cargo test ex_3_10

Hint. A minimal loop skeleton in EVM bytecode:

; sum = 0, i = 1 — push sum first so it stays deep
PUSH1 0x00     ; sum
PUSH1 0x01     ; i (on top)
<loop_top: JUMPDEST>
; stack: [sum, i]  (i on top)
PUSH1 0x05     ; limit
DUP2           ; copy i to top → [sum, i, 5, i]
GT             ; i > 5?  → [sum, i, flag]
PUSH1 <exit>
JUMPI          ; if flag != 0, jump to exit
; still looping: sum += i
SWAP1          ; [i, sum]
DUP2           ; [i, sum, i]
ADD            ; [i, sum+i]
SWAP1          ; [sum+i, i]
; increment i
PUSH1 0x01
ADD            ; [sum+i, i+1]
PUSH1 <loop_top>
JUMP
<exit: JUMPDEST>
; stack: [sum_final, i_final] — pop i
POP
STOP

Encode this by hand, counting offsets carefully each time you add or remove an instruction.

Rust pattern — bytecode as data (assembling by hand). Writing bytecode as a Vec<u8> literal (vec![0x60, 0x01, …]) makes you the assembler. This is uncomfortable at first but teaches you two things: (1) exactly what the interpreter sees — no magic, no abstraction; and (2) why assemblers and compilers exist. Production EVM tooling (solc, Vyper, Huff) automates offset calculation and label resolution. Once you have done it by hand once, you will never again take label resolution for granted.


Milestone: Real Solidity Contracts

After completing Module 3, your interpreter can execute four real Solidity contracts: Store42 (storage), Loop (control flow), and Add (ABI dispatch with function selectors, calldata decoding, and checked arithmetic). Run cargo test --test solidity_integration to verify all of them.

Yellow Paper Map

Yellow Paper symbol | Meaning                               | Our representation
I                   | Execution environment (a tuple)       | Execution context struct
I_a                 | Address of the account being executed | address: Address field
I_s                 | Sender (caller) address               | caller: Address field
I_o                 | Origin of the transaction             | origin: Address field
I_d                 | Input data (calldata)                 | calldata: Vec<u8> field
I_v                 | Value transferred with the call       | value: U256 field
I_b                 | Bytecode being executed               | bytecode: Vec<u8> field
I_H                 | Current block header                  | block: BlockContext struct
I_e                 | Call depth                            | depth: u32 field

Section 4.1 of the Yellow Paper defines the execution environment I as a nine-tuple. Each element has a direct counterpart in our context struct. Module 3 fills in I_d (calldata) and connects σ[a]_s (account storage) to our Storage type. Future modules will add I_v (value) and I_H (block information) for the full CALL and environment opcodes.

The formula for memory expansion gas referenced in Exercise 3.2 is:

C_mem(a) = G_memory · a + ⌊a² / 512⌋

where a is the number of 32-byte memory words active and G_memory = 3. The quadratic term makes very large memory expansions prohibitively expensive, bounding the memory a contract can consume within a single transaction.

Deep Dive — Module 3

Review questions for Module 3. Try answering before expanding.


Q: Why can't you jump to an arbitrary byte offset in EVM bytecode?

Answer

JUMPDEST validation prevents landing inside a PUSH immediate sequence. Without it, an attacker could encode a PUSH32 containing 0x5B (JUMPDEST) in its data bytes, then jump to that embedded byte — executing unintended code. The validator scans the bytecode linearly, skipping PUSH immediates, and marks only true JUMPDEST opcodes as valid targets.


Q: What is the gas cost difference between a cold and warm SLOAD?

Answer

Cold SLOAD costs 2100 gas, warm costs 100 gas. This was introduced in EIP-2929 (Berlin upgrade). The rationale is that the first access to a storage slot requires disk I/O, while subsequent accesses hit an in-memory cache. The 21x cost difference reflects the performance gap between disk and memory.


Q: What is the difference between CALLER and ORIGIN?

Answer

CALLER (I_s) is the address that directly invoked the current execution context — it changes with each nested CALL. ORIGIN (I_o) is the externally-owned account (EOA) that initiated the transaction — it never changes regardless of call depth. Using ORIGIN for authorization is a security anti-pattern because a malicious contract could trick a user into signing a transaction that calls through it, making ORIGIN appear as the user's address.


Q: How does memory expansion gas work?

Answer

The total memory cost for a 32-byte words is C_mem(a) = 3a + ⌊a² / 512⌋. The gas actually charged on each expansion is the delta: C_mem(a_new) − C_mem(a_old). The linear term (3 gas/word) makes small allocations cheap. The quadratic term makes large allocations progressively expensive — 320 KB (10,000 words) costs ~225,312 gas total, making it impractical to allocate unbounded memory. This prevents DoS attacks where contracts allocate gigabytes on every node.
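The cost function and the charged delta are two small helpers. A sketch (names are illustrative; integer division supplies the floor):

```rust
// Total memory cost for `words` active 32-byte words: 3a + ⌊a²/512⌋.
fn memory_cost(words: u64) -> u64 {
    3 * words + words * words / 512
}

// Gas charged on an expansion is the difference between new and old cost.
fn expansion_gas(old_words: u64, new_words: u64) -> u64 {
    memory_cost(new_words).saturating_sub(memory_cost(old_words))
}
```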


Q: Why does the EVM use JUMP/JUMPI instead of structured control flow?

Answer

The EVM was designed as a minimal, low-level execution target — closer to assembly than to a high-level language. Structured control flow (if/else, for, while) is handled by the compiler (Solidity → bytecode). The EVM provides just the primitive jumps, which are simpler to implement correctly in consensus-critical code and easier to gas-meter. This design also makes the bytecode verifiable — you can statically analyze JUMPDEST positions without executing the code.

Contracts, Deployment, and the Host Trait

Today I crossed the boundary from a single-contract interpreter to a system that can create contracts, emit logs, and interact with external state.

CREATE and CREATE2

When a contract creates another contract, the EVM needs to determine the new contract's address. Here is what I found:

CREATE:

The address depends on the sender and their nonce (a counter that increments with each CREATE). The address is predictable only if you know the current nonce.

CREATE2:

CREATE2 (EIP-1014) adds a salt parameter, making the address deterministic — you can compute it before deployment. This enables:

  • Factory patterns: deploy contracts at known addresses
  • Counterfactual instantiation: reference a contract before it exists
  • Account Abstraction: pre-compute wallet addresses

Initcode vs. Runtime Code

When deploying a contract, you don't send the runtime bytecode directly. You send initcode — a program that executes once and whose RETURN value becomes the deployed code.

Transaction: { to: null, data: initcode }
  → EVM executes initcode
  → initcode calls RETURN(offset, size)
  → bytes at memory[offset..offset+size] become the deployed code

This is why Solidity constructors can run arbitrary logic — they are part of the initcode.
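For a constructor-free contract, the classic minimal initcode is a CODECOPY-then-RETURN preamble with the runtime code appended after it. A sketch of assembling that wrapper (the function name is mine; the 12-byte preamble layout is a well-known pattern, and the `as u8` cast assumes runtime code under 256 bytes):

```rust
/// Wrap runtime bytecode in minimal initcode: copy the runtime bytes
/// (appended after the 12-byte preamble) into memory, then RETURN them.
fn wrap_in_initcode(runtime: &[u8]) -> Vec<u8> {
    let len = runtime.len() as u8; // assumption: runtime < 256 bytes
    let mut code = vec![
        0x60, len,  // PUSH1 len
        0x60, 0x0C, // PUSH1 12  (offset of runtime within this initcode)
        0x60, 0x00, // PUSH1 0   (destination in memory)
        0x39,       // CODECOPY
        0x60, len,  // PUSH1 len
        0x60, 0x00, // PUSH1 0
        0xF3,       // RETURN → memory[0..len] becomes the deployed code
    ];
    code.extend_from_slice(runtime);
    code
}
```

A Solidity constructor is just extra logic executed before this final RETURN.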

The Host Trait

The interpreter needs to access external state: other contracts' code, account balances, etc. I used the Host trait to decouple the interpreter from state management:

#![allow(unused)]
fn main() {
pub trait Host {
    fn code(&self, address: &Address) -> Vec<u8>;
    fn code_size(&self, address: &Address) -> usize;
    fn balance(&self, address: &Address) -> U256;
    fn emit_log(&mut self, log: LogEntry);
    fn logs(&self) -> &[LogEntry];
}
}

For testing, I implemented MockHost with HashMaps. In production, a real Host would read from a Merkle trie.

Rust Pattern: Trait Objects

The Host trait enables inversion of control. The interpreter calls host.code(addr) without knowing whether it's backed by a HashMap (testing) or a Merkle trie (production). This is the same pattern revm uses.

LOG Opcodes

LOG0 through LOG4 — emit event logs

LOGn pops an offset and size from the stack, then pops n topic values. It reads size bytes from memory starting at offset and creates a log entry.

Before:  Stack [ ..., topic1, 32, 0x00 ]  ← offset on top, size below, then topics
         Memory[0x00..0x20] = <event data>
LOG1: pop offset (0x00), pop size (32), pop topic (topic1)
      emit log { topics: [topic1], data: memory[0x00..0x20] }
After:   Stack [ ... ]

Gas cost:

Opcode | Topics | Gas (32 bytes data)
LOG0   | 0      | 631
LOG1   | 1      | 1,006
LOG2   | 2      | 1,381
LOG3   | 3      | 1,756
LOG4   | 4      | 2,131

Logs are not accessible from within the EVM — they are write-only. Off-chain clients read them via the JSON-RPC eth_getLogs API. I found this asymmetry easy to overlook at first.
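The table's costs can be reproduced with one helper, using the standard LOG gas constants (375 base, 375 per topic, 8 per data byte); the function name is illustrative:

```rust
// LOG gas: G_log + G_logtopic * topics + G_logdata * data bytes.
fn log_gas(num_topics: u64, data_len: u64) -> u64 {
    375 + 375 * num_topics + 8 * data_len
}
```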

Exercises

  • 4.1 — CREATE address derivation is deterministic and varies with sender/nonce
  • 4.2 — CREATE2 address derivation varies with salt and initcode
  • 4.3 — MockHost compiles and implements all Host methods
  • 4.4 — EXTCODESIZE returns correct values via Host
  • 4.5 — LOG0 gas cost matches the formula
  • 4.6 — LOG2 emits correct topics

Run: cargo test contract and cargo test host

Exercises — Module 4

Exercise Setup

git checkout -b my-day-4 my-day-3

Create stub files:

  • src/contract.rscreate_address(), create2_address(), RLP helpers
  • src/host.rsHost trait, LogEntry, MockHost

Add pub mod contract; pub mod host; to src/lib.rs.

Peek at signatures: git show day-4:src/host.rs | head -50

Compare when done: git diff day-4 -- src/contract.rs src/host.rs

These exercises cover contract creation address derivation, the Host trait, and LOG opcode gas accounting.

Host Trait (reference)

#![allow(unused)]
fn main() {
/// The Host trait abstracts external state access for the interpreter.
pub trait Host {
    /// Get the code deployed at an address. Returns empty for EOAs.
    fn code(&self, address: &Address) -> Vec<u8>;

    /// Get the code size at an address.
    fn code_size(&self, address: &Address) -> usize {
        self.code(address).len()
    }

    /// Get the balance of an address (in wei).
    fn balance(&self, address: &Address) -> U256;

    /// Emit a log entry.
    fn emit_log(&mut self, log: LogEntry);

    /// Get all emitted logs (for testing).
    fn logs(&self) -> &[LogEntry];
}
}

Exercise 4.1 — CREATE address derivation

What to do. Write a test that calls your create_address helper with two different sender/nonce pairs and asserts the outputs differ, then calls it twice with the same pair and asserts the outputs are equal.

How to verify.

cargo test ex_4_1

Hint. The Yellow Paper formula is address = keccak256(rlp([sender, nonce]))[12:]. Use the Address newtype so the compiler catches raw-byte mistakes at the type boundary.

Rust pattern — newtype for Address. Wrap [u8; 20] in a tuple struct (pub struct Address([u8; 20])). This prevents accidentally passing a [u8; 20] hash where an address is expected.


Exercise 4.2 — CREATE2 address derivation

What to do. Write a test that calls create2_address with two different salts and asserts the results differ, then two different initcodes and asserts the results differ, then a fixed (sender, salt, initcode) triple twice and asserts equality.

How to verify.

cargo test ex_4_2

Hint. The preimage is exactly 0xff ++ sender ++ salt ++ keccak256(initcode). Build it as a Vec<u8> by concatenating byte slices; the length is always 1 + 20 + 32 + 32 = 85.

Rust pattern — preimage construction. Use Vec::with_capacity(85) and extend_from_slice to assemble multi-part byte strings without heap reallocations.
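A sketch of that assembly step, stopping just before the keccak256 call (the function name is mine; it takes the initcode hash as an input so the example stays dependency-free):

```rust
/// Assemble the 85-byte CREATE2 preimage:
/// 0xff ++ sender ++ salt ++ keccak256(initcode).
fn create2_preimage(
    sender: &[u8; 20],
    salt: &[u8; 32],
    initcode_hash: &[u8; 32],
) -> Vec<u8> {
    let mut p = Vec::with_capacity(85); // exact size: 1 + 20 + 32 + 32
    p.push(0xFF);
    p.extend_from_slice(sender);
    p.extend_from_slice(salt);
    p.extend_from_slice(initcode_hash);
    p
}
```

The real `create2_address` would hash this buffer and keep the last 20 bytes.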


Exercise 4.3 — MockHost compiles and implements Host

What to do. Confirm that MockHost satisfies all five methods of the Host trait. Deploy code at an address, check code_size, read a balance, and verify an EOA has zero code size and zero balance.

How to verify.

cargo test ex_4_3

Hint. The default method code_size is already provided by the trait — you get it for free once code is implemented. Use this in your test to confirm default methods are inherited.

Rust pattern — trait objects and inversion of control. Store the host as Box<dyn Host> in integration tests. Swapping MockHost for a real host requires no changes to the interpreter code.


Exercise 4.4 — EXTCODESIZE returns correct values

What to do. Deploy 100 bytes of code at address 0x42. Assert code_size returns 100. Then call code_size on Address::ZERO (an EOA) and assert it returns 0.

How to verify.

cargo test ex_4_4

Hint. The default code_size implementation delegates to self.code(address).len(). You only need to implement code — the rest follows.

Rust pattern — default trait methods. Providing code_size as a default method avoids duplicating logic in every implementor, while still allowing overrides for performance-critical cases (e.g., reading only the length from a trie node).
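A minimal sketch of both patterns together, with the trait trimmed to one required method (the real Host has five) and illustrative types:

```rust
use std::collections::HashMap;

// Trimmed Host trait: one required method plus the default `code_size`.
pub trait Host {
    fn code(&self, address: [u8; 20]) -> Vec<u8>;

    // Default method: every implementor gets this for free once `code` exists.
    fn code_size(&self, address: [u8; 20]) -> usize {
        self.code(address).len()
    }
}

// Test double backed by a HashMap, as in the exercises.
pub struct MockHost {
    code: HashMap<[u8; 20], Vec<u8>>,
}

impl Host for MockHost {
    fn code(&self, address: [u8; 20]) -> Vec<u8> {
        self.code.get(&address).cloned().unwrap_or_default()
    }
}

fn main() {
    let mut code = HashMap::new();
    code.insert([0x42; 20], vec![0u8; 100]); // 100 bytes of code at 0x42…42
    // Box<dyn Host>: the interpreter never needs to know which backend it got.
    let host: Box<dyn Host> = Box::new(MockHost { code });
    assert_eq!(host.code_size([0x42; 20]), 100); // deployed contract
    assert_eq!(host.code_size([0x00; 20]), 0);   // EOA: no code, size 0
}
```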


Exercise 4.5 — LOG0 gas cost matches the formula

What to do. Write a pure arithmetic test that verifies for two inputs: 0 bytes (cost = 375) and 32 bytes (cost = 631). Do not execute any bytecode — test the formula directly.

How to verify.

cargo test ex_4_5

Hint. Yellow Paper Section 9.2 defines G_log = 375, G_logdata = 8 (per byte of data), and G_logtopic = 375 (per topic). For LOG0 there are no topics, so the topic term is zero.

Rust pattern — formula verification in tests. Small arithmetic unit tests pin protocol constants early and catch regressions if a constant is refactored.
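For instance, the whole LOG schedule fits in one helper and three assertions (`log_gas` is a hypothetical name; the constants are the Yellow Paper values):

```rust
// Yellow Paper LOG gas schedule:
//   cost = G_log + G_logdata * data_len + G_logtopic * num_topics
const G_LOG: u64 = 375;
const G_LOGDATA: u64 = 8;
const G_LOGTOPIC: u64 = 375;

fn log_gas(data_len: u64, num_topics: u64) -> u64 {
    G_LOG + G_LOGDATA * data_len + G_LOGTOPIC * num_topics
}

fn main() {
    assert_eq!(log_gas(0, 0), 375);    // LOG0, empty data (exercise case 1)
    assert_eq!(log_gas(32, 0), 631);   // LOG0, 32 bytes: 375 + 256 (case 2)
    assert_eq!(log_gas(100, 4), 2675); // LOG4, 100 bytes: 375 + 800 + 1500
}
```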


Exercise 4.6 — LOG2 topics have correct structure

What to do. Construct a LogEntry with two topics (U256::from(0xAA) and U256::from(0xBB)) and 10 bytes of data. Emit it through MockHost::emit_log, then assert logs()[0].topics.len() == 2 and both topic values are correct.

How to verify.

cargo test ex_4_6

Hint. Topics are stored as Vec<U256> — index them with [0] and [1] directly.

Rust pattern — Vec<T> for variable-length fields. The number of topics is 0–4 per opcode (LOG0–LOG4). Using Vec<U256> instead of [Option<U256>; 4] keeps the representation honest: absent topics are simply absent, not None.


Yellow Paper Map

Symbol                          Meaning                      Section
T                               Transaction                  §4
Ξ                               Contract creation function   §7
G_log / G_logdata / G_logtopic  LOG gas schedule             §9
L_A(s, n)                       CREATE address derivation    §7.1
CREATE2 preimage                0xff ++ s ++ ζ ++ keccak(i)  §7.1

Run all Module 4 tests: cargo test contract and cargo test host

Deep Dive — Module 4

Review questions for Module 4. Try answering before expanding.


Q: How is a CREATE2 address computed, and why is it useful?

Answer

address = keccak256(0xff ++ sender ++ salt ++ keccak256(initcode))[12:]. The 0xff prefix prevents collisions with CREATE addresses. The salt is a user-chosen 32-byte value. This makes the address deterministic — you can compute it before deploying. Use cases: factory patterns (e.g., Uniswap pair creation), counterfactual instantiation (reference a contract before it's deployed), and Account Abstraction (pre-compute wallet addresses for gasless onboarding).


Q: What is the Host trait pattern and why is it important?

Answer

The Host trait separates EVM execution semantics from state management. The interpreter calls host.code(addr) to read other contracts' code, host.balance(addr) to check balances, and host.emit_log() to write events. In tests, a MockHost uses HashMaps. In production, a real Host reads from disk-backed Merkle tries. This inversion of control makes the interpreter testable, portable, and backend-agnostic — the same pattern revm uses.


Q: What is the gas cost of emitting a LOG4 with 100 bytes of data?

Answer

2675 gas. The base cost is 375 (G_log), each byte of data costs 8 gas (G_logdata), and each topic costs 375 gas (G_logtopic): 375 + 100 × 8 + 4 × 375 = 2675. See evm.codes — LOG4 for the interactive breakdown.


Q: What is the difference between CODECOPY and EXTCODECOPY?

Answer

CODECOPY copies the current contract's own bytecode into memory. EXTCODECOPY copies another account's bytecode. Both are used in proxy patterns — a proxy reads the implementation contract's code to understand its ABI. CODECOPY is also used in initcode to copy the runtime bytecode into memory before returning it.


Q: Why does contract creation use initcode instead of deploying runtime code directly?

Answer

Initcode allows constructors — code that runs once at deployment to initialize storage, validate parameters, or compute runtime code dynamically. The separation also enables the proxy pattern: initcode can deploy minimal proxy bytecode that delegates to an implementation contract. Without initcode, every contract would need to be fully static at deployment time.

The Call Stack

When one contract calls another, the EVM creates a new call frame — a fresh execution environment with its own stack, memory, and program counter. The original frame is suspended until the sub-call returns. Working through this was where things started clicking for me.

sequenceDiagram
    participant A as Contract A
    participant EVM as EVM
    participant B as Contract B

    A->>EVM: CALL(B, gas, value, input)
    EVM->>EVM: Create new frame<br/>(fresh stack, memory)
    EVM->>B: Execute B's bytecode
    B-->>EVM: RETURN(output)
    EVM-->>A: Resume A<br/>(success=1, returndata=output)

Call types

Opcode        msg.sender        Storage   State mods  Value transfer
CALL          Caller's address  Callee's  Allowed     Yes
DELEGATECALL  Original caller   Caller's  Allowed     No
STATICCALL    Caller's address  Callee's  Forbidden   No
CALLCODE      Caller's address  Caller's  Allowed     Yes (deprecated)

CALL — standard call

This creates a new execution context. msg.sender is the calling contract, storage access is scoped to the callee, and ETH can be transferred.

Before:  Stack [ ..., retSize, retOff, argSize, argOff, value, addr, gas ]
CALL: pop all 7 args → create new frame → execute callee bytecode
After:   Stack [ ..., success ]  ← 1 if callee returned, 0 if it reverted

DELEGATECALL — call with caller's context

This is the foundation of the proxy pattern. The callee's code runs in the caller's storage context. msg.sender and msg.value are preserved from the original call. I found this is how upgradeable contracts work — the proxy's storage is modified by the implementation's logic.

Before:  Stack [ ..., retSize, retOff, argSize, argOff, addr, gas ]
DELEGATECALL: pop 6 args → new frame with CALLER's storage and msg.sender
After:   Stack [ ..., success ]  ← 1 or 0

Pitfall: DELEGATECALL Storage Collisions

When Contract A DELEGATECALLs Contract B, B's code writes to A's storage slots. If A and B have different storage layouts, B will overwrite A's variables at the wrong slots. This is the most common proxy bug.

STATICCALL — read-only call

Any opcode that modifies state (SSTORE, LOG0–LOG4, CREATE, CREATE2, CALL with non-zero value, and SELFDESTRUCT) reverts the sub-call. I use this for view functions and safe external reads.

Before:  Stack [ ..., retSize, retOff, argSize, argOff, addr, gas ]
STATICCALL: pop 6 args → new frame with is_static=true
After:   Stack [ ..., success ]  ← reverts if callee tries to write state

The 63/64 Rule (EIP-150)

Before EIP-150, a contract could forward all its remaining gas to a sub-call. This enabled gas-exhaustion attacks: a malicious contract could create deeply nested calls, each consuming almost all gas, leaving the caller with nothing. Once I understood this, the fix made immediate sense.

EIP-150 limits forwarded gas to 63/64 of remaining gas:

#![allow(unused)]
fn main() {
let max_forward = gas_remaining - gas_remaining / 64;
let actual = gas_requested.min(max_forward);
}

This ensures the caller always retains at least 1/64 of its gas after a sub-call, which makes gas accounting predictable.

Depth limit

The call depth is limited to 1024. The 1025th nested call fails immediately. Combined with the 63/64 rule, I noticed this makes deep call tree attacks doubly impractical.
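A quick way to see the economic half of this: forward the maximum at every level with `gas_to_forward` as defined above and watch the budget decay geometrically.

```rust
// EIP-150 forwarding rule, as in the reference implementation above.
fn gas_to_forward(gas_remaining: u64, gas_requested: u64) -> u64 {
    let max_forward = gas_remaining - gas_remaining / 64;
    gas_requested.min(max_forward)
}

fn main() {
    // Forward the maximum at every level: the budget decays geometrically,
    // so even without the depth limit, deep nesting starves itself of gas.
    let mut gas: u64 = 1_000_000;
    for _ in 0..64 {
        gas = gas_to_forward(gas, u64::MAX);
    }
    // (63/64)^64 ≈ 1/e ≈ 0.37: roughly a third of the budget survives 64 levels.
    assert!(gas > 350_000 && gas < 380_000);
}
```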

Call frame lifecycle

  1. Push frame: save current state, create new frame with fresh stack/memory
  2. Execute: run the callee's bytecode in the new frame
  3. Pop frame: restore caller's state, copy return data
  4. Result: caller reads success/failure and return data

If the sub-call reverts, the caller's state is untouched — only the sub-call's changes are discarded. Implementing this correctly took me a couple of iterations.
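One way to sketch the revert semantics is a snapshot store. A real EVM journals individual writes rather than cloning maps, but the observable behavior is the same (`Storage` here is illustrative, not the repo's type):

```rust
use std::collections::HashMap;

// Snapshot/rollback store: snapshot on frame push, revert or commit on pop.
struct Storage {
    slots: HashMap<u64, u64>,
    snapshots: Vec<HashMap<u64, u64>>,
}

impl Storage {
    fn new() -> Self { Storage { slots: HashMap::new(), snapshots: Vec::new() } }
    fn set(&mut self, key: u64, value: u64) { self.slots.insert(key, value); }
    fn get(&self, key: u64) -> u64 { *self.slots.get(&key).unwrap_or(&0) }
    fn snapshot(&mut self) { self.snapshots.push(self.slots.clone()); } // frame push
    fn revert(&mut self) { self.slots = self.snapshots.pop().expect("no frame"); }
    fn commit(&mut self) { self.snapshots.pop(); }                      // keep writes
}

fn main() {
    let mut storage = Storage::new();
    storage.set(0, 42);
    storage.snapshot(); // enter sub-call
    storage.set(0, 99); // sub-call writes…
    storage.revert();   // …then REVERTs: only its changes are discarded
    assert_eq!(storage.get(0), 42);
}
```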

Precompiles

Addresses 0x1 through 0x9 are precompiled contracts — built-in functions with fixed gas costs:

Address  Function                        Gas
0x1      ecrecover (signature recovery)  3000
0x2      SHA-256                         60 + 12/word
0x3      RIPEMD-160                      600 + 120/word
0x4      identity (memcpy)               15 + 3/word

Precompiles look like regular CALLs but are handled specially by the interpreter — no bytecode is executed. I wired them in as a dispatch table in the host layer.
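As a sketch, the identity precompile (0x4) is small enough to show whole. Gas follows the table above; the function name is illustrative:

```rust
// Identity precompile (address 0x4): output == input.
// Gas per the table: 15 + 3 per 32-byte word, words rounded up.
fn identity_precompile(input: &[u8]) -> (u64, Vec<u8>) {
    let words = (input.len() as u64 + 31) / 32; // ceil(len / 32)
    let gas = 15 + 3 * words;
    (gas, input.to_vec())
}

fn main() {
    let (gas, out) = identity_precompile(&[1, 2, 3]);
    assert_eq!(gas, 18); // 1 word → 15 + 3
    assert_eq!(out, vec![1, 2, 3]);
    let (gas, _) = identity_precompile(&[0u8; 64]);
    assert_eq!(gas, 21); // 2 words → 15 + 6
}
```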

Exercises

  • 5.1 — Construct CallFrame for each call type, verify fields
  • 5.2 — Verify MAX_CALL_DEPTH constant
  • 5.4 — Gas forwarding via the 63/64 rule

Run: cargo test call

Exercises — Module 5

Exercise Setup

git checkout -b my-day-5 my-day-4

Create stub file:

  • src/call.rs — CallFrame, CallType, gas_to_forward()

Add pub mod call; to src/lib.rs.

Peek: git show day-5:src/call.rs | head -130

Compare: git diff day-5 -- src/call.rs

These exercises cover call frame construction, the depth limit, and the 63/64 gas-forwarding rule (EIP-150).

63/64 Rule (reference)

#![allow(unused)]
fn main() {
/// Calculate the gas to forward to a sub-call using the 63/64 rule (EIP-150).
///
/// The caller can specify a gas amount, but at most 63/64 of remaining gas
/// is forwarded. This prevents gas exhaustion attacks through deep call trees.
pub fn gas_to_forward(gas_remaining: u64, gas_requested: u64) -> u64 {
    let max_forward = gas_remaining - gas_remaining / 64; // 63/64 of remaining
    gas_requested.min(max_forward)
}
}

Exercise 5.1 — CallFrame construction

What to do. Construct one frame for each of CALL, DELEGATECALL, and STATICCALL using the provided constructors. For each frame, assert the correct call_type, code_address, context_address, caller, and is_static flag.

Key invariants to verify:

Type          context_address       is_static  value
CALL          same as code_address  false      any
DELEGATECALL  caller's address      false      preserved
STATICCALL    same as code_address  true       U256::ZERO

How to verify.

cargo test ex_5_1

Hint. For DELEGATECALL the storage context belongs to the caller, not the callee. Pass context_address = caller_addr to new_delegatecall. Asserting frame.context_address != frame.code_address is a good extra check.

Rust pattern — enum variants with constructors. Each new_* function encodes the invariants for its call type. Using CallType::DelegateCall in a match later lets the compiler remind you to handle all variants.
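A hypothetical sketch of constructors that encode the invariants in the table; field and constructor names mirror the exercise text, but the details are illustrative, not the repo's implementation:

```rust
#[derive(PartialEq, Debug, Clone, Copy)]
enum CallType { Call, DelegateCall, StaticCall }

struct CallFrame {
    call_type: CallType,
    code_address: [u8; 20],    // whose bytecode runs
    context_address: [u8; 20], // whose storage is read/written
    caller: [u8; 20],
    is_static: bool,
}

impl CallFrame {
    fn new_call(caller: [u8; 20], callee: [u8; 20]) -> Self {
        CallFrame { call_type: CallType::Call, code_address: callee,
                    context_address: callee, caller, is_static: false }
    }
    fn new_delegatecall(caller: [u8; 20], callee: [u8; 20]) -> Self {
        // Callee's code, but the CALLER's storage context.
        CallFrame { call_type: CallType::DelegateCall, code_address: callee,
                    context_address: caller, caller, is_static: false }
    }
    fn new_staticcall(caller: [u8; 20], callee: [u8; 20]) -> Self {
        CallFrame { call_type: CallType::StaticCall, code_address: callee,
                    context_address: callee, caller, is_static: true }
    }
}

fn main() {
    let (a, b) = ([0xAA; 20], [0xBB; 20]);
    let d = CallFrame::new_delegatecall(a, b);
    assert_eq!(d.call_type, CallType::DelegateCall);
    assert_eq!(d.caller, a);
    assert_ne!(d.context_address, d.code_address); // storage stays with the caller
    assert!(CallFrame::new_staticcall(a, b).is_static);
}
```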


Exercise 5.2 — Depth limit is exactly 1024

What to do. Assert that the exported constant MAX_CALL_DEPTH equals 1024. Also write a comment explaining why this limit exists.

How to verify.

cargo test ex_5_2

Hint. The 1025th call must fail with a call-depth exceeded error, not a stack overflow in the host language. The limit is a Yellow Paper protocol constant, not a Rust implementation detail.

Rust pattern — protocol constants. Naming constants with pub const in the module that owns their semantics (call.rs) keeps the definition co-located with the enforcement logic and makes them importable everywhere without repetition.


Exercise 5.4 — Gas forwarding follows the 63/64 rule

What to do. Call gas_to_forward with at least four cases and assert the results:

  1. gas_remaining = 6400, gas_requested = 10000 → 6300 (capped at 63/64)
  2. gas_remaining = 6400, gas_requested = 1000 → 1000 (requested is below cap)
  3. gas_remaining = 64, gas_requested = 1000 → 63
  4. gas_remaining = 1, gas_requested = 1000 → 1

How to verify.

cargo test ex_5_4

Hint. Integer division truncates in Rust: gas_remaining / 64 rounds down, so max_forward = gas_remaining - gas_remaining / 64 is always at least 63/64 of gas_remaining. The min call picks whichever is smaller: the cap or the request.

Rust pattern — min/max arithmetic. a.min(b) is idiomatic Rust for if a < b { a } else { b }. Prefer it over manual branches to keep the forwarding formula on one line and easy to audit against the Yellow Paper.


Yellow Paper Map

Symbol          Meaning                    Section
Θ               Message call function      §8
G_callstipend   Gas stipend on the stack   §8.1
EIP-150         63/64 gas forwarding rule  §8
n               Call depth counter         §8
MAX_CALL_DEPTH  Protocol constant (1024)   §8

Run all Module 5 tests: cargo test call

Deep Dive — Module 5

Review questions for Module 5. Try answering before expanding.


Q: Explain the 63/64 gas rule.

Answer

EIP-150 mandates that at most 63/64 of the remaining gas can be forwarded to a sub-call. This ensures the caller always retains at least 1/64 of its gas after the call returns. Before this rule, an attacker could create deeply nested calls where each level consumed nearly all remaining gas, making the gas cost of the outer call unpredictable and enabling denial-of-service attacks. The formula is: forwarded = min(requested, remaining - remaining/64).


Q: What is the difference between CALL and DELEGATECALL?

Answer

CALL creates a new execution context where msg.sender is the calling contract and storage access is scoped to the callee's address. DELEGATECALL runs the callee's code but in the caller's context — storage reads/writes go to the caller's slots, and msg.sender is preserved (it's the original external caller, not the proxy). This is the foundation of upgradeable proxy contracts: the proxy's storage holds the state, and DELEGATECALL to the implementation contract provides the logic.


Q: Why can't reentrancy be prevented at the EVM protocol level?

Answer

The EVM protocol is correct — it accurately executes what the bytecode instructs. Reentrancy is a logic bug in Solidity code, not a protocol flaw. When Contract A calls Contract B, and B calls back into A before A finishes updating its state, the protocol faithfully executes both calls. The fix is at the application level: the Checks-Effects-Interactions (CEI) pattern, or a reentrancy guard that prevents re-entering a function while it's still executing. Making this a protocol rule would break legitimate use cases like flash loans and callback patterns.


Q: What does STATICCALL enforce and at what layer?

Answer

STATICCALL sets an is_static flag on the call frame. The interpreter checks this flag before executing any state-modifying opcode — SSTORE, LOG0..LOG4, CREATE, CREATE2, CALL with non-zero value, and SELFDESTRUCT. If a modification is attempted, execution reverts immediately with a StateModificationInStaticCall error. This enforcement happens at the interpreter layer, not the compiler layer — the bytecode can contain SSTORE instructions, but they trap at runtime if called via STATICCALL.


Q: How does the call depth limit interact with the 63/64 rule?

Answer

They're complementary defenses. The depth limit (1024) provides a hard cap on call nesting. The 63/64 rule provides an economic cap — even with unlimited depth, forwarded gas decays geometrically: each level forwards at most 63/64, so the deepest frame at depth n holds at most (63/64)^n of the original gas ((63/64)^64 ≈ 1/e ≈ 0.37, so only about 37% survives 64 levels), and the absolute gas budget is finite. Together, they make deep call tree attacks both practically impossible (gas) and theoretically bounded (depth).

From Bytecode to IR: Control Flow Graphs

In Module 6 I shifted from executing bytecode to analyzing it. We lift the flat byte stream into a structured intermediate representation (IR) of basic blocks connected by control flow edges.

Why analyze bytecode?

Direct interpretation has limits. To optimize, we need structure:

  • Which code is reachable? (Dead code elimination)
  • Which values are used? (Constant folding, redundant load removal)
  • What does the control flow look like? (Loop detection, inlining)

The control flow graph (CFG) is the compiler's primary IR for answering these questions.

Basic blocks

A basic block is a maximal sequence of instructions with:

  • One entry point (the first instruction)
  • One exit point (the last instruction)
  • No internal jumps — if the first instruction executes, all instructions in the block execute

Why basic blocks matter

Because all instructions in a block execute together, we can reason about the block's effect as a single atomic operation: "this block pops 2 values and pushes 1." This net-effect summary is what enables optimization passes to work at scale without analyzing individual instructions across control flow boundaries.

A leader (the first instruction of a block) is identified by three rules:

  1. The first byte of the bytecode (offset 0)
  2. Any JUMPDEST instruction
  3. The instruction immediately after a JUMP, JUMPI, STOP, RETURN, REVERT, or INVALID

Worked example

Given bytecode PUSH1 4, JUMP, INVALID, JUMPDEST, STOP:

Offset  Byte  Opcode     Leader?   Why
0x00    0x60  PUSH1      ✓ (1)     First instruction
0x01    0x04  (imm)
0x02    0x56  JUMP
0x03    0xFE  INVALID    ✓ (3)     After JUMP
0x04    0x5B  JUMPDEST   ✓ (2)     JUMPDEST
0x05    0x00  STOP

Leaders: {0x00, 0x03, 0x04} → 3 basic blocks.
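The three leader rules can be sketched as a single pass that also steps over PUSH immediates (a standalone toy, not the repo's find_leaders):

```rust
use std::collections::BTreeSet;

// Leader identification: offset 0, every JUMPDEST, and the instruction after
// any terminator. Stepping over PUSH immediates ensures a 0x5B byte inside
// push data is never misread as a JUMPDEST.
fn find_leaders(code: &[u8]) -> BTreeSet<usize> {
    let mut leaders = BTreeSet::new();
    leaders.insert(0); // rule 1: offset 0
    let mut pc = 0;
    while pc < code.len() {
        let op = code[pc];
        // PUSH1..PUSH32 are 0x60..0x7f; immediate size is op - 0x5f.
        let imm = if (0x60..=0x7f).contains(&op) { (op - 0x5f) as usize } else { 0 };
        match op {
            0x5b => { leaders.insert(pc); } // rule 2: JUMPDEST
            // rule 3: after JUMP/JUMPI/STOP/RETURN/REVERT/INVALID
            0x56 | 0x57 | 0x00 | 0xf3 | 0xfd | 0xfe => {
                if pc + 1 < code.len() { leaders.insert(pc + 1); }
            }
            _ => {}
        }
        pc += 1 + imm; // opcode plus any immediate bytes
    }
    leaders
}

fn main() {
    // PUSH1 4, JUMP, INVALID, JUMPDEST, STOP
    let code = [0x60, 0x04, 0x56, 0xfe, 0x5b, 0x00];
    let leaders = find_leaders(&code);
    assert_eq!(leaders.into_iter().collect::<Vec<_>>(), vec![0, 3, 4]);
}
```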

CFG edges

Edges represent possible control flow between blocks:

Last instruction                  Outgoing edges
Fall-through (no terminator)      1 edge to next block
JUMP                              1 edge to target (if statically resolvable)
JUMPI                             2 edges: true branch (target) + false branch (fall-through)
STOP / RETURN / REVERT / INVALID  0 edges (terminal)

Here's the CFG for the loop bytecode from Module 3 (i = 0; while i < 5: i++; return i):

graph TD
    B0["Block 0<br/>PUSH1 0"]
    B1["Block 1 (JUMPDEST)<br/>PUSH1 5, DUP2, LT<br/>PUSH1 &lt;body&gt;, JUMPI"]
    B2["Block 2<br/>MSTORE, RETURN"]
    B3["Block 3 (JUMPDEST)<br/>PUSH1 1, ADD<br/>PUSH1 &lt;loop&gt;, JUMP"]

    B0 --> B1
    B1 -- "i < 5 (true)" --> B3
    B1 -- "i >= 5 (false)" --> B2
    B3 --> B1

    style B3 stroke:#e65100,stroke-width:2px,stroke-dasharray: 5 5

Note the back-edge B3→B1: this is a loop. A back-edge points to a block that dominates the source — block B1 dominates B3 because every path from entry to B3 goes through B1.

Static jump resolution

EVM jumps are technically dynamic — the destination is a value popped from the stack at runtime. But in practice:

Source                   Jump resolution
Solidity/Vyper compiled  Almost all PUSH+JUMP — statically resolvable
Function dispatch        PUSH+DUP+EQ+PUSH+JUMPI chain — resolvable with pattern matching
Handwritten (Huff)       May compute destinations dynamically — requires abstract interpretation
EOF (EVM Object Format)  RJUMP/RJUMPI with relative offsets — always static, no JUMPDEST needed

My CFG builder looks for the simplest pattern — a PUSH immediately before JUMP/JUMPI:

#![allow(unused)]
fn main() {
fn resolve_static_jump(&self, block_id: BlockId) -> Option<usize> {
    let instructions = &self.blocks[&block_id].instructions;
    if instructions.len() < 2 { return None; } // guard: need at least PUSH + JUMP
    let push_inst = &instructions[instructions.len() - 2];
    if !push_inst.opcode.is_push() { return None; }
    // Convert immediate bytes to offset
    ...
}
}

Limitation: Non-Adjacent PUSH

If a DUP or SWAP separates the PUSH from the JUMP (common in function dispatch tables), my resolver returns None and the edge is missing from the CFG. A production analyzer would track the stack symbolically to resolve these cases.

Stack height analysis

I track the stack height at the entry and exit of each block — a forward dataflow analysis.

The algorithm

  1. Initialize: entry block has height 0
  2. For each block on the worklist, simulate all instructions:
    • height += outputs − inputs for each instruction
  3. Propagate exit height to all successors
  4. If a successor hasn't been visited, add it to the worklist
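The steps above can be sketched over a toy CFG where each block is reduced to its net stack delta (illustrative, not the repo's analysis code):

```rust
use std::collections::{HashMap, VecDeque};

// Each block reduced to a net stack delta plus a successor list; propagate
// entry heights forward from block 0 with a worklist.
fn compute_entry_heights(
    deltas: &HashMap<usize, i64>,
    succs: &HashMap<usize, Vec<usize>>,
) -> HashMap<usize, i64> {
    let mut entry_height: HashMap<usize, i64> = HashMap::new();
    entry_height.insert(0, 0); // step 1: entry block starts at height 0
    let mut worklist = VecDeque::new();
    worklist.push_back(0usize);
    while let Some(block) = worklist.pop_front() {
        let exit = entry_height[&block] + deltas[&block]; // simulate the block
        for &s in succs.get(&block).map(|v| v.as_slice()).unwrap_or(&[]) {
            // Visit each successor once; a real verifier would also check that
            // a second incoming height agrees with the recorded one.
            if !entry_height.contains_key(&s) {
                entry_height.insert(s, exit);
                worklist.push_back(s);
            }
        }
    }
    entry_height
}

fn main() {
    // Shape of the Module 3 loop: 0 → 1 → {2, 3}, with back-edge 3 → 1.
    let deltas: HashMap<usize, i64> = HashMap::from([(0, 1), (1, 0), (2, -1), (3, 0)]);
    let succs: HashMap<usize, Vec<usize>> =
        HashMap::from([(0, vec![1]), (1, vec![2, 3]), (3, vec![1])]);
    let h = compute_entry_heights(&deltas, &succs);
    assert_eq!(h[&1], 1); // the loop header is entered at height 1
    assert_eq!(h[&2], 1);
    assert_eq!(h[&3], 1);
}
```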

Why the worklist terminates

Each block is processed at most once (we skip if already in entry_height). With B blocks, the worklist drains in at most B iterations. This is an O(B · I) algorithm, where I is the average number of instructions per block.

At join points

When two blocks both flow into the same successor, the successor should have the same stack height from both paths — the EVM specification requires this. My simplified analysis takes the first height it sees and ignores conflicts. A production verifier would check that all incoming heights agree and reject the bytecode if they don't.

Use/def sets

For each block, we compute two values:

  • uses — how many stack slots does this block consume from the entry stack?
  • defs — how many stack slots does this block produce beyond what it consumed?

The computation tracks the minimum net height during execution:

uses = max(0, −min_k h_k) and defs = h_end + uses, where h_k is the cumulative net stack effect after instruction k (starting from h_0 = 0).

Example: PUSH1 1, PUSH1 2, ADD, STOP

  • After PUSH1 1: h = 1
  • After PUSH1 2: h = 2
  • After ADD: h = 1 (consumed 2, produced 1)
  • After STOP: h = 1
  • min h = 0 (h never went below 0), so uses = 0, defs = 1

Example: POP, STOP

  • After POP: h = −1
  • After STOP: h = −1
  • min h = −1, so uses = 1, defs = 0
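Both examples can be checked with a small sketch that walks (inputs, outputs) pairs (an illustrative helper, not the repo's compute_use_def):

```rust
// Each instruction reduced to (inputs, outputs); track the running height h
// and its minimum. Dips below zero are reads from the block's entry stack.
fn use_def(ops: &[(i64, i64)]) -> (i64, i64) {
    let mut h: i64 = 0;
    let mut min_h: i64 = 0;
    for &(inputs, outputs) in ops {
        h -= inputs;          // inputs are consumed first…
        min_h = min_h.min(h); // …so the dip is visible before outputs land
        h += outputs;
    }
    let uses = -min_h;        // how far we reached below the entry stack
    let defs = h + uses;      // final height plus what was borrowed
    (uses, defs)
}

fn main() {
    // PUSH1 1, PUSH1 2, ADD, STOP → uses = 0, defs = 1
    assert_eq!(use_def(&[(0, 1), (0, 1), (2, 1), (0, 0)]), (0, 1));
    // POP, STOP → uses = 1, defs = 0
    assert_eq!(use_def(&[(1, 0), (0, 0)]), (1, 0));
    // DUP1, ADD, POP (exercise 6.8) → uses = 1, defs = 0
    assert_eq!(use_def(&[(1, 2), (2, 1), (1, 0)]), (1, 0));
}
```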

Liveness (block reachability)

My "liveness" analysis determines which blocks are live — reachable from entry AND can reach an exit. This is two BFS passes:

  1. Forward BFS from entry → reachable_from_entry
  2. Backward BFS from exit blocks (STOP/RETURN/REVERT) → can_reach_exit
  3. Live = intersection

Dead blocks (unreachable, or can't reach any exit) are safe to eliminate.
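The two passes and their intersection can be sketched on a toy graph (illustrative; the repo's compute_liveness works over real blocks):

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

// One BFS helper serves both passes: run it over the successor map for
// forward reachability, and over the reversed (predecessor) map for backward.
fn bfs(start: &[usize], edges: &HashMap<usize, Vec<usize>>) -> BTreeSet<usize> {
    let mut seen: BTreeSet<usize> = start.iter().copied().collect();
    let mut queue: VecDeque<usize> = start.iter().copied().collect();
    while let Some(b) = queue.pop_front() {
        for &n in edges.get(&b).into_iter().flatten() {
            if seen.insert(n) { queue.push_back(n); }
        }
    }
    seen
}

fn main() {
    // A(0) → B(1) → C(2, exit); D(3) has no predecessors (dead).
    let succs: HashMap<usize, Vec<usize>> =
        HashMap::from([(0, vec![1]), (1, vec![2]), (3, vec![2])]);
    // Reversed graph for the backward pass.
    let preds: HashMap<usize, Vec<usize>> =
        HashMap::from([(1, vec![0]), (2, vec![1, 3])]);
    let forward = bfs(&[0], &succs);  // reachable from entry
    let backward = bfs(&[2], &preds); // can reach an exit
    let live: BTreeSet<usize> = forward.intersection(&backward).copied().collect();
    assert_eq!(live.into_iter().collect::<Vec<_>>(), vec![0, 1, 2]); // D is dead
}
```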

Block-level vs. value-level liveness

This is block reachability, not true value-level liveness. True liveness tracks which stack slots are live at each program point — needed for removing dead PUSHes within reachable blocks. Our simplified model only removes entire unreachable blocks, not dead instructions within live blocks.

Connection to real-world tools

Tool                  What it does                              Relation to our code
revm (analysis pass)  Pre-validates JUMPDEST table, caches gas  Our find_leaders + compute_jumpdests
revmc (JIT/AOT)       Lifts to LLVM IR, compiles to native      Our CFG → Module 7 optimization → (would need LLVM backend)
evm_mlir              LambdaClass's MLIR-based AOT compiler     Full CFG + dominator tree + SSA
EOF                   New bytecode format (EIP-3540 family)     Eliminates dynamic jumps entirely — CFG is trivial

Exercises

  • 6.1 — Leader identification on sample bytecode
  • 6.2 — Block extraction: correct number of blocks
  • 6.3 — Fall-through edges for sequential code
  • 6.4 — JUMP edge resolution
  • 6.5 — JUMPI produces two outgoing edges
  • 6.6 — Entry and exit block identification
  • 6.7 — Stack height tracking (forward dataflow)
  • 6.8 — Use/def set computation
  • 6.9 — Liveness worklist convergence

Run: cargo test compiler

Exercises — Module 6

Exercise Setup

git checkout -b my-day-6 my-day-5

Create the compiler module:

mkdir -p src/compiler
  • src/compiler/mod.rs — pub mod ir; pub mod cfg; pub mod analysis;
  • src/compiler/ir.rs — BlockId, Instruction, BasicBlock
  • src/compiler/cfg.rs — Cfg, find_leaders(), edge computation
  • src/compiler/analysis.rs — compute_stack_heights(), compute_use_def(), compute_liveness()

Add pub mod compiler; to src/lib.rs.

Peek: git show day-6:src/compiler/cfg.rs | head -100

Compare: git diff day-6 -- src/compiler/

These exercises build the control flow graph (CFG) step by step: identify leaders, extract blocks, add edges, and run dataflow analyses.


Exercise 6.1 — Leader identification

What to do. Given the bytecode [PUSH1 0x04, JUMP, INVALID, JUMPDEST, STOP], call your leader-identification function and assert that offsets 0, 3 (instruction after JUMP), and 4 (JUMPDEST) are all marked as leaders.

How to verify.

cargo test ex_6_1

Hint. Three rules produce leaders: offset 0 is always a leader; any JUMPDEST is a leader; and the instruction immediately following JUMP, JUMPI, STOP, RETURN, REVERT, or INVALID is a leader (even if the instruction at that offset is itself INVALID, as at offset 3 here).

Rust pattern — BTreeSet for ordered collections. Use BTreeSet<usize> to store leader offsets. It deduplicates automatically and iterates in ascending order, which matches how you walk the bytecode left-to-right to form blocks.


Exercise 6.2 — Basic block extraction

What to do. Using the same bytecode as 6.1, run your block-extraction pass and assert that exactly three basic blocks are produced. Assert that each block's start offset matches one of the three leader offsets identified in 6.1.

How to verify.

cargo test ex_6_2

Hint. A block spans from its leader to (but not including) the next leader. The last block spans to the end of the bytecode. Build a sorted Vec<usize> of leaders to compute block boundaries with a simple pairwise iteration.

Rust pattern — BTreeMap<BlockId, BasicBlock>. Keying blocks by a BlockId (a usize or newtype wrapper) gives O(log n) lookup and deterministic iteration order, which simplifies debugging and makes test assertions order-independent.


Exercise 6.3 — Fall-through edges for sequential code

What to do. Build a CFG from bytecode that has no explicit jump (PUSH1 1, PUSH1 2, ADD, STOP). Assert that block 0 has zero outgoing edges (it ends with STOP) and that no fall-through edge is added past a terminal instruction.

Then use bytecode with two sequential blocks where block 0 does not end with a terminal (PUSH1 0, POP, JUMPDEST, STOP). Assert that block 0 has exactly one outgoing edge pointing to block 1.

How to verify.

cargo test ex_6_3

Hint. A fall-through edge is added from block B to its successor when B's last instruction is not a terminal (JUMP, JUMPI, STOP, RETURN, REVERT, INVALID).

Rust pattern — graph representation. Store edges as HashMap<BlockId, Vec<BlockId>> on the Cfg struct. Each key is a source block; the value is the list of successor block IDs.


Exercise 6.4 — JUMP edges are statically resolved

What to do. Build a CFG from PUSH1 0x04, JUMP, INVALID, JUMPDEST, STOP. Assert that block 0 (containing the PUSH + JUMP) has exactly one outgoing edge pointing to the block that starts at offset 0x04 (the JUMPDEST).

How to verify.

cargo test ex_6_4

Hint. Look for the pattern PUSH* immediate, JUMP in the instruction list. The immediate value is the jump target. Verify the target is a valid JUMPDEST before adding the edge; otherwise the jump is invalid and should produce no edge (or an error edge).

Rust pattern — pattern matching on instruction sequences. Use a sliding window (instructions.windows(2)) or a manual two-pointer walk to find the [Push(n), Jump] pattern. Destructuring in the match arm keeps the logic readable.


Exercise 6.5 — JUMPI produces two outgoing edges

What to do. Build a CFG from PUSH1 0x01, PUSH1 0x07, JUMPI, STOP, INVALID, JUMPDEST, STOP. The push order is: condition first (1, deeper), then destination (0x07, on top) — JUMPI pops the destination from TOS and the condition second. Assert that the block ending with JUMPI has exactly two outgoing edges: one to the JUMPDEST target (offset 7) and one fall-through to the instruction immediately after JUMPI.

How to verify.

cargo test ex_6_5

Hint. The true branch goes to the resolved target (if the condition is non-zero); the false branch falls through. Both edges must always be present in the static CFG, even though at runtime only one is taken.

Rust pattern — conditional edges in CFGs. Model both edges as plain entries in the successor list. Downstream analyses (liveness, DCE) correctly propagate along both edges without needing an "edge condition" annotation.


Exercise 6.6 — Entry and exit block identification

What to do. Build a CFG with at least three blocks. Assert that exactly one block has block_id == 0 (entry). Assert that the blocks with no outgoing edges (ending with STOP, RETURN, REVERT) are the exit blocks. Also assert that the entry block is not an exit block (it has at least one outgoing edge in the test bytecode).

How to verify.

cargo test ex_6_6

Hint. Entry is always the block at offset 0. Exit blocks are those whose successor list is empty after edge construction.

Rust pattern — filtering with iterators. Use cfg.blocks.values().filter(|b| b.successors.is_empty()) to collect exit blocks without a manual loop. This composes naturally with .count(), .map(), or .collect().


Exercise 6.7 — Stack height tracking (forward dataflow)

What to do. Build a CFG for PUSH1 1, PUSH1 2, ADD, PUSH1 3, STOP. Run the stack height analysis and assert:

  • Entry height of block 0 is 0.
  • After PUSH1 1: height = 1. After PUSH1 2: height = 2. After ADD: height = 1. After PUSH1 3: height = 2.
  • Exit height of block 0 is 2.

How to verify.

cargo test ex_6_7

Hint. Each instruction adjusts height by outputs - inputs. For PUSH1: +1. For ADD: 2 inputs, 1 output → -1. Propagate the exit height of a block to the entry height of each successor.

Rust pattern — worklist algorithm. Initialize a VecDeque with the entry block. Pop a block, compute its exit height, and push any successor whose entry height has changed. Repeat until the queue is empty.


Exercise 6.8 — Use/def sets (net stack effects)

What to do. For a block containing DUP1, ADD, POP (net effect: pops 1, pushes 0), assert that uses = 1 (DUP1 reads 1 slot from the entry stack; ADD consumes the two copies DUP1 produced, not additional entry-stack slots) and defs = 0.

Also test PUSH1 x, PUSH1 y, ADD — this block produces one value from constants only, so uses = 0 and defs = 1.

How to verify.

cargo test ex_6_8

Hint. Walk the instruction list tracking a virtual stack height starting at 0. When an input comes from below height 0, it is a "use" of the entry stack. Each output pushed above the high-water mark is a "def".

Rust pattern — isize for signed arithmetic. Use isize for the running virtual stack height so underflows (reading from the entry stack) produce negative values that you can count as uses, without special-casing.


Exercise 6.9 — Liveness (forward + backward reachability)

What to do. Build a CFG with one unreachable block: block A → block B → block C, where there is also block D with no predecessors. Run the liveness pass and assert:

  • Blocks A, B, C are live.
  • Block D is not live (unreachable from entry).

Also verify that a block only reachable from entry but having no path to any exit is not live (backward unreachable).

How to verify.

cargo test ex_6_9

Hint. Forward reachability: BFS/DFS from entry, marking blocks reachable. Backward reachability: BFS/DFS from all exits in the reversed graph. A block is live iff it appears in both sets.

Rust pattern — set intersection (BTreeSet). Compute forward and backward reachable sets separately, then use forward.intersection(&backward) to obtain the live set in one expression.


Run all Module 6 tests: cargo test compiler

Deep Dive — Module 6

Review questions for Module 6. Try answering before expanding.


Q: What is a basic block and why is it the unit of compiler analysis?

Answer

A basic block is a maximal straight-line sequence of instructions — one entry point, one exit point, no internal branches. If the first instruction executes, all instructions execute. This property lets us summarize the block's effect as a single net operation (e.g., "pops 2, pushes 1, writes storage slot X") without reasoning about conditional paths. Every optimization pass — from constant folding to register allocation — operates on or between basic blocks.

See Dragon Book §8.4 for the classic textbook treatment.


Q: Walk me through constructing a CFG from EVM bytecode.

Answer

Three steps:

  1. Find leaders — scan linearly, marking offset 0, every JUMPDEST, and every instruction after a terminator (JUMP/JUMPI/STOP/RETURN/REVERT). We must skip PUSH immediate bytes to avoid misidentifying a 0x5B byte inside a PUSH32 as a JUMPDEST.

  2. Extract blocks — split the bytecode at leader boundaries. Each block gets an ID, a start/end offset, and a list of decoded instructions.

  3. Add edges — for each block's last instruction: JUMP → resolve the target (look for PUSH before JUMP); JUMPI → two edges (target + fall-through); STOP/RETURN → no edges; anything else → fall-through to next block.

The critical subtlety is step 1: JUMPDEST validation. A byte 0x5B that appears inside the immediate data of PUSH32 0x...5B... is not a valid jump target. This is a security property — without it, an attacker could craft a PUSH with embedded code and jump into the middle of it.


Q: Why is EVM bytecode harder to analyze than x86 or LLVM IR?

Answer

Three reasons:

  1. Dynamic jumps: the destination is a runtime stack value. x86 has jmp rax but the register set is bounded and trackable; EVM's stack is unbounded and any computation can produce a jump target. This makes the CFG potentially incomplete.

  2. No explicit function boundaries: Solidity's function dispatch is a chain of CALLDATALOAD, SHR, EQ, JUMPI — the compiler encodes function selectors as conditional jumps, not call/ret instructions. Recovering function boundaries requires pattern matching.

  3. Stack machine: no named registers, no SSA form. The "liveness" of a value depends on its depth in the stack, which changes with every PUSH/POP/DUP/SWAP. Converting to SSA requires stack-to-register promotion — essentially, simulating the stack symbolically.

The EOF family of EIPs solves all three: EIP-4200 introduces RJUMP/RJUMPI (static relative jumps), EIP-4750 adds CALLF/RETF (explicit function calls), and EIP-663 adds DUPN/SWAPN (larger operand range). The container format itself is defined in EIP-3540. With EOF, CFG construction becomes trivial — no JUMPDEST scanning, no dynamic jump resolution.


Q: What are dominators and why do they matter for loops?

Answer

Block A dominates block B if every path from the entry to B must pass through A. The entry block dominates everything. The immediate dominator of B is the closest strict dominator.

Dominators matter because they define natural loops: a back-edge B→A where A dominates B identifies a loop with header A. The loop body is all blocks that can reach B without going through A, plus A itself.

Our toyevm doesn't compute dominators, but a production optimizer (like revm's analysis or LLVM) needs them for:

  • Loop-invariant code motion (hoist computations out of loops)
  • SSA construction (placing φ-functions at dominance frontiers)
  • Loop unrolling and strength reduction

Dominators can be computed in near-linear time — O(E · α(E, N)) — with the Lengauer-Tarjan algorithm, where N is the number of blocks, E the number of edges, and α the inverse Ackermann function.
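toyevm doesn't implement this, but the definition translates directly into a simple iterative fixed point (quadratic in the worst case, versus Lengauer-Tarjan's near-linear bound). A sketch, assuming predecessor lists as input and block 0 as the entry:

```rust
use std::collections::BTreeSet;

/// dom(entry) = {entry}; dom(b) = {b} ∪ ⋂ dom(p) over predecessors p.
/// Iterate until no set changes. `preds[b]` lists b's predecessor ids.
fn dominators(n: usize, preds: &[Vec<usize>]) -> Vec<BTreeSet<usize>> {
    let all: BTreeSet<usize> = (0..n).collect();
    let mut dom: Vec<BTreeSet<usize>> = vec![all; n]; // start maximal
    dom[0] = BTreeSet::from([0]); // the entry dominates only via itself
    let mut changed = true;
    while changed {
        changed = false;
        for b in 1..n {
            // Intersect the dominator sets of all predecessors
            let mut new: Option<BTreeSet<usize>> = None;
            for &p in &preds[b] {
                new = Some(match new {
                    None => dom[p].clone(),
                    Some(s) => s.intersection(&dom[p]).cloned().collect(),
                });
            }
            let mut new = new.unwrap_or_default();
            new.insert(b); // every block dominates itself
            if new != dom[b] {
                dom[b] = new;
                changed = true;
            }
        }
    }
    dom
}
```

On a diamond CFG (0 → 1, 0 → 2, 1 → 3, 2 → 3) this gives dom(3) = {0, 3}: neither branch block dominates the merge, only the entry does.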


Q: Explain the difference between block-level and value-level liveness.

Answer

Block-level (what we implement): a block is live if it's reachable from entry AND can reach an exit. Used for dead block elimination — removing unreachable code.

Value-level: a specific value (stack slot, variable) is live at a program point if there exists a path from that point to a use of that value. Used for:

  • Dead store elimination: if an SSTORE's value is overwritten before being read, remove it
  • Register allocation: a register holding a dead value can be reused
  • Dead code elimination: a PUSH whose value is never consumed can be removed

Value-level liveness requires tracking individual stack slots through DUP/SWAP operations, which is significantly more complex than block reachability. In the EVM stack machine, the "name" of a value is its depth, which shifts with every operation.
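The block-level variant reduces to two graph traversals whose intersection is the live set. A sketch over an assumed successor-list representation (not the toyevm `Cfg` type):

```rust
use std::collections::HashSet;

/// A block is live iff it is reachable from the entry (forward pass)
/// AND can reach some exit block (backward pass over predecessors).
fn live_blocks(n: usize, succs: &[Vec<usize>], exits: &[usize]) -> HashSet<usize> {
    // Forward reachability: DFS from block 0
    let mut fwd = HashSet::new();
    let mut stack = vec![0];
    while let Some(b) = stack.pop() {
        if fwd.insert(b) {
            stack.extend(succs[b].iter().copied());
        }
    }
    // Invert the edges, then DFS backward from every exit
    let mut preds = vec![Vec::new(); n];
    for (b, ss) in succs.iter().enumerate() {
        for &s in ss {
            preds[s].push(b);
        }
    }
    let mut bwd = HashSet::new();
    let mut stack: Vec<usize> = exits.to_vec();
    while let Some(b) = stack.pop() {
        if bwd.insert(b) {
            stack.extend(preds[b].iter().copied());
        }
    }
    fwd.intersection(&bwd).copied().collect()
}
```

For the three-block example later in this module (block 0 jumps to block 2, block 1 unreachable), the live set is {0, 2} and DCE can drop block 1.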


Q: How does revm speed up execution without compiling to native code?

Answer

revm applies CFG-level analysis once at contract load time, then interprets with reduced per-opcode overhead:

  1. JUMPDEST bitmap: pre-scanned, so JUMP validation is a single bit check instead of a linear scan
  2. Gas pre-computation: static gas costs are cached per instruction, avoiding the dispatch-to-cost lookup on every step
  3. Padded bytecode: the bytecode array is extended with STOP bytes so the interpreter doesn't need bounds checks on every PC advance
  4. repr(u8) dispatch: the opcode match compiles to a jump table — O(1) branch prediction-friendly dispatch

These are the kinds of wins profiling surfaces: the interpreter loop is the hottest code in an Ethereum node, and every nanosecond per opcode matters when a node replays millions of transactions.


Q: What would it take to go from our CFG to native code?

Answer

Four additional passes:

  1. Stack-to-register promotion: simulate the stack, assign each slot to a virtual register. This is the hardest step for EVM because DUP/SWAP create aliasing.

  2. SSA construction: insert φ-functions at join points using the dominator tree. Each value gets a unique name, enabling powerful optimizations (GVN, SCCP, LICM).

  3. Optimization: constant propagation, dead code elimination, loop invariant code motion, common subexpression elimination — all standard compiler passes, now applicable because we have SSA.

  4. Code generation: lower SSA IR to machine code (x86/ARM) via LLVM or Cranelift. Gas metering is compiled as a counter decrement at each basic block entry.

This is exactly what projects like evm_mlir do. LambdaClass reports 3–7x speedups over interpretation, with further gains possible as the pipeline matures.

Optimization Passes

In Module 7 I closed the loop: we take the CFG from Module 6 and transform it. Two classic optimization passes — constant folding and dead code elimination — demonstrate the fundamental techniques that production compilers like evm_mlir use at scale. I also explored how to make the interpreter itself faster — the techniques that separate a naive EVM from a production one.

The OptimizationPass trait

Every optimization conforms to a common interface:

#![allow(unused)]
fn main() {
pub trait OptimizationPass {
    fn run(&self, cfg: &mut Cfg) -> usize;  // returns number of changes
    fn name(&self) -> &str;
}
}

This enables fixed-point iteration: run all passes in a loop until none of them make changes. The trait is the standard pattern for composable compiler passes — LLVM, GCC, and Cranelift all use variations of this interface.

Rust Pattern: Trait Objects for Passes

We can collect passes into a Vec<Box<dyn OptimizationPass>> and iterate over them generically. This decouples the pass manager from the individual passes — adding a new optimization requires no changes to the driver loop.
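A sketch of the fixed-point driver this enables, matching the `run_to_fixed_point` helper used in the Module 7 exercises. The `Cfg` here is a one-field stand-in for the real type, kept only so the sketch is self-contained:

```rust
/// Stand-in CFG with a single field for illustration; the real type
/// holds blocks and edges.
pub struct Cfg {
    pub block_count: usize,
}

pub trait OptimizationPass {
    fn run(&self, cfg: &mut Cfg) -> usize; // returns number of changes
    fn name(&self) -> &str;
}

/// Rerun every pass until a full round makes no changes.
/// Returns the number of rounds executed (including the final, empty one).
pub fn run_to_fixed_point(cfg: &mut Cfg, passes: &[&dyn OptimizationPass]) -> usize {
    let mut rounds = 0;
    loop {
        let changes: usize = passes.iter().map(|p| p.run(cfg)).sum();
        rounds += 1;
        if changes == 0 {
            return rounds; // stable: no pass can improve the CFG further
        }
    }
}
```

Because each pass reports its change count, the driver needs no separate "dirty" flag — a round that sums to zero is, by definition, the fixed point.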

Constant Folding

The simplest optimization: if both operands of an arithmetic operation are compile-time constants, evaluate it now and replace the three instructions with a single PUSH.

Before:

PUSH1 3    ; 3 gas
PUSH1 5    ; 3 gas
ADD        ; 3 gas
           ; total: 9 gas, 3 instructions

After:

PUSH1 8    ; 3 gas
           ; total: 3 gas, 1 instruction

This saves 6 gas per fold — a 67% reduction for this pattern. Across a contract with many constant expressions (e.g., ABI offset calculations, storage slot computations), the savings accumulate significantly.

Chained folding

Consider: PUSH1 2, PUSH1 3, MUL, PUSH1 4, MUL. The first pass folds (PUSH1 2, PUSH1 3, MUL) → PUSH1 6. The second pass sees (PUSH1 6, PUSH1 4, MUL) and folds it to PUSH1 24. This is why we need fixed-point iteration — a single pass isn't enough.

The fixed-point loop terminates because each iteration strictly reduces the instruction count (three instructions become one). With n instructions, at most n/2 folds can occur, so the loop finishes in O(n) iterations.

Pitfall: Division by Zero in Folding

When folding PUSH 5, PUSH 0, DIV, the result must match the EVM specification: division by zero returns zero (not a panic). The evaluate_binop function must handle this case explicitly.
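A sketch of such an evaluator, using u128 as a stand-in for U256 (wrapping arithmetic stands in for the EVM's mod-2^256 semantics); the enum and function names are illustrative, not the toyevm ones:

```rust
#[derive(Clone, Copy)]
enum BinOp { Add, Mul, Sub, Div, Mod }

/// Compile-time evaluation for the folder. Per the EVM spec, DIV and
/// MOD by zero yield zero rather than trapping.
fn evaluate_binop(op: BinOp, a: u128, b: u128) -> u128 {
    match op {
        BinOp::Add => a.wrapping_add(b), // EVM arithmetic wraps
        BinOp::Mul => a.wrapping_mul(b),
        BinOp::Sub => a.wrapping_sub(b),
        BinOp::Div => if b == 0 { 0 } else { a / b },
        BinOp::Mod => if b == 0 { 0 } else { a % b },
    }
}
```

The division-by-zero arm is exactly the pitfall above: a naive `a / b` would panic in Rust, silently diverging from EVM semantics.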

Dead Code Elimination

DCE removes blocks that are unreachable from the entry or cannot reach any exit. This uses the liveness analysis from Module 6.

A block is dead if:

  • It's unreachable from the entry block (forward), OR
  • It cannot reach any exit block (backward)

Example:

Block 0: PUSH1 4, JUMP → Block 2
Block 1: INVALID        ← unreachable! (DCE removes this)
Block 2: JUMPDEST, STOP

Why We Don't Remove the Entry Block

Even if static analysis cannot prove the entry block reaches an exit (e.g., all outgoing jumps are dynamically resolved), the entry block must be preserved — removing it would destroy the program.

Making the interpreter faster

My interpreter works correctly, but it is far from optimal. Production EVMs like revm use several techniques to minimize per-opcode overhead. These are the techniques that matter most, ordered by impact.

1. JUMPDEST bitmap pre-computation

Every JUMP/JUMPI must validate that the target is a JUMPDEST. A naive approach scans the bytecode on each jump — O(n) per jump. Instead, pre-compute a bitmap at contract load time:

#![allow(unused)]
fn main() {
fn compute_jumpdest_bitmap(bytecode: &[u8]) -> Vec<bool> {
    let mut bitmap = vec![false; bytecode.len()];
    let mut i = 0;
    while i < bytecode.len() {
        if bytecode[i] == 0x5B { // JUMPDEST
            bitmap[i] = true;
        }
        // Skip PUSH immediates
        if (0x60..=0x7F).contains(&bytecode[i]) {
            i += (bytecode[i] - 0x5F) as usize;
        }
        i += 1;
    }
    bitmap
}
}

Jump validation becomes a single array lookup: O(1) instead of O(n).

2. Padded bytecode (bounds-check elimination)

The interpreter's hot loop checks pc < bytecode.len() on every iteration. By appending STOP bytes past the end of the bytecode (enough to cover a PUSH32 immediate that starts on the last real byte), the bounds check becomes redundant — running past the end hits a STOP and halts normally.

#![allow(unused)]
fn main() {
let mut padded = bytecode.to_vec();
padded.resize(padded.len() + 33, 0x00); // 0x00 is STOP; 33 covers a PUSH32 truncated at the final byte
}

This eliminates a branch from the hottest loop in the program. At millions of opcode dispatches per block, even one fewer branch per iteration matters.

3. Gas pre-computation

Instead of looking up gas costs in a match table on every opcode dispatch, pre-compute the static gas cost for each instruction at analysis time and store it alongside the opcode:

#![allow(unused)]
fn main() {
struct AnalyzedOp {
    opcode: Opcode,
    static_gas: u64,
    // immediate bytes already decoded
}
}

The interpreter can then deduct gas with a single subtraction, no dispatch needed. Dynamic gas (memory expansion, SLOAD cold/warm) still requires runtime logic, but static gas covers the majority of opcodes.

4. repr(u8) jump table dispatch

Our opcode enum already uses #[repr(u8)], which lets the compiler turn the exhaustive opcode match into a jump table — a single indexed memory load rather than a chain of if-else comparisons. This is O(1) dispatch with good branch-predictor behavior.

The key detail: the match must be exhaustive and the enum values must be dense (no large gaps). Both conditions hold for EVM opcodes. This is the same technique Python's CPython ceval loop and the HotSpot JVM interpreter use.

5. Stack representation

Our stack uses Vec<U256>, which is heap-allocated. A production interpreter might use a fixed-size array on the stack frame:

#![allow(unused)]
fn main() {
struct FastStack {
    data: [U256; 1024],  // fixed-size, no heap allocation
    top: usize,
}
}

This avoids heap allocation entirely and improves cache locality — the entire stack fits in L1/L2 cache.
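Push and pop on this layout are a bounds check plus an index bump. A sketch, with a four-limb array standing in for U256 and `Result` mirroring how the interpreter models overflow/underflow:

```rust
/// Stand-in for the real 256-bit integer type: four u64 limbs.
type U256 = [u64; 4];

struct FastStack {
    data: [U256; 1024], // fixed-size, no heap allocation
    top: usize,         // number of live slots; data[top - 1] is the top
}

impl FastStack {
    fn new() -> Self {
        FastStack { data: [[0u64; 4]; 1024], top: 0 }
    }

    fn push(&mut self, v: U256) -> Result<(), &'static str> {
        if self.top == 1024 {
            return Err("stack overflow"); // EVM limit: 1024 slots
        }
        self.data[self.top] = v;
        self.top += 1;
        Ok(())
    }

    fn pop(&mut self) -> Result<U256, &'static str> {
        if self.top == 0 {
            return Err("stack underflow");
        }
        self.top -= 1;
        Ok(self.data[self.top]) // no need to zero the slot
    }
}
```

Note that pop never clears the vacated slot — the `top` index alone defines what is live, so "freeing" a value is a single decrement.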

Benchmark before optimizing

These techniques are well-known, but their impact depends on the workload. Always profile with realistic contracts before optimizing. The evm.codes playground and Foundry's forge test --gas-report are useful for generating realistic benchmarks.

Superinstructions (bonus concept)

A superinstruction fuses a common sequence into a single compound opcode. For example, PUSH1 x, PUSH1 y, ADD could become PUSHADD x y — one dispatch instead of three. This reduces interpreter loop overhead and can enable further constant folding.

Common EVM superinstruction candidates (from profiling real contract bytecode):

| Pattern | Frequency | Savings |
|---|---|---|
| PUSH1, PUSH1, ADD/MUL | Very high (offset math) | 2 dispatches |
| DUP1, PUSH1, LT/GT | High (loop bounds) | 1 dispatch |
| PUSH1, MLOAD/MSTORE | High (memory access) | 1 dispatch |
| ISZERO, PUSH, JUMPI | High (conditional branch) | 2 dispatches |

Production interpreters like CPython and LuaJIT use superinstructions extensively. In the EVM context, revm's analysis phase effectively does this by pre-computing gas costs and skipping redundant checks.
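A superinstruction pass is itself a peephole rewrite over the decoded instruction stream. A sketch, using a simplified illustrative `Op` IR (not the toyevm opcode enum) and fusing only the first pattern from the table:

```rust
#[derive(Debug, PartialEq, Clone)]
enum Op {
    Push(u64),
    Add,
    PushAdd(u64, u64), // fused PUSH x, PUSH y, ADD: one dispatch, not three
    Other(u8),         // everything we don't fuse
}

/// Slide a three-instruction window over the stream, replacing each
/// PUSH/PUSH/ADD triple with a single PushAdd superinstruction.
fn fuse(ops: &[Op]) -> Vec<Op> {
    let mut out = Vec::with_capacity(ops.len());
    let mut i = 0;
    while i < ops.len() {
        match ops.get(i..i + 3) {
            Some([Op::Push(x), Op::Push(y), Op::Add]) => {
                out.push(Op::PushAdd(*x, *y));
                i += 3;
            }
            _ => {
                out.push(ops[i].clone());
                i += 1;
            }
        }
    }
    out
}
```

The fused handler can then skip two dispatches and both intermediate stack writes — and if both operands are constants, it degenerates into exactly the constant-folding case from earlier in this module.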

Speculative pre-execution over the CFG

With the CFG from Module 6 in hand, we can go beyond static analysis: speculatively execute both arms of a conditional branch before the condition is resolved. The idea is borrowed from CPU branch prediction — but applied at the bytecode level.

The problem

At a JUMPI, the interpreter must evaluate the condition to decide which block to enter. Until the condition is known, the pipeline stalls. In a tight loop with expensive setup (e.g., loading storage values), this stall happens on every iteration.

The idea

Given a JUMPI with two outgoing CFG edges (true → block Bt, false → block Bf), we can:

  1. Fork the machine state — snapshot the stack, memory, and gas counter
  2. Execute both successors on their own snapshot
  3. When the condition resolves, discard the wrong path and adopt the correct one

If the speculated path has no side effects (no SSTORE, LOG, CALL, etc.), we gain a head start on whichever branch is taken.

When it helps

Speculative pre-execution is profitable when:

  • The condition depends on a value that is expensive to compute (e.g., an SLOAD or a deep arithmetic chain)
  • Both successor blocks are short and side-effect-free (pure stack/memory computation)
  • The CFG shows a diamond pattern (both branches reconverge quickly)
graph TD
    A["Block A<br/>..., JUMPI"] --> B_t["Block Bt (true)<br/>pure computation"]
    A --> B_f["Block Bf (false)<br/>pure computation"]
    B_t --> C["Block C (merge)"]
    B_f --> C
    style B_t stroke:#2e7d32,stroke-width:2px,stroke-dasharray: 5 5
    style B_f stroke:#2e7d32,stroke-width:2px,stroke-dasharray: 5 5

In this diamond, blocks Bt and Bf are both speculated. Once the condition resolves, we pick the correct result and continue into block C.

When it does not help

Speculation is wasted (or dangerous) when:

  • A successor block has side effects: SSTORE, LOG, CALL, CREATE, SELFDESTRUCT. These cannot be rolled back.
  • The block is long or gas-heavy — speculating a 500-instruction block costs real gas even if the result is discarded.
  • The CFG has deep divergence — if the two paths don't reconverge quickly, the speculated state grows unboundedly.

Safety check using the CFG

The CFG gives us the information we need to decide whether speculation is safe:

#![allow(unused)]
fn main() {
fn is_safe_to_speculate(cfg: &Cfg, block_id: BlockId) -> bool {
    let block = &cfg.blocks[&block_id];
    block.instructions.iter().all(|inst| {
        !matches!(
            inst.opcode,
            Opcode::SSTORE
                | Opcode::LOG0 | Opcode::LOG1 | Opcode::LOG2
                | Opcode::LOG3 | Opcode::LOG4
                | Opcode::CALL | Opcode::DELEGATECALL | Opcode::STATICCALL
                | Opcode::CREATE | Opcode::CREATE2
                | Opcode::SELFDESTRUCT
        )
    })
}
}

This is a conservative check: if every instruction in the block is pure (stack, memory, arithmetic), we can safely speculate it.

Connection to real systems

| System | Technique | Similarity |
|---|---|---|
| CPU branch prediction | Execute both pipeline paths, flush the wrong one | Same idea, hardware level |
| JVM HotSpot | Speculative inlining based on profiling | Speculates on call targets, deoptimizes on miss |
| Trace-based JITs (LuaJIT) | Record a trace for the hot path, fall back on deviation | Speculates the entire loop body |
| Database query engines | Speculative prefetch based on predicate selectivity | Same fork-and-choose pattern |

In the EVM context, speculative pre-execution is most useful in an AOT compiler or analysis tool — not in a production interpreter (where the overhead of forking state per JUMPI would dominate). But as a learning exercise, it ties together everything from Modules 6 and 7: the CFG gives you the graph, liveness tells you which blocks matter, and the side-effect check tells you which blocks are safe.

Exercise idea

Extend the CFG with a method speculative_candidates(&self) -> Vec<(BlockId, BlockId)> that returns all JUMPI blocks where both successors are safe to speculate. Run it on the loop bytecode from Module 3 and verify that the loop body qualifies.

AOT Compilation (discussion)

My optimizer works on the IR but still outputs bytecode for the interpreter. The next step would be ahead-of-time compilation: transpile the CFG into native machine code.

This is exactly what LambdaClass's evm_mlir does:

  1. Lift EVM bytecode → MLIR dialect (each opcode becomes an MLIR operation)
  2. Optimize with standard MLIR passes (constant folding, DCE, LICM)
  3. Lower to LLVM IR
  4. Codegen via LLVM backend → native x86/ARM

Their benchmarks show 3–7x speedup over interpretation (varying by contract and hardware). The key insight: gas metering is compiled as a counter decrement at each basic block entry, not checked per-opcode.

flowchart LR
    A["EVM Bytecode"] --> B["CFG"]
    B --> C["MLIR Dialect"]
    C --> D["LLVM IR"]
    D --> E["Native Code"]
    style E stroke:#2e7d32,stroke-width:2px,stroke-dasharray: 5 5

JIT vs. AOT in blockchain context

A JIT compiler compiles code on-the-fly during execution; an AOT compiler compiles before execution begins.

| | JIT | AOT |
|---|---|---|
| Startup | Slow (compile on first call) | Slow (compile at deploy) |
| Steady-state | Fast (native code) | Fast (native code) |
| Determinism | Harder (optimization choices vary) | Easier (compile once, cache) |
| Cache | Must invalidate on upgrade | Stored with contract |

Blockchain requires deterministic execution — every node must produce the same result. JIT is risky because different optimization levels could produce different gas measurements. AOT with a fixed compilation pipeline is safer.

evm_mlir takes the AOT approach for this reason.

Why not JIT?

A question I kept coming back to: if native code is faster, why don't production EVMs just JIT-compile everything?

It turns out the EVM workload is a poor fit for JIT compilation. Several factors work against it:

  1. Compilation latency. Even at -O0, LLVM takes milliseconds per contract. A blockchain node processes thousands of transactions per block — if each contract is compiled on first call, the compilation overhead dominates. A contract called once never amortizes the cost.

  2. Code size and cache pressure. JIT-compiled native code is larger than the bytecode it replaces. Thousands of compiled contracts fill the instruction cache. Cache misses on x86 cost 100+ cycles each — enough to wipe out the gains from native execution.

  3. Branch prediction. A well-tuned interpreter has a single hot dispatch loop that the CPU's branch predictor learns quickly. JIT-compiled code scatters control flow across generated functions the predictor has never seen. The interpreter's predictable pattern can outperform "faster" native code that the CPU mispredicts.

  4. Cold contracts. JIT compilation only pays off for contracts called many times. Most contracts on Ethereum are called rarely — the median contract receives fewer than 10 transactions over its lifetime. For these, JIT warm-up is pure overhead.

  5. Determinism. Ensuring bit-identical gas accounting across different hardware (x86 vs. ARM), OS versions, and compiler versions is extremely difficult. Any divergence causes consensus failures.

Tail-call interpreters

The approach the community converged on is the tail-call interpreter — sometimes called a "threaded interpreter" or "must-tail dispatch." Instead of a central match loop, each opcode handler ends with a tail call directly to the next handler:

#![allow(unused)]
fn main() {
// Conceptual — each handler is a separate function
fn op_add(ctx: &mut Context) -> ! {
    let a = ctx.stack.pop();
    let b = ctx.stack.pop();
    ctx.stack.push(a + b);
    ctx.pc += 1;
    // Tail call to the next opcode's handler
    let next = DISPATCH_TABLE[ctx.bytecode[ctx.pc]];
    musttail next(ctx) // hypothetical keyword; Rust has no musttail today
}
}

This eliminates the dispatch loop entirely. Each handler jumps directly to the next one:

  • No dispatch overhead — zero branches between opcodes
  • Better branch prediction — the CPU sees ADD → next opcode, not ADD → match → next opcode
  • No call stack growth — tail calls reuse the same stack frame
  • Deterministic — same bytecode always produces the same execution trace

Rust and must-tail calls

Rust does not yet have a #[musttail] attribute (there is an accepted RFC). In the meantime, revm uses a match loop that LLVM optimizes into a jump table — close to tail-call dispatch in practice. C compilers support __attribute__((musttail)) today, and projects like evmone (C++ EVM) use it directly.

The lesson I took from this: for the EVM's workload — short, diverse, rarely-repeated contracts — a very fast interpreter beats a compiler. AOT compilation has a niche for hot contracts, but the baseline must be a low-overhead interpreter.

References

  • revm source code — production Rust EVM with the optimization techniques described above
  • evm_mlir blog post — AOT compilation pipeline and benchmarks
  • EOF EIPs — EIP-3540 (container format), EIP-4200 (static jumps), EIP-4750 (functions), EIP-663 (DUPN/SWAPN)

Exercises

  • 7.1 — OptimizationPass trait compiles for both passes
  • 7.2 — Constant folding: PUSH 3, PUSH 5, ADD → PUSH 8
  • 7.3 — Chained folding: MUL chain → single PUSH
  • 7.4 — Folding reduces gas cost
  • 7.5 — DCE removes unreachable blocks
  • 7.6 — DCE removes block after unconditional JUMP
  • 7.7 — DCE preserves all reachable blocks
  • 7.8 — Fixed-point iteration with multiple passes

Run: cargo test fold and cargo test dce

Exercises — Module 7

Exercise Setup

git checkout -b my-day-7 my-day-6

Create optimization pass files:

  • src/compiler/fold.rsOptimizationPass trait, ConstantFold, run_to_fixed_point()
  • src/compiler/dce.rsDeadCodeElim

Add pub mod fold; pub mod dce; to src/compiler/mod.rs.

Peek: git show day-7:src/compiler/fold.rs | head -50

Compare: git diff day-7 -- src/compiler/fold.rs src/compiler/dce.rs

These exercises implement constant folding and dead code elimination (DCE) as composable optimization passes over the CFG from Module 6.

OptimizationPass trait (reference)

#![allow(unused)]
fn main() {
/// Trait for optimization passes that transform a CFG in place.
pub trait OptimizationPass {
    /// Run the pass. Returns the number of changes made.
    fn run(&self, cfg: &mut Cfg) -> usize;

    /// Name of the pass (for logging).
    fn name(&self) -> &str;
}
}

Exercise 7.1 — OptimizationPass trait compiles for both passes

What to do. Instantiate both ConstantFold and DeadCodeElim (dead code elimination) as Box<dyn OptimizationPass> values. Call pass.name() on each and assert the returned strings are non-empty. The test should compile and pass without running any bytecode.

How to verify.

cargo test ex_7_1

Hint. If the trait bound is satisfied, Box<dyn OptimizationPass> lets you store both passes in a Vec<Box<dyn OptimizationPass>> and iterate over them uniformly — this is the exact shape of a production optimizer pipeline.

Rust pattern — trait objects (Box<dyn OptimizationPass>). Dynamic dispatch here is intentional: the pipeline doesn't know (or care) which concrete pass it holds at compile time. The trade-off is one vtable lookup per run call, which is negligible compared to the CFG traversal cost.


Exercise 7.2 — Constant fold ADD

What to do. Build a single-block CFG containing exactly PUSH1 3, PUSH1 5, ADD. Run ConstantFold. Assert that:

  1. The pass returns changes > 0.
  2. The block now contains exactly one instruction.
  3. That instruction pushes the value 8.

How to verify.

cargo test ex_7_2

Hint. After folding, the three instructions (PUSH1 3, PUSH1 5, ADD) are replaced by a single PUSH32 8 (the folder always uses PUSH32 to avoid recomputing the minimal push width). Check instruction.immediate == U256::from(8).

Rust pattern — peephole pattern matching. The folder uses a sliding window of three instructions. Matching [Instruction::Push(a), Instruction::Push(b), Instruction::BinOp(op)] in a match arm is cleaner than three nested if let chains.


Exercise 7.3 — Fold a MUL chain to fixed point

What to do. Build a single-block CFG with PUSH1 2, PUSH1 3, MUL, PUSH1 4, MUL. Run ConstantFold once. Assert the block has two instructions (the first fold reduces 2 * 3 to 6, leaving PUSH32 6, PUSH1 4, MUL). Run ConstantFold again. Assert the block has exactly one instruction with value 24.

How to verify.

cargo test ex_7_3

Hint. One pass of the folder is not enough for chained arithmetic. The pass must be repeated until it returns 0. Use a while pass.run(cfg) > 0 {} loop (the fixed-point loop from exercise 7.8) or call it manually twice in this test to observe the two-step reduction.

Rust pattern — iterative refinement. Each call to run is idempotent for already-folded instructions: it only changes things that can still be improved. Counting returned changes drives the fixed-point loop without needing a separate "did anything change" flag.


Exercise 7.4 — Folding reduces gas cost

What to do. Compute the gas cost of [PUSH1 3, PUSH1 5, ADD] (three PUSH/ADD instructions at 3 gas each = 9 gas). After folding, compute the gas cost of the resulting single PUSH32 8 (3 gas). Assert cost_after < cost_before and that the difference is exactly 6.

How to verify.

cargo test ex_7_4

Hint. You do not need to execute bytecode — just multiply instruction_count * 3 for base-cost instructions. In a real optimizer you would sum per-instruction gas costs from the gas schedule table.

Rust pattern — reasoning about costs. Writing a cost-delta assertion alongside the transformation test ties correctness ("the value is right") to efficiency ("the cost went down"). If a future refactor breaks the fold, both assertions fail and the reason is immediately clear.


Exercise 7.5 — DCE removes unreachable blocks

What to do. Build a CFG with three blocks: block 0 jumps to block 2, block 1 has no predecessors (unreachable), and block 2 is a terminal. Run the DeadCodeElim pass. Assert that the returned changes > 0 and that block 1 no longer exists in cfg.blocks.

How to verify.

cargo test ex_7_5

Hint. The DeadCodeElim pass relies on the liveness analysis from Module 6. Run liveness first (or have DeadCodeElim::run call it internally) to determine which blocks are dead, then remove them from cfg.blocks and drop their edges.

Rust pattern — graph pruning. Collect dead block IDs into a Vec, then cfg.blocks.remove(&id) in a second loop. Avoid mutating the map while iterating over it — collect first, remove second.


Exercise 7.6 — DCE removes dead code after an unconditional JUMP

What to do. Build a CFG where block 0 ends with PUSH1 target, JUMP to block 2, and block 1 (at offset between the JUMP and the JUMPDEST) contains unreachable instructions. Run DeadCodeElim. Assert block 1 is removed and blocks 0 and 2 remain.

How to verify.

cargo test ex_7_6

Hint. Block 1 is unreachable because the CFG builder adds no edges into it — the JUMP skips over it and the block has no JUMPDEST. DCE's forward-reachability pass will never visit it.

Rust pattern — liveness as a prerequisite. DCE cannot run correctly without knowing which blocks are reachable. Express this dependency by calling the liveness pass inside DeadCodeElim::run, or by requiring a LivenessInfo argument. Encoding prerequisites in the type signature prevents calling DCE on a stale CFG.


Exercise 7.7 — DCE preserves all reachable blocks (no false positives)

What to do. Build a CFG where every block is reachable from the entry and every block can reach an exit (a diamond-shaped CFG: block 0 → blocks 1 and 2 via JUMPI → block 3). Run DeadCodeElim. Assert that changes == 0 and that all four blocks still exist.

How to verify.

cargo test ex_7_7

Hint. This is the "no false positive" invariant: DCE must not remove live blocks. If your liveness analysis is incorrect (e.g., it misses a backward-reachability path), this test catches it before the optimizer silently corrupts correct code.

Rust pattern — testing invariants. Testing that an optimization does nothing on a valid input is as important as testing that it does something on an invalid input. Add both to every optimization pass.


Exercise 7.8 — Fixed-point iteration with combined passes

What to do. Build a CFG with a MUL chain in block 0 and an unreachable block 1. Combine both passes in a loop:

#![allow(unused)]
fn main() {
let fold = ConstantFold;
let dce = DeadCodeElim;
let passes: Vec<&dyn OptimizationPass> = vec![&fold, &dce];
run_to_fixed_point(&mut cfg, &passes);
}

Assert that after the loop exits, the MUL chain is fully folded and block 1 is gone. Assert the loop terminates (runs at most a small fixed number of times in practice).

How to verify.

cargo test ex_7_8

Hint. Constant folding may expose new dead blocks (if folding changes a conditional branch to a constant), and DCE may enable further folding (by removing instructions that were only live due to a dead successor). The fixed-point loop handles both cases automatically.

Rust pattern — loop until stable. Summing changes across all passes per iteration and looping while total > 0 is the standard fixed-point driver. It is guaranteed to terminate because each iteration either reduces the instruction count or the block count — both are finite and non-negative.


Run all Module 7 tests: cargo test fold and cargo test dce

Deep Dive — Module 7

Review questions for Module 7. Try answering before expanding.


Q: Walk me through implementing constant folding on an IR.

Answer

Scan each basic block for patterns where two PUSH instructions are immediately followed by a binary operation (ADD, MUL, SUB, etc.). Extract the constant values from both PUSHes, evaluate the operation at compile time, and replace all three instructions with a single PUSH of the result. Repeat until no more folds are possible (fixed-point iteration), because one fold may create new folding opportunities (e.g., PUSH 2, PUSH 3, MUL folds to PUSH 6, which combined with a subsequent PUSH 4, MUL folds to PUSH 24).

Edge cases to handle: division by zero must return zero (not panic), shifts greater than 255 return zero, and modulo by zero returns zero — all per the Yellow Paper.


Q: Why do we need liveness analysis before dead code elimination?

Answer

Liveness tells us which values and blocks are actually needed. Without it, we might delete an instruction whose result is consumed much later (through a complex control flow path), or we might keep a block that's reachable but produces only dead values. Liveness combines forward reachability (can we reach this block from the entry?) with backward reachability (can this block reach an exit?). Only blocks satisfying both conditions are "live."


Q: What makes EVM optimization different from x86 optimization?

Answer

The objective function is gas cost, not wall-clock time. Storage accesses (SLOAD: 2100 gas cold) dominate — a single SLOAD costs as much as 700 ADD instructions. This means optimizations that reduce storage access (caching loads, combining writes) have outsized impact. Additionally, the EVM has no registers — only a stack — so register allocation passes don't apply. Instead, stack scheduling (minimizing DUP/SWAP) is the analog. Finally, determinism is mandatory: every optimization must preserve exact gas semantics, not just functional correctness.


Q: How does evm_mlir compile EVM bytecode to native code?

Answer

It lifts EVM bytecode into an MLIR dialect where each opcode becomes an MLIR operation. Standard MLIR passes (constant folding, dead code elimination, loop invariant code motion) optimize the IR. Gas metering is inserted as a counter decrement at each basic block entry — this replaces per-opcode gas checks with a single comparison per block. The optimized MLIR is lowered to LLVM IR, and the LLVM backend produces native machine code. The result is a function that takes (state, input) and returns (state', output, gas_used) — the same pure function model as interpretation, but significantly faster (LambdaClass reports 3–7x speedups).


Q: What is a superinstruction and when would I design one?

Answer

A superinstruction is a compound opcode that implements a common multi-opcode sequence as a single dispatch. Example: PUSH1 x, PUSH1 y, ADDPUSHADD x y. Benefits: reduces dispatch loop iterations (each dispatch has overhead from the fetch-decode-match cycle), enables the compiler to optimize the combined operation (e.g., avoid pushing x just to pop it immediately), and reduces code size. I'd design one when profiling shows a sequence appears frequently in real contracts and the dispatch overhead is measurable. CPython, LuaJIT, and Java's HotSpot all use superinstructions.


Q: What are the most impactful interpreter-level optimizations for an EVM?

Answer

In order of typical impact:

  1. JUMPDEST bitmap — pre-scan the bytecode once at load time, building a Vec<bool> or bitset. Jump validation drops from O(n) per jump to O(1). This is the single highest-impact optimization because jumps are extremely frequent (every loop iteration, every function call dispatch).

  2. Padded bytecode — append STOP bytes so the interpreter doesn't need bounds checks on every PC advance. Eliminates a branch from the hottest loop.

  3. Gas pre-computation — cache the static gas cost per instruction at analysis time. The interpreter deducts gas with a single subtraction instead of dispatching to a cost lookup.

  4. Fixed-size stack — use [U256; 1024] instead of Vec<U256>. No heap allocation, better cache locality, zero-cost push/pop (just increment a pointer).

  5. repr(u8) jump table — ensure the opcode match compiles to a jump table (O(1) dispatch) rather than a chain of comparisons. Rust does this automatically with #[repr(u8)] on a contiguous enum.

These are the techniques revm uses. Combined, they bring the interpreter within 2–5x of native-compiled execution speed.


Q: Why is JIT compilation not the right approach for EVM execution?

Answer

The EVM workload is a poor fit for JIT compilation for several reinforcing reasons:

  • Compilation latency is never amortized — most contracts are called rarely (median: fewer than 10 transactions over their lifetime). LLVM at -O0 already takes milliseconds per contract, too slow for a node processing thousands of transactions per block.
  • Instruction cache pressure — JIT-compiled native code is larger than bytecode. Thousands of compiled contracts fill the I-cache, and cache misses on x86 cost 100+ cycles, wiping out the speedup.
  • Branch prediction favors the interpreter — a single hot dispatch loop is predictable; scattered generated code is not.
  • Determinism — ensuring bit-identical gas accounting across x86/ARM, OS versions, and compiler versions is extremely difficult. Any divergence is a consensus failure.

The alternative that emerged is the tail-call interpreter: each opcode handler ends with a direct jump to the next handler, eliminating the dispatch loop. This gives near-native speed without compilation overhead: no dispatch branches, better branch prediction, no call stack growth, and full determinism. Rust does not yet have #[musttail] (there is an accepted RFC), but a match with #[repr(u8)] compiles to a jump table that approaches the same performance. C++ EVMs like evmone use __attribute__((musttail)) directly.

AOT compilation still has a niche for hot contracts (e.g., via evm_mlir), but the baseline for production EVMs is a fast interpreter — not a compiler.


Q: How would speculative pre-execution work in practice, and what are its limits?

Answer

The core mechanism is state forking: at a JUMPI, snapshot the stack and memory, then execute both successor blocks in parallel (or sequentially) on their own copies. When the branch condition resolves, discard the wrong snapshot.

The practical limits are:

  1. State forking cost. Cloning the stack (1024 × 32 bytes = 32 KB max) and memory (which can be megabytes) on every JUMPI is expensive. A production implementation would use copy-on-write semantics — only clone the pages that are actually modified.

  2. Cascading speculation. If a speculated block itself ends with JUMPI, you'd need to speculate recursively. This creates an exponential blowup: n nested JUMPIs produce 2^n speculative paths. In practice, you'd limit speculation depth to 1 or 2 levels.

  3. Side-effect boundaries. Any block that touches storage, emits logs, or makes external calls cannot be speculated — the effects can't be undone. In real Solidity contracts, most interesting branches (the ones that diverge based on user input) lead to side-effecting code almost immediately.

  4. Gas accounting. Speculated paths consume real computation, but gas must be charged only for the path actually taken. This requires careful bookkeeping to avoid over- or under-charging.

The technique is most useful in static analysis tools (exploring all paths through a contract) and AOT compilers (pre-computing both branches at compile time for simple cases). For a runtime interpreter, the overhead of forking state typically exceeds the savings from avoiding the branch stall.


Q: Why is determinism a constraint on EVM optimization?

Answer

Every Ethereum node must independently execute every transaction and arrive at the exact same state. This means:

  • Gas consumption must be identical across all implementations. An optimization that changes gas semantics (e.g., caching an SLOAD result so the second access is "free") would cause consensus failures.
  • Arithmetic must be bit-identical. Reordering operations that affect wrapping behavior (even if mathematically equivalent for unbounded integers) can produce different 256-bit results.
  • Memory expansion costs are based on the high-water mark, not actual usage. An optimization that delays memory writes doesn't reduce the gas charged.

This is why EVM optimizations focus on reducing dispatch overhead and pre-computing static analysis (which doesn't affect gas semantics) rather than rewriting the computation itself.