Bytecode and the Execution Loop
Today we bring the EVM to life. By the end of this chapter, I want our interpreter to execute arithmetic bytecode, track gas consumption, and return results — a complete fetch-decode-execute loop.
How bytecode works
EVM bytecode is a flat array of bytes. Each byte is either:
- An opcode (a single-byte instruction), or
- An immediate byte following a PUSH instruction
The program counter (μ_pc in the Yellow Paper's machine state) starts at 0 and advances linearly. When the interpreter encounters PUSH3, it reads the opcode byte, then consumes the next 3 bytes as the immediate value, advancing the PC by 4 total (1 + 3).
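As a sketch, the immediate length can be computed directly from the opcode byte's position in the PUSH range (the `immediate_size` helper here is hypothetical; `0x60` through `0x7f` are PUSH1 through PUSH32):

```rust
/// Hypothetical helper: number of immediate bytes following an opcode.
/// PUSH1 (0x60) through PUSH32 (0x7f) carry 1..=32 immediate bytes;
/// every other opcode carries none.
fn immediate_size(opcode: u8) -> usize {
    match opcode {
        0x60..=0x7f => (opcode - 0x5f) as usize,
        _ => 0,
    }
}

fn main() {
    let mut pc = 0usize;
    // On PUSH3 (0x62), skip the opcode byte plus 3 immediates: 1 + 3 = 4.
    pc += 1 + immediate_size(0x62);
    assert_eq!(pc, 4);
    println!("PC advanced to {}", pc); // PC advanced to 4
}
```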
Example bytecode: 60 03 60 05 01 00
| Offset | Byte | Meaning |
|---|---|---|
| 0x00 | 0x60 | PUSH1 |
| 0x01 | 0x03 | immediate: 3 |
| 0x02 | 0x60 | PUSH1 |
| 0x03 | 0x05 | immediate: 5 |
| 0x04 | 0x01 | ADD |
| 0x05 | 0x00 | STOP |
This pushes 3, pushes 5, adds them (result: 8 on stack), and stops.
The fetch-decode-execute loop
```mermaid
flowchart TD
    A["Fetch byte at PC"] --> B["Decode opcode"]
    B --> C{"Gas enough?"}
    C -- No --> OOG["OutOfGas error"]
    C -- Yes --> D["Deduct gas"]
    D --> E["Execute opcode"]
    E --> F{"STOP / RETURN?"}
    F -- No --> G["Advance PC"]
    G --> A
    F -- Yes --> H["Return result"]
```
Our interpreter follows the classic pattern:
```text
loop {
    byte   = bytecode[pc]            // fetch
    opcode = decode(byte)            // Opcode::from_byte
    gas_remaining -= cost(opcode)    // static gas deduction
    execute(opcode)                  // match on opcode enum
}
```
The loop terminates when:
- STOP is reached (return empty, success)
- RETURN is reached (return memory slice, success)
- REVERT is reached (return memory slice, reverted)
- The PC goes past the end of bytecode (implicit STOP)
- An error occurs (out of gas, invalid opcode, stack error)
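Putting these rules together, here is a minimal, self-contained sketch of the loop. It supports only PUSH1, ADD, and STOP on a `u64` stack (the real machine uses 256-bit words), and the `Halt` enum and gas values are illustrative, not our actual implementation:

```rust
#[derive(Debug, PartialEq)]
enum Halt { Stop, OutOfGas, InvalidOpcode }

/// Minimal fetch-decode-execute loop: PUSH1 (0x60), ADD (0x01), STOP (0x00).
/// Assumes well-formed bytecode; a real interpreter also checks stack depth.
fn run(bytecode: &[u8], mut gas: u64, stack: &mut Vec<u64>) -> Halt {
    let mut pc = 0usize;
    loop {
        // PC past the end of bytecode acts as an implicit STOP.
        let Some(&byte) = bytecode.get(pc) else { return Halt::Stop };
        // Static gas cost per opcode (illustrative values).
        let cost: u64 = match byte {
            0x00 => 0,        // STOP
            0x01 | 0x60 => 3, // ADD, PUSH1: "very low" tier
            _ => return Halt::InvalidOpcode,
        };
        if gas < cost { return Halt::OutOfGas; }
        gas -= cost;
        match byte {
            0x00 => return Halt::Stop,
            0x01 => { // ADD: pop two, push the wrapping sum
                let (a, b) = (stack.pop().unwrap(), stack.pop().unwrap());
                stack.push(a.wrapping_add(b));
                pc += 1;
            }
            0x60 => { // PUSH1: one immediate byte, PC advances by 2
                stack.push(bytecode[pc + 1] as u64);
                pc += 2;
            }
            _ => unreachable!(),
        }
    }
}

fn main() {
    // 60 03 60 05 01 00 — the example bytecode from earlier.
    let mut stack = Vec::new();
    let halt = run(&[0x60, 0x03, 0x60, 0x05, 0x01, 0x00], 100, &mut stack);
    println!("{:?} {:?}", halt, stack); // Stop [8]
}
```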
Dispatch: match vs. HashMap
I went with a Rust match on a repr(u8) enum rather than a HashMap<u8, fn()>. Why?
- The compiler generates a jump table — a single indexed memory access, O(1)
- No heap allocation, no hashing overhead
- The `match` is exhaustive — the compiler ensures every opcode variant is handled
- In production EVMs like `revm`, dispatch is the hottest code path; every nanosecond matters
Our Opcode enum maps each byte to a name, plus an immediate_size() method. A production EVM would attach more metadata per opcode — something like:
| Field | Type | Purpose |
|---|---|---|
| name | &str | Disassembly output |
| immediate_size | u8 | Bytes following the opcode (0 for all but PUSH) |
| min_stack | u8 | Minimum stack depth to execute |
| stack_increase | u8 | Stack items pushed after execution |
| static_gas | u32 | Base gas cost |
| dynamic_gas | bool | Whether runtime gas is also needed |
| index | u8 | N for PUSHN/DUPN/SWAPN/LOGN, 0 otherwise |
This OpCodeInfo struct lets the interpreter validate stack depth and deduct gas before dispatching — no need to check inside each handler. We keep it simple here, but this is the direction revm and evmone take.
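A sketch of that metadata table in Rust, with illustrative values for PUSH1 (field names follow the table above; this is not revm's actual definition):

```rust
/// Per-opcode metadata, as described in the table. Illustrative sketch.
struct OpCodeInfo {
    name: &'static str,
    immediate_size: u8,
    min_stack: u8,
    stack_increase: u8,
    static_gas: u32,
    dynamic_gas: bool,
    index: u8,
}

const PUSH1: OpCodeInfo = OpCodeInfo {
    name: "PUSH1",
    immediate_size: 1,  // one immediate byte follows
    min_stack: 0,       // pops nothing
    stack_increase: 1,  // pushes one item
    static_gas: 3,      // "very low" tier
    dynamic_gas: false,
    index: 1,           // the N in PUSHN
};

fn main() {
    // The interpreter can validate stack depth and charge gas from the
    // table alone, before dispatching to the handler.
    println!("{}: {} gas", PUSH1.name, PUSH1.static_gas); // PUSH1: 3 gas
}
```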
PUSH1 through PUSH32
The PUSH family is special: each opcode is followed by 1–32 immediate bytes that form the value to push. The bytes are read in big-endian order and right-aligned into a 32-byte U256.
All PUSH opcodes cost the same gas (3, the "very low" tier). The immediate bytes are not separate instructions — they're data embedded in the bytecode stream.
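A sketch of the big-endian, right-aligned read, using a `[u8; 32]` array as a stand-in for U256:

```rust
/// Sketch: read the N immediate bytes after a PUSH opcode into a
/// right-aligned, big-endian 32-byte word (a stand-in for U256).
fn read_push_immediate(bytecode: &[u8], pc: usize, n: usize) -> [u8; 32] {
    let mut word = [0u8; 32];
    let bytes = &bytecode[pc + 1..pc + 1 + n];
    // Right-align: the N bytes occupy the least significant positions.
    word[32 - n..].copy_from_slice(bytes);
    word
}

fn main() {
    // PUSH2 0x01 0x02 pushes 0x0000...0102.
    let word = read_push_immediate(&[0x61, 0x01, 0x02], 0, 2);
    assert_eq!(word[30], 0x01);
    assert_eq!(word[31], 0x02);
    println!("{:02x}{:02x}", word[30], word[31]); // 0102
}
```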
BYTE, SHL, SHR, and SAR — Bit-level access
These four opcodes sit in the 0x1A–0x1D range, right after the logical operators (AND, OR, XOR, NOT). They all cost 3 gas ("very low" tier) and are pure stack-to-stack operations with no side effects. Despite being easy to implement, they show up constantly in real compiled code — understanding them early pays off.
Why these matter
Solidity's function dispatch relies on SHR. Every external call begins with the ABI selector: the first 4 bytes of calldata. The compiler emits CALLDATALOAD to read 32 bytes, then SHR 224 to right-shift away the lower 28 bytes, isolating the 4-byte selector. If you don't implement SHR, you can't run even the simplest Solidity contract with more than one function.
SHL appears in ABI encoding, address masking (ANDing with (1 << 160) − 1), and packing multiple values into a single storage slot. BYTE is used when contracts inspect individual bytes of a word — for example, iterating over an address byte-by-byte in a checksum routine. SAR (arithmetic right shift) preserves the sign bit, which matters for signed arithmetic: `x >> 1` on a negative int256 compiles to SAR 1 rather than SHR 1.
BYTE — extract a single byte
BYTE pops an index *i* (top) and a word *x* (second), then pushes the *i*-th byte of *x*, counting from the most significant end. Indices 0–31 are valid; any *i* ≥ 32 pushes zero.
```
PUSH32 0xFF00...00   # byte 0 is 0xFF, bytes 1-31 are 0x00
PUSH1  0x00          # index 0
BYTE                 # → 0xFF
```
The Yellow Paper defines the result as (*x* >> (8 · (31 − *i*))) mod 256 for *i* < 32, and 0 otherwise. Notice the big-endian convention — byte 0 is the most significant, matching Ethereum's standard byte ordering.
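On the same big-endian `[u8; 32]` representation (a stand-in for U256), BYTE reduces to bounds-checked indexing. A sketch:

```rust
/// Sketch of BYTE on a 32-byte big-endian word (U256 stand-in).
/// Index 0 is the most significant byte; i >= 32 yields zero.
fn op_byte(i: usize, word: &[u8; 32]) -> u8 {
    if i < 32 { word[i] } else { 0 }
}

fn main() {
    let mut word = [0u8; 32];
    word[0] = 0xff; // byte 0 is the most significant
    println!("{:#04x}", op_byte(0, &word));  // 0xff
    println!("{:#04x}", op_byte(31, &word)); // 0x00
}
```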
SHL — shift left
SHL pops a shift amount (top) and a value (second), then pushes *value* << *shift*. Bits shifted past position 255 are discarded (the result is taken mod 2^256). A shift of 256 or more always yields zero.
```
PUSH1 0x01   # value: 1
PUSH1 0x08   # shift: 8 bits
SHL          # → 0x100 (256)
```
SHR — logical shift right
SHR pops a shift amount (top) and a value (second), then pushes *value* >> *shift*. Vacated high bits are filled with zeros. A shift of 256 or more always yields zero.
```
PUSH2 0x0100   # value: 256
PUSH1 0x08     # shift: 8 bits
SHR            # → 0x01 (1)
```
The selector-extraction pattern we mentioned looks like this in bytecode:
```
PUSH1 0x00
CALLDATALOAD   # load 32 bytes from calldata offset 0
PUSH1 0xE0     # 224 in decimal
SHR            # isolate the top 4 bytes → function selector
```
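The host-side equivalent of that sequence can be sketched in Rust. The `selector` helper is hypothetical; the zero-padding mirrors CALLDATALOAD, and taking the top 4 bytes of the word mirrors SHR 224:

```rust
/// Sketch: extract the 4-byte ABI selector, as CALLDATALOAD + SHR 224 would.
fn selector(calldata: &[u8]) -> u32 {
    let mut first_word = [0u8; 32];
    let n = calldata.len().min(32);
    // CALLDATALOAD zero-pads reads past the end of calldata.
    first_word[..n].copy_from_slice(&calldata[..n]);
    // SHR 224 keeps only the 4 most significant bytes of the 256-bit word.
    u32::from_be_bytes([first_word[0], first_word[1], first_word[2], first_word[3]])
}

fn main() {
    // transfer(address,uint256) has the well-known selector 0xa9059cbb.
    let calldata = [0xa9, 0x05, 0x9c, 0xbb, 0x00, 0x00];
    println!("{:#010x}", selector(&calldata)); // 0xa9059cbb
}
```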
SAR — arithmetic shift right
SAR works like SHR but preserves the sign bit. If the most significant bit of the value is 1 (a negative two's-complement number), the vacated high bits are filled with ones instead of zeros.
```
# -2 in two's complement is 0xFFFF...FFFE
PUSH32 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFE
PUSH1 0x01   # shift: 1 bit
SAR          # → 0xFFFF...FFFF (which is -1)
```
With SHR, the same shift would produce 0x7FFF...FFFF — a large positive number, not −1. This is why the Solidity compiler picks SAR for signed right shifts.
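Rust's `>>` operator is already logical on unsigned types and arithmetic on signed types, so the SHR/SAR distinction can be sketched with 128-bit stand-ins for U256 (the clamping of over-wide shifts is the part the EVM adds on top):

```rust
/// Logical right shift (SHR sketch): vacated bits fill with zeros.
fn shr(shift: u32, value: u128) -> u128 {
    if shift >= 128 { 0 } else { value >> shift }
}

/// Arithmetic right shift (SAR sketch): vacated bits copy the sign bit.
fn sar(shift: u32, value: i128) -> i128 {
    if shift >= 128 {
        // Over-wide shifts saturate to 0 or -1 depending on the sign bit.
        if value < 0 { -1 } else { 0 }
    } else {
        value >> shift
    }
}

fn main() {
    let neg_two: i128 = -2; // 0xFF...FE in two's complement
    assert_eq!(sar(1, neg_two), -1);
    // SHR on the same bit pattern gives 2^127 - 1, a large positive number.
    assert_eq!(shr(1, neg_two as u128), u128::MAX >> 1);
    println!("ok");
}
```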
SHL, SHR, and SAR were added in the Constantinople upgrade (2019) via EIP-145. Before that, shifting had to be emulated with MUL or DIV by a power of two (itself computed via EXP) — far more expensive; EIP-145 cites roughly a tenfold gas saving for common bit operations. Every post-Constantinople EVM must support them.