Why I Built an 8-Bit CPU Emulator (And Why You Should Too)
A microelectronics engineer's journey back to the basics...
There's something ironic about my career path.
I studied microelectronics engineering. I learned how transistors work, how to design digital circuits, how electrons actually move through silicon. I could draw you a CMOS inverter from memory and explain the physics of semiconductor doping.
Then I spent years in enterprise support, troubleshooting SaaS platforms and backend systems.
Don't get me wrong - I love what I do. There's something deeply satisfying about being the person who figures out why a customer's production system is down. Database locks? Memory leaks? Race conditions in distributed systems? API integrations failing in ways that shouldn't be possible? That's my world. I've debugged thousands of issues across the full stack, often while the customer is waiting on a call.
Plus running a coding bootcamp taught me how to explain complex technical concepts to people who are just starting out.
But here's the thing: somewhere along the way, I realized I'd forgotten something fundamental.
I'd forgotten how computers actually work.
The Disconnect
When you're troubleshooting a production issue at 2 AM, you're operating at such a high level of abstraction that it's easy to lose sight of what's really happening underneath.
"The API is timing out." Why? Database query taking too long. Why? Missing index. Fix it, deploy, customer happy.
But what's actually happening? SQL gets parsed, a query plan is generated, that compiles to instructions, the CPU executes billions of operations fetching data from RAM, and somehow the right bytes end up in the HTTP response.
I could troubleshoot these systems. I could read stack traces, analyze heap dumps, trace distributed requests across microservices. But somewhere deep down, I realized I was pattern-matching more than truly understanding.
It just... works. Magic.
Except it's not magic. It's physics and logic gates and machine code. I used to know this stuff. I had studied it. But years of grep -r "ERROR" and kubectl logs and SQL query optimization had created this weird gap between what I learned in university and what I do every day.
So I decided to build a bridge.
Enter Simple8: My 8-Bit CPU Emulator
The project started simple (hence the name): build an 8-bit CPU emulator from scratch using Python and Test-Driven Development.
No libraries. No shortcuts. Just me, Python, pytest, and the fundamental question: How does a CPU actually execute code?
You can follow along or check out the code here: https://github.com/elchicha/8-bit-cpu-emulator
The specs are modest:
- 8-bit data path (processes values 0-255)
- 16-bit address space (64KB of memory)
- Inspired by classic CPUs like the 6502, Z80, and 8080
But the learning? That's been immense.
What I've Built So Far
Let me walk you through the journey, component by component.
Registers: Where It All Starts
First stop: registers. These are the CPU's scratchpad - tiny, fast storage right where the action happens.
I implemented two types:
8-bit registers (for data):
class Register8:
    def set(self, value):
        self._value = value & 0xFF  # Wraps at 8-bit boundary
That & 0xFF mask? That's not just a programming trick. That's exactly what happens in real hardware. When you have 8 wires, you can only represent 256 different states. Overflow doesn't crash the system - it wraps around. Just like your car's odometer.
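To see the odometer effect in action, here's a minimal standalone sketch of the masking behavior (a simplified stand-in, not the project's actual Register8 class):

```python
# Minimal sketch of 8-bit wrap-around via masking.
class Register8:
    def __init__(self):
        self._value = 0

    def set(self, value):
        # Keep only the low 8 bits, just like 8 physical wires would
        self._value = value & 0xFF

    def get(self):
        return self._value

reg = Register8()
reg.set(255)
reg.set(reg.get() + 1)  # 256 wraps to 0, like an odometer rolling over
print(reg.get())        # 0
reg.set(300)            # 300 & 0xFF == 44
print(reg.get())        # 44
```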
16-bit registers (for addresses):
class Register16:
    def set(self, value):
        self._value = value & 0xFFFF  # 16-bit address space
Here's the first "aha!" moment: 8-bit CPUs aren't limited to 256 bytes of memory. The "8-bit" refers to the data width - how many bits you process at once. But the address bus? That can be wider. The 6502 had an 8-bit data bus but a 16-bit address bus, giving it 64KB of addressable memory.
This is why the original NES could have large games despite being "8-bit."
Memory: The Big Array
Memory is conceptually simple - it's just a giant array of bytes:
class Memory:
    def __init__(self, size: int = 0x10000):  # 64KB
        self._data = bytearray(size)

    def write_byte(self, address: int, value: int):
        if not (0 <= address < len(self._data)):
            raise MemoryAccessError(f"Invalid address: 0x{address:04X}")
        self._data[address] = value & 0xFF
But here I made a design choice that taught me something important.
Real hardware would just wrap around if you try to access an invalid address (address 0x10000 becomes 0x0000). But I decided to raise an error instead. Why? Because this is an educational tool. I want to catch bugs, not silently let them happen.
This is the difference between building something for learning vs. building something for accuracy. Both are valid, but they optimize for different things.
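Here's a side-by-side sketch of the two choices, using a plain bytearray and illustrative function names (not the project's exact API):

```python
# Two ways to handle an out-of-range write to 64KB of memory.
SIZE = 0x10000

def write_strict(data: bytearray, address: int, value: int):
    # Educational choice: loudly reject invalid addresses to catch bugs
    if not (0 <= address < SIZE):
        raise IndexError(f"Invalid address: 0x{address:04X}")
    data[address] = value & 0xFF

def write_wrapping(data: bytearray, address: int, value: int):
    # Hardware-accurate choice: address lines wrap, 0x10000 becomes 0x0000
    data[address & 0xFFFF] = value & 0xFF

mem = bytearray(SIZE)
write_wrapping(mem, 0x10000, 0xAB)  # silently lands at address 0x0000
print(mem[0x0000])                  # 171 (0xAB)
```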
The ALU: Where Math Happens
The Arithmetic Logic Unit is the CPU's calculator. It performs operations and - crucially - sets flags that tell the rest of the CPU what happened.
class ALU:
    def add(self, a, b):
        result = a + b
        self.carry_flag = result > 0xFF  # Check BEFORE masking!
        result = result & 0xFF
        self.zero_flag = (result == 0x00)
        self.negative_flag = (result & 0x80) != 0x00
        return result
Those flags - Zero, Carry, Negative - they're what make a CPU "programmable." Without them, you can't have loops or conditionals. They're the foundation of all control flow.
Second "aha!" moment: The carry flag has to be set BEFORE you mask the result to 8 bits. Otherwise, how would you know there was overflow? This is the kind of detail that seems obvious in retrospect but took me actual debugging to figure out.
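Here's a tiny standalone demonstration of that ordering bug (simplified functions, not the emulator's real ALU):

```python
# Why the carry check must come before masking.
def add_wrong(a, b):
    result = (a + b) & 0xFF  # masking first destroys the overflow bit
    carry = result > 0xFF    # always False now -- the 9th bit is gone
    return result, carry

def add_right(a, b):
    result = a + b
    carry = result > 0xFF    # check the 9th bit while it still exists
    return result & 0xFF, carry

print(add_wrong(0xFF, 0x01))  # (0, False) -- overflow lost
print(add_right(0xFF, 0x01))  # (0, True)  -- overflow detected
```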
The CPU: Bringing It All Together
The CPU coordinates everything through the fetch-decode-execute cycle:
def step(self):
    # FETCH: Read instruction at program counter
    opcode = self.memory.read_byte(self.program_counter.get())

    # DECODE & EXECUTE: What instruction is this?
    if opcode == 0xA9:    # LDA immediate
        self._execute_lda_immediate()
    elif opcode == 0x8D:  # STA absolute
        self._execute_sta_absolute()
    # ... and so on
This is it. This is how computers run programs. Over and over, billions of times per second:
- Fetch the next instruction
- Decode what it means
- Execute it
- Repeat
Every program you've ever run - from Hello World to Chrome - comes down to this loop.
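The whole loop fits in a few lines. Here's a stripped-down sketch over a byte array, just to show the shape of the cycle - the opcodes here (0x01 = "add the next byte", 0x00 = halt) are made up for the sketch, not Simple8's real encoding:

```python
# A minimal fetch-decode-execute loop.
memory = bytes([0x01, 5, 0x01, 7, 0x00])  # add 5, add 7, halt
pc = 0
accumulator = 0

while True:
    opcode = memory[pc]                                      # FETCH
    if opcode == 0x01:                                       # DECODE
        accumulator = (accumulator + memory[pc + 1]) & 0xFF  # EXECUTE
        pc += 2                                              # REPEAT
    elif opcode == 0x00:                                     # halt
        break

print(accumulator)  # 12
```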
The Instructions: Teaching Silicon to Think
So far, I've implemented 8 instructions:
- NOP - Do nothing (useful for timing)
- LDA - Load accumulator
- STA - Store accumulator to memory
- ADD - Add to accumulator
- SUB - Subtract from accumulator
- JMP - Jump to address
- BNE - Branch if not equal (not zero)
- BEQ - Branch if equal (zero)
That's it. Just 8 instructions. But here's what blows my mind: with these 8 instructions, you can write loops.
Here's a countdown program in Simple8 assembly:
        LDA #$05   ; Load 5 into accumulator
LOOP:   SUB #$01   ; Subtract 1
        STA $0200  ; Store current value
        BNE LOOP   ; Branch back if not zero
This runs! The CPU executes it! An actual loop, running on an emulated CPU I built from scratch!
When I first got this working and watched it count down from 5 to 0, storing each value in memory... I won't lie, I felt like a kid again.
The Test-Driven Journey
I built all of this using Test-Driven Development. Write the test first, watch it fail, then make it pass.
Here's how that looked in practice:
# Started simple
def test_register_initializes_to_zero():
    reg = Register8()
    assert reg.get() == 0

# Built up complexity
def test_alu_sets_zero_flag():
    alu = ALU()
    result = alu.add(0x00, 0x00)
    assert alu.zero_flag == True

# Eventually wrote full programs
def test_countdown_loop():
    # Load program into memory
    # Execute multiple steps
    # Verify loop behavior
    # Check final memory state
    ...
28 passing tests later, I have confidence that every component works. More importantly, TDD forced me to think through the behavior before implementing it. What should happen when a register overflows? When you branch past the end of memory? When two negative numbers add together?
These aren't theoretical questions. They're design decisions that real CPU architects had to make in 1975.
What This Taught Me (And Why It Matters)
1. Modern abstractions hide beautiful simplicity
When I'm troubleshooting why a customer's Java application is consuming 32GB of RAM, I'm looking at heap dumps, garbage collection logs, thread dumps. When I'm tracking down why their API gateway is returning 503s, I'm checking load balancers, connection pools, circuit breakers.
There are so many layers. The application framework. The JVM. The operating system. The hypervisor. The container runtime. And somewhere at the bottom of all this: the CPU, executing instructions.
Building Simple8 stripped all that away. At the bottom, it's just:
- Load a byte
- Do something with it
- Store the result
- Move to the next instruction
That's it. Everything else - every framework, every library, every distributed system, every enterprise platform I've ever troubleshot - is built on top of this.
And you know what? Understanding this has made me better at my job. When I see a CPU pegged at 100%, I now think about what that actually means: the fetch-decode-execute cycle running as fast as it can. When I see memory pressure, I think about addresses and byte storage. The mental model is clearer.
2. Two's complement finally makes sense
I learned about two's complement in university. I passed the exam. I understood the how.
But building branch instructions made me understand the why. When you have 8 bits and you want to represent both positive and negative offsets, two's complement is elegant:
offset = self.memory.read_byte(pc + 1)
if offset >= 0x80:             # Values 128-255 are negative
    offset = offset - 0x100    # Convert to -128 to -1
This isn't an arbitrary choice. It's the most efficient way to represent signed numbers in binary. Addition and subtraction work the same whether the number is positive or negative. Pure genius.
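A quick standalone sketch of the conversion in both directions, plus the "addition just works" property:

```python
# Two's-complement conversion for 8-bit values.
def to_signed(byte):
    # 0x00-0x7F stay positive; 0x80-0xFF map to -128..-1
    return byte - 0x100 if byte >= 0x80 else byte

def to_unsigned(n):
    # Inverse mapping: -1 becomes 0xFF, -128 becomes 0x80
    return n & 0xFF

print(to_signed(0xFB))        # -5
print(hex(to_unsigned(-5)))   # 0xfb
print((0x05 + 0xFB) & 0xFF)   # 0: adding the bit pattern of -5 subtracts 5
```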
3. Flags are everything
Without the zero flag, you can't write a loop that counts down to zero. Without the carry flag, you can't detect overflow. Without the negative flag, you can't tell if a result was negative.
These tiny 1-bit values - these flags - are what separate a calculator from a computer. They enable decision-making. They enable programming.
4. Little-endian makes sense (sort of)
The 6502 stores multi-byte addresses in little-endian format (low byte first):
Address 0x1234 is stored as: [0x34, 0x12]
                              low   high
Why? Because when the CPU reads sequentially, it gets the low byte first, which it needs to start calculating the address. It's a hardware optimization.
Is it confusing? Yes. Does it make sense when you're thinking about how circuits actually work? Also yes.
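You can check this byte order from Python itself with the standard library's struct module ("<H" means little-endian unsigned 16-bit), matching what a 6502 would store for address 0x1234:

```python
import struct

# Pack 0x1234 as a little-endian 16-bit word
raw = struct.pack("<H", 0x1234)
print(list(raw))  # [52, 18] -- the low byte 0x34 (52) comes first

# Unpacking with the same format recovers the original value
print(hex(struct.unpack("<H", raw)[0]))  # 0x1234
```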
5. TDD works for complex systems
I've used TDD for web applications. But using it to build a CPU emulator? That was next-level.
Every component was built with confidence. Every edge case was caught. Every refactoring was safe. And the tests became documentation - they show exactly how each component should behave.
If you've never tried TDD on a complex project, I can't recommend it enough.
The Challenges (And What I Learned)
Challenge #1: When is a register 8 bits vs 16 bits?
I initially used an 8-bit register for the Program Counter. Then I tried to jump to address 0x0600 and... it wrapped to 0x00.
Lesson: The Program Counter needs to hold addresses, not just data. It needs to be 16 bits. "8-bit CPU" describes the data width, not every register.
Challenge #2: Signed vs unsigned integers
Python doesn't have "unsigned" integers the way C does. Everything is signed. So implementing two's complement for branch offsets required explicit conversion.
Lesson: Different languages make different trade-offs. Python's flexibility comes with the need to be more explicit about data types.
Challenge #3: Order of operations in the ALU
My first ALU implementation checked the carry flag after masking the result to 8 bits. This meant I could never detect overflow.
Lesson: In low-level programming, order matters. Check for overflow before you throw away the information about overflow!
Challenge #4: Relative vs absolute addressing
Jump instructions use absolute 16-bit addresses (3 bytes). Branch instructions use relative 8-bit offsets (2 bytes). And the offset is calculated from the next instruction, not the current one.
Lesson: Every byte matters when you have 64KB total. Compact encoding is essential. The 6502 designers thought through every detail.
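Here's the offset arithmetic worked through with assumed addresses (illustrative, not taken from a real program listing): suppose LOOP sits at 0x0602 and the 2-byte BNE sits at 0x0607, so the next instruction starts at 0x0609 and the offset is measured from there.

```python
# Encoding a backward branch offset as a two's-complement byte.
bne_addr = 0x0607
next_instr = bne_addr + 2       # PC after the 2-byte branch instruction
target = 0x0602                 # the LOOP label
offset = target - next_instr    # -7
encoded = offset & 0xFF         # two's-complement byte: 0xF9
print(hex(encoded))             # 0xf9

# Decoding reverses it, exactly like the emulator's signed conversion
decoded = encoded - 0x100 if encoded >= 0x80 else encoded
print(next_instr + decoded == target)  # True
```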
What's Next
Simple8 is ongoing. There's so much more I want to build:
More instructions:
- Index registers (X and Y)
- Increment/Decrement (INX, DEX)
- Logical operations (AND, OR, XOR)
- Bit shifts and rotations
- Stack operations (PUSH, POP)
- Compare instruction (CMP)
Better tooling:
- An assembler (write assembly, not hex bytes)
- A disassembler (show what program is loaded)
- A debugger/visualizer (step through execution)
- Maybe even a simple program loader
Bigger programs:
- Fibonacci sequence
- Bubble sort
- String operations
- Maybe even a tiny game?
Why You Should Try This
Look, I get it. You're busy. You've got production issues to solve. You've got customers waiting. You've got a backlog of tickets that never seems to shrink.
But here's what I'll tell you from my experience - both as someone who taught at a coding bootcamp and as someone who's troubleshot thousands of enterprise issues:
Understanding how computers actually work makes you a better troubleshooter.
When you know that every if statement compiles down to a comparison and a conditional branch...
When you know that every variable is just a memory address...
When you know that "thread blocked" means the CPU literally stopped executing that instruction sequence...
When you know that a "memory leak" is just addresses that never get freed...
You debug differently. You ask better questions. You understand performance bottlenecks. You stop seeing the computer as a black box and start seeing it as a system you can reason about.
I've literally had moments troubleshooting production issues where I thought, "Oh, this is just like when my branch instruction was calculating the offset wrong." The mental models transfer.
And honestly? It's just fun. After a day of dealing with enterprise support tickets and production incidents, there's something therapeutic about building something from first principles. About writing a program that executes other programs. About seeing a loop run on a CPU that you built with your own hands (well, keyboard).
It's the difference between being a mechanic who replaces parts and a mechanic who understands engines.
The Full Circle
I started this post talking about the disconnect between my microelectronics background and my enterprise support career.
Building Simple8 has bridged that gap. I'm using the circuit design intuition I developed in university to understand why CPUs work the way they do. And I'm using the troubleshooting methodology I developed in support to actually build it properly - with tests that verify behavior, with careful debugging when things don't work, with systematic problem-solving.
Plus, the teaching experience from running a bootcamp? That's helped me document this in a way that others can follow.
It's funny how careers work. You think you're moving away from something, and then years later you circle back to it with new perspective and new skills.
The patient debugging I do when a customer's system is misbehaving? That's the same mindset I need when my emulator's flags aren't being set correctly.
The systematic approach to isolating issues in distributed systems? That's exactly how I approached building this - one component at a time, tested in isolation, then integrated.
The ability to explain complex technical concepts to non-technical customers? That's what I'm trying to do here - make CPU architecture accessible.
If you're curious about computer architecture, if you've ever wondered how code really runs, if you want to understand the machine underneath all the abstractions you troubleshoot every day - I encourage you to try building your own CPU emulator.
Start small. 8-bit is a great place to begin. Use TDD. Write tests first. Build it up piece by piece.
And then, when you get that first loop working, when you watch your CPU count down from 5 to 0, when you see the values changing in memory...
You'll feel it. That moment of understanding. That connection between theory and practice. That "holy shit, this is how computers work" moment.
That's worth more than any course or textbook.
Let's Connect
Want to follow along with the project? Check it out on GitHub: https://github.com/elchicha/8-bit-cpu-emulator
Have questions? Built your own emulator? Hit me up on Twitter or LinkedIn. I love talking about this stuff.
And if this post helped you understand something about computers, share it. More developers should understand how these machines actually work.
Until next time, happy coding (or should I say, happy emulating?).
- Luis @ EchetoTech
P.S. - Yes, I know there are already excellent CPU emulators out there. py65 is great. VICE is amazing. But building your own is different. It's the difference between reading about how to swim and jumping in the pool. Both are valuable, but only one gets you wet.
Resources If You Want to Dive Deeper
For understanding the 6502:
- http://www.6502.org - The definitive resource
- "Programming the 65816" by David Eyes - Goes deep into the architecture
- Visual6502 - See a real 6502 simulated transistor by transistor
For building your own:
- "Code: The Hidden Language" by Charles Petzold - Best introduction to computer architecture
- "But How Do It Know?" by J. Clark Scott - Explains CPUs from NAND gates up
- Ben Eater's YouTube series - Build a real 8-bit computer on breadboards
For TDD:
- "Test-Driven Development by Example" by Kent Beck - The original
- My code on GitHub - See how I applied TDD to this project
Technical Notes
If you want to peek at some key implementation details:
Memory addressing (little-endian):
def read_word(self, address: int) -> int:
    """Read 16-bit word in little-endian format"""
    low_byte = self.read_byte(address)
    high_byte = self.read_byte(address + 1)
    return (high_byte << 8) | low_byte
Branch calculation:
def _execute_bne(self):
    """Branch if not equal (if zero flag is false)"""
    offset = self.memory.read_byte(self.program_counter.get() + 1)

    # Convert to signed offset
    if offset >= 0x80:
        offset = offset - 0x100

    # Advance PC past instruction first
    self.program_counter.set(self.program_counter.get() + 2)

    # Then apply branch if condition is met
    if not self.alu.zero_flag:
        new_pc = self.program_counter.get() + offset
        self.program_counter.set(new_pc)
Flag setting in ALU:
def add(self, a: int, b: int) -> int:
    """Add with flags"""
    result = a + b

    # Check carry BEFORE masking
    self.carry_flag = result > 0xFF

    # Mask to 8 bits
    result = result & 0xFF

    # Set other flags AFTER masking
    self.zero_flag = (result == 0x00)
    self.negative_flag = (result & 0x80) != 0x00

    return result
Full code with all tests: https://github.com/elchicha/8-bit-cpu-emulator