80386 Protection

I'm building an 80386-compatible core in SystemVerilog and blogging the process. In the previous post, we looked at how the 386 reuses one barrel shifter for all shift and rotate instructions. This time we move from real mode to protected and talk about protection.

The 80286 introduced "Protected Mode" in 1982. It was not popular. The mode was difficult to use, lacked paging, and offered no way to return to real mode without a hardware reset. The 80386, arriving three years later, made protection usable -- adding paging, a flat 32-bit address space, per-page User/Supervisor control, and Virtual 8086 mode so that DOS programs could run inside a protected multitasking system. These features made possible Windows 3.0, OS/2, and early Linux.

The x86 protection model is notoriously complex, with four privilege rings, segmentation, paging, call gates, task switches, and virtual 8086 mode. What's interesting from a hardware perspective is how the 386 manages this complexity on a 275,000-transistor budget. The 386 employs a variety of techniques to implement protection: a dedicated PLA for protection checking, a hardware state machine for page table walks, segment and paging caches, and microcode for everything else.

Read more →

80386 Barrel Shifter

I’m currently building an 80386-compatible core in SystemVerilog, driven by the original Intel microcode extracted from real 386 silicon. Real mode is now operational in simulation, with more than 10,000 single-instruction test cases passing successfully, and work on protected-mode features is in progress. In the course of this work, corners of the 386 microcode and silicon have been examined in detail; this series documents the resulting findings.

In the previous post, we looked at multiplication and division -- iterative algorithms that process one bit per cycle. Shifts and rotates are a different story: the 386 has a dedicated barrel shifter that completes an arbitrary multi-bit shift in a single cycle. What's interesting is how the microcode makes one piece of hardware serve all shift and rotate variants -- and how the complex rotate-through-carry instructions are handled.

Read more →

80386 Multiplication and Division

When Intel released the 80386 in October 1985, it marked a watershed moment for personal computing. The 386 was the first 32-bit x86 processor, increasing the register width from 16 to 32 bits and vastly expanding the address space compared to its predecessors. This wasn't just an incremental upgrade—it was the foundation that would carry the PC architecture for decades to come.

Read more →

New Site Design

Happy New Year! The site has a fresh new design. I’ve replaced Hugo with a minimal, custom static site generator that suits my needs much better.

Read more →

z8086: Rebuilding the 8086 from Original Microcode

After 486Tang, I wanted to go back to where x86 started. The result is z8086: a 8086/8088 core that runs the original Intel microcode. Instead of hand‑coding hundreds of instructions, the core loads the recovered 512x21 ROM and recreates the micro‑architecture the ROM expects.

z8086 is compact and FPGA‑friendly: it runs on a single clock domain, avoids vendor-specific primitives, and offers a simple external bus interface. Version 0.1 is about 2000 lines of SystemVerilog, and on a Gowin GW5A device, it uses around 2500 LUTs with a maximum clock speed of 60 MHz. The core passes all ISA test vectors, boots small programs, and can directly control peripherals like an SPI display. While it doesn’t boot DOS yet, it’s getting close.

Read more →

8086 Microcode Browser

Since releasing 486Tang, I’ve been working on recreating the 8086 with a design that stays as faithful as possible to the original chip. That exploration naturally led me deep into the original 8086 microcode — extracted and disassembled by Andrew Jenner in 2020.

Like all microcoded CPUs, the 8086 hides a lot of subtle behavior below the assembly layer. While studying it I kept extensive notes, and those eventually evolved into something more useful: an interactive browser for the entire 8086 microcode ROM.

Read more →

486Tang - 486 on a credit-card-sized FPGA board

Yesterday I released 486Tang v0.1 on GitHub. It’s a port of the ao486 MiSTer PC core to the Sipeed Tang Console 138K FPGA. I’ve been trying to get an x86 core running on the Tang for a while. As far as I know, this is the first time ao486 has been ported to a non-Altera FPGA. Here’s a short write‑up of the project.

Read more →

MCU for Better FPGA Gaming on Tang Console

A year ago, I added a softcore CPU to SNESTang, to make FPGA gaming cores easier to use. Over the past months, this allowed me to implement features like an improved menu system and core switching. While the softcore served its purpose, its limitations—slow performance, inability to handle complex peripherals like USB, and FPGA resource consumption—became apparent. Now is again the time to introduce some changes. After extensive collaboration with the Sipeed team, we've finally found a way to tap the Tang boards' onboard MCU (a Bouffalo BL616 chip) to address these challenges. The result is TangCore 0.6, along with all four gaming cores. In this post, I'll discuss integrating the MCU with the Tang gaming cores.

Read more →

Script to Add a Title Page to PDFs

I've recently found myself frequently using the "ChatGPT to PDF" Chrome extension to convert ChatGPT conversations into PDF documents. The Deep Research discussions in particular contain valuable info worth preserving in ebook format. However they lack proper title pages. So here's a quick Python script to add a simple title page to PDF documents.

Read more →

UART in Verilog with Fractional Clock Dividers

Universal Asynchronous Receiver-Transmitter (UART) modules are basic components in embedded systems, enabling serial communication between devices. While there are many free implementations available online, a new challenge arose during my work on the independent software stack for the Tang Console: non-integer clock multiples. This issue surfaced when FPGA cores running on clocks of different frequencies need to communicate with an MCU via UART. Unlike SPI, where the master dictates the clock, UART demands both sides to adhere to a pre-agreed-upon baud rate (1Mbps in my case). Traditional integer clock dividers in this case yield imprecise baud rates and communication errors. In this post, I’ll explore a nice solution using a fractional clock divider technique.

Read more →