z386: An Open-Source 80386 Built Around Original Microcode
This is the fifth installment of the 80386 series. The FPGA CPU is now far enough along to run real software, and this post is about how it works. z386 is a 386-class CPU built around the original Intel microcode, in the same spirit as z8086.
The core is not an instruction-by-instruction emulator in RTL. The goal is to recreate enough of the original machine that the recovered 386 control ROM can drive it. Today z386 boots DOS 6 and DOS 7, runs protected-mode programs like DOS/4GW and DOS/32A, and plays games like Doom and Cannon Fodder. Here are some rough numbers against ao486:
| Metric | z386 | ao486 |
|---|---|---|
| Lines of code (cloc) | 8K | 17.6K |
| ALUTs | 18K | 21K |
| Registers | 5K | 6.5K |
| BRAM | 116K | 131K |
| FPGA clock | 85MHz | 90MHz |
| 3DBench FPS | 34 | 43 |
| Doom (original) FPS, max details | 16.5 | 21.0 |
In current builds, z386 performs like a fast (~70MHz) cached 386-class machine, or a low-end 486. It runs at a much higher clock than historical 386 CPUs, but with somewhat worse CPI (cycles per instruction). The current cache is a 16 KB, 4-way set-associative unified L1, chosen partly to keep the clock high. Real high-end 386 systems often used larger external caches, typically in the 32 KB to 128 KB range.
Much of this 386 microarchitecture archaeology has already been covered in the previous four posts: the multiplication/division datapath, the barrel shifter, protection and paging, and the memory pipeline. z386 tries to be both an educational reconstruction and a usable FPGA CPU. It keeps many 386-like structures: a 32-entry paging TLB, a barrel shifter shaped like the original, ROM/PLA-style decoding, the Protection PLA model, and most importantly the 37-bit-wide, 2,560-entry microcode ROM. At the same time, it uses FPGA-friendly shortcuts where they make sense, such as DSP blocks for multiplication and the small fast L1 cache.
In this post, I will fill in the rest of the design: instruction prefetch, decode, the microcode sequencer, cache design, testing, how z386 differs from ao486, and some lessons from the bring-up.