A deep dive into the 80386's memory pipeline, exploring how Intel achieved ~1.5-clock address translation despite the apparent complexity of segmentation, paging, and virtual memory. Covers descriptor caches, parallel limit checking, the Early Start optimization (and its POPAD bug), TLB fast paths, bus interface design, and the 82385 cache controller. Also discusses how these historical microarchitectural techniques map onto a modern FPGA 386 core running at 75 MHz on a DE10-Nano board, including the tradeoffs around latch-versus-register design, two-phase clocking, and the L1 cache implementation.
Table of contents
Microcode for memory accesses
Efficient segmentation
Early start
Paging fast path
Bus interface and caching
Putting it together
Mapping the memory pipeline to an FPGA 386
Conclusion