Zilog Z80 CPU – Modern, free and open source silicon clone
362 points
15 days ago
| 12 comments
| github.com
| HN
Renaud
15 days ago
[-]
What Tiny Tapeout is doing is amazing. Who would have thought that makers and students could have their own chip design made real for so little money?

The tools look amazing as well. You won't design the next Intel CPU on that 130nm process, but to think that the Z80 will fit in 0.064 mm2 is just amazing.

It's great that there will still be an alternative to the official chip now that it won't be manufactured any more.

Now I want that gorgeous mauve ceramic package with a gold-plated cover over the chip...

https://twitter.com/l_vanek/status/1783557817133039738/photo...

https://tinytapeout.com/

reply
ashleyn
15 days ago
[-]
130nm process puts them at roughly Pentium III era. Not bad!
reply
skywal_l
15 days ago
[-]
Actually more of a Pentium IV.

    Pentium III: 250 nm to 130 nm [0]
    Pentium 4: 180 nm to 65 nm [1]
Which is indeed amazing.

[0] https://en.wikipedia.org/wiki/Pentium_III

[1] https://en.wikipedia.org/wiki/Pentium_4

reply
rvense
15 days ago
[-]
That's wild. A Pentium III would still be useful in a pinch. How big was a P3 die, though?
reply
Someone
15 days ago
[-]
https://en.wikipedia.org/wiki/Pentium_III#Katmai:

“The Katmai contains 9.5 million transistors, not including the 512 Kbytes L2 cache (which adds 25 million transistors), and has dimensions of 12.3 mm by 10.4 mm (128 mm²). It is fabricated in Intel's P856.5 process, a 250 nm complementary metal–oxide–semiconductor (CMOS) process with five levels of aluminum interconnect”

That’s 2,000 times the area of this 0.064 mm² Z80.

https://en.wikipedia.org/wiki/Pentium_III#Tualatin:

“The third revision, Tualatin (80530), was a trial for Intel's new 130 nm process”

I can’t easily find the die size of that.

reply
unnah
15 days ago
[-]
On that kind of process, you could make a 1024-core Z80 machine, leaving half the area for memory, interconnect and I/O. With suitably smart programming and an embarrassingly parallel problem, it might even be able to beat a Pentium III in performance... although it looks like the single-core Pentium III can run 128-bit SSE instructions at 2 cycles per instruction.
reply
Someone
14 days ago
[-]
Suitably smart programming and a problem that suits the hardware. I doubt there are many of the latter.

A Z80 has a 4-bit ALU (https://en.wikipedia.org/wiki/Zilog_Z80#Microarchitecture), making even integer addition take quite a few cycles (15 for 16-bit addition, reading http://www.z80.info/z80time.txt)

And then there’s the clock speed difference. The first Pentium III ran at 450MHz, the fastest Z80 at 50MHz (https://en.wikipedia.org/wiki/Zilog_eZ80)

I think those two combined already will cost you a factor of around 100 in speed versus that pipelined Z80, much more versus a Z80 proper.

Things get worse if you want to add or subtract 32- or 64-bit integers (another factor of 2 or 4, ballpark)

If you want to do integer multiplication and division of any size and all floating point operations you will have to do those in software, and likely lose whatever speed advantage you might still have.

Oh, and each core will be limited to 64kB of memory. Those interconnects had better be fast and use DMA, so you can keep computing while you shuffle data around.

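To put rough numbers on the clock and cycle-count argument above, here is a back-of-the-envelope sketch. It only uses the figures quoted in this comment, plus one loud assumption that is not from the thread: that the Pentium III retires one integer add per cycle. It also mixes the 50 MHz eZ80 clock with the 15-cycle classic Z80 add, so treat the result as an order-of-magnitude illustration, not a benchmark.

```python
# Figures quoted in the comment above.
P3_CLOCK_HZ = 450e6           # first Pentium III
Z80_CLOCK_HZ = 50e6           # fastest eZ80 clock mentioned
Z80_CYCLES_PER_ADD16 = 15     # 16-bit add on a classic Z80, as read from z80time.txt

# ASSUMPTION (not from the thread): one integer add per cycle on the P3.
P3_CYCLES_PER_ADD = 1


def adds_per_second(clock_hz, cycles_per_add):
    """Naive throughput: clock rate divided by cycles per operation."""
    return clock_hz / cycles_per_add


clock_ratio = P3_CLOCK_HZ / Z80_CLOCK_HZ
slowdown = (adds_per_second(P3_CLOCK_HZ, P3_CYCLES_PER_ADD)
            / adds_per_second(Z80_CLOCK_HZ, Z80_CYCLES_PER_ADD16))

print(f"clock ratio alone: {clock_ratio:.0f}x")
print(f"with cycles per 16-bit add: {slowdown:.0f}x")
```

Under those assumptions the per-core gap lands in the same ballpark as the "around 100" estimate, before 32-bit math or software multiply make it worse.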
reply
unnah
14 days ago
[-]
Yes, there are definitely challenges, and there must be a reason no one is seriously selling 1024-core versions of old 8-bit processors... but perhaps on an Intel 130 nm process you could make a faster Z80 than just 50 MHz. Quick googling didn't reveal what process Zilog is currently using for eZ80.
reply
chx
15 days ago
[-]
To save a click

> 160 x 100 um tile + ASIC + demonstration board: The standard price is $300 plus shipping.

> However, Efabless is sponsoring a special early bird offer of $150 (plus shipping), limited to one order per person.

> Each extra tile is $50, and extra analog pins start from $40 per pin.

Unless I am badly mistaken 160 x 100 um is .16 x .1 mm which means the tile is 0.016 mm2 meaning a 0.064 mm2 die takes four slots?

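A quick sketch of that unit conversion, in case anyone wants to check other sizes. The helper names are made up for illustration, and it compares areas only (it does not check whether a design's shape actually fits a tile grid).

```python
import math

UM_PER_MM = 1000.0


def tile_area_mm2(width_um, height_um):
    """Convert a tile size given in micrometres to an area in mm^2."""
    return (width_um / UM_PER_MM) * (height_um / UM_PER_MM)


def tiles_needed(design_area_mm2, tile_mm2):
    """Smallest whole number of tiles whose total area covers the design."""
    # Round first so floating-point noise cannot push 4.0 up to 5.
    return math.ceil(round(design_area_mm2 / tile_mm2, 9))


tile = tile_area_mm2(160, 100)
print(f"tile area: {tile:.3f} mm^2")
print(f"tiles for 0.064 mm^2: {tiles_needed(0.064, tile)}")
```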
reply
rowanG077
15 days ago
[-]
I could not find the pin capabilities. Is it possible to build an sdram controller or even drive ddr?
reply
mk_stjames
15 days ago
[-]
Yes it is taking up a 2x2 tile on the tinytapeout.

https://app.tinytapeout.com/projects/668

reply
robxorb
15 days ago
[-]
For those wondering, the 6502 and various derivatives are still being manufactured, by one of its original creators [0] - so I'd guess an equivalent development in the world of the Z80's nemesis is unlikely anytime soon.

[0] https://www.westerndesigncenter.com/wdc/chips.php

reply
monocasa
15 days ago
[-]
Interestingly the classic z80 was end of lifed just two weeks ago.

https://hackaday.com/2024/04/19/end-of-life-for-z80-cpu-and-...

reply
Fatnino
14 days ago
[-]
What will the ti-83/4 calculator use now?
reply
ksherlock
15 days ago
[-]
65*C*02, which might fail if your 6502 code depends on illegal opcodes or incidental memory access or BCD math cycle timing or BCD math flags or perhaps decimal mode being set in an interrupt routine.
reply
grumpyprole
15 days ago
[-]
The 65C02 came out in 1983, so I think we've had plenty of time to document and work around these issues!
reply
giantrobot
15 days ago
[-]
Hey, don't touch my space heater![0]

[0] https://xkcd.com/1172/

reply
polpo
15 days ago
[-]
It could happen at any time to the 65C02. The Z80 was only EOLed a few weeks ago because they couldn’t get wafers from their fab any more. Any chip on an old process is at risk of this.
reply
RetroTechie
15 days ago
[-]
It would be interesting to know Zilog's sale volumes for discrete Z80s (say, over the past decade). What uses they were purchased for, and DIP/PLCC/flatpack ratios.

There must be millions floating out there. But with distributors like Mouser or Farnell gone, for anyone looking to buy some, it's eBay & co which tends to be a crapshoot.

reply
lelanthran
14 days ago
[-]
> It would be interesting to know Zilog's sale volumes for discrete Z80s (say, over the past decade).

Not the past decade, but two decades ago (2005) the Z80 was still popular. At work, I was working on a product based on, IIRC, a Rabbit Semiconductor product, which was a module with on-chip ethernet. It was a Z80 running at 40 MHz.

Personally, I also had a little siemens organiser thing, that also was z80 based (not sure of the actual specs). I recall trying to write programs for it and failing (may not have been open; no way to reprogram or download new code to it, maybe).

[EDIT: The organiser was a siemens IC35]

reply
Fatnino
14 days ago
[-]
Aren't ti-83 calculators still sold today using z80?
reply
ndiddy
14 days ago
[-]
Any Z80 based (not eZ80) TI calculators on the market today have the Z80 core built into an ASIC instead of a discrete chip, meaning that they wouldn’t be impacted by parts availability.
reply
zczc
15 days ago
[-]
And eZ80 (binary compatible, but not pin-compatible with Z80) is still being manufactured by Zilog.
reply
belter
15 days ago
[-]
Z80 was the CPU of the ZX Spectrum. Oh the memories...

https://en.wikipedia.org/wiki/ZX_Spectrum

reply
mattl
15 days ago
[-]
So many good machines: the Amstrad CPC range, a whole slew of Sega consoles, the early MSX stuff and of course the Tatung Einstein. 3 inch disk machines of the world unite!
reply
flohofwoe
15 days ago
[-]
All 8-bit computers manufactured in East Germany too (via the reverse engineered Z80 clone U880). For instance the KC85/2..4:

https://floooh.github.io/virtualkc/p010_kc85.html

An "Adrian's Digital Basement" episode about the KC85/3:

https://www.youtube.com/watch?v=At9UNYFHuaE

reply
rwmj
15 days ago
[-]
Not to mention the world of CP/M business machines which was surprisingly large in the UK well into the late 80s, again thanks to Amstrad: https://en.wikipedia.org/wiki/Amstrad_PCW
reply
mchannon
15 days ago
[-]
You forgot about Donkey Kong (and Junior, and 3, and Mario Bros.)

Punch-Out!!

But most noteworthy is Galaga: Ran on 3, count 'em, 3, Z80's.

reply
flohofwoe
15 days ago
[-]
...also Pacman btw (the original arcade machine), and other 80's arcade machines like Pengo or Bomb Jack (notably Bomb Jack was two Z80 computers duct-taped together, the sound was handled by a separate Z80 board which controlled three AY-3-8910 sound chips).
reply
yincrash
14 days ago
[-]
My mobile computer of choice in school, the TI-83 series calculators.
reply
Fatnino
14 days ago
[-]
Which still goes for full price today despite using an EOL chip.
reply
ztetranz
14 days ago
[-]
And of course the TRS-80 and clones such as the Dick Smith System-80 that we had in Australia and New Zealand. Lots of good memories programming with EDTASM. I only had a cassette drive so if my code went wrong I usually had to hit reset and reload EDTASM and my code again from tape.
reply
Andrex
14 days ago
[-]
I thought it was used for the Game Boy, but apparently while there are many similarities they're basically incompatible[0].

0. https://forums.nesdev.org/viewtopic.php?t=18335

reply
LeFantome
14 days ago
[-]
It was also the CPU in my first computer--the Coleco ADAM: https://en.wikipedia.org/wiki/Coleco_Adam

I still have my copy of Programming the Z80 that I got as a kid: https://en.wikipedia.org/wiki/Programming_the_Z80

reply
userbinator
14 days ago
[-]
Also the many unbranded MP3/"MP4" players that were very widespread in the mid to late 2000s: https://en.wikipedia.org/wiki/S1_MP3_player
reply
MisterTea
14 days ago
[-]
I think the real joy of these old 8-bit CPU's is the simplicity and how a single person can wire up a computer by hand.

In uni we built an 8088 board in microprocessor class and it was the best class I ever took; it demystified drivers and hardware for me. I attempted a redesign using KiCAD which added an IO expansion port and better layout with an LCD port for a 2x16 character LCD. I had a prototype made by Futurlec but made a massive mistake in footprint assignment that required an interposer. The furthest I got was soldering in the 8284 and the IC sockets, then I got distracted by life and it's still sitting in a box.

Microcontrollers are great, everything in one package but there's something enormously satisfying about being able to design and build a computer by hand. FPGAs sort of bring this back but the tooling is byzantine dog shit.

reply
avnd
13 days ago
[-]
The open source tooling isn’t perfect but it’s growing fast! I work in this space and recommend checking out the OpenROAD[1] project, which has full synthesis and pnr for select FPGAs.

[1] https://theopenroadproject.org/

reply
rsynnott
14 days ago
[-]
Wow. Just looked it up; the Z80 is now _50 years old_.
reply
Koshkin
15 days ago
[-]
I couldn’t help but notice that the circuit layout looks like a uniform gate array rather than something resembling the custom layout you usually see in die photos.
reply
flohofwoe
15 days ago
[-]
Because it's a Verilog implementation which is much closer to a software CPU emulator than the real thing (e.g. it has nothing to do with the original Z80 "transistor layout").

For instance here's the LD A,(DE) "instruction payload":

https://github.com/rejunity/z80-open-silicon/blob/974c7711b2...

And here's the same machine cycle in my software emulator:

https://github.com/floooh/chips/blob/bd1ecff58337574bb46eba5...

Both set the address bus to the content of the DE register (and at the same time the MREQ|RD pins need to be set somewhere to indicate a memory read to the outside world, in my emulator this happens in the _mread macro), and in the next clock cycle load the data bus into the A register.

What's interesting though is that the Verilog implementation doesn't seem to update the internal WZ register with DE+1, which makes me wonder if undocumented behaviour is correctly implemented, but maybe updating WZ is handled elsewhere (there are references to the WZ register in other places).

In the end, if it looks and feels like a Z80 from the outside (e.g. the right pins are active at the right time) the internal implementation doesn't matter.

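For anyone who hasn't seen a tick-based emulator, the two steps described above can be sketched roughly like this. It is a toy, not the code from either linked repository: the pin bit positions and names are made up, the real memory-read machine cycle is three T-states rather than two ticks, and WZ is not modelled at all.

```python
from dataclasses import dataclass

# Hypothetical bit positions for the control pins (illustration only).
MREQ = 0x01
RD = 0x02


@dataclass
class Cpu:
    a: int = 0
    de: int = 0
    addr_bus: int = 0
    data_bus: int = 0
    pins: int = 0
    step: int = 0


def cpu_tick(cpu):
    """Advance the LD A,(DE) payload by one step."""
    if cpu.step == 0:
        # Step 0: put DE on the address bus and request a memory read.
        cpu.addr_bus = cpu.de
        cpu.pins = MREQ | RD
    elif cpu.step == 1:
        # Step 1: latch whatever the outside world put on the data bus.
        cpu.a = cpu.data_bus
        cpu.pins = 0
    cpu.step += 1


def system_tick(cpu, memory):
    """The 'outside world': answer a memory read when MREQ and RD are both set."""
    if (cpu.pins & (MREQ | RD)) == (MREQ | RD):
        cpu.data_bus = memory[cpu.addr_bus]


memory = bytearray(0x10000)
memory[0x1234] = 0x5A
cpu = Cpu(de=0x1234)
for _ in range(2):
    cpu_tick(cpu)
    system_tick(cpu, memory)
print(hex(cpu.a))
```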
reply
rowanG077
14 days ago
[-]
Just because this is done in verilog doesn't make it emulation. It's probably just machine placed and routed. Almost everything is these days in digital design.
reply
flohofwoe
14 days ago
[-]
But look here, it's even doing a switch-case over the opcode, which is very typical for a software CPU emulator:

https://github.com/rejunity/z80-open-silicon/blob/974c7711b2...

Instruction decoding on a real Z80 CPU doesn't work at all like that :)

A non-emulator-approach would probably use the reverse engineered Z80 netlist from visual6502.org to base the design on, no idea if this is even doable with modern chip design tooling(?)

If anything, the netlist is useful to verify the Verilog implementation (as is mentioned here in the readme: https://github.com/rejunity/z80-open-silicon?tab=readme-ov-f...)

reply
codebje
14 days ago
[-]
Instruction decoding on a real Z80 CPU works pretty much like that, as it happens. There's a big PLA table that takes the IR inputs and a handful of other control signals (like "is 0xED prefixed") and lights up output control signals to say what the instruction to be executed is. See https://static.righto.com/files/z80-pla-table.html for this table laid out nicely.

Verilog isn't imperative code, to be executed one line after another in sequence. It's a description of combinatorial logic to be wired up to inputs and outputs, gated by a clock edge. Everything in the Verilog module "runs" at the same time, there's no jumping to a branch, there is instead logic to wire up one "casez" block or another to the relevant output signals. All the blocks are lit up, only one has its output selected to connect to the output wires.

The PLA block is more convenient to a hardware engineer laying out a CPU by hand. You can see everything together and trace execution easily. Downstream consequences of decode are done elsewhere. Upstream decode of control signals are done elsewhere. The Verilog is more convenient to a hardware engineer relying on tools to route logic: the Verilog does more than the PLA - it does the additional control signal inputs, and it does the downstream consequences like determining which register(s) are used on which register bus. It's laid out more like a software decode of the instruction bits because it's easier to think about groups of opcodes than individual ones.

In execution, though, they wind up doing very similar things.

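A rough way to picture the "every block is lit up, one output is selected" idea in software terms is below. This is a sketch, not the project's decoder: the table is a tiny hand-picked subset of well-known Z80 opcode groups, and the function names are made up. Python still runs the list comprehension sequentially, so it only imitates the structure of parallel match lines feeding a priority mux.

```python
def matches(opcode, pattern):
    """casez-style match: '?' is a don't-care bit, pattern is MSB first."""
    bits = format(opcode & 0xFF, "08b")
    return all(p in ("?", b) for p, b in zip(pattern, bits))


# (pattern, name) in priority order, like the arms of a casez.
# HALT comes first because 0x76 sits inside the 01?????? LD block.
DECODE_TABLE = [
    ("01110110", "HALT"),
    ("01??????", "LD r,r'"),
    ("00???110", "LD r,n"),     # includes the (HL) encoding
    ("10??????", "ALU A,r"),
]


def decode(opcode):
    # "Hardware" view: every arm's match line is evaluated for every opcode...
    match_lines = [matches(opcode, pattern) for pattern, _ in DECODE_TABLE]
    # ...and a priority mux picks the first asserted line.
    for hit, (_, name) in zip(match_lines, DECODE_TABLE):
        if hit:
            return name
    return "other"


for op in (0x76, 0x78, 0x3E, 0x80, 0x00):
    print(f"{op:02X}: {decode(op)}")
```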
reply
flohofwoe
14 days ago
[-]
Thanks a lot for the thorough explanation, much appreciated!
reply
flaghacker
14 days ago
[-]
That switch-case gets optimized and compiled down to logic gates by the synthesis tools. It'll be a different set of gates from the original netlist (which might also have used a more regular grid structure for this), but it won't be _that_ different. It's not somehow running this switch-case in software emulation on a different CPU instantiated in this design.
reply
rowanG077
14 days ago
[-]
So what? A re-implementation of a CPU doesn't require the netlist to be equal. That would mean just moving to a new process node or tooling suddenly means your brand new CPU is "software emulating" the old one just because it might do some things slightly differently. A frankly ridiculous proposition.
reply
Vogtinator
14 days ago
[-]
It only looks like that if you treat it as a C-like programming language, which it is not. Unless specified otherwise, all the statements are "executed" in parallel by the synthesized logic. There is no emulation.
reply
userbinator
14 days ago
[-]
I wonder how compatible it is with the original Z80, which had many undocumented instructions as well as the infamous "trap gates" (look at the "Oral History Panel on the Founding of the Company and the Development of the Z80 Microprocessor" document linked on that page) that might have had an effect on certain obscure instruction sequences and were designed to identify the difference between it and clones.
reply
kumarski
15 days ago
[-]
Looks dope.

(I was on early efabless.com team) open source EDA.

reply
tcbawo
15 days ago
[-]
I had heard about Z80’s 4-bit ALU (2x for 8-bit math). Is this considered a major bottleneck? Were there later extensions that added higher bit integer math? I’m curious whether an open source version of the chip will enable new features and variants.
reply
flohofwoe
15 days ago
[-]
> Is this considered a major bottleneck?

No, because an ALU instruction with a register as source is already running as fast as possible (at 4 clock cycles, which is the duration of an opcode fetch 'machine cycle'). Or from a different perspective: an 8-bit ALU wouldn't have made math instructions faster, but would have cost twice as many transistors.

The 4-bit ALU is just an internal implementation detail that isn't visible to the outside (except maybe through the existence of the half-carry flag which indicated a carry from the lower into the higher nibble).

And if you want a CPU replacement that plugs directly into old home computers, the CPU needs to have the original instruction timing, otherwise software that depends on 'cycle counting' won't work (probably less of an issue on the ZX Spectrum though because the Speccy didn't have programmable video hardware like, for instance, the Amstrad CPC).

The eZ80 is a modernised and more efficient design, with (among other things) a wider ALU: https://en.wikipedia.org/wiki/Zilog_eZ80. Not an option for keeping old home computers alive though, for this you'd want an exact Z80 clone with the original timings and undocumented behaviour.

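To illustrate why the nibble-wide ALU is invisible apart from the half-carry flag, here is a small sketch of an 8-bit add done as two 4-bit passes. The function names are made up, and it models only the arithmetic, not the real chip's timing or its other flags.

```python
def add4(a, b, carry_in):
    """4-bit adder: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4


def add8_via_nibbles(a, b):
    """8-bit add in two 4-bit passes, low nibble first.

    Returns (result, half_carry, carry). The carry out of the first pass
    is exactly what the half-carry flag exposes.
    """
    lo, half_carry = add4(a, b, 0)
    hi, carry = add4(a >> 4, b >> 4, half_carry)
    return (hi << 4) | lo, half_carry, carry


print(add8_via_nibbles(0x0F, 0x01))
print(add8_via_nibbles(0xFF, 0x01))
```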
reply
becurious
14 days ago
[-]
Cycle counting was key on the Spectrum - for obvious things like the tape load routines but also for advanced techniques like the ‘Rainbow processor’ - updating the attribute bytes (those responsible for the infamous color clash) as each scan line progressed you could get different colors on each scan line.
reply
RetroTechie
14 days ago
[-]
Once made a tape-loading like pattern, and tried to get it as stable (not moving up or down on screen) as possible.

Managed to produce a program where, with key presses, you could change the delay in the loop in +/- 1 clock cycle increments. Mind you: the fastest Z80 opcodes take 4 cycles.

How then? Well, there's also opcodes that take 5 cycles. Or 6. Or 7. And 8=2*4, 9=4+5, etc. Program just automated the insertion/removal of those in the inner loop. Of course I had to pick instructions that didn't mess with some Z80 registers.

Great fun (& educational) figuring out stuff like that. Fun times...

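The trick generalises: with instruction lengths of 4, 5, 6 and 7 cycles available, any delay of 4 cycles or more can be built. A minimal sketch of that bookkeeping follows (the function name is made up, and it deliberately says nothing about which opcodes to use, since those have to be picked so they don't disturb registers you care about):

```python
def delay_plan(cycles):
    """Split a delay into instruction lengths drawn from {4, 5, 6, 7} cycles."""
    if cycles < 4:
        raise ValueError("the shortest Z80 instruction takes 4 cycles")
    fours, extra = divmod(cycles, 4)
    # Use 4-cycle instructions everywhere except the last slot,
    # which absorbs the remainder (so it is 4, 5, 6 or 7 cycles long).
    return [4] * (fours - 1) + [4 + extra]


for n in (8, 9, 10, 11, 12, 13):
    print(n, delay_plan(n))
```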
reply
becurious
12 days ago
[-]
There was some game (and I think a program in a book) where the border color would be changed at a specific scan line to get a horizon that would span the entire screen.

I pretty much knew all the clock cycle counts for the instructions as a teenager, and you would code assembler with them always in mind.

reply
flohofwoe
14 days ago
[-]
> ...updating the attribute bytes (those responsible for the infamous color clash) as each scan line progressed...

Ah clever! Didn't think of that. Probably the closest thing to "racing the beam" since the Atari 2600 :)

reply
userbinator
14 days ago
[-]
The Netburst P4s also used a half-wide (16-bit) ALU running at 2x the clock frequency (actually by clocking on both edges, like DDR RAM), which meant ALU operations with a carry/borrow between the two halves took an extra cycle: https://www.realworldtech.com/isscc-2001/7/
reply
Retr0id
15 days ago
[-]
Does anyone know what clock speeds we might be able to expect from this?
reply
drmpeg
15 days ago
[-]
reply
swetland
15 days ago
[-]
That's the expected clock rate for the TT07 run... but Tiny Tapeout designs only have 8 in, 8 out, and 8 bidirectional IOs (plus a reset and clock input) available, so they're using a multiplexing strategy where the Z80 clock runs at 1/4 of the base clock rate and alternates between control signals, A0-A7, control signals, and A8-A15 on the OUT pins:

https://github.com/rejunity/z80-open-silicon/blob/68438f0019...

So you'd get an effective 12.5MHz Z80 clock and need a bit of external logic to demultiplex the full IO interface. Still not too shabby!

The goal (per the project README) appears to be to prototype with TT07 and then look into taping out standalone with ChipIgnite in QFN44 and DIP40 packages (which would be able to have the full traditional Z80 bus interface and run at the full clock rate).

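The external demultiplexing amounts to something like the sketch below. The phase order (control, A0-A7, control, A8-A15) is the one described above; everything else is an assumption for illustration, including the function names and the 50 MHz base clock, which is simply 4 x 12.5 MHz.

```python
def demux_out_pins(samples):
    """Rebuild the bus state from four consecutive 8-bit samples of the OUT pins.

    Assumed phase order: control, A0-A7, control, A8-A15.
    """
    ctrl_first, addr_lo, ctrl_second, addr_hi = samples
    return {
        "address": ((addr_hi & 0xFF) << 8) | (addr_lo & 0xFF),
        "control": (ctrl_first, ctrl_second),
    }


def effective_clock_hz(base_clock_hz, phases=4):
    """The Z80 core advances once per full round of multiplexing phases."""
    return base_clock_hz / phases


bus = demux_out_pins([0x01, 0x34, 0x01, 0x12])
print(hex(bus["address"]))
print(effective_clock_hz(50e6) / 1e6, "MHz")
```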
reply
tyingq
15 days ago
[-]
Interesting. Saw this on the Wishbone Z80 project notes:

"Guy Hutchison (see TV80 project) has synthesized an early version of the core in a 130nm TSMC process. He determined the design to contain about 20k gates and run at about 240 Mhz. While the speed is somewhat less than "target", optimizations of the logic should increase this somewhat."

Guy Hutchison's TV80 is also mentioned on this project's page.

reply
Dwedit
14 days ago
[-]
If you were designing a new compatible processor for older systems, the limiting factor would be the memory bus. A cache would be necessary to get high speeds.

The cache would need to know about all bank-switching performed by the system, and understand how the memory banks are mapped into the memory space.

Could have:

* Plain read-only memory (you cache this)

* Plain RAM not shared with other devices (you cache this)

* Memory-mapped IO (you don't cache this)

* RAM shared with other devices where the other device does not write there, such as video memory (write-through cache, full read cache)

* RAM shared with other devices where the other device can write there (don't cache this)

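As a sketch, the list above boils down to a lookup from region type to cache policy. The names and the example memory map are invented for illustration, and bank switching (which would swap the map at runtime) is not modelled.

```python
from enum import Enum, auto


class Region(Enum):
    ROM = auto()
    PRIVATE_RAM = auto()
    MMIO = auto()
    SHARED_DEVICE_READS = auto()    # e.g. video memory the device only reads
    SHARED_DEVICE_WRITES = auto()   # the other device can write here


class Policy(Enum):
    CACHE = auto()
    WRITE_THROUGH = auto()          # writes go to the bus, reads are cached
    NO_CACHE = auto()


POLICY_FOR_REGION = {
    Region.ROM: Policy.CACHE,
    Region.PRIVATE_RAM: Policy.CACHE,
    Region.MMIO: Policy.NO_CACHE,
    Region.SHARED_DEVICE_READS: Policy.WRITE_THROUGH,
    Region.SHARED_DEVICE_WRITES: Policy.NO_CACHE,
}

# Hypothetical memory map: (first address, last address, region type).
EXAMPLE_MAP = [
    (0x0000, 0x3FFF, Region.ROM),
    (0x4000, 0x5AFF, Region.SHARED_DEVICE_READS),
    (0x5B00, 0xFEFF, Region.PRIVATE_RAM),
    (0xFF00, 0xFFFF, Region.MMIO),
]


def policy_for(address, memory_map):
    """Look up the cache policy for an address; unknown areas are not cached."""
    for first, last, region in memory_map:
        if first <= address <= last:
            return POLICY_FOR_REGION[region]
    return Policy.NO_CACHE


print(policy_for(0x0100, EXAMPLE_MAP).name)
print(policy_for(0x4000, EXAMPLE_MAP).name)
print(policy_for(0xFF10, EXAMPLE_MAP).name)
```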
reply
Retr0id
14 days ago
[-]
IIRC this is what the people making swap-in accelerators for 6502 chess computers are doing
reply
Someone
14 days ago
[-]
Because it’s easy to ‘cache’ the entire memory of the host system, it’s better described as a new computer that only slows down to access the memory locations that affect I/O (video, audio, keyboard, I/O ports, etc.)

https://www.e-basteln.de/computing/65f02/65f02/:

“The idea is to use this as a “universal” accelerator for 6502 and 65C02-based host computers – just plug it into the CPU socket. The only thing the FPGA board needs to know about its host is the memory map: Where does the host have memory-mapped I/O? Up to 16 different memory maps can be stored in the FPGA, and selected via a mini DIP switch. Upon power-on, the 65F02 grabs the complete RAM and ROM content from the host and copies it into the on-chip RAM, except for the I/O area. Then the CPU gets going, using the internal memory at 100 MHz for all bus accesses except for any I/O addresses – for these, the internal CPU pauses, and an external bus cycle is started at whatever the external clock speed is.”

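Very roughly, the access routing described in that quote looks like the sketch below. It is a toy model and not the 65F02's actual logic: the class and parameter names are invented, the 1 MHz host clock and the I/O range are placeholder assumptions, and it assumes one bus access per clock on either side.

```python
class Accelerator:
    """Toy model: internal shadow memory for everything except I/O addresses."""

    def __init__(self, host_memory, io_ranges, internal_hz=100e6, external_hz=1e6):
        self.host = host_memory
        self.io_ranges = io_ranges
        # Power-on copy of the host's RAM/ROM. The I/O area gets copied too
        # here for simplicity, but it is never read from the shadow copy.
        self.shadow = bytearray(host_memory)
        self.internal_ns = 1e9 / internal_hz
        self.external_ns = 1e9 / external_hz
        self.elapsed_ns = 0.0

    def is_io(self, address):
        return any(first <= address <= last for first, last in self.io_ranges)

    def read(self, address):
        if self.is_io(address):
            # Pause the fast core and run a real bus cycle at host speed.
            self.elapsed_ns += self.external_ns
            return self.host[address]
        self.elapsed_ns += self.internal_ns
        return self.shadow[address]


host = bytearray(0x10000)
host[0x0200] = 0x42
host[0xC000] = 0x99
acc = Accelerator(host, io_ranges=[(0xC000, 0xC0FF)])
first = acc.read(0x0200)    # served from internal memory
second = acc.read(0xC000)   # goes out to the host bus
print(first, second, acc.elapsed_ns)
```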
reply
Dwedit
13 days ago
[-]
Speaking of new computers slowing down to access IO, someone managed to add an ISA slot to new PCs by using pin headers normally used by TPM devices. It's called the "dISAppointment".
reply
mindcrime
14 days ago
[-]
So, um... you're saying I should not have dumped my life's savings into ordering Z80 chips as part of the "last time buy"? :-(
reply
phendrenad2
15 days ago
[-]
I wonder if this will ever be priced competitively with the massive number of used or NOS Z80 chips out in the wild.
reply
ein0p
15 days ago
[-]
eBay says a Chinese Z80 clone is less than $4 with free shipping. This isn’t even going to be competitive with lower end FPGAs. It’s more of a fun “why not” type of project.
reply
londons_explore
14 days ago
[-]
How do those clones work? Are they asics, or microcontrollers/fpga's which do emulation?
reply
tredre3
13 days ago
[-]
They're ASICs. Likely a direct copy of Zilog's design but I'm sure re-implementations also exist.

I've never received a counterfeit Z80 that outright didn't work, but one thing to keep in mind is that despite often being branded as CMOS 20 MHz (aka Z84C0020), half the time the chips I've received were lower grade or, even worse, NMOS.

reply