32 bits that changed microprocessor design
136 points | 13 days ago | 11 comments | spectrum.ieee.org
zik
13 days ago
[-]
The Bellmac-32 was pretty amazing for its time - yet I note that the article fails to mention the immense debt that it owes to the VAX-11/780 architecture, which preceded it by three years.

The VAX was a 32-bit CPU with a two-stage pipeline which introduced modern demand-paged virtual memory. It was also the dominant platform for C and Unix by the time the Bellmac-32 was released.

The Bellmac-32 was a 32-bit CPU with a two-stage pipeline and demand-paged virtual memory very like the VAX's, and it ran C and Unix. It's no mystery where it was getting a lot of its inspiration. I think the article makes these features sound more original than they were.

Where the Bellmac-32 was impressive was in its success at implementing the latest features in CMOS, while the VAX was languishing in the supermini world of discrete logic. Ultimately the Bellmac-32 was a step in the right direction, and the VAX line ended up adopting LSI too slowly and became obsolete.

reply
rst
13 days ago
[-]
You might want to be more specific about what you mean by "modern", because there were certainly machines with demand-paged virtual memory before the VAX. It was introduced on the Manchester Atlas in 1962; manufacturers that shipped the feature included IBM (on the 360/67 and all but the earliest machines in the 370 line), Honeywell (6180), and, well... DEC (later PDP-10 models, preceding the VAX).
reply
PaulHoule
12 days ago
[-]
My impression of the VAX is, regardless of whether it was absolutely first at anything, it was early to have 32-bit addresses, 32-bit registers and virtual memory as we know it. You could say machines like 68k, the 80386, SPARC, ARM and such all derived from it.

There were just a lot of them. My high school had a VAX-11/730, a small machine you don't hear much about today. It replaced the PDP-8 the school had when I was in elementary school, back when I would visit to use that machine. Using the VAX was a lot like using a Unix machine, although the OS was VMS.

In southern NH in the late 1970s through mid 1980s I saw tons of DEC minicomputers, not least because Digital was based in Massachusetts next door and was selling lots to the education market. I probably saw 10 DECs for every IBM, Prime or other mini or micro.

reply
rst
11 days ago
[-]
In all those respects, the VAX was just following on to the IBM 360/67 and its S/370 successors -- they all had a register file of 32-bit general purpose registers which could be used to index byte-addressed virtual memory. It wasn't exactly an IBM knockoff -- there were a bunch of those, too (e.g., Amdahl's) -- but the influence is extremely clear.
reply
mjevans
13 days ago
[-]
Period might be the best word. Contemporary is another contender, the one I thought of first, before disqualifying it for implying 'modern'.
reply
pinewurst
13 days ago
[-]
Also Prime, in the 70s, pre-VAX.
reply
TheOtherHobbes
12 days ago
[-]
The article says the Bellmac-32 was single-cycle CISC. The VAX was very CISC and very definitely not single cycle.

It would have been good to know more about why the chip failed. There's a mention of NCR, who had their own NCR/32 chips, which leaned more toward emulating the System/370. So perhaps it was orders from management, and not so much a technical failure.

reply
kimi
12 days ago
[-]
I don't think it was single-cycle; someone mentions a STRCPY instruction that would be quite hard to do in a single cycle...
reply
Tuna-Fish
11 days ago
[-]
Single-cycle doesn't mean that everything is single cycle, but that the simple basic instructions are. As a rule of thumb, if you can add two registers together in a single cycle, it's a single-cycle architecture.
reply
larsbrinkhoff
12 days ago
[-]
> introduced modern demand paged virtual memory

Didn't Multics, Project Genie, and TENEX have demand paging long before the VAX?

reply
zik
4 hours ago
[-]
I should have said "supermini". While mainframes had tried a variety of virtual memory schemes, the VAX was the first supermini to adopt the style of demand-paged, flat-address-space virtual memory that pretty much set the style for all CPUs since then. A lot of VAX features, such as the protection rings, were copied to the 80386 and its successors.
reply
vintermann
13 days ago
[-]
There was also the Nord-5, which beat the VAX by another couple of years as a 32-bit minicomputer.
reply
Instantix
12 days ago
[-]
Yeah, 1972 - "Nord-5 was Norsk Data's first 32-bit machine and was claimed to be the first 32-bit minicomputer". The Wikipedia record: https://en.wikipedia.org/wiki/Nord-5
reply
pinewurst
12 days ago
[-]
Also Interdata with the 7/32 and 8/32.
reply
macshome
13 days ago
[-]
The more that we find out about Bell Labs the more we all realize how much of our world they built.

We really could use a place like that today.

reply
em3rgent0rdr
13 days ago
[-]
There is a place like that today. It is called "Nokia Bell Labs".
reply
justin66
12 days ago
[-]
This comment was illuminating. It would indeed have been more accurate to say "we could really use people like that today." Where they're located is a secondary concern.
reply
jll29
12 days ago
[-]
This is a valid point.

Are ground-breaking innovations still regularly coming out of the same lab today, whatever its owner or name (and if so, which ones, say in the last decade)?

reply
rollcat
12 days ago
[-]
It's more difficult to innovate today on a similar scale, or with similar impact. It also seems that big budgets don't really help.

Graphene chips are an insanely exciting (hypothetical) technology. A friend of mine predicted in 2010 that these chips would dominate the market in 5 years' time. As of 2025 we can barely make the semiconductors.

Apple makes chips that have both excellent performance per watt, and overall great performance, but they make small generational jumps.

On the other hand, startups, or otherwise small-but-brilliant teams, can still produce cool new stuff. The KDE team built KHTML, which was later forked into WebKit by three guys at Apple.

Paxos was founded on theoretical work of three guys.

Brin & Page made Google. In the era of big & expensive iron, the innovation was to use cheap, generic, but distributed compute power, and compensate for hardware failures in software. This resulted in cheaper, more reliable, and more performant solutions.

But yeah, most of the "moonshot factories" just failed to deliver anything interesting. Maybe you need constraints, market pressure?

reply
dimator
12 days ago
[-]
> But yeah, most of the "moonshot factories" just failed to deliver anything interesting. Maybe you need constraints, market pressure?

It's funny you say that, because to me it seems like Bell Labs was the exact opposite. Because of the antitrust ruling there was a cap on profits, so a lot of money was instead funneled into greenfield R&D. The facility was run by people who knew how to manage a large group of capable people: put them in close proximity with other cross-discipline stars, get out of their way, let their imaginations dictate the path.

reply
rollcat
11 days ago
[-]
I can't quite put my finger on these changes in the landscape. Shuji Nakamura invented blue LEDs in his garage. Meanwhile, graphene. I just don't think there's a universal rule.
reply
Findecanor
12 days ago
[-]
I read through its instruction set manual and found an instruction with unusual behaviour: its `ASL` (arithmetic shift left) instruction.

It shifts all bits except for the sign bit, leaving it unchanged.

I have read many ISAs' manuals and not seen this elsewhere. Most ISAs don't have separate arithmetic and logical left shift instructions. On M68K, which does, the difference between `ASL` and `LSL` is only that the former sets the Overflow flag if any of the bits shifted out is different from the resulting sign bit, whereas the latter clears it.
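
For illustration, here's a rough C sketch of the behaviour described above (an assumption modelling what the manual says, not actual Bellmac-32 code): the sign bit stays put while the remaining 31 bits shift left.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical model of the Bellmac-32-style ASL described above:
     * every bit except the sign bit shifts left; the sign bit is kept. */
    static int32_t asl_keep_sign(int32_t x, unsigned n)
    {
        uint32_t u    = (uint32_t)x;
        uint32_t sign = u & 0x80000000u;          /* preserve the sign bit   */
        uint32_t rest = (u << n) & 0x7FFFFFFFu;   /* shift the low 31 bits   */
        return (int32_t)(sign | rest);
    }

    int main(void)
    {
        printf("%08x\n", (unsigned)asl_keep_sign((int32_t)0xC0000001, 1)); /* 80000002 */
        printf("%08x\n", (unsigned)asl_keep_sign(0x40000001, 1));          /* 00000002 */
        return 0;
    }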

reply
RetroTechie
12 days ago
[-]
Z80 has a Shift Right Arithmetic (SRA) instruction. From Zilog Z80 User Manual:

"An arithmetic shift right 1 bit position is performed on the contents of operand m. The contents of bit 0 are copied to the Carry flag and the previous contents of bit 7 remain unchanged. Bit 0 is the least-significant bit."

So if the value is used as a signed integer, that's a sign-preserving /2 (and easy to extend to 16 or more bits).

Z80 also has a SLA, which does shift all bits.

reply
roelschroeven
12 days ago
[-]
For right shifts, yes, instruction sets commonly make the distinction between arithmetic and logical.

Findecanor was talking about left shifts, though.

reply
Joker_vD
12 days ago
[-]
> So if value was used as signed integer, that's a sign-preserving /2

Only if you define your integer division as rounding towards minus infinity, which almost no languages do (they usually round towards zero). See e.g. [0] for the additional instructions needed to correct the result.

Now, I personally think this is a mistake and my PLs always round integers down instead of towards zero, but others may disagree.

[0] https://godbolt.org/z/1z7eYPT48
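
To make the difference concrete, a small C example (assuming the usual arithmetic-shift behaviour for signed right shifts, which is implementation-defined in C but near-universal in practice):

    #include <stdio.h>

    int main(void)
    {
        int x = -7;
        printf("-7 / 2  = %d\n", x / 2);   /* -3: C division truncates toward zero    */
        printf("-7 >> 1 = %d\n", x >> 1);  /* -4: arithmetic shift rounds toward -inf */
        return 0;
    }

So a compiler that naively lowered a signed division by 2 to a bare arithmetic shift would get the negative odd cases wrong; the extra instructions in [0] are the fix-up.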

reply
Findecanor
12 days ago
[-]
It's a bit inconsistent that most ISAs out there have division that rounds towards zero but only a right shift that rounds down.

PowerPC is the only production ISA I've found that has an arithmetic right-shift instruction designed for rounding towards zero. It sets the Carry flag if a 1 is shifted out and the result is negative. Then only an "add with carry" instruction is needed to adjust the result.
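
In C the fix-up that the shift/add-with-carry pair (srawi/addze on PowerPC, if I recall the mnemonics right) performs would look roughly like this sketch, again assuming arithmetic right shifts for signed values:

    #include <stdint.h>
    #include <stdio.h>

    /* Floor-to-truncation fix-up: an arithmetic shift gives floor(x / 2^n);
     * add 1 back when x is negative and any 1 bits were shifted out.
     * On PowerPC (as described above) the shift records that condition in
     * the Carry flag and the add-with-carry folds it back in. */
    static int32_t shift_div_trunc(int32_t x, unsigned n)
    {
        int32_t floored = x >> n;                          /* rounds toward -inf */
        int     lost    = (x & ((INT32_C(1) << n) - 1)) != 0;
        return floored + (x < 0 && lost);                  /* correct negatives  */
    }

    int main(void)
    {
        printf("%d\n", shift_div_trunc(-7, 1));  /* -3, same as -7 / 2 in C */
        printf("%d\n", shift_div_trunc( 7, 1));  /*  3                      */
        return 0;
    }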

reply
Joker_vD
12 days ago
[-]
From what I can tell, it is literally a holdover from the FORTRAN days, when IBM machines used sign-magnitude integers, so truncating division was the obvious choice for hardware dividers; and then nobody dared to change the semantics of either the hardware or the programming languages for some reason, and it carried all the way forward to the present day.

I am fairly certain that restoring/non-restoring unsigned binary division algorithms can be made to do signed division that rounds down, with a minimal change that is not "divide the absolute values and then fix the signs"; and for the algorithms used in high-speed division hardware the choice of rounding doesn't really matter.

reply
jandrese
13 days ago
[-]
The article handwaves over why the chip wasn't a success, which makes my first thought of "how much did each chip cost" all the more relevant. This is such an uplifting story until you think about how the 8086 is just about to wipe it off the map.
reply
adrian_b
12 days ago
[-]
During their early years of success, Intel made very few innovations. One of their few innovative products was the 8087 floating-point coprocessor, which completely changed how everybody does floating-point computation, and for which they were smart enough to hire outside expertise (i.e. William Kahan), who contributed most of the innovative features.

On the other hand, during those years Intel was extremely good at very quickly adopting any important innovation made by a competitor, while also managing to obtain better manufacturing yields, so they were able to make greater profits even with cheaper products.

The Bellmac-32 was not important commercially, but without it a product like the Intel 80386 would have appeared only some years later.

With the 80386, Intel switched its CPU production from NMOS to CMOS, as Motorola had done a year earlier with the 68020. Both Intel and Motorola drew heavily on the experience the industry had gained with the Bellmac-32.

reply
marcosdumay
12 days ago
[-]
> while also succeeding to obtain better manufacturing yields

AKA, Intel was extremely innovative in manufacturing. Turns out that because of Moore's law, that was the only dimension that mattered.

reply
adrian_b
12 days ago
[-]
I agree.
reply
dboreham
13 days ago
[-]
AT&T/Western Electric didn't really sell chips. They were a systems company. I actually used a machine with their VME boards (which iirc the company I worked for had been contracted to manufacture and market). Even to us the idea that Western Electric made CPUs seemed surprising. It was cool to be running System V on a desktop machine back then though.
reply
larsbrinkhoff
12 days ago
[-]
The Motorola 68000 was already available before the Bellmac-32. The 68000 programming model was mostly 32-bit, although internally it was limited to 16-bit data and 24-bit addresses. This was fixed in later models in the family.
reply
TMWNN
12 days ago
[-]
> The article handwaves over why the chip wasn't a success, which makes my first thought of "how much did each chip cost" all the more relevant.

That's being polite. Attributing the chip's failure to AT&T buying NCR is ridiculous; that happened in 1991.

Here's a rundown of what actually happened:

* After the divestiture, AT&T from 1984 is finally allowed to build and sell computers. (This is also why Unix was not a commercial product from AT&T until then.) Everyone, in and outside AT&T, thinks Ma Bell is immediately going to be an IBM-level player, armed with Bell Labs research and Western Electric engineering. One of many, many such articles that conveys what everyone then expects/anticipates/fears: <https://archive.org/details/microsystems_84_06/page/n121/mod...> If there is anyone that can turn Unix into the robust mainstream operating system (a real market opportunity, given that IBM is still playing with the toy DOS, and DEC and other minicomputer companies are still in denial about the PC's potential), it's AT&T.

* AT&T immediately rolls out a series of superminicomputers (the 3B series) based on existing products Western Digital has made for years for AT&T use (and using the Bellmac CPU) and, at the lower end, the 6300 (Olivetti-built PC clone) and UNIX PC (Convergent-built Unix workstation). All are gigantic duds because, despite superb engineering and field-tested products, AT&T has never had to compete with anyone to sell anything before.

* After further fumbling, AT&T buys NCR to jumpstart itself into the industry. It gives up five years later and NCR becomes independent again.

* The end.

>This is such an uplifting story until you think about how the 8086 is just about to wipe it off of the map.

People today have this idea that Intel was this dominant semiconductor company in the 1980s, and that's why IBM chose it as the CPU supplier for the PC. Not at all. Intel was then no more than one of many competing vendors, with nothing in particular differentiating it from Motorola, Zilog, MOS, Western Digital, Fairchild, etc.

The 8088's chief virtue was that it was readily available at a reasonable price; had the PC launched a little later, IBM probably would have gone with the 68000, which Intel engineers agreed with everyone else was far superior to the 8086/8088 and 80286. Binary compatibility with them was not even in the initial plan for the 80386, so loathed by everyone (including, again, Intel's own people) was their segmented memory model (and things like the broken A20 line); only during its design, as the PC installed base grew like crazy, did Intel realize that customers wanted to keep running their software. That's why the 80386 supports both segmented memory (for backward compatibility, including virtual 8086 mode) and a flat model. And that flat memory model wasn't put in for OS/2, or Windows NT; it was put in for Unix.

reply
crb3
12 days ago
[-]
> The 8088's chief virtue was that it was readily available at a reasonable price;

That, and it had a compatible suite of peripheral chips, while the M68K didn't... Something I vaguely recall an Intel FAE gloating about soon after: "And we're going to keep it that way."

reply
pinewurst
12 days ago
[-]
I think you mean ‘Western Electric’ rather than ‘Western Digital’.
reply
TMWNN
12 days ago
[-]
I meant Western Design Center; Western Electric, as I alluded to, wasn't producing CPUs for anyone outside AT&T c. 1980. But WDC isn't appropriate as an Intel peer, either, because back then it was just two people in a kitchen in Arizona (and still was, when Apple went to it for the 65816).

Better would have been AMD, Nat Semi, or TI.

reply
willmarquis
13 days ago
[-]
Bellmac-32 went 32-bit CMOS when everyone else was still twiddling 8-bit NMOS, then got shelved before the afterparty. IEEE giving it a milestone in 2025 is basically a lifetime achievement trophy for the domino-logic DNA inside every phone SoC today. Late, but deserved.
reply
rkagerer
12 days ago
[-]
> With no CAD tools available for full-chip verification ... the team resorted to printing oversize Calcomp plots. The schematics showed how the transistors, circuit lines, and interconnects should be arranged inside the chip to provide the desired outputs. The team assembled them on the floor with adhesive tape to create a massive square map more than 6 meters on a side. Kang and his colleagues traced every circuit by hand with colored pencils, searching for breaks, overlaps, or mishandled interconnects.
reply
zahlman
12 days ago
[-]
> Why Bellmac-32 didn’t go mainstream

This is the main thing I wanted to know; it's the last section heading in the article, yet it isn't explained by the remaining text. AT&T choosing someone different is a lame excuse (others could have bought in, the way Apple got ideas from Xerox PARC), and the rest is padded out with a restatement of how the Bellmac-32's ideas shaped future chip development.

reply
didgetmaster
12 days ago
[-]
My very first computer was an AT&T 6300 that I bought in 1986. It came with an Intel 8086 processor.
reply
joezydeco
13 days ago
[-]
If you were a CS student at UIUC in the late 80s your sophomore weed-out class in C and assembly language coding was on this processor. It was a lot more fun to write for this core compared to Intel.

And it was the only processor I ever used that had a STRCPY opcode.

reply
zackmorris
12 days ago
[-]
I was at UIUC from 1995-99 for ECE and studied Computer Organization and Design 2nd edition by Patterson and Hennessy:

https://openlibrary.org/books/OL670149M/Computer_Organizatio...

It mainly covered MIPS but most of the concepts were about as minimal as possible. As in, it would be hard to beat the amount of computation per number of pipeline stages.

Back then, 4 stages was considered pretty ideal for branch prediction, since misses weren't too expensive. I believe the PowerPC had 4 pipeline stages, but Pentium got up to 20-30 and went to (IMHO) somewhat pathological lengths to make that work, with too much microcode, branch prediction logic and cache.

Unfortunately that trend continued and most chips today are so overengineered that they'd be almost unrecognizable to 90s designers. The downside of that being that per-thread performance is only maybe 3 times higher today than 30 years ago, but transistor counts have gone from ~1 million to ~100 billion, so CPUs are about 100,000 times or 5 orders of magnitude less performant than might be expected by Moore's law at a 100x speed increase per decade. Bus speeds went from say 33-66 MHz to 2-4 GHz which is great, but memory was widely considered far underpowered back then. It could have ramped up faster but that wasn't a priority for business software, so gaming and video cards had to lead the way like usual.

I always dreamed of making an under $1000 MIPS CPU with perhaps 256-1024 cores running at 1 GHz with local memories and automatic caching using content-addressable hash trees or something similar instead of associativity. That way it could run distributed on the web. A computer like this could scale to millions or billions of cores effortlessly and be programmed with ordinary languages like Erlang/Go or GNU Octave/MATLAB instead of needing proprietary/esoteric languages like OpenCL, CUDA, shaders, etc. More like MIMD and transputers, but those routes were abandoned decades ago.

Basically any kid could build an AI by modeling a neuron with the power of ENIAC and running it at scale with a simple genetic algorithm to evolve the neural network topology. I wanted to win the internet lottery and do that, but the Dot Bomb, wealth inequality, politics, etc conspired to put us on this alternate timeline where it feels like the Enterprise C just flew out of a temporal anomaly. And instead of battling Klingons, groupthink has us battling the Borg.

reply
rasz
13 days ago
[-]
> STRCPY opcode.

What's wrong with rep movsb?

reply
saagarjha
12 days ago
[-]
You'd need to run strlen first.
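
Since rep movsb is a counted block copy rather than a scan for the terminator, a strcpy built on it boils down to something like this C sketch (find the length first, then do the counted copy including the NUL):

    #include <string.h>

    /* Sketch: strcpy expressed as "find the length, then do a counted
     * copy" -- which is what building it on rep movsb amounts to. */
    char *strcpy_via_counted_copy(char *dst, const char *src)
    {
        size_t n = strlen(src);     /* scan for the terminator first   */
        memcpy(dst, src, n + 1);    /* counted copy, including the NUL */
        return dst;
    }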
reply
cesarb
12 days ago
[-]
> You'd need to run strlen first.

I might be a bit rusty in my x86 assembly, but wouldn't "repnz movsb" be the x86 strcpy opcode, for zero-terminated strings?

reply
saagarjha
11 days ago
[-]
Not a thing, unfortunately; you can only use the plain rep prefix with movsb.
reply
muziq
12 days ago
[-]
movsb doesn’t set any flags, so no, it isn’t..
reply
joezydeco
12 days ago
[-]
Real men just blast through memory until hitting a zero. If it's there at all.
reply
rasz
12 days ago
[-]
If you care at least a little about buffer overflows, you will run one anyway.
reply
saagarjha
11 days ago
[-]
If you want a strlen+memcpy you are more than welcome to call it yourself
reply
kazinator
12 days ago
[-]
The Wikipedia page says "[the Bellmac 32] was designed with the C programming language in mind".

This is the real first, more significant than anything else. Hardware design catering to the C language started here.

reply
zombot
10 days ago
[-]
> “Ma Bell” had dominated American voice communications, with its Western Electric subsidiary manufacturing nearly every telephone found in U.S. homes and offices.

When others do it, they call it communism. When they do it themselves, they call it good business.

reply