Yet the article goes about the most ass backward way of explaining 8086 segments and constructs a convoluted mental picture of dividing memory into overlapping chunks.
It's really, really simple: segments on the 8086/88 are 64k sliding windows into a 1M address space. You can move them around at 16-byte granularity.
You need more than 64k for code + data? No problem, the CPU knows when it's fetching an instruction vs when it's fetching data, you can have two sliding windows: code (CS) and data (DS). Split them apart, and it's not much different than a Harvard-style machine and gives you access to more than 64k at a time.
Still need more? No problem, the CPU has a hardware stack with dedicated push/pop/call/ret instructions and a base pointer for stack indexing. It knows when it's accessing the stack, so we can split the data window into regular data (DS) and stack data (SS). Oh, you occasionally want to copy stuff between segments or somewhere else in memory? Well, to encode 3 segments we need 2 bits anyway, let's throw in an extra data window (ES) and some DS-to-ES copy instructions.
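To put the sliding-window picture in concrete terms, here is a minimal C sketch of how a real-mode segment:offset pair turns into a 20-bit address. The function name `linear_address` is made up for illustration; the shift-by-4 and the wrap at 1 MB are the 8086's documented behaviour.

```c
#include <stdint.h>

/* Real-mode 8086 address formation: the 16-bit segment value is shifted
 * left by 4 bits (multiplied by 16) and the 16-bit offset is added.
 * The 8086 has only 20 address lines, so the sum wraps at 1 MB.
 * linear_address is an illustrative name, not part of any real API. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    uint32_t sum = ((uint32_t)segment << 4) + offset;
    return sum & 0xFFFFFu;
}
```

Bumping the segment value by 1 slides the whole 64k window up by 16 bytes, which is all the "16-byte granularity" amounts to.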
Backward compatibility was a breath of fresh air at a time when code needed constant porting and rewriting. No two machines were alike.
It's one of the reasons the PC became so popular.
By having no problem I mean we know enough about writing an optimizer to write such a thing. I don't think any compiler does, just that they could.
Most people wrote assembler, particularly if they wanted to use more than 64k.
Oddly enough, most non-assembly programs were interpreted, and most of those interpreters were themselves limited to 64k.
In that era a machine with "object addressing" sounded like a perfectly valid futuristic design (what a Lisp machine strove to be; I guess today you would call it tagged memory of some kind). The 8086 is not that, but the original design would have allowed it to evolve into something like that.
The article's point is that since programmers simply treated it as a sliding window (instead of an opaque object handle), the plan could not be implemented, and the half-assed thing became stuck.
Having seen other Intel RISC designs, I fully agree with the premise.
Using a bare segment register as a pointer was quite common. That’s what the DOS memory allocation call would return.
Segmentation meant programs could remain essentially 16 bit with all the benefits to that like smaller code size.
The PC was intended to be cheap and was competing with 8 bit machines. Being 16/20 bit made it already high end.
If you wanted 24 or 32 bit, IBM had many other machines to sell you. Or you could just buy a VAX.
More importantly, there’s backwards compatibility. By the time the 8086 came out, people had spent serious money on getting binary-only software (WordStar cost hundreds of dollars, for example). “Buy this computer, and you can keep running the software you paid for, but faster” was a good selling point.
Very few benefits to a clunky 24 or 20 bit word size; now it costs more and is a bizarre boutique architecture in a world where it needs to compete with 8 bit Z80s and 6502s.
When I think back I think it would be fun to have a hierarchical structure where composite data structures (think an array or hash map) are referred to with a pointer that goes into the segment register and you index inside a data structure with a regular pointer.
This code was a nightmare to port to protected mode 80286 so it went away by the Windows 3.1 era.
This! That's one of the most interesting things to me: very often in the IT world, the worst competitor won the race while better solutions were known and available: Microsoft, Intel etc.
Esp. that MS won for decades while making mainly a very bad OS, though they have some good enterprise products.
How would the world look if Unix/BSD had won this race?
Macs also existed but were expensive. The PC with DOS was both powerful and cheap.
I wouldn't say Unix itself is the best. It suffered a war between competing implementations pushing their own proprietary components. POSIX is the compromise.
But, ultimately, what is good about it today is not so much "Unix" (the proprietary OS from Bell Labs and its heritage), but specifically Linux and the BSDs. Why? Because they are actually open. They are freedom incarnate. You can add anything you like to them, today, without asking any permission. Not just their kernels, but their userlands too (Linux obviously varies by distro here). There's even a chance you can get your changes adopted upstream (unless it's GNOME), much more than you'd ever get from a proprietary company's OS.
So, while there's always room for improvement on the technical aspects of the OS, the social and political aspects of Linux and the BSDs make them the best we can achieve as a society.
What do you mean, which are the two? Sure, Windows is crappy, but Linux and MacOS? They are both awesome.
I agree that Windows is crappy, but that doesn't mean that Linux and MacOS aren't also crappy in their own ways (not to mention iOS, Android).
I do not understand how one can crash their own product/baby that way - and no, this must be Hanlon's razor: they are doing it with (sophisticated) intent, not by accident (or coincidence).
Adding more features to an OS is a benefit for some use cases and a barrier for others. For one it might mean less work to get what you want; for another it might mean more code between you and the hardware that just slows things down.
Unix-like simplicity is exactly that, for some use cases directness is a benefit, for others it means extra work to do on top to get what you want.
If you just want a house, getting a raw foundation to work with is a lot to build on top, you have to bring the rest of the walls up yourself.
But if you want exactly the house you want, getting entirely different house to start with and changing it is far more work than starting from simple foundation and building up.
Overall, Unix's "here is a relatively simple operating system that doesn't force you but needs some things to be built on top to hit your use case" probably IS the best abstraction, despite not being "best" at really anything. There is a reason we build houses from concrete and wood, and not carbon fiber and titanium alloys.
We just have the illusion of a "flat" memory model, but it's not really flat: the CPU and the operating system do an important job in translating our flat memory model into something that is not flat at all. All that address translation work could have been avoided if we had accepted not having a flat memory model and being aware that our memory is divided into pages.
Basically we are doing in hardware the job of managing a non-flat memory space that the programmer, or well, the compiler (or these days you would say the AI agent) could probably do better, because it knows how to allocate things so they don't straddle page boundaries. And all of this is to give the programmer the illusion of working with a flat memory (except when they do something wrong and get a segmentation fault, which, as the name suggests, is a hint that in the end the memory is not really flat).
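For anyone who hasn't looked at what that translation step involves, here is a toy sketch in C. It assumes a single-level table and 4 KiB pages; the names `translate`, `frame` and `NOT_MAPPED` are invented for the example, and real MMUs use multi-level tables, TLBs and permission bits on top of this.

```c
#include <stdint.h>
#include <stddef.h>

#define PAGE_SHIFT 12u                    /* assumed 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NOT_MAPPED UINT32_MAX             /* illustrative sentinel value */

/* Toy single-level page table: frame[i] holds the physical frame number
 * backing virtual page i, or NOT_MAPPED if the page is absent. */
uint32_t translate(const uint32_t *frame, size_t pages, uint32_t vaddr)
{
    uint32_t page = vaddr >> PAGE_SHIFT;
    if (page >= pages || frame[page] == NOT_MAPPED)
        return NOT_MAPPED;                /* a real MMU would raise a fault */
    /* Swap the page number for the frame number, keep the in-page offset. */
    return (frame[page] << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1u));
}
```

Adjacent virtual pages can land in unrelated physical frames, which is the sense in which the "flat" space is only flat from the program's side.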
The above is very similar to the argument that you should use a garbage collected language.
Back in 1980 most programs were being written in interpreted languages that did all the hard work of memory for you - just like today.
As an aside, the memory model is flat, it's just not physically linear when implementing virtual memory addressing.
Looking back, the simplicity of the instruction set seems quaint next to the thousands of instructions we have today.
(And I completely agree.)
64K of actual text content in a single node could be reached in some documents, but it's not that small, more than a chapter of a typical book.
What was always a problem for segmented memory was graphics, at least if you wanted higher resolution than 320x200 at 256 colors. But you could have a segment pointer to each row of pixels instead of an entire image, as long as it would still fit within 1 MB (16 MB in the 286 protected mode).
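A rough sketch of the segment-per-row idea in C, under a couple of assumptions that are mine and not from any real graphics API: the image sits in ordinary real-mode RAM starting at `base_seg:0000`, and the row pitch is a multiple of 16 bytes so every row starts on a paragraph boundary. `row_segment` is a hypothetical helper name.

```c
#include <stdint.h>

/* Segment value that points at the start of scanline y.
 * Assumes the buffer begins at base_seg:0000 and pitch_bytes is a
 * multiple of 16, so each row is a whole number of paragraphs.
 * The caller must keep the result within the 16-bit segment range. */
uint16_t row_segment(uint16_t base_seg, uint16_t pitch_bytes, uint16_t y)
{
    uint32_t paragraphs_per_row = pitch_bytes / 16u;
    return (uint16_t)(base_seg + (uint32_t)y * paragraphs_per_row);
}
```

With a 640-byte pitch each row is 40 paragraphs, so a 640x480 image at 8 bits per pixel needs only a segment reload per row and plain 16-bit offsets within it.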
286 could then use the next 4 bits from the segment register to allow 16 MB address space and 386 could use all of them for 4GB. And wouldn't it be nice if 386 had 64KB pages (1 segment)?
The 68000 was a complete break so it opted for relocatable code (which also needed more registers, and in fact the 68k had 16 instead of 8).
I get why they didn't. Someone might want to run two processes each with its own segment, but the whole machine might only have 64k in total.
The original 20 bit vision of the 8086 was when memory was very expensive and they expected typical high end machines to have 128K of memory.
Intel’s assembler was designed so you could have up to 128K of code with a “shared” segment in the middle that either side could reach with near (16 bit only) pointers to call commonly shared routines, and more rarely executed code existed on either end.
In addition data could be its own segment, and/or memory mapped I/O outside of the 128K space.
But memory got so cheap that nobody bothered with this, and the performance gains of writing code that way weren't worth the effort. X86 code was compact enough that most programs could cram their code into 64k anyway, or 64k per functional unit with calls between them being rare.
The real tragedy is they went for 20 bit instead of 24 bit. 8086 with 16MB of addressable space would have been a very different world, and it would have made little difference in use. (Paragraphs would have been 256 bytes, the same size as a page; most data structures would have been fine with that.)
I did use an AI for spell-checking, punctuation, generally making it flow, but it's all my text.
You think a machine is going to come up with "near pointers, far pointers, wherever-you-are pointers"?
LLMs generate low-entropy text. That's their entire purpose. But good writing isn't about being as low-entropy as possible. It's about producing peaks and valleys. As a person who's been participating in human-to-human communication your entire life, you probably have a pretty well-developed sense of how to structure the flow of a piece of communication. The small arcs with their ebbs and flows of tension and density provide the reader a rough surface that gives them enough traction to easily move from point to point. Don't let an LLM smooth out all the gaps. It makes it hard for a reader to keep their footing in the text.
Not OP, but this is where you're wrong. The vast majority of people, myself included, have difficulty structuring communication for effectiveness to a wide audience. When I manage to pull it off I'm very proud of the work, but I can't just sit down and do it. Review with an LLM helps me find those places where it CLANKS, distracting the reader, taking them out of the flow. This is why every professional writer has an editor; good communication is quite hard.
- the “make it flow” made it flow in an AI generated way like short paragraphs that are one short sentence.
- I now have to decide if this is entirely AI generated and thus not worth my time reading or not.
- I would prefer to just interact with you as a real person; your writing doesn’t have to be perfect for what you write to be worth reading.
Segments aren’t conceptually difficult, either, but definitely could be annoying, and certainly were, if you had to access data structures larger than 64 kB.
As to the differences:
- you had four segment registers that you could ‘point’ anywhere, allowing you to access four 64kB regions of memory without changing them (the equivalent of bank switching). One was always used for fetching the instruction to run and one for accessing the stack, but you could use those for other purposes, too (could, not should).
- segments can overlap. You could set DS and ES to the same value, for example.
Segments also can be moved at 16-byte granularity. If you wanted, you could have DS address memory range 0x00000 ≤ x ≤ 0x0FFFF and SS address memory range 0x00010 ≤ x ≤ 0x1000F.
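The overlap in that DS/SS example is easy to check with a few lines of C. This is only a sketch; the helper names (`window_first`, `window_last`, `window_overlap`) are invented here, and wraparound at the top of the 1 MB space is deliberately ignored.

```c
#include <stdint.h>

/* First linear address covered by a real-mode segment value. */
uint32_t window_first(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* Last linear address covered (20-bit wraparound near 1 MB is ignored). */
uint32_t window_last(uint16_t segment)
{
    return ((uint32_t)segment << 4) + 0xFFFFu;
}

/* Number of bytes that two 64 kB windows have in common. */
uint32_t window_overlap(uint16_t a, uint16_t b)
{
    uint32_t lo = window_first(a) > window_first(b) ? window_first(a) : window_first(b);
    uint32_t hi = window_last(a) < window_last(b) ? window_last(a) : window_last(b);
    return hi >= lo ? hi - lo + 1u : 0u;
}
```

For DS = 0x0000 and SS = 0x0001 the two windows share all but 16 bytes; they stop overlapping only once the segment values are at least 0x1000 apart.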
Banking was one solution to the 1MB limit; was it extended or expanded mode? I can no longer remember, but one of those gave you a 64kb window somewhere above the 640kb limit in the address space not used by either video RAM or BIOS. That window could then be paged around the rest of memory.
I don't know about DMG/GBC/GBA games. Some very interesting stuff happened on those platforms (e.g. Game Boy Camera, and some game that lets you control a sewing machine in Japan?) and I bet a pure sliding window mapper exists.
The PC Engine/Turbografx-16 had platform support for mapping (specific CPU instructions did it), but it was 8 fixed windows in the CPU's 64K address space that pointed to 8K-sized offsets in the ROM, I believe. SNES had a 24-bit address space and DMA to copy things to VRAM, so I'm not sure mappers were really a thing on that platform.
How is that compatible with an array and a simple implementation of the index operator?
This was a problem.
Huh?
There were no segmented x86 machines capable of addressing 256MB of RAM, aside from the 386 (maybe).
If you had a 386 and the $130K of memory your statement implies, you probably also could afford a Unix (or something else) license to get to that 32-bit address space. (If you weren't doing it all in memory, then you're depending on paging stuff out to disk, implying you either have a real OS or a flat memory model isn't enough to save you, since you're manually having to page stuff to disk and back anyway.)
That's a super strange scenario you're describing.
Back then you had to chunk it out and fiddle with the offsets. Even then you still would have had to manage loading out the next chunk.
If my memory is right 1MB of memory in the early 90s was like 200-300 per meg. Would have to dig up a computer shopper and look.
I only have a couple reference points around this scale:
My dad's company had a system set up with a searchable index of a bunch of legal testimony. It was a Compaq Deskpro 386 running Unix with an attached 1GB disk. The 1GB disk set up was as big as the machine itself.
A few years later, I worked with a Cyber mainframe equipped with around 30GB of total attached disk storage. The disk array literally filled a room.
256MB disk on an 80's PC would have definitely been quite a bit.
I seem to remember that memory segments came with a permission system (read-only, read/write, execute) in 'protected mode'. Probably only added in the 286 though (I was always more of an m68k guy at that time).
If you do need something approaching a 2MB block of memory, you don't need a contiguous range of memory, what you need is a contiguous range of selectors, which is a different (and probably easier) problem to solve.
The memory itself doesn't have to be contiguous.
2MB of 64K segments maps to 32 segments. So you need 32 locations in physical memory capable of storing 64K.
The programming model for addressing that block of memory necessarily includes both segment selectors and offsets. The segment selectors are indices into a segment table that contains the base address of each of the 32 segments. As long as the segment selectors themselves can be allocated contiguously in the segment table, you have enough to be able to compute which segment you need for which address in the 2MB range. It's the indirection through the segments table that maps it to physical addresses that do not need to be contiguous.
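Roughly, in C, the indirection looks like the sketch below. It is illustrative only: `segment_table` and `block_to_physical` are made-up names, the table is reduced to a bare array of base addresses, and a real 286 descriptor also carries a limit and access rights (and consecutive selector values differ by 8, not 1).

```c
#include <stdint.h>

#define SEGMENT_SIZE   0x10000u   /* 64K per segment */
#define BLOCK_SEGMENTS 32u        /* 32 * 64K = 2MB */

/* Hypothetical slice of a segment table: one physical base address per
 * consecutive selector slot. The bases need not be contiguous. */
typedef struct {
    uint32_t base[BLOCK_SEGMENTS];
} segment_table;

/* Map a byte index within the 2MB logical block to a physical address.
 * The caller must keep index below BLOCK_SEGMENTS * SEGMENT_SIZE. */
uint32_t block_to_physical(const segment_table *t, uint32_t index)
{
    uint32_t slot   = index / SEGMENT_SIZE;   /* which selector in the run */
    uint32_t offset = index % SEGMENT_SIZE;   /* offset within that segment */
    return t->base[slot] + offset;
}
```

The program only ever computes slot and offset; where each 64K chunk lives physically is entirely the table's business.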
Raymond Chen talks a bit about how it worked in Windows 3.x here: https://devblogs.microsoft.com/oldnewthing/20171113-00/?p=97...
This was just for illustration, not claiming that actual 8086 does this.
>> Raymond Chen talks a bit about how it worked in Windows 3.x here: https://devblogs.microsoft.com/oldnewthing/20171113-00/?p=97...
And this is the problem: it was very painful just to walk through a 200 KB buffer. This required compiler/runtime tricks, different selector increments in real vs protected mode, and special pointer types. Paging later made this kind of thing look like one flat array, and did something segmentation could not: make non-contiguous physical RAM appear contiguous to the program.
Most of that could be (and often was) hidden by the tooling. If you needed to bypass it, you could, but you didn't need to. That's not very different from today... there's a lot of hidden magic that can be bypassed if you need to for whatever reason.
I'd argue that these are useful engineering abstractions that made the best of a less than ideal situation. (The reality of the world being that there are no "ideal" situations... you have to work with what you have at the moment to solve the problem you have.) These days, I'd argue that a pointer into a 'flat' memory space is counterproductive to the extent it hides issues around cache hierarchy, NUMA, etc. In 1986, we had to worry that a flat memory space looked discontiguous. In 2026, we have to worry that a discontiguous memory space looks flat.
Well, there are large/huge pages (2MiB/4MiB/1GiB) that reduce this problem.
There were literally millions of man-hours wasted on segment registers. A kludge that helped Intel conquer the world, but what a filthy, disgusting architecture, and what a waste of everybody's time and brain power.
This was less about tooling than it was about economics - there was 32-bit hardware available in the personal computer space in 1984, if not before. The issue was cost. In today's currency, a 32-bit capable Mac was $8,000 with 128K. The first 32-bit capable PC was closer to $20,000.
That's a heavy lift in a world where a segmented architecture machine costs a fraction of that amount, runs software you might already have, and works the same way as your co-worker's machine.
> There were literally millions of man-hours wasted on segment registers.
A software developer in 1986 was not forced to deal with segment registers... but they often chose to deal with them to gain access to a (much) bigger audience of potential customers for their software.
> A kludge that helped Intel conquer the world, but what a filthy, disgusting architecture, and what a waste of everybody's time and brain power.
The other side of the coin is that (for reasons I state above), segmented architectures got more capable software into more hands more quickly. It arguably did a lot for end users.
Imagine if you could have done something like this:
add si, some-delta
adsc es, 0
in order to move a seg:ofs ptr forward by 'some-delta' bytes. ADSC (add with segment carry) would do:
segreg := segreg + imm + 1000h (if carry)
or: segreg := segreg + imm (no carry)
Maybe there should also have been an instruction to normalize a seg:ofs ptr (so the new offset was in the 0-15 range). ADSC could have been adapted for the 286 with ease, as long as a specific layout of the segment descriptor tables was mandated (probably with 10h instead of 1000h in protected mode).
Edited slightly for clarity (ofs => imm). A normalizing instruction would be harder to do right for the 286 because you don't want to spend too many slots in the descriptor table(s) for a single memory object.
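For comparison, here is what the proposed pair of instructions, plus the normalizing one, would compute in real mode, written as a small C sketch. ADSC is the hypothetical instruction from the comment above; `far_ptr`, `far_advance` and `far_normalize` are names invented for this example.

```c
#include <stdint.h>

/* A real-mode far pointer as a segment:offset pair. */
typedef struct {
    uint16_t seg;
    uint16_t off;
} far_ptr;

/* Emulates "add si, delta" followed by the hypothetical "adsc es, 0":
 * if the 16-bit offset addition carries, the segment is bumped by
 * 1000h, i.e. by 64K of linear address space. */
far_ptr far_advance(far_ptr p, uint16_t delta)
{
    uint32_t sum = (uint32_t)p.off + delta;
    p.off = (uint16_t)sum;
    if (sum > 0xFFFFu)
        p.seg = (uint16_t)(p.seg + 0x1000u);
    return p;
}

/* Normalizes a pointer so the offset lands in the 0-15 range while the
 * linear address (seg * 16 + off) stays the same, barring 1 MB wrap. */
far_ptr far_normalize(far_ptr p)
{
    p.seg = (uint16_t)(p.seg + (p.off >> 4));
    p.off = (uint16_t)(p.off & 0x000Fu);
    return p;
}
```

This is essentially what compilers' "huge" pointer arithmetic did in software with several instructions per increment.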
Segmented memory (on hardware that supported segment permissions) was used to good effect in Multics as well.
The segment thing and the convoluted different pointer math caused real gymnastics if you ever had data bigger than 64k... such as images.
I always thought of the segments as windows of 64k but moving between those windows, esp with the limited register set, required some real mental gymnastics.
It was just a hack. A hack to delay migration to a 32-bit architecture. An effective one, but a hack nonetheless.
When I was learning C, we did things at a reasonably low level. I was learning data structures, and building things like binary trees out of things like structs, and the structs were fixed-sized memory blocks holding pointers to regions of memory which were either more structs or data fields. All reasonable stuff. But we weren't writing for a particular machine. We were writing for the idea of a machine, and part of that idea was that the machine had a flat memory model. This really struck me when I compiled my homework (parse some data into a tree) on the departmental SunOS server, and it worked fine, and then took it home and compiled it with Borland C for DOS on my 386 and it segfaulted on the same data. That was when I learned to hate segmented memory, but looking back, it seems to me that I learned the wrong lesson.
I learned to write software for a lowish-level model of an idealized computer. The generation before me was always writing software for a specific computer, consisting of a specific set of hardware. The software was always the goal, but the nature of the task was defined by the hardware. Things like memory segmentation were facts about the hardware, and the available hardware varied widely at the time in a way modern hardware doesn't, really, except maybe in the embedded space.
No, it wasn't
It's the "great idea" that sounds great 5 min in and horrible 10min afterwards
You know, kinda like using null as a string end character
But more importantly it kept the x86 world for too long in that dead end that was 8086 mode programming
"Oh if developers would just..." They won't. They haven't. And they will not ever.
In hindsight maybe a binary level translator from 8080 to 8086 would have worked better (and been simple enough)
> But more importantly it kept the x86 world for too long in that dead end that was 8086 mode programming
> "Oh if developers would just..." They won't. They haven't. And they will not ever.
8086 real mode programming in the mainstream lasted from 1981 until 1991 or so. The last 35 years have 32-bit (and later 64-bit) flat model addressing with pages for the most part. Seems like a reasonable transition period, really.
> In hindsight maybe a binary level translator from 8080 to 8086 would have worked better (and be simple enough)
Part of the reason they liked the segmented model is that it was possible to set the segments to the same value and then ignore them entirely. That gave a programming model for the 8086 that was sufficiently close to the 8080 that it was possible to use a sort of cross assembler to do something like what you suggest. You could then opt into 8086 specific instructions and segmentation as you needed. (Which took a few years... the first IBM PC's shipped with as little as 16K of RAM.)
But what should Intel have done? They needed a CPU that can run 8080 code but with more memory. Also it's the year ~1980 and we're limited to the technology of the age.
A system with 64k sized windows seems unavoidable.
If you extend the size of the address registers, 8080 code will only run in the first 64k, or require some kind of current window register.
An 8080 mode might have worked but that would have been expensive.
Tbf the Motorola 68000 which was released around the same time (1979) had a proper linear address space with 32-bit address registers (of which 24 bits were wired up).
Also the 8086 was intended as a cheap and temporary stop gap until Intel's "proper" 32-bit CPU architecture was ready for prime time (the doomed iAPX 432).
It would be a piece of trivia today if Motorola had not been 6 months late, which forced IBM in frustration to change tracks to Intel and MS-DOS instead (which worked on the 8086). That 6-month delay created the WinTel of today.
PS: and segmented memory wasn't all that different from the memory banking used before in 8-bit home computers to address more than 64 KBytes, except that the memory mapping hardware was implemented outside the CPU.
An MMU gives you a flat addressing model. There is no comparison. 8086 segments are rigidly locked to a 64KB window that moves forward in memory by 16 bytes for every increment of the segment value (so segmented address 1234:5678 is linear address $12340 + $5678 = $179B8).
It didn't do this to offer a useful feature like an MMU. It did this to allow code that doesn't know segment registers exist to think they're still running on an 8-bit Z80. What a waste of potential. The 68000 didn't pretend to be a 6502.
The 80286 introduced protected mode with "segment descriptors", but this is well after MMUs existed on other CPUs, it didn't invent virtual memory. Only the 80386 offered a 32-bit flat memory model.
If you want to see something to make you weep, look at the MS-DOS version of unzip. It has to do all kinds of crazy, just to allocate 64KB of RAM and get all 64KB, not 8 bytes less. And it's still locked into a memory access model that will not let it ever address more than 64KB of any one object. It's why MS-DOS was viewed as a toy OS for a toy computer.
#if defined(__TURBOC__) && !defined(OS2)
#include <alloc.h>
/* Turbo C malloc() does not allow dynamic allocation of 64K bytes
* and farmalloc(64K) returns a pointer with an offset of 8, so we
* must fix the pointer. Warning: the pointer must be put back to its
* original form in order to free it, use zcfree().
*/
...
static ptr_table table[MAX_PTR];
/* This table is used to remember the original form of pointers
* to large buffers (64K). Such pointers are normalized with a zero offset.
* Since MSDOS is not a preemptive multitasking OS, this table is not
* protected from concurrent access. This hack doesn't work anyway on
* a protected system like OS/2. Use Microsoft C instead.
*/
Many programs written in assembly language used self-modifying code back then. It saved RAM and improved performance. All programs that used such trickery would have been broken by a binary translator.
Why would someone be popping up in 2026 saying it was awesome? Weird.
https://en.wikipedia.org/wiki/Expanded_memory
Except we couldn't. If we made each segment isolated from the others, we would waste a lot of memory, because memory is allocated per segment.
If we made each segment dynamic, we need something to manage them.
This "hindsight" is just an MMU in disguise.