I've always wondered if it would be possible to create an SDK that abstracts the N64 graphics hardware and exposes some modern primitives: lighting, shading, tools to bake lighting as this demo does, etc. The N64 has some pretty unique hardware for its generation; more details on the hardware are here on Copetti.org:
However, there is a large caveat: 1. you have to think of the system as a graphics card with a CPU bolted on, and 2. the graphics system is directly exposed.
Graphics chip architecture ends up being an ugly, hateful, incompatible mess, and as such the vendors of said accelerators generally avoid publishing reference documents for them, preferring to publish intermediate APIs instead: things like OpenGL, DirectX, CUDA, Vulkan. Mainly so that, under the hood, they can keep them an incompatible mess (if you never publish a reference, you never have to maintain hardware backwards compatibility; the upside is they can create novel designs, the downside is no one can use them directly). So when you do get direct access to them, as in that generation of game console, you sort of instinctively recoil in horror.
Footnote on graphics influence: OpenGL came out of SGI, and Nvidia was founded by ex-SGI engineers.
The Reality Coprocessor (RCP) doesn't look like any of the graphics hardware that previously came out of SGI. Despite the marketing, it is not a shrunk-down SGI workstation.
It approaches the problem in very different ways and is actually more advanced in many respects. SGI workstations had strictly fixed-function pixel pipelines, but the RCP's pixel pipeline is semi-programmable. People often describe it as "highly configurable" instead of programmable, but it was the start of what led to modern pixel shaders. The RCP could do many things in a single pass that would require multiple passes of blending on an SGI workstation.
And later SGI graphics cards don't seem to have taken advantage of these innovations either. SGI hired a bunch of new engineers (with experience in embedded systems) to create the N64, and then once the project was finished they made them redundant. The new technology created by that team never had a chance to influence the rest of SGI. I get the impression that SGI was afraid such low-cost GPUs would cannibalise their high-end workstation market.
BTW, the console that looks most like a shrunk-down 90s SGI workstation is actually Sony's PlayStation 2: a fixed-function pixel pipeline with a huge amount of blending performance to facilitate complex multi-pass blending effects. Though SGI wouldn't have let programmers have access to the vector units and DMAs like Sony did; SGI would have abstracted it all away behind OpenGL.
------------------
But in a way, you are kind of right. The N64 was the most forwards looking console of that era, and the one that ended up the closest to modern GPUs. Just not for the reason you suggest.
Instead, some of the ex-SGI employees that worked on the N64 created their own company called ArtX. They were originally planning to create a PC graphics card, but ended up with the contract to first create the GameCube for Nintendo (The GameCube design shows clear signs of engineers overcompensating for flaws in the N64 design). Before they could finish, ArtX were bought by ATI becoming ATI's west-coast design division, and the plans for a PC version of that GPU were scrapped.
After finishing the GameCube, that team went on to design the R3xx series of GPUs for ATI (Radeon 9700, etc).
The R3xx is more noteworthy for having a huge influence on Microsoft's DirectX 9.0 standard, which is basically the start of modern GPUs.
So in many ways, the N64 is a direct predecessor to DirectX 9.0.
I haven't programmed for either console. Which features show this in what sense?
On the N64, the CPU always ends up bottlenecked by memory latency. The RAM latency is quite high to start with: the CPU sits idle for ~40 cycles whenever it misses the cache, assuming the RCP is idle. If the RCP is not idle, contention with it can sometimes push that well over 150 cycles.
Kaze Emanuar has a bunch of videos (like this one https://www.youtube.com/watch?v=t_rzYnXEQlE) going into detail about this flaw.
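To put rough numbers on it (my own back-of-envelope, assuming a simplistic one-instruction-per-cycle CPU and an arbitrary 2% miss rate, using the latency figures above):

    /* Back-of-envelope: how much a ~93.75 MHz N64 CPU loses to cache misses.
     * Assumes (simplification, not a measurement) 1 instruction per cycle when
     * not stalled, an arbitrary 2% miss rate, and the ~40 / ~150 cycle
     * penalties quoted above. */
    #include <stdio.h>

    int main(void) {
        const double cpu_mhz        = 93.75;  /* N64 CPU clock               */
        const double miss_idle      = 40.0;   /* stall cycles, RCP idle      */
        const double miss_contended = 150.0;  /* stall cycles, RCP busy      */
        const double miss_rate      = 0.02;   /* assumed: 2 misses / 100 ins */

        double cpi_idle      = 1.0 + miss_rate * miss_idle;       /* ~1.8 */
        double cpi_contended = 1.0 + miss_rate * miss_contended;  /* ~4.0 */

        printf("effective MIPS, RCP idle: %.0f\n", cpu_mhz / cpi_idle);
        printf("effective MIPS, RCP busy: %.0f\n", cpu_mhz / cpi_contended);
        return 0;
    }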
The GameCube fixed this flaw in multiple ways. They picked a CPU with a much better cache subsystem: the PowerPC 750 had multi-way caches instead of a direct-mapped one, plus a quite large L2 cache. Their customisations added special instructions to stream graphics commands without polluting the caches, resulting in far fewer cache misses.
And when it does miss the cache, the latency to main memory is under 20 cycles (despite the GameCube's CPU running at 5x the clock speed). The engineers picked main memory with super low latency.
To fix the issue of bus contention, they created a complex bus arbitration scheme and gave CPU reads the highest priority. The GameCube also has much less traffic on the bus to start with, because many components were moved out of unified memory.
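On the "stream graphics commands without polluting the caches" point: the mechanism homebrew docs describe is the CPU's write-gather pipe, an uncached port that packs repeated stores into 32-byte bursts headed for the GPU FIFO. A rough sketch of what that looks like (address and layout taken from homebrew documentation, so treat it as illustrative rather than the official SDK):

    /* Sketch: pushing graphics commands through the GameCube's write-gather
     * pipe.  Every store to this uncached address is gathered into a 32-byte
     * burst and sent straight to the GPU FIFO, so command traffic never
     * allocates or evicts cache lines.  Address and layout are taken from
     * homebrew documentation -- illustrative, not the official SDK. */
    typedef union {
        volatile unsigned char  u8;
        volatile unsigned short u16;
        volatile unsigned int   u32;
        volatile float          f32;
    } WGPipe;

    #define WG_PIPE ((WGPipe *)0xCC008000)

    static void send_vertex(float x, float y, float z, unsigned int rgba) {
        WG_PIPE->f32 = x;    /* each store hits the same address...        */
        WG_PIPE->f32 = y;    /* ...the write-gather hardware packs them up */
        WG_PIPE->f32 = z;
        WG_PIPE->u32 = rgba; /* and the caches never see any of it         */
    }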
---------------------------
The N64 famously had only 4KB of TMEM (texture memory). Textures had to fit in just 4KB, and to enable mipmapping they had to fit in half that. This led to most N64 games using very small textures stretched over very large surfaces with bilinear filtering, which gave N64 games a distinctive design language.
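For a sense of scale, the 4KB budget is just area times bytes-per-texel arithmetic (the palette detail for CI4 is from memory):

    #include <stdio.h>

    /* Bytes used by a w x h texture at bpp bits per texel -- pure arithmetic. */
    static unsigned tex_bytes(unsigned w, unsigned h, unsigned bpp) {
        return w * h * bpp / 8;
    }

    int main(void) {
        /* 4KB budget, roughly half of it if the mip chain has to fit too. */
        printf("32x64 RGBA16: %u bytes\n", tex_bytes(32, 64, 16)); /* 4096 -> fills TMEM      */
        printf("32x32 RGBA16: %u bytes\n", tex_bytes(32, 32, 16)); /* 2048 -> leaves mip room */
        printf("64x64 CI4:    %u bytes\n", tex_bytes(64, 64, 4));  /* 2048 + palette space    */
        return 0;
    }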
Once again, the engineers fixed this flaw in two ways. First, they made the texture memory work as a cache, so textures no longer had to fit inside it. Second, they bumped its size from 4KB all the way to 1MB, which was massive overkill, way bigger than any other GPU of the era. Even today's GPUs only have ~64KB of cache for textures.
---------------------------
The fillrate of the N64 was quite low, especially when using the depth buffer and/or doing blending.
So the GameCube got a dedicated 2MB of memory (embedded DRAM) for its framebuffer. Rendering no longer touches main memory at all. The depth buffer is now free (no reason not to enable it), and blending is more or less free too.
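Back-of-envelope on why 2MB is enough (the 640x528 maximum is the commonly quoted figure, so treat it as approximate):

    #include <stdio.h>

    int main(void) {
        /* GameCube embedded framebuffer, back-of-envelope:
         * roughly 640x528 maximum, with 24-bit colour and 24-bit depth. */
        unsigned w = 640, h = 528;
        unsigned bytes = w * h * (3 /* colour */ + 3 /* depth */);
        printf("%ux%u colour+Z = %u bytes (~%.2f MB of the 2MB eFB)\n",
               w, h, bytes, bytes / (1024.0 * 1024.0));
        return 0;
    }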
Rasterisation was one of the major causes of bus contention on the N64, so the embedded framebuffer has the side-effect of solving the bus contention problem too.
---------------------------
On the N64, the RSP was used for both vertex processing and sound processing. Not exactly a flaw, it saved on hardware. But it did mean any time spent processing sound was time that couldn't be spent rendering graphics.
The GameCube got a dedicated DSP for audio processing. The audio DSP also got its own pool of memory (once again reducing bus contention).
As for vertex processing, that was all moved into fixed-function hardware. (There aren't that many GPUs that did transform and lighting in fixed-function hardware: earlier GPUs often implemented it on DSPs (like the N64's RSP), and the industry very quickly switched to vertex shaders.)
The standard API was pretty much OpenGL, generating in-memory command lists that could be sent to the RSP.
However, the RSP was a completely programmable MIPS processor (with parallel SIMD instructions).
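For anyone who hasn't seen it, the command-list style looks roughly like this with libultra's gSP/gDP macros. Written from memory, and with all the real setup (matrices, render modes, segment addressing) omitted, so take it as a sketch rather than SDK documentation:

    #include <ultra64.h>

    /* Sketch: building an in-memory command list (a "display list") that the
     * RSP consumes.  Written from memory, with the real setup (matrices,
     * render modes, segment addressing) omitted -- a sketch, not SDK docs. */
    static Vtx tri_verts[3];   /* filled in elsewhere */
    static Gfx glist[16];

    Gfx *build_triangle(void) {
        Gfx *g = glist;

        gDPPipeSync(g++);
        gDPSetCombineMode(g++, G_CC_SHADE, G_CC_SHADE); /* vertex colour only     */
        gSPVertex(g++, tri_verts, 3, 0);                /* load 3 verts at slot 0 */
        gSP1Triangle(g++, 0, 1, 2, 0);                  /* draw them              */
        gSPEndDisplayList(g++);

        return glist;          /* later handed to the RSP graphics task */
    }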
One of my favorite tricks in the RDP hardware was that it used the parity bits in the Rambus memory to store the coverage bits for its anti-aliasing.
Good point. The software APIs are where you do see the strong SGI influence. It's not OpenGL, but it's clearly based on their experience with OpenGL. The resulting API is quite a bit better than those of the other 5th-gen consoles.
It's only the hardware (especially RDP) that has little direct connection to other SGI hardware.
But per-vertex lighting was kind of old and boring even by 1995, and it massively limited your art style. You really wanted per-pixel lighting.
The GameCube's vertex pipeline was very fixed function, but its Pixel pipeline was quite programmable. Far more programmable than the N64. It was basically equivalent to the Xbox's pixel shaders, more advanced in some ways. But because it wasn't exposed with the pixel shader programming model, many people don't consider it to be "programmable" at all.
*And in many ways, you shouldn't consider the Xbox and other DirectX 8.0 shaders to be fully programmable. You were limited to 8-16 instructions, with no control flow at all. On the GameCube, instead of 8-16 instructions you had 16 stages, each roughly equivalent to an instruction. The N64 had just two stages, which were less flexible. True fixed-function pixel pipelines (like on the PS1, PS2 or Dreamcast) have just a single stage, and very little configurability.
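To make the "stages are roughly instructions" point concrete, here is a little software model of the per-stage equations. The formulas themselves are the documented ones ((a - b) * c + d for the N64 combiner, d plus a lerp for the GameCube TEV); everything around them is just illustration:

    /* Software model of "configurable stage" pixel pipelines.
     * N64 RDP combiner stage:  out = (a - b) * c + d        (2 stages max)
     * GameCube TEV stage:      out = d + (1 - c)*a + c*b    (16 stages max)
     * a, b, c, d are selected per stage from textures, vertex colours,
     * constants, or the previous stage's output -- so chaining stages is
     * roughly like running a tiny straight-line shader, one op per stage. */
    typedef struct { float r, g, b; } Col;

    static Col rdp_stage(Col a, Col b, Col c, Col d) {
        Col o = { (a.r - b.r) * c.r + d.r,
                  (a.g - b.g) * c.g + d.g,
                  (a.b - b.b) * c.b + d.b };
        return o;
    }

    static Col tev_stage(Col a, Col b, Col c, Col d) {
        Col o = { d.r + (1.0f - c.r) * a.r + c.r * b.r,
                  d.g + (1.0f - c.g) * a.g + c.g * b.g,
                  d.b + (1.0f - c.b) * a.b + c.b * b.b };
        return o;
    }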
Those "texture tricks" are per-pixel lighting. Many of them aren't possible on fixed function GPUs like the PS2, they required both textures and a reasonably programmable pixel pipeline.
Even today, most per-pixel lighting is done with a mix of textures and shaders.
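In its simplest form, that mix is: sample a normal (or light) value from a texture at each pixel and do the lighting math there, rather than interpolating a per-vertex result. A minimal model, generic shader math with no particular console API implied:

    /* Per-pixel lighting in its simplest form: the normal comes from a
     * texture sampled at this pixel (a normal map) and N.L is evaluated per
     * pixel, instead of lighting at the vertices and interpolating the
     * result.  Generic shader math, no particular console API implied. */
    typedef struct { float x, y, z; } V3;

    static float per_pixel_diffuse(V3 n /* from normal map */, V3 l /* to light */) {
        float d = n.x * l.x + n.y * l.y + n.z * l.z;   /* N . L                 */
        return d > 0.0f ? d : 0.0f;                    /* clamp backfacing to 0 */
    }
    /* final pixel = albedo_texel * per_pixel_diffuse(normal_texel, light_dir) */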
The system has seen a dozen of its most popular games decompiled [1] into readable source files, which enables easy porting to PC without an emulator. It also enables a ton of mods to be written, many of which will run on the original hardware.
There are numerous Zelda fan remakes [2]. Complete games with new dungeons and storylines.
The Mario 64 scene is on fire. Kaze has deeply optimized the game [3] and is building his own engine and sequels. If you like technical deep dives into retro tech, his channel is pure gold.
Folks are making crazy demos for the platform, such as Portal [4], which unfortunately attracted the attention of Valve's lawyers.
Lost games, such as Rare's Dinosaur Planet [5], have leaked, been brought up to near production ready status, been decompiled, and have seen their own indie resurgence.
[1] https://wiki.deco.mp/index.php/N64
[2] https://m.youtube.com/watch?v=bZl8xKDUryI
[3] https://m.youtube.com/channel/UCuvSqzfO_LV_QzHdmEj84SQ
The whole channel is gold. He has dozens of deep dives like this: https://m.youtube.com/watch?v=DdXLpoNLywg
And his game and engine are beautiful: https://youtu.be/Drame-4ufso
Curious - why the desire to have it run on GL 2.1?
Turns out that perfect-precision weapons on an m+kb setup are actually not much fun to play with. The movement is so limited compared to the brutal precision a mouse offers that everything just dies really, really fast.
I wish I hadn't thought of a significantly better architecture for my 2d-pixel-art-game-maker-maker this weekend. Now it'll be another month before I can release it :(
- Limited map size
- Limited color palette I think
- and more!
As for Pokémon, the Nintendo 64 launched in June 1996, and the first Pokémon game was Pokémon Snap, released nearly three years after the console, in March 1999.
What was tricky was a separate technique to get real cubemaps working on the PS2.
Unfortunately, these came too late to actually ship in any PS2 games. The SH trick might have been used in the GameCube game “The Conduit”. Same team.
http://research.tri-ace.com/Data/Practical%20Implementation%...
Any details on that?
Except... the triangle UVs will often cross over between multiple squares. With the above texture, they will cross over into the white area and make the white visible on the mesh. So you fill the white area with a duplicate of the texels from the square that is adjacent on the cube. That won't work for huge triangles that span more than 1.5 squares, but it's good enough given an appropriate mesh.
Probably would have been better to just use a lat-long projection texture like https://www.turais.de/content/images/size/w1600/2021/05/spru... Or, maybe store the cubemap as independent squares and subdivide any triangles that cross square boundaries.
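For reference, the lat-long lookup is just two trig calls per direction. Standard spherical-mapping math, nothing PS2-specific:

    #include <math.h>

    /* Direction vector -> lat-long (equirectangular) texture coordinates.
     * Standard spherical-mapping math, nothing PS2-specific.  Assumes
     * (x, y, z) is normalised and y is "up". */
    static void latlong_uv(float x, float y, float z, float *u, float *v) {
        const float PI = 3.14159265f;
        float cy = y < -1.0f ? -1.0f : (y > 1.0f ? 1.0f : y);
        *u = 0.5f + atan2f(z, x) / (2.0f * PI);   /* longitude -> [0,1] */
        *v = acosf(cy) / PI;                      /* latitude  -> [0,1] */
    }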
HN folks are probably familiar with raster interrupts (https://en.wikipedia.org/wiki/Raster_interrupt) and "racing the beam." I always associated this with the Atari 800. You weren't "supposed" to be able to do stuff like https://youtu.be/GuHqw_3A-vo?t=33, but Display List Interrupts made that possible.
What I didn't know until recently was how much the Atari 2600's games owed to this kind of craziness: https://www.youtube.com/watch?v=sJFnWZH5FXc
It's stuff like this that makes me think that if hardware stopped advancing, we'd still be able to figure out more and more interesting stuff for decades!
What I find more impressive are efforts like FastDoom or the various Mario-64 optimization projects which squeeze significantly better performance out of old hardware. Sometimes even while adding content and features. Maybe there is a connection between demo sceners and more comprehensive efforts?
GT3 heatwave summarizes it well.
"I showed a demo of GT3 that showed the Seattle course at sunset with the heat rising off the ground and shimmering. You can’t re-create that heat haze effect on the PS3 because the read-modify-write just isn’t as fast as when we were using the PS2. There are things like that."
https://old.reddit.com/r/ps2/comments/1cktw88/gran_turismos_...
https://youtu.be/ybi9SdroCTA?t=4103
It's not trying to simulate a real heat haze the way new engines like UE5 do, which just tanks fps. It uses "tricks" instead. And honestly, looking at RTX tanking frame rates, I would rather have these cheap tricks.
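The class of trick being described is framebuffer feedback: read back the frame you just rendered and redraw part of it with slightly perturbed coordinates. A generic sketch of the idea, not GT3's actual code (which isn't public), assuming a 640x480 maximum frame size:

    #include <math.h>
    #include <string.h>

    /* Generic sketch of a heat-haze style framebuffer feedback effect: copy
     * the frame that was just rendered, then rewrite a band of it with a
     * small animated horizontal offset per scanline.  This is the class of
     * read-modify-write trick being described, not GT3's actual code.
     * Assumes a frame no larger than 640x480. */
    void heat_haze(unsigned int *fb, int width, int height,
                   int y0, int y1, float time) {
        static unsigned int copy[640 * 480];
        memcpy(copy, fb, (size_t)width * height * sizeof *fb);

        for (int y = y0; y < y1; ++y) {
            /* wobble grows toward the bottom of the band, like rising heat */
            float amp = 3.0f * (float)(y - y0) / (float)(y1 - y0);
            int   dx  = (int)(amp * sinf(time * 6.0f + y * 0.15f));
            for (int x = 0; x < width; ++x) {
                int sx = x + dx;
                if (sx < 0)       sx = 0;
                if (sx >= width)  sx = width - 1;
                fb[y * width + x] = copy[y * width + sx];
            }
        }
    }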
A 299MHz MIPS runs this:
Shadow of the Colossus... https://www.youtube.com/watch?v=xMKtYM8AzC8
GoW2 https://youtu.be/IpKLwIIdvuk?si=TjifKmlYsUuvhk0F&t=970
FFXII https://youtu.be/NytHoYOs_4M?si=jE1Fxy40khEvV6Bn&t=51
GT4 https://www.youtube.com/watch?v=F6lZIxk_h9g (THE BOOTSCREEN crying)
Black (Renderware was a crazy engine) https://youtu.be/bZBjcwyq7fQ?si=Pev5ifpksJm4X6Oi&t=356
Valkyrie profile 2 https://youtu.be/9ScjO4NuUtA?si=Z29cR-hLsT2pnP2I&t=38
Rogue Galaxy https://youtu.be/iR1evzyl-7Q?si=fldm3-NnuFxOITMn&t=624
Burnout 3 https://www.youtube.com/watch?v=_r5r0nE1sA4
Jak and Daxter, Ratchet.
For GC - RE4, Metroid, The Zeldas... ofc. Looks crazy good.
I kneel.
MIPS CPUs are amazing; they can do wonders with few cycles. Just look at the PSP, or SGI's IRIX workstations.
Also, the PS2 "GPU" is not the same as the R4k CPU. BTW, on the PS2... the Deus Ex port sucked badly against the PC port; it couldn't fully handle the Unreal Engine.
Yes, the PS2 did crazy FX, but with really small levels for the mentioned port; bear in mind DX was almost 'open world' for a huge chunk of the game.
A Pentium is much faster than the PlayStation's MIPS CPU for game logic, and a 3dfx card's 50 MPixels/s fillrate matches the PlayStation's 60 MPixels/s. The Pentium FPU, though, is no match for the PlayStation's GTE at 90-300K triangles per second, meaning you would have to rely on CPU power alone for geometry processing (as the contemporary Bleem did), resulting in 166-233MHz Pentium minimum requirements. MMX would be of no help here; it was barely used, and then only in a few games for audio effects.
The PSX "GPU" just worked with integers, and that's it. Any decent compiler such as GCC, with flags like -ffast-math, can emulate both the dead-simple MIPS CPU and the fixed-point GPU (where no floats are used at all) while taking tons of shortcuts. MMX? Ahem, MPEG decoding for videos. If you did things right you could even bypass the BIOS decoding and just call the OS's MPEG decoding DLLs (as PPSSPP does with FFmpeg), dropping the emulation CPU usage to nearly nothing and letting your media player framework do the work for you.
MMX was meant for anything you would normally use a traditional DSP for (also fixed point). Intel envisioned software modems and audio processing; in reality it was criminally underused and fell into the 'too much effort for little gain' hole. Intel's big marketing win was paying Ubi Soft a cool $1 million for POD's "Designed for MMX" ad right on the box https://www.mobygames.com/game/644/pod/cover/group-3790/cove... while the game implements _one optional audio filter_ using MMX. Microsoft also didn't like Intel's Native Signal Processing (NSP) initiative and killed it https://www.theregister.com/1998/11/11/microsoft_said_drop_n...
MP3 you could decode on a Pentium ~100, so why bother; MPEG a Pentium ~150 will play flawlessly as long as the graphics card can scale it in hardware. I would love to see the speed difference decoding MPEG with ffmpeg on a Pentium 166 with and without MMX. A contemporary study shows up to 2x speedup in critical sections of image processing algorithms but only marginal gains for the MP3/MPEG cases https://www.cs.cmu.edu/~barbic/cs-740/mmx_project.html
>drop the emulation CPU usage to a halt
The PlayStation 1 doesn't support MPEG.
Now, could you implement the GTE with MMX? Certainly, yes. But again, why bother when a 166-233MHz CPU is already enough to accomplish the same thing with the integer unit alone.
Also, PAL resolutions for instance were a bit like SVGA, but the PSX ran most games at 320x240, as scaling them on fuzzy CRTs gave them almost free antialiasing. 320x240 was easier to get away with on a couch/bedroom TV viewed from across the room than Unreal or Quake 2-based games at 640x480.
The PSX pushed tons of triangles? At 320x240 on my 14" Nokia TV they could look astounding; not so much on a PC desktop, rendered on a better-quality PC CRT where 320x240 would look so-so.
This is why a Pentium 100 could not pull off PlayStation 1 games with just an accelerated rasterizer like a 3dfx. Multiplying matrices is expensive: you either get specialized hardware to do it, or you hire Carmack and Abrash to use every trick in the book to avoid as much computation as possible. Lowering the resolution does nothing; you still have to cull, rotate, scale, perspective-correct and light the same amount of geometry per frame.
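Rough numbers on why: even the bare per-vertex work is a 3x3 matrix multiply plus a perspective divide, which is exactly what the GTE does in a handful of cycles. A sketch of that inner loop (my arithmetic, using the triangle-rate figures above):

    /* The per-vertex work the PSX's GTE does in hardware: rotate/translate
     * by a 3x3 matrix, then perspective-project.  That is 9 multiplies,
     * 9 adds and a divide per vertex before any lighting or clipping.  At
     * ~100K triangles/s (~300K vertices/s) that is millions of multiplies
     * per second on top of all game logic -- painful on a Pentium 100, a
     * few cycles per vertex for dedicated hardware.  (The real GTE is
     * fixed-point; floats here are just for clarity.) */
    typedef struct { float m[3][3]; float t[3]; } Xform;

    static void transform_project(const Xform *xf, const float in[3],
                                  float *sx, float *sy, float focal) {
        float v[3];
        for (int i = 0; i < 3; ++i)
            v[i] = xf->m[i][0] * in[0] + xf->m[i][1] * in[1]
                 + xf->m[i][2] * in[2] + xf->t[i];

        /* perspective divide -- the GTE has a dedicated fast divider for this */
        *sx = focal * v[0] / v[2];
        *sy = focal * v[1] / v[2];
    }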
Original Xbox titles like Jet Set Radio Future etc. also still look amazing.
Sorta. The GoW2 video was captured on PCSX2 and likely benefited from upscaling and other niceties in that clip. Didn't look through the rest of them. Either way, GoW2 was an incredible achievement on PS2.