https://www.reddit.com/r/Games/comments/v42611/dario_on_twit...
It probably needs generative-AI-based upscaling to high-resolution meshes, textures, and realistic materials to actually achieve a quality improvement.
right, because that's not the point - the point is to have N64-era graphics at 60fps and greater, with widescreen, motion blur, and other things that have high engagement amongst gamers (whether the self-proclaimed gamers like them or not)
I don't think he uses either one in serious code, but if he did, good luck emulating it.
I may sound way out of the loop here, but... How come this was never a problem for older dx9/dx11/GL games and emulators?
Most modern emulators implement a shader cache which stores those shaders as they are encountered, so this "compilation stutter" only happens once per shader - but modern titles can have hundreds or thousands of shaders, which means you're still hitting it pretty consistently across a playthrough. Breath of the Wild stands out as a game where you basically had to run it with a precompiled shader cache, as it was borderline unplayable without one.
Ubershaders act as fallback shaders - an off-the-shelf precompiled generic shader is used in place of the actual one, while the actual one is compiled for use next time - which prevents the stutter at a cost of visual fidelity. If you see an explosion in a game, you get a generic explosion shader instead of the one the game actually uses, until the real one is available in the shader cache.
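A minimal sketch of that caching idea, with invented names rather than any real emulator's code: key on a hash of the observed guest shader/pipeline state, compile synchronously on a miss (that synchronous compile is the one-time hitch), and reuse the result from then on. Real emulators also persist this map to disk between runs.

    #include <cstdint>
    #include <unordered_map>

    struct CompiledShader { int hostProgram = 0; };   // stand-in for a GL/Vulkan handle

    // Placeholder for the expensive host-GPU compile (tens to hundreds of ms).
    CompiledShader CompileForHostGpu(uint64_t guestStateHash) {
        return CompiledShader{ static_cast<int>(guestStateHash & 0xFFFF) };
    }

    class ShaderCache {
    public:
        const CompiledShader& Get(uint64_t guestStateHash) {
            auto it = cache_.find(guestStateHash);
            if (it == cache_.end()) {
                // Miss: the frame stalls here. This is the "compilation stutter",
                // paid once per unique shader and then never again.
                it = cache_.emplace(guestStateHash,
                                    CompileForHostGpu(guestStateHash)).first;
            }
            return it->second;
        }
    private:
        std::unordered_map<uint64_t, CompiledShader> cache_;
    };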
I believe the term was coined by the Dolphin team, who did a pretty good high-level writeup of the feature here: https://de.dolphin-emu.org/blog/2017/07/30/ubershaders/
The classic usage is a single source shader which is specialized using #defines and compiled down to hundreds of shaders. This is what Christer uses in that blog post above (and Aras does as well in his ubershader blog post).
Dolphin used it to mean a single source shader that used runtime branches to cover all the bases as a fallback while a specialized shader was compiled behind the scenes.
The even more modern usage now is a single source shader that only uses runtime branches to cover all the features, without any specialization behind the scenes, and that's what Dario means here.
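To make the distinction concrete, here's a toy contrast between the two flavors (hypothetical GLSL excerpts embedded in C++ strings; none of this is from Christer's, Aras's, or Dolphin's actual code):

    #include <string>

    // Classic usage: one source file, specialized at build time by prepending
    // #defines; every combination becomes its own compiled shader variant.
    const char* kClassicSource = R"(
        // excerpt of a fragment shader body
        #ifdef USE_FOG
            color = mix(color, fogColor, fogFactor);
        #endif
        #ifdef USE_NORMAL_MAP
            normal = sampleNormalMap(uv);
        #endif
    )";

    std::string SpecializeVariant(bool fog, bool normalMap) {
        std::string defines;
        if (fog)       defines += "#define USE_FOG\n";
        if (normalMap) defines += "#define USE_NORMAL_MAP\n";
        return defines + kClassicSource;   // compile each variant as a separate shader
    }

    // Dolphin-style / modern usage: one compiled shader for everything, with the
    // features selected by uniforms and real runtime branches; no variants at all.
    const char* kRuntimeBranchSource = R"(
        // excerpt of a fragment shader body
        uniform int useFog;
        uniform int useNormalMap;
        void shade() {
            if (useNormalMap != 0) normal = sampleNormalMap(uv);
            if (useFog != 0)       color = mix(color, fogColor, fogFactor);
        }
    )";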
1. A global, networked shader cache — where, whenever any instance of the emulator encounters a new shader, it compiles it, and then pushes the KV-pair (ROM hash, target platform, console shader object-code hash)=(target-platform shader object-code) into some KV server somewhere; and some async process comes along periodically to pack all so-far-submitted KV entries with a given (ROM hash, target platform) prefix into shader-cache packfiles. On first load of a game, the emulator fetches the packfile if it exists, and loads the KV pairs from it into the emulator's local KV cache. (In theory, the emulator could also offer the option to fetch global-shader-cache-KV-store "WAL segment" files — chunks of arbitrary global-shader-cache KV writes — as they're published every 15 minutes or so. Or KV entries for given (ROM hash, target) prefixes could be put into message-queue topics named after those prefixes, to which running instances of the emulator could subscribe. These optimizations might be helpful when e.g. many people are playing a just-released ROMhack, where no single person has yet run through the whole game to get it all into the cache. Though, mind you, the ROMhack's shaders could already be cached into the global store before release, if the ROMhacker used the emulator during development... or if they knew about this, and were considerate enough to use some tool created by the emulator dev to explicitly compile + submit their raw shader project files into the global KV store.)
2. Have the emulator (or some separate tool) "mine out" all the [statically-specified] shaders embedded in the ROM, as a one-time process. (Probably not just a binwalk, because arbitrary compression. Instead, think: a concolic execution of the ROM that looks for any call to the "load main-memory region into VRAM as shader" GPU instruction — with a symbolically-emulated memory whose regions have either concrete or abstract values. If the RAM region referenced in this "load as shader" instruction is statically determinable — and the memory in that region has a statically-determinable value on a given code-path — then capture that RAM region.) Precompile all shaders discovered this way to create a "perfect" KV cachefile for the game. Publish this into a DHT (or just a central database) under the ROM's hash. (Think: OpenSubtitles.org)
Mind you, I think the best strategy would actually combine the two approaches — solution #2 can virtually eliminate stutter with a single pre-processing step, but it doesn't allow for caching of dynamically-procedurally-generated shaders. Solution #1 still has stutter for at least one player, one time, for each encountered shader — but it handles the case of dynamic shaders.
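For concreteness, a sketch of what the key/value layout in approach #1 might look like (all names here are invented; this is a proposal, not an existing service):

    #include <array>
    #include <cstdint>
    #include <string>
    #include <vector>

    struct ShaderCacheKey {
        std::array<uint8_t, 32> romHash;          // identifies the game / ROMhack
        std::string             targetPlatform;   // e.g. "spirv" or "dxil", plus version
        std::array<uint8_t, 32> guestShaderHash;  // hash of the console-side object code
    };

    struct ShaderCacheValue {
        std::vector<uint8_t> hostShaderBlob;      // precompiled target-platform code
    };

    // On first load: fetch the packfile for (romHash, targetPlatform) if one exists
    // and import its entries into the local cache. On a local miss during play:
    // compile locally, then asynchronously push the new (key, value) pair upstream.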
The best experience so far is downloading an additional shader cache alongside the ROM - some ROM formats can even bundle the cache with the ROM itself, where it acts like a dictionary and loads straight into the emulator instead of having to be added as a "mod" for that specific game. Adding this to a DHT-type network for "shader sharing" would be great but might open the door to some abuse (shaders run at the hardware level and there are some examples of malicious shaders out there) - plus you'd be exposing the games you're playing to the DHT network.
Anyway - Just a succinct example of the level of effort that goes into making an emulator "just work".
I don't want to be snippy, but — I don't think you understood the rest of the paragraph you're attempting to rebut here, since this is exactly (part of) what I said myself. (I wouldn't blame you if you didn't understand it; the concept of "concolic execution" is probably familiar to maybe ~50000 people worldwide, most of them people doing capital-S Serious static-analysis for work in cryptanalysis, automated code verification, etc.)
To re-explain without the jargon: you wouldn't be "mining" the shaders as data-at-rest; rather, you'd be "running" the ROM under a semi-symbolic (symbolic+concrete — concolic!) interpreter, one that traverses all possible code-paths "just enough" times to see all "categories of states" (think: a loop's base-case vs its inductive case vs its breakout case.) You'd do this so that, for each "path of states" that reaches an instruction that tells the console's GPU "this here memory, this is a shader now", the interpreter could:
1. look back at the path that reached the instruction;
2. reconstruct a sample (i.e. with all irrelevant non-branch-determinant values fixed to placeholders) concrete execution trace; and then
3. concretely "replay" that execution trace, using the emulator itself (but with no IO peripherals hooked up, and at maximum speed, and with no need for cycle-accurate timed waits since inter-core scheduling is pre-determined in the trace);
4. which would, as a side-effect, "construct" each piece of shader object-code into memory — at a place where the interpreter is expecting it, given the symbolic "formula" node that the interpreter saw passed into the instruction ("formula node": an AST subtree built out of SSA-instruction branch-nodes and static-value leaf-nodes, referenced by a versioned Single-Static-Information cell, aliasable into slices within CPU-register ADTs, or into a layered+sparse memory-cell interval-tree ADT);
5. so that the interpreter can then pause concrete emulation at the same "load this as a shader" instruction; reach into the emulator's memory where the "formula node" said to look; and grab the shader object-code out.
If you know how the AFL fuzzer works, you could think of this as combining "smart fuzzing" (i.e. looking at the binary and using it to efficiently discover the "constraint path" of branch-comparison value-ranges that reaches each possible state); with a graph-path-search query that "directs" the fuzzer down only paths that reach states we're interested in (i.e. states that reach a GPU shader load instruction); and with an observing time-travelling debugger/tracer connected, to then actually execute the discovered "interesting" paths up to the "interesting" point, to snapshot the execution state at that point and extract "interesting" data from it.
---
Or, at least, that's how it works in the ideal case.
(In the non-ideal case, it's something you can't resolve because the "formula" contains nodes that reference things the interpreter can't concretely emulate without combinatoric state-space explosion — e.g. "what was loaded from this save file created by an earlier run of the game process"; or maybe "what could possibly be in RAM here" when the game uses multiple threads and IPC, and relies on the console OS to pre-emptively schedule those threads, so that "when a message arrives to a thread's IPC inbox" becomes non-deterministic. So this wouldn't work for every game. But it could work for some. And perhaps more, if you can have your concolic interpreter present a more-stable-than-reality world by e.g. "coercing processors into a fake linear clock that always pulses across the multiple CPU cores in a strict order each cycle"; or "presenting a version of the console's OS/BIOS that does pre-emptive thread scheduling deterministically"; etc.)
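A deliberately tiny sketch of that ideal-case vs. non-ideal-case distinction, using an invented toy representation (no real console ISA; the actual path exploration and constraint solving are elided). Memory cells are either concrete or unknown ("symbolic"); when a path reaches the "load this region as a shader" operation, the shader is captured only if its address, length, and bytes are all concretely determined on that path.

    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <optional>
    #include <vector>

    struct Cell { bool known = false; uint8_t value = 0; };

    struct LoadShaderOp { uint32_t addr; uint32_t len; bool addrIsConcrete; };

    std::optional<std::vector<uint8_t>> TryCaptureShader(
            const std::vector<Cell>& mem, const LoadShaderOp& op) {
        if (!op.addrIsConcrete)                 // address depends on e.g. save data / IPC
            return std::nullopt;                // -> can't resolve statically
        std::vector<uint8_t> blob;
        for (uint32_t i = 0; i < op.len; ++i) {
            const Cell& c = mem.at(op.addr + i);
            if (!c.known) return std::nullopt;  // bytes not statically determinable
            blob.push_back(c.value);
        }
        return blob;                            // precompile + publish under the ROM hash
    }

    int main() {
        std::vector<Cell> mem(64);
        for (uint32_t i = 16; i < 24; ++i) mem[i] = {true, uint8_t(i)};  // "shader" bytes
        auto ok   = TryCaptureShader(mem, {16, 8, true});   // ideal case: fully concrete
        auto fail = TryCaptureShader(mem, {32, 8, true});   // bytes unknown -> give up
        std::printf("captured %zu shader bytes; second case resolved: %s\n",
                    ok ? ok->size() : std::size_t{0}, fail ? "yes" : "no");
    }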
Compilation stutter was perhaps less noticeable in the DX9/OpenGL 3 era because shaders were less capable, and games relied more on fixed functionality which was implemented directly in the driver. Nowadays, a lot of the legacy API surface is actually implemented by dynamically written and compiled shaders, so you can get shader compilation hitches even when you aren’t using shaders at all.
In the N64 era of consoles, games would write ISA (“microcode”) directly into the GPU’s shared memory, usually via a library. In Nintendo’s case, SGI provided two families of libraries called “Fast3D” and “Turbo3D”. You’d call functions to build a “display list”, which was just a buffer full of instructions that did the math you wanted the GPU to do.
I think a big part of the user-visible difference in stutter is simply the expected complexity of shaders and number of different shaders in an "average" scene - they're 100s of times larger, and CPUs aren't 100s of times faster (and many of the optimization algorithms used are more-than-linear in terms of time vs the input too)
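To put rough, made-up numbers on that: if the optimizer is roughly quadratic in shader size, a shader 100x larger costs on the order of 10,000x more compile time, while a CPU that is, say, 50x faster only buys back a factor of 50; each compile still ends up a couple of hundred times slower in wall-clock terms.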
Modern DXIL and SPIR-V are at a similar level of abstraction to DXBC, and certainly don't "solve" stutter.
In DirectX on PC, shaders have been compiled into an intermediate form going back to Direct3D 8. All of these intermediate forms are lowered into an ISA-specific instruction set by the drivers.
This final compilation step is triggered lazily when a draw happens, so if you are working on a "modern" engine that uses thousands of different material types your choices to handle this are to a) endure a hiccup as these shaders are compiled the first time they are used, b) force compilation at a load stage (usually by doing like a 1x1 pixel draw), or c) restructure the shader infrastructure by going to a megashader or similar.
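A rough sketch of option (b), using made-up engine helpers rather than any real graphics API; the point is just that a bind plus a throwaway draw at load time is what triggers the driver's deferred compile:

    #include <vector>

    struct Material { int vsHandle = 0; int psHandle = 0; };

    // Hypothetical wrappers; a real engine would call D3D/Vulkan/Metal here.
    void BindOffscreen1x1Target()     { /* stub */ }
    void BindShaders(const Material&) { /* stub */ }
    void DrawOnePixel()               { /* stub: any trivial draw will do; it's the
                                           bind + draw that triggers compilation */ }

    void PrewarmPipelines(const std::vector<Material>& allMaterials) {
        BindOffscreen1x1Target();
        for (const Material& m : allMaterials) {
            BindShaders(m);   // first time the driver sees this shader combination
            DrawOnePixel();   // forces the ISA compile now instead of mid-gameplay
        }
    }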
When targeting Apple platforms, you can use the metal-tt tool to precompile your shaders to ISA. You give it a list of target triples and a JSON file that describes your PSO. metal-tt comes with Xcode, and is also available for Windows as part of the Game Porting Toolkit.
Unfortunately, most people don’t do that. They’re spoiled by the Steam monoculture, in which Steam harvests the compiled ISA from gamers’ machines and makes it available on Valve’s CDN.
No, that's not correct. In fact, it's mostly the other way around. Consoles have known hardware and thus games can ship with precompiled shaders. I know this has been done since at least the PS2 era, since I enjoy taking apart game assets.
While on PC, you can't know what GPU is in the consumer device.
For example, Steam has this whole concept of precompiled shader downloads in order to mitigate the effect for the end user.
That's what I said. Consoles ship GPU machine code, PCs ship textual shaders (in the case of OpenGL) or some intermediate representation (DXIL, DXBC, SPIRV, ...)
https://github.com/KhronosGroup/Vulkan-Docs/blob/main/propos...
The gist of it is that graphics APIs like DX11 were designed around pipelines being compiled in pieces, each piece representing a different stage of the pipeline. These pieces are then linked together at runtime just before the draw call. However, the pieces are rarely a perfect fit, requiring the driver to patch them or do further compilation, which can introduce stuttering.
In an attempt to further reduce stuttering and to reduce complexity for the driver, Vulkan did away with these piece-meal pipelines and opted for monolithic pipeline objects. This allowed the application to pre-compile the full pipeline ahead of time, relieving the driver of having to piece the pipeline together at the last moment.
If implemented correctly you can make a game with virtually no stuttering. DOOM (2016) is a good example where the number of pipeline variants was kept low so it could all be pre-compiled and its gameplay greatly benefits from the stutter-free experience.
This works great for a highly specialized engine with a manageable number of pipeline variants, but for more versatile game engines and for most emulators, pre-compiling all pipelines is untenable: the number of permutations between the different variations of each pipeline stage is simply too great. For these applications there was no option but to compile the full pipeline on demand and cache the result, making the stutter worse than before, since there is no ability to do piece-meal compilation of the pipeline ahead of time.
This gets even worse for emulators that attempt to emulate systems where the pipeline is implemented in fixed-function hardware rather than programmable shaders. On those systems the games don't compile any piece of the pipeline; the game simply writes to a few registers to set the pipeline state right before the draw call. Even piece-meal compilation won't help much here, so ubershaders were used instead to emulate a great number of hardware states in a single pipeline.
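One way to picture that last point (invented register names, not any real console's layout): on a fixed-function guest GPU, the "identity" of a pipeline is just the register state snapshotted at draw time, which the emulator hashes into a key for its compile-and-cache step.

    #include <cstddef>
    #include <cstdint>

    struct GuestPipelineRegs {   // illustrative subset only, not a real console's layout
        uint32_t blendMode;
        uint32_t texCombineStages;
        uint32_t alphaTest;
        uint32_t depthCompare;
    };

    // FNV-1a over the raw register block: each distinct register state becomes a
    // distinct key, and each distinct key is a pipeline to generate, compile and cache.
    uint64_t PipelineKey(const GuestPipelineRegs& regs) {
        const uint8_t* bytes = reinterpret_cast<const uint8_t*>(&regs);
        uint64_t hash = 0xcbf29ce484222325ull;
        for (size_t i = 0; i < sizeof(regs); ++i) {
            hash ^= bytes[i];
            hash *= 0x100000001b3ull;
        }
        return hash;   // look this up in the cache right before the emulated draw call
    }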
Driver caches mean that after everything gets "prewarmed", it won't happen again.
That's not how any modern GPU works though. Instead, you have to emulate this semi-fixed-function pipeline with shaders. Emulators try to generate shader code for the current GPU configuration and compile it, but that takes time and can only be done after the configuration was observed for the first time.

This is where "Ubershaders" enter the scene: they are a single huge shader which implements the complete configurable semi-fixed-function pipeline, so you pass in the configuration registers to the shader and it acts accordingly. Unfortunately, such shaders are huge and slow, so you don't want to use them unless it's necessary. The idea is then to prepare "ubershaders" as fallback, use them whenever you see a new configuration, compile the real shader and cache it, and use the compiled shader once it's available instead of the ubershader, to improve performance again.

A few years ago, the developers of the Dolphin emulator (GameCube/Wii) wrote an extensive blog post about how this works: https://de.dolphin-emu.org/blog/2017/07/30/ubershaders/
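A rough sketch of that fallback-and-swap logic (invented names, far simpler than what Dolphin actually does): draw with the ubershader while the specialized pipeline compiles on a worker thread, then switch over once it's ready.

    #include <chrono>
    #include <cstdint>
    #include <future>
    #include <unordered_map>

    struct Pipeline { int handle = 0; };   // stand-in for a host pipeline/program object

    // Placeholder for the slow specialized compile; runs on a worker thread below.
    Pipeline CompileSpecialized(uint64_t configHash) {
        return Pipeline{ static_cast<int>(configHash & 0xFFFF) };
    }

    class PipelinePicker {
    public:
        explicit PipelinePicker(Pipeline ubershader) : ubershader_(ubershader) {}

        // Called per draw with a hash of the guest's current pipeline configuration.
        Pipeline PickForDraw(uint64_t configHash) {
            if (auto it = ready_.find(configHash); it != ready_.end())
                return it->second;                       // specialized shader: fast path
            auto pending = pending_.find(configHash);
            if (pending == pending_.end()) {
                // New configuration: start compiling in the background.
                pending_.emplace(configHash,
                    std::async(std::launch::async, CompileSpecialized, configHash));
            } else if (pending->second.wait_for(std::chrono::seconds(0)) ==
                       std::future_status::ready) {
                ready_.emplace(configHash, pending->second.get());
                pending_.erase(pending);
                return ready_[configHash];
            }
            return ubershader_;   // slower but correct, and no compilation stall
        }

    private:
        Pipeline ubershader_;
        std::unordered_map<uint64_t, Pipeline> ready_;
        std::unordered_map<uint64_t, std::future<Pipeline>> pending_;
    };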
Only starting with the 3DS/Wii U did Nintendo consoles finally get "real" programmable shaders, in which case you "just" have to translate them to whatever you need for your host system. You still won't know which shaders you'll see until you observe the transfer of the compiled shader code to the emulated GPU. After all, the shader code is compiled ahead of time to GPU instructions, usually during the build process of the game itself; at least for Nintendo consoles, there are SDK tools to do this. This, of course, means there is no compilation happening on the console itself, so there is no stutter caused by shader compilation either - unlike in an emulation of such a console, which has to translate and recompile those shaders on the fly.
> How come this was never a problem for older [...] emulators?
Older emulators had highly inaccurate and/or slow GPU emulation, so this was not really a problem for a long time. Only once GPU emulation became accurate enough, using dynamically generated shaders for high performance, did shader compilation stutter become a real problem.
The N64 did in fact have a fully programmable pipeline. [1] At boot, the game initialized the RSP (the N64’s GPU) with “microcode”, which was a program that implemented the RSP’s graphics pipeline. During gameplay, the game uploaded “display lists” of opcodes to the GPU which the microcode interpreted. (I misspoke earlier by referring to these opcodes as “microcode”.) For most of the console’s lifespan, game developers chose between two families of vendor-authored microcode: Fast3D and Turbo3D. Toward the end, some developers (notably Factor5) wrote their own microcode.
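For flavor, a toy illustration of what "building a display list" amounts to (invented opcode names, not the real Fast3D/Turbo3D command set): the CPU appends fixed-size command words into a buffer, and the RSP microcode later walks that buffer and does the math.

    #include <cstdint>
    #include <vector>

    enum class Op : uint32_t { LoadVertices, DrawTriangle, EndList };

    struct Cmd { Op op; uint32_t a, b, c; };

    struct DisplayList {
        std::vector<Cmd> cmds;
        void LoadVertices(uint32_t ramAddr, uint32_t count) {
            cmds.push_back({Op::LoadVertices, ramAddr, count, 0});
        }
        void Triangle(uint32_t v0, uint32_t v1, uint32_t v2) {
            cmds.push_back({Op::DrawTriangle, v0, v1, v2});
        }
        void End() { cmds.push_back({Op::EndList, 0, 0, 0}); }
    };

    int main() {
        DisplayList dl;
        dl.LoadVertices(/*RAM addr*/ 0x80200000u, /*count*/ 3);
        dl.Triangle(0, 1, 2);
        dl.End();
        // The game would now hand dl.cmds to the RSP, whose microcode interprets it.
    }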
https://www.unrealengine.com/en-US/tech-blog/game-engines-an...