I see a lot of comments here saying how underwhelming it was. And that’s probably true.
But one thing always surprised me: the games released at the platform's end of life were gorgeous. Developers became so good at extracting all the power from the platform (and it had plenty, in its difficult-to-use way) that great things were achieved.
> A suggestion from me is to provide sources, and also maybe an epub of this
What do you mean?
I knew IBM was involved in the design of the Cell BE, but I had no idea a successor of IBM's token ring tech (at least the concept of it) lived on in it. I'm sure there was other hardware (probably mainframe hardware) in and before 2006 with similar interconnects.
The big issue with the Cell architecture is that it was designed to act as the GPU as well. Later in development they realized it wouldn't be powerful enough for those graphics and they'd still need a dedicated GPU in addition. That's why the Cell is such a franken-CPU compared to the vanilla IBM PowerPC it's based on.
The Cell architecture was also a product of its time. In the early 00s, when Cell development started, nobody would have expected x86 to make such leaps by the time the PS3 hit the market.
Was the PS3 the one that was banned from some countries? And wasn't the PS2 rumored to be used as a ballistic missile guidance chip for some country?
I do have the odd anecdote: way back in the day, I was in a CompUSA in Dearborn, MI and overheard a Middle Eastern guy at the counter asking if they had any PS2s. When they said no (this was a point where availability was low), he instead bought at least 5 (might have been 10?) PS1s.
Because emulators still work insanely hard to make those games work, even today.
Development kits for the Xbox 360 used Power Mac G5s, because they were the same architecture as the Xbox 360, and modern Xbox and PlayStation development kits use x86 processors, again because there's no change in architecture.
Granted, you can't easily get a computer with a Cell processor, but it isn't for lack of trying. Sony worked with IBM and Toshiba on designing and manufacturing the Cell processor, and all three developed products using it, but the only successful one was Sony's PlayStation 3, and its success was likely despite the Cell processor, not because of it.
Why go through the pain of designing such a thing, which makes life difficult for developers, when I don't think it would really have resulted in better performance?
[1] And home computers, but that ended a couple decades earlier than consoles.
i always found this very appealing, having blazing fast memory under programmer control. so i wonder: why don't we have that on other cpus?
"The local store does not operate like a conventional CPU cache since it is neither transparent to software nor does it contain hardware structures that predict which data to load."
I think the general term for this is scratchpad memory. https://en.wikipedia.org/wiki/Scratchpad_memory
This kind of indicates the problem with it. When switching tasks, each local store would have to be put into main RAM and the new task's local stores pulled back out. This would make switching tasks increasingly expensive. I believe the PS3 (and maybe all cell processors) dealt with this by not having tasks switch on the SPUs.
Pure speculation from my side, but I'd think that the advantages over traditional big register banks and on-chip caches are not that great, especially when you're writing 'cache-aware code'. You also need to consider that the PS3 was full of design compromises to keep cost down, e.g. there simply might not have been enough die space for a cache controller for each SPU, or the die space was more valuable spent on a few more kilobytes of static scratch memory than on the cache logic.
Also, AFAIK some GPU architectures have something similar, like per-core static scratch space; that's where restrictions come from such as uniform data per shader invocation being limited to 64 KB on some architectures, etc...
This is where a lot of their performance comes from.
The main disadvantage of such dedicated memory is inefficient usage compared to using that same amount of fast local memory to cache _all_ of main memory.
https://en.wikipedia.org/wiki/Xbox_360_technical_specificati...
In general most developers struggled to do much with it; it was just too small (combined with the fiddliness of using it).
PS2 programmers were very used to thinking this way, as it's how rendering had to be done. There are a couple of vector units, and one of them is connected to the GPU, so the general structure most developers followed was to have 4 buffers in the VU memory (I think it only had 16kb of memory or something pretty small). Essentially, in parallel you'd have:
1. New data being DMA'd in from main memory to VU memory (into, say, buffer 1/4).
2. Previous data in buffer 3/4 being transformed, lit, coloured, etc. and output into buffer 4/4.
3. Data from buffer 2/4 being sent/rendered by the GPU.
Then once the above had finished it would flip, so you'd alternate like:
Data in: B1 (main memory to VU)
Data out: B2 (VU to GPU)
Data process from: B3 (VU processing)
Data process to: B4 (VU processing)

then on the next pass:

Data in: B3
Data out: B4
Data process from: B1
Data process to: B2
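If it helps, the rotation looks roughly like this in C-ish pseudocode; dma_to_vu, run_vu_program, send_to_gs and wait_all are made-up stand-ins for the real DMA-chain/VIF/GIF mechanics, not actual SDK calls, stubbed out so the sketch compiles.

    /* Rough sketch of the quad-buffer rotation described above. */
    struct batch { int dummy; /* vertex data, counts, etc. */ };

    static void dma_to_vu(int buf, const struct batch *b) { (void)buf; (void)b; }
    static void run_vu_program(int src, int dst)          { (void)src; (void)dst; }
    static void send_to_gs(int buf)                       { (void)buf; }
    static void wait_all(void)                            { /* real code syncs via DMA tags */ }

    enum { B1, B2, B3, B4 };

    void render_batches(const struct batch *batches, int count)
    {
        int in   = B1;  /* buffer being filled by DMA from main memory   */
        int send = B2;  /* buffer being drained to the GPU               */
        int work = B3;  /* buffer whose verts the VU is transforming     */
        int out  = B4;  /* buffer the VU writes transformed verts into   */

        for (int i = 0; i < count; ++i) {
            dma_to_vu(in, &batches[i]);  /* upload the next batch            */
            run_vu_program(work, out);   /* transform/light the previous one */
            send_to_gs(send);            /* GPU draws an earlier one         */
            wait_all();

            /* flip: (in, send) and (work, out) swap roles */
            int t;
            t = in;   in   = work;  work = t;
            t = send; send = out;   out  = t;
        }
    }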
The VU has two pipelines running in parallel (float and integer), and every instruction takes an exact number of cycles; if you read a result before it is ready you stall the pipeline, so you had to painstakingly interleave and order your instructions to process three verts at a time and be very clever about register pressure etc.
There is obviously some clever syncing logic to allow all of this to work, allowing the DMA to wait until the VU kicks off the next GPU batch etc.
It was complex to get your head around, set up all the moving parts, and debug when it goes wrong. When it goes wrong it pretty much just hangs, so you had to write a lot of validators.

On PS2 you basically spend the frame building up a huge DMA list, and then at the end of the frame kick it off and it renders everything: the DMA will transfer VU programs to the VU, upload data to the VU, wait for it to process and upload the next batch, at the end upload the next program, upload settings to GPU registers, basically everything. Once that DMA is kicked off no more CPU code is involved in rendering the frame, so you have a MB or so of pure memory transfer instructions firing off, and if any of them are wrong you are in a world of pain.
Then, just to keep things interesting, throw in the fact that anything you write to memory is likely stuck in caches, and DMA doesn't see caches, so extra care has to be taken to make sure caches are flushed before using DMA.
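Roughly the shape of it, with made-up helper names; flush_data_cache() here stands in for whatever writes the data cache back (FlushCache on the PS2 EE, IIRC), and dma_append/dma_kick are illustrative, not real library calls.

    /* Rough shape of a PS2-style frame: build one big DMA chain in memory,
       flush the data cache so the DMA controller sees what you wrote, then
       kick it off.  All helper names are made up for illustration.         */
    #include <stdint.h>
    #include <string.h>

    static uint32_t dma_buf[256 * 1024] __attribute__((aligned(64)));  /* ~1 MB chain */
    static uint32_t *dma_ptr;

    static void dma_begin(void) { dma_ptr = dma_buf; }

    static void dma_append(const void *packet, size_t words)
    {
        memcpy(dma_ptr, packet, words * 4);  /* append a transfer packet to the chain */
        dma_ptr += words;
    }

    static void flush_data_cache(void) { /* write back the D-cache, e.g. FlushCache(0) on the EE */ }
    static void dma_kick(const void *chain) { (void)chain; /* start the DMA channel */ }

    void build_and_kick_frame(void)
    {
        dma_begin();

        /* during the frame, everything becomes packets in the chain:
           VU microprograms, vertex batches, GPU register settings, textures... */
        static const uint32_t example_packet[4] = { 0, 0, 0, 0 };  /* placeholder */
        dma_append(example_packet, 4);

        flush_data_cache();  /* anything still sitting in the cache is invisible to DMA */
        dma_kick(dma_buf);   /* from here the frame renders with no CPU involvement     */
    }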
It was a magical, horrible, wonderful, painful, joyous, impossible, satisfying, sickening, amazing time.
We do, it's called "cache" or "registers".
In some ways it's like cache: it has the latency of L1 cache (6 cycles), but it's fully deterministic in terms of access.
registers ok, but i want at least one megabyte of them :)
The PS3 only had 256MB of main memory, so you'd be pretty limited there. Memory bandwidth, great at the time, is pretty poor by today's standards (25 GB/s).