It's prefill; slow prefill kills agentic workloads dead.
If you have 100,000 tokens at ~150tok/s per the OP, you're looking at:
You have: 100000 / (150/s)
You want: hms
11 min + 6.6666667 sec
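For anyone without `units` installed, the same back-of-the-envelope estimate can be sketched in Python. This assumes the ~150 tok/s figure is a constant prefill rate over the whole prompt, which is a simplification; the function names are just illustrative.

```python
def prefill_seconds(prompt_tokens: int, tokens_per_second: float) -> float:
    """Time to process a prompt, assuming a constant prefill rate."""
    return prompt_tokens / tokens_per_second


def as_min_sec(seconds: float) -> tuple[int, float]:
    """Split a duration in seconds into whole minutes and leftover seconds."""
    minutes, rem = divmod(seconds, 60)
    return int(minutes), rem


# Figures from the comment above: 100,000 tokens at ~150 tok/s.
secs = prefill_seconds(100_000, 150)
minutes, rem = as_min_sec(secs)
print(f"{minutes} min + {rem:.1f} sec")
```

Swap in your own context size and measured prefill rate to see how quickly the wait grows for long agentic sessions.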
Which is quite a wait indeed.
- In 2017, the V100 was a ~$10,000 GPU. I believe there was a PCIe version, but this is probably so cheap because SXM2 is going to be harder to use;
- A 5090 has 1800GB/s of internal memory bandwidth (compared to 900GB/s in the 9-year-old GPU). Of course a 5090 is substantially more expensive;
- A 5090 has ~21k CUDA cores vs ~5k;
- The current $10k NVidia GPU is the RTX 6000 Pro w/ 96GB of VRAM. It has slightly more CUDA cores but is otherwise pretty much just a 5090. This is unsurprising. NVidia uses VRAM for market segmentation.
Consider this: in 5-10 years, the trillions spent on AI data centers will most likely be sold for scrap too. That's how short the runway is for OpenAI and Anthropic to recover that investment.
Anyway, I'm kind of impressed the author managed to get this all to work. I don't think it even would've occurred to me that someone had made an SXM2 adapter, particularly because it's not even used anymore. Like props to whoever did that.
Even more interesting: it'll devalue all of SaaS and the entire US tech sector.
We might have just shot our most valuable non-AI tech products in the foot.
Had to stop there. Annoying. I can't stand AI use for writing. It makes any otherwise great article feel so disingenuous.
The thought of throwing away working cards sounds so bizarre to me. I can't believe companies would dispose of them in a landfill like that; they are at least worth giving away for reuse.
Because humans write exactly like this /s
The project is still very cool, but it’s a little less enjoyable to read when everything sounds the same. It would be just as annoying for people to manually write in a corporate/marketing style, because humanity is what makes the small web interesting.
Not from individual human content, that's for sure - maybe MLM marketing copy? Sleazy 4AM ads?
I mean, every time this response comes up, I keep asking the person to point at something written prior to 2022 that gets 80%+ on the LLM detectors, and yet no one can find anything.
Maybe you, postalrat, can find something written in this style that was published prior to 2022.
Classic LLM writing style.
Isn't a Raspberry Pi with 16GB of RAM $300 now?