[1]https://www.servethehome.com/nextsilicon-maverick-2-brings-d...
For a minute I thought maybe it was RISC-V with a big vector unit, but it's way different from that.
In the past, by the time special chips were completed and mature, the developers of "mainstream" CPUs had typically caught up in speed, which is why we do not see any "transputers" (e.g. the Inmos T800), LISP machines (Symbolics XL1200, TI Explorer II), or other odd architectures like the Connection Machine CM-2 around anymore.
For example, when Richard Feynman was hired to work on the Connection Machine, he had to write a parallel version of BASIC first before he could write any programs for the computer they were selling: https://longnow.org/ideas/richard-feynman-and-the-connection...
This may also explain failures like Bristol-based CPU startup Graphcore, which was acquired by Softbank, but for less money than the investors had put in: https://sifted.eu/articles/graphcore-cofounder-exits-company...
Modifications are likely on the level of: does this clang support my required C++ version? Actual work is only required when you want to bring something else, like Rust (AFAIK not supported).
However, to analyze the efficiency of the code and how it is interpreted by the card you need their special toolchain. Debugging also becomes less convenient.
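To make the "does this clang support my required C++ version" check concrete, here is a minimal sketch (plain standard C++, nothing NextSilicon-specific assumed): a translation unit that fails the build early if the vendor toolchain doesn't implement the standard you need.

    // standard_check.cpp -- fail the build early if this clang does not
    // implement the C++ standard the code base needs (C++17 used as an
    // example; 201703L is the value the standard mandates for __cplusplus).
    static_assert(__cplusplus >= 201703L,
                  "this code base requires at least C++17");

    #include <optional>   // C++17 header; older toolchains stop here

    int main() {
        std::optional<int> probe = 42;          // trivial C++17 feature test
        return probe.value_or(0) == 42 ? 0 : 1;
    }

Compile it with something like clang++ -std=c++17 standard_check.cpp against the vendor's compiler; if that passes, the "modifications" really are mostly a non-event.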
If they provide a compiler port and update things like BLAS to support their hardware then higher level applications should not require much/any code modification.
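As a sketch of why (generic CBLAS code, no vendor-specific API assumed): application code like this never mentions the hardware at all, so switching to an accelerated BLAS is just a matter of linking against a different library, e.g. -lopenblas today versus a vendor BLAS tomorrow.

    // blas_demo.cpp -- application code written against the standard CBLAS
    // interface. The source stays identical whichever BLAS implementation
    // it is linked against.
    #include <cblas.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 2;
        std::vector<double> A = {1, 2, 3, 4};   // 2x2, row-major
        std::vector<double> B = {5, 6, 7, 8};
        std::vector<double> C(n * n, 0.0);

        // C = 1.0 * A * B + 0.0 * C
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A.data(), n, B.data(), n,
                    0.0, C.data(), n);

        std::printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  // 19 22 / 43 50
        return 0;
    }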
So really, the Mill-Core is in a way the expression of the customer's code.
I can't access the page directly, because my browser doesn't leak enough identifying information to convince Reuters I'm not a bot, but an actual bot is perfectly capable of accessing the page.
I guess it is hard to compare chip for chip, but the question is: if you are building a supercomputer (and we ignore the pressure to buy sovereign), then which is better bang for the buck on representative workloads?
Support for operating systems, compilers, programming languages, etc.
This is why a Raspberry Pi is still so popular even though there are a lot of cheaper alternatives with theoretically better performance. The software support is often just not as good.
"It uses technology called RISC-V, an open computing standard that competes with Arm Ltd and is increasingly being used by chip giants such as Nvidia and Broadcom."
So the fact that rpi tooling is better than the imitators and it has maintained a significant market share lead is relevant. Market share isn't just about performance and price. It's also about ease of use and network effects that come with popularity.
ARM, x86, and CUDA-capable stuff is available off the shelf at Best Buy. This means researchers don't need massive grants or tremendous corporate investment to build proofs of concepts, and it means they can develop in their offices software that can run on bigger iron.
IBM's POWER series is an example of what happens when you don't have this. Minimum spend for the entry-level hardware is orders of magnitude higher than the competition, which means, practically speaking, you're all-in or not at all.
CUDA is also a good example of bringing your product to the users. AMD spent years locking ROCm behind weird market-segmentation games, and even today, if you look at the 'supported' list in the ROCm documentation, it only shows a handful of ultra-recent cards. CUDA, meanwhile, happily ran on your ten-year-old laptop, even if it didn't run great.
People need to be able to discover what makes your hardware worth buying.
The main product/architecture discussed has nothing to do with vector processors or RISC-V.
It's a new, fundamentally different data-flow processor.
Hopefully we will improve in explaining what we do and why people may want to care.
Systolic arrays often (always?) have a predefined communication pattern and are typically used for problems where the data that passes through them is also retained in some shape or form.
For NextSilicon, the ALUs are reconfigured and rewired to express the application (or parts of it) on the parallel data-flow accelerator.
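For contrast, here is a toy sketch of what a fixed systolic communication pattern looks like (textbook output-stationary matrix multiply, not NextSilicon's design): every cycle every PE does the same thing and its neighbours never change, whereas in the data-flow approach it is the wiring between ALUs that gets reshaped to fit the application.

    // systolic_sketch.cpp -- toy simulation of a 2x2 output-stationary
    // systolic array computing a matrix product. The point is the *fixed*
    // pattern: each cycle every PE multiplies its inputs, accumulates
    // locally, then forwards the A operand right and the B operand down.
    #include <cstdio>

    int main() {
        constexpr int N = 2;
        double A[N][N] = {{1, 2}, {3, 4}};
        double B[N][N] = {{5, 6}, {7, 8}};

        double acc[N][N]   = {};   // result accumulated inside each PE
        double a_reg[N][N] = {};   // A value currently held by PE(i,j)
        double b_reg[N][N] = {};   // B value currently held by PE(i,j)

        // Run enough cycles for the skewed wavefront to drain through.
        for (int t = 0; t < 3 * N; ++t) {
            // Shift phase: operands move right / down (reverse order so a
            // value is forwarded before it gets overwritten).
            for (int i = N - 1; i >= 0; --i)
                for (int j = N - 1; j >= 0; --j) {
                    a_reg[i][j] = (j == 0) ? 0.0 : a_reg[i][j - 1];
                    b_reg[i][j] = (i == 0) ? 0.0 : b_reg[i - 1][j];
                }
            // Inject skewed inputs at the edges: row i of A and column j
            // of B enter i (resp. j) cycles late -- the classic skew.
            for (int i = 0; i < N; ++i) {
                int k = t - i;
                a_reg[i][0] = (k >= 0 && k < N) ? A[i][k] : 0.0;
            }
            for (int j = 0; j < N; ++j) {
                int k = t - j;
                b_reg[0][j] = (k >= 0 && k < N) ? B[k][j] : 0.0;
            }
            // Compute phase: identical work in every PE, every cycle.
            for (int i = 0; i < N; ++i)
                for (int j = 0; j < N; ++j)
                    acc[i][j] += a_reg[i][j] * b_reg[i][j];
        }

        for (int i = 0; i < N; ++i)
            std::printf("%g %g\n", acc[i][0], acc[i][1]);  // 19 22 / 43 50
        return 0;
    }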
GreenArray processors are complete computers with their own memory and running their own software. The GA144 chip has 144 independently programmable computers with 64 words of memory each. You program each of them, including external I/O and routing between them, and then you run the chip as a cluster of computers.
One distinction of the GreenArrays chip is that they claim it is very energy-efficient.
In a way, this is not new; it's pretty much what Annapurna did: they took ARM and got serious with it, creating the first high-performance ARM CPUs. Then they got acqui-hired by Amazon and the rest is history ;)
From a cursory read-through, it isn’t clear where the high-leverage point is in this silicon. What is the thing, at a fundamental level, that it does better than any other silicon? It seems pretty vague. I’m not saying it doesn’t have one, just that it isn’t obvious from the media slop.
What’s the specific workload where I can beat any other silicon at the same task if I write the software to fit the silicon?
In other words, while the JIT can be applied to all code in principle, the nature of accelerator hardware is that it only really makes sense where embarrassingly parallel workloads exist.
Having said that, NextSilicon != GPU, so it takes a different approach to accelerating said parallel code.
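For readers who haven't met the term, "embarrassingly parallel" just means loops like the sketch below (plain OpenMP used purely as an illustration; nothing about NextSilicon's actual toolchain is assumed): every iteration is independent, so the work can be spread across however many execution units the hardware offers without any communication between them.

    // parallel_sketch.cpp -- what an embarrassingly parallel workload looks
    // like: a SAXPY-style update where iteration i touches only x[i] and
    // y[i], so there are no dependencies to resolve. Build with e.g.
    //   clang++ -fopenmp parallel_sketch.cpp
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 1 << 20;
        std::vector<double> x(n, 1.0), y(n, 2.0);
        const double a = 3.0;

        // Each iteration is fully independent of every other iteration.
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];

        std::printf("y[0] = %g (expected 5)\n", y[0]);
        return 0;
    }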