FilterHN

ARM adds neural accelerators to GPUs

176 points

by dagmx

6 days ago

| 12 comments

| newsroom.arm.com

▲

armchairhacker

3 days ago

[-]

I think ML has lots of potential in this area specifically.

Imagine a game with bare-bones graphics and lighting, and a NN that converts it into something pretty. Indie developers can make AA-looking games and all game developers can devote more effort to design and logic. Artists will still be needed for art direction and possibly fine-tuning, although fewer will be needed for each game (also fewer developers needed with AI agents and better tools).

Related, ML also has potential for AI enemies (and allies). Lots of players still prefer multiplayer, in part because humans are more realistic enemies (but also because they want to beat real humans); but multiplayer games struggle because good netcode is nontrivial, servers are expensive, some players are obnoxious, and most games don’t have a consistent enough playerbase.

▲

mdp2021

3 days ago

[-]

> Imagine a game with bare-bones graphics and lighting

https://eu-images.contentstack.com/v3/assets/blt740a130ae3c5...

# The Art Of Braid: Creating A Visual Identity For An Unusual Game

https://www.gamedeveloper.com/design/the-art-of-braid-creati...

▲

cyanydeez

2 days ago

[-]

I don't think you have a realistic view of how this will be used.

First, porn.

Second, artificial botting to make your game look active.

Third, hire an art developer in India, VPN them to your AI tool, fire them when the game is done.

You really should check your prescription rose colored glasses.

▲

serf

2 days ago

[-]

>Second, artificial botting to make your game look active.

MMOs have been using artificial players produced by the developers since at least the early EverQuest days.

The choice space in an MMO isn't that great; it's trivial to make a realistic-acting NPC that mimics player behaviors and hand-wave the poor language capability as the other player being unable to understand your chosen language.

NCSoft was involved in things like this in the early Lineage days and were fined for it. I would have a real hard time thinking this behavior is now uncommon given how low the fruit is.

Deck an NPC-Player in the most expensive cash-shop goods and have it stand around in a social area doing emotes but otherwise silent just to make the other players jealous and apt to purchase goods -- self-generated whale-bait.

▲

viraptor

2 days ago

[-]

> artificial botting to make your game look active.

There's no reason to involve an NN in this one. We had convincing bots with varied behaviours for ages.

▲

BobbyJo

2 days ago

[-]

So? Why would that stop us from doing cool things for games?

▲

N_Lens

3 days ago

[-]

This article 2 links deep had better technical details -

https://community.arm.com/arm-community-blogs/b/mobile-graph...

Upscaling solution mainly targeted at mobile gaming, with an 'AI pipeline' for upscaling graphics (They claim 540p upscaled to 1080p at 4ms per frame). I'm a bit skeptical because this is a press release for chips that are in the works and claim to be releasing in DEC-26, and then on actual devices after that. So sounds more like a strategic/political move (Perhaps stock price related manoeuvring).

The Unreal Engine 5 plugin will allow previewing the upscaled effects, though, which will be nice for game developers.
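
To put the quoted figures (540p to 1080p at 4 ms per frame) in context, here is a rough arithmetic sketch. The 16:9 aspect ratio and the 60 fps target are assumptions for illustration, not numbers from the Arm post.

```python
# Figures from the comment above: 540p upscaled to 1080p at 4 ms per frame.
# Assumptions (not from the source): 16:9 aspect ratio, 60 fps target.

def pixel_count(height, aspect_w=16, aspect_h=9):
    """Number of pixels in a frame of the given vertical resolution."""
    width = height * aspect_w // aspect_h
    return width * height

def budget_fraction(cost_ms, fps):
    """Fraction of one frame's time budget consumed by a fixed per-frame cost."""
    frame_ms = 1000.0 / fps
    return cost_ms / frame_ms

src = pixel_count(540)     # 960 x 540
dst = pixel_count(1080)    # 1920 x 1080
scale = dst / src          # how many output pixels per rendered pixel
share = budget_fraction(4.0, 60)

print(f"{src} -> {dst} pixels ({scale:.0f}x)")
print(f"upscaling share of a 60 fps frame: {share:.0%}")
```

Under these assumptions the GPU shades a quarter of the output pixels, and the upscaling pass takes roughly a quarter of the frame time, so the trade only pays off if rendering at 540p saves more than that.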

▲

wmf

3 days ago

[-]

It's a copy of DLSS/FSR4 which are pretty well understood by now. As for the schedule, Arm always announces IP ahead of time.

▲

ksec

2 days ago

[-]

Is DLSS really that mature by now? I thought only DLSS 4 was good enough and we should still have ways to improve on it.

And there seems to be a lot of hate towards DLSS from the gaming community.

▲

wmf

2 days ago

[-]

DLSS 2.x is pretty good; I'd expect Arm NSS 1.0 to be similar to that.

▲

bobajeff

6 days ago

[-]

It sounds like this is geared towards games. However, I like the idea of exposing all of the ML features through Vulkan extensions rather than some proprietary API. Though I think exposing them through OpenCL extensions would work for me as well.

▲

pjmlp

2 days ago

[-]

Extension spaghetti is hardly any better when each vendor does its own way.

▲

cubefox

3 days ago

[-]

There are now at least three ways to accelerate machine learning models on consumer hardware:

  - GPU compute units (used for LLMs)
  - GPU "neural accelerators"/"tensor cores" etc (used for video game anti-aliasing and increasing resolution or frame rate)
  - NPUs (not sure what they are actually used for)

And of course models can also be run, without acceleration, on the CPU.

▲

colejohnson66

3 days ago

[-]

An "NPU" is a matrix multiplier accelerator. It removes some general-purpose stuff that GPUs provide in favor of more "AI"-useful units, like support for values a byte or smaller (e.g., FP4, INT4, etc.).
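
As a minimal sketch of what "values a byte or smaller" means in practice: inputs are quantized to small integers, the multiply-accumulate runs in integer arithmetic, and the result is rescaled at the end. The symmetric per-tensor scheme and the sample vectors below are illustrative assumptions, not a description of any particular NPU.

```python
# Illustrative only: symmetric per-tensor quantization, chosen for simplicity.
# Real accelerators use a range of schemes; nothing here is vendor-specific.

def quantize(values, bits=8):
    """Map floats to signed integers of the given width; return (ints, scale)."""
    qmax = 2 ** (bits - 1) - 1               # 127 for INT8, 7 for INT4
    peak = max(abs(v) for v in values) or 1.0
    scale = peak / qmax
    ints = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return ints, scale

def quantized_dot(a, b, bits=8):
    """Dot product accumulated in integers, rescaled to float once at the end."""
    qa, sa = quantize(a, bits)
    qb, sb = quantize(b, bits)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer multiply-accumulate
    return acc * sa * sb

a = [0.5, -1.0, 0.25, 0.75]
b = [1.0, 0.5, -0.5, 0.25]
exact = sum(x * y for x, y in zip(a, b))

print("float :", exact)
print("INT8  :", quantized_dot(a, b, bits=8))
print("INT4  :", quantized_dot(a, b, bits=4))
```

A matrix multiply is this dot product repeated for every output element, which is why narrower integers translate directly into smaller multipliers and less memory traffic, at the cost of some rounding error.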

▲

MobiusHorizons

2 days ago

[-]

I think NPUs are often aimed at efficient matmul performance. Not all implementations are significantly faster than vector units in the CPU, but they use much lower power. GPU acceleration is typically much faster than the CPU, but also higher power.

▲

cubefox

2 days ago

[-]

All three of them accelerate matrix multiplications actually.

▲

almostgotcaught

2 days ago

[-]

Anything that computes matmul faster than by hand technically accelerates matmul - so what's your point?

▲

atq2119

2 days ago

[-]

At least for desktop gaming, the tensor cores are in the GPU compute units (SM), same as for the big data center GPUs.

It seems ARM believe it makes sense to go a different route for mobile gaming.

▲

catgary

2 days ago

[-]

From what I can tell, NPUs are mostly being used by Microsoft to encourage vendor lock-in to the MicrosoftML/ONNX platform (similar to their DirectX playbook).

▲

jonas21

2 days ago

[-]

They're used a lot on mobile. Apple uses their "neural engine" NPU to power their on-device ML stuff and Samsung does something similar in their Exynos processors. Apple also exposes the NPU to developers via CoreML.

▲

pjmlp

2 days ago

[-]

Extension spaghetti is hardly any better, just because it says the same API name on the tin.

Google and Apple have been doing NPUs for a while now.

▲

bigyabai

2 days ago

[-]

Yeah, and the NPUs have displaced approximately 0% of the hardware demand for Nvidia products. Real snipe chase they've committed to.

Extension spaghetti is fine, I'd much rather end up with AI acceleration being handled like Vulkan than suffering a fate like Metal or DirectX.

▲

pjmlp

2 days ago

[-]

Failure to understand why developers chose CUDA is exactly why NVidia keeps selling.

Same applies to proprietary 3D APIs.

There is a reason why only FOSS devs make such a big fuss out of APIs, while professional game studios keep talking at GDC about how to take each piece of hardware to its limits, as they have since the 8-bit heterogeneous home game systems.

▲

jms55

1 day ago

[-]

For people not familiar with how ML is being used by games, check out this great and very recent SIGGRAPH 2025 course https://dl.acm.org/doi/suppl/10.1145/3721241.3733999. Slides are in the supplementary material section, and code is at https://github.com/shader-slang/neural-shading-s25.

Neural nets are great for replacing manually-written heuristics or complex function approximations, and 3d rendering is _full_ of these heuristics. Texture compression, light sampling, image denoising/upscaling/antialiasing, etc.

Actual "generative" AI in graphics is pretty rare, at least currently. That's more of an artist thing. But there's a lot of really great use cases for small neural networks (think 3-layer MLPs, absolutely nowhere near LLM-levels of size) to approximate expensive or manually-tuned heuristics in existing rendering pipelines, and it just so happens that the GPUs used for rendering also now come with dedicated NPU accelerator things.
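
To give a sense of how small a "3-layer MLP" is, here is a minimal pure-Python sketch of a forward pass plus a parameter count. The layer widths (8 inputs, two hidden layers of 32, 3 outputs), the ReLU activation, and the random weights are all hypothetical choices for illustration; they are not taken from the course linked above.

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Build a list of (weights, biases) layers with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-1, 1) / math.sqrt(n_in) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Evaluate the MLP: matrix-vector product per layer, ReLU between layers."""
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]      # ReLU on hidden layers only
    return x

def param_count(layers):
    """Total number of weights and biases across all layers."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)

sizes = [8, 32, 32, 3]            # hypothetical: 8 features in, RGB-like out
mlp = make_mlp(sizes)
out = forward(mlp, [0.1] * 8)

print("parameters:", param_count(mlp))
print("output:", out)
```

With these assumed widths the whole network is under 1,500 parameters, small enough to evaluate per pixel or per sample, which is the scale the comment is pointing at.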

▲

imbusy111

3 days ago

[-]

I figured there is a need for generating a lot of samples and building a predictive model per game for best results. Documentation confirms:

> Most of these corner cases can be resolved by providing the model with enough training data without increase the complexity and cost of the technique. This also enables game developers to train the neural upscalers with their content, resulting in a completely customized solution fine-tuned for the gameplay, performance, or art direction needs of a particular title.

Source: https://developer.arm.com/documentation/111019/latest/

▲

ltbarcly3

2 days ago

[-]

"Arm neural technology is an industry first, adding dedicated neural accelerators to Arm GPUs"

HiSilicon Kirin 970 had an NPU in like 2017. I think almost every performance-oriented Arm chip released in the last 5 years has had some kind of NPU on it.

I suspect they are using Arm here to mean "Arm-the-company-and-brand" not "Arm the architecture", which is both misleading and makes the claim completely meaningless.

▲

atq2119

2 days ago

[-]

The marketing speak isn't exactly clear, but I believe the point is that this is like an NPU inside of the GPU instead of next to it as a separate device. That would indeed be new, and I can see how it'd be beneficial to integration with games.

▲

ginko

2 days ago

[-]

“ARM” is the architecture. “Arm” is the company.

▲

adrian_b

2 days ago

[-]

Not really, i.e. not any more.

In all recent documents issued by the Arm company, "Arm" is used for the architecture, i.e. the architecture variants are named "Armv6", "Armv7", "Armv8", "Armv9".

▲

ksec

2 days ago

[-]

At this point, IMG / PowerVR isn't even used by MediaTek. Which means GPU on Mobile is just Apple, Qualcomm Adreno, ARM Mali. Still wish ARM had rebranded their Mali range.

Samsung Exynos uses AMD RDNA but I am not even sure if they are being used at all. Nvidia seems to have no interest in the market.

▲

ryao

2 days ago

[-]

Nvidia has the Tegra line, but the market is not interested in it outside of game consoles.

▲

msh

2 days ago

[-]

Or Qualcomm used their monopoly to keep nvidia out of phones.

▲

TiredOfLife

2 days ago

[-]

Before the Switch, basically every company made 1-2 Tegra products only to never use Nvidia again. Tegra was late and bad.

▲

dagmx

2 days ago

[-]

Qualcomm wouldn’t have to try to do that. Tegra was basically a side project for Nvidia that they barely cared about till the Switch came along.

▲

msh

2 days ago

[-]

The launch platform for Android tablets was Tegra chips. They were also popular in automotive.

▲

dagmx

2 days ago

[-]

Yeah and then NVIDIA got bored and didn’t iterate at the same rate as everyone else.

▲

TiredOfLife

2 days ago

[-]

Apple is basically PowerVR with serial numbers partially filed off

▲

flakiness

3 days ago

[-]

hardware-wise, this seems like a NVIDIA TensorCore? via https://huggingface.co/Arm/neural-super-sampling/blob/main/2...

- https://github.com/KhronosGroup/Vulkan-Docs/blob/5d386163f25... Adding tensor ops to the shader kernel vocabulary (SPIR-V). Promising.

- https://github.com/KhronosGroup/Vulkan-Docs/blob/5d386163f25... Adding a TensorFlow/NNAPI-like graph API. Good luck.

▲

cubefox

3 days ago

[-]

Yes

▲

xgkickt

1 day ago

[-]

I really would have thought we’d have ROP-less compute units or even texture sampling instructions on CPUs by now instead.

▲

Roark66

2 days ago

[-]

ARM adds... Ever since I saw my first Arm-based SoC (Rockchip RK3566), every SoC has come with an NPU accelerator. Usually pretty small ones: 0.5 TOPS (INT8) etc.

The novel thing seems to be that they will make it a part of the GPU? Really? Even my Samsung Galaxy S7 (quite a few years old by now) supported Vulkan and ran neural nets pretty well with Vulkan etc.

Where is the novelty?

▲

MobiusHorizons

2 days ago

[-]

Low power.

▲

westurner

2 days ago

[-]

How many TOPS/WHr?
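
On units: the usual efficiency figure is TOPS/W, which works out to tera-operations per joule. A small sketch of the arithmetic follows; the 0.5 TOPS value is the one mentioned earlier in this subthread, while the 1 W power draw and the 1e9 operations per inference are hypothetical placeholders, not measurements of any real chip.

```python
# 0.5 TOPS comes from the comment above; the power draw and per-inference
# operation count are hypothetical values chosen only to show the arithmetic.

def tops_per_watt(tops, watts):
    """Efficiency in TOPS/W, numerically equal to tera-operations per joule."""
    return tops / watts

def energy_per_inference_mj(ops_per_inference, tops, watts):
    """Energy in millijoules for one inference at full utilisation."""
    seconds = ops_per_inference / (tops * 1e12)
    return seconds * watts * 1e3

tops = 0.5                 # from the thread
watts = 1.0                # hypothetical
ops = 1e9                  # hypothetical model cost per inference

efficiency = tops_per_watt(tops, watts)
energy_mj = energy_per_inference_mj(ops, tops, watts)

print(f"{efficiency} TOPS/W")
print(f"{energy_mj:.1f} mJ per inference")
```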