Mark that out in 2D, with axes for input weight precision and activation weight precision, and you could perhaps run sweeps to find the best accuracy per parameter bit, or accuracy/speed, or some sweet spot that balances operating speed, accuracy, and model size.
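A rough sketch of what such a 2D sweep could look like, in Python. Everything here is an assumption for illustration: `sweep`, `pareto_front`, and especially `toy_evaluate`, which is a synthetic stand-in and not measured data. In practice `evaluate` would train or quantize a real model and return its measured accuracy and cost.

```python
from itertools import product

def sweep(evaluate, weight_bits_options, act_bits_options):
    """Evaluate every (weight_bits, act_bits) cell of the 2D grid."""
    results = []
    for wb, ab in product(weight_bits_options, act_bits_options):
        accuracy, cost = evaluate(wb, ab)
        results.append({"weight_bits": wb, "act_bits": ab,
                        "accuracy": accuracy, "cost": cost})
    return results

def pareto_front(points):
    """Keep configs that no other config beats on both accuracy and cost."""
    front = []
    for p in points:
        dominated = any(
            q["accuracy"] >= p["accuracy"] and q["cost"] <= p["cost"]
            and (q["accuracy"] > p["accuracy"] or q["cost"] < p["cost"])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p["cost"])

def toy_evaluate(weight_bits, act_bits):
    # Synthetic placeholder, NOT real results: accuracy improves with more
    # bits on either axis, cost is simply the total number of bits.
    accuracy = 1.0 - 2.0 ** -weight_bits - 2.0 ** -act_bits
    cost = weight_bits + act_bits
    return accuracy, cost

front = pareto_front(sweep(toy_evaluate, [2, 4, 8], [2, 4, 8]))
for p in front:
    print(p)
```

The "sweet spot" is then a point you pick off that front, depending on whether size, speed, or accuracy matters most for your target.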
Regarding your point that "90% of the benefit of KANs can be gained from a small variety of function shapes": even within the B-spline basis, the shapes are quite uniform. Much of the actual benefit of scaling up the basis size comes from learning more complex, piecewise-polynomial activation functions. Scaling up the number of basis functions (i.e. more granular intervals) also increases locality and allows the activation function's value across different parts of the domain to be learned semi-independently. (There obviously is a tradeoff here with overfitting.)
The number of basis functions (G+S) is largely what determines how expressive the activation is, as it relates to your point: "you could have a representation that scales from a standard relu perceptron though KANs to something with weighted inputs and fancy weighted activation functions."
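To make the G+S count and the locality point a bit more concrete, here is a minimal sketch of a uniform B-spline basis using the standard Cox-de Boor recursion. I'm assuming G is the number of grid intervals and S is the spline order (degree), as in common KAN implementations; the function name, the [-1, 1] domain, and the uniform extended knot vector are choices made for illustration, not taken from any specific codebase.

```python
def bspline_basis(x, grid_size, order, lo=-1.0, hi=1.0):
    """Evaluate all grid_size + order B-spline basis functions at x."""
    h = (hi - lo) / grid_size
    # Uniform knots, extended by `order` intervals on each side of [lo, hi].
    knots = [lo + (i - order) * h for i in range(grid_size + 2 * order + 1)]
    # Degree 0: indicator of each half-open knot interval.
    basis = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
             for i in range(len(knots) - 1)]
    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, order + 1):
        basis = [
            (x - knots[i]) / (knots[i + d] - knots[i]) * basis[i]
            + (knots[i + d + 1] - x) / (knots[i + d + 1] - knots[i + 1]) * basis[i + 1]
            for i in range(len(basis) - 1)
        ]
    return basis

values = bspline_basis(0.3, grid_size=5, order=3)
print(len(values))                          # G + S basis functions
print(sum(1 for v in values if v != 0.0))   # how many are active at this x
```

For any x inside the domain, at most S+1 of the G+S basis functions are nonzero. That is the locality: the coefficients of the other basis functions have no effect on the activation's value at that x, so different parts of the domain are learned semi-independently.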
Precision in the activation function is targeting a part of neural networks that you don't want. There are many other methods that work with high precision. You use neural networks because of their implicit bias toward regular solutions. That means there is a sweet spot at low precision that you're targeting.
However, on GPUs, KAN implementations are far less efficient than MLPs, since B-spline locality is hard to exploit and lookup operations aren't as efficient. This is your original point about MLPs training and running faster on modern architectures: each KAN layer is more expressive, but its poor hardware efficiency makes it a net negative (at least for current approaches).
On FPGAs, LUT lookups are cheap, so KANs' expressive layers map to very hardware-efficient implementations, and the resulting networks are thus much more compact and efficient than equivalent MLPs.
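As a software analogy for why lookups suit this, here is a minimal sketch of turning a learned 1D activation into a table: quantize the input to a fixed number of bits, then inference is just an index computation and a read. This is an illustration under my own assumptions (the names `build_lut` and `lut_apply`, `math.tanh` standing in for a learned activation, nearest-entry rounding), not the implementation from the work being discussed.

```python
import math

def build_lut(fn, in_bits, lo, hi):
    """Tabulate fn at 2**in_bits evenly spaced points on [lo, hi]."""
    n = 1 << in_bits
    step = (hi - lo) / (n - 1)
    return [fn(lo + i * step) for i in range(n)]

def lut_apply(table, x, lo, hi):
    """Approximate fn(x) by reading the nearest table entry."""
    n = len(table)
    idx = round((x - lo) / (hi - lo) * (n - 1))
    idx = max(0, min(n - 1, idx))  # clamp out-of-range inputs to the edges
    return table[idx]

# tanh stands in for a learned activation; 6 input bits gives 64 entries.
table = build_lut(math.tanh, in_bits=6, lo=-2.0, hi=2.0)
approx = lut_apply(table, 0.5, -2.0, 2.0)
print(len(table), abs(approx - math.tanh(0.5)))
```

The table size doubles with every extra input bit, which ties back to the precision discussion above: low input precision is what keeps the table small.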
On your second point: low precision is certainly viable for both inference and learning (as shown in our work), and quantization can even have a mild regularizing effect. However, task performance generally worsens with lower precision (here and across the literature): the use of low precision is fundamentally a result of the efficiency-performance tradeoff.
I'm still very skeptical of arguing for KANs as an eventual replacement, like I've seen some papers on the subject argue. The reduced depth may not be an advantage. For example, higher depth for standard neural networks doesn't just add to expressivity, it actually induces spectral sparsity bias. KANs have a bias of their own, but it is different, and is sometimes better, sometimes worse, depending on the task. If increasing depth turns out to be important, KANs might remain less efficient overall.
I wonder how much of that is not so much the overall task but the need to build up to a complex state where KANs can excel. If you consider the classic neural-net edge detector example, it's hard to imagine a KAN doing the task more efficiently. It seems like a necessary task as part of the overall process, but delegating a menial task to a more capable system is probably wasting resources.
One layer of conv2d might be enough to turn pixels into something that KANs manage better.
I've been trying to hit 100,000 tokens/s with a 3.28M dumb model, and even this is an order of magnitude too large to benefit.
It appears to be focused more on latency than throughput. Happy to be corrected.
EDIT: Oh, on second read, do you mean you're running the model on an FPGA?
But the blue monkeys are metal rods with radio and the forest matrix are forest wide fungal colonies.
One primary application of this work is in high-energy physics (https://home.cern/smarter-decisions-at-the-speed-of-collisio...). Ultrafast and real-time learning is also very applicable for problems in quantum computing, plasma control, etc. (https://arxiv.org/pdf/2602.02005).
As one example, I've shoved <100 parameter networks into driver code before and hand-tuned them to run in 10-20 nanoseconds. E.g., touchpad hardware tends to suck, especially as it ages, sometimes generating thousands of phantom events per second and causing drift and other such issues. Typically that's solved via careful tuning of hysteresis and other parameters, but the problem is actually very amenable to neural nets. It's easy to collect good-enough data en masse, and you can tune precision vs recall to bias heavily toward dropping more events without any issues (doing so has the effect of slightly slowing down the mouse pointer, which you can compensate for at the OS level where you adjust pointer speed) to achieve 100% reduction of the phantom events.
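The precision-vs-recall tuning can be sketched roughly like this, assuming the tiny network outputs a "phantom" score per event and you have labeled examples. The function name, the score convention (higher means more likely phantom), and the sample numbers are all made up for illustration.

```python
import math

def pick_drop_threshold(scores, is_phantom, target_recall=1.0):
    """Pick the highest threshold that still drops the target share of phantoms.

    Events with score >= threshold get dropped. Returns the threshold and the
    fraction of real (non-phantom) events that would be dropped along with them.
    """
    phantom_scores = sorted(
        (s for s, p in zip(scores, is_phantom) if p), reverse=True)
    if not phantom_scores:
        raise ValueError("need at least one phantom example")
    # Number of phantom events that must be caught to reach the target recall.
    needed = max(1, math.ceil(target_recall * len(phantom_scores)))
    threshold = phantom_scores[needed - 1]
    real_scores = [s for s, p in zip(scores, is_phantom) if not p]
    real_dropped = sum(s >= threshold for s in real_scores)
    frac = real_dropped / len(real_scores) if real_scores else 0.0
    return threshold, frac

# Made-up example scores, not real touchpad data.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
is_phantom = [True, True, True, False, False, False]
print(pick_drop_threshold(scores, is_phantom, target_recall=1.0))
```

The second return value is the cost you pay: the share of genuine events that get dropped, which is what shows up as a slightly slower pointer.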
Lots of image recognition tasks (like spotting undesirable products in industrial settings), image modification tasks (I have some models locally to process hand-drawn images and unwarp them, remove notebook paper lines, etc), audio modification tasks (part of my editing pipeline includes hand-editing audio to achieve some effect, doing that a few times, and training models to copy that edit), and all sorts of other things are similarly doable in much smaller models than you might think -- not as small as that driver code, but still small enough to fit in hobbyist FPGAs.
Not all of those require low latency or high throughput, but audio processing is expensive, so high throughput is nice; industrial applications often operate on fast streams of many products, so both throughput and latency are important; and more generally when you have fast models available (or any fast code really) you'll tend toward different thought patterns and creative ideas which you wouldn't have even considered otherwise and which wouldn't be possible without those faster solutions.
Now that I think about it, we average 1.5M inferences per second at $WORK, expected to scale up 10-30x this year, and we have a moderately tight latency budget. This solution wouldn't fit without a larger, more expensive FPGA, at least not unless KANs are comparatively that much more expressive than our current solution (based on past experimentation, my hunch is that they're not, but you never know), but it's borderline useful.
HN comments page on that is here: https://news.ycombinator.com/item?id=40219205
I'm a big fan of KANs. They really seem like the start of something big and new. We've got a couple of papers out and in the works on KANs. The most relevant to OP's is this one: https://arxiv.org/abs/2512.15742v2
And we just put up a general primer on KANs on YT: https://youtu.be/wgcSsJ69x1c?si=fiUl1YGTgaTt_bn9 Fun stuff if you want to get into the weeds.
And if you are really interested in KANs, you should really check out Ziming (KAN creator)'s blog: https://kindxiaoming.github.io/blog/
Is the work predefined, i.e., how much each person will do?
> Some have the group leader as the first name, or the 'big shot' is always first. My impression is that in medicine it is often a kind of ranking from 'most to least' main author
Does this affect your standing in the workplace or industry? When we read about papers in scientific journals, the news, or even arXiv, they are mostly referred to by the first author's name, like “potato-peeler et al”. Sure, it’s for brevity. But when you look at the authors, there may be 10 people listed. I have always wondered how they get recognised, since, like you mentioned, they all have to contribute. If they contribute but their names get swallowed up within “et al”, how does anyone know how much their contribution was?
Not everyone in quant is a centi-millionaire, probably almost none of them in r&d actually.
p.s. Thanks for posting this and welcome to HN!