Converting an Integer to a Decimal String in Under Two Nanoseconds
50 points | 4 days ago | 6 comments | onlinelibrary.wiley.com | HN
Nokinside
2 hours ago
Sounds familiar. Is one of the authors Lemire? Of course.

SIMD-accelerated integer-to-string conversion https://lemire.me/blog/2026/05/18/simd-accelerated-integer-t...

Other speedy things:

On-Demand JSON: A Better Way to Parse Documents? https://lemire.me/en/publication/arxiv231217149/

Parsing Millions of URLs per Second https://lemire.me/en/publication/arxiv231110533/

Transcoding Unicode Characters with AVX-512 Instructions https://lemire.me/en/publication/arxiv221205098/

reply
childintime
1 hour ago
What will be the lifetime of AVX-512? There have been many similar extensions before it. So it's a great result, but heavily marked by the target platform. I hope that the RISC-V vector extensions will prove to be the more durable substrate to develop on, and a result there would be much more relevant for the future.
reply
simonask
51 minutes ago
It will be literal decades before RISC-V becomes mainstream. Not because it’s not a perfectly fine ISA, but because business incentive structures are nowhere near supporting it.

Literal man-millennia have been poured into writing software for both x86 and ARM, and nobody seems close to designing a competitive RISC-V chip.

reply
xlii
3 hours ago
I wonder if this can be categorized as a galactic algorithm. I can't imagine systems where the bulk of processing goes into integer-to-decimal-string conversion, but maybe such systems exist.

https://en.wikipedia.org/wiki/Galactic_algorithm

reply
oersted
2 hours ago
My understanding of a Galactic Algorithm is that it has better performance scaling based on input size/complexity, but its overhead is such that it will not actually be faster unless you use it for impractically large inputs.

I don’t think it has much to do with the case of an algorithm that offers a faster solution to a problem that is rarely a bottleneck (not sure if that’s true in this case anyway).

reply
Tuna-Fish
2 hours ago
It takes a substantial amount of time when emitting lots of numbers in JSON, which happens very commonly.

And this algorithm has low constant costs, and does not take dramatically more icache than the simple versions. There is no reason not to use this if your compile target can handle AVX-512.
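
For reference, a minimal sketch of what a "simple version" looks like: the classic scalar loop that peels off one digit per division by 10. This is only an illustrative baseline (the name `to_decimal` is hypothetical), not the AVX-512 method from the paper.

```python
def to_decimal(n: int) -> str:
    """Convert a non-negative integer to its decimal string, one digit per step.

    Illustrative scalar baseline only; not the paper's SIMD algorithm.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, r = divmod(n, 10)              # peel off the least significant digit
        digits.append(chr(ord("0") + r))  # map 0..9 to '0'..'9'
    digits.reverse()                      # digits were produced last-to-first
    return "".join(digits)
```

The loop runs once per output digit, so the cost grows with the length of the number. That per-digit dependency chain is what vectorized approaches try to avoid.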

reply
superjan
33 minutes ago
It’s faster for 3 digits and more. 3 digits is not galactic scale. OTOH, if over half of your numbers are single digits, it will lose to other implementations. I think that is more often the case than we’d like it to be.
reply
adrian_b
2 hours ago
I always use binary interchange formats between programs so I am not familiar with the overhead caused by format conversions. Even when displaying numbers for reading them, in the case of floating-point numbers that are displayed in the "scientific" format, i.e. with exponents, I prefer to have only the exponent as a decimal number, but the significand as a hexadecimal number. So I do not need fast algorithms for number conversions.

Nonetheless, there are plenty of people who advocate the use of JSON, XML and similar formats, in which case I assume that number conversions can take a non-negligible time, which might be decreased by such fast algorithms.
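
One reading of the display format described above (hexadecimal significand, decimal exponent) is the hex-float notation that C's `%a` and Python's `float.hex()` produce. That is an assumption about what is meant. A minimal sketch, with the helper name `hex_scientific` being hypothetical:

```python
def hex_scientific(x: float) -> str:
    """Format a float with a hexadecimal significand and a decimal power-of-two exponent."""
    return x.hex()


# 6.0 = 1.5 * 2**2, and 1.5 is 0x1.8 in hexadecimal.
print(hex_scientific(6.0))  # 0x1.8000000000000p+2

# The notation is exact, so parsing it back recovers the same value.
print(float.fromhex(hex_scientific(0.1)) == 0.1)  # True
```

No decimal conversion of the significand is involved, which is why this format sidesteps the conversion cost entirely.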

reply
superjan
23 minutes ago
You know, if I could change the code at both ends of the pipeline without overhead, using the language & library of my choice, I’d do this too. For many of us this isn’t always the case.
reply
Cold_Miserable
3 hours ago
This is just a worse copy of the original IFMA method. Sneller is even better for max throughput.
reply
IshKebab
3 hours ago
Very impressive! But yeah AVX-512 is an awkward requirement.
reply
adrian_b
2 hours ago
There already exists a large installed base of AMD Zen 4 and Zen 5 CPUs.

Next year, these AVX-512 supporting CPUs will be joined by AMD Zen 6 and Intel Nova Lake. Starting with Intel Nova Lake, all future Intel CPUs will support AVX-512.

reply
IshKebab
54 minutes ago
Sure, it's not just the support though. As I understand it, it also has serious power and frequency implications. Also, if your process uses AVX-512 you suddenly have an extra 2 kB of data to save/restore on context switches. Maybe not super significant, but I really doubt this will ever make it into standard libraries.
reply
jqpabc123
4 days ago
> Our design exploits the AVX-512 instruction set

AVX-512 is being discontinued in newer Intel consumer CPUs, particularly with the Alder Lake series, where it has been completely disabled through BIOS updates.

reply
adrian_b
3 hours ago
Your comment is obsolete.

AVX-512 had been discontinued in the CPU generations from Alder Lake through the Panther Lake, Wildcat Lake and Clearwater Forest CPUs introduced during the first half of 2026, but Intel has committed that all future Intel CPUs will implement the complete 512-bit variant of the AVX-512 a.k.a. AVX10 ISA, starting with the Nova Lake desktop and laptop CPUs, to be launched by the end of this year.

Obviously, the competition from the AMD Zen 4, Zen 5 and Zen 6 CPUs, all of which implement AVX-512 and easily beat any Intel CPU in any workload that has been updated to use the AVX-512 ISA, has forced Intel to reconsider its previous decision.

reply
anematode
4 hours ago
On the contrary, Nova Lake, coming out this year, will have it.
reply
yvdriess
3 hours ago
And that's a shame, but the relevant workloads typically run on server class CPUs.
reply
adrian_b
3 hours ago
Of all the workloads that I execute on my laptops or desktops, there is only one where the speed matters yet it is not significantly affected by the use of the AVX-512 ISA: the compilation of big software projects.

All the other things that I do that can take noticeable CPU time (i.e. not time spent waiting on SSDs or other peripherals) can be accelerated by AVX-512. This includes things like computing file hashes, data compression and encryption algorithms, graphics/audio/video algorithms and also EDA/CAD applications.

reply