The problem with LLVM has always been that it takes a long time to produce code. The linked post promises a new backend that produces a slower artifact, but does so 10-20x faster. This is great for debug builds.
This doesn't mean compilation as a whole gets that much quicker, though. There are three steps in compilation:
- Front end: transforms source code into LLVM's intermediate representation (IR)
- Backend: this is where LLVM comes in. It accepts LLVM IR and transforms it into machine code
- Linking: a separate program links the artifacts produced by LLVM.
How long does each step take? That really depends on the program we're trying to compile. This blog post contains timings for one example program (https://blog.rust-lang.org/2023/11/09/parallel-rustc/) to give you an idea. It also depends on whether LLVM is asked to produce a debug build (not performant, but quicker to produce) or a release build (fully optimised, takes longer).
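To make the stages concrete, here's a toy C file you can push through each step by hand with clang (rustc exposes the equivalent stages through its --emit flag, and clang's -ftime-report will break down where the time goes):

    /* pipeline.c -- a toy program to run through each stage by hand. */
    #include <stdio.h>

    int main(void) {
        printf("hello\n");
        return 0;
    }

    /* Front end: clang -S -emit-llvm pipeline.c -o pipeline.ll
     *            (source -> LLVM IR, no machine code yet)
     * Backend:   clang -c pipeline.ll -o pipeline.o
     *            (LLVM IR -> machine code; compare -O0 vs. -O2 here)
     * Linking:   clang pipeline.o -o pipeline
     *            (invokes the system linker, e.g. ld, lld, or mold)
     */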
The 10-20x improvement described here doesn’t work yet for clang or rustc, and when it does it will only speed up the backend portion. Nevertheless, this is still an incredible win for compile times because the other two steps can be optimised independently. Great work by everyone involved.
I agree that front-ends are a big performance problem and both rustc and Clang (especially in C++ mode) are quite slow. For Clang with LLVM -O0, 50-80% is front-end time, with TPDE it's >98%. More work on front-end performance is definitely needed; maybe some things can be learned from Carbon. With mold or lld, I don't think linking is that much of a problem.
We now support most LLVM-IR constructs that are frequently generated by rustc (most notably, vectors). I just haven't gotten around to actually integrating it into rustc and getting performance data.
> The 10-20x improvement described here doesn’t work yet for clang
Not sure what you mean here, TPDE can compile C/C++ programs with Clang-generated LLVM-IR (95% of llvm-test-suite SingleSource/MultiSource, large parts of the LLVM monorepo).
This is the old "correctness versus performance" problem, and we already know that "faster but wrong" isn't meaningfully faster; it's just wrong. Anybody can give a wrong answer immediately, so that's not useful at all.
The really difficult thing would be to write a new compiler backend with a coherent IR that everybody understands and you'll stick to. Unfortunately, you can be quite certain that after you've done the incredibly hard work to build such a thing, a lot of people's assessment of your backend will be:
1. The code produced was 10% slower than LLVM, never use this, speed is all that matters anyway and correctness is irrelevant.
2. This doesn't support the Fongulab Splox ZV406 processor made for six years in the 1980s, whereas LLVM does, therefore this is a waste of time.
But why would you bother, when with those same skills and a lot less time, you could fork LLVM, correct its IR semantics yourself (unilaterally), and then push people to use your fork?
(I.e. the EGCS approach to forcing the upstream to fix their shit.)
> This doesn't support the Fongulab Splox ZV406 processor made for six years in the 1980s, whereas LLVM does, therefore this is a waste of time.
AFAIK, the various Fongulab Sploxes that LLVM has targets for, are mostly there to act as forcing functions to keep around features that no public backend would otherwise rely on, because proprietary, downstream backends rely on those features. (See e.g. https://q3k.org/lanai.html — where the downstream ISA of interest is indeed proprietary, but used to be public before an acquisition; so the contributor [Google] upstreamed an implementation of the old public ISA target.)
As to the first point, I suspect this is a foundational problem. Like, suppose you realise the concrete used to make a new skyscraper was the wrong mixture. In a sense this is a small change, there's nothing wrong with the elevators, the windows, cabling, furnishing, air conditioning, and so on. But, to "fix" this problem you need to tear down the skyscraper and replace it. Ouch.
I may be wrong, I have never tried to solve this problem. But I fear...
Move away from classical UNIX compiler pipelines.
These days, however, I would rather invest in improving LLMs' ability to generate executables directly. The time to mix AI into compiler development has come, and in terms of value, new classical programming languages are like doing yet another UNIX clone.
Also correctness guarantees? Hahaha... I'll pretend you didn't just claim C++ has correctness guarantees on par with other languages, LLVM or otherwise. C++ gives you next to nothing with respect to correctness guarantees.
But I can prove that this comment wasn’t LLM generated -> fuck you.
(LLMs don’t swear)
LLMs decline when asked to say fuck you. Gemini: “I am unable to respond to that request.” Claude: “I’d rather not use profanity unprompted.”
But allowing a fuck you would need a modification to the rules anyway, I suppose.
As far as I can tell it gets Pareto improvements far above LLVM, Cranelift, and any WebAssembly backend out there. You'd expect there to be a rush to either adopt their techniques or at least find arguments why they wouldn't work for generalist use cases, but instead it feels like the maintainers of the above projects have absolutely no curiosity about it.
Yes, there's no roadmap, but it's flat-out wrong that most contributions are from university students, because hardly any contributions are from university students. That's actually an issue, because we should be more supportive of student contributors! You can literally look at the contributors on GitHub and see that probably the top 100 are professionals contributing as part of their day job.
Thanks for pointing out that this is a pile of useless knowledge; maybe I should waste my time reading other, more appropriate stuff instead.
Should we go through an agenda?
I have no relation to the authors.
How come? The Copy-and-Patch Compilation paper reports:
> The generated code runs [...] 14% faster than LLVM -O0.
I don't have time right now to compare your approach and benchmark to theirs, but I would have expected comparable performance from what I had read back then.
The fact that they don't do a comparison against LLVM on larger benchmarks/functions or any other code they haven't written themselves makes that single number rather questionable for a general claim of being faster than LLVM -O0.
When used in a C/C++ compiler, the stencils correspond to individual (or a few) LLVM-IR instructions, which leads to bad runtime performance. Also, as mentioned, register allocation becomes a problem for the Copy-and-Patch approach on larger functions.
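For readers unfamiliar with the approach, here's a hedged mini-model in C of that granularity problem. Real copy-and-patch memcpy's pre-compiled machine-code stencils and patches operand "holes" into the bytes; this sketch only models the per-instruction chaining, which is exactly what prevents combining neighbouring operations (and the fixed "register" slots hint at why register allocation degrades on larger functions):

    #include <stdio.h>

    typedef struct insn insn;
    typedef void (*stencil_fn)(const insn *ip, long *regs);

    struct insn {
        stencil_fn fn;  /* which pre-compiled stencil to "copy" */
        int dst, a, b;  /* operands that would be "patched" in */
        long imm;
    };

    static void dispatch(const insn *ip, long *regs) { ip->fn(ip, regs); }

    /* One stencil per IR-level operation: nothing here can fuse the
       constant materialisation with the add, unlike real instruction
       selection. */
    static void st_const(const insn *ip, long *regs) {
        regs[ip->dst] = ip->imm;
        dispatch(ip + 1, regs);
    }

    static void st_add(const insn *ip, long *regs) {
        regs[ip->dst] = regs[ip->a] + regs[ip->b];
        dispatch(ip + 1, regs);
    }

    static void st_ret(const insn *ip, long *regs) { (void)ip; (void)regs; }

    int main(void) {
        long regs[4] = {0};
        const insn code[] = {
            { st_const, 0, 0, 0, 2  },  /* r0 = 2       */
            { st_const, 1, 0, 0, 40 },  /* r1 = 40      */
            { st_add,   2, 0, 1, 0  },  /* r2 = r0 + r1 */
            { st_ret,   0, 0, 0, 0  },
        };
        dispatch(code, regs);
        printf("%ld\n", regs[2]);  /* prints 42 */
        return 0;
    }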
I thought they used the same technique (pre-generating machine code snippets in a high-level language)?
Your work is greatly appreciated. With unit tests everywhere, faster compiling is more important than ever.
In JIT compilation, a fast baseline is always useful. LLVM is obviously not a great fit (the IR is slow to generate and inspect), but for projects that don't want to roll their own IR and use LLVM for optimized builds anyway, this is an easy way to drastically reduce the startup latency. (There is a JIT case study showing the overhead of LLVM-IR in Section 7/Fig. 10 in the paper.)
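To illustrate the pattern, here's a minimal sketch using LLVM's C API (MCJIT for brevity; real projects would use ORC, and as I understand it a baseline like TPDE-LLVM takes the same in-memory module and swaps out only the backend):

    /* jit_demo.c -- JIT a trivial sum(a, b) through LLVM.
       Build (roughly): cc jit_demo.c $(llvm-config --cflags --ldflags \
           --libs core executionengine mcjit native) -o jit_demo */
    #include <llvm-c/Core.h>
    #include <llvm-c/ExecutionEngine.h>
    #include <llvm-c/Target.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        LLVMLinkInMCJIT();
        LLVMInitializeNativeTarget();
        LLVMInitializeNativeAsmPrinter();

        /* Generating the IR -- the part that is slow to build and
           inspect compared to a bespoke baseline IR. */
        LLVMModuleRef mod = LLVMModuleCreateWithName("jit_demo");
        LLVMTypeRef i64 = LLVMInt64Type();
        LLVMTypeRef params[] = { i64, i64 };
        LLVMTypeRef fn_ty = LLVMFunctionType(i64, params, 2, 0);
        LLVMValueRef sum = LLVMAddFunction(mod, "sum", fn_ty);

        LLVMBuilderRef b = LLVMCreateBuilder();
        LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(sum, "entry"));
        LLVMBuildRet(b, LLVMBuildAdd(b, LLVMGetParam(sum, 0),
                                     LLVMGetParam(sum, 1), "tmp"));

        char *err = NULL;
        LLVMExecutionEngineRef ee;
        if (LLVMCreateExecutionEngineForModule(&ee, mod, &err)) {
            fprintf(stderr, "engine creation failed: %s\n", err);
            return 1;
        }

        /* MCJIT compiles the module to machine code on lookup -- the
           step a fast baseline backend speeds up. */
        long (*fp)(long, long) = (long (*)(long, long))
            (uintptr_t)LLVMGetFunctionAddress(ee, "sum");
        printf("%ld\n", fp(2, 40));  /* prints 42 */

        LLVMDisposeBuilder(b);
        LLVMDisposeExecutionEngine(ee);
        return 0;
    }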
> And if a project is not large one then build times should not be that much of a problem.
I disagree -- I'm always annoyed when my builds take longer than a few seconds, and typically my code changes involve fewer compilation units than I have CPU cores (even when working on LLVM). There's also this study [1] from Google, which claims that even modest improvements in build times improve productivity.
[1]: https://www.computer.org/csdl/magazine/so/2023/04/10176199/1...
I'm all for faster -O1 build times, though. Point taken.
(Which... that seems like an interesting idea, now that I think of it. What, if anything, could such a flag hint the compiler to do?)
Favor throughput over latency?
What does TPDE stand for? Even on their own pages (their GitHub, their docs, their landing pages), they never define it.
I wonder what such a "typical" subset is. How exotic does something have to be for it not to work?