But in practice C, Rust and Fortran are not really distinguishable on their own in larger projects, where things like data structures and libraries are going to dominate over slightly different compiler optimizations. This is usually Rust's `std` vs `libc` type stuff, or whatever foundational libraries you pull in.
For most practical Rust, C, C++, Fortran and Zig have about the same performance. Then there is a notable jump to things like Go, C# and Java.
The big one is multi-threading. In Rust, whether you use threads or not, all globals must be thread-safe, and the borrow checker requires memory access to be shared XOR mutable. When writing single-threaded code takes 90% of the effort of writing multi-threaded code, Rust programmers may as well sprinkle threads all over the place, regardless of whether that's a 16x improvement or a 1.5x improvement. In C, the cost/benefit analysis is different. Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/OpenMP. The risk of having to debug heisenbugs means that code typically won't be made multi-threaded unless really necessary, and even then it's preferably kept to simple cases or very coarse-grained splits.
I wouldn't consider there to be any notable effort in making threads build on target platforms in C, relative to normal effort levels in C, but it's objectively more work than `std::thread::spawn(move || { ... });`.
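For reference, here is the whole ceremony on the Rust side; a minimal sketch:

```rust
use std::thread;

fn main() {
    let handle = thread::spawn(move || {
        // Work happens on the new thread.
        (0..1_000u64).sum::<u64>()
    });
    // join() propagates panics and hands back the closure's result.
    let total = handle.join().unwrap();
    println!("{total}");
}
```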
Despite the benefits, I don't actually think memory safety really plays a role in the usage rate of parallelism. Case in point: Go gives you no data-race safety (both races and atomicity issues are easy to introduce), and yet it relies much more heavily on concurrency (with the degree of parallelism managed by the runtime) and with much less deliberation than Rust. After all, `go f()` is even easier.
(As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
Is that beyond just "concurrency is tricky and a language that makes it easier to add concurrency will make it easier to add sneaky bugs"? I've definitely run into that, but have never written concurrent C to compare the ease of heisenbug-writing.
This is my experience too.
In some ways, this is kind of the core observation of Rust: "shared xor mutable". Aliasing is only an issue if the aliasing leads to mutability. You can frame it in terms of aliasing if you have to assume all aliases can mutate, but if they can't, then that changes things.
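A minimal sketch of what the borrow checker enforces here:

```rust
// Shared XOR mutable: any number of readers, or exactly one writer,
// never both at the same time.
fn main() {
    let mut v = vec![1, 2, 3];
    let r1 = &v;
    let r2 = &v; // any number of shared borrows is fine
    println!("{r1:?} {r2:?}");
    let m = &mut v; // ok here: the shared borrows are no longer used
    m.push(4);
    // println!("{r1:?}"); // would not compile: shared borrow overlapping &mut
}
```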
If you do not make use of that, the generated code can be quite suboptimal in certain cases.
This does come with code bloat, so the Rust std sometimes exposes a generic function (which gets monomorphized) but internally passes it off to a non-generic function. This avoids the bulk of the underlying code being monomorphized for every type.
https://github.com/rust-lang/rust/blob/8c52f735abd1af9a73941...
There's no free lunch here. Reducing the amount of code that's monomorphised reduces the code emitted & improves compile times, but it reduces the scope of the code that's exposed to the input type, which reduces optimisation opportunities.
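A minimal sketch of the pattern being described (the function names here are made up for illustration; the linked std code does the same thing):

```rust
use std::path::Path;

// The generic outer function is monomorphized per type, but it is a
// thin shim...
pub fn open_and_process<P: AsRef<Path>>(path: P) -> usize {
    inner(path.as_ref()) // only this shim is duplicated per concrete P
}

// ...while the non-generic inner function, where the real work lives,
// is compiled exactly once.
fn inner(path: &Path) -> usize {
    path.as_os_str().len()
}
```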
#pragma omp for
is a very low mental-overhead way to speed up code.
OpenMP does nothing to prevent data races, and anything beyond simple for loops quickly becomes difficult to reason about.
It is easy to divide a loop body into computation and shared-info updates; the latter can be done under #pragma omp critical (label).
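For comparison, a hedged sketch of the same computation/critical-section split on the Rust side, assuming the rayon crate as a dependency (the Mutex plays the role of the omp critical section):

```rust
use rayon::prelude::*;
use std::sync::Mutex;

fn main() {
    let max_seen = Mutex::new(i64::MIN);
    (0..1_000_000i64).into_par_iter().for_each(|i| {
        let value = (i * i) % 12345;          // the "computation" part
        let mut m = max_seen.lock().unwrap(); // the "critical" part
        if value > *m {
            *m = value;
        }
    });
    println!("{}", max_seen.lock().unwrap());
}
```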
Then we have the anecdotal "They failed firefox layout in C++ twice then did it in Rust"... to this I sigh in chrome.
It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.
> to this I sigh in chrome.
I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.
The hard part isn't splitting loop iterations between threads, but doing so _safely_.
Proving that an arbitrary loop's iterations are split in a memory-safe way is undecidable in general in C and C++, but it falls out of the borrow checker's default rules in Rust.
When using pthreads in C, for example, TBB is not required.
Not sure about C11 threads, but I have always thought that GLIBC just uses pthread under the hood.
C++26 will get another similar dependency, because BLAS algorithms are going to be added, but apparently the expectation is to build on top of C/Fortran BLAS battle tested implementations.
What about energy use and contention?
CPUs are most energy efficient sitting idle doing nothing, so finishing work sooner in wall-clock time usually helps despite overheads.
Energy usage is most affected by high clock frequencies, and CPUs will boost clocks for single-threaded code.
Threads waiting on cache misses let the CPU use hyperthreading, which is actually energy efficient (you get context switching in hardware).
You can waste energy in pathological cases if you overuse spinlocks or spawn so many threads that bookkeeping takes more work than what the threads do, but helper libraries for multithreading all have thread pools, queues, and dynamic work splitting to avoid extreme cases.
Most of the time low speed up is merely Amdahl's law – even if you can distribute work across threads, there's not enough work to do.
On a backend system where you already have multiple processes using various cores (databases, web servers, etc) it usually doesn’t make sense as a performance tool.
And on an embedded device you want to save power so it also rarely makes sense.
Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so. Obviously if this is important in a particular context, it's probably worth measuring it in that context (e.g. embedded devices), but I suspect this is true more often than it isn't true.
Only if it leads to better utilisation. But in the scenario that the parent comment suggests, it does not lead to better utilisation as all cores are constantly busy processing requests.
Throughput, as well as CPU time across cores, remains largely the same regardless of whether or not you parallelise individual programs/requests.
That said, I suspect it's a rare case where you really do have perfect core utilisation.
In addition to my sibling comments I would like to point out that multithreading quite often can save power. Typically the power consumption of an all-core load is within 2x the power consumption of a single-core load, while being many times faster, assuming your task parallelizes well. This makes sense b/c a fully loaded CPU core still needs all the L3 cache mechanisms, all the DRAM controller mechanisms, etc. to run at full speed. A fully idle system, on the other hand, can consume very little power if it idles well (which admittedly many CPUs do not).
Edit:
I would also add that if your system is running a single-threaded database and a single-threaded web server, that still leaves over a hundred underutilized cores on many modern server-class CPUs.
If you use a LAMP-style architecture with a scripting language handling requests and querying a database, you need never write a single line of multithreaded code and are already set up to utilize N cores.
Each web request can happen in a thread/process and their queries and spawns happen independently as well.
Are people making user facing apps in rust with GUIs?
We are talking not only about Rust, but also about C and C++. There are lots of C++ UI applications. Rust poses itself as an alternative to C++, so it is definitely intended to be used for UI applications too - it was created to write a browser!
At work I am using tools such as uv [1] and ruff [2], which are user-facing (although not GUI), and I definitely appreciate a 16x speedup if possible.
Multithreading is an invaluable tool when actually using your computer to crunch numbers (scientific computing, rendering, ...).
yes
To over-explain: if you just need to make N forks of the same logic, then it's very easy to do this correctly in C. The cases where I'm going to carefully maintain shared mutable state with locking are cases where the parallelism is less efficient (Amdahl's law).
Java style apps that just haphazardly start threads are what rust makes safer. But that’s a category of program design I find brittle and painful.
The example you gave of a compiler is canonically implemented as multiple process making .o files from .c files, not threads.
This is a huge limitation of C's compilation model, and basically every other language since then does it differently, so not sure if that's a good example. You do want some "interconnection" between translation units, or at least less fine-grained units.
What’s better? Rust? Haskell? Swift?
It's very hard to do multithreading at a more granular level without hitting Amdahl's law and synchronization traps.
In c the caller isn’t choosing typically. The author of some library or api decides this for you.
This turns out to be fairly significant in something like an embedded context, where function pointers kill icache and rob cycles jumping through hoops. Say you want to bit-bang a bus protocol using GPIO: in C with function pointers this adds potentially non-trivial overhead, and your abstraction is no longer (never was) free. Traits let the caller decide to monomorphize that code and get register reads and writes effectively inlined, while still having an abstract interface to GPIO. This is excellent!
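A hedged sketch of that pattern, with a hypothetical GpioPin trait standing in for a real HAL interface:

```rust
// The trait gives an abstract interface to GPIO...
trait GpioPin {
    fn set_high(&mut self);
    fn set_low(&mut self);
}

// ...but this function is monomorphized per pin type, so the trait
// calls can compile down to direct, inlined register writes instead of
// indirect calls through function pointers.
fn bit_bang_byte<P: GpioPin>(pin: &mut P, byte: u8) {
    for i in (0..8).rev() {
        if byte & (1 << i) != 0 {
            pin.set_high();
        } else {
            pin.set_low();
        }
    }
}
```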
I'd usually rather have a nice language-level interface for customizing implementation, but ELF and Linux scripting is typically good enough. Binary patching is in a much easier to use place these days with good free tooling and plenty of (admittedly exploit-oriented) tutorials to extrapolate from as examples.
Tbf this applies to Rust too. If the author writes

    fn foo(bar: Box<dyn BarTrait>)

they have forced the caller into dynamic dispatch. Had they written

    fn foo(bar: impl BarTrait)

the choice would've remained open to the caller, and AFAIK it isn't possible to write that in C (though C++ does allow this kind of thing).
This is monomorphized for every type you pass in, in short.
Not really. You can store it in any struct that specializes to the same type as the value you received. If you get a pre-built struct from somewhere and try to store it there, your code won't compile.
the struct in which it is stored could be generic as well
No one would ask this question in the case where the struct is generic over a type parameter bounded by the trait, since such a design can only store a homogeneous collection of values of a single concrete type implementing the trait; the question doesn't even make sense in that situation.
The question only arises for a struct that must store a heterogeneous collection of values with different concrete types implementing the trait, in which case a trait object (dyn Trait) is required.
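A minimal sketch of the two designs:

```rust
trait Draw {
    fn draw(&self);
}

struct Circle;
struct Square;
impl Draw for Circle { fn draw(&self) { println!("circle"); } }
impl Draw for Square { fn draw(&self) { println!("square"); } }

struct Canvas<T: Draw> { items: Vec<T> }       // homogeneous: one concrete T
struct DynCanvas { items: Vec<Box<dyn Draw>> } // heterogeneous: any implementor

fn main() {
    let c = Canvas { items: vec![Circle, Circle] };
    let d = DynCanvas {
        items: vec![Box::new(Circle) as Box<dyn Draw>, Box::new(Square)],
    };
    for i in &c.items { i.draw(); }
    for i in &d.items { i.draw(); }
}
```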
Someone down the line might be wondering why suddenly their Rust builds take 4x the time after merging something, and just maybe remembering this offhand comment will make them find the issue faster :)
That way you can get most of the speed of the Release version, with a fairly good chance of getting usable debug info.
A huge issue with C++ debug builds is the resulting executables are unusably slow, because the zero-cost abstractions are not zero cost in debug builds.
Similar capabilities could be made available in other compilers.
Now to hate a bit on MSVC: its Edit & Continue functionality makes debug builds unbearably slow, but at least it doesn't work, so the first thing I do is turn it off.
In the extreme, you surely wouldn't accept a 1-day or even 1-week build time, for example? A 1-week build could even be real rather than hypothetical, since a system could fuzz over candidate compilations, run load tests, and do PGO to deliver something better. But even if runtime performance were so important that you had such a system, it's obvious you wouldn't ever have developer cycles that take a week to compile.
Build time also even does matter for release: if you have a critical bug in production and need to ship the fix, a 1 hour build time can still lose you a lot here. Release build time doesn't matter until it does.
I've changed my approach significantly over time on how I debug (probably in part due to Rust's slower compile times), and usually get away with 2-3 compiles to fix a bug, but spend more time reasoning about the code.
Folks have worked tirelessly to improve the speed of the Rust compiler, and it's gotten significantly faster over time. However, there are also language-level reasons why it can take longer to compile than other languages, though the initial guess of "because of the safety checks" is not one of them, those are quite fast.
> How slow are we talking here?
It really depends on a large number of factors. I think saying "roughly like C++" isn't totally unfair, though again, it really depends.
(Uh oh, there's an em-dash, I must be an AI. I don't think I am, but that's what an AI would think.)
That's sort of part of it, but it's also specific language design choices that if they were decided differently, might make things faster.
Note that C++ has almost as large a problem with compile times, with large build fanouts including on templates, and incremental builds aren't always a realistic fix either, especially for time burnt on linking. E.g. I believe Chromium development often uses a mode with .dll dynamic linking, instead of the all-statically-linked configuration they actually release, exactly to speed up incremental development. The "fast" case is C, not C++.
Bevy, a Rust ECS framework for building games (among other things), has a similar solution by offering a build/rust "feature" that enables dynamic linking (called "dynamic_linking"). https://bevy.org/learn/quick-start/getting-started/setup/#dy...
Rust does make it a lot easier to use generics, which is likely why using more traits appears to cause longer build times. I think it's just that the more traits you have, the more likely you are to stumble over some generic code, which ultimately generates more code.
Aah, yes, that sounds more correct, the end result is the same, I failed to remember the correct mechanism that led to it. Thank you for the correction!
However, in the spirit of the question: someone mentioned the stricter aliasing rules, that one does come to mind on Rust's side over C/C++. On the other hand, signed integer overflow being UB would count for C/C++ (in general: all the UB in C/C++ not present in Rust is there for performance reasons).
Another thing I thought of in Rust and C++s favor is generics. For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it. I don't know if C compilers specialize qsort() based on comparison function this way. They might, but it's certainly a lot more to ask of the compiler, and I would argue there are probably many cases like this where C++ and Rust can outperform C because of their much more powerful facilities for specialization.
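A minimal sketch of that difference from the Rust side: the closure below is its own type, so the sort is specialized around it and the comparison can be inlined, whereas a function pointer handed to qsort() is opaque at the call site.

```rust
fn main() {
    let mut v = vec![3.0_f64, 1.0, 2.0];
    // sort_by is monomorphized for this specific closure type.
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    println!("{v:?}");
}
```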
That's more of a critique of the standard libraries than the languages themselves.
If someone were writing C and cared, they could provide their own implementation of sort such that the callback could be inlined (LLVM can inline indirect calls when all call sites are known), just as it would be with C++'s std::sort.
Further, if the libc allows for LTO (active area of research with llvm-libc), it should be possible to optimize calls to qsort this way.
Sure, at the limit, I agree with you, but in reality, relying on the compiler to do any optimization that you care about (such as inlining an indirect function call in a hot loop) is incredibly unwise. Invariably, in some cases it will fail, and it will fail silently. If you're writing performance critical code in any language, you give the compiler no choice in the matter, and do the optimization yourself.
I do generally agree that in the case of qsort, it's an API design flaw
Rust defaults to wrapping overflow (in release builds; debug builds panic), so overflow UB should only make a difference when the compiler is using it to optimize your code, which will most likely lead to unintended behavior.
On the other hand, writing a function that recovers from overflows in an incorrect/useless way still isn't helpful if there are overflows.
Now: languages may expose patterns that a compiler can make use of to improve optimizations. That IS interesting, but it is not a question of speed. It is a question of expressibility.
Saying that a language is about "expressibility" is obvious. A language is nothing other than a form of expression; no more, no less.
Speed is a function of all three -- not just the language.
Optimizations for one architecture can lead to perverse behaviours on another (think cache misses and memory layout -- even PROGRAM layout can affect speed).
These things are out of scope of the language and as engineers I think we ought to aim to be a bit more precise. At a coarse level I can understand and even would agree with something like "Python is slower than C", but the same argument applies there as well.
But at some point objectivity ought to enter the playing field.
There is expressing idea via code, and there is optimization of code. They are different. Writing what one may think is "fully optimized code" the first time is a mistake, every time, and usually not possible for a codebase of any significant size unless you're a one-in-a-billion savant.
Programming languages, like all languages, are expressive, but only as expressive as the author wants to be, or knows how to be. Rarely does one write code and think "if I'm not expressive enough in a way the compiler understands, my code might be slightly slower! Can't have that!"
No, people write code that they think is correct, compile it, and run it. If your goal is to make the most perfect code you possibly can, instead of the 95% solution that is robust, reliable, maintainable, and testable, you're doing it wrong.
Rust is starting to take up the same mental headspace as LLMs: they're both neat tools. That's it. I don't even mind people being excited about neat tools, because they're neat. The blinders about LLMs/Rust being silver bullets for the software industry need to go. They're just tools.
It is an argument about economics. I can write C that is as fast as C++. This requires many times more code that takes longer to write and longer to debug. While the results may be the same, I get far better performance from C++ per unit cost. Budgets of time and money ultimately determine the relative performance of software that actually ships, not the choice of language per se.
I've done parallel C++ and Rust implementations of code. At least for the kind of performance-engineered software I write, the "unit cost of performance" in Rust is much better than C but still worse than C++. These relative costs depend on the kind of software you write.
Only if you ignore C++'s compile-time execution capabilities.
I generally agree with your take, but I don't think C is in the same league as Rust or C++. C has absolutely terrible expressivity, you can't even have proper generic data structures. And something like small string optimization that is in standard C++ is basically impossible in C - it's not an effort question, it's a question of "are you even writing code, or assembly".
There is a similar argument around using "unsafe" in Rust. You need to use a lot of it in some cases to maintain performance parity with C++. Achievable in theory but a code base written in this way is probably going to be a poor experience for maintainers.
Each of these languages has a "happy path" of applications where differences in expressivity will not have a material impact on the software produced. C has a tiny "happy path" compared to the other two.
C and C++ don't actually have an advantage here, because overflow UB is limited to signed integers unless you use compiler-specific intrinsics. Rust's standard library allows you to make overflow on any specific arithmetic operation UB, on both signed and unsigned integers.
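A minimal sketch of the per-operation choices Rust exposes (unchecked_add is stable on the integer types in recent Rust):

```rust
fn main() {
    let a = u32::MAX;
    assert_eq!(a.wrapping_add(1), 0);   // defined two's-complement wrap
    assert_eq!(a.checked_add(1), None); // overflow detected, no UB
    // Opting into UB-on-overflow, C-style, works on unsigned types too,
    // but requires an explicit unsafe call:
    let c = unsafe { 1_u32.unchecked_add(2) }; // UB if it had overflowed
    assert_eq!(c, 3);
}
```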
"Culturally", C/C++ has opted for "unsafe-but-high-perf" everywhere, and Rust has "safe-but-slightly-lower-perf" everywhere, and you have to go out of your way to do it differently. Similarly with Zig and memory allocators: sure, you can do "dynamically dispatched stateful allocators that you pass to every function that allocates" in C, but do you? No, you probably don't, you probably just use malloc().
On the other hand: the author's point that the "culture of safety" and the borrow checker in Rust frees your hand to try some things in Rust which you might not in C/C++, and that leads to higher perf. I think that's very true in many cases.
Again, the answer is more or less "basically no, all these languages are as fast as each other", but the interesting nuance is in what is natural to do as an experienced programmer in them.
Another one is std::shared_ptr. It always uses atomic operations for reference counting and there's no way to disable that behavior or any alternative to use when you don't need thread safety. On the other hand, Rust has both non-atomic Rc and atomic Arc.
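A minimal sketch of the distinction:

```rust
use std::rc::Rc;
use std::sync::Arc;

fn main() {
    let local = Rc::new(42);   // plain refcount increments; !Send, single-thread only
    let shared = Arc::new(42); // atomic refcount; can cross threads
    let l2 = Rc::clone(&local);
    let s2 = Arc::clone(&shared);
    std::thread::spawn(move || println!("{s2}")).join().unwrap();
    // std::thread::spawn(move || println!("{l2}")); // would not compile: Rc is !Send
    println!("{local} {l2} {shared}");
}
```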
That issue predates move semantics by ages. The language has always had very simple object lifetimes: if you create Foo foo; it will call foo.~Foo() for you, even if you called ~Foo() before. Anything with more complex lifetimes uses either new or placement new behind the scenes.
> Another one is std::shared_ptr.
From what I understand shared_ptr doesn't care that much about performance because anyone using it to manage individual allocations already decided to take performance behind the shed to be shot, so they focused more on making it flexible.
I don't agree with you about shared_ptr (it's very common to use it for a small number of large/collective allocations), but even if what you say is true, it's still a part of C++ that focuses on safety and ignores performance.
Bottom line - C++ isn't always "unsafe-but-high-perf".
Then, I raise you Zig, which has unsigned integer overflow as UB.
Anyway that's a long way of saying that you're right, integer overflow is illegal behavior, I just think it's interesting.
https://doc.rust-lang.org/std/intrinsics/fn.unchecked_add.ht...
I think this is something of a myth. Typically, a C compiler can't inline the comparison function passed to qsort because libc is dynamically linked (so the code for qsort isn't available). But if you statically link libc and have LTO, or if you just paste the implementation of qsort into your module, then a compiler can inline qsort's comparison function just as easily as a C++ compiler can inline the comparator passed to std::sort. As for type-specific optimizations, these can generally be done just as well for a (void *) that's been cast to a T as they can be for a T (though you do miss out on the possibility of passing by value).
That said, I think there is an indirect connection between a templated sort function and the ability to inline: it forces a compiler/linker architecture where the source code of the sort function is available to the compiler when it's generating code for calls to that function.
I guess for your example, qsort(), it is optional, and you can choose another implementation of it. Though I tend to find that both standard libraries tend to just delegate those lowest-level calls to the POSIX API.
I'm actually very curious about how good C compilers are at specializing situations like this, I don't actually know. In the vast majority cases, the C compiler will not have access to the code (either because of dynamic linking like in this example, or because the definition is in another translation unit), but what if it does? Either with static linking and LTO, or because the function is marked "inline" in a header? Will C compilers specialize as aggressively as Rust and C++ are forced to do?
If anyone has any resources that have looked into this, I would be curious to hear about it.
The use of function pointers doesn't have much of an impact on inlining. If the argument supplied as a parameter is known at compile time then the compiler has no issue performing the direct substitution whether it's a function pointer or otherwise.
Your C comparator function is already "monomorphized" - it's just not type safe.
I'm very happy to see the nuanced take in this article, slowly deconstructing the implicit assumptions proposed by the person asking this question, to arrive at the same conclusion that I long have. I hope this post reaches the right people.
A particular language doesn't have a "speed", a particular implementation may have, and the language may have properties that make it difficult to make a fast implementation (of those specific properties/features) given the constraints of our current computer architectures. Even then, there's usually too many variables to make a generalized statement, and the question often presumes that performance is measured as total cpu time.
It's a good thing to keep in mind when you read the comments on any article.
1. What costs does the language actively inject into a program?
2. What optimizations does the language facilitate?
Most of the time, it's sufficient to just think about the first point. C and Rust are faster than Python and Javascript because the dynamic nature of the latter two requires implementations to inject runtime checks all over the place to enable that dynamism. Rust and C simply inject essentially zero active runtime checks, so membership in this club is easy to verify.
The second one is where we get bogged down, because drawing clean conclusions is complicated by the (possibly theoretical) existence of optimizing compilers that can leverage the optimizability inherent to the language, as well as the inherent fragility of such optimizations in practice. This is where we find ourselves saying things like "well Rust could have an advantage over C, since it frequently has more precise and comprehensive aliasing information to pass to the optimizer", though measuring this benefit is nontrivial and it's unclear how well LLVM is thoroughly utilizing this information at present. At the same time, the enormous observed gulf between Rust in release mode (where it's as fast as C) and Rust in debug mode (when it's as slow as Ruby) shows how important this consideration is; Rust would not have achieved C speeds if it did not carefully pick abstractions that were amenable to optimization.
It's also interesting to think about this in terms of the "zero cost abstractions"/"zero overhead abstractions" idea, which Stroustrup wrote as "What you don't use, you don't pay for. What you do use, you couldn't hand code any better". The first sentence is about 1, and the second one is about what you're able to do with 2.
That is, most of the time, most of the users aren't thinking about how to squeeze the last tenth of a percent of speed out of it. They aren't thinking about speed at all. They're thinking about writing code that works at all, and that hopefully doesn't crash too often. How fast is the language for them? Does it nudge them toward faster code, or slower? Are the default, idiomatic ways of writing things the fast way, or the slow way?
Much of the language's semantics can be boiled away before JIT compilation, because that flexibility isn't in use at that time, which can be proven by a quick check before entering the hot code. (Or in the extreme, the JIT code doesn't check it at all, and the runtime invalidates that code lazily when an operation is performed that violates those preconditions.) Which thwarts people who do simple-minded comparisons of "what language is fastest at `for (i = 0; i < 10000000; i++) x += 7`?", because the runtime is entirely dominated by the hot loop, and the machine code for the hot loop is identical across all languages tested.
Still: you have to spend time JIT compiling. You have to do some dynamic checks in all but the innermost hot code. You have to materialize data in memory, even if just as a fallback, and you have to garbage collect periodically.
So I agree with your conclusion, except for perhaps un-nuanced use of the term "performance floor" -- there's really no elevated JS floor, at least not a global one; simple JS can generate the same or nearly the same machine code as equivalent C/C++/Rust, will use no more memory, and will never GC. But that floor only applies to a small subset of code (which can be the bulk of the runtime!), and the higher floor does kick in for everything else. So generally speaking, JS can only "be as fast" as non-managed languages for simple programs.
(I'll ignore the situations where the JIT can depend on stricter constraints at runtime than AOT-compiled languages, because I've never seen a real-world situation where it helps enough to counterbalance everything else.)
Speed is also not the only metric, Rust and C enable much better control over memory usage. In general, it is easier to write a memory-efficient program in Rust or C than it is in JS.
The only case where one language is likely to be inherently faster than another is when the other language is so high level or abstracted away from the processors it is going to run on that an optimizing compiler is going to have a hard time bridging that gap. It may take more work for an optimizing compiler to generate good code for one language than another, for example by having to recognize when aliasing doesn't exist, but again this is ultimately a matter of implementation not language.
The Mythical Sufficiently Smart Compiler is, in fact, still mythical.
It might be interesting to compare LLVM-generated code (at the same/maximum optimization level) for Rust vs C, which would remove optimizer level of effort as a factor and better isolate difficulties/opportunities caused by the respective languages.
If you do hand optimize your code, all bets are off. With both languages. But I think the notion that the Rust compiler has more context for optimizing than the C compiler is maybe not as controversial as the notion that language X is better/faster than language Y. Ultimately, producing fast/optimal code in C kind of is the whole point of C. And there aren't really any hacks you can do in C that you can't do in Rust, or vice versa. So, it would be hard to make the case that Rust is slower than C or the other way around.
However, there have been a few rewrites of popular Unix tools in Rust that benchmark a bit faster than their C equivalents. Could those be optimized in C? Probably, but they just haven't been. So there is a case for arguing that maybe Rust code is a bit easier to make fast than C code.
Well, then in many cases we are talking about LLVM vs LLVM.
> Ultimately, producing fast/optimal code in C kind of is the whole point of C
Mostly a nitpick, but I'm not convinced that's true. The performance queen has traditionally been C++. In C projects it's not rare to see very suboptimal design choices mandated by the language's very low expressivity (e.g. no multi-threading, sticking to an easier data structure, etc.).
Compiler optimisations certainly play a large role, but they're not the only thing. Tracing-moving garbage collectors can trade off CPU usage for memory footprint and allow you to shift costs between them, so depending on the relative cost of CPU and RAM, you could gain speed (throughput) in exchange for RAM at a favourable price.
Arenas also offer a similar tradeoff knob, but they come with a higher development/evolution price tag.
I'd say most people use this definition, with the caveat that there's no official "average programmer", and everyone has different standards.
If you prefer it, salaries correlate with years of experience, and the latter surely correlates with skills, right?
(No, this doesn't mean that every 10 years XP dev is better than a 3 years XP one, but it's definitely a strong correlation)
In that context, the designer can reason about how code written that way should perform.
So I think this is a meaningful question for a language designer, which makes it a meaningful question for the users as well, when phrased like this:
'How does idiomatic code (as imagined by the language creators) perform in language X vs Y?'
Does Rust not do this for subtle reasons that I'm missing, or does it just not matter as much as I'd expect it to?
I think these two things are also things people would argue about a lot. It's hard to talk about them in a concrete sense of things, rather than just "I feel like code usually does X".
This is more of a side comment about a different question, perhaps "ok fine, but then what are the language differences that could be performance-relevant for one language or the other, even if (as you say) they don't lead to a yes/no answer for your original question?"
1. crates.io makes it easy to use complex data structures. Basically this argument https://bcantrill.dtrace.org/2018/09/28/the-relative-perform...
2. Rust's safety guarantees making it easier to maintain more dangerous things over time.
3. On the C side, there's a lot more cultural understanding overall of how to use the language to get good performance results
4. It might be easier to find people who are experienced in heavily optimizing C code as opposed to Rust code.
Rust is a project that is rather more comparable to GCC than ISO C.
There is a set of programs that you can write in C and that are correct, that you cannot write in Rust without leaning into unsafe code. So if by "Rust" we mean "the safe subset of Rust", then this implies that there must be optimal algorithms that can be written in C but not in Rust.
On the other hand, Rust's ownership semantics are like rocket fuel for the compiler's understanding of aliasing. The inability of compilers to track aliasing precisely is a top inhibitor of load elimination in C compilers (so much so that C compiler writers lean into shady nonsense like strict aliasing, and even that doesn't buy very much precision). But a Rust compiler doesn't need to rely on shady imprecise nonsense. Therefore, there are surely algorithms that, if written in a straightforward way in both Rust and C, will be faster in Rust. I could even imagine there are algorithms for which it would be very unnatural to write the C code in a way that matches Rust's performance.
I'm purely speaking theoretically, I have no examples of either case. Just trying to provide my PL/compiler perspective
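One way to picture the second case; a minimal sketch, not a benchmark:

```rust
// Because `xs` is &mut and `ys` is a shared reference, Rust guarantees
// they don't overlap, so the compiler can keep values in registers and
// vectorize without the defensive reloads a C compiler must emit when
// it can't prove two pointers don't alias (absent `restrict`).
fn axpy(a: f32, xs: &mut [f32], ys: &[f32]) {
    for (x, y) in xs.iter_mut().zip(ys) {
        *x = a * *x + *y;
    }
}
```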
Well, unsafe rust is part of rust. So no, we don’t mean that.
The first is, we do have some amount of empirical evidence here: Rust had to turn its aliasing optimizations on and off again a few times due to bugs in LLVM. A comment from 2021: https://github.com/rust-lang/rust/issues/54878#issuecomment-...
> When noalias annotations were first disabled in 2015 it resulted in between 0-5% increased runtime in various benchmarks.
This leaves us with a few relevant questions:
Were those benchmarks representative of real world code? (They're not linked, so we cannot know. The author is reliable, as far as I'm concerned, but we have no way to verify this off-hand comment directly, I link to it specifically because I'd take the author at their word. They do not make any claim about this, specifically.)
Those benchmarks are for Rust code with optimizations turned off and back on again, not Rust code vs C code. Does that make this a good benchmark of the question, or a bad one?
These were llvm's 'noalias' markers, which were written for `restrict` in C. Do those semantics actually take full advantage of Rust's aliasing model, or not? Could a compiler which implements these optimizations in a different way do better? (I'm actually not fully sure of the latest here, and I suspect some corners would be relying on the stacked borrows vs tree borrows stuff being finalized)
Additionally, it was 10 years ago and LLVM has changed. It could be that LLVM does better now, or it could do worse. I would actually be interested in seeing some benchmarks with modern rustc.
There are 2 main differences between the versions with and without strict aliasing. Without strict aliasing, the compiler can't assume that the result accumulator doesn't change during the loop, so it has to repeatedly read and write it each iteration. With strict aliasing, it can read it into a register once, do the looping, and write the result back at the end. The second effect is that with strict aliasing enabled the compiler can vectorize the loop, processing 4 floats at a time; most likely the same uncertainty about the counter prevents vectorization without strict aliasing.
If you want a slightly simpler example, you can disable vectorization by adding '-fno-tree-vectorize'. With it disabled, there is still a difference in the handling of the counter.
Using restrict pointers and multiple input arrays of the same type, it would probably be possible to make something closer to a real-world example.
Also note that C++ does not have restrict, formally speaking, though it is a common compiler extension. It's a C feature only!
When you can directly write assembly with either, comparing performance requires having some constraints.
For what it's worth, I think coding agents could provide a reasonable approximation of what "average" code looks like for a given language. If we benchmark that we'd have some indication of what the typical performance looks like for a given language.
I wrote this at a time when I was pretty anti-LLM, but I do think that you're right that there's some interesting implications of LLM usage in this space. And that's because one version of this question is "what can the average x programmer do compared to the average y programmer in the same amount of time," and I'm curious if LLMs lift all tides here, or not.
I rewrote a C project in Rust some years ago, and in the Rust version I included many optimizations that I probably wouldn't have in C code, thanks to the ability to do them "fearlessly". The end result was so much more performant I had to double check I didn't leave something out!
What is fast is writing code with zero abstractions or zero cost abstractions, and if you can't do that (because writing assembly sucks), get as close as possible.
Each layer you pile on adds abstraction. I've never had issues optimizing and profiling C code -- the tooling is excellent and the optimizations make sense. Get into Rust profiling and opimization and you're already in the weeds.
Want it fast? Turn off the runtime checks by calling unsafe code. From there, you can hope and pray like with most LLVM compiled languages.
If you want a stupid fast interpreter in C, you do computed goto, write a comment explaining why it's not, in fact, cursed, and you're done. In C++, Rust, etc. you'll sit there examining the generated code to see whether the heuristics kicked in or you ended up with something that isn't effectively computed-goto code.
Not to mention panics, which are needed but also have branching overhead.
The only thing that is faster in Rust by default is probably math: You have so many more errors and warnings which avoid overflows, casts, etc. that you didn't mean to do. That makes a small difference.
I love Rust. If I want pure speed, I write unsafe Rust, not C. But it's not going to be as fast as trivial C code by default, because the tradeoffs fundamentally differ: Rust is safe by default, and C is efficient by default.
The article makes some of the same points but it doesn't read like the author has spent weeks in a profiler combing over machine code to optimize Rust code. Sadly I have, and I'm not getting that time back.
Bit of an aside, but these days it might be worth experimenting with tail call interpreters coupled with `musttail` annotations. CPython saw performance improvements over their computed goto interpreters with this method, for example [0].
You can do that for sure, but you can also sometimes write your code in a different way. https://davidlattimore.github.io/posts/2025/09/02/rustforge-... is an interesting collection of these.
> it doesn't read like the author has spent weeks in a profiler combing over machine code to optimize Rust code
It is true that this blog post was not intended to be a comprehensive comparison of the ways in which Rust and C differ in performance. It was meant to be a higher level discussion on the nature of the question itself, using a few examples to try and draw out interesting aspects of that comparison.
These variances pretty much mean that trying to compare with other "low level" languages is far from an apples to apples comparison.
So, to answer the question, "It depends." ... In the end, I think developers tend to optimize for a preferred style or ergonomics over hard technical reasons... it's mostly opinion, IMO.
I guess you could argue that C would reach the same speed because noalias is part of C as well. But I'd say that the interesting competition is for how fast idiomatic and "hand-optimized" (no unrolling, no aliasing hints etc) code is.
Comparing programming languages' "performance" only makes sense if you compare idiomatic code. But you could argue that noalias in C is idiomatic. But you could equally well argue that multi-threading in Rust is more idiomatic than it is in C, and so on. That's where it becomes interesting (and difficult) to quantify.
What I will say is that the fact that Rust uses this so much, and had to turn it off because of all the bugs it shook out, at least implies that it's not used very much in real-world C code. I don't know how to more scientifically analyze that, though.
However I remember reading a few years back that due to the Rust frontend not communicating these opportunities to LLVM, and LLVM not being designed to take advantage of them, the real-world gains do not always materialize.
Also sometimes people write code in Rust that does not compile under the borrow checker rules, and alleviate this issue either by cloning objects or using RefCell, both of which have a runtime cost.
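A minimal sketch of those two escape hatches and where their runtime costs come from:

```rust
use std::cell::RefCell;

fn main() {
    let data = vec![1, 2, 3];
    let copy = data.clone(); // runtime cost: a full copy of the Vec
    let cell = RefCell::new(data);
    {
        // Runtime cost: borrow state checked at runtime instead of compile time.
        let mut borrowed = cell.borrow_mut();
        borrowed.push(4);
    }
    println!("{:?} {:?}", cell.borrow(), copy);
}
```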
Also, if performance is critical to you, profile stuff and compare the generated assembly; more often than not you'll find that LLVM just outputs the same thing in both cases.
See "6.7.3.2 Structure and union specifiers", paragraph 16 & 17:
> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
> Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
But, I still don't think that what you've said is true. This is because alignment isn't decided per-object, but per type. That bit is covered more fully in 6.2.8 Alignment of objects.
You also have to be able to take a pointer to a (non-bitfield) member, and those pointers must be aligned. This is also why __attribute__((packed)) and such are non-standard extensions.
Then again: I have not passed the C specification lawyer bar, so it is possible that I am wrong here. I'm just an armchair lawyer. :)
(but for padding, yes, that's correct.)
[1] https://open-std.org/JTC1/SC22/WG14/www/docs/n3220.pdf section 6.7.3.2, paragraph 17.
It's part of the ABI spec. It's true that C evolved in an ad hoc way and so the formal rigor got spread around to a bunch of different stakeholders. It's not true that C is a lawless wasteland where all behavior is subject to capricious and random whims, which is an attitude I see a lot in some communities.
People write low-level software to deal with memory layout and alignment every day in C, have for forty years, and aren't stopping any time soon.
So "Is language X faster than language Y?" is totally answerable, but the answer depends on the answerer.
[0] tldr: "I think that there are so many variables that it is difficult to draw generalized conclusions."
I don't think a language should count as "fast" if it takes an expert or an inordinate amount of time to get good performance, because most code won't have that.
So on those grounds I would say Rust probably is faster than C, because it makes it much much easier to use multithreading and more optimised libraries. For example a lot of C code uses linked lists because they're easy to write in C, even when a vector would be faster and more appropriate. Multithreading can just be a one line change in Rust.
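The canonical one-line change, a sketch assuming the rayon crate as a dependency:

```rust
use rayon::prelude::*;

fn total(values: &[f64]) -> f64 {
    // Was: values.iter().map(...).sum(); swapping in par_iter()
    // parallelizes the loop, and it only compiles if the body is
    // thread-safe.
    values.par_iter().map(|v| v.sqrt()).sum()
}
```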
Let's say they only need 2 hours to get the <X> to work, and can use the remaining 6 hours for optimizing. Can 6 hours of optimizing a Python program make it faster than the assembly program?
The answer isn't obvious, and certainly depends on the specific <X>. I can imagine various <X> where even unlimited time spent optimizing Python code won't produce faster results than the assembly code, unless you drop into C/C++/Zig/Rust/D and write a native Python extension (and of course, at that point you're not comparing against Python, but that native language).
Assembly is going to give you pretty great performance generally, but the line only starts when you get to "ridiculous effort"!
Compare:
"Have you stopped beating your wife yet?"
"I do not beat my wife."
The response contributes to the answer, even if it brings you no closer to "yes" or "no".
Where C application code often suffers, but by no means always, is the use of memory for data structures. A nice big chunk of static memory will make a function fast, but I’ve seen many C routines malloc memory, do a strcpy, compute a bit, and free it at the end, over and over, because there’s no convenient place to retain the state. There are no vectors, no hash maps, no crates.io and cargo to add a well-optimized data structure library.
It is for this reason I believe that Rust, and C++, have an advantage over C when it comes to writing fast code, because it’s much easier to drop in a good data structure. To a certain extent I think C++ has an advantage over Rust due to easier and better control over layout.
Back then the C implementation of the (i.e., "one") micro benchmark beat the Rust implementation. I could squeeze out more performance by precisely controlling the loop unrolling. Nowadays, I don't really care and operate under the assumption that "Python is faster than $X and if it is not, it is still fast enough!"
What good is speed if you cannot compile? C has both. Maybe in another decade Rust will have settled down, but for now wrangling all the incompatible Rust versions makes C the far better option. And no, setting cargo versions doesn't fix this. It's not something you'd run into writing Rust code within a company, but it's definitely something you run into trying to compile other people's Rust code.
Part of what I'm getting at here is that you have to decide what is in those benchmarks in the first place. Yes, benchmarks would be an important part of answering this question, but it's not just one question: it's a bunch of related but different questions.
That is a damn good reason to choose Rust over C++, even if the Rust implementation of the "same" thing should be a bit slower.
It doesn't provide a lot of evidence in either direction for the rest of the vast space of potential programs.
(Knowing C++ fairly well and Rust not very well, I have Opinions, but they are not very well-informed opinions. They roughly boil down to: Rust is generally better for most programs, largely due to cargo not Rust, but C++ is better for more exploratory programming where you're going to be frequently reworking things as you go. Small changes ripple out across the codebase much more with Rust than C++ in my [limited] experience, and as a result the percentage of programming time spent fixing things up is substantially higher with Rust.)
Rust does have some interesting features, which restrict what you are allowed to do and thus make some things impossible but in turn make other things easier. It is highly likely that those restrictions are part of what made this possible. Given infinite resources (which you never have) a C++ implementation could be faster because it has better shared data concepts - but those same shared data concepts make it extremely hard to reason about multi-threaded code and so humanly you might not be able to make it work.
In short, the previous two attempts were done by completely different groups of different people, a few years apart. Your direct question about if direct wisdom from these two attempts was shared, either between them, or used by Stylo, isn't specifically discussed though.
> a C++ implementation could be faster because it has better shared data concepts
What concepts are those?
Data can be modified by any thread that wants to. It is up to you to ensure that modifications work correctly without race conditions. In Rust you can't do this (unsafe aside); the borrow checker rejects data access patterns that can't be proven correct.
Again let me be clear: the things rust doesn't allow are hard to get correct.
I agree that it has no meaning. Speed(language) is undefined, therefore there is no faster language.
I get this often because Python is referred to as a slow language, but since a Python programmer can write more features than a C programmer in the same time, at least in my space, it results in faster programs in Python, because some of those features are optimizations.
Now speed(program(language,programmer)) is defined, and you could do an experiment by having programmers of different languages write the same program and compare its execution times.
... well, that's what I get for reading an article with a silly title.
This is because C does so little for you -- bounds checking must be done explicitly, for instance, like you mention in the article -- so C is "faster" unless you work around Rust's bounds checking. It reminds me of some West Virginia residents I know who are very proud of how low their taxes are: the roads are falling apart, but the taxes are very low! C is this way too.
C is pretty optimally fast in the trivial case, but once you add bounds checking and error handling and memory management, its edge over Rust, Zig, and other lowish-level languages is much, much smaller.
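And when you do want to work around a bounds check in Rust, the opt-out is explicit; a minimal sketch:

```rust
// Safety contract: callers must guarantee every index in `idx` is in bounds.
fn gather_sum(xs: &[f64], idx: &[usize]) -> f64 {
    let mut acc = 0.0;
    for &i in idx {
        debug_assert!(i < xs.len());
        // get_unchecked skips the bounds check; UB if `i` is out of range.
        acc += unsafe { *xs.get_unchecked(i) };
    }
    acc
}
```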
Betteridge's Law of Headlines, saved you a click.
When it comes to assembly, the "compiler" is the person writing the code, and while assembly gives you the maximum flexibility to potentially equal or outperform any compiler for any language, there are not too many people with the skill to do that, especially when writing large programs (which due to the effort required are rarely written in assembler). In general there is much more potential for improving the speed of programs by changing the design and using better algorithms, which is where high level languages offer a big benefit by making this easier.
> I went to the University of Washington and [then] I got hired by this company called Geoworks, doing assembly-language programming, and I did it for five years. To us, the Geoworkers, we wrote a whole operating system, the libraries, drivers, apps, you know: a desktop operating system in assembly. 8086 assembly! It wasn't even good assembly! We had four registers! [Plus the] si [register] if you counted, you know, if you counted 386, right? It was horrible.
> I mean, actually we kind of liked it. It was Object-Oriented Assembly. It's amazing what you can talk yourself into liking, which is the real irony of all this. And to us, C++ was the ultimate in Roman decadence. I mean, it was equivalent to going and vomiting so you could eat more. They had IF! We had jump CX zero! Right? They had "Objects". Well we did too, but I mean they had syntax for it, right? I mean it was all just such weeniness. And we knew that we could outperform any compiler out there because at the time, we could!
> The problem is, picture an ant walking across your garage floor, trying to make a straight line of it. It ain't gonna make a straight line. And you know this because you have perspective. You can see the ant walking around, going hee hee hee, look at him locally optimize for that rock, and now he's going off this way, right?
> This is what we were, when we were writing this giant assembly-language system. Because what happened was, Microsoft eventually released a platform for mobile devices that was much faster than ours. OK? And I started going in with my debugger, going, what? What is up with this? This rendering is just really slow, it's like sluggish, you know. And I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen. It wasn't just the title bar. Everything was getting called multiple times.
> Because we couldn't see how the system worked anymore!
> Small systems are not only easier to optimize, they're possible to optimize. And I mean globally optimize.
Instead, I'd say that Rust & C are close enough, speed-wise, that (1) which one is faster will depend on small details of the particular use case, or (2) the speed difference will matter less than other language considerations.