> The coercions introduced by conversion rules can however lead to subtle semantic differences
The example they give is this C code:
1 uint8_t x[1] = { 0 };
2 uint8_t *y = x;
3 *y = 1;
4 assert(*x == 1); /* SUCCESS */
getting translated to this (safe) Rust code:
1 let x: [u8; 1] = [0; 1];
2 let mut y: Box<[u8]> = Box::new(x);
3 y[0] = 1;
4 assert!(x[0] == 1) /* failure */
So the pointer (iterator) targeting an existing (stack-allocated) array declared on line 2 gets translated to an owning pointer (Box) targeting a (heap-allocated) new copy of the array. So if the original code was somehow counting on the fact that the pointer/iterator was actually targeting the array it was assigned to, the translated code may (quietly) not behave correctly.

For comparison, the scpptool (my project) auto-translation (to a memory-safe subset of C++) feature would translate it to something like:
1 mse::lh::TNativeArrayReplacement<uint8_t, 1> x = { 0 };
2 mse::lh::TNativeArrayReplacement<uint8_t, 1>::iterator y = x; // implicit conversion from array to iterator
3 *y = 1;
4 assert(*x == 1); /* SUCCESS */ // dereferencing of array supported for compatibility
or if y is subsequently retargeted at another type of array, then line 2 may end up as something like:
2 mse::TAnyRandomAccessIterator<uint8_t> y = x; // implicit conversion from array to iterator
So the OP project may only be converting C code that is already amenable to being converted to safe Rust. But given the challenge of the problem, I can respect the accomplishment and see some potential utility in it.

edit: added translation for line 2 in an alternate hypothetical situation.
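For reference, a safe Rust translation that did preserve the original aliasing would have to borrow the stack array rather than box a copy of it. A rough sketch of my own, not what the paper's tool emits:

    fn main() {
        let mut x: [u8; 1] = [0; 1];
        let y: &mut [u8] = &mut x; // borrow of the stack array, no heap copy
        y[0] = 1;
        assert!(x[0] == 1); /* SUCCESS: y aliased x */
    }

Whether a translator can pick a borrow here depends on it proving that the pointer never outlives or escapes the array, which is presumably the hard part.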
The concept of the borrow checker has been formalized on a simplified version of Rust: https://people.mpi-sws.org/~dreyer/papers/rustbelt/paper.pdf - work has continued in this area steadily (e.g. see Tree Borrows)
There's a variety of tools that take rust code and translate it to something a proof system understands, and then checks that it matches a specification. AWS is leading a project to use these to verify the standard library: https://aws.amazon.com/blogs/opensource/verify-the-safety-of...
It frustrates me more than it should, I admit, that people always mention Rust when they talk about safety, but never Ada / SPARK. You want formal verification? Use Ada / SPARK. It has been battle-tested. It has been used for critical systems for a really long time now.
(And a compiler being formally verified vs. being able to write formally verified code means two different things.)
For example if someone on GitHub sees that the project is written in Rust, they are automatically going to assume it is safe, incorrectly so. I do not blame them though.
That seems excessive and tedious.
Concretely: spatial and temporal memory safety are good things, and Rust achieves both. It’s not unique in this regard, nor is it unique in not having a formal definition.
In this case it's not about Frama-C or similar tools though, see your sibling comments about the caveats.
The problem with transpiling C to Rust is that unsafe and unverified behavior can be a key property of the resulting program's behavior, so there isn't an obvious way to spit out a sort of rustified (CRustified?) binary that matches the behavior of the C program.
I'm not sure what it has to do with translating the code to Rust.
Formally verified C programs typically have had all their undefined behaviour already stamped out during the verification process. That makes mapping it over to Rust a lot simpler.
The first is that the translation function (from C to Rust) has itself been formally verified. In other words: if you give it a C program that obeys certain necessary pre-conditions, it will give you a safe Rust program. I believe the title of the paper is primarily using "Formalized" with regard to this characteristic.
The second is that the set of programs the authors evaluate their tool on are C programs that have themselves been formally verified. I only barely skimmed the introduction and didn't see it addressed directly there, but I would assume that this is because the pre-conditions necessary for their translation to work are most easily (only?) met by formally verified C programs, where of course the verification has been rendered with respect to properties that would be relevant here (e.g., memory use).
You can also see this effect in the article on the history of Coverity’s analyzer. Real-world code was horrible to deal with vs the academic examples they started with.
https://cacm.acm.org/research/a-few-billion-lines-of-code-la...
Alternatively this might be seen as a stepping stone to translating non-formally-verified C to rust, which I understand the US government has expressed a fair bit of interest in.
First of all they never translated any C! At all. Zero lines.
They took code written in F* and modified its C compiler to emit Rust. They never had to deal with any actual C code of any complexity, aside from the most trivial code that might theoretically be emitted by a toy compiler (but wasn't even that!). They just pretended they were dealing with C (or a heavily restricted Mini-C).
Even if we accept this, there are plenty of arbitrary restrictions on the C it can even support in principle:
> If the original C program further relies on x, our translation will error out, and will ask the programmer to fix their source code.
That's saying you want C to already be written in a style that makes the Rust borrow checker happy! They're avoiding the actual hard parts of the problem.
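In Rust terms, the restriction they describe amounts to the move/borrow discipline below (my own toy illustration, not code from the paper):

    fn consume(_buf: Vec<u8>) {}

    fn main() {
        let x = vec![0u8; 4];
        consume(x); // ownership of the buffer moves here
        // any later use of `x` is rejected:
        // println!("{}", x.len()); // error[E0382]: use of moved value: `x`
    }

C code that keeps using a pointer after handing it off has no such discipline, which is exactly the hard part being sidestepped.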
> These rules present simplified versions of what we support. Our implementation features several peephole optimizations ...
If I may translate from the language of academia. "We present beautiful rules in figure 4. But in reality, our implementation relies on a large number of hacks."
But it gets worse!
> For overlap cases that can be distinguished statically (as above), we emit a compile-time error; otherwise, the Rust code will panic at runtime.
The resulting code is not guaranteed to be correct! For example, aliasing in C can cause crashes in the resulting Rust code. See comment at the top of page 10. We're going from a formally verified C program to a "It may crash now" Rust program!? That's nuts!
> We apply our methodology to existing formally verified C codebases, namely, the HACL* cryptographic library ...
This is such a blatant lie that I had to return to it. It's designed to catch an unwary reviewer. We often talk about HACL being in verified C because it gets compiled to that. But it's not a C library, it's written in a totally different language. You cannot confuse the two.
I'm not a reviewer for their paper. But if I was, I would strongly fight for rejection.
The fact that they only handle formally verified C is so astronomically far away from being their main problem.
An honest title would be "Compiling a subset of F* to partially Safe Rust, Partially Formalized"
Sadly that is the name of the game in conference publishing. Make a big claim in the title and hope the reviewer does not read the fine print.
> If I may translate from the language of academia. "We present beautiful rules in figure 4. But in reality, our implementation relies on a large number of hacks."
Er, this I can understand. Every conference paper only presents you with a simplified view of the contributions. There is no way to describe every corner case within the page limits of a conference paper (specifically, I bet this was submitted to POPL 2025).
In my group I don't allow hacks. Most other good papers I can think of don't do that. It takes new students a while to get used to not maximizing performance at all costs. And if a hack must exist, then it's explained, not as a hack, but as part of the method.
Don't you hate it when you implement some beautiful idea from a paper only to discover that it bears hardly any relationship to the actual method that works?
> There is no way to describe every corner case within the page limits of a conference paper (specifically, I bet this was submitted to POPL 2025).
You're probably right. But POPL allows for unlimited pages in an attached appendix. They can and should describe the full method. Particularly if they're going to play the game of saying that they formalized this.
> Sadly that is the name of the game in conference publishing. Make a big claim in the title and hope the reviewer does not read the fine print.
It's a form of namespace squatting. Do the worst job possible to claim a title that you couldn't execute on, so that when people figure out that problem, they will be forced to cite your non-working solution. I loathe this approach to publishing.
We should punish people for doing this. Reject for lack of honesty, and your paper can't be published in related conferences for X years.
These kinds of manual or automated conversions from C to <safer language> therefore have potential not only for increasing adoption of safer languages but also for uncovering existing bugs.
[1] https://www.researchgate.net/profile/James-Cheney-2/publicat...
Seems it would be a win even if the unsafe portion is quite large. Obviously not if it's 90% of the end result.
I have low hopes for this kind of approach; it’s sure to hit the limits of what’s possible with static analysis of C code. Also, choosing Rust as the target makes the problem unnecessarily hard because Rust’s ownership model is so foreign to how real C programs work.
For example, C's buffers obviously have lengths, but in C the length isn't explicitly tied to a pointer, so the translator has to deduce how the C program tracks the length to convert that into a slice. It's non-trivial even if the length is an explicit variable, and even trickier if it's calculated or changes representations (e.g. sometimes used in the form of one-past-the-end pointer).
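As a toy illustration of the gap (my example, not the paper's): the C signature carries the length in a separate parameter, while the Rust version ties it to the data.

    // C:  uint32_t sum(const uint8_t *buf, size_t len);
    fn sum(buf: &[u8]) -> u32 {
        // the length now travels with the pointer, so there is no separate `len` to track
        buf.iter().map(|&b| u32::from(b)).sum()
    }

    fn main() {
        assert_eq!(sum(&[1, 2, 3]), 6);
    }

The translator has to recover that pairing from however the C program happened to express it.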
Other C patterns like `bool should_free_this_pointer` can be translated to Rust's enum of `Owned`/`Borrowed`, but again it requires deducing which allocation is tied to which boolean, and what's the true safe scope of the borrowed variant.
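A rough sketch of that pattern (names are mine; std's Cow is the standard-library version of the same idea):

    enum MaybeOwned<'a> {
        Owned(Vec<u8>),
        Borrowed(&'a [u8]),
    }

    impl<'a> MaybeOwned<'a> {
        fn bytes(&self) -> &[u8] {
            match self {
                MaybeOwned::Owned(v) => v,
                MaybeOwned::Borrowed(b) => b,
            }
        }
    }

    fn main() {
        let stack = [1u8, 2, 3];
        let a = MaybeOwned::Borrowed(&stack);     // dropping this frees nothing
        let b = MaybeOwned::Owned(vec![4, 5, 6]); // dropping this frees the heap buffer
        assert_eq!(a.bytes().len() + b.bytes().len(), 6);
    }

Dropping the Owned variant is what the C `if (should_free) free(ptr);` idiom encodes by hand, and the translator has to figure out which flag guards which allocation.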
Formally verified C involves sticking to a strict subset of the capabilities of C that is verifiable, much like Rust enforces, so it makes sense that programs meeting that standard could be translated.
That’s just one example of how C code is nowhere near meeting Rust’s requirements. There are lots of others.
It’s literally in the standard library
https://doc.rust-lang.org/std/collections/struct.LinkedList....
Imagine refactoring your average C program to use GLib for all (all!) of its data structures. Now imagine doing that, but also translating it into Rust at the same time.
That's what LLMs are for - idiom translation. You can't trust them to do it right, though.
[Pan et al. 2024] find that while GPT-4 generates code that is more idiomatic than C2Rust, only 61% of it is correct (i.e., compiles and produces the expected result), compared to 95% for C2Rust.
This problem needs both AI-type methods to help with the idioms and formal methods to ensure that the guessed idioms correctly capture the semantics.
A big advance in this project is that they can usually translate C pointer arithmetic into Rust slices. That's progress on one of the hardest parts of the problem. C2Rust did not do that. That system just generates unsafe raw pointer arithmetic, yielding ugly Rust code that replicates C pointer semantics using function calls.
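To make the contrast concrete (hand-written by me, not output from either tool): a literal translation keeps stepping raw pointers, while the slice form carries its own bounds.

    // roughly what a literal translation of `while (p != end) *p++ = 0;` keeps doing
    unsafe fn zero_raw(mut p: *mut u8, end: *mut u8) {
        while p != end {
            unsafe { *p = 0 };
            p = unsafe { p.add(1) };
        }
    }

    // the slice form a smarter translator can aim for instead
    fn zero_slice(buf: &mut [u8]) {
        for b in buf.iter_mut() {
            *b = 0;
        }
    }

    fn main() {
        let mut buf = [1u8; 4];
        zero_slice(&mut buf);
        let range = buf.as_mut_ptr_range();
        unsafe { zero_raw(range.start, range.end) };
        assert!(buf.iter().all(|&b| b == 0));
    }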
DARPA is funding research in this area under the TRACTOR program. Program awards in April 2025, so this is just getting started. It's encouraging to see so much progress already. This looks do-able.
Optimizing C compilers also happen to be good at idiom recognition, and we can probably trust them a little more. The OP paper does mention future plans to use clang as well:
> We have plans for a libclang-based frontend that consume actual C syntax.
If such a transformation can be done at the IR level, it might be more efficient to go C-IR -> idiom-transform to Rust-IR -> run safety checks in Rust-IR -> continue compilation in C-IR or Rust-IR, or combine both for better optimization properties.
https://dl.acm.org/doi/pdf/10.1145/3597503.3639226
> As for C2Rust, the 5% unsuccessful translations were due to compilation errors, the majority of them caused by unused imports.
I'm rather confused by what that's supposed to mean, since unused imports cause warnings, not errors in Rust.
In practice, a little unsafe is usually fine. I only bring it up since the article is about translating to safe rust.
Unsafe blocks are an escape hatch where you promise that some invariants the compiler cannot verify are in fact true. If the translated code were to use that collection, via its safe interfaces, it would still be “safe rust”.
More generally: it’s incorrect to say that the rust ownership model forbids X when it ships with an implementation of X, regardless of if and how it uses “unsafe” - especially if “unsafe” is a feature of the ownership model that helps implement it.
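For instance, a minimal sketch (mine) of what "unsafe behind a safe interface" looks like in practice:

    // callers of this function never write `unsafe` themselves
    fn first_byte(buf: &[u8]) -> Option<u8> {
        if buf.is_empty() {
            None
        } else {
            // SAFETY: we just checked that index 0 is in bounds
            Some(unsafe { *buf.get_unchecked(0) })
        }
    }

    fn main() {
        assert_eq!(first_byte(&[7, 8]), Some(7));
        assert_eq!(first_byte(&[]), None);
    }

The invariant (here, the bounds check) is discharged inside the function, which is what the standard collections do at a larger scale.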
You can nitpick the meaning of "forbids", but as far as the current context is concerned, if you translate code that implements a doubly linked list (as opposed to using one from a library) into Rust, it's not going to work without unsafe. Or an index-based graph or something.
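To be fair, the index-based route does stay entirely in safe Rust. A rough sketch of what I mean (indices into a Vec stand in for pointers):

    struct Node {
        value: i32,
        prev: Option<usize>,
        next: Option<usize>,
    }

    struct List {
        nodes: Vec<Node>,
        tail: Option<usize>,
    }

    impl List {
        fn push_back(&mut self, value: i32) {
            let idx = self.nodes.len();
            self.nodes.push(Node { value, prev: self.tail, next: None });
            if let Some(t) = self.tail {
                self.nodes[t].next = Some(idx);
            }
            self.tail = Some(idx);
        }
    }

    fn main() {
        let mut list = List { nodes: Vec::new(), tail: None };
        list.push_back(1);
        list.push_back(2);
        // walk backwards from the tail via indices, no unsafe anywhere
        let mut cur = list.tail;
        while let Some(i) = cur {
            println!("{}", list.nodes[i].value);
            cur = list.nodes[i].prev;
        }
    }

But that's a rewrite of the data structure, not a translation of the pointer-based C code, which is the point.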
Unlike C++ or Rust, C has no references, only pointers, so the developer must release memory manually at some arbitrary point. This is the problem and the source of bugs.
Note that the converse to this isn't necessarily true! People I trust way more than me to write unsafe Rust code have argued that unsafe Rust can be harder than writing C in some ways, due to having to uphold certain invariants that don't come up in C. There are a number of blog posts on the topic that anyone interested can probably find fairly easily by googling "unsafe Rust harder than C", but I'll break my usual rule of strongly preferring articles to video content and link a talk from YouTube, because the speaker is one of those people I'd trust more than me to write unsafe code, and I remember seeing him give this talk at the meetup: https://www.youtube.com/watch?v=QAz-maaH0KM
Yes, this is absolutely correct and on top of this you sometimes have to employ tricks to make the compiler infer the right lifetime or type for the abstraction you're providing. On the other hand, again thanks to the abstraction power of Rust compared to C, you can test the resulting code way more easily using for example Miri.
So C and C++ are the exceptions to the rule, but how do they make it easy to write doubly linked lists? Obviously, the key assumption is that the developer makes sure that node->next->prev = node->prev->next = node (ignoring nullptr).
With this restriction, you can safely write a doubly linked list even without reference counting.
However, this isn't true on the pointer level. The prev pointers could be pointing at the elements in a completely random order. For example tail->prev = head, head->prev = second_last and so on, so that going backwards from the tail is actually going forwards again!
Then there is also the problem of having a pointer from the outside of the linked list pointing directly at a node. You would need a weak pointer, because another pointer could have requested deletion from the linked list, while you're still holding a reference.
If you wanted to support this generic data structure, rather than the doubly linked list you have in your head, then you would need reference counting in C/C++ as well!
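In Rust terms, that shape looks roughly like this (a sketch, field names mine): strong Rc links forward, Weak links for prev and for any outside handles.

    use std::cell::RefCell;
    use std::rc::{Rc, Weak};

    struct Node {
        value: i32,
        next: Option<Rc<RefCell<Node>>>,   // strong: keeps the next node alive
        prev: Option<Weak<RefCell<Node>>>, // weak: back/outside references don't
    }

    fn main() {
        let head = Rc::new(RefCell::new(Node { value: 1, next: None, prev: None }));
        let tail = Rc::new(RefCell::new(Node { value: 2, next: None, prev: None }));
        head.borrow_mut().next = Some(Rc::clone(&tail));
        tail.borrow_mut().prev = Some(Rc::downgrade(&head));
        // a weak handle has to be upgraded, and upgrading fails once the node is gone
        let back = tail.borrow().prev.as_ref().unwrap().upgrade().unwrap();
        assert_eq!(back.borrow().value, 1);
    }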
What this tells you is that Rust isn't restrictive enough to enforce these memory-safety contracts. Anyone with access to the individual nodes could break the contract and make the code unsafe.
C programmers don't do linked lists by using libraries; they hand-roll them, and often they are more complex than "just" a linked list. Lots of complex stuff out there.
The ownership model is close enough, but the way that model is expressed by the developer is completely arbitrary (and thus completely nuts).
If that’s the Rust way, then I’m all for it. Will make it easier for Fil-C to have some epic kill shots.
Zig seems to be awesome at creating mixed environments of Zig for new code and C for old, with translation and interop, plus being a C compiler.
There must be some very good reasons why Linux kernel maintainers aren't looking to zig as a C replacement rather than Rust.
I don't know enough to even speculate, so I would appreciate those with more knowledge and experience weighing in.
Zig has much better interoperability with C than Rust, but it's not memory safe or stable. I think we'll see quite a lot of Zig adoption in the C world, but I don't think it's in direct competition with Rust as such. In my region of the world nobody is adopting Rust; the C++ people are remaining in C++. There was some interest in Rust originally but it never really caught on in any company I know of. Likely for the same reason Go has become huge in younger companies but will not really make its way into companies which are traditionally Java/C#, because even if it made sense technically (and it probably doesn't) it's a huge change management task. Zig is seeing traction for programs without the need for dynamic memory allocation, but not much beyond that.
Rust isn't a "replacement for C", but an addition to it. It's a tool that Torvalds et al. have recognised the value of and thus it's been allowed in the kernel. The majority of the kernel code will still be written in C.
I'm no kernel maintainer, but I can speculate that two of the main reasons for Rust over Zig are the compile time guarantees that the language provides being better as well as the rate of adoption. There is a lot of work done by many leading companies in the industry to provide Rust native code or maintained Rust bindings for their APIs. Windows devs are re-writing parts of _their_ kernel in Rust. There's a "movement" going on that has been going on for a while. I only hope it doesn't stop.
Maybe the maintainers feel like Zig doesn't give them enough over C to be worth the change? Many of them are still opposed to Rust as well.
Hmm I think to clarify I would say that Rust _is_ intended as a replacement for C in general, but that this isn't how the Linux kernel developers are choosing to use it.
As for why the kernel developers would choose Rust, I would think another one of the primary benefits is that the type system guarantees the absence of a wide class of memory-related errors that are prevalent in C, and this type system (as well as those of its predecessors) has been subjected to significant scrutiny by the academic community over the last couple of decades to help iron out problems. I suspect this is also part of why Rust has a relatively large and passionate community compared to other C alternatives.
It is hard to get a full picture of how academic research influenced Rust and vice versa. Two examples:
- The use of linearity for tracking ownership in types had been known to academics but had never found its way into a mainstream language.
- Researchers in programming language semantics picked Rust as a target of formalization, which was only possible because of design choices around its type system. They were able to apply techniques that resulted from decades of trying to get a certified C. They have formalized parts of the standard library, including unsafe Rust, and found and fixed bugs.
So it seems fair to say that academic research on safety for C has contributed much to what makes Rust work today, and in ways that are not possible for C and C++, because these languages do not offer static guarantees where types transport information about exclusive access to some part of memory.
Has there actually been a successful contribution to the mainline kernel? The last two big projects I heard of (ext2 / Apple drivers) seemed to have issues getting their code accepted.
Saying this as someone who generally likes Zig's direction.
Also, was much of what the authors did manual, or was it run through something to produce the Rust code? If the latter, where is the code that generates the Rust? I do not see any links to any source repos.
The paper states that these developments will be released under open source licenses after the review process is completed, i.e. most likely, after the paper is formally published.
Ironically, AI is able to produce research-grade algorithms and will probably become an authority on the subject, helping take more traditional CS to the next level.
I agree with the sibling -- I think LLMs may be able to help automate some parts of it, but humans are still 95% of it. At least for now.
Similarly, they might generate safe wrappers that let teams write new code in Rust side by side with the field-proven C. New code has the full benefits, old code is proven safe, and the interfaces are safer.
A full-on translator might be an ideal option. We'd want one language for the codebase in the future. Push-button safety with low false positives for existing C and C++ is still the greatest need, though. Maybe auto-correcting bad structure right in the C, too, like Google's compiler tool and ForAllSecure's Mayhem do.
No, some C programs cannot be made safe. This can be due to dependency on undefined or unspecified behaviors, or it can be because introducing proper safety checks would limit the domain of possible inputs too much to be useful, or other things.
Translating to a safe language can maintain the expressive capabilities of the inputs while statically guaranteeing correct operation at run-time. It is objectively better in these cases.
> field-proven C
I don't think this exists, as the numerous critical vulnerabilities over the years have shown. All we have is C that seems to work pretty well often enough to be useful.
> old code is proven safe
Old code is assumed to be safe due to luck, actually. "Prove" has a specific meaning (especially on a post for a paper about proving things), and the overwhelming majority of C code is not proven to any rigorous mathematical standard. In contrast, the Rust type system has been mathematically proven to be correct.
> A full on translator might be an ideal option.
It depends on what you're willing to give up. If you don't mind losing performance, limiting your domain of inputs or range of outputs, giving up code legibility, and so on, then sure, this can probably be done to some extent. But when you start wanting your translator to be both sound and complete over all of these concerns, you run into problems.
Is this the case? E.g. the issue "Prove the Rust type system sound" https://github.com/rust-lang/rust/issues/9883 is closed with comment "This will be an open issue forever. Closing." in 2016: https://github.com/rust-lang/rust/issues/9883#issuecomment-2... .
At least nowadays (since 2022) we do have a language specification for Rust: https://ferrous-systems.com/blog/the-ferrocene-language-spec...
Neither of these papers include the trait system, unfortunately, and I'm not aware of another line of research that does (yet?).
You can certainly replace code that relies on undefined behavior in C with well-defined constructs.
> I don't think this exists, as the numerous critical vulnerabilities over the years have shown. All we have is C that seems to work pretty well often enough to be useful.
I think this is highly misleading. Some of the most reliable programs I know are written in C, and Rust projects will also have critical vulnerabilities. Most vulnerabilities are not actually related to memory safety, and the use of unsafe Rust will also lead to memory-safety issues in Rust code. So I see some advantage to Rust, but to me it is obviously overhyped.
Flagged, this is just a lie of a title.