Compiling C to Safe Rust, Formalized
242 points
17 hours ago
| 13 comments
| arxiv.org
| HN
wffurr
16 hours ago
[-]
Note that this is done for “existing formally verified C codebases” which is a lot different from typical systems C code which is not formally verified.
reply
safercplusplus
9 hours ago
[-]
And even then, not completely reliably it seems (from Section 2.2):

> The coercions introduced by conversion rules can however lead to subtle semantic differences

The example they give is this C code:

    1 uint8_t x[1] = { 0 };
    2 uint8_t *y = x;
    3 *y = 1;
    4 assert(*x == 1); /* SUCCESS */
getting translated to this (safe) Rust code:

    1 let x: [u8; 1] = [0; 1];
    2 let mut y: Box<[u8]> = Box::new(x);
    3 y[0] = 1;
    4 assert!(x[0] == 1) /* failure */
So the pointer (iterator) targeting an existing (stack-allocated) array declared on line 2 gets translated to an owning pointer/Box) targeting a (heap-allocated) new copy of the array. So if the original code was somehow counting on the fact that the pointer iterator was actually targeting the array it was assigned to, the translated code may (quietly) not behave correctly.

For comparison, the scpptool (my project) auto-translation (to a memory safe subset of C++) feature would translate it to something like:

    1 mse::lh::TNativeArrayReplacement<uint8_t, 1> x = { 0 };
    2 mse::lh::TNativeArrayReplacement<uint8_t, 1>::iterator y = x; // implicit conversion from array to iterator
    3 *y = 1;
    4 assert(*x == 1); /* SUCCESS */ // dereferencing of array supported for compatibility
or if y is subsequently retargeted at another type of array, then line 2 may end up as something like:

    2 mse::TAnyRandomAccessIterator<uint8_t> y = x; // implicit conversion from array to iterator
So the OP project may only be converting C code that is already amenable to being converted to safe Rust. But given the challenge of the problem, I can respect the accomplishment and see some potential utility in it.

edit: added translation for line 2 in an alternate hypothetical situation.

reply
akkad33
6 hours ago
[-]
Is Rust formally verified? Not that I know of
reply
gpm
1 hour ago
[-]
No, but small pieces of it are, and there's active work to extend it

The concept of the borrow checker has been on a simplified version of rust https://people.mpi-sws.org/~dreyer/papers/rustbelt/paper.pdf - work has continued in this area steadily (e.g. see tree borrows)

There's a variety of tools that take rust code and translate it to something a proof system understands, and then checks that it matches a specification. AWS is leading a project to use these to verify the standard library: https://aws.amazon.com/blogs/opensource/verify-the-safety-of...

reply
PartiallyTyped
5 hours ago
[-]
You can always run model checkers like Kani, though even that is limited.
reply
medo-bear
3 hours ago
[-]
So no?
reply
johnisgood
2 hours ago
[-]
The answer is that it is not.

It frustrates me more than it should, I admit, that people always mention Rust when they talk about safety, but never Ada / SPARK. You want formal verification? Use Ada / SPARK. It has been battle-tested. It has been used for critical systems for a really long time now.

(And a compiler being formally verified vs. being able to write formally verified code means two different things.)

reply
medo-bear
2 hours ago
[-]
I think a disclaimer like this should be written with every Rust application, like health warnings on cigarette packets
reply
johnisgood
2 hours ago
[-]
At this point, I think that would be better, yes, just because people think Rust is "fully" safe, which is just incorrect. I think the problem was the Rust hype and repeated statements of it being very safe, so we have some undoing to do.

For example if someone on GitHub sees that the project is written in Rust, they are automatically going to assume it is safe, incorrectly so. I do not blame them though.

reply
keybored
1 hour ago
[-]
You presumably extend this to every virtual machine or interpreter for every language which is implemented in an unsafe language. When that language claims to be safe (like all such languages claim to be).

That seems excessive and tedious.

reply
medo-bear
47 minutes ago
[-]
The point, I think, was that "safety" presumptions about Rust are often exaggerated or poorly misunderstood due to hype. That could certainly lead to problems
reply
woodruffw
3 minutes ago
[-]
I don’t think Rust’s actual safety properties aren’t overhyped, although they may be subject to misunderstanding about their exact extent.

Concretely: spatial and temporal memory safety are good things, and Rust achieves both. It’s not unique in this regard, nor is it unique in not having a formal definition.

reply
chillingeffect
49 minutes ago
[-]
Rust is to safe as Tesla is to autopilot.
reply
medo-bear
45 minutes ago
[-]
As Tesla is to Tesla
reply
jokoon
7 hours ago
[-]
What is formally verified C? Why isn't there more of it?
reply
PhilipRoman
6 hours ago
[-]
Because it takes a lot of effort. Google Frama-C. On the flip side, it can express not just memory safety constraints, but also correctness proofs.

In this case it's not about Frama-C or similar tools though, see your sibling comments about the caveats.

reply
zozbot234
6 hours ago
[-]
Arguably, generating verified C from a low-level focused subset of F* (Low*, used in the HACL project) is close enough to count as a "similar tool".
reply
cushychicken
2 hours ago
[-]
Woof. Major unspoken caveat in the title!
reply
tonetegeatinst
13 hours ago
[-]
What is the main difference? Can compiler flags force compliance?
reply
LPisGood
12 hours ago
[-]
My understanding is that formal verification is a tough goal to achieve and that it usually requires designing the program or the language to be a specific way.

The problem with transpiling C to rust is that unsafe and unverified behavior can be a key property of the behavior resulting program, so there isn’t an obvious way to spit out a sort of rustified (CRustified?) binary that matches the behavior of the C program.

reply
thunkingdeep
10 hours ago
[-]
Typical term is “Oxidized”. I think they feel clever when they do the RiiR thing and say that.
reply
immibis
9 hours ago
[-]
"Formally verified" means someone has written a correct mathematical proof that the code has no bugs (the proof is checked by a computer program to make sure it is correct). This is a very high bar.

I'm not sure what it has to do with translating the code to Rust.

reply
swiftcoder
9 hours ago
[-]
> I'm not sure what it has to do with translating the code to Rust.

Formally verified C programs typically have had all their undefined behaviour already stamped out during the verification process. That makes mapping it over to Rust a lot simpler.

reply
tjoff
7 hours ago
[-]
Translating undefined behavior is the easy part.
reply
swiftcoder
4 hours ago
[-]
Maybe, but only if you know the specifics of the environment in which it is executing (i.e. which compiler/architecture-specific behaviours the code actually relies on)
reply
Filligree
3 hours ago
[-]
Correctly written C programs don’t have undefined behaviour, so a translator can operate assuming there is none.
reply
DonaldPShimoda
8 hours ago
[-]
In this case specifically, two separate aspects are being referred to with regard to "formal verification".

The first is that the translation function (from C to Rust) has itself been formally verified. In other words: if you give it a C program that obeys certain necessary pre-conditions, it will give you a safe Rust program. I believe the title of the paper is primarily using "Formalized" with regard to this characteristic.

The second is that the set of programs the authors evaluate their tool on are C programs that have themselves been formally verified. I only barely skimmed the introduction and didn't see it addressed directly there, but I would assume that this is because the pre-conditions necessary for their translation to work are most easily (only?) met by formally verified C programs, where of course the verification has been rendered with respect to properties that would be relevant here (e.g., memory use).

reply
nickpsecurity
12 hours ago
[-]
Formal verification often requires simplified code in a restrictive style. You might not even be able to use C features or structures that have the performance you want. How theorem provers and brains work are also different enough that making something easy for one often makes it harder for the other.

You can also see this effect in the article on the history of Coverity’s analyzer. Real-world code was horrible to deal with vs the academic examples they started with.

https://cacm.acm.org/research/a-few-billion-lines-of-code-la...

reply
medo-bear
3 hours ago
[-]
What is the benefit of compiling formally correct code to Rust? It seems that all the possible benefits are already there (if not more)
reply
gpm
1 hour ago
[-]
I suppose hypothetically putting it in an easier language to make changes in. Though it's hard to imagine it being easier to make changes to transpiled code than the original.

Alternatively this might be seen as a stepping stone to translating non-formally-verified C to rust, which I understand the US government has expressed a fair bit of interest in.

reply
medo-bear
1 hour ago
[-]
Good luck with both of that. Otherwise people whose dayjob is rewritting everything in Rust will soon be out of a job
reply
light_hue_1
7 hours ago
[-]
Oh, this has so so many more caveats! It's borderline false advertising.

First of all they never translated any C! At all. Zero lines.

They took code written in F* and modified its C compiler to emit Rust. They never had to deal with any actual C code of any complexity, aside from the most trivial code that might theoretically be emitted by a toy compiler (but wasn't even that!). They just pretended they were dealing with C (or a heavily restricted Mini-C).

Even if we accept this, there are plenty of arbitrary restrictions on the C it can even support in principle:

> If the original C program further relies on x, our translation will error out, and will ask the programmer to fix their source code.

That's saying you want C to already be written in a style that makes the Rust borrow checker happy! They're avoiding the actual hard parts of the problem.

> These rules present simplified versions of what we support. Our implementation features several peephole optimizations ...

If I may translate from the language of academia. "We present beautiful rules in figure 4. But in reality, our implementation relies on a large number of hacks."

But it gets worse!

> For overlap cases that can be distinguished statically (as above), we emit a compile-time error; otherwise, the Rust code will panic at runtime.

The resulting code is not guaranteed to be correct! For example, aliasing in C can cause crashes in the resulting Rust code. See comment at the top of page 10. We're going from a formally verified C program to a "It may crash now" Rust program!? That's nuts!

> We apply our methodology to existing formally verified C codebases, namely, the HACL* cryptographic library ...

This is such a blatant lie that I had to return to it. It's designed to catch an unwary reviewer. We often talk about HACL being in verified C because it gets compiled to that. But it's not a C library, it's written in a totally different language. You cannot confuse the two.

I'm not a reviewer for their paper. But if I was, I would strongly fight for rejection.

The fact that they only handle formally verified C is so astronomically far away from being their main problem.

An honest title would be "Compiling a subset of F* to partially Safe Rust, Partially Formalized"

reply
lou1306
6 hours ago
[-]
> An honest title would be "Compiling a subset of F* to partially Safe Rust, Partially Formalized"

Sadly that is the name of the game in conference publishing. Make a big claim in the title and hope the reviewer does not read the fine print.

> If I may translate from the language of academia. "We present beautiful rules in figure 4. But in reality, our implementation relies on a large number of hacks."

Er, this I can understand. Every conference paper only presents you with a simplified view of the contributions. There is no way to describe every corner case within the page limits of a conference paper (specifically, I bet this was submitted to POPL 2025).

reply
light_hue_1
6 hours ago
[-]
> Er, this I can understand. Every conference paper only presents you with a simplified view of the contributions.

In my group I don't allow hacks. Most other good papers I can think of don't do that. It takes new students a while to get used to not maximizing performance at all costs. And if a hack must exist, then it's explained, not as a hack, but as part of the method.

Don't you hate it when you implement some beautiful idea from a paper only to discover that it hardly bares any relationship to the actual method that works?

> There is no way to describe every corner case within the page limits of a conference paper (specifically, I bet this was submitted to POPL 2025).

You're probably right. But POPL allows for unlimited pages in an attached appendix. They can and should describe the full method. Particularly if they're going to play the game of saying that they formalized this.

> Sadly that is the name of the game in conference publishing. Make a big claim in the title and hope the reviewer does not read the fine print.

It's a form of namespace squatting. Do the worst job possible to claim a title that you couldn't execute on, so that when people figure out that problem, they will be forced to cite your non-working solution. I loathe this approach to publishing.

We should punish people for doing. Reject for lack of honesty and your paper can't be published in related conference for X years.

reply
amenghra
7 hours ago
[-]
In 2002, a group of researchers presented a paper on Cyclone, a safe dialect of C [1]. While (manually) porting code from C to Cyclone, they found safety bugs in the C code.

These kinds of manual or automated conversation from C to <safer language> therefore have potential not only for increasing adoption of safer languages but also for uncovering existing bugs.

[1] https://www.researchgate.net/profile/James-Cheney-2/publicat...

reply
alkonaut
7 hours ago
[-]
If you used a naïve translation to Rust, wouldn’t you get parts that are safe and parts that are unsafe? So your manual job would need to be only verifying safety in the unsafe regions (same as when writing rust to begin with)?

Seems it would be a win even if the unsafe portion is quite large. Obviously not of it’s 90% of the end result.

reply
CodesInChaos
3 hours ago
[-]
A naïve translation would produce rust code which is almost entirely unsafe (using raw pointers instead of references everywhere). Translating to references is difficult, since C code isn't written with the restrictions of the Rust alias model / borrow-checker in mind.
reply
pizlonator
15 hours ago
[-]
Compiling a tiny subset of C, that is. It might be so tiny as to be useless in practice.

I have low hopes for this kind of approach; it’s sure to hit the limits of what’s possible with static analysis of C code. Also, choosing Rust as the target makes the problem unnecessarily hard because Rust’s ownership model is so foreign to how real C programs work.

reply
pornel
14 hours ago
[-]
Rust's ownership model is close enough for translating C. It's just more explicit and strongly typed, so the translation needs to figure out what a more free-form C code is trying to do, and map that to Rust's idioms.

For example, C's buffers obviously have lengths, but in C the length isn't explicitly tied to a pointer, so the translator has to deduce how the C program tracks the length to convert that into a slice. It's non-trivial even if the length is an explicit variable, and even trickier if it's calculated or changes representations (e.g. sometimes used in the form of one-past-the-end pointer).

Other C patterns like `bool should_free_this_pointer` can be translated to Rust's enum of `Owned`/`Borrowed`, but again it requires deducing which allocation is tied to which boolean, and what's the true safe scope of the borrowed variant.

reply
smolder
10 hours ago
[-]
It's not that simple. In fact it's impossible in some cases if you don't sprinkle unsafe everywhere and defeat the purpose. Rusts restrictions are so that it can be statically analyzed to guarantee safety. The superset of all allowable C program behaviors includes lots of things that are impossible to guarantee the safety of through static analysis.

Formally verified C involves sticking to a strict subset of the capabilities of C that is verifiable, much like Rust enforces, so it makes sense that programs meeting that standard could be translated.

reply
pizlonator
13 hours ago
[-]
Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

That’s just one example of how C code is nowhere near meeting Rust’s requirements. There are lots of others.

reply
orf
13 hours ago
[-]
> Rust’s ownership model forbids things like doubly linked lists, which C programs use a lot.

It’s literally in the standard library

https://doc.rust-lang.org/std/collections/struct.LinkedList....

reply
quuxplusone
12 hours ago
[-]
But it's not in C's standard library. So the exercise isn't merely to auto-translate one language's standard library to another language's standard library (say, replacing C++ std::list with Rust LinkedList) — which would already be very hard. The exercise here is to auto-identify-and-refactor idioms open-coded in one language, into idioms suited for the other language's already-written standard library.

Imagine refactoring your average C program to use GLib for all (all!) of its data structures. Now imagine doing that, but also translating it into Rust at the same time.

reply
Animats
11 hours ago
[-]
> The exercise here is to auto-identify-and-refactor idioms open-coded in one language, into idioms suited for the other language's already-written standard library.

That's what LLMs are for - idiom translation. You can't trust them to do it right, though.

[Pan et al . 2024] find that while GPT-4 generates code that is more idiomatic than C2Rust, only 61% of it is correct (i.e., compiles and produces the expected result), compared to 95% for C2Rust.

This problem needs both AI-type methods to help with the idioms and formal methods to insure that the guessed idioms correctly capture the semantics.

A big advance in this project is that they can usually translate C pointer arithmetic into Rust slices. That's progress on of one of the hardest parts of the problem. C2Rust did not do that. That system just generates unsafe raw pointer arithmetic, yielding ugly Rust code that replicates C pointer semantics using function calls.

DARPA is funding research in this area under the TRACTOR program. Program awards in April 2025, so this is just getting started. It's encouraging to see so much progress already. This looks do-able.

reply
fuhsnn
10 hours ago
[-]
>That's what LLMs are for - idiom translation. You can't trust them to do it right, though.

Optimizing C compilers also happened to be good at idiom recognition, and we can probably trust them a little more. The OP paper does mention future plan to use clang as well: >We have plans for a libclang-based frontend that consume actual C syntax.

If such transformation can be done at IR level it might be more efficient to be to C-IR > idiom transform to Rust-IR > run safe-checks in Rust-IR > continue compilation in C-IR or Rust-IR or combining both for better optimization properties.

reply
swiftcoder
8 hours ago
[-]
I'm definitely bullish on this angle of compiling C down to LLVM assembly, and then "decompiling" it back to Rust (with some reference to the original C to reconstruct high-level idioms like for loops)
reply
saghm
10 hours ago
[-]
Oh god, I can't even imagine trying to have formally-verified LLM-generated code. It's not surprising that even incremental progress for that would require quite a lot of ingenuity.
reply
CodesInChaos
3 hours ago
[-]
Why does C2Rust produce so much incorrect code? Getting 5% wrong sounds terrible, for a 1:1 translation to unsafe Rust. What does it mis-translate?

https://dl.acm.org/doi/pdf/10.1145/3597503.3639226

> As for C2Rust, the 5% unsuccessful translations were due to compilation errors, the majority of them caused by unused imports.

I'm rather confused by what that's supposed to mean, since unused imports cause warnings, not errors in Rust.

reply
immibis
9 hours ago
[-]
Actually, LLMs are for generating humorous nonsense. Putting them in charge of the world economy was not intended, but we did it anyway.
reply
dhosek
8 hours ago
[-]
Given that in my (small, employer-mandated) explorations with Copilot autocompletions it’s offered incorrect suggestions about a third of the time and seems to like to also suggest deprecated APIs, I’m skeptical about the current generation’s ability to be useful at even this small task.
reply
glouwbug
40 minutes ago
[-]
Sure but it takes two copilots to fly a plane
reply
singron
13 hours ago
[-]
This implementation uses unsafe. You can write a linked list in safe rust (e.g. using Rc), but it probably wouldn't resemble the one you write in C.

In practice, a little unsafe is usually fine. I only bring it up since the article is about translating to safe rust.

reply
orf
13 hours ago
[-]
Safe rust isn’t “rust code with absolutely 0 unsafe blocks in any possible code path, ever”. Rc uses unsafe code every time you construct one, for example.

Unsafe blocks are an escape hatch where you promise that some invariants the compiler cannot verify are in fact true. If the translated code were to use that collection, via its safe interfaces, it would still be “safe rust”.

More generally: it’s incorrect to say that the rust ownership model forbids X when it ships with an implementation of X, regardless of if and how it uses “unsafe” - especially if “unsafe” is a feature of the ownership model that helps implement it.

reply
andrewflnr
12 hours ago
[-]
No one here is confused about what unsafe means. The point is, they're not implemented by following Rust's ownership model, because Rust's ownership model does in fact forbid that kind of thing.

You can nitpick the meaning of "forbids", but as far as the current context is concerned, if you translate code that implements a doubly linked list (as opposed to using one from a library) into Rust, it's not going to work without unsafe. Or an index-based graph or something.

reply
oneshtein
11 hours ago
[-]
It's easy to implement doubly linked lists in safe Rust. Just ensure that every element has one OWNER, to avoid «use after free» bugs, or use a garbage collector, like a reference counter.

Unlike C++ or Rust, C has no references, only pointers, so developer must release memory manually at some arbitrary point. This is the problem and source of bugs.

reply
saghm
10 hours ago
[-]
While I might agree that it's easy if you use a reference counter, this is not going to be as performant as the typical linked list written in C, which is why the standard library uses unsafe for its implementation of stuff like this. If it were "easy" to just write correct `unsafe`, then it would be easy to do it in C as well.

Note that the converse to this isn't necessarily true! People I trust way more to write unsafe Rust code than me than me have argued that unsafe Rust can be harder than writing C in some ways due to having to uphold certain invariants that don't come up in C. While there are a number of blog posts on the topic that anyone interested can probably find fairly easily by googling "unsafe Rust harder than C", I'll break my usual rule of strongly preferring articles to video content to link a talk from youtube because the speaker is one of those people I mention who I'd trust more than me to write unsafe code and I remember seeing him give this talk at the meetup: https://www.youtube.com/watch?v=QAz-maaH0KM

reply
bonzini
5 hours ago
[-]
> unsafe Rust can be harder than writing C in some ways due to having to uphold certain invariants that don't come up in C.

Yes, this is absolutely correct and on top of this you sometimes have to employ tricks to make the compiler infer the right lifetime or type for the abstraction you're providing. On the other hand, again thanks to the abstraction power of Rust compared to C, you can test the resulting code way more easily using for example Miri.

reply
oconnor663
9 hours ago
[-]
More important than whether you use a little unsafe or a lot, is whether you can find a clean boundary above which everything can be safe. Something like a hash function or a block cipher can be piles and piles of assembly under the covers, but since the API is bytes-in-bytes-out, the safety concerns are minimal. On the other hand, memory-mapping a file is just one FFI function call, but the uncontrollable mutability of the whole thing tends to poison everything above it with unsafety.
reply
imtringued
5 hours ago
[-]
I don't really see it as a big "owning" of Rust that a complex pointer heavy structure with runtime defined ownership cannot be checked statically. Almost every language that people use doubly linked lists in has a GC, making the discussion kind of meaningless.

So C and C++ are the exceptions to the rule, but how do they make it easy to write doubly linked lists? Obviously, the key assumption is that that the developer makes sure that node->next->prev = node->prev->next = node (Ignoring nullptr).

With this restriction, you can safely write a doubly linked list even without reference counting.

However, this isn't true on the pointer level. The prev pointers could be pointing at the elements in a completely random order. For example tail->prev = head, head->prev = second_last and so on. So that going backwards from the tail is actually going forwards again!

Then there is also the problem of having a pointer from the outside of the linked list pointing directly at a node. You would need a weak pointer, because another pointer could have requested deletion from the linked list, while you're still holding a reference.

If you wanted to support this generic datastructure, rather than the doubly linked list you have in your head, then you would need reference counting in C/C++ as well!

What this tells you, is that Rust isn't restrictive enough to enforce these memory safe contracts. Anyone with access to the individual nodes could break the contract and make the code unsafe.

reply
pizlonator
12 hours ago
[-]
Good luck inferring how to use that from some C programmer’s deranged custom linked list.

C programmers don’t do linked lists by using libraries, they hand roll them, and often they are more complex than “just” a linked list. Lots of complex stuff out there.

reply
oneshtein
11 hours ago
[-]
Rus's ownership model doesn't forbid doubly linked lists. It forbids doubly owned lists, or, in other words, «use after free» bug.
reply
jerf
12 hours ago
[-]
That's a classic example of an argument that looks really good from the 30,000 foot view, but when you're actually on the ground... no, basically none of that beautiful idea can actually be manifested into reality.
reply
bloppe
13 hours ago
[-]
Is this sarcastic? There's a reason why the lifetime checker is so annoying to people with a lot of C experience. You absolutely cannot just use your familiar C coding styles in Rust.
reply
orf
13 hours ago
[-]
You’ve misread the comment.

The ownership model is close enough, but the way that model is expressed by the developer is completely arbitrary (and thus completely nuts).

reply
whatisyourwork
15 hours ago
[-]
It can be good as an interface language. Good for bindings.
reply
titzer
13 hours ago
[-]
Meh, you know people are just going to throw LLMs at it and they'll be fine with it hallucinating correctish code by the ton-load. But I agree that they are going to have tough time making idiomatic Rust from random C. Like I said, correct-ish.
reply
pizlonator
12 hours ago
[-]
Great way to introduce novel security vulnerabilities!

If that’s the Rust way, then I’m all for it. Will make it easier for Fil-C to have some epic kill shots.

reply
zoom6628
9 hours ago
[-]
I wonder how this compares to the zig-to-C translate function.

Zig seems to be awesome at creating mixed environs of zig for new code and C for old, and translating or interop, plus being a C compiler.

There must be some very good reasons why Linux kernel maintainers aren't looking to zig as a C replacement rather than Rust.

I don't know enough to even speculate so would appreciate those with more knowledge and experiencing weighing in.

reply
devjab
4 hours ago
[-]
As I understand it most kernel maintainers aren’t looking to replace C with anything.

Zig has much better interoperability with C than Rust, but it’s not memory safe or stable. I think we’ll see quite a lot of Zig adoption in the C world, but I don’t think it’s in direct competition with Rust as such. In my region of the world nobody is adopting Rust, the C++ people are remaining in C++. There was some interest in Rust originally but it never really caught on in any company I know of. Likely for the same reason Go has become huge in younger companies but will not really make its way into companies which are traditionally Java/C# because even if it made sense technically (and it probably doesn’t) it’s a huge change management task. Zig is seeing traction for programs without the need for dynamic memory allocation, but not much beyond that.

reply
ChristianJacobs
9 hours ago
[-]
> looking to zig as a C replacement rather than Rust

Rust isn't a "replacement for C", but an addition to it. It's a tool that Torvalds et. al. has recognised the value of and thus it's been allowed in the kernel. The majority of the kernel code will still be written in C.

I'm no kernel maintainer, but I can speculate that two of the main reasons for Rust over Zig are the compile time guarantees that the language provides being better as well as the rate of adoption. There is a lot of work done by many leading companies in the industry to provide Rust native code or maintained Rust bindings for their APIs. Windows devs are re-writing parts of _their_ kernel in Rust. There's a "movement" going on that has been going on for a while. I only hope it doesn't stop.

Maybe the maintainers feel like Zig doesn't give them enough over C to be worth the change? Many of them are still opposed to Rust as well.

reply
DonaldPShimoda
8 hours ago
[-]
> Rust isn't a "replacement for C"

Hmm I think to clarify I would say that Rust _is_ intended as a replacement for C in general, but that this isn't how the Linux kernel developers are choosing to use it.

As for why the kernel developers would choose Rust, I would think another one of the primary benefits is that the type system guarantees the absence of a wide class of memory-related errors that are prevalent in C, and this type system (as well as those of its predecessors) has been subjected to significant scrutiny by the academic community over the last couple of decades to help iron out problems. I suspect this is also part of why Rust has a relatively large and passionate community compared to other C alternatives.

reply
burakemir
7 hours ago
[-]
Agreed. The large and passionate community may have multiple factors but "things actually work" is probably a factor.

It is hard to get a full picture of how academic research influenced Rust and vice versa. Two examples:

- The use of linearity for tracking ownership in types has been known to academics but had never found its way into a mainstream language.

- researchers in programming language semantics pick Rust as a target of formalization, which was only possible because of design choices around type system. They were able to apply techniques that resulted from decades of trying to get a certified C. They have formalized parts of the standard library, including unsafe Rust, and found and fixed bugs.

So it seems fair to say that academic research on safety for C has contributed much to what makes Rust work today, and in ways that are not possible for C and C++ because these languages do not offer static guarantees where types Transport information about exclusive access to some part of memory.

reply
josefx
8 hours ago
[-]
> It's a tool that Torvalds et. al. has recognised the value of and thus it's been allowed in the kernel.

Has there actually been a successfull contribution to the mainline kernel? The last two big projects I heard of (ext2 / Apple drivers) seemed to have issues getting their code accepted.

reply
shakna
5 hours ago
[-]
rnull is in the kernel. And I believe one of the generic Realtek drivers is Rust as well.
reply
spiffyk
9 hours ago
[-]
Zig is nowhere near mature enough to be considered for the kernel yet. There are breaking changes to it regularly still - which is a good thing for Zig now, but not so good for huge, long-lived codebases like Linux. Also compiler bugs happen.

Saying this as someone who generally likes Zig's direction.

reply
3836293648
6 hours ago
[-]
Zig isn't 1.0 and has zero backcompat guarantees. It's also barely used anywhere and hasn't proven its value, even if parts of it may seem promising
reply
capitol_
9 hours ago
[-]
Maybe because zig isn't memory safe.
reply
Ar-Curunir
1 hour ago
[-]
Rust is stable and used by a number of big players, while Zig is not stable, and as a result hasn’t seen widespread adoption yet
reply
mbana
7 hours ago
[-]
Can something like `C2Rust` then use this to generate formally correct code?

Also, is much of the authors did manual or was it run through something to produce the Rust code? If so, where is the code that generates Rust, I do not see any links to any source repos.

reply
zozbot234
7 hours ago
[-]
> If so, where is the code that generates Rust, I do not see any links to any source repos.

The paper states that these developments will be released under open source licenses after the review process is completed, i.e. most likely, after the paper is formally published.

reply
p0w3n3d
6 hours ago
[-]
I wonder, if a C library is working (i.e. is not formally proven to be not having problems, but works in most ways) why shouldn't we translate it using rust unsafe? I would say there is a value in it as rust lacks of libraries generally. And this would not be different from using a dll/so that was written in c and can be unsafe in some circumstances after all
reply
jtrueb
15 hours ago
[-]
Interesting how higher optimization levels didn’t really help speed up rust in the O level comparison
reply
vlovich123
15 hours ago
[-]
As they say it’s likely that the code they’re outputting is pessimizing rustc’s ability. Namely it sounds like they’re inlining code as part of the conversion
reply
jtrueb
14 hours ago
[-]
Yes, I’m just saying how it kicks in basically immediately (O1).
reply
ljlolel
16 hours ago
[-]
I wonder how well O3 can do just compiling C to rust in one shot
reply
saagarjha
15 hours ago
[-]
Probably pretty bad.
reply
ojosilva
15 hours ago
[-]
Funny, I came here to say just the opposite, that I'm glad algorithmic computing is still a thing in research and that not everything is AI.

Ironically, AI is able to produce research-grade algorithms and will probably become an authority on the subject, helping take more traditional CS to the next level.

reply
Garlef
13 hours ago
[-]
I think it would make sense to evaluate if the the 'surgical' rewrites mentioned in the article can be carried out by or assisted by an LLM based process.
reply
sunshowers
13 hours ago
[-]
There's a lot of code in the world where correctness is a requirement. :)

I agree with the sibling -- I think LLMs may be able to help automate some parts of it, but humans are still 95% of it. At least for now.

reply
dmezzetti
6 hours ago
[-]
Interesting concept. But for a working system in C, why do we need to "convert" it to Rust. Seems like an effort where juice isn't worth the squeeze. Probably will create more problems than we're fixing.
reply
nickpsecurity
12 hours ago
[-]
The thing I wonder about is why we would do this. The technology to really convert industrial-grade apps from C to Rust could probably bullet proof the C apps more easily. They’d just have to do some analyses that fed into existing tooling, like static analyzers and test generators.

Similarly, it they might generate safe wrappers that let teams write new code in Rust side by side with the field-proven C. New code has the full benefits, old code is proven safe, and the interfaces are safer.

A full on translator might be an ideal option. We’d want one language for the codebase in the future. Push-button safety with low, false positives for existing C and C++ is still the greatest need, though. Maybe auto-correcting bad structure right in the C, too, like Google’s compiler tool and ForAllSecure’s Mayhem do.

reply
DonaldPShimoda
8 hours ago
[-]
> The technology to really convert industrial-grade apps from C to Rust could probably bullet proof the C apps more easily.

No, some C programs cannot be made safe. This can be due to dependency on undefined or unspecified behaviors, or it can be because introducing proper safety checks would limit the domain of possible inputs too much to be useful, or other things.

Translating to a safe language can maintain the expressive capabilities of the inputs while statically guaranteeing correct operation at run-time. It is objectively better in these cases.

> field-proven C

I don't think this exists, as the numerous critical vulnerabilities over the years have shown. All we have is C that seems to work pretty well often enough to be useful.

> old code is proven safe

Old code is assumed to be safe due to luck, actually. "Prove" has a specific meaning (especially on a post for a paper about proving things), and the overwhelming majority of C code is not proven to any rigorous mathematical standard. In contrast, the Rust type system has been mathematically proven to be correct.

> A full on translator might be an ideal option.

It depends on what you're willing to give up. If you don't mind losing performance, limiting your domain of inputs or range of outputs, giving up code legibility, and so on, then sure, this can probably be done to some extent. But when you start wanting your translator to be both sound and complete over all of these concerns, you run into problems.

reply
_flux
4 hours ago
[-]
> In contrast, the Rust type system has been mathematically proven to be correct.

Is this the case? E.g. the issue "Prove the Rust type system sound" https://github.com/rust-lang/rust/issues/9883 is closed with comment "This will be an open issue forever. Closing." in 2016: https://github.com/rust-lang/rust/issues/9883#issuecomment-2... .

At least nowadays (since 2022) we do have a language specification for Rust: https://ferrous-systems.com/blog/the-ferrocene-language-spec...

reply
aw1621107
2 hours ago
[-]
The closest thing is probably RustBelt [0], which proved the soundness of a subset of Rust that included borrowing/lifetimes. This was later extended to include relaxed memory accesses [1].

Neither of these papers include the trait system, unfortunately, and I'm not aware of another line of research that does (yet?).

[0]: https://dl.acm.org/doi/10.1145/3158154

[1]: https://plv.mpi-sws.org/rustbelt/rbrlx/paper.pdf

reply
uecker
4 hours ago
[-]
> No, some C programs cannot be made safe. This can be due to dependency on undefined or unspecified behaviors, or it can be because introducing proper safety checks would limit the domain of possible inputs too much to be useful, or other things.

You can certainly replace code using undefined behavior in C code by using defined constructs.

> I don't think this exists, as the numerous critical vulnerabilities over the years have shown. All we have is C that seems to work pretty well often enough to be useful.

I think this highly misleading. Some of the most reliable programs I know are written in C and Rust projects will also have critical vulnerabilities. Most vulnerabilities are not actually related to memory safety and the use of unsafe Rust will also lead to memory safety issues in Rust code. So I see some advantage to Rust but to me it is obviously overhyped.

reply
Alifatisk
16 hours ago
[-]
c2rust.com, but it uses things like libc::c_int
reply
love2read
16 hours ago
[-]
C2Rust is mentioned in the second paragraph of the related work section.
reply
rat87
9 hours ago
[-]
How is c2rust doing these days? For practical codebases?
reply
kelnos
6 hours ago
[-]
Ugh. They didn't compile any C to Rust. They modified the F*-to-C compiler to emit Rust instead. So they compiled F* to safe Rust. And they couldn't even do that 100% reliably; some valid F* constructs couldn't be translated into Rust properly. They could either translate it into Rust code that wouldn't compile, or translate it into similar-looking Rust code that would compile, but would produce incorrect results.

Flagged, this is just a lie of a title.

reply