CBMC: C bounded model checker (2021)
106 points
14 days ago
| 6 comments
| cprover.org
| HN
nanolith
13 days ago
[-]
I have been using CBMC to formally verify C for six, make that seven years now (edit: time flies). The latest release is actually pretty good.

The secret to model checking is to understand that you're building and verifying a specification in code. It works best if specifications are built function by function. For each function, I'll build "shadow functions" that simplify any functions it calls with a range of potential non-deterministic behavior that matches that function's contract. In this way, formal specifications can be built from the ground up by first verifying a lower-level function and then replacing it with a shadow function that mimics its behavior in a non-deterministic but SMT solver friendly way.

There are things that model checking doesn't do well. Anything with recursion or looping that can't be easily broken down will not perform well with a model checker. While algorithms like balanced binary trees or sorting can be broken down in a way that can be model checked, other algorithms may require the sort of induction one gets with constructive proofs to sufficiently verify. There is an art to this which will lead to style changes in C and a slightly different way of building up functions from pieces that can be trivially model checked. Honestly, that's not a bad thing.

For the parts that can't be easily model checked, I'm working on a C parser library that can be formally verified, which will import C directly into Coq or Lean. I consider that the "last mile" before formal methods in firmware and system software is to the point that it's generally available.

reply
tooltower
13 days ago
[-]
> [...] other algorithms may require the sort of induction one gets with constructive proofs to sufficiently verify. There is an art to this which will lead to style changes in C and a slightly different way of building up functions from pieces that can be trivially model checked.

Is there a book or article that talks more about this? I.e. how to write code in a way that is more amenable to model-checking (bounded or otherwise)?

reply
nanolith
13 days ago
[-]
Unfortunately, I have not found any good tutorials. If it helps, I plan on writing one very soon.
reply
drdrey
12 days ago
[-]
where can we follow you to hear more about this?
reply
zzz95
13 days ago
[-]
I am very curious about what kind of industry domains lets you have this kind of fun? Unless of course, you are in academia.
reply
nanolith
12 days ago
[-]
If the tool exists and has minimal overhead, I don't think it is a matter of permission but a matter of necessity. CBMC adds about 30% overhead over just unit testing, in my experience. It does require a different idiomatic programming style to use correctly, and there is a learning curve. Some intuition is required to learn how to use the tool effectively and efficiently, but this can be taught.

For system programming or firmware, it's incredibly useful. I've used this tool in conjunction with threat modeling and attack surface minimization to significantly improve the safety and security of software written in C. At this point, I would not consider working on any C project that did not make use of this tool.

reply
rurban
13 days ago
[-]
Not fun, a serious tool, and extremely easy to use, unlike other formal methods. It works with your source code, not on some rewritten abstraction of it.

Embedded, automotive, space use it regularly.

I've setup cbmc and goto-analyzer for string searching algos here https://github.com/rurban/smart and really found some bugs in old published algos.

reply
xxmarkuski
14 days ago
[-]
CBMC ist pretty cool, I've used it in my bachelor's thesis for verification of fairness properties in proportional voting. Another use case I've seen is using CBMC in multiparty computation. What you have there is methods that work on boolean circuits. Flange CBMC in front of this and you got multiparty computation of C code.
reply
quotemstr
14 days ago
[-]
Wouldn't this project be more usefully done at the LLVM IR level so we could plug in any existing LLVM frontend and gain full support for language features like C++ templates?
reply
nickpsecurity
14 days ago
[-]
Another problem with LLVM I’ve heard about is that it’s intermediate language or API or something is a moving, informally-specified target. People who know LLVM internals might weigh in on that claim. If true, it’s actually easier to target C or a subset of Rust just because it’s static and well-understood.

Two projects sought to mitigate these issues by going in different directions. One was a compiler backend that aimed to be easy to learn with well-specified IL. The other aimed to formalize LLVM’s IL.

http://c9x.me/compile/

https://github.com/AliveToolkit/alive2

There have also been typed, assembly languages to support verification from groups like FLINT. One can also compile language-specific analysis with a certified to LLVM IL compiler. Integrating pieces from different languages can have risks. That (IIRC) is being mitigated by people doing secure, abstract compilation.

reply
sweetjuly
13 days ago
[-]
Yes, the textual IR format is not an API. It's an internal implementation detail which changes somewhat rapidly. It's well specified but there's no guarantees that if you write some IR which works in LLVM 16 that it'll work as expected in LLVM 17, etc. Inside of LLVM itself IR and its related manipulation and analysis functions are "API" in so far as there's an expectation that if you change how the IR works that you won't break existing in-tree passes/that you'll work with maintainers to fix them.
reply
nickpsecurity
13 days ago
[-]
That makes more sense. Thanks!
reply
flohofwoe
14 days ago
[-]
My guess is that LLVM IR preserves too little semantic information to implement such a feature.
reply
tkz1312
14 days ago
[-]
klee is a similar tool that operates on llvm bytecode: http://klee-se.org/
reply
xvilka
13 days ago
[-]
It doesn't work on real software though. Anything non-trivial.
reply
rjmunro
14 days ago
[-]
I think the LLVM level doesn't work as it throws away too much semantic information, but much of the CLANG front end can be used via it's API.

There is a fork of CBMC called ESBMC that does this and many other changes: https://ssvlab.github.io/esbmc/documentation.html

reply
quotemstr
14 days ago
[-]
I'm curious about what relevant details get thrown away in translation to LLVM IR and that we can't add back with annotations. It's a shame that it's not possible to use these analysis tools on modern C++ codebases.
reply
rocqua
14 days ago
[-]
Undefined behavior around void pointers springs to mind.
reply
SnowflakeOnIce
14 days ago
[-]
It predates LLVM. That's one reason, anyway.
reply
kldx
13 days ago
[-]
VCLLVM was presented at ETAPS 2024. The goal is to compile both languages and specifications to LLVM IR.
reply
IshKebab
14 days ago
[-]
I've used this before. It's pretty great. Also used as the backend for Kani I believe (Amazon's Rust formal verification tool).
reply
mrnoone
13 days ago
[-]
Some other tool (Coq-based) to formally verify unsafe Rust https://gitlab.mpi-sws.org/lgaeher/refinedrust-dev
reply
PartiallyTyped
14 days ago
[-]
This is achieved by latching on to rustc, and then compiling to gotoc, which is then used with CBMC.

Removed due to my misunderstanding:

~~I don't think Kani counts as a "formal" verification tool, emphasis on "formal", but it is a verification tool for rust code and is powered by CBMC.~~

reply
IshKebab
14 days ago
[-]
I'm not sure what you're implying. Kani absolutely is formal verification. CBMC stands for C Bounded Model Checker and model checking is one of the main forms of formal verification.
reply
PartiallyTyped
14 days ago
[-]
Edit:

Seems like I misread the documentation...

~~What I was implying is that Kani just can't figure out all of the paths;~~ ~~Consider this example from Kani documentation [1]:~~

    fn estimate_size(x: u32) -> u32 {
       if x < 256 {
            if x < 128 {
                return 1;
            } else {
                return 3;
            }
        } else if x < 1024 {
            if x > 1022 {
                panic!("Oh no, a failing corner case!");
            } else {
                return 5;
            }
        } else {
            if x < 2048 {
                return 7;
            } else {
                return 9;
            }
        }
    }
~~Kani is very unlikely to find x=1023.~~

https://model-checking.github.io/kani/tutorial-first-steps.h...

reply
accelbred
14 days ago
[-]
Kani will find 1023 unless you constrain the input. The link you posted is saying a property test is unlikely to find it but Kani will find it.
reply
IshKebab
14 days ago
[-]
Yeah you misread that. Read it again, it's saying normal fuzzing is unlikely to find it but Kani finds it instantly.
reply
PartiallyTyped
14 days ago
[-]
Oh this is embarrassing.
reply
ranger_danger
14 days ago
[-]
How does this compare to scan-build, clazy etc.?
reply
IshKebab
14 days ago
[-]
Those are different kinds of tools. Clazy is pretty much a linter. Scan-build is a traditional static analyser. It tries to find bugs or possible bugs but it's not trying to disprove the presence of bugs.

CBMC is a formal verification tool which tries to prove that there aren't any bugs (where "bugs" means things like UB).

reply
PartiallyTyped
14 days ago
[-]
CBMC also powers Kani <3
reply