The primitive we're trying to introduce is meant to make up for this shortcoming without adding new rules to the standard.
I'm not exposed to this space very often, so maybe you or someone else could give me some context. "Sabotage" is a deliberate effort to ruin/hinder something. Are compiler engineers deliberately hindering the efforts of cryptographers? If yes... is there a reason why? Some long-running feud or something?
Or, through the course of their efforts to make compilers faster/etc, are cryptographers just getting the "short end of the stick" so to speak? Perhaps forgotten about because the number of cryptographers is dwarfed by the number of non-cryptographers? (Or any other explanation that I'm unaware of?)
CPUs love to do branch prediction so that computation is already done when a branch is guessed correctly, but cryptographic code needs to take the same amount of time no matter the input.
When a programmer asks for some register or memory location to be zeroed, they generally just want a zero to use in some later operation, so it doesn’t really matter whether the previous value was actually overwritten. When a cryptographer does, they are generally trying to make the previous value impossible to read, and they want some guarantee that it wasn’t implicitly copied somewhere else in the interim.
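To make that concrete, here is a minimal C sketch (careless, careful and secure_wipe are made-up names, not any particular library's API): a plain memset of a buffer that is never read again is a dead store the optimizer may delete, while writing through a volatile pointer, or calling a platform helper like explicit_bzero or memset_s where available, is how people try to make the wipe actually happen. Even that only covers the named buffer, not copies the compiler may have spilled elsewhere.

    #include <stddef.h>
    #include <string.h>

    void careless(void) {
        unsigned char key[32];
        /* ... fill and use key ... */
        memset(key, 0, sizeof key);   /* dead store: the compiler may drop it,
                                         since key is never read afterwards */
    }

    /* Writing through a volatile pointer keeps each store observable, so
     * the compiler has to emit them. Platform helpers such as
     * explicit_bzero() or memset_s() exist for the same purpose where
     * they are available. */
    static void secure_wipe(void *p, size_t n) {
        volatile unsigned char *v = (volatile unsigned char *)p;
        while (n--)
            *v++ = 0;
    }

    void careful(void) {
        unsigned char key[32];
        /* ... fill and use key ... */
        secure_wipe(key, sizeof key); /* these stores survive optimization */
    }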
Any side effect is a side channel. There are always going to be side channels in real code running on real hardware.
Sure, you can change your code, compiler, or even hardware to account for this, but at its core that is security by obscurity.
https://www.intel.com/content/www/us/en/developer/articles/t...
Sure, you could run on some hypothetical OS that supports DOITM and insert syscalls around every manipulation of secret data. Yeah, right.
The whole design is ridiculous.
> for i = 1 to len(real_password) {
> if entered_password[i] != real_password[i] {
> return FAILURE
> }
> }
>
> return SUCCESS
OK, now an alert attacker who can very accurately record the time it takes to check the password can determine at least the length of the real password, since the running time of this check is O(length of the real password), and they can also gradually recover the password itself, because the check takes longer the more leading characters the attacker gets right.

Taking this general idea and expanding it: there are lots of places where the timing of branches can leak information about some secret, so in cryptographic code in particular it’s often beneficial to be able to ensure that two branches (the success and failure branches above) take exactly the same amount of time, so the timing doesn’t leak information. To fix the above you would probably want to do two things: first, set a boolean to failure and keep checking anyway, so the “return failure quickly” path doesn’t leak information; and second, compare against a fixed-width hash or something, so the length of the password itself isn’t a factor.
The problem is that lots of performance optimizations (pipelining, branch prediction, etc.) work directly against this goal: they aim to take branches quickly on the happy path of the code, because normally that’s exactly what you want for optimal performance.
So say instead of the above I do
> bool status = SUCCESS
> for i = 1 to hash_length {
> if hash_of_entered_password[i] != hash_of_real_password[i] {
> status = FAILURE
> }
> }
>
> return status
…I don’t want the optimizer to realize that once status becomes FAILURE it can never become SUCCESS again, and that the loop does nothing else, so it could just return early. I want it to actually run the pointless comparisons on the rest of the hash so the timing is exactly the same every time.

But now my check is constant-time and I’ve shifted the burden onto whoever writes the hash function: that has to run in constant time too, or my check will once again leak. So in general, people want the ability to tell the compiler that a particular piece of code must run in constant time. At the moment, in the general case, I think you have to break into inline assembly to achieve this.
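For reference, the usual C shape of that comparison looks something like the sketch below (ct_equal is a made-up name): accumulate the byte differences with OR instead of returning early, and optionally use a compiler-specific trick like an empty asm statement to discourage the optimizer from reasoning the loop back into a shortcut. None of this is guaranteed by the C standard; it's just the common idiom.

    #include <stddef.h>
    #include <stdint.h>

    /* Compare two equal-length buffers without an early exit: OR together
     * the XOR of every byte pair, so the loop does the same work no matter
     * where (or whether) the buffers differ. */
    int ct_equal(const uint8_t *a, const uint8_t *b, size_t n) {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
    #if defined(__GNUC__)
        /* Empty asm with diff as an in/out operand: a common (compiler-
         * specific, not guaranteed) way to stop the optimizer from
         * reasoning about diff and reintroducing a shortcut. */
        __asm__ __volatile__("" : "+r"(diff));
    #endif
        return diff == 0;   /* 1 if equal, 0 otherwise */
    }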
We should be asking our CPU vendors to support a constant-time mode of some sort for sensitive operations.
In some cases it might be necessary to account for the possibility of invalid memory accesses (and to avoid side channels while doing so). (The example given in the article works around this issue, but I don't know whether there are situations where that workaround won't help.)
> The CMOVcc instruction runs in time independent of its arguments in all current x86 architecture processors. This includes variants that load from memory. The load is performed before the condition is tested. Future versions of the architecture may introduce new addressing modes that do not exhibit this property.
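In practice, people often write branch-free selects and hope the compiler lowers them to CMOVcc. A minimal sketch (ct_select is a made-up name; the CMOV lowering is an assumption about typical codegen, not something the C standard promises, so inspecting the emitted assembly is still necessary when it matters):

    #include <stdint.h>

    /* Branch-free select: returns a if cond is 1, b if cond is 0.
     * Compilers often lower this to CMOVcc on x86, but are not
     * required to. */
    uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)0 - (cond & 1);  /* all-ones or all-zeros */
        return (a & mask) | (b & ~mask);
    }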