The primitive we're trying to introduce is meant to make up for this shortcoming without adding new rules to the standard.
I'm not exposed to this space very often, so maybe you or someone else could give me some context. "Sabotage" is a deliberate effort to ruin/hinder something. Are compiler engineers deliberately hindering the efforts of cryptographers? If yes... is there a reason why? Some long-running feud or something?
Or, through the course of their efforts to make compilers faster/etc, are cryptographers just getting the "short end of the stick" so to speak? Perhaps forgotten about because the number of cryptographers is dwarfed by the number of non-cryptographers? (Or any other explanation that I'm unaware of?)
CPUs love to do branch prediction so that computation is already done when a branch is guessed correctly, but cryptographic code needs to take the same amount of time no matter the input.
When a programmer asks for some register or memory location to be zeroed, they generally just want a zero to use in some later operation, so it doesn’t really matter whether the previous value was actually overwritten. When a cryptographer does, they are generally trying to make the previous value impossible to read, and they want some guarantee that it wasn’t implicitly copied somewhere else in the interim.
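To make that concrete, here is a minimal C sketch (careless, careful and secure_wipe are made-up names, not any particular library's API): a plain memset of a buffer that is never read again is a dead store the optimizer may delete, while writing through a volatile pointer, or calling a platform helper like explicit_bzero or memset_s where available, is how people try to make the wipe actually happen. Even that only covers the named buffer, not copies the compiler may have spilled elsewhere.

    #include <stddef.h>
    #include <string.h>

    void careless(void) {
        unsigned char key[32];
        /* ... fill and use key ... */
        memset(key, 0, sizeof key);   /* dead store: the compiler may drop it,
                                         since key is never read afterwards */
    }

    /* Writing through a volatile pointer keeps each store observable, so
     * the compiler has to emit them. Platform helpers such as
     * explicit_bzero() or memset_s() exist for the same purpose where
     * they are available. */
    static void secure_wipe(void *p, size_t n) {
        volatile unsigned char *v = (volatile unsigned char *)p;
        while (n--)
            *v++ = 0;
    }

    void careful(void) {
        unsigned char key[32];
        /* ... fill and use key ... */
        secure_wipe(key, sizeof key); /* these stores survive optimization */
    }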
Any side effect is a side channel. There are always going to be side channels in real code running on real hardware.
Sure, you can change your code, compiler, or even hardware to account for this, but at its core that is security by obscurity.
https://www.intel.com/content/www/us/en/developer/articles/t...
Sure, you could run on some hypothetical OS that supports DOITM and insert syscalls around every manipulation of secret data. Yeah, right.
The whole design is ridiculous.
> for i = 1 to len(real_password) {
> if entered_password[i] != real_password[i] {
> return FAILURE
> }
> }
>
> return SUCCESS
OK, now an alert attacker who can very accurately record the time it takes to check the password can determine at least the length of the real password, since the running time of this check is O(length of the real password), and they can also gradually recover the password itself, because the check takes longer the more leading characters the attacker gets right.

Taking this general idea and expanding it: there are lots of places where the timing of branches can leak information about some secret, so in cryptographic code in particular it’s often beneficial to be able to ensure that two branches (the success and failure branches above) take exactly the same amount of time, so the timing doesn’t leak information. To fix the above you would probably want to do two things: first, set a boolean to failure and keep checking anyway, so the “return failure quickly” path doesn’t leak information; and second, compare against a fixed-width hash or something, so the length of the password itself isn’t a factor.
The problem is that lots of performance optimizations (pipelining, branch prediction, etc.) work directly against this goal: they aim to take branches quickly on the happy path of the code, because normally that’s exactly what you want for optimal performance.
So say instead of the above I do
> bool status = SUCCESS
> for i = 1 to hash_length {
> if hash_of_entered_password[i] != hash_of_real_password[i] {
> status = FAILURE
> }
> }
>
> return status
…I don’t want the optimizer to realize that once status becomes FAILURE it can never become SUCCESS again, and that the loop does nothing else, so it could just return early. I want it to actually run the pointless comparisons on the rest of the hash so the timing is exactly the same every time.

But now my check is constant-time and I’ve shifted the burden onto whoever writes the hash function: that has to run in constant time too, or my check will once again leak. So in general, people want the ability to tell the compiler that a particular piece of code must run in constant time. At the moment, in the general case, I think you have to break into inline assembly to achieve this.
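For reference, the usual C shape of that comparison looks something like the sketch below (ct_equal is a made-up name): accumulate the byte differences with OR instead of returning early, and optionally use a compiler-specific trick like an empty asm statement to discourage the optimizer from reasoning the loop back into a shortcut. None of this is guaranteed by the C standard; it's just the common idiom.

    #include <stddef.h>
    #include <stdint.h>

    /* Compare two equal-length buffers without an early exit: OR together
     * the XOR of every byte pair, so the loop does the same work no matter
     * where (or whether) the buffers differ. */
    int ct_equal(const uint8_t *a, const uint8_t *b, size_t n) {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
    #if defined(__GNUC__)
        /* Empty asm with diff as an in/out operand: a common (compiler-
         * specific, not guaranteed) way to stop the optimizer from
         * reasoning about diff and reintroducing a shortcut. */
        __asm__ __volatile__("" : "+r"(diff));
    #endif
        return diff == 0;   /* 1 if equal, 0 otherwise */
    }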
We should be asking our CPU vendors to support a constant-time mode of some sort for sensitive operations.
In some cases it might be necessary to account for the possibility of invalid memory accesses (and to avoid side channels while doing so). (The example given in the article works around this issue, but I don't know whether there are situations where that workaround won't help.)
> The CMOVcc instruction runs in time independent of its arguments in all current x86 architecture processors. This includes variants that load from memory. The load is performed before the condition is tested. Future versions of the architecture may introduce new addressing modes that do not exhibit this property.
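In practice, people often write branch-free selects and hope the compiler lowers them to CMOVcc. A minimal sketch (ct_select is a made-up name; the CMOV lowering is an assumption about typical codegen, not something the C standard promises, so inspecting the emitted assembly is still necessary when it matters):

    #include <stdint.h>

    /* Branch-free select: returns a if cond is 1, b if cond is 0.
     * Compilers often lower this to CMOVcc on x86, but are not
     * required to. */
    uint32_t ct_select(uint32_t cond, uint32_t a, uint32_t b) {
        uint32_t mask = (uint32_t)0 - (cond & 1);  /* all-ones or all-zeros */
        return (a & mask) | (b & ~mask);
    }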