cmov: implement constant-time equality comparisons#873
cmov: implement constant-time equality comparisons#873tarcieri merged 11 commits intoRustCrypto:masterfrom
Conversation
|
Looks great at first glance. I can take a more detailed look later.
The portable implementation could borrow Though I agree finding some way to get all of this work upstream into |
Thank you! I'll take a look into the volatile write, and also take a look at how I don't assume it'd be overly difficult, and it seems very beneficial so I'm all for it. |
|
I have some very promising news! Using the Here's the benchmark. Obviously my methodology could be flawed, or I'm unsure why |
|
FWIW Intel guarantees the CMOV family of instructions is constant-time on all extant CPUs, and the use of inline assembly should ensure LLVM does not alter the emitted assembly code |
I recall seeing it in the crate docs, but it never hurts to validate the implementation I suppose (even if just a sanity check). I wonder how this will act on
Which part of the portable implementation were you referring to here? I had a look but couldn't quite find where it'd fit in. On another note, I ran the benches using the portable backend, and it also looks promising. It's not collecting the full 100M measurements here either - I'm not too sure what's going on, but I assume there's enough to be reliable. |
|
Was just adding |
|
@brxken128 is there a minimal repro of that? I'd like to take a look in Godbolt |
This should do it: use std::arch::asm;
macro_rules! cmov_eq {
($instruction:expr, $lhs:expr, $rhs:expr, $condition:expr, $dst:expr) => {
let mut tmp = *$dst as u16;
unsafe {
asm! {
"xor {0:e}, {1:e}",
$instruction,
in(reg) *$lhs,
in(reg) *$rhs,
inlateout(reg) tmp,
in(reg) $condition as u16,
options(pure, nomem, nostack),
};
};
*$dst = tmp as u8;
};
}
pub fn cmovne_u32(src: &u32, rhs: &u32, input: u8, output: &mut u8) {
cmov_eq!("cmovnz {2:e}, {3:e}", src, rhs, input, output);
}
pub fn main() {
let mut dst = 1u8;
cmovne_u32(&1, &2, 0u8, &mut dst);
}Inlining |
|
@brxken128 I went through in godbolt and didn't notice anything broken in particular. It looked like it inlined the What's failing exactly? |
I had Once I got to the Take this impl from impl CmovEq for u32 {
fn cmovne(&self, rhs: &Self, input: Condition, output: &mut Condition) {
cmov_eq!("cmovnz {2:e}, {3:e}", self, rhs, input, output);
}
}This works great, until I'm unsure if this failing is down to an error in my code, an error with Rust, or an error with even my testing methodology but I don't think inline-always should be causing something such as this - everything works great without it. |
|
It'd be good to get the whole test into godbolt so you can see the difference in the emitted assembly |
It runs fine in godbolt, but not locally. Extremely odd. I compared the two ASM outputs and don't see anything awry. I'll commit the changes, push and see if CI shows similar failures. Edit: it's reproducible at least haha It appears to happen here: cond.cmovne(&cond, 0, &mut o);
assert_eq!(o, cond as u8);
If the It only happens on Replacing |
|
If you can just compose it now in terms of the existing CMOV instructions (i.e. not define any additional assembly), doing XOR in Rust and using e.g. We could investigate doing optimized assembly as a separate pass. |
|
Done as of my latest commit - I'll keep the ASM for Is the macro usage fine? I changed quite a bit in 0df5579 and I'm not sure if it's ideal. I'm also unsure over 59cad3d (dereferencing within the macros), I can easily revert both if it'd be better! Edit: it may also be wise to cancel this action, it appears there's no end in sight. |
Following on from #872, this is heavily a WIP, and a lot of the implementation is sub-par.
I cleaned the macro usage up, although it seemed intentionally designed that way so I can revert those changes if needs be.
I'm not 100% sure on the portable implementation, nor am I sure that
EORis the most suitable instruction for Aarch64 (but it works).Additionally, I think we should note that direct usage won't provide the most ideal outcomes - I believe the compiler will still attempt to optimise in some places, although I'm not 100% sure on that. Using this crate within
subtle(or similar) sounds the most suitable to me.Adding benches via
dudect_benchercould be very handy, just as a sanity-check (but I'd be happy to hear any ideas regarding that).Edit: in hindsight, I'm not sure that dereferencing within the macros is the best approach either (implementation-wise).
Feel free to rip this work to shreds - it's best that we get it right and make no mistakes 🙏