Add length field to sha256::Midstate#3010
Conversation
Kixunil
left a comment
There was a problem hiding this comment.
Definitely broken things about this PR:
- Doesn't change
from_midstateto remove the argument which was the entire point. - Doesn't maintain the invariant
Borrowissue.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Funny, this says sha256d but it's in non-d module. I'd say it was broken already.
What kinda annoys me that Satoshi did some weird thing and we've added it to midstate as well. Is this standardized across Bitcoin ecosystem?
There was a problem hiding this comment.
If the midstate is serialized backward then probably it's because we do this in Elements. I'm not sure where else (if anywhere) midstates are serialized in bitcoin. I'm a little hesitant to change it now, though now that we've moved to "you should be using newtypes for all hashes that mean something" I don't feel very strongly.
There was a problem hiding this comment.
Ok, I honestly thought serializing midstate is so obscure nobody is doing it in practice. Let's keep it.
There was a problem hiding this comment.
Well, properly speaking the "midstate" is actually an asset ID, and we have a newtype for it. But the newtype Display impl just passes through to the underlying midstate type.
We have unit tests for this, so I think that if we're breaking the API anyway we could take the opportunity to also change the direction of serialization (and then rust-elements will need to do a bit of extra work). But I'd rather not.
I suppose, we could change the comment on this field to reflect that it's about Elements' asset IDs rather than directly about sha256d ... but it's all the same original sin of storing hashes using a LE uint256 type.
There was a problem hiding this comment.
Do we even need to be able to Display a Midstate, can we just Debug and have to/from and getters and be done with it?
There was a problem hiding this comment.
Flagging this as unresolved however I hacked it up based on my personal understanding/idea that Midstate is super niche, we should just provide into_parts and Debug and let niche users do what they want with it.
There was a problem hiding this comment.
Nice, concept ACK dropping Display in favor of into_parts (and a pair of specific accessors, for convenience) and debug.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Nope, We can't have both comparison traits and Borrow since they don't act the same. I think we should just remove Borrow.
There was a problem hiding this comment.
Oh of course, because borrow returns just the bytes and equivalence uses the whole struct. Remove Borrow and keep AsRef<[u8]>, right?
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Should panic if length is not multiple of 64.
There was a problem hiding this comment.
Ah the whole Pr does not enforce the invariant, that was intentional, I should have explained that explicitly.
There was a problem hiding this comment.
Sorry man, I feel like I abused your time by pushing this up without a better explanation of the state, I'll try to be better.
11c56ad to
f8a950e
Compare
|
I've broken the PR up into super small pieces so my actions are super explicit and we can discuss each separately. |
|
@Kixunil if I'm using/understanding
phew |
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Sounds too gatekeepish to me. Having at least an example where this is useful (tags) would be nicer.
There was a problem hiding this comment.
/// The [`Midstate`] type is obscure and specialized and should not be used unless you are sure of what you are d
doing.
///
/// It represents "partially hashed data" but does not itself have properties of cryptographic hashes. For example,
/// when (ab)used as hashes, midstates are vulnerable to trivial length-extension attacks. They are typically used
/// to optimize the computation of full hashes. For example, when implementing BIP-340 tagged hashes, which
/// always begin by hashing the same fixed 64-byte prefix, it makes sense to hash the prefix once, store
/// the midstate as a constant, and hash any future data starting from the constant rather than from a fresh
/// hash engine.
There was a problem hiding this comment.
Yeah, that sounds good. Plus a note about not really needing to do that manually for BIP-340 since our sha256t_newtype macro does the hashing for you in const context.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
A better name would be appreciated. E.g. midstate_unchecked.
There was a problem hiding this comment.
Maybe midstate_inner or midstate_internal.
There was a problem hiding this comment.
The method doesn't perform % 64 check, so I thin _unchecked is appropriate.
There was a problem hiding this comment.
Damn, I saw this style in another crate and hoped I'd get it past you :) Will fix.
There was a problem hiding this comment.
If it's our crate we should fix it. If it's a foreign crate it's a tiny little red flag. But really a very little one.
There was a problem hiding this comment.
Note please, at first I had midstate_internal but PR now uses midstate_unchecked, I think it helps at internal call sites.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
In this case I think it'd be fine to call this just new. If users are calling this with a non-multiple-of-64 length that is a programming error. Typically the length should even be a known fixed constant. Typically 64.
We could also add a try_new variant which returns an Option instead of panicking, though I'm unsure how useful it would be.
There was a problem hiding this comment.
As an aside, we should have an accessor for the length. Maybe fn n_bytes_hashed or n_bits_hashed or something. Because AFAICT currently users can't even tell how many bytes they've hashed, and we don't want to encourage users to construct whole midstates just to get this information.
There was a problem hiding this comment.
If users are calling this with a non-multiple-of-64 length that is a programming error.
Yes, I think so.
currently users can't even tell how many bytes they've hashed,
They can count manually externally but it's arguably annoying and wasteful, so concept ACK.
There was a problem hiding this comment.
We have HashEngine::n_bytes_hashed() and also, with this PR applied, HashEngine::can_extract_midstate() (now const thanks to review).
hashes/src/sha256.rs
Outdated
f8a950e to
0d30a42
Compare
|
If you have the time or the inclination could you two retro the process in this PR. It was mildly complex work with differing views, and required discussion. Was there anything that I/we could have done to make the process better?
|
|
My usual process with draft PRs is to ignore them. In this case, from the title, it looked like a small-diff change that I'd be willing to review even on the assumption that the code I was reading was going to be thrown away. And it's relevant to I think the process is fine -- though we are at 34 comments and once these things get beyond 50 or so I find they become impossible to keep track of, and then I just wait for somebody else to ACK before doing my own review. One thing I like is that this PR just does one thing. You can imagine that if it did multiple things, the 34 comments could easily become many times that amount, and probably we'd fail to discuss some important things that got lost in the noise. |
I generally quickly skim over the high-level ideas and some key things in the code and comment on those, ignore the rest until the PR is out of draft.
I do this kind of review process even if I know this because I assume the previous PR might change. However this specific PR looked like a real draft to me because of how different it was from the proposal. |
|
Cool, thanks. So I'm getting:
Thanks fellas |
|
This PR is IMO ready, just waiting on #3009 to merge. |
Perhaps this is should be a sign to me that the PR needs breaking up into smaller chunks? |
Yeah. Not always -- sometimes we hit a gazillion comments over the same philosophical point (but then that's a sign to all of us that we should split off into an issue/discussion) -- but usually when PRs have too many comments it's because there are multiple issues at play, and if we split up the PR we could solve them one by one. |
0d30a42 to
bf2f8e2
Compare
bf2f8e2 to
436e150
Compare
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Why do we even have two blocks? If this PR didn't have 10 commits I would've suggested joining the blocks instead.
There was a problem hiding this comment.
I've looked at that more than once over the years, I believe its because the process() function is so low level and long that its better having it down the bottom. Obviously I didn't write it though. @apoelstra ?
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Shouldn't this say something like "Unfinalized output"?
There was a problem hiding this comment.
Should this function even be public? Is there a usecase outside of Midstate::hash_tag?
There was a problem hiding this comment.
It's not a function, it's the Midstate struct.
There was a problem hiding this comment.
I've got no idea what that message means :) I created an issue because its slightly out of scope for this PR.
There was a problem hiding this comment.
oooo, I'm the goose, I thought we were talking about docs on the const_hash function :( I missed line 140/152
There was a problem hiding this comment.
Added a trailing patch, I don't actually know what finalizing means so I wrote that in the commit log. Is it just "check digest is not all zero" done in sha256::Hash::from_engine or is it a crypto concept?
There was a problem hiding this comment.
The not all zeroes is a fuzzing-related hack. Finalization is a crypto concept required for a hash to be secure.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Should rather say something like "macro will create the midstate for you in const context and use it for tagged hashing".
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Almost makes it sound as if this field must be a multiple of 64 but actually this field is guaranteed to not be multiple of 64.
There was a problem hiding this comment.
Thanks, I've improved the comment and write! output.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
This kinda defeats the point of the change. One should define static MIDSTATE: Midstate = Midstate::new(...); so that it can be passed here without a panic. This is a test code but it may inspire people to less ideal things, so I'd prefer it to be fixed or a good example provided in the doc.
There was a problem hiding this comment.
I'm skeptical about trying to hyperoptimize codegen in unit tests. In practice I do not believe there would be a measurable difference in code size (unless the Midstate::new function could be compiled out entirely which requires 100% adherence in all dependencies, ok) nor in performance (where we have a branch that can be predicted 100% of the time since it's checking whether a constant value has some low bits clear).
But yes, in example code we ought to use a static or const or whatever, to emphasize that you almost certainly want to be calling this function with a fixed pre-vetted constant and not trying to be clever.
There was a problem hiding this comment.
That's not really the point. If it's the best way to do so in examples it's probably the best in tests as well. If not for any other reason, at least to not give readers bad ideas.
The current doc doesn't have an example demonstrating proper use so people may look into the code and find this.
There was a problem hiding this comment.
If it's the best way to do so in examples it's probably the best in tests as well.
This is definitely not true in general. The point of tests are to find edge cases and abuse the API to the extent allowable (and maybe beyond that).
In this case, sure, we can make the test a little noisier if it makes you happy. But I don't believe that anybody who somehow finds this type without knowing what they're doing, reads through the "don't use this type unless you know what you're doing" docs, then goes digging through unit tests to find "example usage", will benefit from us trying to microoptimize their code by proxy.
There was a problem hiding this comment.
Fair point given this is niche. I would like to avoid us making it a habit since I found that better code also makes refactoring tests easier (typically not using vec! like many people love to do, which is unneeded in maybe 99% of cases). I'd still really like having an example in the doc.
Not something I would hold a PR for (I would've clicked "request changes" otherwise).
There was a problem hiding this comment.
I added a patch using static in one of the unit tests.
There was a problem hiding this comment.
FWIW vec! used to be needed in Rust 1.22. Possibly also in 1.41. Because bare arrays did not implement IntoIterator and using slices meant that you had to stick iter().copied() onto the end (or rather, .iter().map(|x| *x) because .copied is pretty new as well.
Anyway clippy warns about this now, so this is an historic wart.
b3986af to
bac337b
Compare
Explicitly import `sha256t` in docs builds and remove explicit link target. This patch is code churn on its own but the `sha256t` module will be used again in proceeding patches, done separately to reduce the size/complexity of proceeding patches.
|
No acks on this yet so rebased to get the CI semver fix recently merged. |
|
🚨 API BREAKING CHANGE DETECTED |
hashes/src/sha256.rs
Outdated
| /// | ||
| /// If `length` is not a multiple of the block size. | ||
| pub fn from_midstate(midstate: Midstate, length: usize) -> HashEngine { | ||
| assert!(length % BLOCK_SIZE == 0, "length is no multiple of the block size"); |
There was a problem hiding this comment.
Isn't this kinda slangish? "not a multiple" would sound more natural to me but I don't know English enough to tell and even it I'm right, I'd prefer it in a followup PR unless it needs to be rebased anyway.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
Do you mean Midstate invariant violated?
There was a problem hiding this comment.
Yes, "Midstate invariant ..." reads better, thanks.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
This doesn't do anything interesting. A better usage would have const TAP_LEAF_MIDSTATE: Midstate = ...; which then gets passed around.
hashes/src/sha256.rs
Outdated
There was a problem hiding this comment.
FYI hashes get finalized at the end by inputting a bunch of zeroes (you can see it in the Hash::from_midstate method). Midstate doesn't do this so it isn't protected against length extension attacks as mentioned in the doc.
There was a problem hiding this comment.
Ah yes, in Hash::from_engine - thanks man. So back in the day when I tried to move the logic in Hash::from_engine to HashEngine::finalize I had the name correct, nice :) (I think you were away when I was doing that.)
In preparation for patching the `Debug` implementation of `Midstate` and a regression test.
Done in preparation for adding a `length` field to `Midstate` and also in preparation for removing the `Display` implementation (will be justified in the patch that does it). Currently in the `Debug` impl of `Midstate` we are calling through to `Display` using the alternate form of printing, we can use `as_hex` to achieve almost the same thing. Note that in `LowerHex` we use the `fmt_hex_exact` macro that allows us to reverse the iterator however when we later attempt to use `f.debug_struct` we cannot use the macro. Elect to change the current behaviour to `Debug` forwards, shown by the change to the regression test.
As is convention in this repo use the third person when describing the `sha256::HashEngine::from_midstate` function.
In a `HashEngine` the `length` field represents number of bytes input into the hash engine. Note also: > the midstate bytes are only updated when the compression function is run, which only happens every 64 bytes. Currently our midstate API allows extracting the midstate after any amount of input bytes, this is probably not what users want. Note also that most users should not be using the midstate API anyways. With all this in mind, add a private `length` field to the `Midstate` struct and enforce an invariant that it is modulo 64. Add a single const `Midstate` constructor that panics if the invariant is violated. The `Midstate` is niche enough that panic is acceptable. Remove the `from_slice`, `from_byte_array`, and `to_byte_array` functions because they no longer make sense. Keep `AsRef<[u8]>` for cheap access to the midstate's inner byte slice. Note change to `Debug`: `bytes` field now does not include the `0x` prefix because `as_hex` because of the use of `debug_struct`. Enjoy nice warm fuzzy feeling from hacking on crypto code.
As a bit more of an example of how to use the `sha256::Midstate` use a `static` in one of the unit tests.
The midstate has not been finalized [0], so use the term in the struct header. FTR I don't know _exactly_ what "finalized" means in the context of sha256 hashing (or hashing in general). This change came from a review suggestion and we have other mentions of "finalized" in the code.
288661f to
98fe617
Compare
|
Force push is all review suggestions from @Kixunil |
|
🚨 API BREAKING CHANGE DETECTED |
| pub const fn as_parts(&self) -> (&[u8; 32], usize) { (&self.bytes, self.length) } | ||
|
|
||
| /// Deconstructs the [`Midstate`], returning the underlying byte array and number of bytes hashed. | ||
| pub const fn into_parts(self) -> ([u8; 32], usize) { (self.bytes, self.length) } |
There was a problem hiding this comment.
The type is Copy, so it should be to_parts.
There was a problem hiding this comment.
Ah correct, tracking issue created!
| fn engine_with_state() { | ||
| let mut engine = sha256::Hash::engine(); | ||
| let midstate_engine = sha256::HashEngine::from_midstate(engine.midstate(), 0); | ||
| let midstate_engine = sha256::HashEngine::from_midstate(engine.midstate_unchecked()); |
There was a problem hiding this comment.
Why is this not .midstate().unwrap()?
There was a problem hiding this comment.
I read this as randomly/adhoc choosing which one to call in the tests, so that both are covered.
There was a problem hiding this comment.
The checked version calls into unchecked so it shouldn't be needed? Also it's better to test public API than private (but the private will probably stay)
|
@tcharding let me know when I should test this. It takes many hours to test 12 commits and every time you force-push I have to throw away the work. |
|
I think some of the commits could be squashed without losing too much. Like little doc fixes. |
Damn. The patch separation was just to be super careful that we weren't inadvertently breaking/removing things and missing it in review. Lets merge it as is and follow up. I can keep this in mind, as well as the review burden of rebaseing, for any future PRs that include a bunch of patches. |
|
Can this one go in @apoelstra? |
|
Lol, took 758 minutes to test this one. Definitely some intermittent bug in my local CI. But yep, lemme just quickly re-review the code, post an ACK, and I'll merge it. |
Ouch |
In rust-bitcoin#3010 we added a `length` field to the `sha256::Midstate` which broke the `schemars` test but we did not notice because of the current bug [0] in the `run_task` CI script. [0] rust-bitcoin/rust-bitcoin-maintainer-tools#10
In rust-bitcoin#3010 we added a `length` field to the `sha256::Midstate` which broke the `schemars` test but we did not notice because of the current bug [0] in the `run_task` CI script. [0] rust-bitcoin/rust-bitcoin-maintainer-tools#10
…it derives Copy be94c9e Rename Midstate::into_parts to Midstate::to_parts since it derives Copy (Shing Him Ng) Pull request description: This is a follow up to #3010 Fixes #3079 ACKs for top commit: tcharding: ACK be94c9e apoelstra: ACK be94c9e successfully ran local tests Tree-SHA512: 0d838a2064136d050d319116f6df3d598323b04a137e7bb7cb5f3f1a87d72ad1ee4d2f3b228a2f9d68e7ca117c0f922ef6551f783eb39c8db0db1188e4732c41
"overhaul the midstate API" might be a better description. PR is broken up into many patches to the extreme to make sure all API the changes are easily noticed.
Resolve: #2918