Skip to content

Split Hash trait#2496

Closed
tcharding wants to merge 1 commit intorust-bitcoin:masterfrom
tcharding:02-22-split-hash-traits
Closed

Split Hash trait#2496
tcharding wants to merge 1 commit intorust-bitcoin:masterfrom
tcharding:02-22-split-hash-traits

Conversation

@tcharding
Copy link
Copy Markdown
Member

@tcharding tcharding commented Feb 22, 2024

This is Kix's work from #1555, picking it up to help out. I've just rebased it and bumped the version of hashes to 0.14.0.

The Hash trait requires for the hash type to support hashing arbitrary data. This is not great, since hash newtypes usually hash specific data. Hashing arbitrary data into newtypes is usually a bug or misunderstanding of the API.

This change separates methods performing hashing of arbitrary data to a separate RawHash trait and provides a few helpers to make usage easier:

  • Add a private engine() method to the hash newtype
  • Add a finalize() method to HashEngine
  • Add a hashes::hash function

Moves TxMerkleNode hash to the merkle_tree module because that is where implement the custom hashing and need access to its private inner hash.

Closes #1481

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate test labels Feb 22, 2024
@tcharding
Copy link
Copy Markdown
Member Author

I know how you like rebasing @Kixunil so I picked this one up for you. Feel free to holla if you don't want me doing it.

@coveralls
Copy link
Copy Markdown

coveralls commented Feb 22, 2024

Pull Request Test Coverage Report for Build 7999898364

Details

  • -7 of 161 (95.65%) changed or added relevant lines in 32 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 84.041%

Changes Missing Coverage Covered Lines Changed/Added Lines %
hashes/src/lib.rs 12 15 80.0%
bitcoin/src/crypto/sighash.rs 24 28 85.71%
Totals Coverage Status
Change from base Build 7990823849: 0.05%
Covered Lines: 19427
Relevant Lines: 23116

💛 - Coveralls

@tcharding tcharding force-pushed the 02-22-split-hash-traits branch from 3e49d15 to 11378d1 Compare February 22, 2024 05:03
This is Kix's work from rust-bitcoin#1555, picking it up to help out. I've just
rebased it.

The `Hash` trait requires for the hash type to support hashing arbitrary
data. This is not great, since hash newtypes usually hash specific data.
Hashing arbitrary data into newtypes is usually a bug or
misunderstanding of the API.

This change separates methods performing hashing of arbitrary data to a separate RawHash trait and provides a few helpers to make usage easier:

- Add a private `engine()` method to the hash newtype
- Add a `finalize()` method to `HashEngine`
- Add a `hashes::hash` function

Moves `TxMerkleNode` hash to the `merkle_tree` module because that is
where implement the custom hashing and need access to its private inner
hash.

Closes rust-bitcoin#1481

Co-developed-by: Martin Habovstiak <martin.habovstiak@gmail.com>
@tcharding tcharding force-pushed the 02-22-split-hash-traits branch from 11378d1 to bc5bb0a Compare February 22, 2024 05:26
@apoelstra
Copy link
Copy Markdown
Member

After thinking about this for a bit I would suggest

  • We drop the RawHash and Hash traits entirely.
  • Then HashEngine has new/input/finalize methods. Maaybe also a convenience function for inputting a whole u8 iterator and a io::Read (though I'm unsure if it's kosher to feature-gate a trait method)
  • ...and an associated type for the "hash" (we can call this Digest if people want). I guess this should be bounded by Index<usize> and various ranges so people can access the raw bytes.
  • should we also have a From/Into<[u8; Self::DIGEST_LENGTH]> bound? Unsure. Maaybe these conversion methods are reason to retain the Hash trait? My feeling is that these conversions are not that useful generically. They're mainly useful for coercing a concrete hash type into a different concrete hash type, or secp256k1::Message, or something.
  • ...and an associated type for the midstate as well as a constructor that takes a midstate+length
  • ...and associated constants for the digest length and the block size (which is sometimes useful to know for perf reasons)

Basically our "generic hash" trait is replaced entirely with a "generic hash engine" trait, which will be sufficient to implement HMAC, which I think is the prototypical algorithm that requires a generic hash.

Then the concrete hash type would have intrinsic methods that do

  • Conversion to/from [u8; length]
  • Construction of a hash from raw data which to be hashed
  • Construction of a hash from an iterator of u8s
  • Construction of a hash from an io::Read, when std is on

And finally, our main lib.rs should have convenience methods which wrap the above constructors. This will make our API a little noisy but will help people who show up thinking "how the hell do I just sha2 hash some bytes?".

@tcharding
Copy link
Copy Markdown
Member Author

I'm good to try to implement all that, doing so will make this definitely not a "just rebase Kixs' work" PR so I'll wait for @Kixunil to respond first, also for any additional ideas.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 23, 2024

  • We drop the RawHash and Hash traits entirely.

I was considering this, however:

  • Merkle tree computation needs both
  • secp256k1 needs ThirtyTwoByteHash which is actually a redundant trait and I was going to suggest we remove it in favor of Hash<Bytes=[u8; 32]> + KnownPreimage

However we could move RawHash to merkle tree module in bitcoin and keep ThirtyTwoByteHash instead. Or just keep a trimmed version of Hash but have the same methods as inherent so nobody needs to import them.

though I'm unsure if it's kosher to feature-gate a trait method

Only if it has a default implementation.

  • bounded by Index<usize>

I think Index is just confusing without going through as_byte_array.

...and an associated type

IIRC it's unsupported outside traits

But generally I agree with let's make hash types trait-free and only have traits for absolutely required things.

@apoelstra
Copy link
Copy Markdown
Member

Merkle tree computation needs both

I think it could be generic over HashEngine and expect its intermediate hashes to have type E::Digest where E is the generic engine.

I think Index is just confusing without going through as_byte_array.

Agreed, though if as_byte_array is the only reason to have a RawHash trait that kinda sucks.

IIRC it's unsupported outside traits

All of my suggested associated types are on the HashEngine trait.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 24, 2024

though if as_byte_array is the only reason to have a RawHash trait that kinda sucks.

I don't understand this.

There's one more issue though: currently HashEngine can be pre-initialized with tag so one cannot use tagged hashes. But since I proposed to tag engines too this will be then resolved.

@apoelstra
Copy link
Copy Markdown
Member

I don't understand this.

I don't understand what the RawHash trait gets us, except the ability to convert a generic raw hash to bytes and back. Which seems like, if we want it, can be covered by From/Into (which is not a great API for the reasons you've stated many times) or maybe just AsRef<[u8]>.

If we say "From/Into is too unclear, we want an explicit to_byte_array/from_byte_array pair" and our solution is to introduce a RawHash trait with those methods, I feel like that's too much extra API surface for only a small benefit. Because honestly From<[u8; SIZE]> and fn from_byte_array read nearly the same.

There's one more issue though: currently HashEngine can be pre-initialized with tag so one cannot use tagged hashes. But since I proposed to tag engines too this will be then resolved.

Ah, you're right. If HashEngine is the "primary trait" then we need the ability to tag engines. I'm ok with this, though conceptually it seems a little bit weird (a "tagged sha256 engine" is really just a sha256 engine initialized with specific values).

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 25, 2024

I don't understand what the RawHash trait gets us, except the ability to convert a generic raw hash to bytes and back.

You seem to be confusing RawHash with Hash on my PR. RawHash has Engine type etc.

I feel like that's too much extra API surface for only a small benefit. Because honestly From<[u8; SIZE]> and fn from_byte_array read nearly the same.

I think we should reserve From for more obvious conversions (like Hash -> Message) but really, IIRC we don't need From<[u8; LEN]> for Hash only the other way around (which I never objected to). from_byte_array works better as an inherent method because it can be const.

@apoelstra
Copy link
Copy Markdown
Member

You seem to be confusing RawHash with Hash on my PR. RawHash has Engine type etc.

Yes, but in my proposal we invert this and then the HashEngine type has a RawHash type etc. Because by doing so, we are left with only one trait which (roughly) describes the hash algorithm, not the raw hash (which is just a newtyped bag of bytes and doesn't really feel like it needs a trait).

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 25, 2024

OK, that seems to make sense.

Thinking about it we probably only need:

pub trait HashEngine: Default {
    type Digest: AsRef<[u8]>;

    fn update(&mut self, bytes: &[u8]);
    fn finalize(self) -> Self::Digest;

    fn hash(bytes: &[u8]) -> Self::Digest {
        let mut engine = Self::default();
        engine.update(bytes);
        engine.finalize()
    }
}

// A general version of ThirtytwoByteHash
pub trait KnownPreimageHash: AsRef<Self::Bytes> {
    type Bytes: AsRef<[u8]>;
}

This should cover merkle trees, HMACs and Into<Message>.

@apoelstra
Copy link
Copy Markdown
Member

@Kixunil we also need the ability to extract midstates in order to do "compact merkle trees" in rust-elements, and to implement precomputed hash engines (such as tagged hashes, though because those will be in the library, we don't necessarily need to enable a nice API).

So I think we want a second Midstate associated type.

Alternately, we could just add a method fn midstate(&self) -> (usize, [u8; LEN]) and fn from_midstate(usize, [u8; LEN]) -> Self constructor and make the user do their own newtyping, but I think we ought to do it.

(Note that our current implementation treats only the bytes of the engine as a "midstate". But you've complained, I think correctly, that the midstate ought to also include the length. If the user thinks this doesn't matter for some reason they can ignore the length when extracting it and provide 0 when constructing.)

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Feb 26, 2024

I don't understand, Midstate is currently only available for SHA256 engine which doesn't need generics. Isn't it enough?

@apoelstra
Copy link
Copy Markdown
Member

I don't understand, Midstate is currently only available for SHA256 engine which doesn't need generics. Isn't it enough?

Hmm, yeah, I guess so. There are things I could imagine doing with midstates that I'd like to do generically (e.g. "fast merkle trees") but actually none of them are safe to do with generic hashes. If you are messing with midstates you should know exactly what hash you're using and therefore be using concrete types.

@tcharding
Copy link
Copy Markdown
Member Author

This is no longer just "pick up Kix's work and rebase" - leaving until after release of 0.32.0 (either to be done by myself or by @Kixunil).

@tcharding
Copy link
Copy Markdown
Member Author

This work is shelved in favour of #2770

@tcharding tcharding closed this May 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Split Hash trait in two

4 participants