Skip to content

hashes: split Hash into GeneralHash and Hash#2910

Merged
apoelstra merged 8 commits intorust-bitcoin:masterfrom
apoelstra:2024-06--hashes-overhaul
Jun 25, 2024
Merged

hashes: split Hash into GeneralHash and Hash#2910
apoelstra merged 8 commits intorust-bitcoin:masterfrom
apoelstra:2024-06--hashes-overhaul

Conversation

@apoelstra
Copy link
Copy Markdown
Member

@apoelstra apoelstra commented Jun 24, 2024

I'm not thrilled with these names. Personally I would prefer having ByteArrayWrapper (for non-general hashes) and Hash (for general hashes). But using Hash and GeneralHash greatly minimizes the diff because most of our use of the Hash trait was only for non-general stuff.

Maybe that tradeoff will change as we move stuff to inherents? I hope to do that in the next PR (or maybe the one after that, since I still have some "drop GeneralHash work to do for tagged hashes). And after that the hashing API should be "clean" enough that we can figure out HashEngine, possibly even folding GeneralHash into it. But that's the part of #2770 that we didn't finish nailing down so I'm not sure.

But other than naming, I like this PR. I think, if you approve of this PR except the naming, it would be best to ACK it and then we can do a followup rename-only PR, rather than dealing with the review pain of mass-renaming in rebases.

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate test labels Jun 24, 2024
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 24, 2024

Pull Request Test Coverage Report for Build 9646113412

Details

  • 88 of 107 (82.24%) changed or added relevant lines in 17 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.009%) to 83.152%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bitcoin/src/blockdata/script/mod.rs 2 8 25.0%
hashes/src/sha256t.rs 19 32 59.38%
Files with Coverage Reduction New Missed Lines %
hashes/src/util.rs 1 87.32%
Totals Coverage Status
Change from base Build 9639464882: 0.009%
Covered Lines: 19564
Relevant Lines: 23528

💛 - Coveralls

@apoelstra apoelstra force-pushed the 2024-06--hashes-overhaul branch from 51ae63a to aed525d Compare June 24, 2024 13:36
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 24, 2024

Pull Request Test Coverage Report for Build 9646393134

Details

  • 88 of 107 (82.24%) changed or added relevant lines in 17 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.009%) to 83.152%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bitcoin/src/blockdata/script/mod.rs 2 8 25.0%
hashes/src/sha256t.rs 19 32 59.38%
Files with Coverage Reduction New Missed Lines %
hashes/src/util.rs 1 87.32%
Totals Coverage Status
Change from base Build 9639464882: 0.009%
Covered Lines: 19564
Relevant Lines: 23528

💛 - Coveralls

This commit illustrates the transformation I intend to make everywhere
we use newtyped hashes as "general hashes". *Within the module that the
newtype is defined* I encapsulate engine calls, which I do by calling
engine methods on the underlying general hash function. So within the
module there is a slight reduction in type safety, in the sense that I
need to make sure that I'm wrapping stuff properly.

But outside of the module, there will be no difference except that I
will no longer export engine/from_engine/hash/etc on newtyped hashes.
Instead callers will need to compute the newtyped hash only in ways
supported by the API.

In theory we could have a macro to produce engine/from_engine/etc for
newtypes that want to act as general hashes. But AFAICT there is no use
case for this.

Alternately, we could have a macro that produces *private* Engine types
and private engine/from_engine/etc methods for the hashes, which could
be used within the module and would provide stronger type safety within
the module. But in practice, raw hashing is usually only used within a
couple of methods, so all this infrastructure is way overkill and will
just make maintenance harder for everybody.
In the next commits we are going to stop exposing the ability to hash
arbitrary data into wrapped hash types like Txid etc. In preparation for
this, stop using these methods internally.

This makes our internal code a little bit uglier and less DRY. An
alternative approach would be to implement the from_engine and engine
methods, but privately (and maybe having a macro to provide this). But I
think this approach is more straightforward.

The one exception is for the Taproot hashes, which are tagged hashes and
currently do not have their own engine type. I will address these in a
later PR because this one is already too big.
This is a continuation of the previous commit, but separated to make
review a little easier. This one replaces test vectors that were
previously computed by hashing garbage into Txids and various other hash
types with new test vectors which are directly-computed garbage
converted to hashes with from_byte_array.

In one case (src/hash_types.rs) this results in changing a bunch of
fixed test vectors. This is okay; this test is supposed to check the
direction of string serialization, which is unaffected by this commit
(or any commit in this PR). The existing test vectors, because they hash
the empty string, result in different bytes depending on the underlying
hash algo (sha256, sha256d, sha256t, etc). The new ones just use the
same fixed test vector for all of them.

This commit also updates a doctest in crypto/sighash.rs which
demonstrates how to manually feed sighash data into a hash engine and
correctly handle the sighash single bug. Because you can no longer
directly get a sighash object from an engine, this particular example
should maybe be rewritten to just encode to a Vec rather than a hash
engine, explaining that maybe you'd do this when implementing a HWW, to
verify the exact data being hashed. Or something.

Unrelatedly, you can check that there are no API changes in this commit
or the last several. The next commit will remove GeneralHash impls and
that's when you'll see changes.
…in hash_newtype

We manually implement these methods (and the GeneralHash trait) on newtypes
around sha256t::Hash, because tagged hashes require a bit more work. In
the next commit (API diff) you will see that this affects two hashes,
which are the only things that appear green in the diff.

Users who want to implement their own engine/from_engine types now need
to do it on their own. We do this for the non-Taproot sighash types in
`bitcoin` (though only privately) to demonstrate that it's possible.
There are a few green lines for the Tap*Hash types, which are tagged
hashes, and whose engine/from_engine impls are replaced. The rest is
red, for the other hashtypes where these methods are just dropped.

A later PR will also drop the methods for the Tap*Hash types but that
seemed different enough from the rest to warrant its own PR.
@apoelstra apoelstra force-pushed the 2024-06--hashes-overhaul branch from aed525d to 8bd5c64 Compare June 24, 2024 14:02
@coveralls
Copy link
Copy Markdown

coveralls commented Jun 24, 2024

Pull Request Test Coverage Report for Build 9646767973

Details

  • 88 of 107 (82.24%) changed or added relevant lines in 17 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.009%) to 83.152%

Changes Missing Coverage Covered Lines Changed/Added Lines %
bitcoin/src/blockdata/script/mod.rs 2 8 25.0%
hashes/src/sha256t.rs 19 32 59.38%
Files with Coverage Reduction New Missed Lines %
hashes/src/util.rs 1 87.32%
Totals Coverage Status
Change from base Build 9639464882: 0.009%
Covered Lines: 19564
Relevant Lines: 23528

💛 - Coveralls

@apoelstra
Copy link
Copy Markdown
Member Author

Ok, ready to review. I had to do the API update, and in doing so, I noticed that there was some unfinished work with the Taproot hashes (which I decided to defer to a later PR, but I did update the commit messages in this one accordingly).

Copy link
Copy Markdown
Member

@tcharding tcharding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8bd5c64

@tcharding
Copy link
Copy Markdown
Member

I agreed with your view that eventually Hash should be called ByteArrayWrapper. It doesn't need discussing but I still think, and have thought for a long time now, that we should not need this trait at all. I'm happy to let the work progress and see what happens. FTR this belief is why I have not done this work myself even though you and Kix repeatedly suggested it - I'm totally ok to be proven wrong, at this stage, to me, it is purely an interesting API design question.

Thanks for picking this up, my sanity is much better preserved from reviewing this instead of coding it :)

@apoelstra
Copy link
Copy Markdown
Member Author

Let's split the traits, add inherent methods, and get sha256t working properly, and then explore dropping Hash. I agree philosophically with you but we really struggled with this in #2770 practically speaking.

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Jun 25, 2024

Yeah, I don't like the name much but I can agree with followup rename.

Maybe that tradeoff will change as we move stuff to inherents?

Not sure if entirely relevant but in primitives PR I've noted that we should be able to depend on hashes privately. The need for traits should be reduced to HMAC/HKDF and merkle trees (which use their own).

that we should not need this trait at all

In a perfect world, I agree. If From didn't imply Into, I'd just say that we should use From<[u8; N]> + AsRef<[u8; N]> bounds and be done with it. But some_random_array.into() without acknowledging the type name undermines our efforts to prevent type mixups.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 8bd5c64

+ convert::AsRef<[u8]>
{
/// The byte array that represents the hash internally.
type Bytes: hex::FromHex + Copy;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated but I've just realized we can have a sealed IsByteArray trait that's implemented for [u8; N]. Maybe we can even enforce that the lengths match (even if we can't use LEN directly).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could, yeah, though I'm not sure it's necessary. And possibly some future implementor will want to use ArrayVec or something for some reason.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a lint and possibly preparation for the future where we just force it so people don't have to worry about it at all. I doubt usefulness of anything other than array. There's a fixed LEN constant for a reason. We could say "hash is whatever byte slice" but that is silly.

fn from_byte_array(bytes: Self::Bytes) -> Self { Self::from_byte_array(bytes) }
}

// To be dropped in the next commit
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice diff-minimizing idea.


fn check_result(engine: sha256::HashEngine) {
let hash = TestType::from_engine(engine);
let hash = TestType(sha256::Hash::from_engine(engine));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is how I wanted it for the exact reason you stated in the commit message. However I also had .finalize method taking advantage of inference so you could just write TestType(engine.finalize()). I don't insist on this but it'd be nice to have eventually.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, sha256::Engine::finalize could return an arbitrary From<sha256::Hash>?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually the comment about inference isn't even relevant. Engine implies specific hash type.

header_data[0..32].copy_from_slice(&self[..]);
header_data[32..64].copy_from_slice(&previous_filter_header[..]);
FilterHeader::hash(&header_data)
FilterHeader(sha256d::Hash::hash(&header_data))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated but it'd be better to just feed the data into engine rather than copying arrays around.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promoted to issue

}
Self::from_engine(engine)
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we could just make these (optionally) private in a subsequent PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate test

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants