Skip to content

primitives: Implement decoding traits #4998

Merged
apoelstra merged 2 commits intorust-bitcoin:masterfrom
tcharding:push-nprqntyknkup
Oct 13, 2025
Merged

primitives: Implement decoding traits #4998
apoelstra merged 2 commits intorust-bitcoin:masterfrom
tcharding:push-nprqntyknkup

Conversation

@tcharding
Copy link
Copy Markdown
Member

@tcharding tcharding commented Sep 17, 2025

Things that need careful attention please:

  • The state machine in TransactionDecoder
  • The WitnessDecoder::empty kludge

@tcharding
Copy link
Copy Markdown
Member Author

cc @nyonson, here is a quick and dirty go at starting on primitives. You can see from this more or less how I envisage the errors stuff will look. This might help you if you want to keep thinking about removing Either. I don't mean to take over from you with the decoding stuff so if you push anything I'll give you preference. I'm just pumped to get this done

@tcharding tcharding force-pushed the push-nprqntyknkup branch 2 times, most recently from f5cc627 to b2a6a6b Compare September 22, 2025 04:16
@github-actions github-actions bot added the C-p2p label Sep 22, 2025
@tcharding tcharding changed the title primitives: Start implementing decoding traits WIP: primitives: Start implementing decoding traits Sep 22, 2025
@nyonson
Copy link
Copy Markdown
Contributor

nyonson commented Sep 22, 2025

I am a little confused by the commit message in 3adce0b for the error type wrappers. You said "If two encoders return the same error it is not currently possible to compose them with our decoders", but I don't think that is true? Couldn't a composed decoder define a new type which just maps UnexpectedEofError in this case?

@tcharding
Copy link
Copy Markdown
Member Author

I was trying to describe the same problem we have in the composite error in your PR, if ADecoder and BDecoder both return FooError then one cannot do Decoder2<ADecoder, BDecoder>.

@nyonson
Copy link
Copy Markdown
Contributor

nyonson commented Sep 22, 2025

Decoder2<ADecoder, BDecoder>

Can't you just do Decoder2<ADecoder, BDecoder, FooError> like this test kinda does:

let mut decoder2: Decoder2<_, _, UnexpectedEof> =

I think this is enabled by a identity From blanket impl.

@tcharding
Copy link
Copy Markdown
Member Author

I'm not sure what we disagree about. If two decoders return the same error then there is currently no way when using the Decoder2 to tell which one errored. That's all I meant to say.

@tcharding tcharding force-pushed the push-nprqntyknkup branch 2 times, most recently from 0fcd759 to 838adc0 Compare September 23, 2025 03:55
@nyonson
Copy link
Copy Markdown
Contributor

nyonson commented Sep 23, 2025

Ah, my bad man, misinterpreted. We are on the same page.

@tcharding tcharding force-pushed the push-nprqntyknkup branch 2 times, most recently from f01fb47 to a2bfedf Compare September 23, 2025 04:45
@tcharding tcharding changed the title WIP: primitives: Start implementing decoding traits primitives: Implement decoding traits Sep 23, 2025
/// # Errors
///
/// Errors if `slice` is empty or encoding is invalid.
pub fn decode_compact_size(slice: &mut &[u8]) -> Result<u64, CompactSizeDecodeError> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kix wrote a pretty slick compact size decoder back in the day here: https://github.com/rust-bitcoin/rust-bitcoin/pull/2184/files#diff-97b1ea49ecd9b6b576f8f6dbd640e6d317da31348297d14cfbbdf47b909ae34bR12

Even though this is only used internally by a vec decoder, I think it makes sense to make it a full fledged decoder to handle weird cases when only a handful of bytes are read for some reason.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, my version does not correctly handle passing in a byte slice that is only part of the compact size. Nice reviewing Nick!

///
/// The encoding is expected to start with the number of encoded bytes (length prefix).
#[cfg(feature = "alloc")]
pub struct VecDecoder {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this just be VecBytesDecoder to differentiate from VecDecoder?

Copy link
Copy Markdown
Member Author

@tcharding tcharding Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are the names I briefly thought of also so that is a reasonable metric for their merit. @apoelstra want to opine on these names?

  • VecBytesDecoder
  • VecDecoder<T> where T: Decodable

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps ByteVecDecoder is better since an AmountDecoder decodes an Amount and the ByteVecDecoder decodes a byte vector?

Copy link
Copy Markdown
Member Author

@tcharding tcharding Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or does it decode a vector of bytes? - drats.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want me to make a decision I like ByteVecDecoder best. But I don't really care and don't think it matters. Almost nobody will ever name this type.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets call it two votes for ByteVecDecoder and one for VecBytesDecoder. ByteVecDecoder in is. Thanks

@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Sep 25, 2025

The test failure (name decode_segwit_transaction in primitives::transaction) is interesting because its the same test copied from blockdata::transaction. I.e., with the original consenus_deserialize we happily decode compact ints bigger than 4,000,000 but in the new code introduced in this PR we don't.

Copy link
Copy Markdown
Contributor

@nyonson nyonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 15b4f4c

Some minor comments, but I was able to wrap my head around the transaction state machine. I was hoping to think up some alternative which handled the Inputs/SegwitFlag dance and then passed it off to some simpler Decoder4/6s. But handling the "backtracking" kinda lead me back to where you landed.

match &self.state {
State::Version(decoder) => decoder.min_bytes_needed(),
State::Inputs(_, _, decoder) => decoder.min_bytes_needed(),
State::SegwitFlag(_) => 2,
Copy link
Copy Markdown
Contributor

@nyonson nyonson Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 15b4f4c, given how the state machine flows, I believe this should only be 1 since the marker has already been consumed and we are only analyzing the flag in this state.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BOOM! Good reviewing bro, you are a useful bloke to have on the team, for real.


/// The state of the transiting decoder.
#[cfg(feature = "alloc")]
pub enum TransactionDecoderState {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 15b4f4c, do we have to expose TransactionDecoderState/Attempt/IsSegwit? Would be nice to give us the flexibility to tweak them in the future if we can hide them.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh BOOM! I thought I tried that before, that's nice!

/// Boolean used to track whether or not this transaction uses segwit encoding.
#[cfg(feature = "alloc")]
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub enum IsSegwit {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 15b4f4c, another minor clarity idea for the state machine is to fold Attempt into IsSegwit by adding some sort of initial Unknown variant. Might make the Inputs/SegwitFlag dance a little clearer.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with all these types private, your idea (which I'm not exactly understanding right now) becomes an internal change, right? So I can punt it down the field?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, can punt.

And said another way, instead of having the Attempt enum for the Inputs state, just use the IsSegwit state.

enum IsSegwit {
    /// Yes so uses segwit encoding.
    Yes,
    /// No segwit flag, marker, or witnesses.
    No,
    /// Initial state, inputs have not been decoded yet.
    Unknown,
}

This isn't a massive change, just maps well in my mind to how there are really 3 Segwit states when decoding.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that looks good.

@tcharding
Copy link
Copy Markdown
Member Author

Changes in force push:

  • Made auxiliary types private
  • Moved auxiliary types below where they are used
  • Rebase no master
  • Fixed bug in min_bytes_needed - props @nyonson

Copy link
Copy Markdown
Contributor

@nyonson nyonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 24e7b35

apoelstra added a commit that referenced this pull request Oct 8, 2025
6f78cf7 Use correct name in rustdocs (Tobin C. Harding)

Pull request description:

  Cut'n'pasta error slipped in, notice during review of #4998.


ACKs for top commit:
  apoelstra:
    ACK 6f78cf7; successfully ran local tests


Tree-SHA512: a5b3a5d3f7a724fff51973c73cb89b6f42095950ab076422c524d00029d1b0ef4c9fe71fb4f7c0aaf4abfaac963a296a3cb01a1df25573ba857f8e25fe38269d
apoelstra added a commit that referenced this pull request Oct 10, 2025
a4be941 Feature gate unused const (Jamil Lambert, PhD)
5e2eeb8 Fix clippy error: struct is never constructed (Jamil Lambert, PhD)
fdce327 Fix explicit lifetime clippy lints (Jamil Lambert, PhD)
f38221c Fix lint error: passing a unit value to a function (Jamil Lambert, PhD)

Pull request description:

  In preparation for adding crate specific clippy runs in CI with no-default-features, fix all the current lint errors when running it locally in each crate.
  
  There is one clippy error locally in primitives from an unused import (only used with alloc), but it is fixed in #4998.


ACKs for top commit:
  apoelstra:
    ACK a4be941; successfully ran local tests
  tcharding:
    ACK a4be941


Tree-SHA512: e5f601213782e7ccceecfbdd022020f830dc321d6e31d522a1b93cf56c15735bccb72f3987d8fc6f633dba7ec8b88369aa65429992d97a3c15883c4622c5fd90
@apoelstra
Copy link
Copy Markdown
Member

Will need rebase because #5107 renamed min_bytes_needed.

@tcharding tcharding force-pushed the push-nprqntyknkup branch 2 times, most recently from 3400588 to 0e97331 Compare October 10, 2025 17:15
@apoelstra
Copy link
Copy Markdown
Member

apoelstra commented Oct 11, 2025

Needs rebase again since #5108 (which rearranged some imports).

Stupid that we still need to manually resolve these kinds of conflicts..

@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Oct 11, 2025

Rebased and added a trivial cleanup patch up front even though its unrelated, it showed up when I did a search-and-replace while rebasing this PR.` Done properly in #5119

This terminology is not that widely used, lets use `scriptPubkey`
instead as is more common.
@apoelstra
Copy link
Copy Markdown
Member

Followups:

  • See if we can't improve the transaction state machine to be less panicky and indirect and full of "copy this to get mutable access" patterns (I spent a bit of time on this but don't have a complete solution)
  • See if we can avoid the From impls on the errors (extra API surface and prevents us from doing DecoderN where multiple of the sub-decoders have the same error type) (annoyingly we need to resolve this pre-1.0 but I think we can at least cut a rc with it)

Also lmao Kix would hate the OutpointDecoder which currently reads into a 36-byte array then copies the bytes out. We should be decoding directly into the right places, and using uninitialized memory beforehand, and so on. But whatever, our API is closed enough that we can fix this later.

@apoelstra
Copy link
Copy Markdown
Member

The impl of Decoder::end for Transaction is also wrong, but I'm willing to defer this to a followup. The docs say that this returns an error if you end too early but the implementation panics instead.

Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 39bf86b; successfully ran local tests

@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Oct 12, 2025

Nice! thanks. I split up your comments into a bunch of bite sized issues.

@apoelstra
Copy link
Copy Markdown
Member

@nyonson in the interest of moving forward I'm gonna merge this, and any future review you do we can address in followups.

@apoelstra apoelstra merged commit 99544ff into rust-bitcoin:master Oct 13, 2025
25 checks passed
apoelstra added a commit that referenced this pull request Oct 14, 2025
27fbdf9 primitives: add unit test for incomplete transaction decoding (Andrew Poelstra)
93db1e0 primitives: fix TransactionDecoder::end to not panic on early calls to end (Andrew Poelstra)
e86a56c primitives: renamed TransactionDecoderError::Transitioning to Errored (Andrew Poelstra)
681af66 primitives: remove a bunch of panics from Transaction::decoder (Andrew Poelstra)

Pull request description:

  Refactors the `TransactionDecoder` state machine in a couple ways, eliminating some unreachable panic paths and also eliminating most of the reachable panic paths in `end()` (which should have been error returns).
  
  Fixes two of the four followup issues to #4998 -- the other two are optimizing `OutpointDecoder` which is non-essential/low-priority, and eliminating the `From` impls required by the composite `Decoder` types (which is a somewhat-hard API problem and I don't think we want to hold up the rc on it).
  
  Fixes #5125
  Fixes #5122


ACKs for top commit:
  nyonson:
    ACK 27fbdf9
  tcharding:
    ACK 27fbdf9


Tree-SHA512: 1fa7556cc8f47f578ff1293a09c76b07ae2a2de53a76eb0b52b0846ca73b203878791fd8bad350590aa9197ec173c007857059f2991515c9224d35686094f1b3
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants