Skip to content

Move Witness to primitives#3406

Merged
apoelstra merged 2 commits intorust-bitcoin:masterfrom
tcharding:09-23-mv-witness-to-primitives
Oct 18, 2024
Merged

Move Witness to primitives#3406
apoelstra merged 2 commits intorust-bitcoin:masterfrom
tcharding:09-23-mv-witness-to-primitives

Conversation

@tcharding
Copy link
Copy Markdown
Member

@tcharding tcharding commented Sep 24, 2024

Patch 1 introduces a new policy to the codebase, we use foo__unstable for public unstable functions and there are zero semver guarantees if you call these functions.

Patch 2 does the move.

Close #3406

@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from a0f05ea to 7f8cf85 Compare September 24, 2024 04:18
@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-internals PRs modifying the internals crate C-primitives labels Sep 24, 2024
@tcharding
Copy link
Copy Markdown
Member Author

Note the unconditional dependency on hex added to internals - I believe we discussed this before.

@tcharding
Copy link
Copy Markdown
Member Author

I'm EOD will debug CI tomorrow.

@apoelstra
Copy link
Copy Markdown
Member

I think we can pull out several uncontroversial commits from this:

  • The first three are unambiguously good
  • Then bc97c2253bc2215070b6db2faa18379f1c5dd6db is pretty suspicious; it adds a bunch of extra allocations and crap that suggest that our API is lacking
  • Then the next two are unambiguous improvements
  • Then ac6fe3a8815127e8c4df5fbe3f8a7241679e7c48 seems to be mostly harmless but probably a net negative for readability
  • The next two are fine
  • And the final one does the actual move, which deserves its own PR (or at least a less crowded PR)

@tcharding
Copy link
Copy Markdown
Member Author

Thanks for the review man. I'll pull the non-controversial stuff out as suggested.

Then bc97c22 is pretty suspicious; it adds a bunch of extra allocations and crap that suggest that our API is lacking

Another idea is adding to the API a function that will go over to primitives that supports encoding. I thought that approach would lead to endless iterations but I can have a go at it.

Then ac6fe3a seems to be mostly harmless but probably a net negative for readability

Drats, that wasn't the aim.

@tcharding tcharding changed the title Moev Witness to primitives Move Witness to primitives Sep 24, 2024
This was referenced Sep 25, 2024
@tcharding
Copy link
Copy Markdown
Member Author

On ice until #3408 and #3409 go in.

@tcharding tcharding marked this pull request as draft September 25, 2024 01:58
apoelstra added a commit that referenced this pull request Sep 30, 2024
5fab6b1 Rename iter len unit test (Tobin C. Harding)
f6a74ef Refactor the serde Witness unit tests (Tobin C. Harding)
9860453 Improve Witness consensus encode unit test (Tobin C. Harding)
7e2899d Improve Witness::push unit test (Tobin C. Harding)
fe96727 Improve witness unit tests for single empty element (Tobin C. Harding)

Pull request description:

  In preparation for moving the `Witness` type over to `primitives` refactor and improve all the unit tests that will be moved, do not touch the ones that will stay behind.

  The first five patches are from #3406, the last is just a re-name of the test function I tried to refactor in ac6fe3a

ACKs for top commit:
  apoelstra:
    ACK 5fab6b1 successfully ran local tests

Tree-SHA512: bc00f81e3c5cc92ae58dd2fc876d368a487ae6c08cc0735d7227c3a89287e321dbfb5b571b951d0616af0ec7cf9a0ea2d0e724645b1c419933a212ece80a0fbf
@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from 7f8cf85 to 3906cb5 Compare October 1, 2024 03:58
@tcharding
Copy link
Copy Markdown
Member Author

I think github is confused, there should not be conflicts for this one because I just rebased it on top of #3432 and there are none for it.

apoelstra added a commit that referenced this pull request Oct 1, 2024
3f3f30d Use iter instead of accessing content field (Tobin C. Harding)
6389d1c Stop using push_slice (Tobin C. Harding)
be163ee Use Witness::len instead of accessing field (Tobin C. Harding)

Pull request description:

  Prepare `Witness` to be moved to `primitives`.

  This is the first three patches out of #3406. Patch 1 and 2 are internal changes, path 3 is also internal but introduces a slight perf hit by doing multiple writes.

ACKs for top commit:
  apoelstra:
    ACK 3f3f30d successfully ran local tests

Tree-SHA512: 25e2570a22797dbfa15d6e5af9b72938243192ca264bc5fe82c2f327486e52a73993b3b61ee1161e5d17b01a7f9843960cb4031e6550561de4ecafb9f935e375
@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from 3906cb5 to eaeef05 Compare October 1, 2024 22:30
@tcharding
Copy link
Copy Markdown
Member Author

Hey @apoelstra have you any suggestions for how to decouple the Decodable implementation from the Witness internal implementation details? For the Encodable side, if we wanted to, we could add this function without leaking the internal Witness representation too much [0]:

    /// Returns the number of witness elements and the serialization of the elements.
    ///
    /// This is an incomplete serialization because the number of witness elements must be
    /// serialized (in compact form) at the front of the elements in order be consensus valid.
    pub fn incomplete_serialization(&self) -> (usize, &[u8]) {
        (self.len(), self.content[..self.indices_start])
    }

But for Decodable I can't think of a constructor that is useable and does not leak the internal Witness representation.

[0] In 3f3f30d we removed the need for this but it could be added back in so that we don't iterate over the elements.

@apoelstra
Copy link
Copy Markdown
Member

I think we can use iter to implement Encode and from_iter/collect to implement Decode.

iter we already have (though the docs suck, we should improve them to say that we're yielding byte slices). from_iter we should add, and maybe even implement from_slice in terms of it.

@tcharding
Copy link
Copy Markdown
Member Author

Aight, I'll have a go. Thanks

@tcharding
Copy link
Copy Markdown
Member Author

I had a bit of a go, can I clarify that I've understood your suggestion? You mean inside Decodable for Witness implement an iterator that wraps the reader and pass this to a new Witness::from_iter constructor, right? If so its hard to do because we need to iterate multiple times (once to calculate total length of content, once to do the actual copies).

This led me to raise #3464 since I doubt this is going to be the only time that Encodable/Decodable is tightly coupled with a primitives type's abstraction.

@apoelstra
Copy link
Copy Markdown
Member

once to calculate total length of content, once to do the actual copies

Why do we need to know the total length upfront? Are we using variable-length size markers within our internal representation? If so we should stop doing that. (But I don't think we are. IIRC we are using a native-endian u32.)

@tcharding
Copy link
Copy Markdown
Member Author

We allocate the correct size array in Witness::from_slice, in Decodable we re-allocate the content array with exponential increase. We could copy the re-size stuff into Witness::from_iter but that would still mean we need to allocate a buffer in Decodable to read the data into. Without Decodable knowing about the Witness content field innards I don't see how to keep the current performance. Is adding additional allocations in order to crate smash acceptable? It seems like a step backwards.

@apoelstra
Copy link
Copy Markdown
Member

apoelstra commented Oct 13, 2024

Without Decodable knowing about the Witness content field innards I don't see how to keep the current performance.

When Iterator::size_hint returns a consistent result (one of the form (n, Some(n)) where n is the same in both cases and less than our maximum) we should pre-allocate that much.

In fact probably we should just take the minimum size hint, cap it at the maximum allocation size, and use that. Don't even bother looking at the max size hint.

This will cover all the common cases such as decoding from a vector of slices.

@tcharding
Copy link
Copy Markdown
Member Author

Its not just the number of elements though, its size of each element (+ size of varint encoding) and since we use AsRef<[u8]> one has to iterate to get at it.

@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Oct 14, 2024

I did have a go at adding from_iter and I don't think it works without adding additional allocations (I could be missing something though). So this PR now just adds the public unstable constructor. I can definitely revisit later if you are still keen on the from_iter idea.

@github-actions
Copy link
Copy Markdown

🚨 API BREAKING CHANGE DETECTED

To see the changes click details on "Check semver breaks / PR Semver - stable toolchain" job then expand "Run semver checker script" and scroll to the end of the section.

@apoelstra
Copy link
Copy Markdown
Member

and I don't think it works without adding additional allocations

I believe you're correct. from_iter takes an arbitrary IntoIterator and you cannot put any further constraints on it, e.g. to say that the input should be Clone or give you a double-ended iterator or anything like that. So you don't know any information up front other than a hint about the number of elements. (Also annoyingly, the witness script/taproot script appear at the end of the iterator, and this is the most unpredictable/longest element. If it were at the beginning, you could call iter.next() and then make allocation decisions after that.)

For now let's stick with the unstable constructor so that we are not messing with the API more than we need to. But before/after this PR I would like to revisit this:

  • As you say, consensus_decode uses an exponential allocation strategy while from_slice does not. Why the discrepancy?
  • Also consensus_decode enforces size limits while from_slice does not. Again, why the discrepancy?

Personally, I would suggest that we implement from_iter and reimplement both functions in terms of it, so that they're doing the same thing as far as allocations go. And then I would suggest allocating up front 520 + 64*iter.size_hint() (capped to MAX_VEC_SIZE) and doing resizing from there. But we should discuss that on a separate PR.

@tcharding
Copy link
Copy Markdown
Member Author

Yep lets do it, added #3469.

@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from 17ba251 to 312e5b7 Compare October 16, 2024 03:13
@tcharding
Copy link
Copy Markdown
Member Author

The diff is pretty hard to review even with colorMoved configured in git. Do you want me to split it up @apoelstra?

@github-actions
Copy link
Copy Markdown

🚨 API BREAKING CHANGE DETECTED

To see the changes click details on "Check semver breaks / PR Semver - stable toolchain" job then expand "Run semver checker script" and scroll to the end of the section.

@tcharding
Copy link
Copy Markdown
Member Author

During rebase I found that I had moved the deprecated to_vec to primitives but it should stay in bitcoin.

@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from 312e5b7 to 4260b06 Compare October 16, 2024 03:17
@github-actions
Copy link
Copy Markdown

🚨 API BREAKING CHANGE DETECTED

To see the changes click details on "Check semver breaks / PR Semver - stable toolchain" job then expand "Run semver checker script" and scroll to the end of the section.

@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from 4260b06 to d4145a5 Compare October 16, 2024 03:27
@github-actions
Copy link
Copy Markdown

🚨 API BREAKING CHANGE DETECTED

To see the changes click details on "Check semver breaks / PR Semver - stable toolchain" job then expand "Run semver checker script" and scroll to the end of the section.

@apoelstra
Copy link
Copy Markdown
Member

In 1c419370c6bc2dc54226dbf89f3b9d506b9102da:

Change , these to . These and , use to ; use in the README update.

Also I'd like to stick #[doc(hidden)] on the unstable methods.

@apoelstra
Copy link
Copy Markdown
Member

d4145a56736a4b0caed2c5ccc0f25e3187b94e26 looks good otherwise.

In preparation for moving the `Witness` to `primitives` we need a way to
construct the `Witness` when decoding. In order to maintain the current
performance and not introduce additional allocations we need to be able
to construct a `Witness` from the content buffer, this leaks the
implementation details of `Witness`.

Add a clearly marked unstable constructor to create a `Witness` from
parts. This introduces the concept of `foo__unstable` function names;
add a section to the README describing unstable functions and semver
guarantees.
Move the `Witness` over to `primitives` leaving behind any method that
takes or returns a `Script` or a signature.

Includes addition of a feature gate to unit test.
@tcharding tcharding force-pushed the 09-23-mv-witness-to-primitives branch from d4145a5 to c1eccfd Compare October 18, 2024 03:02
@tcharding
Copy link
Copy Markdown
Member Author

Changed as suggested, thanks.

@github-actions
Copy link
Copy Markdown

🚨 API BREAKING CHANGE DETECTED

To see the changes click details on "Check semver breaks / PR Semver - stable toolchain" job then expand "Run semver checker script" and scroll to the end of the section.

@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Oct 18, 2024

Its Friday afternoon, I do not have the inclination to look into WASM problems right now. We can merge this if it gets past your local CI @apoelstra can't we and ignore the WASM job until next week?

@apoelstra
Copy link
Copy Markdown
Member

Yeah, for sure. Probably we need to make the wasm test "allow fail" because the whole ecosystem is predicated on downloading random unpinnable bullshit from the Internet, and it's really just not compatible with this project's ethos.

I mean, we ought to fix it's straightforward enough or if somebody who needs it shows up to do it. But I wouldn't put too much effort into it.

Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c1eccfd; successfully ran local tests

@apoelstra
Copy link
Copy Markdown
Member

Gonna one-ACK merge on the basis that this PR is 3 weeks old, and even though there have been substantial changes, nobody else has commented.

@apoelstra apoelstra merged commit 51d5037 into rust-bitcoin:master Oct 18, 2024
@tcharding tcharding deleted the 09-23-mv-witness-to-primitives branch October 21, 2024 04:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

API break This PR requires a version bump for the next release C-bitcoin PRs modifying the bitcoin crate C-internals PRs modifying the internals crate C-primitives doc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants