Skip to content

Use index size rather than pointer size to enforce convertibility of u32 to usize#2972

Merged
apoelstra merged 3 commits intorust-bitcoin:masterfrom
tcharding:07-06-index-size
Aug 24, 2024
Merged

Use index size rather than pointer size to enforce convertibility of u32 to usize#2972
apoelstra merged 3 commits intorust-bitcoin:masterfrom
tcharding:07-06-index-size

Conversation

@tcharding
Copy link
Copy Markdown
Member

@tcharding tcharding commented Jul 6, 2024

3 patches in preparation for other size related work, this PR does not touch the ToU64 issue which will be handled separately.

  • Patch 1: Don't check pointer width in units because its not consensus code
  • Patch 2: Modify internal macro const_assert
  • Patch 3: Use index size to enforce not building on a 16 bit machine

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-units PRs modifying the units crate C-internals PRs modifying the internals crate labels Jul 6, 2024
@tcharding
Copy link
Copy Markdown
Member Author

cc @kayabaNerve, props for the idea.

@tcharding tcharding force-pushed the 07-06-index-size branch 2 times, most recently from 2705dd2 to 63578d9 Compare July 6, 2024 04:30
@tcharding
Copy link
Copy Markdown
Member Author

Face palm, I dug through the index to see who originally wrote the const_assert macro and it was me. I have a feeling someone else posted it and I just wrote it in code. @Kixunil any ideas please?

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no conditions far away from traits.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason is double parentheses made it possible to use it in non-item context (can't remember the proper name), so you've just disabled one usage in favor of the other. I'm not saying it's wrong, it's just different and if we don't use the other one then I don't mind but we might one day end up with two macros.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This again is far from the ToU64 trait which makes it hard to review and keep track of. The ToU64 for usize impl should be feature gated (so that units keep working) and the condition should be put there. Conversely we should have ToUsize trait doing the opposite. Finally the conditions should be mem::size_of::<usize> <= 8 for ToU64 and mem::size_of::<usize> >= 4 for ToUsize. I'm also tempted to do two feature gates in case some crate only needs to require one condition but I suspect that won't happen in practice since encoding mirrors decoding.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This again is far from the ToU64 trait which makes it hard to review and keep track of.

I don't quite understand this, we discussed that ToU64 is not the only reason we have this restriction, the other reason (the consensus stuff) is general to the library so therefore the check is crate wide and at the top of lib.rs - what am I misunderstanding?

Can use the conditions as suggested.

I'm also tempted to do two feature gates

This might actually be better because the ">= 4" is because of consensus and the "<= 8" is because of ToU64, right?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we need ToU64 in the first place is consensus stuff. We should rewrite all casts to use it. (Eventually, not really all at once.)

the ">= 4" is because of consensus and the "<= 8" is because of ToU64, right?

Consensus rules require both. Varint can't encode sizes above 264 and there are transactions in Bitcoin that have more than 216 items. The reason I'm tempted to split them is that it's cleaner and we may find that there is a place with just one of them somewhere.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to enforce that nothing has size exceeding 2^64 (which seems a little silly and unnecessary because of the impracticality of creating such objects, but ok) then we ought to do it by making sure that all of our data structures have APIs that enforce this invariant.

Personally I think we should just let it go. We can panic if we try to varint-encode a number that's too high. Or saturate, or encode nothing, or whatever. It won't ever happen.

Making the code literally impossible to compile on CHERI or something seems like the wrong approach.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making the code literally impossible to compile on CHERI or something seems like the wrong approach.

That's not really what's happening - it has 128-bit pointers, not sizes. We'll bounding by sizes.

Copy link
Copy Markdown

@kayabaNerve kayabaNerve Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an anecdote, my work uses u64::try_from().unwrap() as yes, that'd require an impossible amount of RAM to fail when a buffer length and a system whose usize exceeds u64. I don't mind enforcing usize <= 64-bit, and that should not break CHERI, but I think it's sufficiently edge case it doesn't need action in my own work.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I rescind my complaint and accept having the "usize must be 4 or 8 bytes" condition.

@tcharding
Copy link
Copy Markdown
Member Author

Making draft till I get back to this.

@tcharding tcharding marked this pull request as draft July 10, 2024 05:05
The `units` crate does not contain consensus logic and since our
requirement to only support 32-bit and 64-bit machines is due to
consensus logic we do not need to enforce the `target_pointer_width` in
the `units` crate.

Remove the compile time check on pointer width from the `units` crate.
The current `const_assert` macro is unused in the code base. We would
like to use it differently to how it was initially designed, remove the
parenthesis so it can be called directly in a module.
Currently we enforce that our code only runs on machines with a
certain pointer width (32 or 64 by failing to compile if pointer size
width is 16). One of the underlying reasons is because of requirements
in consensus code in Bitcoin Core which requires containers with more
than 2^16 (65536) items [0].

We can better express our requirements by asserting on Rust's index
size (the `usize` type).

As a side benefit, there is active work [1] to make Rust support
architectures where pointer width != idex size. With this patch applied
`rust-bitcoin` will function correctly even if that work progresses.

- [0] rust-bitcoin#2929 (comment)
- [1] rust-lang/rust#65473
@tcharding tcharding marked this pull request as ready for review August 4, 2024 19:37
@tcharding tcharding requested a review from Kixunil August 4, 2024 20:13
@tcharding
Copy link
Copy Markdown
Member Author

Bump, although non-urgent this is ready to go.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c427d8b

macro_rules! const_assert {
($x:expr) => {{
($x:expr) => {
const _: [(); 0 - !$x as usize] = [];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for adding stuff this late but I've just remembered that panic in const is stable in our MSRV so it would be very very very nice to have something like this

($x:expr $(; $message:expr)) => {
    if !$x {
        // We can't use formatting in const, only concating literals
        panic!(concat!("assertion ", stringify!($x), " failed" $(, $message)?))
    }
}

OK to do in a next PR though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha! Totally coincidentally I was going through easy issues to solve late in the day and picked up #2427 which is related to this, so I smashed them both out.

// does not equal index size.
//
// ref: https://github.com/rust-bitcoin/rust-bitcoin/pull/2929#discussion_r1661848565
internals::const_assert!(mem::size_of::<usize>() >= 4);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool for now, but I hope we will make ToUsize too eventually.

@Kixunil Kixunil changed the title Use index size to enforce comparability with Bitcoin Core Use index size rather than pointer size to enforce convertibility of u32 to usize Aug 22, 2024
@tcharding tcharding mentioned this pull request Aug 22, 2024
Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c427d8b successfully ran local tests; unsure if we want to merg this or wait for #3215

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK c427d8b though I think the last commit was kinda a waste of time and it should have been adding the trait instead or leave it for later.

@tcharding
Copy link
Copy Markdown
Member Author

the last commit was kinda a waste of time and it should have been adding the trait instead

No worries, how to split up and order PRs is definitely on my list of dev skills to improve on at the moment.

unsure if we want to merg this or wait for #3215

I rekon merge this first

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Aug 23, 2024

how to split up and order PRs

It's not about splitting or ordering. Just that commit only does very tiny improvement that will be relevant maybe in a few years rather than either ignoring it right now because we have other things to do or doing it completely - with the trait & stuff.

@apoelstra apoelstra merged commit 837fc9c into rust-bitcoin:master Aug 24, 2024
tcharding added a commit to tcharding/rust-bitcoin that referenced this pull request Aug 29, 2024
Now that we can panic in const context we can improve the `const_assert`
macro by adding a message string.

Original idea by Kix:

  rust-bitcoin#2972 (comment)
apoelstra added a commit that referenced this pull request Aug 31, 2024
3c7c8c4 Improve const_assert (Tobin C. Harding)

Pull request description:

  Now that we can panic in const context we can improve the `const_assert` macro by adding a message string.

  Original idea by Kix:

    #2972 (comment)

ACKs for top commit:
  Kixunil:
    ACK 3c7c8c4 in the sense that it does what it's supposed to with the only tiny issue being that the `bool` looks weird but not broken in any other way I can think of.
  apoelstra:
    ACK 3c7c8c4 successfully ran local tests

Tree-SHA512: 5ff721c0056f87d42c934818da6f780cd945f235291eb4b044752d67405a74f992d7f85853fec129e794ec3fcda1f319cc40daabc6a349d21bbdc977640d2572
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-bitcoin PRs modifying the bitcoin crate C-internals PRs modifying the internals crate C-units PRs modifying the units crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants