Skip to content

Introduce ToU64 trait#2929

Merged
apoelstra merged 1 commit intorust-bitcoin:masterfrom
tcharding:06-27-support-32-64-bit-only
Aug 8, 2024
Merged

Introduce ToU64 trait#2929
apoelstra merged 1 commit intorust-bitcoin:masterfrom
tcharding:06-27-support-32-64-bit-only

Conversation

@tcharding
Copy link
Copy Markdown
Member

@tcharding tcharding commented Jun 27, 2024

The idea for this was pulled out of Steven's work in #2133

We already explicitly do not support 16 bit machines.

Also, because Rust supports u182s one cannot infallibly convert from a usize to a u64. This is unergonomic and results in a ton of casts.

We can instead limit our code to running only on machines where usize is less that or equal to 64 bits then the infallible conversion is possible.

Since 128 bit machines are not a thing yet this does not in reality introduce any limitations on the library.

Add a "private" trait to the internals crate to do infallible conversion to a u64 from usize.

Implement it for all unsigned integers smaller than u64 as well so we have the option to use the trait instead of u32::from(foo).

@github-actions github-actions bot added C-bitcoin PRs modifying the bitcoin crate C-hashes PRs modifying the hashes crate C-units PRs modifying the units crate C-internals PRs modifying the internals crate C-io PRs modifying the io crate C-base58 PRs modifying the base58 crate labels Jun 27, 2024
Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept ACK

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the commit message, 8-bit is already disallowed by Rust itself. It's also an API guarantee - From<u16> for usize exists.

Copy link
Copy Markdown
Member Author

@tcharding tcharding Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL, thanks. I read this while learning: https://www.reddit.com/r/rust/comments/nltugw/first_ever_rust_code_compiled_for_8bit_6502/

And fixed up the terminology in the rustdocs in this PR to use "target" instead of "architecture" because I think it better describes what we are doing since its specifically the pointer width and the implications that has on Into<u64> for usize that concerns us - correct me if I'm wrong please.

We now have

// We only support 32-bit and 64-bit targets.
//
// - We can't guarantee this library works on 16-bit targets.
// - Rust does not implement `Into<u64>` for `usize` which is annoying so we explicitly do not
//   support 128-bit targets so that we can do the conversion infallibly.

I'd like to further improve the first bullet point though.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't like dumping random shit into crate root. This should either go to internals or shouldn't be public. internals would also take care of compile errors in the previous commit - internals would be compiled first and fail. Though maybe it's too restrictive since units could actually support 16-bit.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If units wants to support 16-bit processors it can just avoid using this macro. I think we should default to blocking 16-bit systems except in crates where we take active steps to support them.

Which is an argument for pulling this trait into internals and then making sure any casts to usize have a good justification for not using it, and then in units if we want we can use an alternate trait.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem is this trait is doing two things:

  1. Provide a function so we don't have to cast (this is a totally internal thing)
  2. Provide a trait for the consensus::encode module to use (this is a public thing)

In #2931 we want to do something like:

fn emit_varint(&mut self, v: impl ToU64) -> Result<usize, io::Error>;

It also turns out that we only do suspicious casting (at the moment) in bitcoin (note patch 2 in this PR only touches bitcoin/).

Thoughts please?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should default to blocking 16-bit systems except in crates where we take active steps to support them.

Yes, but then we should make the trait optional and only available with only_32_and_64_bit_systems feature flag which all our crates will enable. And we should make sure units API is ready for 16-bits by not using the trait in the API.

@tcharding if we seal the trait and decide to only implement it for those integers it's already implemented for then the problems of the trait being in a stable crate are very minimal. The only real problem I can think of is crate_a defining fn foo(x: impl ToU64) { crate_b::bar(x) } where bar, then if one of the crates updates this breaks. We could advise early conversions (which are good for performance anyway) to guard against this: foo(x: impl ToU64) { crate_b::bar(x.to_u64()) }. Maybe this is not a long-term solution though and we need a stable crate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this: #2938

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should make every single user and maintainer need to think about an extra feature just so that some crates in the ecosystem are able to work on 16-bit systems.

I wouldn't mind having a cfg flag to enable 16-bit compilation, and having most crates fail compilation if it's enabled. Though that does mean that we need to add a few lines of code to every crate in order to prevent compilation in 16-bit mode.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@apoelstra I meant it to be in internals, so downstream users don't need to worry about it. units, address etc would turn it off, so they can still use internals without breaking compilation other crates would enable it to trigger the error message.

Actually inverting it would cause more work for downstream users. With the feature as I described the users will get usable units and the code compiles because they don't depend on other crates with no additional work.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expect message should say ">64-bit architectures disabled by compile_err`. Also, I think the compile error should be placed here, maybe even inside the function to make it absolutely obvious and linked.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, now using

impl ToU64 for usize {
    fn to_u64(self) -> u64 {
        // Rust does not implement `Into<u64>` for `usize` which is annoying so we explicitly do not
        // support targets other than 32-bit and 64-bit.
        #[cfg(all(not(target_pointer_width = "32"), not(target_pointer_width = "64")))]
        compile_error!("rust-bitcoin only supports 32-bit and 64-bit targets");

        self.try_into().expect(">64-bit targets disabled by compile_err")
    }
}

Comment on lines 48 to 44
Copy link
Copy Markdown
Member Author

@tcharding tcharding Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had another think and I still can't work it out - why cant' we guarantee that rust-bitcoin works on 16-bit targets? Some ideas

  • because we assume 32 bits and do a bunch of casts?
  • because hashes would be slow as hell if u32 was two words?
  • because of some crypto dark-magic that I don't understand?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are using usize for the sizes of containers, which (by consensus) should be allowed to have more than 65536 entries in them.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • because we assume 32 bits and do a bunch of casts?

Yes, this one. And @apoelstra is correct too though there's not much choice in 16-bit architectures - you can't bypass the limitation by just using u32, because the hardware simply cannot address that.

What could work is allowing the small crates that are unaffected by consensus rules to compile on 16-bits. units, addresses, hashes, ... It could be useful for the DIY embedded enthusiasts. Use cases include making calculators, block clocks, POS devices... It's mostly toys though I don't think it'd be too much work if we restrict ourselves to the obvious crates.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe only primitives should block it because that's where the consensus stuff is (and where the majority of the casts are, I'll bet).

Elsewhere we can audit the casts. I'll bet there aren't many cases outside of consensus parsing where we're casting numbers that might be big enough to overflow u16, but are guaranteed not to overflow u32.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Though crates depending on primitives will get the restriction transitively.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but I think that's "working as intended".

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

@tcharding tcharding force-pushed the 06-27-support-32-64-bit-only branch from fc03492 to 357ba2d Compare June 28, 2024 21:58
@tcharding
Copy link
Copy Markdown
Member Author

Draft until we sort out:

  • Better docs on the compile_err macro
  • Where to put ToU64

@tcharding tcharding marked this pull request as draft June 28, 2024 21:58
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A CHERI Rust effort defined their usize as 128-bits as they technically have 128-bit pointers.

rust-lang/rust#65473

I don't believe there's a need to consider unsupported/broken platforms here, yet this comment could caveat better.

Considering the call is for usize to no longer directly align with pointer width for such cases, and considering the intent here, this check may be better implemented based off sizeof usize (which I believe can be done with the techniques of const assert)?

Sorry to throw in the irrelevant oddities. It's an interesting rabbit hole and it does kinda rear its head here. Feel free to close without comment.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the call is for usize to no longer directly align with pointer width for such cases, and considering the intent here, this check may be better implemented based off sizeof usize (which I believe can be done with the techniques of const assert)?

Oh, this is a good point, we should most likely do that. We're not messing with pointer casts in any library, so this would provide CHERI support out of the box.

@tcharding
Copy link
Copy Markdown
Member Author

Will wait for #2972 to merge/resolve before continuing with this work.

tcharding added a commit to tcharding/rust-bitcoin that referenced this pull request Aug 4, 2024
Currently we enforce that our code only runs on machines with a
certain pointer width (32 or 64 by failing to compile if pointer size
width is 16). One of the underlying reasons is because of requirements
in consensus code in Bitcoin Core which requires containers with more
than 2^16 (65536) items [0].

We can better express our requirements by asserting on Rust's index
size (the `usize` type).

As a side benefit, there is active work [1] to make Rust support
architectures where pointer width != idex size. With this patch applied
`rust-bitcoin` will function correctly even if that work progresses.

- [0] rust-bitcoin#2929 (comment)
- [1] rust-lang/rust#65473
@tcharding tcharding force-pushed the 06-27-support-32-64-bit-only branch from 357ba2d to 6584a25 Compare August 4, 2024 22:58
@github-actions github-actions bot removed C-hashes PRs modifying the hashes crate C-units PRs modifying the units crate C-io PRs modifying the io crate C-base58 PRs modifying the base58 crate labels Aug 4, 2024
@tcharding tcharding force-pushed the 06-27-support-32-64-bit-only branch from 6584a25 to e62d272 Compare August 4, 2024 22:59
@tcharding tcharding changed the title Explicitly only support 32 and 64 bit machines and introduce ToU64 Introduce ToU64 trait Aug 4, 2024
@tcharding tcharding force-pushed the 06-27-support-32-64-bit-only branch from e62d272 to 9079060 Compare August 4, 2024 23:02
@tcharding
Copy link
Copy Markdown
Member Author

tcharding commented Aug 4, 2024

PR re-done, new description and new title. Now puts the trait in internals - note the use of "private" even though the trait is public to try to highlight that this is an internal private thing.

@tcharding tcharding marked this pull request as ready for review August 4, 2024 23:03
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message mentions 16 bit but the entire commit is about not supporting more than 64 bit sizes.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks, I woke up in the night wondering if I had this patch wrong - maybe my brain remembered that I miss described it :)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make debug builds slower, not sure how much of a problem it is.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain further please? I don't know what will be different here between the debug and release build.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flagging that this is not resolved by the force push, other comments were addressed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In release build the compiler sees that the panic is unreachable and removes the condition but in debug build there's no optimization. The alternative is to use as which doesn't produce the condition in the first place. If I was writing it I would prefer as and a comment but it's not something I'd block a PR for.

Copy link
Copy Markdown
Member Author

@tcharding tcharding Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL, thanks man. To help me apply this knowledge can I ask how, in this specific instance, do you know this? Is it because you know the exact compiler optimizations or is it just that you know that in release mode there are a lot of optimizations and this seems like a reasonable thing to be optimized out?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it because you know the exact compiler optimizations

I don't know all possible optimizations but I happen to know all that apply here:

  • inlining - when a function is just a few instructions compiler will determine it's not worth the call overhead and just copies the body
  • const propagation - expressions consisting purely of constants are simplified. E.g. 42 + 21 will become 63 but this also applies for let foo = 42; foo + 21
  • dead code elimination - if a branch can be proven unreachable (e.g. thanks to const propagation evaluating the condition to false; but another famous one is if the same condition was checked already) then the branch is thrown away with the condition
  • splitting up a struct into individual registers - if the struct doesn't have any pointers its fields might fit in registers so the struct disappears
  • removing unused writes - if something writes to a variable which is never read afterwards then the write was useless and is thrown away
  • removing unused variables - unused variables don't allocate even stack
  • removing expressions without side effects (this one is funny because infinite loop behaves differently in C and Rust)

So if we desugar (I write it in Rust for clarity but the optimizations happen in LLVM IR) foo_usize.to_u64() on a 64-bit platform (BTW I just checked and std is using pointer size):

// This is what Result becomes after monomorphization and layout optimization (none here)
struct Result_u64_TryFromIntError {
    tag: ResultDiscriminant,
    data: ResultData_u64_TryFromIntError
}

enum ResultDiscriminant {
    Ok,
    Err,
}

union ResultData_u64_TryFromIntError {
    ok: u64,
    err: TryFromIntError,
}

fn TryFrom_usize_for_u64_try_from(x: usize) -> Result_u64_TryFromIntError {
    Result_u64_TryFromIntError {
        tag: ResultDiscriminant::Ok, // notice that this is constant
        data: ResultData_u64_TryFromIntError { ok: x as u64 },
    }
}

// After inlining
fn usize_ToU64_to_u64(x: usize) -> u64 {
    // this is inlined <u64 as TryFrom<usize>::try_from
    let result = Result_u64_TryFromIntError {
        tag: ResultDiscriminant::Ok,
        data: ResultData_u64_TryFromIntError { ok: x as u64 },
    };
    // this is inlined `Result::<u64, TryFromIntError>::unwrap`
    match result.tag {
        ResultDiscriminant::Ok => unsafe { result.data.ok },
        ResultDiscriminant::Err => panic!("called `Result::unwrap` on an `Err` value: {:?}", unsafe { result.data.err }),
    }
}

// After constant propagation
fn usize_ToU64_to_u64(x: usize) -> u64 {
    // this is inlined <u64 as TryFrom<usize>::try_from
    let result = Result_u64_TryFromIntError {
        tag: ResultDiscriminant::Ok,
        data: ResultData_u64_TryFromIntError { ok: x as u64 },
    };
    // this is inlined `Result::<u64, TryFromIntError>::unwrap`
    match ResultDiscriminant::Ok {
        ResultDiscriminant::Ok => unsafe { result.data.ok },
        ResultDiscriminant::Err => panic!("called `Result::unwrap` on an `Err` value: {:?}", unsafe { result.data.err }),
    }
}

// After dead code elimination
fn usize_ToU64_to_u64(x: usize) -> u64 {
    // this is inlined <u64 as TryFrom<usize>::try_from
    let result = Result_u64_TryFromIntError {
        tag: ResultDiscriminant::Ok,
        data: ResultData_u64_TryFromIntError { ok: x as u64 },
    };

    unsafe { result.data.ok }
}

// After splitting up the struct:
fn usize_ToU64_to_u64(x: usize) -> u64 {
    // this is never read
    let tag = ResultDiscriminant::Ok;
    let data = ResultData_u64_TryFromIntError { ok: x as u64 };

    unsafe { data.ok }
}

// After removing useless write
fn usize_ToU64_to_u64(x: usize) -> u64 {
    // no side effects here
    ResultDiscriminant::Ok;
    let data = ResultData_u64_TryFromIntError { ok: x as u64 };

    unsafe { data.ok }
}

// After removing side-effect-free expressions
fn usize_ToU64_to_u64(x: usize) -> u64 {
    let data = ResultData_u64_TryFromIntError { ok: x as u64 };

    unsafe { data.ok }
}

// I'm pretty sure the union gets cleaned up too I just don't know how exactly does it work. It should end up as a cast at the end and as a no-op in hardware.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mad, thanks man. I learned a bunch from this.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is true of any type or trait in the crate so not sure why this is needed.

BTW there's a WIP Rust feature to detect types from private dependencies in public interface. IIRC it can detect use of the trait already. Maybe we should try to use it regardless of its incompleteness.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both good points, I"ll re-spin. Thanks

Copy link
Copy Markdown
Member Author

@tcharding tcharding Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean this one https://doc.rust-lang.org/stable/nightly-rustc/rustc_lint_defs/builtin/static.PRIVATE_INTERFACES.html

It doesn't fire if we add it along with this code to bitcoin/src/lib.rs

#![deny(private_interfaces)]

pub fn bar(_: impl internals::ToU64) {}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that does something else. The feature is in nightly and requires marking dependencies as public = true in Cargo.toml or not. So we will definitely need a script to inject those into our Cargo.toml when linting based on a list defined in [metadata] or externally. Also there are some compiler switches I haven't figured out completely yet.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets leave that until we are closer to v1.0s, I'd definitely like to use it then to help catch us.

@tcharding
Copy link
Copy Markdown
Member Author

Oh, did we decide that we want to seal the ToU64 trait?

@tcharding tcharding force-pushed the 06-27-support-32-64-bit-only branch from 9079060 to 5876bf8 Compare August 5, 2024 18:38
@tcharding
Copy link
Copy Markdown
Member Author

Now that we put ToU64 in internals we cannot use it in #2931. Some options:

  • Put it in primitives/src/lib.rs (same argument exists against this as when I originally tried to put in in bitcoin/src/lib.rs)
  • Add a primitives::consensus module and put it in there.
  • Some other option?

@Kixunil
Copy link
Copy Markdown
Collaborator

Kixunil commented Aug 6, 2024

Sealing is not important because it's in a private unstable dependency and the trait is unlikely to change however I still do like doing so.

Regarding public use, consensus will use push_decode which kinda sidesteps the issue but there is still VarIntEncoder which currently takes u64. If we don't want to call to_u64 explicitly maybe we should have an extension trait EncodeVarint and have methods returning encoders instead. (So you'd write foo.len().encode_varint().)

We already explicitly do not support 16 bit machines.

Also, because Rust supports `u182`s one cannot infallibly convert from a
`usize` to a `u64`. This is unergonomic and results in a ton of casts.

We can instead limit our code to running only on machines where `usize`
is less that or equal to 64 bits then the infallible conversion is
possible.

Since 128 bit machines are not a thing yet this does not in reality
introduce any limitations on the library.

Add a "private" trait to the `internals` crate to do infallible
conversion to a `u64` from `usize`.

Implement it for all unsigned integers smaller than `u64` as well so
we have the option to use the trait instead of `u32::from(foo)`.
@tcharding tcharding force-pushed the 06-27-support-32-64-bit-only branch from 5876bf8 to 579b76b Compare August 8, 2024 05:33
@tcharding
Copy link
Copy Markdown
Member Author

Lets push forward with this, it is a step in the right direction. We can worry about the VarInt usage in #2931.

Copy link
Copy Markdown
Collaborator

@Kixunil Kixunil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 579b76b

Copy link
Copy Markdown
Member

@apoelstra apoelstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 579b76b successfully ran local tests

@apoelstra apoelstra merged commit c59b9e3 into rust-bitcoin:master Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-bitcoin PRs modifying the bitcoin crate C-internals PRs modifying the internals crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants