Conversation
base58/src/lib.rs
Outdated
There was a problem hiding this comment.
Regarding the commit message, 8-bit is already disallowed by Rust itself. It's also an API guarantee - From<u16> for usize exists.
There was a problem hiding this comment.
TIL, thanks. I read this while learning: https://www.reddit.com/r/rust/comments/nltugw/first_ever_rust_code_compiled_for_8bit_6502/
And fixed up the terminology in the rustdocs in this PR to use "target" instead of "architecture" because I think it better describes what we are doing since its specifically the pointer width and the implications that has on Into<u64> for usize that concerns us - correct me if I'm wrong please.
We now have
// We only support 32-bit and 64-bit targets.
//
// - We can't guarantee this library works on 16-bit targets.
// - Rust does not implement `Into<u64>` for `usize` which is annoying so we explicitly do not
// support 128-bit targets so that we can do the conversion infallibly.I'd like to further improve the first bullet point though.
bitcoin/src/lib.rs
Outdated
There was a problem hiding this comment.
I really don't like dumping random shit into crate root. This should either go to internals or shouldn't be public. internals would also take care of compile errors in the previous commit - internals would be compiled first and fail. Though maybe it's too restrictive since units could actually support 16-bit.
There was a problem hiding this comment.
If units wants to support 16-bit processors it can just avoid using this macro. I think we should default to blocking 16-bit systems except in crates where we take active steps to support them.
Which is an argument for pulling this trait into internals and then making sure any casts to usize have a good justification for not using it, and then in units if we want we can use an alternate trait.
There was a problem hiding this comment.
I think the problem is this trait is doing two things:
- Provide a function so we don't have to cast (this is a totally internal thing)
- Provide a trait for the
consensus::encodemodule to use (this is a public thing)
In #2931 we want to do something like:
fn emit_varint(&mut self, v: impl ToU64) -> Result<usize, io::Error>;It also turns out that we only do suspicious casting (at the moment) in bitcoin (note patch 2 in this PR only touches bitcoin/).
Thoughts please?
There was a problem hiding this comment.
I think we should default to blocking 16-bit systems except in crates where we take active steps to support them.
Yes, but then we should make the trait optional and only available with only_32_and_64_bit_systems feature flag which all our crates will enable. And we should make sure units API is ready for 16-bits by not using the trait in the API.
@tcharding if we seal the trait and decide to only implement it for those integers it's already implemented for then the problems of the trait being in a stable crate are very minimal. The only real problem I can think of is crate_a defining fn foo(x: impl ToU64) { crate_b::bar(x) } where bar, then if one of the crates updates this breaks. We could advise early conversions (which are good for performance anyway) to guard against this: foo(x: impl ToU64) { crate_b::bar(x.to_u64()) }. Maybe this is not a long-term solution though and we need a stable crate.
There was a problem hiding this comment.
I don't think we should make every single user and maintainer need to think about an extra feature just so that some crates in the ecosystem are able to work on 16-bit systems.
I wouldn't mind having a cfg flag to enable 16-bit compilation, and having most crates fail compilation if it's enabled. Though that does mean that we need to add a few lines of code to every crate in order to prevent compilation in 16-bit mode.
There was a problem hiding this comment.
@apoelstra I meant it to be in internals, so downstream users don't need to worry about it. units, address etc would turn it off, so they can still use internals without breaking compilation other crates would enable it to trigger the error message.
Actually inverting it would cause more work for downstream users. With the feature as I described the users will get usable units and the code compiles because they don't depend on other crates with no additional work.
bitcoin/src/lib.rs
Outdated
There was a problem hiding this comment.
The expect message should say ">64-bit architectures disabled by compile_err`. Also, I think the compile error should be placed here, maybe even inside the function to make it absolutely obvious and linked.
There was a problem hiding this comment.
Nice, now using
impl ToU64 for usize {
fn to_u64(self) -> u64 {
// Rust does not implement `Into<u64>` for `usize` which is annoying so we explicitly do not
// support targets other than 32-bit and 64-bit.
#[cfg(all(not(target_pointer_width = "32"), not(target_pointer_width = "64")))]
compile_error!("rust-bitcoin only supports 32-bit and 64-bit targets");
self.try_into().expect(">64-bit targets disabled by compile_err")
}
}
bitcoin/src/lib.rs
Outdated
There was a problem hiding this comment.
I've had another think and I still can't work it out - why cant' we guarantee that rust-bitcoin works on 16-bit targets? Some ideas
- because we assume 32 bits and do a bunch of casts?
- because hashes would be slow as hell if
u32was two words? - because of some crypto dark-magic that I don't understand?
There was a problem hiding this comment.
Because we are using usize for the sizes of containers, which (by consensus) should be allowed to have more than 65536 entries in them.
There was a problem hiding this comment.
- because we assume 32 bits and do a bunch of casts?
Yes, this one. And @apoelstra is correct too though there's not much choice in 16-bit architectures - you can't bypass the limitation by just using u32, because the hardware simply cannot address that.
What could work is allowing the small crates that are unaffected by consensus rules to compile on 16-bits. units, addresses, hashes, ... It could be useful for the DIY embedded enthusiasts. Use cases include making calculators, block clocks, POS devices... It's mostly toys though I don't think it'd be too much work if we restrict ourselves to the obvious crates.
There was a problem hiding this comment.
Maybe only primitives should block it because that's where the consensus stuff is (and where the majority of the casts are, I'll bet).
Elsewhere we can audit the casts. I'll bet there aren't many cases outside of consensus parsing where we're casting numbers that might be big enough to overflow u16, but are guaranteed not to overflow u32.
There was a problem hiding this comment.
Good point! Though crates depending on primitives will get the restriction transitively.
There was a problem hiding this comment.
True, but I think that's "working as intended".
fc03492 to
357ba2d
Compare
|
Draft until we sort out:
|
base58/src/lib.rs
Outdated
There was a problem hiding this comment.
A CHERI Rust effort defined their usize as 128-bits as they technically have 128-bit pointers.
I don't believe there's a need to consider unsupported/broken platforms here, yet this comment could caveat better.
Considering the call is for usize to no longer directly align with pointer width for such cases, and considering the intent here, this check may be better implemented based off sizeof usize (which I believe can be done with the techniques of const assert)?
Sorry to throw in the irrelevant oddities. It's an interesting rabbit hole and it does kinda rear its head here. Feel free to close without comment.
There was a problem hiding this comment.
Considering the call is for usize to no longer directly align with pointer width for such cases, and considering the intent here, this check may be better implemented based off sizeof usize (which I believe can be done with the techniques of const assert)?
Oh, this is a good point, we should most likely do that. We're not messing with pointer casts in any library, so this would provide CHERI support out of the box.
|
Will wait for #2972 to merge/resolve before continuing with this work. |
Currently we enforce that our code only runs on machines with a certain pointer width (32 or 64 by failing to compile if pointer size width is 16). One of the underlying reasons is because of requirements in consensus code in Bitcoin Core which requires containers with more than 2^16 (65536) items [0]. We can better express our requirements by asserting on Rust's index size (the `usize` type). As a side benefit, there is active work [1] to make Rust support architectures where pointer width != idex size. With this patch applied `rust-bitcoin` will function correctly even if that work progresses. - [0] rust-bitcoin#2929 (comment) - [1] rust-lang/rust#65473
357ba2d to
6584a25
Compare
6584a25 to
e62d272
Compare
ToU64ToU64 trait
e62d272 to
9079060
Compare
|
PR re-done, new description and new title. Now puts the trait in |
bitcoin/src/bip152.rs
Outdated
There was a problem hiding this comment.
The commit message mentions 16 bit but the entire commit is about not supporting more than 64 bit sizes.
There was a problem hiding this comment.
Oh thanks, I woke up in the night wondering if I had this patch wrong - maybe my brain remembered that I miss described it :)
internals/src/lib.rs
Outdated
There was a problem hiding this comment.
This will make debug builds slower, not sure how much of a problem it is.
There was a problem hiding this comment.
Can you explain further please? I don't know what will be different here between the debug and release build.
There was a problem hiding this comment.
Flagging that this is not resolved by the force push, other comments were addressed.
There was a problem hiding this comment.
In release build the compiler sees that the panic is unreachable and removes the condition but in debug build there's no optimization. The alternative is to use as which doesn't produce the condition in the first place. If I was writing it I would prefer as and a comment but it's not something I'd block a PR for.
There was a problem hiding this comment.
TIL, thanks man. To help me apply this knowledge can I ask how, in this specific instance, do you know this? Is it because you know the exact compiler optimizations or is it just that you know that in release mode there are a lot of optimizations and this seems like a reasonable thing to be optimized out?
There was a problem hiding this comment.
Is it because you know the exact compiler optimizations
I don't know all possible optimizations but I happen to know all that apply here:
- inlining - when a function is just a few instructions compiler will determine it's not worth the call overhead and just copies the body
- const propagation - expressions consisting purely of constants are simplified. E.g.
42 + 21will become63but this also applies forlet foo = 42; foo + 21 - dead code elimination - if a branch can be proven unreachable (e.g. thanks to const propagation evaluating the condition to
false; but another famous one is if the same condition was checked already) then the branch is thrown away with the condition - splitting up a struct into individual registers - if the struct doesn't have any pointers its fields might fit in registers so the struct disappears
- removing unused writes - if something writes to a variable which is never read afterwards then the write was useless and is thrown away
- removing unused variables - unused variables don't allocate even stack
- removing expressions without side effects (this one is funny because infinite loop behaves differently in C and Rust)
So if we desugar (I write it in Rust for clarity but the optimizations happen in LLVM IR) foo_usize.to_u64() on a 64-bit platform (BTW I just checked and std is using pointer size):
// This is what Result becomes after monomorphization and layout optimization (none here)
struct Result_u64_TryFromIntError {
tag: ResultDiscriminant,
data: ResultData_u64_TryFromIntError
}
enum ResultDiscriminant {
Ok,
Err,
}
union ResultData_u64_TryFromIntError {
ok: u64,
err: TryFromIntError,
}
fn TryFrom_usize_for_u64_try_from(x: usize) -> Result_u64_TryFromIntError {
Result_u64_TryFromIntError {
tag: ResultDiscriminant::Ok, // notice that this is constant
data: ResultData_u64_TryFromIntError { ok: x as u64 },
}
}
// After inlining
fn usize_ToU64_to_u64(x: usize) -> u64 {
// this is inlined <u64 as TryFrom<usize>::try_from
let result = Result_u64_TryFromIntError {
tag: ResultDiscriminant::Ok,
data: ResultData_u64_TryFromIntError { ok: x as u64 },
};
// this is inlined `Result::<u64, TryFromIntError>::unwrap`
match result.tag {
ResultDiscriminant::Ok => unsafe { result.data.ok },
ResultDiscriminant::Err => panic!("called `Result::unwrap` on an `Err` value: {:?}", unsafe { result.data.err }),
}
}
// After constant propagation
fn usize_ToU64_to_u64(x: usize) -> u64 {
// this is inlined <u64 as TryFrom<usize>::try_from
let result = Result_u64_TryFromIntError {
tag: ResultDiscriminant::Ok,
data: ResultData_u64_TryFromIntError { ok: x as u64 },
};
// this is inlined `Result::<u64, TryFromIntError>::unwrap`
match ResultDiscriminant::Ok {
ResultDiscriminant::Ok => unsafe { result.data.ok },
ResultDiscriminant::Err => panic!("called `Result::unwrap` on an `Err` value: {:?}", unsafe { result.data.err }),
}
}
// After dead code elimination
fn usize_ToU64_to_u64(x: usize) -> u64 {
// this is inlined <u64 as TryFrom<usize>::try_from
let result = Result_u64_TryFromIntError {
tag: ResultDiscriminant::Ok,
data: ResultData_u64_TryFromIntError { ok: x as u64 },
};
unsafe { result.data.ok }
}
// After splitting up the struct:
fn usize_ToU64_to_u64(x: usize) -> u64 {
// this is never read
let tag = ResultDiscriminant::Ok;
let data = ResultData_u64_TryFromIntError { ok: x as u64 };
unsafe { data.ok }
}
// After removing useless write
fn usize_ToU64_to_u64(x: usize) -> u64 {
// no side effects here
ResultDiscriminant::Ok;
let data = ResultData_u64_TryFromIntError { ok: x as u64 };
unsafe { data.ok }
}
// After removing side-effect-free expressions
fn usize_ToU64_to_u64(x: usize) -> u64 {
let data = ResultData_u64_TryFromIntError { ok: x as u64 };
unsafe { data.ok }
}
// I'm pretty sure the union gets cleaned up too I just don't know how exactly does it work. It should end up as a cast at the end and as a no-op in hardware.There was a problem hiding this comment.
Mad, thanks man. I learned a bunch from this.
internals/src/lib.rs
Outdated
There was a problem hiding this comment.
This is true of any type or trait in the crate so not sure why this is needed.
BTW there's a WIP Rust feature to detect types from private dependencies in public interface. IIRC it can detect use of the trait already. Maybe we should try to use it regardless of its incompleteness.
There was a problem hiding this comment.
Both good points, I"ll re-spin. Thanks
There was a problem hiding this comment.
Did you mean this one https://doc.rust-lang.org/stable/nightly-rustc/rustc_lint_defs/builtin/static.PRIVATE_INTERFACES.html
It doesn't fire if we add it along with this code to bitcoin/src/lib.rs
#![deny(private_interfaces)]
pub fn bar(_: impl internals::ToU64) {}There was a problem hiding this comment.
No, that does something else. The feature is in nightly and requires marking dependencies as public = true in Cargo.toml or not. So we will definitely need a script to inject those into our Cargo.toml when linting based on a list defined in [metadata] or externally. Also there are some compiler switches I haven't figured out completely yet.
There was a problem hiding this comment.
Lets leave that until we are closer to v1.0s, I'd definitely like to use it then to help catch us.
|
Oh, did we decide that we want to seal the |
9079060 to
5876bf8
Compare
|
Now that we put
|
|
Sealing is not important because it's in a private unstable dependency and the trait is unlikely to change however I still do like doing so. Regarding public use, consensus will use |
We already explicitly do not support 16 bit machines. Also, because Rust supports `u182`s one cannot infallibly convert from a `usize` to a `u64`. This is unergonomic and results in a ton of casts. We can instead limit our code to running only on machines where `usize` is less that or equal to 64 bits then the infallible conversion is possible. Since 128 bit machines are not a thing yet this does not in reality introduce any limitations on the library. Add a "private" trait to the `internals` crate to do infallible conversion to a `u64` from `usize`. Implement it for all unsigned integers smaller than `u64` as well so we have the option to use the trait instead of `u32::from(foo)`.
5876bf8 to
579b76b
Compare
|
Lets push forward with this, it is a step in the right direction. We can worry about the |
The idea for this was pulled out of Steven's work in #2133
We already explicitly do not support 16 bit machines.
Also, because Rust supports
u182s one cannot infallibly convert from ausizeto au64. This is unergonomic and results in a ton of casts.We can instead limit our code to running only on machines where
usizeis less that or equal to 64 bits then the infallible conversion is possible.Since 128 bit machines are not a thing yet this does not in reality introduce any limitations on the library.
Add a "private" trait to the
internalscrate to do infallible conversion to au64fromusize.Implement it for all unsigned integers smaller than
u64as well so we have the option to use the trait instead ofu32::from(foo).