Skip to content

hashbrown's support for borsh is unsound #576

@kayabaNerve

Description

@kayabaNerve

borsh is a codec advertised for and frequently used in security-critical projects. One of the advertised features is "consistency", where an encoded object only has a single representation. The implementation here breaks that property by encoding via iteration over the HashMap and not in a consistent (sorted) order. This is proven by https://github.com/kayabaNerve/hashbrown-borsh-poc. Additionally, the decode function does not perform any checks on the order of the keys.

This also breaks the borsh specification for HashMap/HashSet types. This claim can be dismissed if it's argued hashbrown is a HashMap/HashSet yet is not a borsh HashMap/HashSet, yet I'll continue on the assumption that yes, they should be the same. Not only is the ordering broken, yet hashbrown serializes the hasher (not defined in the borsh spec, producing non-compliant encodings for any hashers of non-0-encoded-length) and uses a usize for the length. usize is not defined in the codec yet is serialized as a u64 by the Rust borsh crate. The codec specifies HashMap lengths should be u32s.

The inclusion of borsh was also flawed from a design standpoint as I noted here: #525 (comment). The Rust borsh crate already supported HashMap/HashSet. This makes borsh have a cyclic dependency (as even though they won't import hashbrown 0.15, their deps will now that those have updated to 0.15. I had to pin older deps for my PoC to build). It also is less featured as the borsh crate also supported HashSet yet only HashMap was supported here.


I submitted a responsible disclosure on the correctness issues on October 5th/6th. Amanieu informed me a few hours ago that they and Pietro (one of the two coordinators of the Rust security team) believe it's okay to discuss publicly, hence my creation of an issue documenting all of this. Ever since I raised hashbrown already had this functionality (without raising the correctness issues), the plan appears to be to remove this functionality (#570), mostly resolving this.

Considering this bug could come up in real-world patterns (canonical encodings can be relied on to prevent 'double-spends' in cryptocurrency and prevent consensus splits), I personally recommended yanking 0.15. That opinion has been disagreed with and I ack it's not my decision. I solely want my stance noted.

For examples, please see:

A recent issue on Ethereum regarding non-canonical encodings.

A 2021 issue in the Polkadot ecosystem (which did cause their network to halt) due to differing results from Rust's binary_search_by function dependent on compiler version. While this is unrelated to HashMaps/codecs, it highlights how any inconsistent behavior can have notable impact.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions