Canonical encoding rules for ambiguous situations #34

@eth-r


For encodings that are not covered by an existing standard such as RFC 4648, the encoding of some inputs can be ambiguous, and this needs to be handled explicitly. For example, the common test case "yes mani !" is usually base-2 encoded to "01111001011001010111001100100000011011010110000101101110011010010010000000100001" in implementations, implying that leading zeros are dropped. However, this means that "\x00yes mani !" also encodes to "01111001011001010111001100100000011011010110000101101110011010010010000000100001", which will absolutely cause problems for someone somewhere at some point.
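To make the collision concrete, here is a minimal Python sketch. It assumes an implementation that converts the whole input to one big integer before formatting it, which is one plausible way the leading zeros get lost. `encode_base2_bigint` is a hypothetical name, and the leading "0" is assumed to be the base-2 prefix character.

```python
PREFIX = "0"  # assumption: the base-2 prefix character


def encode_base2_bigint(data: bytes) -> str:
    """Hypothetical encoder: format the input as one big-endian integer."""
    if not data:
        return PREFIX
    # int.from_bytes discards leading zero bytes (and leading zero bits),
    # so the length of the original input is not recoverable.
    return PREFIX + format(int.from_bytes(data, "big"), "b")


plain = encode_base2_bigint(b"yes mani !")
padded = encode_base2_bigint(b"\x00yes mani !")

# Two different inputs collapse to the same encoded string.
print(plain == padded)
```

Any number of leading "\x00" bytes gives the same output, so a decoder has no way to tell these inputs apart.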

There are a few different ways to resolve this ambiguity.

One would be to override the existing example and require fixed-length encoding so "yes mani !" would encode to "001111001011001010111001100100000011011010110000101101110011010010010000000100001" and "\x00yes mani !" to "00000000001111001011001010111001100100000011011010110000101101110011010010010000000100001". This would require decoding variable-width encodings as the shortest matching string, for backwards compatibility.
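A rough sketch of this first option, assuming eight bits are always emitted per input byte and that the decoder left-pads legacy variable-width input to a whole number of bytes. The names `encode_base2_fixed` and `decode_base2_lenient` are made up for illustration, and the "0" prefix is again an assumption.

```python
PREFIX = "0"  # assumption: the base-2 prefix character


def encode_base2_fixed(data: bytes) -> str:
    """Fixed-width encoding: always emit exactly 8 bits per input byte."""
    return PREFIX + "".join(format(byte, "08b") for byte in data)


def decode_base2_lenient(text: str) -> bytes:
    """Decode fixed-width input, and legacy variable-width input as the
    shortest byte string that matches."""
    if not text.startswith(PREFIX):
        raise ValueError("missing base-2 prefix")
    bits = text[len(PREFIX):]
    if any(ch not in "01" for ch in bits):
        raise ValueError("non-binary digit in input")
    # Left-pad with zeros up to the next multiple of 8 bits.
    whole_bytes = -(-len(bits) // 8)  # ceiling division
    bits = bits.zfill(whole_bytes * 8)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


fixed = encode_base2_fixed(b"\x00yes mani !")
print(decode_base2_lenient(fixed) == b"\x00yes mani !")
```

Note that the lenient path can only restore a partial leading byte. A legacy string that dropped an entire "\x00" byte still decodes to the shorter input, so old ambiguous encodings stay ambiguous.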

Another would be to permit dropping leading zeros as long as the encoding remains unambiguous. In this case the existing example would still be valid, "\x00yes mani !" would encode to "0001111001011001010111001100100000011011010110000101101110011010010010000000100001" and "\x00\x00yes mani !" to "000000000001111001011001010111001100100000011011010110000101101110011010010010000000100001".

A similar problem applies to base-8 and base-10, with the added complexity that base-10 does not even map cleanly onto bits to begin with. A canonical set of test vectors covering these and other edge cases would be extremely useful for implementers (see #24).
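The same effect can be sketched for the other two bases. The snippet below is only an illustration: it assumes a big-integer conversion for base-10, leaves out any prefix characters, and uses the hypothetical helpers `to_base10_bigint` and `base8_layout`.

```python
def to_base10_bigint(data: bytes) -> str:
    """Hypothetical base-10 encoder: the input read as one big-endian integer."""
    return str(int.from_bytes(data, "big"))


def base8_layout(n_bytes: int) -> tuple[int, int]:
    """Return (octal digits needed, spare bits) for an input of n_bytes.

    Each octal digit carries 3 bits, so 8-bit bytes only line up with
    digit boundaries when the total bit count is a multiple of 3.
    """
    bits = 8 * n_bytes
    digits = -(-bits // 3)  # ceiling division
    return digits, digits * 3 - bits


# Leading zero bytes vanish in base-10 just as they do in base-2.
print(to_base10_bigint(b"\x00\x01") == to_base10_bigint(b"\x01"))

# Base-8 needs spare bits for most input lengths.
for n in (1, 2, 3):
    print(n, base8_layout(n))
```

For base-8 an encoder has to decide where the spare bits go, and for base-10 there is no per-digit bit count at all, so test vectors would need to pin down both choices.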
