Conversation
Specify length limit to unify version string handling across implementations. License: MIT Signed-off-by: Marcin Rataj <lidel@lidel.org>
7142a33 to
9f717e2
Compare
marten-seemann
left a comment
There was a problem hiding this comment.
Can we make this UTF-8? I'd really like to not introduce any surprises here.
|
@marten-seemann a̹̖̪͔͖̝͘̕͞r̛̛̫̞̬̝͎̘̹͞ę̴͉͉̼̀͟ ̛͏̴͜͏͉̣͉̮͖̳̝͍̭̮̮͍͔͈̲͉͙̰ͅy̞̠̮̥̯̞̦͈̮̣̗̤͚͓̻͝ǫ̶̸̻̫͍̞͎̘͡u̦̮̭̬͔̝͇̠͟͡ ̦͇̘̤͔̝̥̜̘̙̭͙̱́͢s̸̛̠̝͓͚̬̦͇͘͜u͠͏̧̪͇̙͖̳̘̘͓͞͝ͅr̶̷̛̪̩̭̮̱͖̼̙̬̣͔̫͇̳̳̦͢ͅe̸̡̧̯̝͖̮̱͖̲̜̙̹͜͝ͅ ̧̰̟̻̳̗̱͕̰̜̠͇̘͔́͜͟y̡̢͉͙͎͎̹̖̲͇̰͟͢o̶̷͍̼̙̦̫̹̯̭͓̺̞͢ù͏̡̠̭̤̗͖͈͕͙̭͢ ̡̨̛͇͚̦̠̗͢͢w͏̦̗͍̫̀á̢͔͚̰̬͔͕̼̕ņ̴̛̲̜̮͈͙̰̀͘t̶̤̜̯͙̝̺̜̠̘̼͙̞͍̖̟͓̝̤͘͞ ͜͞͏͏͕̣͍̝͍̜͉̳̙̘̥͜t̛̪͔̯̱͓͝ǫ̸̨̢̤̘̰͉̬̤͓̙̼̖̞͖͟ ̷̨̢̳̣͈̪͙̦̻͉̗̹͓̤͖͕͘͢a̢̛͈̲͉͎̲͕͘l͠҉̧̛̞̥̟̜̠̯̰͙͎̬͜ͅl̸̵͍͓͓̼̬͙͍̰̭̟̖̪͍̀o͡͏̡̜̩̗̤͚̪̟̘̥̲̘̥͕̘͎̗̬͉͘͟w̡̡͖̼̟̣͙͕̥͈̮̜̩̟͈̫͘̕͠ ̘̻͕̬͓̗̠̕͞u̸̗͎̜̗͍̝͔̘͎̙̠̭̗͕t̴̤̺̫̮̩̀́͟͞ͅf̸͙̞̞̣̦͟8͏̞̥̮̙̙̲͉̰͖͉͚̤͇͠ ? |
|
Nice UTF-8 art! :) I don't really see reason not to. Limiting ourselves to ASCII is so 1990s style. |
|
Perhaps this could be phrased as: "Implementations should discard non-ASCII characters and trim the string to 64 characters, but may choose to allow UTF-8 characters if potential for UTF-8 art/mimicry is acceptable" I definitely wouldn't want UTF-8 support to be outright gone, using UTF-8 in protocol names (maybe containing CIDs with UTF-8 encodings) could have a lot of potential use-cases. For the agent and version strings though, that's perfectly understandable |
|
In fact, maybe better idea, what about: "Implementations should trim the string to 64 characters. Implementations MAY allow UTF-8 characters in the string, however, these strings should be visible to users as both UTF-8 and ASCII punycode (per IETF RFC 3492) to protect against UTF-8 mimicry." |
|
I'd say let's fully embrace UTF-8. This is 2022, and we finally have a standard encoding that's universally supported. Building on @Winterhuman's proposal:
|
…ring-limits License: MIT Signed-off-by: Marcin Rataj <lidel@lidel.org>
Adds guidance for safely handling untrusted Unicode strings from remote peers in the identify protocol: - Add 128 rune (Unicode code points) limit as SHOULD requirement - Add Unicode Sanitization section with guidance to: - Remove dangerous Unicode categories (Cc, Cf, Co, Cs) - Preserve legitimate international characters and emojis - Reference Unicode Standard Annex #44 for formal definitions - Note that sanitization is for display safety, not protocol restriction - Update revision to r2, 2025-09-12 - Add @lidel to Interest Group License: MIT Signed-off-by: Marcin Rataj <lidel@lidel.org>
|
Agreed: we need to keep unicode to ensure this is not limited to latin alphabet.
|
|
This could be a good reference: https://www.rfc-editor.org/rfc/rfc9839.html |
- add RFC 9839 reference for unicode string handling - replace problematic characters with U+FFFD instead of removing - remove private use characters (Co) from restricted list per RFC - add note that deletion is a known security risk per RFC 9839 License: MIT Signed-off-by: Marcin Rataj <lidel@lidel.org>
remove whitespace trimming to align with RFC 9839's principle of avoiding silent deletion License: MIT Signed-off-by: Marcin Rataj <lidel@lidel.org>
|
@MarcoPolo thanks! pushed changes to reference and follow idea from that RFC: replacement with |
replace problematic unicode characters with U+FFFD instead of deleting them, following RFC 9839 guidance that deletion is a security risk. moves sanitization to version.go and adds comprehensive tests. refs: https://www.rfc-editor.org/rfc/rfc9839.html refs: libp2p/specs#491
Shows the AgentVersion from libp2p identify protocol below successful connections. Basic sanitization inspired by libp2p/specs#491
MarcoPolo
left a comment
There was a problem hiding this comment.
I'd just call them Unicode code points rather than also call them runes.
|
besides that, this lgtm |
License: MIT Signed-off-by: Marcin Rataj <lidel@lidel.org>
|
done |
preserve private use characters as specified in libp2p/specs#491 enforce 128 rune limit on untrusted peer data
This PR formalizes
normalization andlength limit per string field (agentVersion,protocolVersionandprotocolsarray).The goal is to reduce surprises and unify behavior across implementations.