fix: escape invalid UTF-8 bytes in debug output for Match #1203
BurntSushi merged 3 commits into rust-lang:master from
src/regex/bytes.rs
Outdated
    fmt.field("bytes", &s);

    let bytes = self.as_bytes();
    let formatted = bytes_to_string_with_invalid_utf8_escaped(bytes);
Can you use regex_automata::util::escape::DebugHaystack instead? It will basically do what you have here, but will only escape invalid UTF-8. What you've implemented here will escape not only invalid UTF-8, but all UTF-8 that isn't ASCII. (I think that would be a cure worse than the disease.)
Modified to use DebugHaystack. I thought there would be such a feature but couldn't find it. Thanks for your suggestion. 88112b3
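To illustrate the behavior the review asks for, here is a minimal std-only sketch that escapes only the bytes that are not valid UTF-8 and passes valid non-ASCII text through unchanged. The helper name escape_invalid_utf8 is hypothetical, and this is not the DebugHaystack implementation from regex-automata; it only approximates the idea and makes no attempt to escape control characters or quotes.

```rust
use std::fmt::Write;

/// Hypothetical helper (not part of the regex crates): copies valid UTF-8
/// through unchanged and writes each invalid byte as \xHH.
fn escape_invalid_utf8(mut bytes: &[u8]) -> String {
    let mut out = String::new();
    loop {
        match std::str::from_utf8(bytes) {
            // Everything that remains is valid UTF-8: emit it and stop.
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Emit the valid prefix as-is.
                let (valid, rest) = bytes.split_at(err.valid_up_to());
                out.push_str(std::str::from_utf8(valid).unwrap());
                // `error_len()` is `None` when the input ends in the middle
                // of a sequence; in that case escape everything that is left.
                let bad_len = err.error_len().unwrap_or(rest.len());
                for &b in &rest[..bad_len] {
                    write!(out, "\\x{:02X}", b).unwrap();
                }
                // Continue after the invalid bytes.
                bytes = &rest[bad_len..];
            }
        }
    }
}
```

With this sketch, b"\xFFworld" becomes \xFFworld, while "☃".as_bytes() comes back as ☃ rather than as three escaped bytes.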
        debug_str,
        r#"Match { start: 7, end: 13, bytes: "\\xFFworld" }"#
    );
}
Please add some tests with non-ASCII UTF-8.
src/regex/bytes.rs
Outdated
fn bytes_to_string_with_invalid_utf8_escaped(bytes: &[u8]) -> String {
    let mut result = String::new();
    for &byte in bytes {
        if byte.is_ascii() {
> outputs valid UTF-8 characters as is
This is why what you said isn't accurate here. This only outputs ASCII characters as-is. Everything else, including valid UTF-8 that isn't ASCII, is emitted as escape byte sequences.
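The problem can be reproduced with a small std-only sketch of a byte-wise escaper. The diff fragment above is cut off after the is_ascii() check, so the else branch below is an assumption based on this comment, and the name escape_bytewise is hypothetical.

```rust
use std::fmt::Write;

/// Hypothetical reconstruction of a byte-wise escaper: ASCII bytes are
/// copied as-is, every other byte is written as \xHH. The else branch is
/// an assumption, since the original diff is truncated.
fn escape_bytewise(bytes: &[u8]) -> String {
    let mut result = String::new();
    for &byte in bytes {
        if byte.is_ascii() {
            result.push(byte as char);
        } else {
            // Any byte >= 0x80 lands here, including the bytes of a
            // perfectly valid multi-byte UTF-8 character.
            write!(result, "\\x{:02X}", byte).unwrap();
        }
    }
    result
}
```

Because the check is per byte, "héllo".as_bytes() comes out as h\xC3\xA9llo even though é is valid UTF-8, which is the behavior the comment objects to.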
This PR is on crates.io in |
Description
The Debug implementation for Match has been updated to use DebugHaystack. This provides a way to handle the formatting of &[u8] for debug output.
- Invalid UTF-8 bytes are escaped as hex (\xHH).
- Characters such as \t and \n are properly escaped.