[pydocstyle] Escaped docstring in docstring (D301 )#12192
[pydocstyle] Escaped docstring in docstring (D301 )#12192charliermarsh merged 3 commits intoastral-sh:mainfrom
Conversation
| let body = docstring.body(); | ||
| let bytes = body.as_bytes(); | ||
| let mut backslash_index = 0; | ||
| let escaped_docstring_backslashes_pattern = b"\"\\\"\\\""; |
There was a problem hiding this comment.
We likely also need to handle single quotes here (i.e., escaped single quotes within single-quote docstrings).
There was a problem hiding this comment.
But D301 is based on double quotes, do we need to cover single-quote docstring here?
There was a problem hiding this comment.
I haven't verified this myself but the way I read the code is that docstrings are extracted from any string literal
ruff/crates/ruff_linter/src/docstrings/extraction.rs
Lines 6 to 15 in 7d16f83
84ba7f5 to
0d94f8e
Compare
| let escaped_triple_quotes = | ||
| &bytes[position.saturating_add(1)..position.saturating_add(4)]; | ||
| if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" { | ||
| return false; | ||
| } | ||
|
|
There was a problem hiding this comment.
This will panic if what comes after the \ is shorter than 3 characters. I would rewrite this to something like
| let escaped_triple_quotes = | |
| &bytes[position.saturating_add(1)..position.saturating_add(4)]; | |
| if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" { | |
| return false; | |
| } | |
| let after_first_backslash = &bytes[position + 1..]; | |
| let is_escaped_triple = after_first_backslash.starts_with(b"\"\"\"") | |
| || after_first_backslash.starts_with(b"\'\'\'"); | |
| if is_escaped_triple { | |
| return false; | |
| } |
| // For the `"\"\"` pattern, each iteration advances by 2 characters. | ||
| // For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`. |
There was a problem hiding this comment.
I don't think this assumption is correct and this might actually a bug in the existing implementation. For example, the function passed to any will be called twice for \\, once for each backslash position but the offsets aren't to indices apart.
What I understand is that you want to track if you're at the beginning of an escape sequence.
This is not fully fledged out, but I think we may have to rewrite the entire loop
while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
let after_escape = &body[position + 1..];
let next_char_len = after_escape.chars().next().unwrap_or_default();
let Some(escaped_char) = &after_escape.chars().next() else {
break;
};
if matches!(escaped_char, '"' | '\'') {
let is_escaped_triple =
after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");
if is_escaped_triple {
// don't add a diagnostic
}
if position != 0 && offset == position {
// An escape sequence, e.g. `\a\b`
}
}
offset = position + escaped_char.len_utf8();
}There was a problem hiding this comment.
Thank you. This helps a lot!
Summary
This PR updates D301 rule to allow inclduing escaped docstring, e.g.
\"""Foo.\"""or\"\"\"Bar.\"\"\", within a docstring.Related issue: #12152
Test Plan
Add more test cases to D301.py and update the snapshot file.