[pydocstyle] Escaped docstring in docstring (D301) by ukyen8 · Pull Request #12192 · astral-sh/ruff

ukyen8 · 2024-07-04T21:27:03Z

Summary

This PR updates the D301 rule to allow escaped docstrings, e.g. \"""Foo.\""" or \"\"\"Bar.\"\"\", within a docstring.

Related issue: #12152

Test Plan

Add more test cases to D301.py and update the snapshot file.

charliermarsh · 2024-07-04T23:09:44Z

    let body = docstring.body();
    let bytes = body.as_bytes();
+    let mut backslash_index = 0;
+    let escaped_docstring_backslashes_pattern = b"\"\\\"\\\"";
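For reference, the byte-string literal above decodes to five bytes: quote, backslash, quote, backslash, quote (the text `"\"\"`). That is exactly what follows the first backslash in the fully escaped form `\"\"\"` from the summary. Below is a minimal sketch of how such a pattern could be checked. The constant is a `const` here rather than a `let`, and the helper `completes_fully_escaped_triple` is hypothetical and not part of the PR.

```rust
/// The pattern from the diff. Decoding the Rust escapes gives five bytes:
/// quote, backslash, quote, backslash, quote, i.e. the text `"\"\"`.
const ESCAPED_DOCSTRING_BACKSLASHES_PATTERN: &[u8] = b"\"\\\"\\\"";

/// Hypothetical helper: given the bytes that follow a backslash, report
/// whether they complete the fully escaped triple quote `\"\"\"`.
fn completes_fully_escaped_triple(after_backslash: &[u8]) -> bool {
    // `starts_with` returns false (rather than panicking) on short input.
    after_backslash.starts_with(ESCAPED_DOCSTRING_BACKSLASHES_PATTERN)
}
```

Note that the other escaped form, `\"""`, does not match this pattern: after its backslash come three plain quotes, which is why the review below checks both shapes.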


We likely also need to handle single quotes here (i.e., escaped single quotes within single-quote docstrings).

But D301 is based on double quotes; do we need to cover single-quote docstrings here?

I haven't verified this myself, but the way I read the code, docstrings are extracted from any string literal:

ruff/crates/ruff_linter/src/docstrings/extraction.rs

Lines 6 to 15 in 7d16f83

    /// Extract a docstring from a function or class body.
    pub(crate) fn docstring_from(suite: &[Stmt]) -> Option<&ast::ExprStringLiteral> {
        let stmt = suite.first()?;
        // Require the docstring to be a standalone expression.
        let Stmt::Expr(ast::StmtExpr { value, range: _ }) = stmt else {
            return None;
        };
        // Only match strings.
        value.as_string_literal_expr()
    }

MichaReiser · 2024-07-15T06:59:37Z

+            let escaped_triple_quotes =
+                &bytes[position.saturating_add(1)..position.saturating_add(4)];
+            if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" {
+                return false;
+            }
+


This will panic if fewer than three bytes follow the backslash. I would rewrite this to something like:

Suggested change

-            let escaped_triple_quotes =
-                &bytes[position.saturating_add(1)..position.saturating_add(4)];
-            if escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"\'\'\'" {
-                return false;
-            }
+            let after_first_backslash = &bytes[position + 1..];
+            let is_escaped_triple = after_first_backslash.starts_with(b"\"\"\"")
+                || after_first_backslash.starts_with(b"\'\'\'");
+            if is_escaped_triple {
+                return false;
+            }
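A small demonstration of the difference, assuming `position` is the index of a backslash inside `bytes`. Both function names are hypothetical wrappers around the two snippets above. The fixed-width slice panics when the backslash sits near the end of the docstring, while `starts_with` just returns `false`.

```rust
/// Original approach: a fixed-width slice. A range that runs past the end of
/// `bytes` panics, so this is only safe if at least three bytes follow.
fn escaped_triple_by_slice(bytes: &[u8], position: usize) -> bool {
    let escaped_triple_quotes = &bytes[position.saturating_add(1)..position.saturating_add(4)];
    escaped_triple_quotes == b"\"\"\"" || escaped_triple_quotes == b"'''"
}

/// Suggested approach: `starts_with` returns false when the tail is shorter
/// than the needle. `position + 1 <= bytes.len()` holds because `position`
/// indexes an existing backslash, so this slice never panics either.
fn escaped_triple_by_prefix(bytes: &[u8], position: usize) -> bool {
    let after_first_backslash = &bytes[position + 1..];
    after_first_backslash.starts_with(b"\"\"\"") || after_first_backslash.starts_with(b"'''")
}
```

For a body ending in a lone backslash, the first function panics with an out-of-range slice, and the second reports `false`.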

MichaReiser · 2024-07-15T07:12:38Z

+            // For the `"\"\"` pattern, each iteration advances by 2 characters.
+            // For example, the sequence progresses from `"\"\"` to `"\"` and then to `"`.


I don't think this assumption is correct, and this might actually be a bug in the existing implementation. For example, the function passed to `any` will be called twice for \\, once for each backslash position, but the offsets aren't two indices apart.

As I understand it, you want to track whether you're at the beginning of an escape sequence.
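To make the \\ case concrete, here is a sketch contrasting the two views. The function names are hypothetical, and the sketch works on raw bytes rather than ruff's `Docstring` type. A naive scan reports every backslash offset. An escape-aware scan records only where each escape sequence starts and then skips the escaped byte.

```rust
/// Naive scan: every byte offset holding a backslash. For `\\n` this reports
/// both 0 and 1, even though the backslash at 1 is itself escaped.
fn backslash_positions(bytes: &[u8]) -> Vec<usize> {
    bytes
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == b'\\')
        .map(|(i, _)| i)
        .collect()
}

/// Escape-aware scan: record where each escape sequence starts, then skip
/// the byte it escapes so that the second backslash of `\\` is never
/// treated as the start of another sequence.
fn escape_sequence_starts(bytes: &[u8]) -> Vec<usize> {
    let mut starts = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' {
            starts.push(i);
            i += 2; // the backslash plus the byte it escapes
        } else {
            i += 1;
        }
    }
    starts
}
```

Skipping a single byte is safe here even if the escaped character is multi-byte. A backslash is ASCII, and ASCII bytes never occur inside a multi-byte UTF-8 sequence, so landing on a continuation byte cannot produce a false match.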

This isn't fully fleshed out, but I think we may have to rewrite the entire loop:

    while let Some(position) = memchr::memchr(b'\\', &bytes[offset..]) {
        let after_escape = &body[position + 1..];
        let next_char_len = after_escape.chars().next().unwrap_or_default();
        let Some(escaped_char) = &after_escape.chars().next() else {
            break;
        };

        if matches!(escaped_char, '"' | '\'') {
            let is_escaped_triple =
                after_escape.starts_with("\"\"\"") || after_escape.starts_with("\'\'\'");

            if is_escaped_triple {
                // don't add a diagnostic
            }

            if position != 0 && offset == position {
                // An escape sequence, e.g. `\a\b`
            }
        }

        offset = position + escaped_char.len_utf8();
    }
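One way the sketched loop could be made runnable, under a few stated assumptions. It operates on a plain `&str` body and uses `iter().position` in place of `memchr::memchr` (both return an index relative to the searched slice). It models only the escaped-triple-quote exemption discussed in this PR and none of D301's other behavior. `has_flaggable_backslash` is a hypothetical name, not the merged implementation.

```rust
/// Hypothetical sketch: does `body` (the docstring text between the quotes)
/// contain a backslash escape that is not an escaped triple quote?
fn has_flaggable_backslash(body: &str) -> bool {
    let bytes = body.as_bytes();
    let mut offset = 0;

    // Stand-in for `memchr::memchr(b'\\', &bytes[offset..])`; the result is
    // relative to the slice that was searched.
    while let Some(relative) = bytes[offset..].iter().position(|&b| b == b'\\') {
        // Convert to an absolute index into `body`.
        let position = offset + relative;
        // A backslash is a single ASCII byte, so `position + 1` is a char boundary.
        let after_escape = &body[position + 1..];
        let Some(escaped_char) = after_escape.chars().next() else {
            // A trailing backslash with nothing after it: treat it as flaggable.
            return true;
        };

        if matches!(escaped_char, '"' | '\'') {
            // `\"""` or `\'''`: one backslash in front of a triple quote.
            if after_escape.starts_with("\"\"\"") || after_escape.starts_with("'''") {
                offset = position + 1 + 3;
                continue;
            }
            // `\"\"\"` or `\'\'\'`: every quote escaped individually.
            if after_escape.starts_with("\"\\\"\\\"") || after_escape.starts_with("'\\'\\'") {
                offset = position + 1 + 5;
                continue;
            }
        }

        // Any other escape sequence would produce a diagnostic.
        return true;
    }

    false
}
```

Because the search runs on `&bytes[offset..]`, the result is turned into an absolute index before slicing `body`. Jumping past a recognized pattern then keeps its inner backslashes from being inspected a second time.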

Thank you. This helps a lot!

charliermarsh

Thanks!

[pydocstyle] Escaped docstring in docstring (D301)

e09ffe4

ukyen8 marked this pull request as ready for review July 4, 2024 21:44

charliermarsh added the bug (Something isn't working) and docstring (Related to docstring linting or formatting) labels Jul 4, 2024

charliermarsh reviewed Jul 4, 2024


ukyen8 requested a review from charliermarsh July 10, 2024 23:41

Escaped single-quote docstring within single-quote docstring

0d94f8e

ukyen8 force-pushed the d301-docstring-in-docstring branch from 84ba7f5 to 0d94f8e July 12, 2024 23:00

MichaReiser reviewed Jul 15, 2024


Rewrite checking loop to allow escaped triple quotes

c164fbd

charliermarsh approved these changes Jul 18, 2024


charliermarsh merged commit 0ba7fc6 into astral-sh:main Jul 18, 2024

charliermarsh mentioned this pull request Jul 18, 2024

escape-sequence-in-docstring (D301) reports escaped docstrings within docstrings #12152

Closed

BrewTestBot mentioned this pull request Jul 20, 2024

ruff 0.5.4 Homebrew/homebrew-core#177920

Merged


charliermarsh merged 3 commits into astral-sh:main from ukyen8:d301-docstring-in-docstring


3 participants
