Handling of Windows (and legacy Mac) line endings#47
Conversation
See PEP 278: Universal Newline Support, but in particular want to support Linux/macOS with LF only, and Windows with CR LF.
Streamline slightly
stephenfin
left a comment
There was a problem hiding this comment.
One nit but this otherwise looks sane to me. If my comment is correct, a follow-up change to remove the __init__ would be appreciated 😄
doc8/checks.py
Outdated
| if "\r" in line: | ||
| yield ("D004", "Found literal carriage return") | ||
| def __init__(self, cfg): | ||
| super(CheckCarriageReturn, self).__init__(cfg) |
There was a problem hiding this comment.
This __init__ is unnecessary given that it's simply calling the superclass implementation, right?
There was a problem hiding this comment.
Possibly. I just followed the pattern of CheckNewlineEndOfFile which has the same __init__. I'll check this.
There was a problem hiding this comment.
I'd removed the __init__ as suggested.
| self._lines = list(fh) | ||
| fh.seek(0) | ||
| self._raw_content = fh.read() | ||
| self._lines = self._raw_content.splitlines(True) |
There was a problem hiding this comment.
Just that it was a good improvement
There was a problem hiding this comment.
Thanks, I'm familiar with +1 in that context, this was new to me.
peterjc
left a comment
There was a problem hiding this comment.
I'd asked earlier why change to subclass CheckCarriageReturn(ContentCheck): and understand that this is essential to get access to the lines including the trailing assorted new line characters.
ssbarnea
left a comment
There was a problem hiding this comment.
I think we should not process files as binary and rely on python to deal with newlines. Anything we do at low level would only create a maintenance burden.
| @@ -80,17 +80,20 @@ def _read(self): | |||
| with self._read_lock: | |||
| if not self._has_read: | |||
| with open(self.filename, "rb") as fh: | |||
There was a problem hiding this comment.
Why we are opening text files as binary? Opening as text should allow us to use universal newlines support in python.
There was a problem hiding this comment.
I'm sure Markus will correct me if I am wrong, but wouldn't that disable D004 (Literal carriage return)?
There was a problem hiding this comment.
Yes, otherwise you will lose the original line endings (they are converted to \n) and D004 becomes obsolete. Would be fine for those who have filed the issue, but probably not for you?
ssbarnea
left a comment
There was a problem hiding this comment.
If we are to support various newlines, we clearly need a test that assures no regression happens.
|
@ssbarnea could you expand on what further testing you had in mind (on top of the new tests already included)? |
@ssbarnea Maybe you have missed it, but I included already some tests (added cases to existing tests). |
|
🎉 |
This PR addresses issue #33.
Files with Windows line endings (CRLF, \r\n) raise
D004(Literal carriage return) errors (which is OK), but alsoD002(trailing whitespace) and they can also riseD001(line too long).With this PR
This allows users that have Windows line endings to turn off detection of
D004and usedoc8to check their text files.Rationale:
The standard of a
gitinstallation on Windows is that line endings in text files are converted to CRLF in the local copy. There are ways, local and repository wide, to prevent that the committed files will contain CRLF (they are back-converted to LF during the commit). Unfortunately, style checkers likedoc8check the local files, and these still contain the CRLF. This is also true when these style checkers are used within a pre-commit hook or with the softwarepre-commit.Therefore, it is desirable for Windows users to have a way to check their files without manually converting their line endings. The proposed PR does this (Windows users can turn off the detection of
D004) but still respects the wish of thedoc8developers to flag non-Linux line endings in the standard settings.This PR has manually been tested with an
.rstfile which was converted to different line endings (Linux/Window/old Mac). In case of Windows and legacy Mac line endings, each line was flagged with D004 but not other error was raised. Using--ignore D004let them pass without any error.