Backend: add line index#63726
Conversation
0a2f560 to
bbcecde
Compare
| var newlineCount int | ||
| switch v := any(content).(type) { | ||
| case string: | ||
| newlineCount = strings.Count(v, "\n") | ||
| case []byte: | ||
| newlineCount = bytes.Count(v, []byte("\n")) | ||
| } |
There was a problem hiding this comment.
The type switching is pretty awkward, but I think the single entrypoint to the package is worth it. Willing to be convinced otherwise though. The alternative is probably a NewLineIndex and NewStringLineIndex separation, similar to the regexp package.
There was a problem hiding this comment.
I think you should be able to get away with only the last line here since both can be converted to []byte which I believe strings.Count does under the hood anyways: https://go.dev/play/p/M7XJQv6YC7F
There was a problem hiding this comment.
I actually tried that! But something about the generic seems to break the "don't allocate a new []byte" optimization, so it was allocating on every call when passing in a string (that's why the benchmark has two parts).
There was a problem hiding this comment.
I see! can we add a comment about that? 😇
There was a problem hiding this comment.
Added a comment, and also added a test asserting that there is only a single allocation (TIL about testing.AllocsPerRun!)
| "lineindex_test.go", | ||
| "linereader_test.go", | ||
| ], | ||
| embed = [":byteutils"], |
There was a problem hiding this comment.
can someone explain to me why this switched from deps to embed? :D
| var newlineCount int | ||
| switch v := any(content).(type) { | ||
| case string: | ||
| newlineCount = strings.Count(v, "\n") | ||
| case []byte: | ||
| newlineCount = bytes.Count(v, []byte("\n")) | ||
| } |
There was a problem hiding this comment.
I think you should be able to get away with only the last line here since both can be converted to []byte which I believe strings.Count does under the hood anyways: https://go.dev/play/p/M7XJQv6YC7F
| // | ||
| // line numbers are 0-indexed, and the returned range includes the terminating | ||
| // newline (if it exists). | ||
| func (l LineIndex) Lines(startLine, endLine int) (int, int) { |
There was a problem hiding this comment.
Wonder if a better name would be Range since that is what it returns, not the lines itself but no blocker :)
There was a problem hiding this comment.
Renamed to LineRange and LinesRange since LineRange also returns a range 👍
This adds a line index utility. Frequently, I want to be able to efficiently index a file to extract a specific line or range of lines, but it's surprisingly tricky to get exactly right given weird definitions of "what even is a line" and edge conditions around out-of-bounds and such.
So this adds a general-purpose utility to pre-calculate the locations of lines in the file, making extracting a line range a zero-allocation,
O(1)operation.Not implemented: the same index can also be used to find the line that contains an offset, which I've also needed to do before. But I'll save that for when I actually have an immediate use for it.
Test plan
Added quickcheck tests, fuzz tests (particularly useful for finding out-of-bounds conditions), and a few manually-defined tests.