Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Backend: add line index#63726

Merged
camdencheek merged 4 commits into
mainfrom
cc/line-index
Jul 9, 2024
Merged

Backend: add line index#63726
camdencheek merged 4 commits into
mainfrom
cc/line-index

Conversation

@camdencheek

@camdencheek camdencheek commented Jul 9, 2024

Copy link
Copy Markdown
Member

This adds a line index utility. Frequently, I want to be able to efficiently index a file to extract a specific line or range of lines, but it's surprisingly tricky to get exactly right given weird definitions of "what even is a line" and edge conditions around out-of-bounds and such.

So this adds a general-purpose utility to pre-calculate the locations of lines in the file, making extracting a line range a zero-allocation, O(1) operation.

Not implemented: the same index can also be used to find the line that contains an offset, which I've also needed to do before. But I'll save that for when I actually have an immediate use for it.

Test plan

Added quickcheck tests, fuzz tests (particularly useful for finding out-of-bounds conditions), and a few manually-defined tests.

@cla-bot cla-bot Bot added the cla-signed label Jul 9, 2024
@camdencheek camdencheek changed the title General: add line index Backend: add line index Jul 9, 2024
@camdencheek camdencheek marked this pull request as ready for review July 9, 2024 18:01
@camdencheek camdencheek requested a review from a team July 9, 2024 18:01
Comment on lines +16 to +22
var newlineCount int
switch v := any(content).(type) {
case string:
newlineCount = strings.Count(v, "\n")
case []byte:
newlineCount = bytes.Count(v, []byte("\n"))
}

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type switching is pretty awkward, but I think the single entrypoint to the package is worth it. Willing to be convinced otherwise though. The alternative is probably a NewLineIndex and NewStringLineIndex separation, similar to the regexp package.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to get away with only the last line here since both can be converted to []byte which I believe strings.Count does under the hood anyways: https://go.dev/play/p/M7XJQv6YC7F

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually tried that! But something about the generic seems to break the "don't allocate a new []byte" optimization, so it was allocating on every call when passing in a string (that's why the benchmark has two parts).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! can we add a comment about that? 😇

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment, and also added a test asserting that there is only a single allocation (TIL about testing.AllocsPerRun!)

"lineindex_test.go",
"linereader_test.go",
],
embed = [":byteutils"],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can someone explain to me why this switched from deps to embed? :D

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea 🤷

Comment on lines +16 to +22
var newlineCount int
switch v := any(content).(type) {
case string:
newlineCount = strings.Count(v, "\n")
case []byte:
newlineCount = bytes.Count(v, []byte("\n"))
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should be able to get away with only the last line here since both can be converted to []byte which I believe strings.Count does under the hood anyways: https://go.dev/play/p/M7XJQv6YC7F

Comment thread internal/byteutils/lineindex.go Outdated
//
// line numbers are 0-indexed, and the returned range includes the terminating
// newline (if it exists).
func (l LineIndex) Lines(startLine, endLine int) (int, int) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonder if a better name would be Range since that is what it returns, not the lines itself but no blocker :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to LineRange and LinesRange since LineRange also returns a range 👍

@camdencheek camdencheek enabled auto-merge (squash) July 9, 2024 19:51
@camdencheek camdencheek merged commit 5d8286b into main Jul 9, 2024
@camdencheek camdencheek deleted the cc/line-index branch July 9, 2024 19:59
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants