shards: only trigger rescan on .zoekt files changing #801
Merged
Previously, any write to the index dir triggered a scan. On busy instances this means we are constantly rescanning, which over-represents watch in CPU profiles. The events are normally writes to our temporary files. By only considering events for .zoekt files (which is what scan reads) we avoid the constant scan calls.

As a safety net we also introduce a re-scan every minute in case we miss an event. There is error handling around this, but I thought it more reliable to call scan every once in a while.

Note: this doesn't represent significant CPU use, but it does muddy the CPU profiler output, so this change makes it easier to understand trends in our continuous CPU profiling.

Test Plan: CI
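For illustration, the event filter described above could look roughly like the sketch below. This is not the code from this PR: `shouldRescan` is a hypothetical helper name, the paths in the usage note are made up, and the only assumption taken from the description is that shards read by scan end in `.zoekt`.

```go
package main

import "path/filepath"

// shouldRescan reports whether a file event for path should trigger a rescan.
// Hypothetical helper: it assumes the only files scan cares about are those
// whose extension is ".zoekt", so writes to temporary files are ignored.
func shouldRescan(path string) bool {
	return filepath.Ext(path) == ".zoekt"
}
```

With this check, an event for a hypothetical `/data/index/repo.zoekt` would trigger a scan, while one for `/data/index/repo.zoekt.tmp` would not, since `filepath.Ext` only looks at the final suffix.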
17274a6 to 0a3142e
stefanhengl approved these changes on Aug 2, 2024
ticker := time.NewTicker(time.Minute)
Member
That seems fairly frequent for a fail-safe? Isn't this roughly the same order of magnitude as how often we scanned before this PR? I might be totally off though ;-)
Member
Author
We are always scanning right now on dotcom, i.e. as soon as one scan is done another starts. The scanning doesn't take long (~50ms?) but effectively we do this: for { scan() }. So once a minute seems good?
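To make the contrast with `for { scan() }` concrete, here is a minimal sketch of a loop that scans only when a relevant event arrives or when a fallback ticker fires. It is an assumption-laden illustration, not the code in this PR: `watchLoop`, the `events` channel of paths (standing in for whatever the file watcher delivers after filtering), and the `quit` channel are all hypothetical names.

```go
package main

import "time"

// watchLoop calls scan once per relevant event and once per tick of the
// fallback ticker. Hypothetical sketch: events is assumed to carry only
// paths that already passed the .zoekt filter.
func watchLoop(events <-chan string, interval time.Duration, quit <-chan struct{}, scan func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for {
		select {
		case <-events:
			// A shard changed, so rescan right away.
			scan()
		case <-ticker.C:
			// Fail-safe: rescan periodically in case an event was missed.
			scan()
		case <-quit:
			return
		}
	}
}
```

Called with `time.Minute` as the interval, an otherwise idle index dir would see one scan per minute, rather than the back-to-back scans described above.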