-
Notifications
You must be signed in to change notification settings - Fork 19
Best effort chunk coalescing implementation #515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e286152 to
558c694
Compare
558c694 to
c480358
Compare
| path.push(cache_key.as_filename()); | ||
|
|
||
| // Should this happen normally? | ||
| // TODO: when does this happen? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess when:
thread 1 evicts the item from cache, notifies the eviction listener (but that doesn't delete the file yet)
thread 2 tries to get the item, cache miss, downloads it, tries to write to the file
I wish there were some kind of moka semantics where the cache entry stays there as long as and only as long as the eviction listener hasn't finished :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agreed with both comments.
| warn!( | ||
| "Re-downloading cache value for {key:?}: {err}" | ||
| ); | ||
| let data = self | ||
| .inner | ||
| .get_range(location, chunk_range.clone()) | ||
| .await?; | ||
| self.cache_chunk_data(key, data.clone()).await; | ||
| Some(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth adding a comment here to clarify the race condition this mitigates against:
thread 1 gets a value from cache, it's File(some-path) (self.cache.get on line 254)
thread 2 runs eviction, deletes the value and the file
thread 1 tries to read some-path, it's gone now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a shame there isn't some get-with Moka functionality that takes in a closure that prevents the value from getting evicted while it's running (dual to the comment above)
This is a naive implementation of #514 (and therefore less complex), where we don't use synchronization to avoid request de-duplucation, but instead try do a best-effort chunk coalescing, and batch them into a single call.
Compared to current main, this implementation seems to be more stable, and doesn't experience the hanging in TPC-H q18 and q21, since it does not use any synchronization primitives.
As for perf, in the (idealized) no latency case it is on par with the current caching object store. However in more realistic scenarios of modest delays (15ms) it consistently outperforms it:
