Skip to content

Fail early when encountering out-of-storage during optimization#4578

Merged
generall merged 6 commits intoqdrant:test/low-disk-indexingfrom
xhjkl:ood
Jun 29, 2024
Merged

Fail early when encountering out-of-storage during optimization#4578
generall merged 6 commits intoqdrant:test/low-disk-indexingfrom
xhjkl:ood

Conversation

@xhjkl
Copy link
Copy Markdown
Contributor

@xhjkl xhjkl commented Jun 27, 2024

/claim #4297

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

What this does is checks if there is twice as much free space on the volume as what the collection being optimized already takes.

I'm not happy with this, and I'm not sure this submission worth the hassle as it is, but I'd like to compare opinions.

Why it's bad

  • The free space checking works synchronously; even though the optimizations themselves take up background threads, I feel uneasy about it.
  • The checking happens in the moment the optimization is about to start, not well beforehand.
  • We still might run into OOD down the line because FS is non-atomic.
  • It still logs the error, perhaps we want to special-case it as we do with CollectionError::Cancelled.

What else have I tried

  • Capture write error from db.put_cf_opt; it first happens somewhere around lib/segment/src/segment_constructor/segment_builder.rs:197, but I could not find a way to cleanly rollback a partially done optimization.
  • Utilize the existing DiskUsageWatcher to do the checking before the optimization starts, but it does not have a dynamic threshold, so far as I understand it, and overall it gets a larger change that doing how it's done now.
  • Add a method check_environment to the trait SegmentOptimizer in the same spirit as the existing SegmentOptimizer::check_condition so that all the implementors could tailor it to their taste, but for now this approach adds no value.

@algora-pbc
Copy link
Copy Markdown

algora-pbc bot commented Jun 27, 2024

💵 To receive payouts, sign up on Algora, link your Github account and connect with Stripe.

@xhjkl xhjkl changed the base branch from dev to test/low-disk-indexing June 28, 2024 17:07
@xhjkl
Copy link
Copy Markdown
Contributor Author

xhjkl commented Jun 28, 2024

@generall, @timvisee, I've rebased my changes upon the branch with the test — the test seems to pass now.
Also, I dropped all the styling changes so that the diff is minimal now; hope this helps.

@generall generall self-requested a review June 29, 2024 10:52
@generall
Copy link
Copy Markdown
Member

The free space checking works synchronously; even though the optimizations themselves take up background threads, I feel uneasy about it.

I don't see problem with it. Async FS is fake anyway.

We still might run into OOD down the line because FS is non-atomic.

There are no guarantees about it indeed, some external process can occupy the disk and we can do nothing about it

@xhjkl
Copy link
Copy Markdown
Contributor Author

xhjkl commented Jun 29, 2024

I don't see problem with it. Async FS is fake anyway.

Understood. Thank you for commenting on that 🙏

There are no guarantees about it indeed, some external process can occupy the disk and we can do nothing about it

Yeah, that's my thoughts precisely: that's why I made the needed space as two times the space already taken — just to be doubly sure there's no jitter.

@generall generall merged commit 1d0080f into qdrant:test/low-disk-indexing Jun 29, 2024
@xhjkl xhjkl deleted the ood branch June 29, 2024 11:12
generall added a commit that referenced this pull request Jun 29, 2024
* Add test for OOD during indexing

* Start indexing right after 1st OOD message

* Only send search request after insert loop

* Fail early when encountering out-of-storage during optimization (#4578)

* fs4@0.8.4

* fail early on low storage

* move `dir_size` to `common`

* move the ood bailout to `SegmentOptimizer::optimized_segment_builder`

* drop dead code

* move dir_size to common subcrate

---------

Co-authored-by: generall <andrey@vasnetsov.com>

---------

Co-authored-by: xhjkl <xhjkl@users.noreply.github.com>
Co-authored-by: generall <andrey@vasnetsov.com>
generall added a commit that referenced this pull request Jun 29, 2024
* Add test for OOD during indexing

* Start indexing right after 1st OOD message

* Only send search request after insert loop

* Fail early when encountering out-of-storage during optimization (#4578)

* fs4@0.8.4

* fail early on low storage

* move `dir_size` to `common`

* move the ood bailout to `SegmentOptimizer::optimized_segment_builder`

* drop dead code

* move dir_size to common subcrate

---------

Co-authored-by: generall <andrey@vasnetsov.com>

---------

Co-authored-by: xhjkl <xhjkl@users.noreply.github.com>
Co-authored-by: generall <andrey@vasnetsov.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants