Make update_worker_fn a blocking task, instead of async #7015

Merged
ffuugoo merged 3 commits into dev from update-worker-spawn-blocking
Aug 27, 2025


Contributor

@ffuugoo ffuugoo commented Aug 11, 2025

This PR tweaks UpdateHandler::update_worker_fn to make it a blocking call instead of async.

update_worker_fn is an async fn, even though it's almost entirely a blocking call (e.g., the CollectionUpdater::update call, which makes up 99% of the work that update_worker_fn does, is a blocking call).

I don't expect this to have any immediately noticeable effect, but it might improve perf somewhat or resolve minor/rare issues: we've seen a bunch of weird bugs caused by blocking the async runtime before, and update is a pretty hot path (even if it runs on its own dedicated runtime).
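
As a rough illustration of the idea (not the code from this PR): a worker that does almost nothing but blocking work can sit on its own thread and block on its channel, rather than being an async fn polled by the executor. The sketch below uses only std so that it stands alone. UpdateOp, apply_update, update_worker and spawn_update_worker are hypothetical names, and a plain thread stands in for whatever blocking-task mechanism the real change uses (the branch name suggests Tokio's spawn_blocking, but that is an assumption).

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for an update operation.
#[derive(Debug)]
pub enum UpdateOp {
    Upsert(u64),
    Delete(u64),
}

/// Hypothetical stand-in for the blocking update call:
/// plain synchronous work, no `.await` anywhere.
fn apply_update(state: &mut Vec<u64>, op: UpdateOp) {
    match op {
        UpdateOp::Upsert(id) => {
            if !state.contains(&id) {
                state.push(id);
            }
        }
        UpdateOp::Delete(id) => state.retain(|existing| *existing != id),
    }
}

/// Blocking worker loop: iterating the receiver parks the thread until an
/// operation arrives, and the loop ends once every sender has been dropped.
fn update_worker(receiver: Receiver<UpdateOp>) -> Vec<u64> {
    let mut state = Vec::new();
    for op in receiver {
        apply_update(&mut state, op);
    }
    state
}

/// Runs the worker on its own thread, so no async executor thread is tied up
/// while an update is being applied.
pub fn spawn_update_worker() -> (Sender<UpdateOp>, JoinHandle<Vec<u64>>) {
    let (sender, receiver) = mpsc::channel();
    let handle = thread::spawn(move || update_worker(receiver));
    (sender, handle)
}
```

Callers send operations through the returned Sender and drop it to shut the worker down; joining the handle then yields the final state.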

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you formatted your code locally using cargo +nightly fmt --all command prior to submission?
  3. Have you checked your code using cargo clippy --all --all-features command?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

@ffuugoo ffuugoo force-pushed the update-worker-spawn-blocking branch from 2e6605d to 3c1842e Compare August 11, 2025 13:56
Comment on lines +46 to +71
    tokio::task::block_in_place(|| {
        // Allow only one update at a time, ensure no data races between segments.
        // let _lock = self.update_lock.lock().unwrap();

        let scroll_lock = segments.read().scroll_read_lock.clone();
        let _scroll_lock = scroll_lock.blocking_write();

        let operation_result = match operation {
            CollectionUpdateOperations::PointOperation(point_operation) => {
                process_point_operation(segments, op_num, point_operation, hw_counter)
            }
            CollectionUpdateOperations::VectorOperation(vector_operation) => {
                process_vector_operation(segments, op_num, vector_operation, hw_counter)
            }
            CollectionUpdateOperations::PayloadOperation(payload_operation) => {
                process_payload_operation(segments, op_num, payload_operation, hw_counter)
            }
            CollectionUpdateOperations::FieldIndexOperation(index_operation) => {
                process_field_index_operation(segments, op_num, &index_operation, hw_counter)
            }
        };

        CollectionUpdater::handle_update_result(segments, op_num, &operation_result);

        operation_result
    })
Contributor Author

CollectionUpdater::update is blocking, but we call it from an async context in a few places. It seems reasonable to wrap the whole update in block_in_place.

block_in_place does have a performance hit when called from async, but

  1. we already use block_in_place inside update, so we already "pay" for this perf hit, we just extend block_in_place scope here, which may actually improve perf when called from async
  2. update_worker_fn (which is the most important update path) is a blocking task now, and other places where we still call update from async are less critical (e.g., LocalShard::load_from_wal)
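
For readers without the Qdrant types at hand, the control flow of the snippet under review (take the exclusive guard, dispatch on the operation kind, record the outcome, return the result) can be modeled with std only. This is a sketch under stated assumptions: Operation, Segments, upsert, delete and the failed_ops bookkeeping are hypothetical stand-ins, std::sync::RwLock replaces the async-aware lock, and block_in_place itself is not modeled, so the sketch says nothing about its cost.

```rust
use std::sync::RwLock;

/// Hypothetical stand-in for the update operation enum.
#[derive(Debug)]
pub enum Operation {
    Upsert(u64),
    Delete(u64),
}

/// Hypothetical stand-in for the segment holder.
#[derive(Debug, Default)]
pub struct Segments {
    pub points: Vec<u64>,
    pub failed_ops: Vec<u64>,
}

/// Runs one update start to finish while holding the exclusive guard.
/// Returns the number of points affected, or an error message.
pub fn update(
    segments: &RwLock<Segments>,
    scroll_lock: &RwLock<()>,
    op_num: u64,
    operation: Operation,
) -> Result<usize, String> {
    // Held until the function returns, so readers taking `scroll_lock.read()`
    // wait for both the dispatch and the result handling below.
    let _scroll_guard = scroll_lock.write().expect("scroll lock poisoned");

    let result = match operation {
        Operation::Upsert(id) => upsert(segments, id),
        Operation::Delete(id) => delete(segments, id),
    };

    handle_update_result(segments, op_num, &result);
    result
}

fn upsert(segments: &RwLock<Segments>, id: u64) -> Result<usize, String> {
    let mut guard = segments.write().expect("segments lock poisoned");
    if guard.points.contains(&id) {
        return Ok(0);
    }
    guard.points.push(id);
    Ok(1)
}

fn delete(segments: &RwLock<Segments>, id: u64) -> Result<usize, String> {
    let mut guard = segments.write().expect("segments lock poisoned");
    let before = guard.points.len();
    guard.points.retain(|existing| *existing != id);
    match before - guard.points.len() {
        0 => Err(format!("point {} not found", id)),
        removed => Ok(removed),
    }
}

/// Assumed bookkeeping: remember which operation numbers failed.
fn handle_update_result(segments: &RwLock<Segments>, op_num: u64, result: &Result<usize, String>) {
    if result.is_err() {
        let mut guard = segments.write().expect("segments lock poisoned");
        guard.failed_ops.push(op_num);
    }
}
```

Keeping the guard, the dispatch and the result handling in one scope mirrors how the reviewed snippet places all three inside a single closure.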

@ffuugoo ffuugoo requested review from generall and timvisee August 11, 2025 14:24
@ffuugoo ffuugoo marked this pull request as ready for review August 11, 2025 14:24
Member

@timvisee timvisee left a comment

Can we do a benchmark just to confirm we aren't regressing in any way?

@KShivendu KShivendu self-requested a review August 11, 2025 19:27
@ffuugoo ffuugoo force-pushed the update-worker-spawn-blocking branch from eb37c6c to 05e46fc Compare August 27, 2025 10:13
@ffuugoo ffuugoo force-pushed the update-worker-spawn-blocking branch from 05e46fc to 2af6deb Compare August 27, 2025 13:52
Contributor Author

ffuugoo commented Aug 27, 2025

> Can we do a benchmark just to confirm we aren't regressing in any way?

I've made a few bfb runs, seems about the same? Similar RPS, similar total upload time.

dev: [image: Screenshot 2025-08-27 at 16.27.04]
first PR run: [image: Screenshot 2025-08-27 at 16.33.56]
second PR run: [image: Screenshot 2025-08-27 at 16.38.48]

Member

@timvisee timvisee left a comment

Thanks!

Yeah the bench differences are within margin of error. Nice.

@ffuugoo ffuugoo merged commit 1327604 into dev Aug 27, 2025
16 checks passed
@ffuugoo ffuugoo deleted the update-worker-spawn-blocking branch August 27, 2025 15:26
Contributor Author

ffuugoo commented Aug 27, 2025

> Thanks!
>
> Yeah the bench differences are within margin of error. Nice.

I'll also keep an eye on continuous benchmark in the next few days.
