Skip to content

Avoid shards holder locks on snapshot#8072

Merged
generall merged 6 commits intodevfrom
ownable-local-shard
Feb 10, 2026
Merged

Avoid shards holder locks on snapshot#8072
generall merged 6 commits intodevfrom
ownable-local-shard

Conversation

@generall
Copy link
Member

@generall generall commented Feb 8, 2026

Fixes: #7489
Unblocks: #7510

[AI-free]

  • refactor shard.create_snapshot to own everything necessary in the future for snapshotting
  • do not lock shards_holder and replica_set during snapshot
  • separate locks and snapshot creation all the way down

) -> CollectionResult<()> {
let local_read = self.local.read().await;

) -> impl Future<Output = CollectionResult<()>> + use<> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the main change

shard_id: ShardId,
temp_dir: &Path,
) -> CollectionResult<SnapshotDescription> {
) -> CollectionResult<impl Future<Output = CollectionResult<SnapshotDescription>> + use<>> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the second main change

manifest: Option<SnapshotManifest>,
save_wal: bool,
) -> CollectionResult<()> {
) -> CollectionResult<impl Future<Output = CollectionResult<()>> + use<>> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this as well

manifest: Option<SnapshotManifest>,
save_wal: bool,
) -> CollectionResult<()> {
) -> CollectionResult<impl Future<Output = CollectionResult<()>> + use<>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would + 'static work here? IMO, it's less weird/niche than + use<>, I'd prefer to use 'static if it works.

Suggested change
) -> CollectionResult<impl Future<Output = CollectionResult<()>> + use<>> {
) -> CollectionResult<impl Future<Output = CollectionResult<()>> + 'static> {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would argue that use<> is more precise compare to 'static. We declare that future doesn't capture any lifetimes.

Copy link
Contributor

@ffuugoo ffuugoo Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe. Though, I think in this case both are equivalent. But also, why do we care? If we accidentally capture non-static lifetime or smth, we would get a compilation error anyway. IMO, rule of thumb is "use strictly necessary, least restrictive bound".

And also 'static is better recognized than use<>. I'd prefer 'static as long as it works.

Comment on lines +40 to +52
let local_read = self.local.read().await;

let shard_config = ShardConfig::new_replica_set();
shard_config.save_to_tar(tar).await?;
Ok(())
let maybe_local_snapshot_future = if let Some(local) = &*local_read {
Some(
local
.get_snapshot_creator(&temp_path, &tar, format, manifest, save_wal)
.await?,
)
} else {
None
};

drop(local_read);
Copy link
Contributor

@ffuugoo ffuugoo Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙅‍♀️ no explicit drops 🚫

I'd also just call it create_snapshot instead of maybe_local_snapshot_future

Suggested change
let local_read = self.local.read().await;
let shard_config = ShardConfig::new_replica_set();
shard_config.save_to_tar(tar).await?;
Ok(())
let maybe_local_snapshot_future = if let Some(local) = &*local_read {
Some(
local
.get_snapshot_creator(&temp_path, &tar, format, manifest, save_wal)
.await?,
)
} else {
None
};
drop(local_read);
let maybe_local_snapshot_future = if let Some(local) = &*self.local.read().await {
let create_snapshot = local
.get_snapshot_creator(&temp_path, &tar, format, manifest, save_wal)
.await?;
Some(create_snapshot)
} else {
None
};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one is actually not necessary, and it will work the same way without drop. But I prefer to keep the drop to make it explicit.

Copy link
Contributor

@ffuugoo ffuugoo Feb 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if let Some(_) = local.read() { ... } is still simpler. 🤷‍♀️

Comment on lines 43 to 52
pub async fn create_snapshot(
&self,
_temp_path: &Path,
_tar: &tar_ext::BuilderExt,
_format: SnapshotFormat,
_manifest: Option<SnapshotManifest>,
_save_wal: bool,
) -> CollectionResult<()> {
self.dummy()
) -> CollectionError {
self.dummy_error()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simpler code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, I see. Then, maybe, remove create_snapshot at all and just do dummy.dummy_error() in that match? This does not make any sense at all.

Copy link
Contributor

@ffuugoo ffuugoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM, I don't see any major issues. Mostly just code style/structuring.

@qdrant qdrant deleted a comment from coderabbitai bot Feb 10, 2026
@generall generall merged commit a610710 into dev Feb 10, 2026
15 checks passed
@generall generall deleted the ownable-local-shard branch February 10, 2026 17:03
Comment on lines 159 to 163
Shard::Dummy(dummy_shard) => {
dummy_shard
return Err(dummy_shard
.create_snapshot(temp_path, tar, format, manifest, save_wal)
.await
.await);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just do return Err(dummy_shard.dummy_error()) and remove DummyShard::create_snapshot method entirely?

timvisee pushed a commit that referenced this pull request Feb 13, 2026
* refactor shard.create_snapshot to own everything necessary in the future for snapshotting

* do not lock shards_holder and replica_set during snapshot

* separate locks and snapshot creation all the way down

* separate locks and snapshot creation even further on the local shard level

* final touches

* minor review fixes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants