Currently, a snapshot of a shard that has not changed at all relative to an existing snapshot of the shard (i.e. does not require uploading any files for that shard) still triggers the following operations:
- Write snap-${uuid}.dat blob to the unchanged shard's folder in the repository
- Write new index-N blob to the unchanged shard's folder in the repository
In practice the effect is significant for use cases like indices that roll over per day/hour/etc. A cluster with a small, bounded number of actively written indices/shards and a large, growing number of shards that no longer change will see snapshots become ever slower and more expensive over time, even though the amount of data added by each snapshot is not increasing.
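To make the growth concrete, here is a rough cost model of the current behaviour. It is only a sketch: the function name and the figure of 5 shards per daily index are assumptions made for illustration, and it counts only the two metadata blobs listed above, not segment-file uploads for shards that did change.

```python
def shard_level_writes(total_shards: int) -> int:
    """Metadata blob writes in shard folders for one snapshot (current scheme).

    Every shard, changed or not, gets one snap-${uuid}.dat and one new
    index-N blob, so the count scales with the total number of shards.
    """
    return 2 * total_shards


# Hypothetical daily-rolling setup: each daily index has 5 shards and only
# the newest index is still written to.
SHARDS_PER_INDEX = 5

if __name__ == "__main__":
    for day in (1, 30, 365):
        total_shards = SHARDS_PER_INDEX * day
        print(f"day {day}: {shard_level_writes(total_shards)} shard-level writes")
```

Under these assumptions the number of writes grows linearly with the age of the cluster, while the number of shards that actually changed stays at 5.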
This could be avoided by referencing the content of snap-${uuid}.dat in each shard differently. Instead of creating a blob per snapshot+shard tuple, a given state of a shard could be described by a single blob (what is currently a snap-${uuid}.dat), and that blob could in turn be referenced from the root-level index-N in the repository.
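The proposal can be illustrated with a toy in-memory model. This is not the actual repository implementation: the `Repository` class, its fields, and the shape of the root-level index content are all hypothetical. Only the blob names snap-${uuid}.dat and index-N come from the description above.

```python
import uuid


class Repository:
    """Toy in-memory repository; layout and field names are assumptions."""

    def __init__(self):
        self.blobs = {}        # blob path -> content
        self.writes = 0        # number of blob writes performed
        self.generation = -1   # N of the latest root-level index-N
        self.shard_state = {}  # shard id -> uuid of the blob describing its state

    def _write(self, path, content):
        self.blobs[path] = content
        self.writes += 1

    def snapshot(self, name, shards):
        """Take a snapshot; `shards` maps shard id -> tuple of file names."""
        for shard, files in shards.items():
            current = self.shard_state.get(shard)
            if current is not None and self.blobs[f"{shard}/snap-{current}.dat"] == files:
                continue  # unchanged shard: reuse its state blob, write nothing
            state_id = uuid.uuid4().hex
            self._write(f"{shard}/snap-{state_id}.dat", files)
            self.shard_state[shard] = state_id

        # The root-level index-N records which state blob each shard uses
        # in each snapshot, replacing the per-snapshot blobs in shard folders.
        previous = self.blobs.get(f"index-{self.generation}", {})
        self.generation += 1
        entry = {shard: self.shard_state[shard] for shard in shards}
        self._write(f"index-{self.generation}", {**previous, name: entry})
```

In this sketch a snapshot writes one root-level blob plus one blob per changed shard, so its cost no longer depends on the number of unchanged shards.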