Optimised prefix pattern per shard for remote store data and metadata files for higher throughput #12567
Description
Is your feature request related to a problem? Please describe
With the remote store feature, we upload two kinds of files to the remote store, data and metadata, for both translog and segments. #5854 allows requests to be buffered and uploaded every 650ms (the default value). This works well in steady state. However, I ran into an issue while running a performance test with a single index and a high number of shards.
The current path structure looks like this ->
|__Base path
   |__Index path - <Base_path>/<index_uuid>/
      |__Shard path - <Index_path>/<shard_id>
         |__Segment path - <Shard_path>/segments
            |__files path - <Segment_path>/data
               |__segments_<N>__<file-gen>
               |__<N>.si__<file-gen>
               |__<N>.cfe__<file-gen>
               |__<N>.cfs__<file-gen>
            |__metadata path - <Segment_path>/metadata
               |__metadata_path_<file-gen>_version
            |__lock path - <Segment_path>/lock_files
               |__metadata_path_<file-gen>_version.v2_lock
         |__Translog path - <Shard_path>/translog
            |__data path - <Translog_path>/data
               |__primary-term
               |__translog-<gen>.tlog
               |__translog-<gen>.ckp
            |__metadata path - <Translog_path>/metadata
               |__metadata_prefix_gen_version
Note that the physical layout of the data is the same as its logical layout, so every shard of an index lives under the same <Base_path>/<index_uuid>/ prefix. Repository providers limit the number of GETs, PUTs, DELETEs, and LISTs that a prefix can serve, and these limits become a bottleneck when an index has too many shards.
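To make the shared prefix concrete, here is a minimal sketch that builds the current paths for a few shards and computes what they have in common. The function name `current_path` and the sample values `base` and `idx-uuid` are hypothetical and used only for illustration; this is not the OpenSearch implementation.

```python
import os.path


def current_path(base_path, index_uuid, shard_id, category, kind):
    """Build a path following the layout shown above.

    category: "segments" or "translog"
    kind:     "data" or "metadata"
    All parameter names are illustrative assumptions.
    """
    return f"{base_path}/{index_uuid}/{shard_id}/{category}/{kind}"


# Segment data paths for three shards of the same index.
paths = [
    current_path("base", "idx-uuid", shard, "segments", "data")
    for shard in range(3)
]

# Every shard's path starts with the same base path and index UUID,
# so all requests for the index land under one leading prefix.
common = os.path.commonprefix(paths)
print(common)
```

Whatever the number of shards, the leading prefix stays the same, which is why adding shards does not spread the request load.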
Describe the solution you'd like
A prefix pattern that works across multiple repository providers, such as AWS S3 and GCP Storage. The general recommendation from these providers is to spread data across as many prefixes as possible, which allows them to scale better.
So, the proposed prefix pattern is ->
hash(data,index_uuid,shardid,translog|segment)/<Base-path>/<index-uuid>/<shardid> → data
hash(md,index_uuid,shardid,translog|segment)/<Base-path>/<index-uuid>/<shardid> → metadata
With the above prefix pattern, we ensure that the prefixes are effectively random yet predictable. For each combination of translog-data, translog-metadata, segment-data, and segment-metadata, the path is fixed and remains the same throughout its life.
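A minimal sketch of how such a prefix could be derived, under stated assumptions: the issue does not name a hash function, so SHA-256 truncated to 16 hex characters is an arbitrary choice here, and the function name `hashed_path` and the sample values are hypothetical.

```python
import hashlib


def hashed_path(base_path, index_uuid, shard_id, category, kind):
    """Return a path with a deterministic hashed prefix.

    kind:     "data" or "md" (metadata), as in the proposed pattern
    category: "translog" or "segment"
    The hash function and prefix length are assumptions, not part of the proposal.
    """
    key = "/".join([kind, index_uuid, str(shard_id), category])
    prefix = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
    return f"{prefix}/{base_path}/{index_uuid}/{shard_id}"


# The same inputs always give the same path, so the location is predictable.
first = hashed_path("base", "idx-uuid", 0, "segment", "data")
again = hashed_path("base", "idx-uuid", 0, "segment", "data")

# Changing any input (here, data vs. metadata) gives a different prefix.
metadata = hashed_path("base", "idx-uuid", 0, "segment", "md")

# Different shards of one index no longer share a leading prefix.
shard_paths = {hashed_path("base", "idx-uuid", s, "segment", "data") for s in range(8)}
```

Because the prefix depends only on values that are known for the lifetime of the shard, it can be recomputed at any time without storing a mapping.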
This recommendation is documented by multiple cloud providers:
- GCP Storage - https://cloud.google.com/storage/docs/request-rate#ramp-up
- AWS S3 - https://repost.aws/knowledge-center/http-5xx-errors-s3
Related component
Storage:Performance
Describe alternatives you've considered
No response
Additional context
No response