
Optimised prefix pattern per shard for remote store data and metadata files for higher throughput #12567

@ashking94

Description


Is your feature request related to a problem? Please describe

With the remote store feature, we upload two kinds of files to the remote store - data and metadata - for both the translog and segments. We have #5854 to allow buffering of requests before uploading them every 650ms (default value). This works well in steady state. However, I have run into issues while running performance tests with a single index and a high number of shards.

The current path structure looks like this ->

!__Base path
    !__Index path - <Base_path>/<index_uuid>/
        !__Shard path - <Index_path>/<shard_id>
            !__Segment path - <Shard_path>/segments
                !__files path - <Segment_path>/data
                    |__segments_<N>__<file-gen>
                    |__<N>.si__<file-gen>
                    |__<N>.cfe__<file-gen>
                    |__<N>.cfs__<file-gen>
                !__metadata path - <Segment_path>/metadata
                    |__metadata_path_<file-gen>_version
                !__lock path - <Segment_path>/lock_files
                    |__metadata_path_<file-gen>_version.v2_lock

            !__Translog path - <Shard_path>/translog
                !__data path - <Translog_path>/data
                    |__primary-term
                        |__translog-<gen>.tlog
                        |__translog-<gen>.ckp
                !__metadata path - <Translog_path>/metadata
                    |__metadata_prefix_gen_version

Notice that the physical layout of the data is the same as its logical layout. This structure works within the providers' per-prefix limits on the number of GETs, PUTs, DELETEs, and LISTs. However, those limits become a bottleneck when an index has too many shards, since all of its shards share the same leading prefix.
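To make the bottleneck concrete, here is a minimal sketch (with hypothetical helper names, not OpenSearch APIs) of how the current layout derives object keys. Every object for an index lives under the same `<base_path>/<index_uuid>/` prefix, so all shards of a hot index funnel their requests through one prefix:

```python
# Illustrative sketch of the current (pre-change) key layout.
# Helper names are hypothetical; only the path shape matches the tree above.

def segment_data_path(base_path: str, index_uuid: str, shard_id: int) -> str:
    # <Base_path>/<index_uuid>/<shard_id>/segments/data
    return f"{base_path}/{index_uuid}/{shard_id}/segments/data"

def translog_data_path(base_path: str, index_uuid: str, shard_id: int) -> str:
    # <Base_path>/<index_uuid>/<shard_id>/translog/data
    return f"{base_path}/{index_uuid}/{shard_id}/translog/data"

# All shards of the index resolve to the same leading prefix (the index UUID),
# so they all count against the same per-prefix request-rate limit.
leading_prefixes = {
    segment_data_path("base", "idx-uuid", s).split("/")[1] for s in range(3)
}
assert leading_prefixes == {"idx-uuid"}
```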

Describe the solution you'd like

A prefix pattern that is accepted by multiple repository providers, such as AWS S3 and GCP Cloud Storage. The general recommendation from these providers is to maximise the spread of data across as many prefixes as possible, which allows them to scale better.

So, the proposed prefix pattern is ->

hash(data,index_uuid,shard_id,translog|segment)/<Base-path>/<index-uuid>/<shardid>  → data
hash(md,index_uuid,shard_id,translog|segment)/<Base-path>/<index-uuid>/<shardid>  → metadata

With the above prefix pattern, we ensure that the prefixes are random yet predictable. For each combination of translog-data, translog-metadata, segment-data, and segment-metadata, the path is fixed and remains the same throughout its life.
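The "random yet predictable" property can be sketched as follows. This is an illustrative Python sketch, not the actual OpenSearch implementation: the choice of SHA-256 truncated to 8 base32 characters is an assumption purely for demonstration.

```python
import base64
import hashlib

def hashed_prefix(kind: str, index_uuid: str, shard_id: int, category: str) -> str:
    # Deterministic hash over the fixed inputs (data|md, index UUID, shard id,
    # translog|segment). SHA-256 truncated to 8 base32 chars is an illustrative
    # choice, not the actual algorithm used by OpenSearch.
    key = f"{kind},{index_uuid},{shard_id},{category}".encode()
    return base64.b32encode(hashlib.sha256(key).digest())[:8].decode().lower()

def remote_path(kind: str, base_path: str, index_uuid: str,
                shard_id: int, category: str) -> str:
    # hash(...)/<Base-path>/<index-uuid>/<shardid>, per the proposed pattern.
    return f"{hashed_prefix(kind, index_uuid, shard_id, category)}/{base_path}/{index_uuid}/{shard_id}"

# Predictable: the same inputs always yield the same path for a shard's lifetime.
assert remote_path("data", "base", "uuid-1", 0, "segment") == \
       remote_path("data", "base", "uuid-1", 0, "segment")

# Spread: different shards (and data vs. metadata) land under different leading
# prefixes, so their request rates count against separate per-prefix limits.
prefixes = {hashed_prefix("data", "uuid-1", s, "segment") for s in range(16)}
assert len(prefixes) > 1
```

Because the hash inputs are fixed for a given shard and file category, no lookup table is needed to locate objects later; the prefix can always be recomputed.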

This recommendation is documented by multiple cloud providers:

  1. GCP Storage - https://cloud.google.com/storage/docs/request-rate#ramp-up
  2. AWS S3 - https://repost.aws/knowledge-center/http-5xx-errors-s3

Related component

Storage:Performance

Describe alternatives you've considered

No response

Additional context

No response
