
Optimised prefix pattern per shard for remote store data and metadata files for higher throughput #12567

@ashking94

Description


Is your feature request related to a problem? Please describe

With the remote store feature, we upload two kinds of files to the remote store - data and metadata - for both the translog and segments. We have #5854 to allow buffering of requests before uploading them every 650ms (default value). This works well in steady state. However, I have run into issues while running performance tests with a single index and a high number of shards.

The current path structure looks like this ->

!__Base path
    !__Index path - <Base_path>/<index_uuid>/
        !__Shard path - <Index_path>/<shard_id>
            !__Segment path - <Shard_path>/segments
                !__files path - <Segment_path>/data
                    |__segments_<N>__<file-gen>
                    |__<N>.si__<file-gen>
                    |__<N>.cfe__<file-gen>
                    |__<N>.cfs__<file-gen>
                !__metadata path - <Segment_path>/metadata
                    |__metadata_path_<file-gen>_version
                !__lock path - <Segment_path>/lock_files
                    |__metadata_path_<file-gen>_version.v2_lock

            !__Translog path - <Shard_path>/translog
                !__data path - <Translog_path>/data
                    |__primary-term
                        |__translog-<gen>.tlog
                        |__translog-<gen>.ckp
                !__metadata path - <Translog_path>/metadata
                    |__metadata_prefix_gen_version

Notice that the physical layout of the data is the same as its logical layout. This structure works within the providers' per-prefix limits on the number of GETs, PUTs, DELETEs, and LISTs. However, those limits become a bottleneck when an index has too many shards, since all of its shards share the same leading prefix.
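To make the bottleneck concrete, here is a minimal sketch (with hypothetical helper names, not OpenSearch APIs) of how the current layout derives object keys. Every object for an index lives under the same `<base_path>/<index_uuid>/` prefix, so all shards of a hot index funnel their requests through one prefix:

```python
# Illustrative sketch of the current (pre-change) key layout.
# Helper names are hypothetical; only the path shape matches the tree above.

def segment_data_path(base_path: str, index_uuid: str, shard_id: int) -> str:
    # <Base_path>/<index_uuid>/<shard_id>/segments/data
    return f"{base_path}/{index_uuid}/{shard_id}/segments/data"

def translog_data_path(base_path: str, index_uuid: str, shard_id: int) -> str:
    # <Base_path>/<index_uuid>/<shard_id>/translog/data
    return f"{base_path}/{index_uuid}/{shard_id}/translog/data"

# All shards of the index resolve to the same leading prefix (the index UUID),
# so they all count against the same per-prefix request-rate limit.
leading_prefixes = {
    segment_data_path("base", "idx-uuid", s).split("/")[1] for s in range(3)
}
assert leading_prefixes == {"idx-uuid"}
```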

Describe the solution you'd like

A prefix pattern that is accepted by multiple repository providers, such as AWS S3 and GCP Cloud Storage. The general recommendation from these providers is to maximise the spread of data across as many prefixes as possible, which allows them to scale better.

So, the proposed prefix pattern is ->

hash(data,index_uuid,shard_id,translog|segment)/<Base-path>/<index-uuid>/<shardid>  → data
hash(md,index_uuid,shard_id,translog|segment)/<Base-path>/<index-uuid>/<shardid>  → metadata

With the above prefix pattern, we ensure that the prefixes are random yet predictable. For each combination of translog-data, translog-metadata, segment-data, and segment-metadata, the path is fixed and remains the same throughout its life.
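The "random yet predictable" property can be sketched as follows. This is an illustrative Python sketch, not the actual OpenSearch implementation: the choice of SHA-256 truncated to 8 base32 characters is an assumption purely for demonstration.

```python
import base64
import hashlib

def hashed_prefix(kind: str, index_uuid: str, shard_id: int, category: str) -> str:
    # Deterministic hash over the fixed inputs (data|md, index UUID, shard id,
    # translog|segment). SHA-256 truncated to 8 base32 chars is an illustrative
    # choice, not the actual algorithm used by OpenSearch.
    key = f"{kind},{index_uuid},{shard_id},{category}".encode()
    return base64.b32encode(hashlib.sha256(key).digest())[:8].decode().lower()

def remote_path(kind: str, base_path: str, index_uuid: str,
                shard_id: int, category: str) -> str:
    # hash(...)/<Base-path>/<index-uuid>/<shardid>, per the proposed pattern.
    return f"{hashed_prefix(kind, index_uuid, shard_id, category)}/{base_path}/{index_uuid}/{shard_id}"

# Predictable: the same inputs always yield the same path for a shard's lifetime.
assert remote_path("data", "base", "uuid-1", 0, "segment") == \
       remote_path("data", "base", "uuid-1", 0, "segment")

# Spread: different shards (and data vs. metadata) land under different leading
# prefixes, so their request rates count against separate per-prefix limits.
prefixes = {hashed_prefix("data", "uuid-1", s, "segment") for s in range(16)}
assert len(prefixes) > 1
```

Because the hash inputs are fixed for a given shard and file category, no lookup table is needed to locate objects later; the prefix can always be recomputed.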

This recommendation is documented by multiple cloud providers:

  1. GCP Storage - https://cloud.google.com/storage/docs/request-rate#ramp-up
  2. AWS S3 - https://repost.aws/knowledge-center/http-5xx-errors-s3

Related component

Storage:Performance

Describe alternatives you've considered

No response

Additional context

No response
