[Feature Proposal] Enhancement of Repository Plugin

The purpose of this issue is to gather community feedback on a proposal to enhance the repository plugin.

Today, the OpenSearch repository plugin provides stream-based transfer capabilities to a remote store. It allows users to store index data in off-cluster external repositories such as Amazon S3, Google Cloud Storage, or a shared filesystem, in addition to the local disk of the OpenSearch cluster. With the repository plugin, OpenSearch users can use the Snapshot feature to back up and restore data, which protects against data loss, enables disaster recovery, and lets them create replicas of their data for testing and development. With [remote-backed storage](https://opensearch.org/docs/latest/tuning-your-cluster/availability-and-recovery/remote/), users can now also protect against data loss by automatically creating continuous backups of all index transactions and sending them to remote storage. This gives OpenSearch users request-level durability.


# Problem Statement

Today, the OpenSearch repository plugin provides interfaces such as [writeBlob](https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/common/blobstore/BlobContainer.java#L125) that transfer a file through a single InputStream. The file referenced by the InputStream must therefore be processed serially: the underlying plugin transfers the file's buffered content one buffer at a time, and it reads and transfers the next buffer only after the previous one has been processed successfully. As a result, multiple parts of a file cannot be processed in parallel, so use cases such as downloading or uploading several parts of a file concurrently cannot be supported.

The S3 repository plugin, for instance, supports [multi-part upload](https://github.com/opensearch-project/OpenSearch/blob/main/plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java#L140-L150), but because of the single-InputStream restriction of the base plugin, [each part is uploaded serially](https://github.com/opensearch-project/OpenSearch/blob/main/plugins/repository-s3/src/main/java/org/opensearch/repositories/s3/S3BlobContainer.java#L424-L444), even though S3 supports parallel upload of the individual parts of a file.
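
To make the contrast with serial transfer concrete, here is a minimal Java sketch of what parallel part transfer could look like once a file is no longer tied to a single InputStream. `PartPlanner`, `Part`, `plan`, and `transferInParallel` are hypothetical names chosen for illustration, not OpenSearch or S3 SDK APIs, and the "upload" is simulated by an in-memory array copy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical illustration; not part of the OpenSearch code base. */
final class PartPlanner {

    /** Byte range of one part: [offset, offset + length). */
    static final class Part {
        final int number;
        final long offset;
        final long length;

        Part(int number, long offset, long length) {
            this.number = number;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Splits a blob of blobSize bytes into parts of at most partSize bytes. */
    static List<Part> plan(long blobSize, long partSize) {
        if (blobSize < 0 || partSize <= 0) {
            throw new IllegalArgumentException("blobSize must be >= 0 and partSize > 0");
        }
        List<Part> parts = new ArrayList<>();
        int number = 1;
        for (long offset = 0; offset < blobSize; offset += partSize) {
            parts.add(new Part(number++, offset, Math.min(partSize, blobSize - offset)));
        }
        return parts;
    }

    /** Copies every part concurrently; the array copy stands in for a remote part upload. */
    static byte[] transferInParallel(byte[] source, long partSize, int threads) {
        byte[] target = new byte[source.length];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> futures = new ArrayList<>();
            for (Part part : plan(source.length, partSize)) {
                // Parts cover disjoint ranges, so they can be written independently.
                futures.add(CompletableFuture.runAsync(
                    () -> System.arraycopy(source, (int) part.offset, target, (int) part.offset, (int) part.length),
                    pool));
            }
            // Wait for all parts, analogous to completing a multi-part upload.
            CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
        return target;
    }
}
```

Because each part is defined only by an offset and a length, no part has to wait for the previous one to finish, which is exactly what a single shared InputStream prevents.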


# Proposed Solution

We propose enhancing the existing repository plugin to support multiple stream suppliers, so that underlying vendor plugins can optionally provide multi-stream implementations for remote transfers. Providing multiple streams, rather than abstracting the transfer as a file-based operation, keeps control in core OpenSearch code: it can pre-process buffered content with multiple stream wrappers after the content is read and before the transfer takes place. Using stream suppliers instead of concrete streams further allows stream creation to be deferred until the remote transfer starts. Some of the abstractions we propose to provide the required support are:

1. A check for whether blob upload is supported.
2. A blob upload that takes an upload context, which can consist of an ordered collection of stream suppliers along with metadata for each stream, such as length and headers.
3. A check for whether blob download is supported.
4. A blob download that takes a download context, which can consist of an ordered collection of stream appliers to be applied on top of the SDK input stream before data is persisted to disk. Each applier can carry the metadata it needs in order to be applied.
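
One way the four abstractions above could be shaped is sketched below in Java. Every type and method name here (`StreamSupplier`, `UploadContext`, `DownloadContext`, `MultiStreamBlobContainer`, `InMemoryBlobContainer`) is a hypothetical placeholder, not an existing or agreed OpenSearch API. Per-stream metadata is reduced to a length, appliers are modeled as plain `UnaryOperator<InputStream>` without their own metadata, and the in-memory container exists only to exercise the interfaces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// All names below are hypothetical; none of these types exist in OpenSearch today.

/** Defers stream creation until the transfer of a part actually starts. */
interface StreamSupplier {
    InputStream open() throws IOException;

    long length();

    /** Convenience factory for in-memory content. */
    static StreamSupplier ofBytes(byte[] data) {
        return new StreamSupplier() {
            @Override public InputStream open() { return new ByteArrayInputStream(data); }
            @Override public long length() { return data.length; }
        };
    }
}

/** Ordered stream suppliers that together make up one blob. */
final class UploadContext {
    final String blobName;
    final List<StreamSupplier> parts;

    UploadContext(String blobName, List<StreamSupplier> parts) {
        this.blobName = blobName;
        this.parts = List.copyOf(parts);
    }
}

/** Ordered appliers wrapped around the SDK input stream before data is persisted. */
final class DownloadContext {
    final String blobName;
    final List<UnaryOperator<InputStream>> appliers;

    DownloadContext(String blobName, List<UnaryOperator<InputStream>> appliers) {
        this.blobName = blobName;
        this.appliers = List.copyOf(appliers);
    }
}

/** Optional capabilities; a vendor plugin overrides only what it supports. */
interface MultiStreamBlobContainer {
    default boolean isUploadBlobSupported() { return false; }

    default void uploadBlob(UploadContext context) throws IOException {
        throw new UnsupportedOperationException("multi-stream upload is not supported");
    }

    default boolean isDownloadBlobSupported() { return false; }

    default byte[] downloadBlob(DownloadContext context) throws IOException {
        throw new UnsupportedOperationException("multi-stream download is not supported");
    }
}

/** In-memory stand-in for a vendor plugin. */
final class InMemoryBlobContainer implements MultiStreamBlobContainer {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    @Override public boolean isUploadBlobSupported() { return true; }

    @Override public boolean isDownloadBlobSupported() { return true; }

    @Override
    public void uploadBlob(UploadContext context) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A real plugin could transfer parts concurrently; this stand-in reads them in order.
        for (StreamSupplier part : context.parts) {
            try (InputStream in = part.open()) {
                in.transferTo(out);
            }
        }
        blobs.put(context.blobName, out.toByteArray());
    }

    @Override
    public byte[] downloadBlob(DownloadContext context) throws IOException {
        byte[] stored = blobs.get(context.blobName);
        if (stored == null) {
            throw new IOException("blob not found: " + context.blobName);
        }
        InputStream in = new ByteArrayInputStream(stored);
        // Apply each wrapper in the order given by the context.
        for (UnaryOperator<InputStream> applier : context.appliers) {
            in = applier.apply(in);
        }
        try (InputStream wrapped = in) {
            return wrapped.readAllBytes();
        }
    }
}
```

In this sketch the default methods report "unsupported", so core code could check the capability first and fall back to the existing single-stream path for plugins that do not opt in.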

Credits - @vikasvb90, @ashking94 
