Add RemoteDirectory instance to Store as a secondary directory#3285
Conversation
There was a problem hiding this comment.
2 extra constructor methods due to secondaryDirectory. Should we use a builder instead?
There was a problem hiding this comment.
Should we use a builder instead?
Yes, and the builder should validate the arguments and throw a RuntimeException if the REMOTE_STORE feature flag is not set.
There was a problem hiding this comment.
In the latest commit, I have removed extra parameter to Store. So, this is not required anymore.
|
Fixing |
399f1df to
bcfd328
Compare
|
❌ Gradle Check failure 399f1dff56b54dc3010b29e46b523e29c478ed9d |
| Directory remoteDirectory = null; | ||
| if (this.indexSettings.isRemoteStoreEnabled()) { | ||
| try { | ||
| Repository repository = repositoriesService.repository("dragon-stone"); |
There was a problem hiding this comment.
This repository name is hardcoded for now. It would be made configurable as a part of #3286
|
Please refer this comment to understand why we went with having |
| if (secondaryDirectory != null) { | ||
| ByteSizeCachingDirectory sizeCachingSecondaryDir = new ByteSizeCachingDirectory(secondaryDirectory, refreshInterval); | ||
| this.secondaryDirectory = new StoreDirectory(sizeCachingSecondaryDir, Loggers.getLogger("index.store.deletes", shardId)); | ||
| } else { | ||
| this.secondaryDirectory = null; | ||
| } |
There was a problem hiding this comment.
I think we should we have a Directory factory instead instead of always having two directories always. Primary and secondary directory sounds confusing as well
There was a problem hiding this comment.
What is your view on having a separate store instance with RemoteDirectory?
There was a problem hiding this comment.
The presence of branching logic with primary / secondary is confusing, please consider subclassing the Store (fe CachedStore) to have a clean separation between regular Store and Store backed by a cache / secondary.
There was a problem hiding this comment.
As I think more on the comments from @Bukhtawar and @eirsep, we may need to have a CompositeDirectory which will have 2 directories (we can create compositions for multiple directories if required). CompositeDirectory will be abstract and its implementations can define how to handle operations on the encapsulated directories. Implementation can be parallel or sequential.
There was a problem hiding this comment.
Going forward I do feel a need to have an abstraction at a Store level, espl with #2900 where we might extend search functionality directly on a remote store. Then we have to perform StoreRecovery operations when the shard data is lost, integrity, checksums which is what the store abstraction provides. How about a CompositeStore with each Store based out of their corresponding Directory implementation and IndexShard interfacing with this CompositeStore
There was a problem hiding this comment.
As I think more, going either with CompositeStore or CompositeDirectory would be too early. I was able to get a quick implementation out with CompositeDirectory but it would require most of the RemoteDirectory methods to be implemented before we can actually start using it. Using these abstractions will not avoid having a reconciliation mechanism between local and remote directory. I made some changes in the latest commit and Store is not changed in the latest change. Please take a look.
| remoteDirectory = remoteDirectoryFactory.newDirectory(this.indexSettings, path, repository); | ||
| } catch (RepositoryMissingException e) { | ||
| throw new IllegalArgumentException( | ||
| "Repository should be created before creating index with remote_store enabled setting", |
There was a problem hiding this comment.
Is repository created by user or by system?
There was a problem hiding this comment.
As of now, it is created by user. Once we make the name configurable as a part of #3286, we can check if the repository can be created during bootstrap.
There was a problem hiding this comment.
So, you're saying as of now it will have to be created by user but he can only name it as "dragon-stone"? That doesn't seem correct right?
|
|
||
| private final AtomicBoolean isClosed = new AtomicBoolean(false); | ||
| private final StoreDirectory directory; | ||
| private final StoreDirectory secondaryDirectory; |
There was a problem hiding this comment.
can store not have to worry about how many directories are there and just have one StoreDirectory which is extended by a Remote+Local store Directory wrapper?
There was a problem hiding this comment.
Is secondaryDirectory just a cache? It looks like it, so why not to call it as such?
There was a problem hiding this comment.
Please check the latest changes. I have removed reference to secondaryDirectory from the Store.
There was a problem hiding this comment.
This needs to incorporate the REMOTE_STORE feature flag and throw a RuntimeException if the system is trying to bootstrap the remote directories with the feature flag disabled. Bypassing the feature flag here is trappy.
There was a problem hiding this comment.
Should we use a builder instead?
Yes, and the builder should validate the arguments and throw a RuntimeException if the REMOTE_STORE feature flag is not set.
| import java.util.Collections; | ||
|
|
||
| import static org.mockito.ArgumentMatchers.any; | ||
| import static org.mockito.Mockito.*; |
There was a problem hiding this comment.
Replaced with individual imports. I will check if we can add this check to spotless
ce5f6cd to
d13a660
Compare
Yes, while creating instance of RemoteDirectory, I am checking if remote_store setting is enabled or not. As the setting itself is not active if feature flag is not set, we don't need explicit feature flag check, right? |
…hListener Signed-off-by: Sachin Kale <kalsac@amazon.com>
d13a660 to
207941d
Compare
|
|
||
| @Override | ||
| public void afterRefresh(boolean didRefresh) throws IOException { | ||
| // ToDo Add implementation |
There was a problem hiding this comment.
Actual implementation will be added in the next commit
|
✅ Gradle Check success d13a660cf4f666d36f222c3b54f2b2f00f5b953b |
|
|
|
start gradle check |
| public class RemoteDirectoryFactory implements IndexStorePlugin.RemoteDirectoryFactory { | ||
|
|
||
| @Override | ||
| public Directory newDirectory(IndexSettings indexSettings, ShardPath path, Repository repository) throws IOException { | ||
| assert repository instanceof BlobStoreRepository : "repository should be instance of BlobStoreRepository"; | ||
| BlobPath blobPath = new BlobPath(); | ||
| blobPath = blobPath.add(indexSettings.getIndex().getName()).add(String.valueOf(path.getShardId().getId())); | ||
| BlobContainer blobContainer = ((BlobStoreRepository) repository).blobStore().blobContainer(blobPath); | ||
| return new RemoteDirectory(blobContainer); | ||
| } | ||
| } |
There was a problem hiding this comment.
Is the factory supposed to vend out another variation based on the inputs? If not do we need a factory?
There was a problem hiding this comment.
Yes, there will be a directory instance per shard as path will be different per directory.
There was a problem hiding this comment.
Wondering if that qualifies to be a factory f.e look at FsDirectoryFactory which creates different a HybridDirectory or NIOFSDirectory based on certain attributes
There was a problem hiding this comment.
Got it. You mean, if it is only one variant each time, we do not need a factory, right?
I don't have very strong inclination towards having a factory. It is mostly to keep things similar to what we have with FS based factories. It hides the complexity of understanding all the metadata and creating BlobContainer. Without factory, the code would be in the constructor of RemoteDirectory.
There was a problem hiding this comment.
Can we call it a Builder instead?
| final List<ReferenceManager.RefreshListener> internalRefreshListener; | ||
| if (remoteStoreRefreshListener != null && shardRouting.primary()) { | ||
| internalRefreshListener = Arrays.asList(new RefreshMetricUpdater(refreshMetric), remoteStoreRefreshListener); | ||
| } else { | ||
| internalRefreshListener = Collections.singletonList(new RefreshMetricUpdater(refreshMetric)); | ||
| } |
There was a problem hiding this comment.
Lets push this down to IndexShard and pass a NoopRemoteStoreRefreshListener as needed?
There was a problem hiding this comment.
I did not get this. You mean push it down to EngineConfig?
There was a problem hiding this comment.
Sorry I meant IndexServices that create IndexShard
There was a problem hiding this comment.
is this to avoid null check? As RemoteStoreRefreshListener is injected into IndexShard, tests can have the NoopRemoteStoreRefreshListener if does not want to actually upload the data.
There was a problem hiding this comment.
I am fine with what we have now, but if the checks grow at multiple place we might want to give this a revisit
There was a problem hiding this comment.
This is identical to the listener for segrep - here. What about building a list of additional listeners in IndexService and pass directly to IndexShard?
There was a problem hiding this comment.
@Bukhtawar Agree, adding a task for the same.
@mch2 Yes, once we merge with SegRep code, we can abstract it out.
4408e6b
into
opensearch-project:feature/durability-enhancements
…hListener (opensearch-project#3285) Co-authored-by: Sachin Kale <kalsac@amazon.com> Signed-off-by: Sachin Kale <kalsac@amazon.com>
Signed-off-by: Sachin Kale kalsac@amazon.com
Description
primaryDirectoryandsecondaryDirectory. The idea is to encapsulate RemoteDirectory in Store instead of creating a separate instance of Store, remoteStore.Issues Resolved
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.