Shadow Replica indexes do not delete properly #17695
Description
This issue documents deletion problems with shadow replica indices that were found while working on #17265. A separate PR, #17638, which improves the naming of methods in the IndicesService, also contains new tests and assertions added to existing tests that reveal the issues below. These must be enabled as part of any PR that fixes the issues.
No. 1
The index file deletion logic that is triggered in IndicesService#deleteIndexStore(String reason, Index index, IndexSettings indexSettings) checks, before deleting files, that the index is either not a shadow replica index or, if it is, that it has already been closed (so that no other nodes are holding resources to it). This check is too strict: if a shadow replica index is deleted without having been closed first, the index folder itself is not deleted and remains on the file system as an empty folder. So one of the issues that needs fixing is to ensure index directories are deleted even on shadow replica index deletes. The following tests have commented-out assertions to test this behavior once fixed:
IndexWithShadowReplicaIT#testIndexWithShadowReplicasCleansUp
IndexWithShadowReplicaIT#testShadowReplicaNaturalRelocation
Note that shared shard data is cleaned up properly in a shadow replica index that is not closed, as the shard data is deleted by the StoreCloseListener. This is verified in the tests with the assertPathHasBeenCleared assert.
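To make the condition in No. 1 concrete, here is a minimal sketch of the check as described above. The class and method names are hypothetical and are not the actual IndicesService code. It only models the decision of whether the index folder is removed.

```java
// Hypothetical model of the check described in No. 1; not the real Elasticsearch code.
final class IndexDeletionCheck {

    /**
     * Returns true if the index folder would be deleted under the strict check:
     * either the index is not a shadow replica index, or it is one that was
     * closed before the delete.
     */
    static boolean canDeleteIndexFolder(boolean isShadowReplica, boolean wasClosed) {
        return !isShadowReplica || wasClosed;
    }
}
```

Under this model, the only failing case is a shadow replica index that was never closed (`canDeleteIndexFolder(true, false)` is false), which is the case that leaves the empty folder behind.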
No. 2
The issue with deleting a shadow replica index that was previously closed is that all of the index and shard data are potentially deleted simultaneously by each node that receives the delete operation and invokes NodeEnvironment#deleteIndexDirectorySafe. This can lead to race conditions where a node tries to delete a file that was already deleted by another node, since both are walking the file system at the same time (using Lucene's IOUtils.rm). The failure is logged as a warning in IndicesService#deleteIndexStore(String reason, Index index, IndexSettings indexSettings), and the deletion is put on the pending queue.
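One possible way to tolerate this kind of race, shown here only as a sketch using plain java.nio.file rather than Lucene's IOUtils.rm or any Elasticsearch code, is a recursive delete that treats "already gone" as success. The class name TolerantDelete and the method deleteTree are hypothetical, and this is not necessarily the fix that should be applied here.

```java
import java.io.IOException;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical helper: a recursive delete that does not fail when another
// deleter removes entries while this walk is in progress.
final class TolerantDelete {

    static void deleteTree(Path root) throws IOException {
        if (Files.notExists(root)) {
            return; // another deleter already removed everything
        }
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                // deleteIfExists does not throw if the file vanished in the meantime
                Files.deleteIfExists(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
                // An entry that disappeared before it could be visited is not an error here
                if (exc instanceof NoSuchFileException) {
                    return FileVisitResult.CONTINUE;
                }
                throw exc;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null && !(exc instanceof NoSuchFileException)) {
                    throw exc;
                }
                Files.deleteIfExists(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }
}
```

Calling deleteTree twice on the same path, as two nodes would on a shared file system, completes without an exception on the second call. The sketch does not handle every interleaving: for example, a directory that another process repopulates mid-walk would still raise DirectoryNotEmptyException.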