Check latest repoData before applying workaround for missing shardGen file#112979
Check latest repoData before applying workaround for missing shardGen file#112979elasticsearchmachine merged 3 commits intoelastic:mainfrom
Conversation
It is expected that the old master may attempt to read a shardGen file that is deleted by the new master. This PR suppresses the assertion. Resolves: elastic#112811
|
Pinging @elastic/es-distributed (Team:Distributed) |
DaveCTurner
left a comment
There was a problem hiding this comment.
Hmm a master failover is an "expected" situation, we can't have a by-default tripping assertion here.
Could we enhance the check to verify that the root blob is unchanged before running the workaround?
| final long latestGeneration = latestIndexBlobId(); | ||
| final RepositoryData currentRepositoryData = getRepositoryData(latestGeneration); | ||
| if (currentRepositoryData.shardGenerations().hasShardGen(new RepositoryShardId(indexId, shardId)) == false) { | ||
| // Master has concurrently mutated the shard generation. This can happen when master fails over | ||
| // which is "expected". We do not need the following workaround in this case. | ||
| throw noSuchFileException; | ||
| } |
There was a problem hiding this comment.
I wanted to simply compare the last known repo generation against the latest generation by listing root blobs. But it seems last known repo generation is not easily available especially when this runs in the context of shard snapshotting. Since this is a rare case, it feels ok to just read the repo data directly and check whether the shard gen still exists. It's slower but we are on a slow path already more or less. Hopefully that works for you.
That said, BlobStoreRepositoy class has many method related to fetching and caching repository data. Maybe one (some) of them can be reliably leveraged. Please let me know. Thank you!
|
@DaveCTurner Thanks for the feedback! I have updated the PR along the line of your suggestion. It's good for another look. |
DaveCTurner
left a comment
There was a problem hiding this comment.
Makes sense to me, thanks Yang. Couple of small comments.
| if (currentRepositoryData.shardGenerations().hasShardGen(new RepositoryShardId(indexId, shardId)) == false) { | ||
| // Master has concurrently mutated the shard generation. This can happen when master fails over |
There was a problem hiding this comment.
I think we're only checking whether the shard has been totally removed here? If the shard still exists then hasShardGen() should return true but getShardGen() will return a different generation from the one we were trying to load.
There was a problem hiding this comment.
You are right. I used the wrong method. Got confused by its name. It's now obvious that it does not take a shardGen parameter 🤦
Pushed fe1cf86
Thanks for spotting it.
| final long latestGeneration = latestIndexBlobId(); | ||
| final RepositoryData currentRepositoryData = getRepositoryData(latestGeneration); |
There was a problem hiding this comment.
These lines can also fail with an exception - if they do, I'd rather we suppressed the exception they throw and went back to throwing the original noSuchFileException.
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
… file (elastic#112979) It is expected that the old master may attempt to read a shardGen file that is deleted by the new master. This PR checks the latest repo data before applying the workaround (or throwing AssertionError) for missing shardGen files. Relates: elastic#112337 Resolves: elastic#112811 (cherry picked from commit 99b5ed8) # Conflicts: # muted-tests.yml
… file (#112979) (#113155) It is expected that the old master may attempt to read a shardGen file that is deleted by the new master. This PR checks the latest repo data before applying the workaround (or throwing AssertionError) for missing shardGen files. Relates: #112337 Resolves: #112811 (cherry picked from commit 99b5ed8) # Conflicts: # muted-tests.yml
It is expected that the old master may attempt to read a shardGen file that is deleted by the new master. This PR checks the latest repo data before applying the workaround (or throwing AssertionError) for missing shardGen files.
Relates: #112337
Resolves: #112811