Guard Repository#getRepositoryData for exception throw #50970
Guard Repository#getRepositoryData for exception throw #50970albertzaharovits wants to merge 4 commits intoelastic:masterfrom
Conversation
|
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore) |
|
@albertzaharovits sorry for not thinking of it last night, but maybe instead of fixing the callers to the repository, shouldn't we better just fix |
|
@original-brownbear TBH I don't know how it's better, I think it depends on the conventions in the codebase. I don't like functions that throw and have a listener parameter. But in this case, it felt safer to me to handle the exception at a higher level since technically We could also fix I'm fine with any of the three options. Let me know your preference. |
|
We've synced and learned that the problem is not a case of a state applier thread, but the logic of listing to get the latest generation followed by a failure to read the blob for that generation and resetting the generation. |
|
Follow up is in #50987 |
In practice,
Repository#getRepositoryDatacan throw exceptions, for example theBlobStoreRepositoryimplementation of itelasticsearch/server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java
Line 1088 in 3c6f649
The
getRepositoryDatamethod is called before any operation involving snapshots. If the exception is thrown on the cluster state applier thread, as it happens when creating a new snapshot, the state applier thread will loop, retrying to apply the same state. If the exception is persistent, as is the case of an encrypted repository with a wrong password (see #50846), the state applier thread will get stuck.This commit tracks and fixes all uses of
Repository#getRepositoryDatato make sure that at some point the exception is catched and forwarded to aListener#onFailure.