Add possibility to acquire permits on primary shards with different checks#119794
Add possibility to acquire permits on primary shards with different checks#119794arteam merged 4 commits intoelastic:mainfrom
Conversation
|
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing) |
…hecks Since elastic#42241 we check that the shard must be in a primary mode for acquiring a primary permit on it. We would like customize this check and an option to perform different checks before running the `onPermitAcquired` listener. For example, we would to skip the primary mode check when we acquire primary permits during recovering of a hollow indexing shard. See ES-10487
23917ed to
94631d5
Compare
kingherc
left a comment
There was a problem hiding this comment.
Looking good. A couple of comments.
| /** | ||
| * Check to run before running the primary permit operation | ||
| */ | ||
| public enum PrimaryPermitCheck { |
There was a problem hiding this comment.
I'm a bit late, but wouldn't be possible to somehow relax the assertion in #isPrimaryMode when the engine is hollow instead?
There was a problem hiding this comment.
Hi @fcofdez ! It's possible, but it's unclear what's the best way to do it since the hollow info/logic is in serverless code. In the POC, I remember I simply relaxed the assertion by allowing it if the shard is in the recovery process. Not sure though it's better than this PR since it was more generic (recovering shards). Feel free to shoot a specific proposal though to see if it's better.
There was a problem hiding this comment.
@fcofdez I believe the reason was that IndexShard doesn't know about HollowEngine which is a serverless concept, so you would need to expose this abstraction to Engine. I think disabling the primary mode check explicitly seems to be a good first approach and we can revert to the engine check if that makes more sense.
|
I've been chatting with @kingherc and I think that we could avoid this change if we acquire the permits in the target node during the primary context handover in serverless as that's called before the shard is marked as started in . That would avoid having to change this API. |
|
@fcofdez agreed. Opened https://elasticco.atlassian.net/browse/ES-10537 . |
We added support for acquiring permits on primary shards without the routing mode check in elastic#119794, now we should be able to acquire permits for hollow shards. We also have to release the permits in order for the test cluster to be properly shut down. See
* [WIP] Add support for relocating hollow shards * Add a test for relocating hollow shard * Run spotless * Use logger instead of System.out.println * Don't check activeOperations if we hold permits * Test relocating hollow shard * Add more testing for relocating hollowable shards * Acquire primary permits on hollow shards on recovery We added support for acquiring permits on primary shards without the routing mode check in elastic#119794, now we should be able to acquire permits for hollow shards. We also have to release them in order for the test cluster properly shit down. * Add releasePrimaryPermits call when shutting down HollowIndexEngine * Make releasing primary permits package-private * Add a comment about HollowShardsService being available for DI * Remove extra isLastCommitHollow check * Support relocating hollow shards with HollowIndexEngine * Run spotless * Move stateless compound checks * Add tests for swapping hollowed shards to HollowEngine on failures * Swap the engine to HollowIndexEngine if we were unable to relocate the shard * Always reset engine while holding primary permits * Extract switch to hollow shard in a separate method * Acquire primary permits after we initialized the shard with hollow index engine * Add additional arePrimaryPermitsHeld checks after the relocation * Revert to debug for "obtained primary context" message * Use resetEngine * Don't swap to HollowIndexEngine on source node when relocation fails * Add a link to JIRA ticket about swapping the engine on relocation failures * Rollback "acquiring all primary operation permits" to DEBUG * Reference in the relocation fail test * Use handOffPrimaryPermits for primaryPermits * use handOffPrimaryPermits * Simplify breaking of relocations Just throw exceptions instead of disconecting nodes * Use AtomicReference for primaryPermits since we mutate it --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
We added support for acquiring permits on primary shards without the routing mode check in elastic#119794, now we should be able to acquire permits for hollow shards. We also have to release the permits in order for the test cluster to be properly shut down. See
* [WIP] Add support for relocating hollow shards * Add a test for relocating hollow shard * Run spotless * Use logger instead of System.out.println * Don't check activeOperations if we hold permits * Test relocating hollow shard * Add more testing for relocating hollowable shards * Acquire primary permits on hollow shards on recovery We added support for acquiring permits on primary shards without the routing mode check in elastic#119794, now we should be able to acquire permits for hollow shards. We also have to release them in order for the test cluster properly shit down. * Add releasePrimaryPermits call when shutting down HollowIndexEngine * Make releasing primary permits package-private * Add a comment about HollowShardsService being available for DI * Remove extra isLastCommitHollow check * Support relocating hollow shards with HollowIndexEngine * Run spotless * Move stateless compound checks * Add tests for swapping hollowed shards to HollowEngine on failures * Swap the engine to HollowIndexEngine if we were unable to relocate the shard * Always reset engine while holding primary permits * Extract switch to hollow shard in a separate method * Acquire primary permits after we initialized the shard with hollow index engine * Add additional arePrimaryPermitsHeld checks after the relocation * Revert to debug for "obtained primary context" message * Use resetEngine * Don't swap to HollowIndexEngine on source node when relocation fails * Add a link to JIRA ticket about swapping the engine on relocation failures * Rollback "acquiring all primary operation permits" to DEBUG * Reference in the relocation fail test * Use handOffPrimaryPermits for primaryPermits * use handOffPrimaryPermits * Simplify breaking of relocations Just throw exceptions instead of disconecting nodes * Use AtomicReference for primaryPermits since we mutate it --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
Since #42241 we check that the shard must be in a primary mode for acquiring a primary permit on it. We would like customize this check and an option to perform different checks before running the
onPermitAcquiredlistener. For example, we would to skip the primary mode check when we acquire primary permits during recovering of a hollow indexing shard.See ES-10487