Add possibility to acquire permits on primary shards with different checks by arteam · Pull Request #119794 · elastic/elasticsearch

arteam · 2025-01-08T20:10:01Z

Since #42241, a shard must be in primary mode before a primary permit can be acquired on it. We would like to customize this check and add an option to perform different checks before running the `onPermitAcquired` listener. For example, we would like to skip the primary mode check when we acquire primary permits during recovery of a hollow indexing shard.

See ES-10487
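A minimal sketch of the idea, not the code from this PR: a toy shard whose permit acquisition takes an explicit check argument. Only the enum name `PrimaryPermitCheck` and the listener method name `onPermitAcquired` come from this page; `ToyShard`, `PermitListener`, the enum constants, and the method signature are assumptions made for illustration.

```java
/** Toy stand-in for a shard; this is NOT the real IndexShard. */
final class ToyShard {
    /** Enum name as in the PR; the two constants are assumed for this sketch. */
    enum PrimaryPermitCheck { CHECK_PRIMARY_MODE, NONE }

    /** Hypothetical listener: success receives a release handle, failure an exception. */
    interface PermitListener {
        void onPermitAcquired(Runnable release);
        void onFailure(Exception e);
    }

    private final boolean primaryMode;
    private int heldPermits = 0;

    ToyShard(boolean primaryMode) {
        this.primaryMode = primaryMode;
    }

    int heldPermits() {
        return heldPermits;
    }

    /** Takes a permit, runs the requested check, then notifies the listener. */
    void acquirePrimaryOperationPermit(PermitListener listener, PrimaryPermitCheck check) {
        heldPermits++;
        Runnable release = () -> heldPermits--;
        if (check == PrimaryPermitCheck.CHECK_PRIMARY_MODE && primaryMode == false) {
            // The check failed: give the permit back before reporting the failure.
            release.run();
            listener.onFailure(new IllegalStateException("shard is not in primary mode"));
            return;
        }
        listener.onPermitAcquired(release);
    }
}
```

With a shard that is not in primary mode, passing `CHECK_PRIMARY_MODE` in this sketch reports a failure, while passing `NONE` hands the caller a release handle, which is the behaviour the description asks for in the hollow shard recovery case.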

elasticsearchmachine · 2025-01-08T20:10:25Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)


kingherc

Looking good. A couple of comments.

server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

kingherc

LGTM

server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java

fcofdez · 2025-01-10T15:21:51Z

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

+    /**
+     * Check to run before running the primary permit operation
+     */
+    public enum PrimaryPermitCheck {


I'm a bit late, but wouldn't it be possible to somehow relax the assertion in #isPrimaryMode when the engine is hollow instead?

kingherc · 2025-01-10

Hi @fcofdez! It's possible, but it's unclear what the best way to do it is, since the hollow info/logic is in serverless code. In the POC, I remember I simply relaxed the assertion by allowing it if the shard is in the recovery process. I'm not sure that's better than this PR, though, since it was more generic (any recovering shard). Feel free to shoot a specific proposal to see if it's better.

arteam · 2025-01-13

@fcofdez I believe the reason was that IndexShard doesn't know about HollowEngine, which is a serverless concept, so you would need to expose this abstraction to Engine. I think explicitly disabling the primary mode check is a good first approach, and we can revert to the engine check if that makes more sense.

fcofdez · 2025-01-14T14:41:54Z

I've been chatting with @kingherc and I think we could avoid this change if we acquire the permits on the target node during the primary context handover in serverless, since that is called before the shard is marked as started here:

elasticsearch/server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

Lines 974 to 988 in a59c182

    @Override
    public void onRecoveryDone(
        final RecoveryState state,
        ShardLongFieldRange timestampMillisFieldRange,
        ShardLongFieldRange eventIngestedMillisFieldRange
    ) {
        shardStateAction.shardStarted(
            shardRouting,
            primaryTerm,
            "after " + state.getRecoverySource(),
            timestampMillisFieldRange,
            eventIngestedMillisFieldRange,
            ActionListener.noop()
        );
    }

That would avoid having to change this API.
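The ordering this proposal relies on can be sketched with a toy model. Everything below is hypothetical (the class, the hook name `onPrimaryContextHandover`, and the event strings); it only illustrates that permits taken during the handover are already held by the time `onRecoveryDone` reports the shard as started.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a relocation target; every name in this class is hypothetical. */
final class HandoverOrderSketch {
    private final List<String> events = new ArrayList<>();
    private boolean permitsHeld = false;
    private boolean started = false;

    /** Hypothetical hook invoked when the primary context is handed over to the target. */
    void onPrimaryContextHandover() {
        if (started) {
            throw new IllegalStateException("handover must happen before the shard is started");
        }
        permitsHeld = true;
        events.add("permits acquired");
    }

    /** Stand-in for onRecoveryDone, which reports the shard as started. */
    void onRecoveryDone() {
        started = true;
        events.add("shard started");
    }

    boolean permitsHeldWhenStarted() {
        return started && permitsHeld;
    }

    List<String> events() {
        return events;
    }
}
```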

kingherc · 2025-01-14T14:55:43Z

@fcofdez agreed. Opened https://elasticco.atlassian.net/browse/ES-10537 .

We added support for acquiring permits on primary shards without the routing mode check in elastic#119794, now we should be able to acquire permits for hollow shards. We also have to release the permits in order for the test cluster to be properly shut down. See

* [WIP] Add support for relocating hollow shards
* Add a test for relocating hollow shard
* Run spotless
* Use logger instead of System.out.println
* Don't check activeOperations if we hold permits
* Test relocating hollow shard
* Add more testing for relocating hollowable shards
* Acquire primary permits on hollow shards on recovery

  We added support for acquiring permits on primary shards without the routing mode check in elastic#119794, now we should be able to acquire permits for hollow shards. We also have to release them in order for the test cluster to be properly shut down.
* Add releasePrimaryPermits call when shutting down HollowIndexEngine
* Make releasing primary permits package-private
* Add a comment about HollowShardsService being available for DI
* Remove extra isLastCommitHollow check
* Support relocating hollow shards with HollowIndexEngine
* Run spotless
* Move stateless compound checks
* Add tests for swapping hollowed shards to HollowEngine on failures
* Swap the engine to HollowIndexEngine if we were unable to relocate the shard
* Always reset engine while holding primary permits
* Extract switch to hollow shard in a separate method
* Acquire primary permits after we initialized the shard with hollow index engine
* Add additional arePrimaryPermitsHeld checks after the relocation
* Revert to debug for "obtained primary context" message
* Use resetEngine
* Don't swap to HollowIndexEngine on source node when relocation fails
* Add a link to JIRA ticket about swapping the engine on relocation failures
* Rollback "acquiring all primary operation permits" to DEBUG
* Reference in the relocation fail test
* Use handOffPrimaryPermits for primaryPermits
* use handOffPrimaryPermits
* Simplify breaking of relocations

  Just throw exceptions instead of disconnecting nodes
* Use AtomicReference for primaryPermits since we mutate it

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
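Several of the commits above mention holding primary permits in an `AtomicReference`, handing them off, and releasing them on shutdown. The sketch below shows one way such a holder could look. The method names `handOffPrimaryPermits`, `releasePrimaryPermits`, and `arePrimaryPermitsHeld` are borrowed from the commit titles, but their signatures and semantics here are assumptions, and `HeldPermitsSketch` and `holdPrimaryPermits` are hypothetical; the referenced code is not shown on this page.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical holder for a release handle of acquired primary permits. */
final class HeldPermitsSketch {
    // AtomicReference because the handle is swapped in and out over the shard's lifetime.
    private final AtomicReference<Runnable> primaryPermits = new AtomicReference<>();

    /** Stores the release handle; rejects a second handle while one is held. */
    void holdPrimaryPermits(Runnable release) {
        if (primaryPermits.compareAndSet(null, release) == false) {
            throw new IllegalStateException("primary permits are already held");
        }
    }

    boolean arePrimaryPermitsHeld() {
        return primaryPermits.get() != null;
    }

    /** Atomically transfers ownership of the handle to the caller (null if none is held). */
    Runnable handOffPrimaryPermits() {
        return primaryPermits.getAndSet(null);
    }

    /** Releases the permits if held; calling it again is a no-op. */
    void releasePrimaryPermits() {
        Runnable release = primaryPermits.getAndSet(null);
        if (release != null) {
            release.run();
        }
    }
}
```

Using `getAndSet(null)` for both hand-off and release means that, in this sketch, at most one caller ever obtains a given handle, so the permits cannot be released twice.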


arteam added the >non-issue and :Distributed/Engine labels Jan 8, 2025

elasticsearchmachine added the Team:Distributed Indexing (obsolete) and v9.0.0 labels Jan 8, 2025

arteam force-pushed the acquire-primary-permit-custom-check branch from 23917ed to 94631d5 January 8, 2025 20:12

arteam requested review from fcofdez and kingherc January 9, 2025 07:55

Merge branch 'main' into acquire-primary-permit-custom-check

7c74d3b

kingherc reviewed Jan 10, 2025

server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java

server/src/main/java/org/elasticsearch/index/shard/IndexShard.java

arteam added 2 commits January 10, 2025 12:09

Merge branch 'main' into acquire-primary-permit-custom-check

e454804

Add comments about disable primary mode checks

cdda060

kingherc approved these changes Jan 10, 2025

server/src/test/java/org/elasticsearch/index/shard/IndexShardTests.java

arteam merged commit 6ca7e75 into elastic:main Jan 10, 2025

arteam deleted the acquire-primary-permit-custom-check branch January 10, 2025 13:04

fcofdez reviewed Jan 10, 2025

arteam mentioned this pull request Jan 20, 2025

Revert primary permits changes and add hook #120398

Merged

arteam merged 4 commits into elastic:main from arteam:acquire-primary-permit-custom-check
