Describe the bug
Problem:
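PrimaryShardBatchAllocator: At https://github.com/opensearch-project/OpenSearch/blob/main/server/src/main/java/org/opensearch/gateway/PrimaryShardBatchAllocator.java#L107 a List.contains check against the list of unassigned shards in the batch is performed once for every unassigned shard in the cluster, which is expensive. In the primary trace below, the batch holds 4000 shards and List.contains is called 382366 times, accounting for 92.38% (about 17.5 s of 19 s) of allocateUnassignedBatch().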
Identified regression
`---ts=2024-06-25 09:25:40;thread_name=opensearch[cce76896e008802de8384c1a0cea2d6a][clusterManagerService#updateTask][T#1];id=291;is_daemon=true;priority=5;TCCL=jdk.internal.loader.ClassLoaders$AppClassLoader@531d72ca
`---[19021.204849ms] org.opensearch.gateway.PrimaryShardBatchAllocator:allocateUnassignedBatch()
+---[0.00% 0.001395ms ] java.util.List:size() #85
+---[0.00% 9.6E-4ms ] org.apache.logging.log4j.Logger:debug() #85
+---[0.00% 6.4E-4ms ] java.util.HashMap:<init>() #86
+---[0.00% 5.5E-4ms ] java.util.ArrayList:<init>() #87
+---[0.00% 5.17E-4ms ] java.util.ArrayList:<init>() #88
+---[0.00% 5.83E-4ms ] java.util.List:iterator() #90
+---[0.01% min=4.02E-4ms,max=7.47E-4ms,total=1.672409ms,count=4001] java.util.Iterator:hasNext() #90
+---[0.01% min=4.34E-4ms,max=0.008025ms,total=1.825077ms,count=4000] java.util.Iterator:next() #90
+---[0.02% min=7.46E-4ms,max=0.001255ms,total=3.144428ms,count=4000] org.opensearch.gateway.PrimaryShardBatchAllocator:getInEligibleShardDecision() #91
+---[0.01% min=4.26E-4ms,max=0.012168ms,total=1.787079ms,count=4000] java.util.List:add() #96
+---[0.01% 1.454972ms ] org.opensearch.gateway.PrimaryShardBatchAllocator:fetchData() #101
+---[0.00% 9.27E-4ms ] org.opensearch.cluster.routing.allocation.RoutingAllocation:routingNodes() #103
+---[0.00% 7.63E-4ms ] org.opensearch.cluster.routing.RoutingNodes:unassigned() #103
+---[0.00% 0.003914ms ] org.opensearch.cluster.routing.RoutingNodes$UnassignedShards:iterator() #103
+---[1.13% min=5.08E-4ms,max=0.019036ms,total=214.624591ms,count=382367] org.opensearch.cluster.routing.RoutingNodes$UnassignedShards$UnassignedIterator:hasNext() #104
+---[1.41% min=5.74E-4ms,max=0.01646ms,total=268.380958ms,count=382366] org.opensearch.cluster.routing.RoutingNodes$UnassignedShards$UnassignedIterator:next() #105
+---[92.38% min=5.09E-4ms,max=12.265055ms,total=17572.545705ms,count=382366] java.util.List:contains() #108
+---[0.01% min=5.17E-4ms,max=9.19E-4ms,total=2.21695ms,count=4000] org.opensearch.cluster.routing.ShardRouting:shardId() #110
+---[0.01% min=4.51E-4ms,max=7.63E-4ms,total=1.905109ms,count=4000] java.util.HashMap:containsKey() #110
+---[0.02% min=6.89E-4ms,max=0.001904ms,total=3.423754ms,count=4000] org.opensearch.gateway.PrimaryShardBatchAllocator:adaptToNodeShardStates() #113
+---[0.03% min=0.001042ms,max=0.01842ms,total=6.486992ms,count=4000] org.opensearch.gateway.PrimaryShardBatchAllocator:getAllocationDecision() #114
+---[0.72% min=0.001321ms,max=0.074331ms,total=137.576431ms,count=4000] org.opensearch.gateway.PrimaryShardBatchAllocator:executeDecision() #116
+---[0.00% 6.15E-4ms ] java.util.List:size() #119
`---[0.00% 7.06E-4ms ] org.apache.logging.log4j.Logger:debug() #119
Replica allocations:
`---ts=2024-06-25 09:37:12;thread_name=opensearch[cce76896e008802de8384c1a0cea2d6a][clusterManagerService#updateTask][T#1];id=291;is_daemon=true;priority=5;TCCL=jdk.internal.loader.ClassLoaders$AppClassLoader@531d72ca
`---[97.588697ms] org.opensearch.gateway.ReplicaShardBatchAllocator:allocateUnassignedBatch()
+---[0.00% 0.001641ms ] java.util.List:size() #120
+---[0.00% 7.38E-4ms ] org.apache.logging.log4j.Logger:debug() #120
+---[0.00% 6.15E-4ms ] java.util.ArrayList:<init>() #121
+---[0.00% 5.17E-4ms ] java.util.ArrayList:<init>() #122
+---[0.00% 6.73E-4ms ] java.util.HashMap:<init>() #123
+---[0.00% 6.73E-4ms ] java.util.List:iterator() #125
+---[1.76% min=4.1E-4ms,max=8.86E-4ms,total=1.721501ms,count=4001] java.util.Iterator:hasNext() #125
+---[1.85% min=4.26E-4ms,max=9.52E-4ms,total=1.806896ms,count=4000] java.util.Iterator:next() #125
+---[3.59% min=8.45E-4ms,max=0.006646ms,total=3.505268ms,count=4000] org.opensearch.gateway.ReplicaShardBatchAllocator:getUnassignedShardAllocationDecision() #126
+---[1.93% min=4.43E-4ms,max=0.022982ms,total=1.87881ms,count=4000] java.util.List:add() #129
+---[2.79% min=5.16E-4ms,max=0.149802ms,total=2.718945ms,count=4000] java.util.Map:put() #130
+---[3.18% 3.104627ms ] org.opensearch.gateway.ReplicaShardBatchAllocator:fetchData() #137
+---[0.00% 0.001855ms ] java.util.List:stream() #139
+---[0.00% 8.12E-4ms ] java.util.stream.Stream:map() #139
+---[0.00% 5.66E-4ms ] java.util.stream.Collectors:toList() #139
+---[0.21% 0.201313ms ] java.util.stream.Stream:collect() #139
+---[0.00% 7.22E-4ms ] org.opensearch.cluster.routing.allocation.RoutingAllocation:routingNodes() #140
+---[0.00% 7.3E-4ms ] org.opensearch.cluster.routing.RoutingNodes:unassigned() #140
+---[0.00% 0.003873ms ] org.opensearch.cluster.routing.RoutingNodes$UnassignedShards:iterator() #140
+---[2.20% min=5.08E-4ms,max=0.009994ms,total=2.145579ms,count=4001] org.opensearch.cluster.routing.RoutingNodes$UnassignedShards$UnassignedIterator:hasNext() #141
+---[2.71% min=5.83E-4ms,max=0.010535ms,total=2.6432ms,count=4000] org.opensearch.cluster.routing.RoutingNodes$UnassignedShards$UnassignedIterator:next() #142
+---[2.08% min=4.59E-4ms,max=0.015689ms,total=2.029472ms,count=4000] org.opensearch.cluster.routing.ShardRouting:primary() #146
+---[2.21% min=5.08E-4ms,max=0.012275ms,total=2.1536ms,count=4000] org.opensearch.cluster.routing.ShardRouting:shardId() #146
+---[30.93% min=4.51E-4ms,max=0.033813ms,total=30.187663ms,count=4000] java.util.List:contains() #146
+---[2.56% min=4.76E-4ms,max=0.014458ms,total=2.499516ms,count=4000] java.util.Map:containsKey() #148
+---[2.08% min=4.75E-4ms,max=0.013211ms,total=2.026379ms,count=4000] java.util.Map:get() #149
+---[5.86% min=9.84E-4ms,max=0.012866ms,total=5.714058ms,count=4000] org.opensearch.gateway.ReplicaShardBatchAllocator:executeDecision() #160
+---[0.00% 7.3E-4ms ] java.util.List:size() #163
`---[0.00% 0.001846ms ] org.apache.logging.log4j.Logger:debug() #163
Related component
Cluster Manager
To Reproduce
N/A
Expected behavior
Use a HashSet instead of a List for the batch membership check, so that each lookup takes constant time on average rather than time linear in the batch size.
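A minimal sketch of the proposed change, for illustration only. It is not the actual PrimaryShardBatchAllocator code: the class name, method names, and the use of String as a stand-in for the shard type are all hypothetical. It assumes the real element type implements equals and hashCode consistently, which a HashSet lookup requires.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the allocator loop; String plays the role of a shard.
public class BatchMembershipSketch {

    // Current pattern: List.contains scans the batch for every unassigned shard,
    // so the cost is O(allUnassigned.size() * batch.size()).
    static List<String> selectWithList(List<String> allUnassigned, List<String> batch) {
        List<String> selected = new ArrayList<>();
        for (String shard : allUnassigned) {
            if (batch.contains(shard)) {
                selected.add(shard);
            }
        }
        return selected;
    }

    // Proposed pattern: copy the batch into a HashSet once, then each lookup is
    // constant time on average, so the cost is O(allUnassigned.size() + batch.size()).
    static List<String> selectWithSet(List<String> allUnassigned, List<String> batch) {
        Set<String> batchSet = new HashSet<>(batch);
        List<String> selected = new ArrayList<>();
        for (String shard : allUnassigned) {
            if (batchSet.contains(shard)) {
                selected.add(shard);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> allUnassigned = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            allUnassigned.add("shard-" + i);
        }
        // Batch of 4000 shards, matching the batch size seen in the trace.
        List<String> batch = new ArrayList<>(allUnassigned.subList(0, 4000));

        long t0 = System.nanoTime();
        List<String> viaList = selectWithList(allUnassigned, batch);
        long t1 = System.nanoTime();
        List<String> viaSet = selectWithSet(allUnassigned, batch);
        long t2 = System.nanoTime();

        System.out.println("same result: " + viaList.equals(viaSet));
        System.out.println("List.contains ms: " + (t1 - t0) / 1_000_000);
        System.out.println("HashSet.contains ms: " + (t2 - t1) / 1_000_000);
    }
}
```

Both methods return the same shards in the same order; only the lookup structure differs. Timings depend on the machine and JVM warm-up, so the printed numbers are indicative only.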
Additional Details
OpenSearch version: 2.13
Please list all plugins currently enabled.