Simplify IndicesShardStoresAction #94507
Conversation
- No need to use an `AsyncShardFetch` here, there is no caching
- Response may be very large, introduce chunking
- Fan-out may be very large, introduce throttling
- Processing time may be nontrivial, introduce cancellability
- Eliminate many unnecessary intermediate data structures
- Do shard-level response processing more eagerly
- Determine allocation from `RoutingTable` not `RoutingNodes`
- Add tests

Relates elastic#81081
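The throttling point above can be illustrated with a standalone sketch. This is not the actual Elasticsearch implementation; it is a hypothetical, minimal semaphore-based bound on the number of in-flight per-shard requests during a fan-out, with made-up names (`ThrottledFanOut`, `runFanOut`):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: cap in-flight fan-out requests with a semaphore.
public class ThrottledFanOut {

    // Runs `totalShards` simulated shard requests, never more than
    // `maxConcurrent` at once, and returns the peak observed concurrency.
    static int runFanOut(int totalShards, int maxConcurrent) throws Exception {
        Semaphore permits = new Semaphore(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent * 2);
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (int shard = 0; shard < totalShards; shard++) {
            permits.acquire(); // blocks until one of the in-flight requests completes
            futures.add(CompletableFuture.runAsync(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // simulate the remote shard-level call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    inFlight.decrementAndGet();
                    permits.release();
                }
            }, pool));
        }
        CompletableFuture.allOf(futures.toArray(CompletableFuture[]::new)).join();
        pool.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak in-flight: " + runFanOut(25, 5));
    }
}
```

The key property is that acquiring a permit before dispatching each request guarantees the peak concurrency never exceeds the cap, while still letting the whole fan-out complete.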
Pinging @elastic/es-distributed (Team:Distributed)
```java
private final String[] concreteIndices;
private final RoutingTable routingTable;
private final Metadata metadata;
private final Map<String, Map<Integer, List<StoreStatus>>> indicesStatuses;
```
Why not `Map<ShardId, List<StoreStatus>>`?
Really just because that's what the response wants.
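To make the two shapes under discussion concrete, here is a hypothetical standalone sketch (simplified `ShardId` and `StoreStatus` stand-ins, not the real classes) regrouping a flat `Map<ShardId, ...>` into the nested index-name-to-shard-number map that the response wants:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: regroup a flat per-shard map into the nested
// index -> shard number -> statuses shape used by the response.
public class ShardMapShapes {
    record ShardId(String indexName, int id) {}
    record StoreStatus(String nodeId) {}

    static Map<String, Map<Integer, List<StoreStatus>>> group(Map<ShardId, List<StoreStatus>> flat) {
        Map<String, Map<Integer, List<StoreStatus>>> nested = new HashMap<>();
        flat.forEach((shardId, statuses) ->
            nested.computeIfAbsent(shardId.indexName(), k -> new HashMap<>())
                  .put(shardId.id(), statuses));
        return nested;
    }

    public static void main(String[] args) {
        Map<ShardId, List<StoreStatus>> flat = Map.of(
            new ShardId("logs", 0), List.of(new StoreStatus("node-1")),
            new ShardId("logs", 1), List.of(new StoreStatus("node-2")),
            new ShardId("metrics", 0), List.of(new StoreStatus("node-1")));
        Map<String, Map<Integer, List<StoreStatus>>> nested = group(flat);
        System.out.println(nested.get("logs").size());    // 2
        System.out.println(nested.get("metrics").size()); // 1
    }
}
```

Either shape carries the same information; the nested one simply matches what the response constructor consumes without a conversion step.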
```java
this.failures = new ConcurrentLinkedQueue<>();
this.outerListener = new RefCountingListener(1, listener.map(ignored -> {
    task.ensureNotCancelled();
    return new IndicesShardStoresResponse(Map.copyOf(indicesStatuses), List.copyOf(failures));
```
Oh, was `IndicesShardStoresResponse` requiring a map of maps before?
```java
private static final Logger logger = LogManager.getLogger(TransportIndicesShardStoresAction.class);
```
```java
static final int CONCURRENT_REQUESTS_LIMIT = 100; // TODO configurable?
```
Should it be converted to a `Setting`, or was it intentionally left as a TODO?
Not sure if this is important, though, as it was not limited previously.
Well it's worth a discussion. Bounding concurrency here might make things slower. 100 feels high enough to me to have limited impact, whilst being low enough to avoid this API being actively harmful to an enormous cluster.
Seems it could also depend on CPU/network. Have we ever recorded this being an issue in the wild when it was unlimited?
If not, I am leaning towards the approach of making it configurable with a huge default (1000?) and letting users set it lower if/when an issue is detected.
I'm going to mark this as not-approved while I think about the concurrent-requests-limit part. Otherwise I'll forget that this is outstanding.
Ok, I pushed d0333bf which makes the limit configurable on a request-by-request basis (using the same parameter as we use in the search API).
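As a hypothetical illustration of a request-level concurrency cap (the field name, default value, and class here are made up for the sketch, not the actual API from that commit):

```java
// Illustrative sketch only: carry the fan-out concurrency cap on the
// request itself, with validation, instead of a fixed class-level constant.
public class ShardStoresRequest {
    static final int DEFAULT_MAX_CONCURRENT_SHARD_REQUESTS = 100;

    private int maxConcurrentShardRequests = DEFAULT_MAX_CONCURRENT_SHARD_REQUESTS;

    // Builder-style setter so callers can tune the cap per request.
    public ShardStoresRequest maxConcurrentShardRequests(int max) {
        if (max < 1) {
            throw new IllegalArgumentException("maxConcurrentShardRequests must be >= 1, got " + max);
        }
        this.maxConcurrentShardRequests = max;
        return this;
    }

    public int maxConcurrentShardRequests() {
        return maxConcurrentShardRequests;
    }

    public static void main(String[] args) {
        ShardStoresRequest request = new ShardStoresRequest().maxConcurrentShardRequests(10);
        System.out.println(request.maxConcurrentShardRequests()); // prints 10
    }
}
```

A per-request knob with a sensible default sidesteps the earlier debate over a single hard-coded limit: cautious callers can lower it without penalizing everyone else.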
@elasticmachine please run elasticsearch-ci/docs |
Added in elastic#94507 but without the comment from elastic#93101, which this commit fixes.