Skip to content

optimize getIndices in IndicesSegmentResponse#80064

Merged
DaveCTurner merged 4 commits intoelastic:masterfrom
mushao999:feature/optimize_indices_segment_response
Oct 29, 2021
Merged

optimize getIndices in IndicesSegmentResponse#80064
DaveCTurner merged 4 commits intoelastic:masterfrom
mushao999:feature/optimize_indices_segment_response

Conversation

@mushao999
Copy link
Copy Markdown
Contributor

The getIndices method in IndicesSegmentResponse will traverse shards for n+1 times (n is indices count), which can be optimized to just traverse once by using Map and its computeIfAbsent method.
Same logic can be found in IndexSegments’ constructor, where we just traverse shard once.
This PR just want to make elasticsearch code better, thougth getIndices method may be called Infrequently and has no performance issue.

@elasticsearchmachine elasticsearchmachine added v8.1.0 external-contributor Pull request authored by a developer outside the Elasticsearch team labels Oct 29, 2021
@DaveCTurner DaveCTurner added :Core/Infra/Stats Statistics tracking and retrieval APIs >enhancement labels Oct 29, 2021
@elasticmachine elasticmachine added the Team:Data Management (obsolete) DO NOT USE. This team no longer exists. label Oct 29, 2021
@elasticmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@DaveCTurner
Copy link
Copy Markdown
Member

@elasticmachine ok to test

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a nice improvement. I left a few suggestions.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested rename:

Suggested change
Map<String, List<ShardSegments>> tmpIndicesSegment = new HashMap<>();
final Map<String, List<ShardSegments>> segmentsByIndex = new HashMap<>();

Comment on lines 64 to 68
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest inlining the intermediate variable:

Suggested change
List<ShardSegments> indexSegments = tmpIndicesSegment.computeIfAbsent(
shard.getShardRouting().getIndexName(),
k -> new ArrayList<>()
);
indexSegments.add(shard);
segmentsByIndex.computeIfAbsent(shard.getShardRouting().getIndexName(), n -> new ArrayList<>()).add(shard);

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh also please make IndicesSegmentResponse#indicesSegments volatile. I don't see any places where we access it across threads today but it's possible we'd add one in future.

@mushao999
Copy link
Copy Markdown
Contributor Author

@DaveCTurner Thank you for your suggestion, I've updated the code.

@mushao999 mushao999 force-pushed the feature/optimize_indices_segment_response branch from e390ac4 to e652170 Compare October 29, 2021 13:32
@DaveCTurner
Copy link
Copy Markdown
Member

Thanks @mushao999, this looks good to me but CI is failing - you need to run ./gradlew spotlessapply to apply the correct formatting. Please don't force-push to PRs once they're being reviewed, it makes it hard to keep track of comments etc.

@mushao999
Copy link
Copy Markdown
Contributor Author

Thanks @mushao999, this looks good to me but CI is failing - you need to run ./gradlew spotlessapply to apply the correct formatting. Please don't force-push to PRs once they're being reviewed, it makes it hard to keep track of comments etc.

@DaveCTurner Thank you , I found this problem in jenkins log and have reformat the code. I am sorry for force pushing the code, I did this just to let it keep up with the latest master. No force push later.

Copy link
Copy Markdown
Member

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. It looks like we don't have a huge amount of testing of this API so I added a slightly stronger YAML test in 682e022. I also linked this from #77466 since it aligns closely with that work and I think this would be a problem in clusters with very large numbers of indices.

@DaveCTurner DaveCTurner merged commit 9233dc2 into elastic:master Oct 29, 2021
@mushao999 mushao999 deleted the feature/optimize_indices_segment_response branch October 30, 2021 06:38
@mushao999 mushao999 restored the feature/optimize_indices_segment_response branch November 5, 2021 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Stats Statistics tracking and retrieval APIs >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants