Add segment sorter for data streams#77261

Merged
mayya-sharipova merged 1 commit into elastic:7.x from
mayya-sharipova:datastreams-leaf-sorter-7.x
Sep 3, 2021

@mayya-sharipova
Contributor

@mayya-sharipova mayya-sharipova commented Sep 3, 2021

It is beneficial to sort the segments within a data stream's index
in descending order of the maximum value of their timestamp field, so
that the segments holding the most recent data
come first.

This speeds up queries sorted on `@timestamp` in descending order,
which are the most common type of query for data streams,
as we are mostly concerned with recent data.
This patch addresses this for writable indices.

A segment sorter is different from index sorting.
An index sorter is concerned only with the order of docs
within an individual segment (and not with how the segments are organized),
while the segment sorter is used only during search and allows
doc collection to start with the "right" segment,
so that collection can terminate sooner.
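
To illustrate the ordering described above, here is a minimal sketch, not the code in this PR. The `Segment` class, the `searchOrder` method, and the sample values are hypothetical stand-ins; real segments are Lucene leaf readers, and the sketch only assumes that each one exposes the maximum timestamp it contains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SegmentSorterSketch {

    // Hypothetical stand-in for a segment: a name plus the largest
    // @timestamp value stored in it (an assumption for illustration).
    static final class Segment {
        final String name;
        final long maxTimestamp;

        Segment(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    // Orders segments so that the one with the largest max timestamp comes first.
    static final Comparator<Segment> BY_MAX_TIMESTAMP_DESC =
        Comparator.comparingLong((Segment s) -> s.maxTimestamp).reversed();

    // Returns the segment names in the order a search would visit them.
    // The input list is copied, so the caller's list is left untouched.
    static List<String> searchOrder(List<Segment> segments) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(BY_MAX_TIMESTAMP_DESC);
        List<String> names = new ArrayList<>();
        for (Segment s : sorted) {
            names.add(s.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
            new Segment("_0", 100L),
            new Segment("_1", 50L),
            new Segment("_2", 200L)
        );
        System.out.println(searchOrder(segments)); // prints [_2, _0, _1]
    }
}
```

The docs inside each segment are not reordered here; only the order in which segments are visited changes, which is the distinction from index sorting drawn above.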

This PR adds a property, `isDataStreamIndex`, to IndexShard that
indicates whether a shard is part of a data stream.

Backport for #75195

@mayya-sharipova mayya-sharipova added the :Search/Search Search-related issues that do not fall into other categories label Sep 3, 2021
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Sep 3, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@mayya-sharipova mayya-sharipova changed the title Add segment sorter for data streams (#75195) Add segment sorter for data streams Sep 3, 2021
@mayya-sharipova mayya-sharipova merged commit b830351 into elastic:7.x Sep 3, 2021
@mayya-sharipova mayya-sharipova deleted the datastreams-leaf-sorter-7.x branch September 3, 2021 15:33
jkakavas added a commit that referenced this pull request Oct 1, 2021

Labels

backport :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v7.16.0

3 participants