Skip to content

Downsample ILM action should skip non-time-series indices#94835

Merged
csoulios merged 8 commits intoelastic:mainfrom
csoulios:fix-ds-ilm
Mar 30, 2023
Merged

Downsample ILM action should skip non-time-series indices#94835
csoulios merged 8 commits intoelastic:mainfrom
csoulios:fix-ds-ilm

Conversation

@csoulios
Copy link
Copy Markdown
Contributor

Make the downsample action skip the non-time-series indices

As discussed in #93123 (comment):

... we think the downsample action should skip backing indices that are not part of a time series data stream. We do a similar approach with the unfollow action where non follower indices are skipped. And this allows to have the same policy for both regular data streams and tsdb data streams.

Closes #93123

@csoulios csoulios added >bug :Data Management/ILM+SLM DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead. :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics v8.8.0 labels Mar 28, 2023
@csoulios csoulios requested review from andreidan and martijnvg March 28, 2023 15:41
@elasticsearchmachine elasticsearchmachine added Team:Data Management (obsolete) DO NOT USE. This team no longer exists. Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Mar 28, 2023
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Hi @csoulios, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

/**
* Updates the lifecycle policy for the rollup index for the original/currently managed index
*/
public class UpdateRollupIndexPolicyStep extends AsyncActionStep {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step is a left over from a previous implementation of rollup v2. No longer used.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, thanks for deleting this

Copy link
Copy Markdown
Contributor

@andreidan andreidan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this Christos

Left a few minor comments

/**
* Updates the lifecycle policy for the rollup index for the original/currently managed index
*/
public class UpdateRollupIndexPolicyStep extends AsyncActionStep {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice, thanks for deleting this

private void createIndex(String index, String alias) throws IOException {
@Before
public void updateClusterSettings() throws IOException {
updateClusterSettings(client(), Settings.builder().put("indices.lifecycle.poll_interval", "5s").build());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something we want? What's the purpose of delaying the poll interval to later than the 1 second we configure it in tests?

Copy link
Copy Markdown
Contributor Author

@csoulios csoulios Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some reason the @After cleanUpCluster() method sets the following:

[2023-03-29T13:19:01,928][INFO ][o.e.c.s.ClusterSettings  ] [runTask-0] updating [indices.lifecycle.poll_interval] from [5s] to [10m]

So, I need to reset it at my @Before test.

So my logs look like this:

2023-03-29T13:19:01,649][INFO ][o.e.c.m.MetadataDeleteIndexService] [runTask-0] [index-xpirenuavc/DJ7v787aSwyGeKUjIhkzHg] deleting index
[2023-03-29T13:19:01,928][INFO ][o.e.c.s.ClusterSettings  ] [runTask-0] updating [indices.lifecycle.poll_interval] from [5s] to [10m]
[2023-03-29T13:19:17,100][INFO ][o.e.c.s.ClusterSettings  ] [runTask-0] updating [indices.lifecycle.poll_interval] from [10m] to [5s]

Copy link
Copy Markdown
Contributor

@andreidan andreidan Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just ran the test on your branch (after having removed this @Before method) and I don't see the setting being changed to 10m.

Gradle configures the poll interval to 1000ms for these test clusters https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/ilm/qa/multi-node/build.gradle#L29 (elasticsearch.yml for all nodes in the cluster will contain this setting)

This is my diff for your branch

--- a/x-pack/plugin/ilm/qa/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/DownsampleActionIT.java
+++ b/x-pack/plugin/ilm/qa/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/DownsampleActionIT.java
@@ -109,11 +109,6 @@ public class DownsampleActionIT extends ESRestTestCase {
         );
     }

-    @Before
-    public void updateClusterSettings() throws IOException {
-        updateClusterSettings(client(), Settings.builder().put("indices.lifecycle.poll_interval", "5s").build());
-    }
-
     private void createIndex(String index, String alias, boolean isTimeSeries) throws IOException {
         Settings.Builder settings = Settings.builder()
             .put(IndexMetadata.SETTING_NUMBER_OF_SHARDS, 1)

Could you remove this and try again please? I'm fairly convinced we don't need it

Comment on lines +298 to +301
String rollupIndex = waitAndGetRollupIndexName(client(), index);
assertNull("Rollup index should not have been created", rollupIndex);
assertBusy(() -> assertTrue("Source index should not have been deleted", indexExists(index)));
assertBusy(() -> assertThat(getStepKeyForIndex(client(), index), equalTo(PhaseCompleteStep.finalStep(phaseName).getKey())));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions make sense, but they are written in a way that would make the test idle for 60 seconds until waitAndGetRollupIndexName timeouts.

Would it make sense to have the final step assertion be the first one ?
i.e.

assertBusy(() -> assertThat(getStepKeyForIndex(client(), index), equalTo(PhaseCompleteStep.finalStep(phaseName).getKey())));

Followed by regular assertions (they don't need to be assertBusy anymore as ILM is in the final step already)
e.g.

assertNull("Rollup index should not have been created", rollupIndex);
assertTrue("Source index should not have been deleted", indexExists(index));

@csoulios csoulios requested a review from andreidan March 29, 2023 12:31
Copy link
Copy Markdown
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.7

csoulios added a commit to csoulios/elasticsearch that referenced this pull request Mar 30, 2023
)

Make the downsample action skip the non-time-series indices

As discussed in elastic#93123 (comment):

    ... we think the downsample action should skip backing indices that are 
   not part of a time series data stream. We do a similar approach with the 
   unfollow action where non follower indices are skipped. And this allows to 
  have the same policy for both regular data streams and tsdb data streams.

Closes elastic#93123
elasticsearchmachine pushed a commit that referenced this pull request Mar 30, 2023
…94902)

Make the downsample action skip the non-time-series indices

As discussed in #93123 (comment):

    ... we think the downsample action should skip backing indices that are 
   not part of a time series data stream. We do a similar approach with the 
   unfollow action where non follower indices are skipped. And this allows to 
  have the same policy for both regular data streams and tsdb data streams.

Closes #93123
csoulios added a commit that referenced this pull request Apr 18, 2023
Test DownsampleActionIT.testRollupIndex() has been failing after PR #94835 got merged.
Most probably, because The PR removed the following line from the testRollupIndex() method
```
updateClusterSettings(client(), Settings.builder().put("indices.lifecycle.poll_interval", "5s").build());
```
To fix the test, we set the indices.lifecycle.poll_interval setting before all tests run.

Closes #95156
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

>bug :Data Management/ILM+SLM DO NOT USE. Use ":StorageEngine/ILM" or ":Distributed Coordination/SLM" instead. :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data :StorageEngine/TSDB You know, for Metrics Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Data Management (obsolete) DO NOT USE. This team no longer exists. v8.7.1 v8.8.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Index is stuck in the downsample ILM action

4 participants