Skip to content

Downsampling test fix: reorder policy and index creation.#109787

Merged
elasticsearchmachine merged 4 commits intoelastic:mainfrom
gmarouli:fix-wrong-order-policy-index-creation
Jun 17, 2024
Merged

Downsampling test fix: reorder policy and index creation.#109787
elasticsearchmachine merged 4 commits intoelastic:mainfrom
gmarouli:fix-wrong-order-policy-index-creation

Conversation

@gmarouli
Copy link
Copy Markdown
Contributor

This PR fixes two test failures #103981 & #105437 and refactors the code a bit to make things more explicit.

What was the issue
These tests were creating an index with a policy before that policy was created. This could cause an issue if ILM would run after the index was created but before the policy was created.

When ILM runs before the policy is added, the following happen:

  • the index encounters an error the ILM state sets that the current step is null, which makes sense since there is no policy to retrieve a step from.
  • A null step does not qualify to be executed periodically, which also makes sense because probably nothing changed, so chances are the index will remain in this state.
  • The test keeps waiting for something to happen, but this is not happening because no cluster state updates are coming like they would have if this was a "real" cluster.
  • Until the test tear down starts, then the index gets updates with the ILM policy but it's a bit too late.

The previous scenario is confirmed by the logging too.

----> The index gets created referring a policy that does not exist yet, ILM runs at least twice before the policy is there
[2024-06-12T20:14:28,857][....] [index-sanohmhwxl] creating index, ......
[2024-06-12T20:14:28,870][....] [index-sanohmhwxl] retrieved current step key: null
[2024-06-12T20:14:28,871][....] unable to retrieve policy [policy-tohpA] for index [index-sanohmhwxl], recording this in step_info for this index java.lang.IllegalArgumentException: policy [policy-tohpA] does not exist

-----> Only now the policy is added
[2024-06-12T20:14:29,024][....] adding index lifecycle policy [policy-tohpA]

-----> ILM is running periodically but because the current step is null it ignores it
[2024-06-12T20:15:23,791][....] job triggered: ilm, 1718223323790, 1718223323790
[2024-06-12T20:15:23,791][....] retrieved current step key: null
[2024-06-12T20:15:23,791][....] maybe running periodic step (InitializePolicyContextStep) with current step {"phase":"new","action":"init","name":"init"}

This can also be locally reproduced by adding a 5s thread sleep before adding the policy.

The fix
Adding a non existing policy to an index is a not a supported path. For this reason, we refactored the test to reflect a more realistic scenario.

  • We add the policy as an argument in private void createIndex(String index, String alias, String policy, boolean isTimeSeries). This way it's clear that a policy could be added.
  • We created the policy before adding the index, it does not appear that adding the policy later is crucial for the test, so simplifying it sounded like a good idea.
  • Simplified testRollupIndexInTheHotPhaseWithoutRollover that ensures that a downsampling action cannot be added in the hot phase without rollover. An index is not necessary for this test, so again simplifying it makes the purpose of the test more clear.

Fixes: #103981
Fixes: #105437

@gmarouli gmarouli added >test Issues or PRs that are addressing/adding tests auto-backport Automatically create backport pull requests when merged :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data v8.14.2 labels Jun 17, 2024
@gmarouli gmarouli requested a review from kkrik-es June 17, 2024 07:11
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Copy link
Copy Markdown
Member

@kkrik-es kkrik-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mary!

@gmarouli gmarouli added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 17, 2024
@gmarouli gmarouli changed the title Reorder policy and index creation. Downsampling test fix: reorder policy and index creation. Jun 17, 2024
@elasticsearchmachine elasticsearchmachine merged commit 05f80ef into elastic:main Jun 17, 2024
@gmarouli gmarouli deleted the fix-wrong-order-policy-index-creation branch June 17, 2024 07:56
@elasticsearchmachine
Copy link
Copy Markdown
Collaborator

💚 Backport successful

Status Branch Result
8.14

gmarouli added a commit to gmarouli/elasticsearch that referenced this pull request Jun 17, 2024
…9787)

This PR fixes two test failures
elastic#103981 &
elastic#105437 and refactors the
code a bit to make things more explicit.

**What was the issue** These tests were creating an index with a policy
before that policy was created. This could cause an issue if ILM would
run after the index was created but before the policy was created. 

When ILM runs before the policy is added, the following happen:

-  the index encounters an error the ILM state sets that the current step is `null`, which makes sense since there is no policy to retrieve a step from. 
- A `null` step does not qualify to be executed periodically, which also makes sense because probably nothing changed, so chances are the index will remain in this state.
- The test keeps waiting for something to happen, but this is not happening because no cluster state updates are coming like they would have if this was a "real" cluster. 
- Until the test tear down starts, then the index gets updates with the ILM policy but it's a bit too late.

The previous scenario is confirmed by the logging too.

```
----> The index gets created referring a policy that does not exist yet, ILM runs at least twice before the policy is there
[2024-06-12T20:14:28,857][....] [index-sanohmhwxl] creating index, ......
[2024-06-12T20:14:28,870][....] [index-sanohmhwxl] retrieved current step key: null
[2024-06-12T20:14:28,871][....] unable to retrieve policy [policy-tohpA] for index [index-sanohmhwxl], recording this in step_info for this index java.lang.IllegalArgumentException: policy [policy-tohpA] does not exist

-----> Only now the policy is added
[2024-06-12T20:14:29,024][....] adding index lifecycle policy [policy-tohpA]

-----> ILM is running periodically but because the current step is null it ignores it
[2024-06-12T20:15:23,791][....] job triggered: ilm, 1718223323790, 1718223323790
[2024-06-12T20:15:23,791][....] retrieved current step key: null
[2024-06-12T20:15:23,791][....] maybe running periodic step (InitializePolicyContextStep) with current step {"phase":"new","action":"init","name":"init"}
```

This can also be locally reproduced by adding a 5s thread sleep before
adding the policy. 

**The fix** Adding a non existing policy to an index is a not a
supported path. For this reason, we refactored the test to reflect a
more realistic scenario. 

- We add the policy as an argument in `private void createIndex(String index, String alias, String policy, boolean isTimeSeries)`. This way it's clear that a policy could be added.
- We created the policy before adding the index, it does not appear that adding the policy later is crucial for the test, so simplifying it sounded like a good idea.
- Simplified `testRollupIndexInTheHotPhaseWithoutRollover` that ensures that a downsampling action cannot be added in the hot phase without rollover. An index is not necessary for this test, so again simplifying it makes the purpose of the test more clear.

Fixes: elastic#103981 Fixes:
elastic#105437
elasticsearchmachine pushed a commit that referenced this pull request Jun 17, 2024
…109791)

This PR fixes two test failures
#103981 &
#105437 and refactors the
code a bit to make things more explicit.

**What was the issue** These tests were creating an index with a policy
before that policy was created. This could cause an issue if ILM would
run after the index was created but before the policy was created. 

When ILM runs before the policy is added, the following happen:

-  the index encounters an error the ILM state sets that the current step is `null`, which makes sense since there is no policy to retrieve a step from. 
- A `null` step does not qualify to be executed periodically, which also makes sense because probably nothing changed, so chances are the index will remain in this state.
- The test keeps waiting for something to happen, but this is not happening because no cluster state updates are coming like they would have if this was a "real" cluster. 
- Until the test tear down starts, then the index gets updates with the ILM policy but it's a bit too late.

The previous scenario is confirmed by the logging too.

```
----> The index gets created referring a policy that does not exist yet, ILM runs at least twice before the policy is there
[2024-06-12T20:14:28,857][....] [index-sanohmhwxl] creating index, ......
[2024-06-12T20:14:28,870][....] [index-sanohmhwxl] retrieved current step key: null
[2024-06-12T20:14:28,871][....] unable to retrieve policy [policy-tohpA] for index [index-sanohmhwxl], recording this in step_info for this index java.lang.IllegalArgumentException: policy [policy-tohpA] does not exist

-----> Only now the policy is added
[2024-06-12T20:14:29,024][....] adding index lifecycle policy [policy-tohpA]

-----> ILM is running periodically but because the current step is null it ignores it
[2024-06-12T20:15:23,791][....] job triggered: ilm, 1718223323790, 1718223323790
[2024-06-12T20:15:23,791][....] retrieved current step key: null
[2024-06-12T20:15:23,791][....] maybe running periodic step (InitializePolicyContextStep) with current step {"phase":"new","action":"init","name":"init"}
```

This can also be locally reproduced by adding a 5s thread sleep before
adding the policy. 

**The fix** Adding a non existing policy to an index is a not a
supported path. For this reason, we refactored the test to reflect a
more realistic scenario. 

- We add the policy as an argument in `private void createIndex(String index, String alias, String policy, boolean isTimeSeries)`. This way it's clear that a policy could be added.
- We created the policy before adding the index, it does not appear that adding the policy later is crucial for the test, so simplifying it sounded like a good idea.
- Simplified `testRollupIndexInTheHotPhaseWithoutRollover` that ensures that a downsampling action cannot be added in the hot phase without rollover. An index is not necessary for this test, so again simplifying it makes the purpose of the test more clear.

Fixes: #103981 Fixes:
#105437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:StorageEngine >test Issues or PRs that are addressing/adding tests v8.14.2 v8.15.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] DownsampleActionIT testRollupIndex failing [CI] DownsampleActionIT testRollupNonTSIndex failing

3 participants