
Run downsampling using persistent tasks#97557

Merged
martijnvg merged 104 commits into elastic:main from salvatore-campagna:feature/93582-downsample-persistent-task
Aug 15, 2023

Conversation

@salvatore-campagna
Contributor

@salvatore-campagna salvatore-campagna commented Jul 11, 2023

Add support for running downsampling as persistent tasks. We would like the ability to resume downsampling tasks instead of starting from scratch in case of ("retriable") failures.

Instead of keeping track of the task's state, we query the downsampling target index before starting the actual downsampling task, fetching the latest indexed document and its tsid. If we find one, we restart from that tsid, possibly overwriting a subset of the documents already indexed (documents with the same tsid whose timestamp is smaller than that of the latest document).

Querying the downsampling target index is possible after introducing a predictable naming scheme for the target index. Previously, part of the target index name was random; a predictable name is required to be able to resume from where the task left off.

Note that since documents are ordered by (tsid, timestamp), we do not include the timestamp in the query, to avoid skipping downsampling of documents with a larger tsid but a smaller timestamp.
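The resume logic above can be illustrated with a small, self-contained sketch (plain Python, not the actual Elasticsearch code): documents are processed in (tsid, timestamp) order, and on resume we restart from the last indexed tsid only, deliberately leaving the timestamp out of the filter.

```python
# Illustrative sketch of the resume logic (not the shipped Java code).
# Documents are modelled as (tsid, timestamp) tuples, processed in that order.

def resume_candidates(docs, last_tsid):
    """Documents still to (re)process after a restart: everything whose tsid
    is >= the last indexed tsid. Documents with the same tsid and a smaller
    timestamp may be downsampled twice (overwritten), which is safe."""
    return [d for d in sorted(docs) if d[0] >= last_tsid]

docs = [("a", 10), ("a", 20), ("b", 5), ("b", 30)]

# Suppose the task crashed after indexing ("a", 20): the latest document in
# the target index has tsid "a".
remaining = resume_candidates(docs, last_tsid="a")
assert remaining == [("a", 10), ("a", 20), ("b", 5), ("b", 30)]

# Adding a timestamp filter as well would be wrong: it would skip ("b", 5),
# a document with a larger tsid but a smaller timestamp.
wrong = [d for d in sorted(docs) if d[0] >= "a" and d[1] > 20]
assert wrong == [("b", 30)]
```

The last assertion shows the failure mode the description warns about: filtering on the timestamp too would silently drop ("b", 5).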

Note also that we are not removing TransportDownsampleIndexerAction, for backward compatibility: an older master node running a previous version would not use persistent tasks, and an API call would otherwise fail with something like "missing downsample action".

We are also adding an additional downsample REST query parameter, timeout. It lets a user set the maximum time to wait for a downsampling task to complete before returning a timeout error. Note that when the timeout triggers, the task still runs under control of the executor; only the wait for it to finish results in a timeout. The default value for this timeout is 1 day.
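For illustration, a request using the new parameter might look like the following. The index names and the exact request shape here are made up; only the timeout query parameter itself is what this PR describes.

```
POST /my-metrics-index/_downsample/downsample-my-metrics-index-1h?timeout=2h
{
  "fixed_interval": "1h"
}
```

If the task is still running after 2 hours, the call returns a timeout error, but the task itself keeps running under the persistent task executor.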

Resolves #93582

@salvatore-campagna salvatore-campagna self-assigned this Jul 11, 2023
@salvatore-campagna salvatore-campagna added :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data >enhancement labels Jul 11, 2023
@elasticsearchmachine elasticsearchmachine added v8.10.0 Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) labels Jul 11, 2023
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytics-geo (Team:Analytics)

@elasticsearchmachine
Collaborator

Hi @salvatore-campagna, I've created a changelog YAML for you.

@elasticsearchmachine
Collaborator

Hi @salvatore-campagna, I've updated the changelog YAML for you.

@salvatore-campagna
Contributor Author

I removed x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/downsample/TransportDownsampleIndexerAction.java but I will restore it for BWC.


@Override
public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
// TODO: update persistent task state (consider multiple bulk requests in flight). Need to access the tsid 'per-bulk'
Contributor Author

No need to keep track of in-flight indexing bulk requests.


@Override
protected void nodeOperation(final AllocatedPersistentTask task, final RollupShardTaskParams params, final PersistentTaskState state) {
final RollupShardPersistentTaskState taskState = state == null
Contributor Author

Here we can query the target index getting the latest indexed document and its tsid and inject the tsid as the starting tsid.

new ActionHandler<>(GetRollupIndexCapsAction.INSTANCE, TransportGetRollupIndexCapsAction.class),
new ActionHandler<>(XPackUsageFeatureAction.ROLLUP, RollupUsageTransportAction.class),
new ActionHandler<>(XPackInfoFeatureAction.ROLLUP, RollupInfoTransportAction.class),
new ActionHandler<>(DownsampleIndexerAction.INSTANCE, TransportDownsampleIndexerAction.class),
Contributor Author

Will add this back for BWC.

indicesService
);

this.indicesService = indicesService;
Contributor Author

This is executed before getPersistentTasksExecutor

PersistentTaskState {

public static final String NAME = RollupShardTask.TASK_NAME;
private static final ParseField ROLLUP_SHARD_INDEXER_STATUS = new ParseField("status");
Contributor Author

We need the task status in addition to the tsid, to avoid resuming a task that ended as FAILED or CANCELLED. We would like to avoid being stuck in a loop resuming tasks that failed for "non-retriable" reasons.
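The resume decision described here can be sketched as follows. The status names are illustrative, loosely modelled on RollupShardIndexerStatus; this is not the shipped code.

```python
# Illustrative sketch: only resume when the persisted status says the task
# was still in flight. Terminal statuses must not be retried.
from enum import Enum

class Status(Enum):
    INITIALIZED = 0
    STARTED = 1
    FAILED = 2      # terminal: non-retriable failure
    CANCELLED = 3   # terminal: cancelled by a user or the system
    COMPLETED = 4   # terminal: nothing left to do

def should_resume(persisted_status):
    # Resuming FAILED/CANCELLED would loop forever on a non-retriable
    # failure; COMPLETED needs no work at all.
    return persisted_status in (Status.INITIALIZED, Status.STARTED)

assert should_resume(Status.STARTED)
assert not should_resume(Status.FAILED)
assert not should_resume(Status.CANCELLED)
```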

@salvatore-campagna
Contributor Author

Will get rid of all deleteIndex.

Member

@martijnvg martijnvg left a comment

I did a first review round. Thanks for working on this!

I think we should write a java integration test, that tests performing downsampling while all nodes get restarted. A similar test exists for CCR, see: FollowerFailOverIT.

// Here we make sure we assign the task to the actual node holding the shard identified by
// the downsampling task shard id.
final ShardId shardId = params.shardId();
final ShardRouting shardRouting = clusterState.routingTable().shardRoutingTable(shardId).shard(shardId.id());
Member

I think we also need to check whether the shard routing is started here.

Contributor Author

@salvatore-campagna salvatore-campagna Jul 13, 2023

What do you mean? Checking ShardRoutingState?

null,
Collections.emptyMap()
);
return new BoolQueryBuilder().filter(new TermQueryBuilder(TimeSeriesIdFieldMapper.NAME, this.state.tsid()))
Member

I think we can use Lucene query api instead of Elasticsearch's QueryBuilder api. That way there is no need to create a SearchExecutionContext here. For example:

return new BooleanQuery.Builder().add(new TermQuery(new Term(TimeSeriesIdFieldMapper.NAME, this.state.tsid())), BooleanClause.Occur.FILTER).build();

Member

Actually, this query will only match documents that have exactly the specified tsid. I think we want to match all documents with the specified tsid and beyond, so we need a range query. If that is the case, we need SortedSetDocValuesField.newSlowRangeQuery(...) because the _tsid field isn't indexed.

Contributor

Hey, just a heads up, we've got this speedy query built for sorted indices that could totally work here too:
link.
Adapting it to our needs shouldn't be a big deal. And if we find it handy, we might even think about adding it directly into Lucene, kind of like what's been done with IndexSortSortedSetDocValuesRangeQuery.

Member

Thanks for pointing this out Jim. The main focus is to get this PR merged sooner than later. But we will definitely change this in a followup change.

);
return new BoolQueryBuilder().filter(new TermQueryBuilder(TimeSeriesIdFieldMapper.NAME, this.state.tsid()))
.toQuery(searchExecutionContext);
} else if (this.state.done()) {
Member

Maybe return in execute() instead of here? That way we can check without running a query.


if (task.getNumIndexed() != task.getNumSent()) {
task.setRollupShardIndexerStatus(RollupShardIndexerStatus.FAILED);
task.updatePersistentTaskState(
Member

Maybe we should do the task.updatePersistentTaskState(...) call in RollupShardPersistentTaskExecutor#nodeOperation(...)? We can then just catch the exceptions thrown here there and set the status to failed.


@Override
public void writeTo(final StreamOutput out) throws IOException {
out.writeVInt(this.ordinal);
Member

Did we serialise this enum as a string before?

Contributor Author

I will check; in that case I will revert it.

Contributor Author

@salvatore-campagna salvatore-campagna Jul 14, 2023

This was not a Writeable before, so yes, we were using the string representation of the enum.
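The tradeoff between the two wire formats can be shown with a small illustrative sketch (plain Python, not Elasticsearch code): ordinal-based serialization is compact but silently breaks if enum constants are ever reordered between versions, while the name-based (string) form survives reordering.

```python
# Illustrative sketch of ordinal vs. name serialization across versions.
from enum import Enum

class StatusV1(Enum):   # old node's view of the enum
    STARTED = 0
    FAILED = 1

class StatusV2(Enum):   # new node inserted a constant before FAILED
    STARTED = 0
    CANCELLED = 1
    FAILED = 2

# Ordinal on the wire: V1 writes FAILED as ordinal 1 ...
wire_ordinal = list(StatusV1).index(StatusV1.FAILED)
decoded = list(StatusV2)[wire_ordinal]
assert decoded is StatusV2.CANCELLED   # ... V2 decodes the wrong constant

# Name on the wire: still correct after the reordering.
wire_name = StatusV1.FAILED.name
assert StatusV2[wire_name] is StatusV2.FAILED
```

This is why changing an enum from string to ordinal serialization deserves the BWC scrutiny it gets in this thread: the ordinal form bakes the declaration order into the protocol.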

@martijnvg martijnvg requested a review from andreidan August 14, 2023 04:54
Contributor

@andreidan andreidan left a comment

Thanks for iterating on this Martijn, the ILM side LGTM

@@ -93,11 +86,9 @@ public void performAction(
);
listener.onResponse(null);
}
Contributor

If the index doesn't have the SUCCESS tag but we're re-executing the downsample ILM step we should call performDownsampleIndex(indexName, downsampleIndexName, ActionListener.wrap(listener::onResponse, listener::onFailure)); to start waiting for a result or get an error message from the transport action

);
return;
}
final String downsampleIndexName = generateDownsampleIndexName(indexName, fixedInterval);
Contributor

This just occurred to me: if we have downsampling configured multiple times, say in hot and warm, the hot phase will generate the target index name downsample-myindex-1h, and when we get to the warm phase to execute downsampling again we'll end up with the index name downsample-downsample-myindex-1h-3h (as we don't use the original index name anymore).

This problem already exists in ILM but perhaps becomes a bit more visible with the new naming convention? (Currently in ILM we can end up with names like downsample-8hit-downsample-eruv-.ds-my-data-2023.08.14-000001.)

Perhaps the downsampling transport action should configure the provided-name setting index.provided_name to maintain the original index name? We could then make use of it in ILM and elsewhere to generate the new index name based on the originally provided name.

Member

Right, this will look strange (and already does). However, I don't think the downsample action should be responsible for this, because the contract of this API is that the target index name is provided.

I think we should make the DownsampleStep smarter so that it generates a better name based on the source index. If the source index doesn't contain the downsample prefix, it generates the name the same way it does now in this PR. If the source index does contain the downsample prefix, it parses out the original source index and generates a name from that.

I think we should do this in a follow-up PR. This PR is already large and the problem already exists today. I will create a follow-up PR and label it as a bug. We can then backport it as well after the FF.
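The prefix-stripping idea could be sketched like this. The helper below is hypothetical (not the shipped generateDownsampleIndexName) and assumes the downsample-&lt;index&gt;-&lt;interval&gt; convention mentioned above.

```python
# Hypothetical sketch of a prefix-aware name generator: derive the target
# name from the original source index so repeated downsampling (e.g. hot
# then warm) does not stack "downsample-" prefixes.
PREFIX = "downsample-"

def generate_downsample_index_name(source_index, fixed_interval):
    if source_index.startswith(PREFIX):
        # Strip the prefix and the previous interval suffix:
        # "downsample-myindex-1h" -> "myindex"
        core = source_index[len(PREFIX):]
        core, _, _ = core.rpartition("-")
        source_index = core
    return f"{PREFIX}{source_index}-{fixed_interval}"

assert generate_downsample_index_name("myindex", "1h") == "downsample-myindex-1h"
# Downsampling the downsample index again keeps a single prefix:
assert generate_downsample_index_name("downsample-myindex-1h", "3h") == "downsample-myindex-3h"
```

A real implementation would need to handle index names that themselves contain hyphens more carefully (for example via index.provided_name, as suggested above), but the sketch shows the shape of the fix.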

Member

Or just use provided index name in DownsampleStep, since current cluster state is available.

Contributor

Great point, thanks for working on this!

@martijnvg martijnvg merged commit a52ab89 into elastic:main Aug 15, 2023
@andreidan
Contributor

Thanks @salvatore-campagna and @martijnvg for the great work here! This is great!

elasticsearchmachine pushed a commit that referenced this pull request Aug 15, 2023
If the downsampling index exists but is still downsampling, we should invoke the downsample transport action again (and wire up the request listener such that the ILM listener gets notified of success or failure).

This modifies the test to assert the ILM listener is invoked and removes an invariant that no longer holds in ILM after #97557.

Marking as non-issue as this hasn't been released yet.
csoulios pushed a commit to csoulios/elasticsearch that referenced this pull request Aug 18, 2023

Closes elastic#93582

---------

Co-authored-by: Martijn van Groningen <martijn.v.groningen@gmail.com>
csoulios pushed a commit to csoulios/elasticsearch that referenced this pull request Aug 18, 2023
salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this pull request Mar 30, 2024
All nodes in the mixed cluster need to be at least on version 8.10, since PR elastic#97557 introduced execution of downsampling tasks using the persistent task framework, which is incompatible with how execution was coordinated before.
salvatore-campagna added a commit that referenced this pull request Mar 30, 2024
salvatore-campagna added a commit to salvatore-campagna/elasticsearch that referenced this pull request Mar 30, 2024
elasticsearchmachine pushed a commit that referenced this pull request Mar 31, 2024
…06944)


Labels

>enhancement :StorageEngine/Downsampling Downsampling (replacement for rollups) - Turn fine-grained time-based data into coarser-grained data Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.10.0


Development

Successfully merging this pull request may close these issues.

Integrate downsample operation with persistent tasks

6 participants