Make reindexing managed by a persistent task#43382
Make reindexing managed by a persistent task#43382Tim-Brooks merged 66 commits intoelastic:reindex_v2from
Conversation
|
Pinging @elastic/es-distributed |
|
This PR still needs a few cleanups. I will assign reviewers and comment when it is ready. |
|
@henningandersen @ywelsch I have adjusted this PR for some change in the direction of returning the task-id from the allocated persistent task. I have not adjusted it to work with rethrottling yet. But it appears that the other test are passing. |
henningandersen
left a comment
There was a problem hiding this comment.
LGTM.
This approach looks good. Apart from minor comments I think it is ready to merge to feature branch. Feel free to defer some of them to follow-ups if that is easier.
| * task can't totally validate until it starts but this is better than | ||
| * nothing. | ||
| */ | ||
| ActionRequestValidationException validationException = internal.getReindexRequest().validate(); |
There was a problem hiding this comment.
This can go outside the if now.
| } | ||
| } | ||
|
|
||
| public static class Response extends AcknowledgedResponse { |
There was a problem hiding this comment.
It seems we either return acknowledged=true or an exception. Could this just be an ActionResponse then?
|
|
||
| import java.util.List; | ||
|
|
||
| public class TransportGetReindexJobTaskAction extends TransportTasksAction<ReindexTask, GetReindexJobTaskAction.Request, |
There was a problem hiding this comment.
This looks unused now?
| rethrottle(logger, clusterService.localNode().getId(), client, bulkByScrollTask, request.getRequestsPerSecond(), listener); | ||
| } else if (task instanceof ReindexTask) { | ||
| BulkByScrollTask childTask = null; | ||
| for (Task task1 : taskManager.getTasks().values()) { |
There was a problem hiding this comment.
task1 rename to childCandidate?
| return jobException; | ||
| } | ||
|
|
||
| public TaskId getTaskId() { |
There was a problem hiding this comment.
Maybe rename to getEphemeralTaskId(), it could make call-site code a little easier to read (at least I mixed it up with the persistent task id).
| builder.field("running_time_in_nanos", runningTimeNanos); | ||
| builder.field("cancellable", cancellable); | ||
| if (parentTaskId.isSet()) { | ||
| if (parentTaskId.isSet() && parentTaskId.getNodeId().equals("cluster") == false) { |
There was a problem hiding this comment.
Let us add a todo here, to ensure we revisit this. I think this means that cluster: parent ids no longer appear in the _tasks output, which could harm diagnostics/troubleshooting.
|
@ywelsch - This is probably ready for another look. I have address Henning's comments. The primary issue currently is due to a task submitting another task, there are some races with the child tasks being available for a follow-up rethrottle action. I have currently resolved this for testing purposes by a spin loop waiting for all the tasks to be available in Rethrottle action. I think the fix is to extract the |
ywelsch
left a comment
There was a problem hiding this comment.
I've left a few more minor comments, and one item (rethrottle) that we probably need to discuss
modules/reindex/src/main/java/org/elasticsearch/index/reindex/StartReindexJobAction.java
Outdated
Show resolved
Hide resolved
modules/reindex/src/main/java/org/elasticsearch/index/reindex/TransportReindexAction.java
Outdated
Show resolved
Hide resolved
modules/reindex/src/main/java/org/elasticsearch/index/reindex/TransportReindexAction.java
Outdated
Show resolved
Hide resolved
modules/reindex/src/main/java/org/elasticsearch/index/reindex/TransportRethrottleAction.java
Show resolved
Hide resolved
...es/reindex/src/main/java/org/elasticsearch/index/reindex/TransportStartReindexJobAction.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/reindex/ReindexTask.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/reindex/ReindexTask.java
Outdated
Show resolved
Hide resolved
|
@ywelsch I made the changes. |
This is related to #42612. Currently the reindexing transport action
creates a task on the local coordinator node. Unfortunately this is not
resilient to coordinator node failures. This commit adds a new action
that creates a reindexing job as a persistent task.