Retrying with a backward compatible task type on unknown task type error in parallel indexing by jihoonson · Pull Request #8905 · apache/druid

jihoonson · 2019-11-19T09:04:45Z

Description

I think I don't like this patch much, but don't see a better solution. Please leave comments if anyone has a better idea.

TaskMonitor.submit() creates a sub task for a given spec and submits it to the overlord. If the task type is unknown to the overlord (which can happen during a rolling update), the overlord returns an HTTP error to TaskMonitor. HttpIndexingServiceClient throws an IllegalStateException if the HTTP response is not 200.

To handle this, I added a new method, SubTaskSpec.newSubTaskWithBackwardCompatibleType(), which is called if submitting the task created by SubTaskSpec.newSubTask() fails with an IllegalStateException whose message starts with "Could not resolve type id".
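A minimal sketch of the fallback described above, not the actual TaskMonitor code. The `SubTaskSpec` and `TaskClient` interfaces here are reduced, hypothetical stand-ins, and `"new_type"` / `"old_type"` are placeholder type names rather than the real Druid task types. The only details taken from this PR are the two factory method names, the exception class, and the message prefix.

```java
import java.util.ArrayList;
import java.util.List;

public class BackwardCompatibleSubmit
{
  // Hypothetical stand-in for SubTaskSpec, reduced to the two factory methods named above.
  interface SubTaskSpec<T>
  {
    T newSubTask(int numAttempts);

    T newSubTaskWithBackwardCompatibleType(int numAttempts);
  }

  // Hypothetical stand-in for the client that posts a task to the overlord.
  // Assumption: it throws IllegalStateException when the response is not 200.
  interface TaskClient<T>
  {
    void runTask(T task);
  }

  static final String UNKNOWN_TYPE_PREFIX = "Could not resolve type id";

  static <T> T submit(SubTaskSpec<T> spec, TaskClient<T> client, int numAttempts)
  {
    T task = spec.newSubTask(numAttempts);
    try {
      client.runTask(task);
      return task;
    }
    catch (IllegalStateException e) {
      String message = e.getMessage();
      if (message == null || !message.startsWith(UNKNOWN_TYPE_PREFIX)) {
        throw e; // unrelated failure: do not mask it
      }
      // The overlord does not know the new type yet, so resubmit under the old one.
      T legacyTask = spec.newSubTaskWithBackwardCompatibleType(numAttempts);
      client.runTask(legacyTask);
      return legacyTask;
    }
  }

  // Simulates an old overlord that only accepts the placeholder type "old_type".
  public static String demo(List<String> attempted)
  {
    SubTaskSpec<String> spec = new SubTaskSpec<String>()
    {
      @Override
      public String newSubTask(int numAttempts)
      {
        return "new_type";
      }

      @Override
      public String newSubTaskWithBackwardCompatibleType(int numAttempts)
      {
        return "old_type";
      }
    };
    TaskClient<String> oldOverlord = type -> {
      attempted.add(type);
      if (!"old_type".equals(type)) {
        throw new IllegalStateException(UNKNOWN_TYPE_PREFIX + " '" + type + "'");
      }
    };
    return submit(spec, oldOverlord, 0);
  }

  public static void main(String[] args)
  {
    List<String> attempted = new ArrayList<>();
    String accepted = demo(attempted);
    System.out.println("attempted=" + attempted + ", accepted=" + accepted);
  }
}
```

Matching on the message prefix keeps the fallback narrow: any other IllegalStateException is rethrown unchanged, so the legacy type is tried only when the overlord rejected the type itself.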

This PR has:

- been self-reviewed.
  - using the concurrency checklist (Remove this item if the PR doesn't have any relation to concurrency.)
- added documentation for new or modified features or behaviors.
- added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- added or updated version, license, or notice information in licenses.yaml
- added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- added unit tests or modified existing tests to cover new code paths.
- added integration tests.
- been tested in a test Druid cluster.


gianm · 2019-11-19T23:44:52Z

...c/main/java/org/apache/druid/indexing/common/task/batch/parallel/SinglePhaseSubTaskSpec.java

+      @Override
+      public String getType()
+      {
+        return SinglePhaseSubTask.OLD_TYPE_NAME;


Will this really affect how the task is serialized? I thought the type field would end up based solely on whatever Jackson thinks the type code for that class is, based on @JsonTypeName or @JsonSubTypes annotations.

Could you include a test that makes sure it serializes correctly? (Maybe serialize it, then deserialize as a Map and check the type field.)

jihoonson · 2019-11-20

Returning a different type works, but the class has to be registered with Jackson properly. I fixed it and added a unit test.

gianm · 2019-11-19T23:50:52Z

...-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/TaskMonitor.java

+  {
+    T task = spec.newSubTask(numAttempts);
+    try {
+      indexingServiceClient.runTask(task);


Will this approach be able to retry immediately, or will it have to exhaust retries in the indexingServiceClient first? (The loop in DruidLeaderClient)

Ideally, this detects a problem on the first submission, and doesn't need to exhaust retries before it moves on to trying the backwards-compatible type.

Could you check it, if you haven't already?

jihoonson · 2019-11-20

This won't retry immediately; it will first exhaust the retries in DruidLeaderClient. Yes, ideally it should detect the problem before it retries, but I'm not sure that refactoring is worth doing because 1) it's not easy to teach DruidLeaderClient the caller's logic, 2) this happens only during a particular type of rolling update, and 3) the retries won't take much time compared to the total indexing time.
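To make the ordering in this thread concrete, here is a rough sketch of a type-agnostic retry loop sitting underneath the fallback. `postWithRetries` and `MAX_CLIENT_RETRIES` are hypothetical; the real retry count and any backoff live in DruidLeaderClient and are not modeled here. The point is only that the unknown-type error reaches the caller after the inner loop gives up.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RetryThenFallback
{
  // Hypothetical retry limit; the real value and any backoff live in DruidLeaderClient.
  public static final int MAX_CLIENT_RETRIES = 5;

  // Hypothetical stand-in for the client-side retry loop. It knows nothing about
  // task types, so it retries an unknown-type rejection like any other failure.
  static void postWithRetries(Runnable post)
  {
    for (int attempt = 1; ; attempt++) {
      try {
        post.run();
        return;
      }
      catch (IllegalStateException e) {
        if (attempt >= MAX_CLIENT_RETRIES) {
          throw e;
        }
      }
    }
  }

  // Returns how many requests reach a simulated old overlord before a task is accepted.
  public static int countRequests()
  {
    AtomicInteger requests = new AtomicInteger();
    try {
      // Submission with the new (placeholder) type: rejected on every attempt.
      postWithRetries(() -> {
        requests.incrementAndGet();
        throw new IllegalStateException("Could not resolve type id 'new_type'");
      });
    }
    catch (IllegalStateException e) {
      // Only now does the caller see the error and fall back to the old type,
      // which the simulated overlord accepts on the first request.
      postWithRetries(requests::incrementAndGet);
    }
    return requests.get();
  }

  public static void main(String[] args)
  {
    System.out.println("requests sent: " + countRequests());
  }
}
```

Under these assumptions the cost of the fallback is one full round of client retries plus a single extra request per sub task, which is the overhead weighed in the reply above.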

jihoonson · 2019-11-20T02:24:57Z

@gianm thanks for the review. I also tested this patch on a cluster with a 0.15.0 overlord and middleManagers running this patch.

gianm

👍 with the latest changes, thanks @jihoonson

…ror in parallel indexing (apache#8905)
* Retrying with a backward compatible task type on unknown task type error in parallel indexing
* Register legacy class; add a serde test

…ask type error in parallel indexing (#8905) (#8949)
* Retrying with a backward compatible task type on unknown task type error in parallel indexing (#8905)
* Retrying with a backward compatible task type on unknown task type error in parallel indexing
* Register legacy class; add a serde test
* Backport fix, use firehoses

Retrying with a backward compatible task type on unknown task type error in parallel indexing

0d8f7fc

jihoonson added the Bug and Area - Batch Ingestion labels Nov 19, 2019

jihoonson added this to the 0.17.0 milestone Nov 19, 2019

gianm reviewed Nov 19, 2019


Register legacy class; add a serde test

e9d8d8e

gianm approved these changes Nov 20, 2019


gianm merged commit baefc65 into apache:master Nov 20, 2019

jon-wei mentioned this pull request Nov 27, 2019

[Backport] Retrying with a backward compatible task type on unknown task type error in parallel indexing (#8905) #8949

Merged

jon-wei mentioned this pull request Nov 29, 2019

0.16.1-incubating release notes #8972

Closed

gianm merged 2 commits into apache:master from jihoonson:superbatch-rolling-update
