Retrying with a backward compatible task type on unknown task type error in parallel indexing#8905
Conversation
…ror in parallel indexing
| @Override | ||
| public String getType() | ||
| { | ||
| return SinglePhaseSubTask.OLD_TYPE_NAME; |
There was a problem hiding this comment.
Will this really affect how the task is serialized? I thought the type field would end up based solely on whatever Jackson thinks the type code for that class is, based on @JsonTypeName or @JsonSubTypes annotations.
Could you include a test that makes sure it serializes correctly? (Maybe serialize it, then deserialize as a Map and check the type field.)
There was a problem hiding this comment.
Returning a different type works, but the class should be registered on Jackson properly. I fixed it and added a unit test.
| { | ||
| T task = spec.newSubTask(numAttempts); | ||
| try { | ||
| indexingServiceClient.runTask(task); |
There was a problem hiding this comment.
Will this approach be able to retry immediately, or will it have to exhaust retries in the indexingServiceClient first? (The loop in DruidLeaderClient)
Ideally, this detects a problem on the first submission, and doesn't need to exhaust retries before it moves on to trying the backwards-compatible type.
Could you check it, if you haven't already?
There was a problem hiding this comment.
This won't retry immediately but will exhaust retries in DruidLeaderClient. Yes, ideally it should be able to detect the problem before it retries, but I'm not sure whether that refactoring is worth to do because 1) it's not easy to teach the logic of the caller to DruidLeaderClient, 2) it will happen only during a particular type of rolling update, 3) and retries won't take much time compared to the total indexing time.
|
@gianm thanks for the review. I also tested this patch with a cluster of an overlord of 0.15.0 and middleManagers of this patch. |
gianm
left a comment
There was a problem hiding this comment.
👍 with the latest changes, thanks @jihoonson
…ror in parallel indexing (apache#8905) * Retrying with a backward compatible task type on unknown task type error in parallel indexing * Register legacy class; add a serde test
…ror in parallel indexing (apache#8905) * Retrying with a backward compatible task type on unknown task type error in parallel indexing * Register legacy class; add a serde test
…ask type error in parallel indexing (#8905) (#8949) * Retrying with a backward compatible task type on unknown task type error in parallel indexing (#8905) * Retrying with a backward compatible task type on unknown task type error in parallel indexing * Register legacy class; add a serde test * Backport fix, use firehoses
Fixes #8836.
Description
I think I don't like this patch much, but don't see a better solution. Please leave comments if anyone has a better idea.
TaskMonitor.submit()creates a sub task for a given spec submits it to the overlord. Here, if the task type was unknown to the overlord (can happen during a rolling update), the overlord would return an HTTP error toTaskMonitor.HttpIndexingServiceClientwould throw anIllegalStateExceptionif the HTTP response was not 200.To handle this, I added a new method
SubTaskSpec.newSubTaskWithBackwardCompatibleType()which will be called ifSubTaskSpec.newSubTask()fails with anIllegalStateExceptoinwith a message starting with "Could not resolve type id".This PR has:
This change is