KAFKA-9258 Check Connect Metrics non-null in task stop#7768
Conversation
|
|
||
| if (connectorStatusMetricsGroup != null) { | ||
| connectorStatusMetricsGroup.recordTaskRemoved(taskId); | ||
| } |
There was a problem hiding this comment.
why not initialize this in the constructor (and only set the herder in the Worker#start() via a setter method in connectorStatusMetricsGroup)?
There was a problem hiding this comment.
I was unclear -- the connectorStatusMetricsGroup is not what's Null -- it actually is there. The issue is that the task is cleaned up. This case happens when a startTask() fails, and task is cleaned up in the "catch" block, then restarted and this code is hit again if you restart the failed task. You can see just moving the removal to after the check for task == null will be enoughl
C0urante
left a comment
There was a problem hiding this comment.
Built and tested this locally by creating a sink connector with invalid SSL configs for its consumer, then reconfiguring the connector with valid SSL configs, then restarting the failed task. The fix works as intended and the 500 error isn't thrown anymore.
It would be great if we could add a test to prevent a regression, though.
KAFKA-9258: Add integration test for restarting a failed task
|
@wicknicks, |
rhauch
left a comment
There was a problem hiding this comment.
LGTM, though there's one spelling mistake in a comment.
…tion/ConnectWorkerIntegrationTest.java Co-Authored-By: Randall Hauch <rhauch@gmail.com>
Remove nullcheck, and add integration tests for restarting a failed task. Authors: Cyrus Vafadari <cyrus@confluent.io>, Chris Egerton <chrise@confluent.io> Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
Connect sometimes will stop task when start() has failed. In these cases, we must guard against NPE as noted in https://issues.apache.org/jira/browse/KAFKA-9258