Skip to content

KAFKA-9258 Check Connect Metrics non-null in task stop#7768

Merged
rhauch merged 5 commits into
apache:trunkfrom
cyrusv:cyrus-metric-npe
Dec 4, 2019
Merged

KAFKA-9258 Check Connect Metrics non-null in task stop#7768
rhauch merged 5 commits into
apache:trunkfrom
cyrusv:cyrus-metric-npe

Conversation

@cyrusv

@cyrusv cyrusv commented Dec 2, 2019

Copy link
Copy Markdown
Contributor

Connect sometimes will stop task when start() has failed. In these cases, we must guard against NPE as noted in https://issues.apache.org/jira/browse/KAFKA-9258

@wicknicks wicknicks left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one comment


if (connectorStatusMetricsGroup != null) {
connectorStatusMetricsGroup.recordTaskRemoved(taskId);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not initialize this in the constructor (and only set the herder in the Worker#start() via a setter method in connectorStatusMetricsGroup)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unclear -- the connectorStatusMetricsGroup is not what's Null -- it actually is there. The issue is that the task is cleaned up. This case happens when a startTask() fails, and task is cleaned up in the "catch" block, then restarted and this code is hit again if you restart the failed task. You can see just moving the removal to after the check for task == null will be enoughl

@C0urante C0urante left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Built and tested this locally by creating a sink connector with invalid SSL configs for its consumer, then reconfiguring the connector with valid SSL configs, then restarting the failed task. The fix works as intended and the 500 error isn't thrown anymore.

It would be great if we could add a test to prevent a regression, though.

@C0urante C0urante left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this counts since I wrote the integration test... but LGTM

@rhauch can you take a look at this 2.4 blocker sometime today?

@cyrusv

cyrusv commented Dec 3, 2019

Copy link
Copy Markdown
Contributor Author

@wicknicks,
@C0urante has helped out with the testing, and the broader feature "restart a failed task" is now covered with integration test -- thanks @C0urante!

@rhauch rhauch left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though there's one spelling mistake in a comment.

…tion/ConnectWorkerIntegrationTest.java

Co-Authored-By: Randall Hauch <rhauch@gmail.com>

@rhauch rhauch left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, @cyrusv and @C0urante!

@wicknicks wicknicks left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks, @cyrusv!

@rhauch rhauch merged commit d3131a1 into apache:trunk Dec 4, 2019
@cyrusv cyrusv deleted the cyrus-metric-npe branch December 4, 2019 03:29
rhauch pushed a commit that referenced this pull request Dec 4, 2019
Remove nullcheck, and add integration tests for restarting a failed task.

Authors: Cyrus Vafadari <cyrus@confluent.io>, Chris Egerton <chrise@confluent.io>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants