-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-11137][Streaming] Make StreamingContext.stop() exception-safe #10807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-11137][Streaming] Make StreamingContext.stop() exception-safe #10807
Conversation
Make StreamingContext.stop() exception-safe
|
Exceptions are handled for each of the operations in the stop method, so that any exceptions does not abort rest of the statements |
|
@jayadevanmurali provide a better title/description please. "Update" doesn't say what this is about. It matters since it becomes the commit message |
|
@srowen sure and updated the title |
|
LGTM |
|
Test build #2399 has finished for PR 10807 at commit
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the primary goal of this JIRA is to allow partial clean-up and retry on stop() calls.
In this specific code path, it is already written in a way to allow for retry by setting the state to STOPPED only almost at the end on line 728 in the original code.
tryLogNonFatalError swallows and logs "non-fatal" exception, and with that added, despite any non-critical error thrown it could reach the line state = STOPPED. For instance, if env.metricsSystem.removeSource() throws then it will continue on and setting state to STOPPED, at which point the caller cannot get back to the same code to retry cleanup because of the state match case above.
Is that what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My interpretation of the purpose was a little different. If cleanup involves running A, B and C, then we want B and C to try to run even if A fails. I didn't think we necessarily expected the caller to re-try stop() since I don't think that's usual. Yes, a fatal error will still cause the whole thing to stop but in my mind a fatal error means lots of bets are off. This is also for consistency with how SparkContext.stop() works.
|
@srowen Thanks for the detailed comment. Can you merge the changes to master branch ? |
|
@felixcheung WDYT? |
|
@srowen it's good - I agree asking the caller to retry stop() or even put it in a loop is too far off. It might happen when someone is running from spark-shell or similar but likely it's an edge case. |
|
This fix would be executed in similar consistent way it has been fixed in SparkContext.stop() .I think this is fine. |
|
Merged to master |
Make StreamingContext.stop() exception-safe