Conversation

@jayadevanmurali
Contributor

Make StreamingContext.stop() exception-safe

@jayadevanmurali
Contributor Author

Exceptions are now handled for each of the operations in the stop() method, so that an exception in one operation does not abort the remaining statements.
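The pattern this patch applies looks roughly like the sketch below. It is a minimal, self-contained illustration, not the actual StreamingContext code: the local tryLogNonFatalError helper only mimics the behaviour of Spark's Utils.tryLogNonFatalError, and the step bodies are placeholder calls.

```scala
import scala.util.control.NonFatal

object StopSketch {

  // Stand-in for Spark's Utils.tryLogNonFatalError: run the block and
  // log (here, just print) any non-fatal exception instead of rethrowing it.
  def tryLogNonFatalError(block: => Unit): Unit =
    try block catch {
      case NonFatal(e) => println(s"Uncaught exception: ${e.getMessage}")
    }

  // Simplified stand-in for StreamingContext.stop(): each clean-up step is
  // guarded individually, so a failure in one step no longer prevents the
  // remaining steps from running.
  def stop(): Unit = {
    tryLogNonFatalError { sys.error("scheduler failed to stop") }                  // step A fails...
    tryLogNonFatalError { println("StreamingSource removed from metrics system") } // ...B still runs
    tryLogNonFatalError { println("streaming tab detached from the UI") }          // ...and so does C
  }

  def main(args: Array[String]): Unit = stop()
}
```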

@srowen
Member

srowen commented Jan 18, 2016

@jayadevanmurali please provide a better title/description. "Update" doesn't say what this is about; it matters since it becomes the commit message.

@jayadevanmurali changed the title from "Update StreamingContext.scala [SPARK-11137]" to "[SPARK-11137][Streaming] Make StreamingContext.stop() exception-safe" on Jan 18, 2016
@jayadevanmurali
Contributor Author

@srowen Sure, I have updated the title.

@srowen
Member

srowen commented Jan 18, 2016

LGTM

@SparkQA

SparkQA commented Jan 18, 2016

Test build #2399 has finished for PR 10807 at commit 39bcfb5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

I think one of the primary goals of this JIRA is to allow partial clean-up and retry on stop() calls.

In this specific code path, the method is already written in a way that allows retry, by setting the state to STOPPED only near the end (line 728 in the original code).

tryLogNonFatalError swallows and logs "non-fatal" exceptions, so with it added, execution reaches the line state = STOPPED despite any non-critical error thrown along the way. For instance, if env.metricsSystem.removeSource() throws, execution continues on and sets state to STOPPED, at which point the caller cannot get back into the same code to retry the clean-up because of the state match case above.

Is that what we want?
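To make the concern concrete, here is a deliberately simplified, hypothetical model of that flow (the state type, the step names and the local tryLogNonFatalError helper are stand-ins, not the actual StreamingContext code). Once the guarded section completes, state becomes STOPPED, so a later stop() call takes the STOPPED branch and can never retry a step whose failure was swallowed:

```scala
import scala.util.control.NonFatal

object StopRetrySketch {
  sealed trait StreamingContextState
  case object ACTIVE  extends StreamingContextState
  case object STOPPED extends StreamingContextState

  var state: StreamingContextState = ACTIVE

  def tryLogNonFatalError(block: => Unit): Unit =
    try block catch { case NonFatal(e) => println(s"swallowed: ${e.getMessage}") }

  def stop(): Unit = state match {
    case STOPPED =>
      // A retry never reaches the clean-up below, so a step whose failure
      // was swallowed on the first call is never attempted again.
      println("StreamingContext already stopped")
    case ACTIVE =>
      tryLogNonFatalError { sys.error("metrics source removal failed") } // swallowed and logged
      tryLogNonFatalError { println("scheduler stopped") }               // still runs
      state = STOPPED // reached even though the first step failed
  }

  def main(args: Array[String]): Unit = {
    stop() // first call: one step fails, but stop() still completes
    stop() // second call: short-circuits on STOPPED
  }
}
```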

Member

My interpretation of the purpose was a little different. If cleanup involves running A, B and C, then we want B and C to try to run even if A fails. I didn't think we necessarily expected the caller to retry stop(), since I don't think that's usual. Yes, a fatal error will still cause the whole thing to stop, but in my mind a fatal error means all bets are off. This is also for consistency with how SparkContext.stop() works.

@jayadevanmurali
Contributor Author

@srowen Thanks for the detailed comment. Can you merge the changes to the master branch?

@srowen
Member

srowen commented Jan 21, 2016

@felixcheung WDYT?

@felixcheung
Member

@srowen it's good - I agree that asking the caller to retry stop(), or even put it in a loop, is too far off. It might happen when someone is running from spark-shell or similar, but that's likely an edge case.
Trying to run as much as we can in stop() seems fair.

@thomastechs
Contributor

This fix handles exceptions in the same consistent way as the fix in SparkContext.stop(). I think this is fine.

@srowen
Member

srowen commented Jan 23, 2016

Merged to master

@asfgit closed this in 5f56980 on Jan 23, 2016