Skip to content

[Metrics] Make DeltaProducer thread detached#13565

Closed
simon-mo wants to merge 3 commits intoray-project:masterfrom
simon-mo:metrics-detach-thread
Closed

[Metrics] Make DeltaProducer thread detached#13565
simon-mo wants to merge 3 commits intoray-project:masterfrom
simon-mo:metrics-detach-thread

Conversation

@simon-mo
Copy link
Copy Markdown
Contributor

@simon-mo simon-mo commented Jan 19, 2021

Why are these changes needed?

This PR fixes a problem with Ray stats not being correctly cleaned up upon ray.shutdown(). This PR resets the global variables handlers_and views_ in StatsExporter upon shutdown. For DeltaProducer, shutdown is skipped because the global singleton thread cannot be restarted after ray.shutdown() followed by ray.init().

Related issue number

Closes #13532.

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@architkulkarni architkulkarni added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jan 20, 2021
Copy link
Copy Markdown
Collaborator

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a description to the PR. Let's also let @rkooo567 review as he wrote the original patch.

@rkooo567
Copy link
Copy Markdown
Contributor

rkooo567 commented Jan 20, 2021

I have some questions;

  1. Does this mean we'll share the harvest thread between job A and B? (ray.init() (job A) => ray.shutdown() => ray.init() (job B)).
  2. Also I am concerned that not terminating the harvest thread will cause weird issues like segfault on shutdown hook. If the harvest thread touches any other parts that are shut down before the process terminates, it can cause weird issues.
  3. Also, why can't we restart the thread? Isn't it just we kill all threads => we restart? What prevents us?

Also @ashione Please take a look!

@rkooo567 rkooo567 added @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. and removed tests-ok The tagger certifies test failures are unrelated and assumes personal liability. labels Jan 20, 2021
@ashione
Copy link
Copy Markdown
Member

ashione commented Jan 22, 2021

I have some questions;

  1. Does this mean we'll share the harvest thread between job A and B? (ray.init() (job A) => ray.shutdown() => ray.init() (job B)).
  2. Also I am concerned that not terminating the harvest thread will cause weird issues like segfault on shutdown hook. If the harvest thread touches any other parts that are shut down before the process terminates, it can cause weird issues.
  3. Also, why can't we restart the thread? Isn't it just we kill all threads => we restart? What prevents us?

Also @ashione Please take a look!

In my experience, we'd better to fix the above issues rather than just skip shutdown or make harvest thread as a detached thread.
@simon-mo Could you give us some clues about why these will happend.

@ashione
Copy link
Copy Markdown
Member

ashione commented Jan 22, 2021

Please hold on. I'll try to figure out why opencensus thread crashed before next weekend.

@simon-mo
Copy link
Copy Markdown
Contributor Author

@ashione any update?

@zhongchun
Copy link
Copy Markdown
Contributor

@ashione any update?

I have encountered the same problem and am looking for a solution. We'd better restart the thread.

@ashione
Copy link
Copy Markdown
Member

ashione commented Jan 26, 2021

@ashione any update?

@rkooo567 @simon-mo I add some unit tests in metric exporter client test, but no such failures? In that case, ray stats had been setup and shutdown twice.

@ashione
Copy link
Copy Markdown
Member

ashione commented Jan 26, 2021

@ashione any update?

@rkooo567 @simon-mo I add some unit tests in metric exporter client test, but no such failures? In that case, ray stats had been setup and shutdown twice.

#13708

@edoakes
Copy link
Copy Markdown
Collaborator

edoakes commented Feb 15, 2021

ping @simon-mo

@zhongchun
Copy link
Copy Markdown
Contributor

ping @simon-mo

We've tried to close DeltaProducer in #13708, but found some other problems. It may introduce new problems if we modify more codes in opencensus. So we suggest do not support reopen in driver.

@simon-mo simon-mo closed this Oct 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Metrics] Custom metrics don't work after calling ray.shutdown() followed by ray.init()

6 participants