[tune] Component notification on node failure + Tests #3414
richardliaw merged 14 commits into ray-project:master
|
Test FAILed. |
| from ray.tune.suggest import BasicVariantGenerator | ||
| def register_test_trainable(): |
Removed in favor of __fake
| node = nodes.pop() | ||
| cluster.remove_node(node) | ||
| assert cluster.wait_for_nodes() | ||
| assert ray.global_state.cluster_resources()["CPU"] == 1 |
This test previously didn't exercise Tune's resource tracking; I updated the test.
| trial_executor.start_trial(trial) | ||
| except Exception as e: | ||
| self.assertIn("a class", str(e)) | ||
This test didn't actually work because start_trial didn't throw; I rewrote this test and moved it to ray_trial_executor.py.
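To illustrate why the old test could not fail, here is a minimal sketch. It is not the code from this PR: `start_trial_stub` is a hypothetical stand-in for a call that handles its own errors instead of raising, and both check functions are invented for illustration.

```python
def start_trial_stub(trial):
    """Hypothetical stand-in for a start call that never raises."""
    return None


def old_style_check():
    # Same shape as the old test: the only assertion lives in `except`,
    # so it is skipped entirely when no exception is raised.
    try:
        start_trial_stub("bad-trial")
    except Exception as e:
        assert "a class" in str(e)
    return "passed"  # reached even though nothing was verified


def strict_check():
    # Fails loudly when the expected exception never arrives.
    try:
        start_trial_stub("bad-trial")
    except Exception as e:
        assert "a class" in str(e)
    else:
        raise AssertionError("expected start_trial to raise")
```

In a unittest.TestCase, `with self.assertRaises(...)` gives the same guarantee as the `else` branch above.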
| self.start_trial(trial) | ||
| else: | ||
| trial.status = Trial.PENDING | ||
Moved to trial_runner.try_recover for better handling and the ability to notify other components.
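A rough sketch of the idea, assuming the runner owns recovery and tells interested components when it fails. Every class here (`ToyTrial`, `ToyExecutor`, `RecordingListener`, `ToyRunner`) is a hypothetical toy, not Tune's actual implementation, and the `on_trial_error` hook name is an assumption made for illustration.

```python
class ToyTrial:
    PENDING, RUNNING, ERROR = "PENDING", "RUNNING", "ERROR"

    def __init__(self, name, checkpoint=None):
        self.name = name
        self.checkpoint = checkpoint
        self.status = ToyTrial.RUNNING


class ToyExecutor:
    """Hypothetical executor: restarts a trial only if it has a checkpoint."""

    def restart(self, trial):
        if trial.checkpoint is None:
            raise RuntimeError("no checkpoint to restore from")
        trial.status = ToyTrial.RUNNING


class RecordingListener:
    """Hypothetical component (for example a scheduler) that tracks failures."""

    def __init__(self):
        self.events = []

    def on_trial_error(self, trial):
        self.events.append(("error", trial.name))


class ToyRunner:
    def __init__(self, executor, listeners):
        self.executor = executor
        self.listeners = listeners

    def try_recover(self, trial):
        """Try to restart the trial; on failure, mark it and notify listeners."""
        try:
            self.executor.restart(trial)
        except Exception:
            trial.status = ToyTrial.ERROR
            for listener in self.listeners:
                listener.on_trial_error(trial)
            return False
        return True
```

Keeping recovery in the runner rather than the executor means the component that already holds references to the other components is the one that decides whom to notify.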
Changes include:
This is a subset of the changes in #3309, so it should be merged first.
TODO:
try_recover