[tune] Use newest checkpoint in normal operation #7563
richardliaw merged 7 commits into ray-project:master
Conversation
Can you add this test in? This test explicitly uses
Test FAILed.
jenkins test tune
Test FAILed.
jenkins test tune
Test FAILed.
    runner.step()
    runner.step()  # Start trial
    runner.step()  # Process result
btw, why do we need to process results here?
We want a result to be associated with the checkpoint; otherwise we can't tell which checkpoint (persistent or in-memory) to return, since the choice is tied to the training iteration (-1 when there is no result).
Instead of choosing which trial checkpoint to use based on PAUSED vs. not PAUSED, we should choose based on ERROR vs. not ERROR. In the latter case, use the newest checkpoint (whether it is in-memory or persistent).
This fixes the bug in #7528 (the case where a trial is unpaused but we still want to use the in-memory checkpoint).
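A rough sketch of the selection rule described above, not the actual Tune code. The `Checkpoint` class, the status constants, and `checkpoint_to_restore` are hypothetical names made up for illustration. Two things are assumed rather than stated in this PR: that an errored trial falls back to the persistent checkpoint, and that ties on training iteration go to the persistent checkpoint.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical constants for illustration only.
MEMORY = "memory"
PERSISTENT = "persistent"
ERROR = "ERROR"
RUNNING = "RUNNING"


@dataclass
class Checkpoint:
    storage: str
    value: Optional[object] = None
    # -1 means no result has been associated with this checkpoint yet.
    training_iteration: int = -1


def newest(a: Checkpoint, b: Checkpoint) -> Checkpoint:
    # Return whichever checkpoint has the higher training iteration.
    # On a tie, b is returned (an assumption, see the note above).
    return a if a.training_iteration > b.training_iteration else b


def checkpoint_to_restore(status: str,
                          in_memory: Checkpoint,
                          persistent: Checkpoint) -> Checkpoint:
    # Assumption: after an error, only the persistent checkpoint is used.
    if status == ERROR:
        return persistent
    # Not ERROR: use the newest checkpoint, in-memory or persistent.
    return newest(in_memory, persistent)
```

For example, an unpaused trial whose in-memory checkpoint is at iteration 3 and whose persistent checkpoint is at iteration 2 would restore from memory, which is the case reported in #7528. An in-memory checkpoint with no processed result (iteration -1) never wins the comparison, which is why the test steps the runner once more to process a result.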
Checks
I've run scripts/format.sh to lint the changes in this PR.