[tune] trainable restore_from_object fails with FileNotFound #8772
Description
What is the problem?
When a trainable tries to restore_from_object, restoration fails because the checkpoint file ends up at the wrong path. After unpickling the checkpoint object, the method writes each file directly under tmpdir (see code snippet below) but then passes tmpdir/<checkpoint dir> to the restore function. Shouldn't the write happen under tmpdir/<checkpoint dir> instead?
The current code does this:

    path = os.path.join(tmpdir, relpath_name)  # <-- written to tmpdir; should this be tmpdir/<checkpoint dir>?
    ...
    self.restore(checkpoint_path)  # <-- passes tmpdir/<checkpoint dir> to restore
Full function snippet below.
Ray version and other system information (Python version, TensorFlow version, OS):
ray - 0.9.0.dev0
python - 3.7.7
TF - 2.2.0
Reproduction (REQUIRED)
    # From the Trainable class
    def restore_from_object(self, obj):
        """Restores training state from a checkpoint object.

        These checkpoints are returned from calls to save_to_object().
        """
        info = pickle.loads(obj)
        data = info["data"]
        tmpdir = tempfile.mkdtemp("restore_from_object", dir=self.logdir)
        checkpoint_path = os.path.join(tmpdir, info["checkpoint_name"])

        for relpath_name, file_contents in data.items():
            path = os.path.join(tmpdir, relpath_name)  # <-- written to tmpdir
            # This may be a subdirectory, hence not just using tmpdir
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(file_contents)

        self.restore(checkpoint_path)  # <-- passing tmpdir/<checkpoint dir> to restore
        shutil.rmtree(tmpdir)
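The path mismatch can be demonstrated in isolation. Below is a minimal, self-contained sketch; the checkpoint_name and the data key are hypothetical values chosen to mirror the report, and they assume the keys in info["data"] are relative to the checkpoint directory (the actual keys produced by save_to_object() may differ):

```python
import os
import tempfile

# Hypothetical values mirroring the report; actual values produced by
# save_to_object() may differ.
tmpdir = tempfile.mkdtemp("restore_from_object")
checkpoint_name = os.path.join("checkpoint_10", "checkpoint")  # assumed name
data = {"checkpoint": b"fake-checkpoint-bytes"}                # assumed key

checkpoint_path = os.path.join(tmpdir, checkpoint_name)

for relpath_name, file_contents in data.items():
    # Written directly under tmpdir, as in the current code.
    path = os.path.join(tmpdir, relpath_name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(file_contents)

# restore() would then look under tmpdir/checkpoint_10/, which was never
# created if the data keys are relative to the checkpoint directory.
print(os.path.exists(checkpoint_path))  # False
```

If the keys are instead relative to tmpdir (i.e. they already carry the checkpoint_10/ prefix), the write and restore paths line up and the current code would be correct, so the answer hinges on how save_to_object() computes the relative paths.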
If we cannot run your script, we cannot fix your issue.
- I have verified my script runs in a clean environment and reproduces the issue.
- I have verified the issue also occurs with the latest wheels.