[AIR] Syncing when upload_url="s3://<my_s3_bucket>?<endpoint_and_other_argument_overrides>" #29845

@tbukic

Description

What happened + What you expected to happen

When using S3-compatible (non-AWS) storage, it seems that, to pass _validate_upload_dir(sync_config) at line 454 of tune.py, upload_dir has to be given in the form s3://<my_s3_bucket>?endpoint_override=<endpoint_url[&possible_other_args_like_region]>.

In that case, the syncer receives an upload URL of the form s3://<my_s3_bucket>?endpoint_override=<endpoint_url>/<ray_defined_upload_path>. As a result, the 'auto'-configured Syncer fails to upload files, because PyArrow needs the path in the form s3://<my_s3_bucket>/<ray_defined_upload_path>?endpoint_override=<endpoint_url>.
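A minimal sketch of why the joined URI breaks, using only the standard library. The bucket name, endpoint and Ray path below are hypothetical placeholders, and posixpath.join stands in for os.path.join as it behaves on Linux/macOS. Splitting the URI with urllib.parse shows that Ray's path ends up inside the query string, leaving the URL path empty.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical values standing in for a real bucket, endpoint and Ray-generated path.
upload_dir = "s3://my-bucket?endpoint_override=s3.example.com"
ray_path = "my_experiment/PPO_trial_0"

# Ray joins its path onto upload_dir as if it were a local directory
# (posixpath.join is what os.path.join resolves to on Linux/macOS).
broken = posixpath.join(upload_dir, ray_path)

# The layout PyArrow expects: path first, query string last.
expected = "s3://my-bucket/" + ray_path + "?endpoint_override=s3.example.com"

for uri in (broken, expected):
    parts = urlsplit(uri)
    # For `broken`, path is '' and the Ray path is appended to the endpoint in the query.
    print(f"path={parts.path!r} query={parts.query!r}")
```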

The os.path.join(self.local_dir, self.dir_name) construction in experiment.py seems to be responsible for this. It can be worked around with a custom syncer that fixes the URIs, but Ray should pass a correct URI unless there is another way to provide endpoint_override. As far as I know, no other way is documented, either in Ray's documentation or in the documentation of Ray's dependencies.
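One way Ray could build the URI correctly is to join onto the URI's path component only and keep the query string last. The helper below, join_uri, is hypothetical (it is not part of Ray's API) and is only a sketch of that idea using urllib.parse.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def join_uri(base: str, *parts: str) -> str:
    """Hypothetical helper: append path segments to a URI, keeping its query string last."""
    scheme, netloc, path, query, fragment = urlsplit(base)
    # Join onto the URI's path component only, never onto the query string.
    new_path = posixpath.join(path or "/", *parts)
    return urlunsplit((scheme, netloc, new_path, query, fragment))


print(join_uri("s3://my-bucket?endpoint_override=s3.example.com", "my_experiment", "trial_0"))
```

With this approach, an upload_dir without a query string (for example s3://my-bucket/prefix) is joined exactly as before, so plain AWS URIs would not be affected.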

Example of an almost-working syncer for the case where endpoint_override is defined in the upload_dir variable:

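import os
from typing import Callable, Dict, List, Optional, Tuple

from ray.tune.syncer import _DefaultSyncer


def fix_uri(uri: str) -> str:
    # Move Ray's path out of the query string:
    # s3://bucket?endpoint_override=<host>/<path> -> s3://bucket/<path>/?endpoint_override=<host>
    # Assumes the query string itself contains no '/'.
    if '?' not in uri:
        return uri
    bucket, rest = uri.split('?', 1)
    if '/' not in rest:
        return uri
    query, path = rest.split('/', 1)
    return f'{os.path.join(bucket, path)}/?{query}'


class FixedSyncer(_DefaultSyncer):
    """Default (PyArrow-based) syncer that repairs each URI before using it."""

    def _sync_up_command(
        self, local_path: str, uri: str, exclude: Optional[List] = None
    ) -> Tuple[Callable, Dict]:
        uri = fix_uri(uri)
        return super()._sync_up_command(local_path, uri, exclude)

    def _sync_down_command(self, uri: str, local_path: str) -> Tuple[Callable, Dict]:
        uri = fix_uri(uri)
        return super()._sync_down_command(uri, local_path)

    def _delete_command(self, uri: str) -> Tuple[Callable, Dict]:
        uri = fix_uri(uri)
        return super()._delete_command(uri)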

The second problem is that, when uploading checkpoints, the '=' character in Tune-generated file names is rejected by either PyArrow or my S3 provider (OVH Cloud). I'm not sure whether this is an OVH-specific problem, or how serious the Ray team considers this issue for syncing.
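As a stopgap, the '=' characters could be replaced in the path part of the URI only, since the query string (endpoint_override=...) needs its own '='. The helper sanitize_path and the trial directory name below are hypothetical; this is a sketch of the replacement used in the debugger experiment, not a Ray feature.

```python
from urllib.parse import urlsplit, urlunsplit


def sanitize_path(uri: str, bad: str = "=", replacement: str = "-") -> str:
    """Hypothetical workaround: replace a character in the URI path only."""
    scheme, netloc, path, query, fragment = urlsplit(uri)
    # The query string must keep its '=' (key=value pairs), so only the path is rewritten.
    return urlunsplit((scheme, netloc, path.replace(bad, replacement), query, fragment))


# Hypothetical trial directory name in Tune's key=value style.
uri = "s3://my-bucket/exp/PPO_CartPole-v1_00000_0_lr=0.0100/checkpoint_000001?endpoint_override=s3.example.com"
print(sanitize_path(uri))
```

Note that renaming on upload means remote names no longer match local ones, so syncing down would likely need the reverse mapping, and the mapping is ambiguous if the replacement character already occurs in a name.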

Versions / Dependencies

ray 2.0.1

Reproduction script

Set a breakpoint at the first line of _sync_up_command in FixedSyncer (or in _DefaultSyncer if you drop the syncer argument from the example).
To reach this point you need either existing non-AWS S3-compatible storage with valid credentials, or to comment out _validate_upload_dir(sync_config) in tune.py; the malformed path is visible even before any interaction with the actual storage.

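from ray import air, tune
from ray.tune.syncer import SyncConfig

# Placeholder: replace with your own bucket and endpoint; upload_dir must contain endpoint_override.
upload_dir = "s3://<my_s3_bucket>?endpoint_override=<endpoint_url>"

tune.Tuner(
    "PPO",
    run_config=air.RunConfig(
        stop={"episode_reward_mean": 200},
        sync_config=SyncConfig(
            syncer=FixedSyncer(sync_period=60),  # FixedSyncer from the example above
            upload_dir=upload_dir,
        ),
    ),
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",
        mode="max",
    ),
    param_space={
        "env": "CartPole-v1",
        "framework": "torch",
        "num_gpus": 0,
        "num_workers": 1,
        "lr": tune.grid_search([0.01, 0.001]),
    },
).fit()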

To reproduce the problem caused by the '=' character in S3 file names, I set a breakpoint at remote_storage.py#L209 and executed an edited version of that line in the debugger console: pyarrow.fs.copy_files(local_path, bucket_path.replace('=', '-'), destination_filesystem=fs). The edited call succeeds, while the original line fails. This part requires an OVH S3 bucket; I'm unsure about other providers.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Metadata

Labels

P1 (Issue that should be fixed within a few weeks), bug (Something that is supposed to be working, but isn't)