Error in transfer_manager with worker_type="process" #1012

@tqa236

Description

Hello, thank you very much for the new transfer_manager module. I encountered the error below while trying to download in parallel with worker_type="process".

Environment details

  • OS type and version: Windows 11, WSL running Ubuntu 22.04
  • Python version: python --version: Python 3.10.6
  • pip version: pip --version: pip 23.0.1
  • google-cloud-storage version: pip show google-cloud-storage
Name: google-cloud-storage
Version: 2.8.0
Summary: Google Cloud Storage API client library
Home-page: https://github.com/googleapis/python-storage
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: <venv>
Requires: google-api-core, google-auth, google-cloud-core, google-resumable-media, requests
Required-by: gcsfs

Steps to reproduce

Run the code below.

It works with worker_type="thread", but not worker_type="process". The data is publicly available here

Code example

import google.cloud.storage as gcs
from google.cloud.storage import transfer_manager

client = gcs.Client()
bucket_name = "gcp-public-data-sentinel-2"
bucket = client.bucket(bucket_name=bucket_name)
remote_files = ["index.csv.gz"]

status = transfer_manager.download_many_to_path(
    bucket=bucket,
    blob_names=remote_files,
    destination_directory="/tmp",
    worker_type="process",
    max_workers=4,
)

Stack trace

TypeError                                 Traceback (most recent call last)
Cell In [1], line 9
      6 bucket = client.bucket(bucket_name=bucket_name)
      7 remote_files = ["index.csv.gz"]
----> 9 status = transfer_manager.download_many_to_path(
     10     bucket=bucket,
     11     blob_names=remote_files,
     12     destination_directory="/tmp",
     13     worker_type="process",
     14     max_workers=4,
     15 )

File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:69, in _deprecate_threads_param.<locals>.convert_threads_or_raise(*args, **kwargs)
     67     return func(*args, **kwargs)
     68 else:
---> 69     return func(*args, **kwargs)

File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:703, in download_many_to_path(bucket, blob_names, destination_directory, blob_name_prefix, download_kwargs, threads, deadline, create_directories, raise_exception, worker_type, max_workers)
    700         os.makedirs(directory, exist_ok=True)
    701     blob_file_pairs.append((bucket.blob(full_blob_name), path))
--> 703 return download_many(
    704     blob_file_pairs,
    705     download_kwargs=download_kwargs,
    706     deadline=deadline,
    707     raise_exception=raise_exception,
    708     worker_type=worker_type,
    709     max_workers=max_workers,
    710 )

File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:69, in _deprecate_threads_param.<locals>.convert_threads_or_raise(*args, **kwargs)
     67     return func(*args, **kwargs)
     68 else:
---> 69     return func(*args, **kwargs)

File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:346, in download_many(blob_file_pairs, download_kwargs, threads, deadline, raise_exception, worker_type, max_workers)
    338         if needs_pickling and not isinstance(path_or_file, str):
    339             raise ValueError(
    340                 "Passing in a file object is only supported by the THREAD worker type. Please either select THREAD workers, or pass in filenames only."
    341             )
    343         futures.append(
    344             executor.submit(
    345                 _call_method_on_maybe_pickled_blob,
--> 346                 _pickle_blob(blob) if needs_pickling else blob,
    347                 "download_to_filename"
    348                 if isinstance(path_or_file, str)
    349                 else "download_to_file",
    350                 path_or_file,
    351                 **download_kwargs,
    352             )
    353         )
    354     concurrent.futures.wait(
    355         futures, timeout=deadline, return_when=concurrent.futures.ALL_COMPLETED
    356     )
    358 results = []

File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:910, in _pickle_blob(blob)
    908 p.dispatch_table = copyreg.dispatch_table.copy()
    909 p.dispatch_table[Client] = _reduce_client
--> 910 p.dump(blob)
    911 return f.getvalue()

TypeError: cannot pickle '_cffi_backend.FFI' object
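
The traceback shows that something reachable from the blob cannot be pickled, but not which attribute holds it. Below is a minimal, stdlib-only sketch of one way to narrow that down. Note the assumptions: `find_unpicklable`, `FakeClient` and `FakeCredentials` are hypothetical names made up for illustration, and a `threading.Lock` stands in for the `_cffi_backend.FFI` object. Where the FFI object actually lives on a real client has not been confirmed here. The helper also uses plain `pickle.dumps` rather than the custom dispatch table that `_pickle_blob` sets up, so its output is only a hint.

```python
import pickle
import threading


def find_unpicklable(obj, path="obj", _seen=None, max_depth=6):
    """Return the deepest attribute paths under `obj` that fail to pickle."""
    if _seen is None:
        _seen = set()
    if id(obj) in _seen or max_depth < 0:
        return []
    _seen.add(id(obj))

    try:
        pickle.dumps(obj)
        return []  # this subtree pickles fine
    except Exception:
        pass  # something in here is the problem; look deeper

    # Collect (path, value) pairs for containers and plain instances.
    if isinstance(obj, dict):
        children = [(f"{path}[{k!r}]", v) for k, v in obj.items()]
    elif isinstance(obj, (list, tuple, set, frozenset)):
        children = [(f"{path}[{i}]", v) for i, v in enumerate(obj)]
    elif hasattr(obj, "__dict__"):
        children = [(f"{path}.{k}", v) for k, v in vars(obj).items()]
    else:
        children = []

    culprits = []
    for child_path, child in children:
        culprits.extend(find_unpicklable(child, child_path, _seen, max_depth - 1))
    # If no child explains the failure, report this object itself.
    return culprits or [path]


# Hypothetical stand-ins, NOT the real google-cloud-storage classes.
class FakeCredentials:
    def __init__(self):
        self.token = "abc"
        self.native_handle = threading.Lock()  # stands in for the FFI object


class FakeClient:
    def __init__(self):
        self.project = "demo"
        self._credentials = FakeCredentials()


culprits = find_unpicklable(FakeClient(), "client")
print(culprits)  # ['client._credentials.native_handle']
```

Running something like `find_unpicklable(bucket.blob("index.csv.gz"), "blob")` in the failing environment might point at the attribute that carries the FFI object, which could help whoever picks this up.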

Labels: api: storage (Issues related to the googleapis/python-storage API.)
