Closed
Labels
api: storage (Issues related to the googleapis/python-storage API.)
Description
Hello, thank you very much for the new transfer_manager module. I encountered the error below while trying to download in parallel with worker_type="process".
Environment details
- OS type and version: Windows 11, WSL running Ubuntu 22.04
- Python version (python --version): Python 3.10.6
- pip version (pip --version): pip 23.0.1
- google-cloud-storage version (pip show google-cloud-storage):
Name: google-cloud-storage
Version: 2.8.0
Summary: Google Cloud Storage API client library
Home-page: https://github.com/googleapis/python-storage
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Location: <venv>
Requires: google-api-core, google-auth, google-cloud-core, google-resumable-media, requests
Required-by: gcsfs
Steps to reproduce
Run the code below.
It works with worker_type="thread", but not worker_type="process". The data is publicly available here
Code example
import google.cloud.storage as gcs
from google.cloud.storage import transfer_manager

client = gcs.Client()
bucket_name = "gcp-public-data-sentinel-2"
bucket = client.bucket(bucket_name=bucket_name)
remote_files = ["index.csv.gz"]

status = transfer_manager.download_many_to_path(
    bucket=bucket,
    blob_names=remote_files,
    destination_directory="/tmp",
    worker_type="process",
    max_workers=4,
)
Stack trace
TypeError Traceback (most recent call last)
Cell In [1], line 9
6 bucket = client.bucket(bucket_name=bucket_name)
7 remote_files = ["index.csv.gz"]
----> 9 status = transfer_manager.download_many_to_path(
10 bucket=bucket,
11 blob_names=remote_files,
12 destination_directory="/tmp",
13 worker_type="process",
14 max_workers=4,
15 )
File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:69, in _deprecate_threads_param.<locals>.convert_threads_or_raise(*args, **kwargs)
67 return func(*args, **kwargs)
68 else:
---> 69 return func(*args, **kwargs)
File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:703, in download_many_to_path(bucket, blob_names, destination_directory, blob_name_prefix, download_kwargs, threads, deadline, create_directories, raise_exception, worker_type, max_workers)
700 os.makedirs(directory, exist_ok=True)
701 blob_file_pairs.append((bucket.blob(full_blob_name), path))
--> 703 return download_many(
704 blob_file_pairs,
705 download_kwargs=download_kwargs,
706 deadline=deadline,
707 raise_exception=raise_exception,
708 worker_type=worker_type,
709 max_workers=max_workers,
710 )
File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:69, in _deprecate_threads_param.<locals>.convert_threads_or_raise(*args, **kwargs)
67 return func(*args, **kwargs)
68 else:
---> 69 return func(*args, **kwargs)
File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:346, in download_many(blob_file_pairs, download_kwargs, threads, deadline, raise_exception, worker_type, max_workers)
338 if needs_pickling and not isinstance(path_or_file, str):
339 raise ValueError(
340 "Passing in a file object is only supported by the THREAD worker type. Please either select THREAD workers, or pass in filenames only."
341 )
343 futures.append(
344 executor.submit(
345 _call_method_on_maybe_pickled_blob,
--> 346 _pickle_blob(blob) if needs_pickling else blob,
347 "download_to_filename"
348 if isinstance(path_or_file, str)
349 else "download_to_file",
350 path_or_file,
351 **download_kwargs,
352 )
353 )
354 concurrent.futures.wait(
355 futures, timeout=deadline, return_when=concurrent.futures.ALL_COMPLETED
356 )
358 results = []
File ~/<venv>/lib/python3.10/site-packages/google/cloud/storage/transfer_manager.py:910, in _pickle_blob(blob)
908 p.dispatch_table = copyreg.dispatch_table.copy()
909 p.dispatch_table[Client] = _reduce_client
--> 910 p.dump(blob)
911 return f.getvalue()
TypeError: cannot pickle '_cffi_backend.FFI' object
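For illustration, the last frame of the trace pickles the blob with a pickle.Pickler whose dispatch_table maps Client to a custom reducer. The sketch below imitates that pattern with stand-in classes only. FakeClient, FakeBlob, _reduce_fake_client and pickle_blob are hypothetical names, not part of google-cloud-storage, and a threading.Lock stands in for the unpicklable '_cffi_backend.FFI' object. I do not know where the FFI object actually lives in the real object graph, so this only shows the general failure mode: a reducer registered for one class does not help if an unpicklable object is reachable some other way.

```python
import copyreg
import io
import pickle
import threading


class FakeClient:
    """Hypothetical stand-in for a client that holds an unpicklable handle."""

    def __init__(self, project):
        self.project = project
        self._handle = threading.Lock()  # lock objects cannot be pickled


class FakeBlob:
    """Hypothetical stand-in for a blob that keeps a reference to its client."""

    def __init__(self, name, client, extra=None):
        self.name = name
        self.client = client
        self.extra = extra  # any other object reachable from the blob


def _reduce_fake_client(client):
    # Rebuild the client from constructor arguments instead of copying its state,
    # so the unpicklable handle inside the client is never visited.
    return FakeClient, (client.project,)


def pickle_blob(blob, use_reducer=True):
    """Pickle a blob with a per-pickler dispatch table, as in the trace's last frame."""
    f = io.BytesIO()
    p = pickle.Pickler(f)
    p.dispatch_table = copyreg.dispatch_table.copy()
    if use_reducer:
        p.dispatch_table[FakeClient] = _reduce_fake_client
    p.dump(blob)
    return f.getvalue()


if __name__ == "__main__":
    client = FakeClient("demo")

    # With the reducer, the client is rebuilt on load and pickling succeeds.
    restored = pickle.loads(pickle_blob(FakeBlob("index.csv.gz", client)))
    print(restored.name, restored.client.project)

    # An unpicklable object reachable outside the client still fails.
    try:
        pickle_blob(FakeBlob("index.csv.gz", client, extra=threading.Lock()))
    except TypeError as exc:
        print("TypeError:", exc)
```

If the real failure follows the same shape, it would be consistent with worker_type="thread" working, since the trace only calls _pickle_blob when needs_pickling is true.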