VERY slow large blob downloads #10572
Closed
Labels
Client: This issue points to a problem in the data-plane of the library.
Service Attention: Workflow: This issue is responsible by Azure service team.
Storage: Storage Service (Queues, Blobs, Files)
bug: This issue requires a change to an existing behavior in the product in order to be resolved.
customer-reported: Issues that are reported by GitHub users external to the Azure organization.
Description
I am confused about how to optimize BlobClient for downloading large blobs (up to 100 GB).
For example, on a ~480 MB blob the following code takes around 4 minutes to execute:
from azure.storage.blob import BlobClient

full_path_to_file = '{}/{}'.format(staging_path, blob_name)
blob = BlobClient.from_connection_string(conn_str=connection_string, container_name=container_name, blob_name=blob_name)
with open(full_path_to_file, "wb") as my_blob:
    download_stream = blob.download_blob()
    # readall() returns the entire blob as one bytes object before anything is written
    result = my_blob.write(download_stream.readall())
In the previous version of the SDK I was able to specify a max_connections parameter that sped up downloads significantly. This appears to have been removed (along with progress callbacks, which is annoying). I have files upwards of 99 GB, which would take almost 13 hours to download at this rate, whereas I used to be able to download similar files in under two hours.
How can I optimize the download of large blobs?
Thank you!
Edit: I meant that it took 4 minutes to download a 480 megabyte file. Also, I am getting memory errors when trying to download larger files (~40 GB).
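For reference, here is a minimal sketch of one way the download might be restructured. It assumes azure-storage-blob v12, where download_blob() accepts a max_concurrency keyword and the returned downloader has a readinto() method that writes straight into an open file. The helper name download_blob_to_file and the default of 4 parallel connections are hypothetical choices for illustration, not tuned values, and I have not measured this against the 480 MB case above.

```python
def download_blob_to_file(blob_client, destination_path, max_concurrency=4):
    """Write a blob to disk without holding the whole payload in memory.

    blob_client is assumed to behave like azure.storage.blob.BlobClient (v12):
    download_blob() returns a downloader object that exposes readinto().
    """
    with open(destination_path, "wb") as file_handle:
        # max_concurrency asks the SDK to fetch chunks over several parallel
        # connections; it appears to be the v12 counterpart of max_connections.
        downloader = blob_client.download_blob(max_concurrency=max_concurrency)
        # readinto() writes the data into the open file as it arrives and
        # returns the number of bytes written.
        bytes_written = downloader.readinto(file_handle)
    return bytes_written


# Hypothetical usage with the names from the snippet above:
#
#   from azure.storage.blob import BlobClient
#   blob = BlobClient.from_connection_string(
#       conn_str=connection_string,
#       container_name=container_name,
#       blob_name=blob_name,
#   )
#   download_blob_to_file(blob, full_path_to_file, max_concurrency=8)
```

Writing through readinto() rather than readall() should also sidestep the memory errors on the ~40 GB files, since the blob is no longer collected into a single bytes object first.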