VERY slow large blob downloads #10572

@argonaut76

I am confused about how to optimize BlobClient for downloading large blobs (up to 100 GB).

For example, on a ~480 MB blob the following code takes around 4 minutes to execute:

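from azure.storage.blob import BlobClient

# Local destination path for the downloaded blob
full_path_to_file = '{}/{}'.format(staging_path, blob_name)
blob = BlobClient.from_connection_string(conn_str=connection_string, container_name=container_name, blob_name=blob_name)
with open(full_path_to_file, "wb") as my_blob:
    download_stream = blob.download_blob()
    # readall() buffers the entire blob in memory before it is written
    result = my_blob.write(download_stream.readall())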

In the previous version of the SDK I was able to specify a max_connections parameter that sped up downloads significantly. This appears to have been removed (along with progress callbacks, which is annoying). I have files upwards of 99 GB, which would take almost 13 hours to download at this rate, whereas I used to be able to download similar files in under two hours.
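For reference, the figures above can be checked with a quick back-of-the-envelope calculation. This is only a rough sketch: it assumes decimal units (1 GB = 1000 MB) and treats "around 4 minutes" as exactly 240 seconds, and the function names are made up for illustration.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB/s."""
    return size_mb / seconds


def hours_to_download(size_gb, rate_mb_per_s):
    """Hours needed to move size_gb at a steady rate (1 GB = 1000 MB)."""
    return size_gb * 1000 / rate_mb_per_s / 3600


def rate_needed_mb_per_s(size_gb, hours):
    """Steady rate in MB/s needed to move size_gb within the given hours."""
    return size_gb * 1000 / (hours * 3600)


# ~480 MB in ~4 minutes
current_rate = throughput_mb_per_s(480, 4 * 60)

# 99 GB at the current rate
estimated_hours = hours_to_download(99, current_rate)

# Rate the old SDK must have sustained to finish 99 GB in two hours
old_rate = rate_needed_mb_per_s(99, 2)

print(current_rate, estimated_hours, old_rate)
```

Under those assumptions the current rate is about 2 MB/s, which puts a 99 GB blob in the same ballpark as the estimate above, and finishing in two hours would need roughly seven times that throughput.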

How can I optimize the download of large blobs?

Thank you!

Edit: I meant that it took 4 minutes to download a 480 megabyte file. Also, I am getting memory errors when trying to download larger files (~40 GB).
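One direction that might address both the speed and the memory errors is sketched below. It rests on two assumptions that should be checked against the installed SDK version: that `download_blob` in azure-storage-blob v12 accepts a `max_concurrency` keyword (seemingly the counterpart of the old `max_connections`), and that the returned downloader has a `readinto(stream)` method that writes straight into an open file instead of building one large bytes object like `readall()`. The helper names `save_download` and `download_blob_to_file` are hypothetical, and the default of 4 connections is arbitrary.

```python
def save_download(downloader, path):
    """Stream a download to `path` and return the number of bytes written.

    `downloader` is anything with a `readinto(stream)` method, such as the
    object returned by `BlobClient.download_blob()`.
    """
    with open(path, "wb") as f:
        # readinto() writes chunks into the file as they are fetched, so the
        # whole blob is not held in memory the way readall() holds it.
        return downloader.readinto(f)


def download_blob_to_file(connection_string, container_name, blob_name, path,
                          max_concurrency=4):
    """Download one blob to a local file using parallel connections."""
    # Imported lazily so save_download() can be exercised without the SDK.
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        conn_str=connection_string,
        container_name=container_name,
        blob_name=blob_name,
    )
    # Assumption: max_concurrency is the v12 counterpart of max_connections.
    downloader = blob.download_blob(max_concurrency=max_concurrency)
    return save_download(downloader, path)
```

With the variables from the snippet in this issue, the call would look like `download_blob_to_file(connection_string, container_name, blob_name, full_path_to_file)`. Whether this recovers the old download times is untested here.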

Labels: Client, Service Attention, Storage, bug, customer-reported