This repository was archived by the owner on Mar 6, 2026. It is now read-only.

Python SDK unable to download file due to checksum mismatch #204

@cloudryder

Downloading an object fails with a checksum mismatch error. Downloading the same object with gsutil works fine.

./gcs-download-object.py
Traceback (most recent call last):
  File "./gcs-download-object.py", line 29, in <module>
    download_blob('##REDACTED##',
  File "./gcs-download-object.py", line 20, in download_blob
    blob.download_to_filename(destination_file_name)
  File "/usr/local/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1184, in download_to_filename
    client.download_blob_to_file(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/storage/client.py", line 719, in download_blob_to_file
    blob_or_uri._do_download(
  File "/usr/local/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 956, in _do_download
    response = download.consume(transport, timeout=timeout)
  File "/usr/local/lib/python3.8/site-packages/google/resumable_media/requests/download.py", line 171, in consume
    self._write_to_stream(result)
  File "/usr/local/lib/python3.8/site-packages/google/resumable_media/requests/download.py", line 120, in _write_to_stream
    raise common.DataCorruption(response, msg)
google.resumable_media.common.DataCorruption: Checksum mismatch while downloading:
  ##REDACTED##
The X-Goog-Hash header indicated an MD5 checksum of:
  lAhluFgTEwcNJDvTSap2fQ==
but the actual MD5 checksum of the downloaded contents was:
  61Kz/FQdqRvwqacGuwuFIA==
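
For reference, the two values in the error above are base64-encoded MD5 digests, not hex strings. The following is a minimal sketch of how a locally available copy of the object (for example, the one fetched with gsutil) could be hashed in the same encoding and compared against the `X-Goog-Hash` value. `base64_md5` is a hypothetical helper name, not part of google-cloud-storage, and the 8 MiB chunk size is an arbitrary assumption.

```python
import base64
import hashlib


def base64_md5(path, chunk_size=8 * 1024 * 1024):
    """Return the base64-encoded MD5 digest of the file at `path`.

    Hypothetical helper (not part of google-cloud-storage). The file is
    read in fixed-size chunks so a very large file never has to fit in
    memory at once.
    """
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


# Example comparison (the local path is an assumption):
# expected = "lAhluFgTEwcNJDvTSap2fQ=="  # MD5 from the X-Goog-Hash header
# print(base64_md5("M31_POL-COL_ESFUHD_06_24.mov") == expected)
```

If the digest of the gsutil copy matches the header value, the stored object and its metadata agree, which would point at the download path in the SDK rather than at the object itself.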

The code itself is pretty straightforward:

#!/usr/bin/env python3.8
from google.cloud import storage


def download_blob(bucket_name, source_blob_name, destination_file_name):
    """Downloads a blob from the bucket."""
    # bucket_name = "your-bucket-name"
    # source_blob_name = "storage-object-name"
    # destination_file_name = "local/path/to/file"
    storage_client = storage.Client()
    bucket = storage_client.bucket(bucket_name)

    # Construct a client side representation of a blob.
    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve
    # any content from Google Cloud Storage. As we don't need additional data,
    # using `Bucket.blob` is preferred here.
    blob = bucket.blob(source_blob_name)
    blob.download_to_filename(destination_file_name)

    print(
        "Blob {} downloaded to {}.".format(
            source_blob_name, destination_file_name
        )
    )


download_blob('##REDACTED##',
              'remedia/mezzanines/Live/2018-06-24/M31_POL-COL_ESFUHD_06_24.mov',
              'M31_POL-COL_ESFUHD_06_24.mov')

The file is 2.3 TB, if that matters.

The installed package versions are:

pip3.8 list
Package                  Version
------------------------ ---------
boto3                    1.17.13
botocore                 1.20.13
cachetools               4.2.1
certifi                  2020.12.5
cffi                     1.14.5
chardet                  4.0.0
google-api-core          1.26.0
google-auth              1.27.0
google-cloud-core        1.6.0
google-cloud-storage     1.36.0
google-crc32c            1.1.2
google-resumable-media   1.2.0
googleapis-common-protos 1.52.0
idna                     2.10
jmespath                 0.10.0
packaging                20.9
pip                      19.2.3
protobuf                 3.15.1
pyasn1                   0.4.8
pyasn1-modules           0.2.8
pycparser                2.20
pyparsing                2.4.7
python-dateutil          2.8.1
pytz                     2021.1
requests                 2.25.1
rsa                      4.7.1
s3transfer               0.3.4
setuptools               41.2.0
six                      1.15.0
urllib3                  1.26.3

I can reproduce this issue for this file. I have downloaded several hundred other objects with the same SDK, so I'm not sure why it's failing on this one.

Labels

api: storage (Issues related to the googleapis/google-resumable-media-python API.)
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)