MediaIoBaseDownload next_chunk() Range header off-by-one #1593

@jvtm

Environment details

  • OS type and version:
Ubuntu 20.04.3 LTS
  • Python version: python --version
Python 3.9.7
  • pip version: pip --version
pip 21.3.1
  • google-api-python-client version: pip show google-api-python-client
Name: google-api-python-client
Version: 2.29.0
Summary: Google API Client Library for Python
Home-page: https://github.com/googleapis/google-api-python-client/
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Requires: google-api-core, google-auth, google-auth-httplib2, httplib2, uritemplate

Steps to reproduce

  1. Use MediaIoBaseDownload(..., chunksize=1024) for downloading a file from Google Drive (see https://developers.google.com/drive/api/v3/manage-downloads#python)
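  2. The received chunk is 1025 bytes instead of the requested 1024
  3. Compare with the examples / behavior defined in https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier (both ends of a byte range are inclusive, so bytes=0-1024 covers 1025 bytes)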
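
To make the arithmetic from step 3 concrete, here is a minimal sketch of the inclusive-range calculation. The helper names `range_header` and `range_length` are hypothetical and only for illustration; they are not part of google-api-python-client.

```python
def range_header(start, chunksize):
    """Build a Range header value that requests exactly `chunksize` bytes."""
    # RFC 7233 byte ranges are inclusive on both ends, so the last byte
    # position is start + chunksize - 1, not start + chunksize.
    return "bytes=%d-%d" % (start, start + chunksize - 1)


def range_length(header):
    """Return how many bytes a 'bytes=first-last' range covers."""
    first, last = (int(part) for part in header.split("=", 1)[1].split("-"))
    return last - first + 1


print(range_header(0, 1024))         # bytes=0-1023
print(range_length("bytes=0-1024"))  # 1025
```

A header of `bytes=0-1024` therefore asks for one byte more than the chunk size, which matches the 1025-byte chunk seen in step 2.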

Code example

import logging
from io import BytesIO

from googleapiclient.http import MediaIoBaseDownload

logging.basicConfig(level="DEBUG")

# Example only: Drive service and auth setup are skipped for obvious reasons;
# `drive` is assumed to be an already-authorized Drive API service object.

file_id = "foobar123456789"
chunk_size = 1024
file_obj = BytesIO()
request = drive.files().get_media(fileId=file_id)
downloader = MediaIoBaseDownload(file_obj, request, chunksize=chunk_size)
done = False
while not done:
    # debug log the lower level HTTP request headers?
    status, done = downloader.next_chunk()
    logging.debug(
        "Download status %r %s/%s bytes (%.1f%%)",
        file_id,
        status.resumable_progress,
        status.total_size,
        status.progress() * 100,
    )
    if not done:
        assert file_obj.tell() == status.resumable_progress
        assert file_obj.tell() % chunk_size == 0
    else:
        assert file_obj.tell() == status.total_size

In real life the chunk size should be at least a few megabytes. The default seems to be 100 MiB:

googleapiclient/http.py:DEFAULT_CHUNK_SIZE = 100 * 1024 * 1024

Stack trace

.

Patch

Is it worth making a PR, signing CLA? :D

diff --git a/googleapiclient/http.py b/googleapiclient/http.py
index 1b661e1b..927464e9 100644
--- a/googleapiclient/http.py
+++ b/googleapiclient/http.py
@@ -733,7 +733,7 @@ class MediaIoBaseDownload(object):
         headers = self._headers.copy()
         headers["range"] = "bytes=%d-%d" % (
             self._progress,
-            self._progress + self._chunksize,
+            self._progress + self._chunksize - 1,
         )
         http = self._request.http
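
The effect of the one-line change can be checked without touching Drive. The following is a rough simulation, not the library's code: `serve_range` is a hypothetical stand-in for a server that honours inclusive byte ranges, and the loop assumes the downloader advances its progress by the number of bytes actually received.

```python
def serve_range(data, header):
    """Hypothetical server: return the bytes for an inclusive 'bytes=first-last' range."""
    first, last = (int(part) for part in header.split("=", 1)[1].split("-"))
    return data[first:last + 1]


def chunk_sizes(data, chunksize, inclusive_fix):
    """Download `data` in chunks and return the size of each chunk received."""
    sizes = []
    progress = 0
    while progress < len(data):
        # Without the fix the end position is progress + chunksize,
        # with the fix it is progress + chunksize - 1.
        last = progress + chunksize - (1 if inclusive_fix else 0)
        content = serve_range(data, "bytes=%d-%d" % (progress, last))
        sizes.append(len(content))
        progress += len(content)  # assumption: progress tracks bytes received
    return sizes


data = bytes(4096)
print(chunk_sizes(data, 1024, inclusive_fix=False))  # [1025, 1025, 1025, 1021]
print(chunk_sizes(data, 1024, inclusive_fix=True))   # [1024, 1024, 1024, 1024]
```

Under these assumptions no data is lost or duplicated either way; the difference is only that the chunks are no longer aligned to the requested chunk size, which is what the `% chunk_size` assertion in the code example trips over.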
 

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
