MediaIoBaseDownload next_chunk() Range header off-by-one #1593
Closed
Labels
priority: p2 (Moderately-important priority. Fix may not be included in next release.)
type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
Description
Environment details
- OS type and version:
Ubuntu 20.04.3 LTS
- Python version:
python --version
Python 3.9.7
- pip version:
pip --version
pip 21.3.1
- google-api-python-client version:
pip show google-api-python-client
Name: google-api-python-client
Version: 2.29.0
Summary: Google API Client Library for Python
Home-page: https://github.com/googleapis/google-api-python-client/
Author: Google LLC
Author-email: googleapis-packages@google.com
License: Apache 2.0
Requires: google-api-core, google-auth, google-auth-httplib2, httplib2, uritemplate
Steps to reproduce
- Use MediaIoBaseDownload(..., chunksize=1024) to download a file from Google Drive (see https://developers.google.com/drive/api/v3/manage-downloads#python)
- Each received chunk is 1025 bytes instead of 1024
- See the examples / behavior defined in https://httpwg.org/specs/rfc7233.html#rule.ranges-specifier: byte positions in a range are inclusive, so bytes=0-1023 covers exactly the first 1024 bytes
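Because both ends of an RFC 7233 byte range are inclusive, a range of N bytes starting at offset P ends at P + N - 1. The sketch below illustrates that arithmetic; the helper names range_header and range_length are hypothetical and not part of google-api-python-client.

```python
import re


def range_header(progress: int, chunksize: int) -> str:
    """Build a "bytes=first-last" Range value for the next chunk.

    RFC 7233 byte positions are inclusive, so the last position is
    progress + chunksize - 1.
    """
    return "bytes=%d-%d" % (progress, progress + chunksize - 1)


def range_length(header_value: str) -> int:
    """Return how many bytes a single "bytes=first-last" range requests."""
    match = re.fullmatch(r"bytes=(\d+)-(\d+)", header_value)
    if match is None:
        raise ValueError("expected a single bytes=first-last range")
    first, last = int(match.group(1)), int(match.group(2))
    return last - first + 1  # inclusive on both ends


print(range_header(0, 1024))                # bytes=0-1023
print(range_length(range_header(0, 1024)))  # 1024
print(range_length("bytes=0-1024"))         # 1025, the size reported above
```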
Code example
import logging
from io import BytesIO

from googleapiclient.http import MediaIoBaseDownload

logging.basicConfig(level="DEBUG")

# example / skipping drive, auth setup for obvious reasons
# (`drive` is assumed to be a Drive v3 service object, e.g. from
# googleapiclient.discovery.build("drive", "v3", credentials=...))
file_id = "foobar123456789"
chunk_size = 1024
file_obj = BytesIO()
request = drive.files().get_media(fileId=file_id)
downloader = MediaIoBaseDownload(file_obj, request, chunksize=chunk_size)
done = False
while not done:
    # debug log the lower level HTTP request headers?
    status, done = downloader.next_chunk()
    logging.debug(
        "Download status %r %s/%s bytes (%.1f%%)",
        file_id,
        status.resumable_progress,
        status.total_size,
        status.progress() * 100,
    )
    if not done:
        # Every intermediate chunk should be exactly chunk_size bytes,
        # so the offset should stay a multiple of chunk_size.
        assert file_obj.tell() == status.resumable_progress
        assert file_obj.tell() % chunk_size == 0
    else:
        assert file_obj.tell() == status.total_size

In real life, the chunk size should be at least a few megabytes. The default seems to be 100 MiB:
googleapiclient/http.py:DEFAULT_CHUNK_SIZE = 100 * 1024 * 1024
Stack trace
N/A
Patch
Is it worth making a PR and signing the CLA? :D
diff --git a/googleapiclient/http.py b/googleapiclient/http.py
index 1b661e1b..927464e9 100644
--- a/googleapiclient/http.py
+++ b/googleapiclient/http.py
@@ -733,7 +733,7 @@ class MediaIoBaseDownload(object):
         headers = self._headers.copy()
         headers["range"] = "bytes=%d-%d" % (
             self._progress,
-            self._progress + self._chunksize,
+            self._progress + self._chunksize - 1,
         )
         http = self._request.http
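The effect of the one-line patch can be checked offline with a simplified simulation. This is a hypothetical stand-in, not the library's next_chunk(): it assumes the server honors inclusive ranges and clamps a last position past the end of the file to the final byte, and it advances the offset by the number of bytes actually received.

```python
def serve_range(data: bytes, first: int, last: int) -> bytes:
    """Mimic a server answering "bytes=first-last" (inclusive), clamping
    a last position past the end of the file to the final byte."""
    last = min(last, len(data) - 1)
    return data[first:last + 1]


def download(data: bytes, chunksize: int, patched: bool) -> list[int]:
    """Hypothetical download loop; returns the size of each received chunk."""
    progress = 0
    chunk_sizes = []
    while progress < len(data):
        if patched:
            last = progress + chunksize - 1  # fixed: inclusive end
        else:
            last = progress + chunksize  # original: one byte too far
        chunk = serve_range(data, progress, last)
        chunk_sizes.append(len(chunk))
        progress += len(chunk)  # advance by what was actually received
    return chunk_sizes


payload = bytes(3000)
print(download(payload, 1024, patched=False))  # [1025, 1025, 950]
print(download(payload, 1024, patched=True))   # [1024, 1024, 952]
```

In both cases the chunks are contiguous and add up to the full file; the bug only changes the chunk size, which is why it shows up as 1025-byte chunks rather than corrupted data.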