Skip to content

Add option to optionally skip downloading the contents of a file#79

Merged
manuzhang merged 2 commits intomanuzhang:mainfrom
benjamb:max-bytes
Mar 4, 2024
Merged

Add option to optionally skip downloading the contents of a file#79
manuzhang merged 2 commits intomanuzhang:mainfrom
benjamb:max-bytes

Conversation

@benjamb
Copy link
Copy Markdown
Contributor

@benjamb benjamb commented Mar 1, 2024

This PR does a couple of things:

1 . The first commit prevents the downloading of entire files into memory by streaming the response and reading small chunks as a time.
2. The second commit adds the configuration option to optionally skip downloading of a remote URLs content.

Closes #76.

Comment on lines +157 to +159
# Download the entire contents as to not break previous behaviour.
for _ in response.iter_content(chunk_size=1024):
pass
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't actually care about downloading the contents (we seemingly don't do any verification of contents), then we could always just drop these lines, with stream=True set the body of the response isn't downloaded automatically.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't as long as the url status is correct.

Copy link
Copy Markdown
Contributor Author

@benjamb benjamb Mar 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just simplify this MR to just skip the download entirely and drop the read_max_bytes configuration? I'm hesitant to break existing behaviour, so I'll list out a few possible routes this MR could go:

  1. Keep this MR as it is, with read_max_bytes limiting the size of downloaded content.
  2. Opt for a different configuration value, such as skip_downloads, which triggers this new behaviour of skipping the download of the response's body.
  3. Don't bother adding any configurable behaviour and just establish a connection via stream=True and skip any downloads.
    @manuzhang Which would be your preference?

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer option 2.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manuzhang I've implemented option 2 and updated this MR, as well as the original issue title.

@benjamb benjamb changed the title Add option to read max N bytes of a file Add option to optionally skip downloading the contents of a file Mar 4, 2024
@manuzhang manuzhang merged commit fce60c9 into manuzhang:main Mar 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add option to prevent downloading the entirety of a file

2 participants