refactor s3 submodule to minimize resource usage #569

Merged
mpenkov merged 7 commits into develop from s3_refactor on Dec 27, 2020

mpenkov (Collaborator) commented on Dec 18, 2020

Creating sessions and resources costs time and memory. If possible, smart_open should avoid creating resources by itself, and allow the user to specify them up front.
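
The idea of accepting a caller-supplied resource instead of building one per call can be sketched as follows. This is a minimal illustration only: `ExpensiveResource` and `read_all` are hypothetical names standing in for a costly object (such as a session or resource) and a reader function; they are not smart_open's API.

```python
class ExpensiveResource:
    """Hypothetical stand-in for an object that is costly to create."""

    instances_created = 0  # counts constructions so the cost is visible

    def __init__(self):
        ExpensiveResource.instances_created += 1

    def read(self, url):
        # Placeholder for a real network read.
        return f"contents of {url}"


def read_all(urls, resource=None):
    """Read every URL, reusing `resource` when the caller supplies one."""
    results = []
    for url in urls:
        # Without a caller-supplied resource, a new one is built per URL,
        # which is the costly path the description above wants to avoid.
        active = resource if resource is not None else ExpensiveResource()
        results.append(active.read(url))
    return results
```

Calling `read_all(urls)` constructs one object per URL, while `read_all(urls, resource=shared)` constructs none beyond the one the caller already made.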

Here are some CPU-time benchmark results:

$ time python benchmark/read_s3.py < benchmark/urls.txt
real    1m20.786s
user    0m8.619s
sys     0m0.894s

$ time python benchmark/read_s3.py create_session < benchmark/urls.txt
real    1m45.826s
user    0m4.554s
sys     0m0.149s

$ time python benchmark/read_s3.py create_resource < benchmark/urls.txt
real    0m22.046s
user    0m1.474s
sys     0m0.065s

$ time python benchmark/read_s3.py create_session_and_resource < benchmark/urls.txt
real    0m21.086s
user    0m1.496s
sys     0m0.073s

There are memory benefits as well, but I didn't benchmark them, because the CPU benchmarks were compelling enough.
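
To compare the runs at a glance, the `real` times reported above can be converted to seconds and expressed relative to the run without arguments. The sketch below only does arithmetic on the figures already shown; the label `default` for the argument-free run is an assumption made here for readability.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '1m20.786s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# `real` times copied from the benchmark output above.
# "default" is an assumed label for the run with no mode argument.
real_times = {
    "default": "1m20.786s",
    "create_session": "1m45.826s",
    "create_resource": "0m22.046s",
    "create_session_and_resource": "0m21.086s",
}

baseline = to_seconds(real_times["default"])
for mode, stamp in real_times.items():
    seconds = to_seconds(stamp)
    # A ratio above 1 means the mode finished faster than the default run.
    print(f"{mode}: {seconds:.3f}s ({baseline / seconds:.2f}x vs default)")
```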
mpenkov and others added 3 commits December 19, 2020 08:08
Co-authored-by: Radim Řehůřek <radimrehurek@seznam.cz>
mpenkov merged commit 74afb2a into develop on Dec 27, 2020
mpenkov deleted the s3_refactor branch on December 27, 2020 08:23