Skip to content

Refactor: Optimize create_test_file in benchmark.py#84

Merged
lxp merged 7 commits intocfv-project:python3from
mirusu400:python3
Nov 2, 2025
Merged

Refactor: Optimize create_test_file in benchmark.py#84
lxp merged 7 commits intocfv-project:python3from
mirusu400:python3

Conversation

@mirusu400
Copy link
Copy Markdown
Contributor

The original function used a while loop to write one random byte at a time, so I use os.urandom(size) to efficiently generate the full size bytes of random data in memory.

If the requested size is too large to fit in available system RAM (e.g., 50GB file on a 16GB machine), a MemoryError is caught. In this scenario, the function falls back to the original, memory-efficient byte-by-byte write logic.

@mirusu400 mirusu400 changed the title Fix: Add argument default value, make create_test_file more fast Refractor: Optimize create_test_file with memory-safe fallback Oct 28, 2025
@mirusu400 mirusu400 changed the title Refractor: Optimize create_test_file with memory-safe fallback Refractor: Optimize create_test_file with memory-safe fallback in benchmark.py Oct 28, 2025
@lxp
Copy link
Copy Markdown
Member

lxp commented Oct 29, 2025

Thanks for your pull request!

Ideally, I would like to get rid of the old behavior at all, since it is very inefficient.
However, reading a huge block of random data at once before writing it is also not very efficient memory wise.

Can you maybe come up with a middle way like reading and writing blocks of 64k?
At least 64k is also used when hashing files without mmap support:

x = f.read(65536)

Please also fix the formatting error (see https://github.com/cfv-project/cfv/actions/runs/18862085679/job/54029609083?pr=84):

Run flake8 --exclude=build,venv --ignore= --max-line-length=200 --max-complexity=75 --show-source --statistics .
./test/benchmark.py:69:1: W293 blank line contains whitespace
    
^
1     W293 blank line contains whitespace

@lxp lxp self-requested a review October 29, 2025 23:35
@mirusu400
Copy link
Copy Markdown
Contributor Author

Hello @lxp, I agree with your suggestion

I changed the code to write files split by chunk, not the whole data.

Also, I fixed the flake8 formatting error

Thank you for quick review!

Copy link
Copy Markdown
Member

@lxp lxp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for updating your pull request accordingly.
I made a few more suggestions with concrete changes.
We should be able to merge it with this changes (at least when there are not additonal flake8 formatting errors).

Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
Comment thread test/benchmark.py Outdated
while size:
f.write(b'%c' % random.randint(0, 255))
size -= 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a flake8 error on this line.

Run flake8 --exclude=build,venv --ignore= --max-line-length=200 --max-complexity=75 --show-source --statistics .
./test/benchmark.py:73:1: W293 blank line contains whitespace
    
^
1     W293 blank line contains whitespace
Error: Process completed with exit code 1.

See: https://github.com/cfv-project/cfv/actions/runs/18927372363/job/54136536126?pr=84

Suggested change

mirusu400 and others added 4 commits October 31, 2025 12:50
Co-authored-by: Lisa Gnedt <lisa@gnedt.at>
Co-authored-by: Lisa Gnedt <lisa@gnedt.at>
Co-authored-by: Lisa Gnedt <lisa@gnedt.at>
Co-authored-by: Lisa Gnedt <lisa@gnedt.at>
@lxp lxp changed the title Refractor: Optimize create_test_file with memory-safe fallback in benchmark.py Refactor: Optimize create_test_file in benchmark.py Oct 31, 2025
@lxp lxp added this to the 3.2 milestone Nov 2, 2025
@lxp lxp merged commit 024d9eb into cfv-project:python3 Nov 2, 2025
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants