feat: frequency also now does dynamic parallel chunk sizing
#3135
so it can process even larger-than-memory files, like stats.
Pull request overview
This PR adds dynamic parallel chunk sizing to the frequency command, enabling it to process larger-than-memory files, as the stats command already does. The implementation introduces memory-aware chunking that calculates chunk sizes dynamically from available system memory and a sample of records, preventing out-of-memory errors when processing large datasets.
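As a rough illustration of the memory-aware chunk sizing described above, here is a minimal Rust sketch. It is not qsv's code: the function name dynamic_chunk_size, the mem_fraction parameter, and the approach of dividing a per-job memory budget by an average sampled record size are all assumptions. Available memory is passed in as a parameter because Rust's standard library has no portable way to query it, and the real implementation may also account for frequency-table overhead.

```rust
/// Hypothetical helper (not qsv's API): estimate how many records each
/// parallel chunk can hold, given a memory budget and a sample of record sizes.
fn dynamic_chunk_size(
    available_mem_bytes: u64,
    sampled_record_sizes: &[u64],
    num_jobs: u64,
    mem_fraction: f64,
) -> u64 {
    // Average sampled record size in bytes; assume 1 byte if the sample is
    // empty so we never divide by zero.
    let avg_record_bytes = if sampled_record_sizes.is_empty() {
        1
    } else {
        let total: u64 = sampled_record_sizes.iter().sum();
        (total / sampled_record_sizes.len() as u64).max(1)
    };

    // Only use a fraction of available memory, shared across all parallel jobs.
    let usable = (available_mem_bytes as f64 * mem_fraction) as u64;
    let per_job_budget = usable / num_jobs.max(1);

    // Records per chunk, never less than one.
    (per_job_budget / avg_record_bytes).max(1)
}
```

For example, with 1,000,000 bytes available, a memory fraction of 0.5, 4 jobs, and sampled records of 100, 200 and 300 bytes, the sketch yields 625 records per chunk.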
Key changes:
- Adds memory-aware chunking with dynamic sizing based on available system memory
- Introduces the QSV_FREQ_CHUNK_MEMORY_MB environment variable for manual chunk memory control
- Implements automatic index creation when files are too large for sequential processing
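The two behaviors in the list above (a manual memory override and automatic indexing) can be sketched as follows. This is a hypothetical illustration, not qsv's implementation: it assumes QSV_FREQ_CHUNK_MEMORY_MB holds a positive whole number of megabytes and that an unset, unparsable, or zero value falls back to the automatically computed budget. The helper names chunk_memory_budget, chunk_memory_budget_from_env and needs_index, and the rule "index when the file exceeds one chunk's budget", are also assumptions.

```rust
use std::env;

/// Hypothetical: turn an optional QSV_FREQ_CHUNK_MEMORY_MB value into a
/// per-chunk byte budget, falling back to the automatically computed budget
/// when the value is missing, unparsable, or zero (assumed semantics).
fn chunk_memory_budget(env_value: Option<&str>, auto_budget_bytes: u64) -> u64 {
    match env_value.map(|v| v.trim().parse::<u64>()) {
        Some(Ok(mb)) if mb > 0 => mb.saturating_mul(1024 * 1024),
        _ => auto_budget_bytes,
    }
}

/// Reads the real environment variable and delegates to chunk_memory_budget.
#[allow(dead_code)]
fn chunk_memory_budget_from_env(auto_budget_bytes: u64) -> u64 {
    let value = env::var("QSV_FREQ_CHUNK_MEMORY_MB").ok();
    chunk_memory_budget(value.as_deref(), auto_budget_bytes)
}

/// Hypothetical heuristic: without an index the file can only be read
/// sequentially, so if it is larger than one chunk's memory budget, build an
/// index first so the work can be split into parallel chunks.
fn needs_index(file_size_bytes: u64, chunk_budget_bytes: u64, has_index: bool) -> bool {
    !has_index && file_size_bytes > chunk_budget_bytes
}
```

Keeping the parsing in a pure function that takes Option<&str> makes the fallback rules easy to test without mutating the process environment.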
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>