-
Notifications
You must be signed in to change notification settings - Fork 99
perf: moarstats much faster bivariate calculation
#3248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
and to optimize for bivariate stats that users normally use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a significant performance optimization to the moarstats command's bivariate statistics computation by:
- Adding a new
--bivariate-statsflag that allows users to selectively compute only the statistics they need - Changing the default behavior to compute only "fast" statistics (Pearson correlation and covariance) which use streaming algorithms
- Optimizing the Kendall tau computation to avoid unnecessary vector clones
The key changes include:
- New
BivariateStatsConfigstruct to configure which statistics to compute - Conditional allocation and computation based on the selected statistics
- Dynamic CSV header and row generation based on requested statistics
- Comprehensive test coverage for the new flag options
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/cmd/moarstats.rs | Added --bivariate-stats flag, BivariateStatsConfig struct, conditional computation logic, optimized Kendall tau, and dynamic output generation |
| tests/test_moarstats.rs | Updated existing tests to explicitly request "all" statistics where needed, and added three new tests for the flag options (pearson-only, multiple stats, and all stats) |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…is triggering false positive devskim security alert being confused with a weak hashing algorithm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
…csv` when `--join-inputs` is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.
found by GH Copilot review, but was not flagged as it was outside the changed code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
Also optimized for computing streaming bivariate stats by default