-
Notifications
You must be signed in to change notification settings - Fork 99
feat: frequency add weighted frequencies
#3218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds weighted frequency computation to the frequency command through a new --weight flag. Users can now compute frequencies where each row contributes a weight value instead of a count of 1, enabling use cases like analyzing survey responses with importance weights or aggregating pre-counted data.
Key Changes:
- Added
--weight <column>flag to specify a numeric weight column - Weight column is automatically excluded from frequency computation
- Missing/invalid weights default to 1.0; zero and negative weights are ignored
- Both sequential and parallel processing paths support weighted frequencies
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| tests/test_frequency.rs | Adds 15 comprehensive test cases covering weighted frequency basics, edge cases (missing/invalid/zero/negative weights), integration with existing flags (--limit, --asc, --rank-strategy, --json, --ignore-case, --no-nulls), and error handling |
| src/cmd/schema.rs | Initializes the new flag_weight field and updates tuple destructuring to handle the new return signature |
| src/cmd/frequency.rs | Implements weighted frequency support including: new --weight documentation, WeightedFTables type, process_frequencies_weighted and counts_weighted methods, ftables_weighted_internal for computing weighted frequencies, parallel processing support, JSON output handling, and process_headers_with_weight_exclusion to exclude the weight column from frequency computation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.
Comments suppressed due to low confidence (1)
src/cmd/frequency.rs:2158
- The mapping logic appears incorrect.
all_unique_headers_veccontains indices intoselected_headers(which excludes the weight column), but the code searches for these indices insel(line 2156), which contains original column indices before exclusion. This mismatch means the unique column detection will fail to properly map columns when a weight column is specified. The indices fromget_unique_headersshould be mapped directly without searching throughsel, orget_unique_headersshould work with the original headers and selection.
let mapped_unique_headers: Vec<usize> = all_unique_headers_vec
.iter()
.filter_map(|&original_idx| {
// Find the position of this original index in the selection
sel.iter().position(|&sel_idx| sel_idx == original_idx)
})
.collect();
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
resolves #3175