Skip to content

Conversation

@jqnatividad
Copy link
Collaborator

resolves #3175

@jqnatividad jqnatividad requested a review from Copilot December 22, 2025 04:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds weighted frequency computation to the frequency command through a new --weight flag. Users can now compute frequencies where each row contributes a weight value instead of a count of 1, enabling use cases like analyzing survey responses with importance weights or aggregating pre-counted data.

Key Changes:

  • Added --weight <column> flag to specify a numeric weight column
  • Weight column is automatically excluded from frequency computation
  • Missing/invalid weights default to 1.0; zero and negative weights are ignored
  • Both sequential and parallel processing paths support weighted frequencies

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
tests/test_frequency.rs Adds 15 comprehensive test cases covering weighted frequency basics, edge cases (missing/invalid/zero/negative weights), integration with existing flags (--limit, --asc, --rank-strategy, --json, --ignore-case, --no-nulls), and error handling
src/cmd/schema.rs Initializes the new flag_weight field and updates tuple destructuring to handle the new return signature
src/cmd/frequency.rs Implements weighted frequency support including: new --weight documentation, WeightedFTables type, process_frequencies_weighted and counts_weighted methods, ftables_weighted_internal for computing weighted frequencies, parallel processing support, JSON output handling, and process_headers_with_weight_exclusion to exclude the weight column from frequency computation

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 11 comments.

Comments suppressed due to low confidence (1)

src/cmd/frequency.rs:2158

  • The mapping logic appears incorrect. all_unique_headers_vec contains indices into selected_headers (which excludes the weight column), but the code searches for these indices in sel (line 2156), which contains original column indices before exclusion. This mismatch means the unique column detection will fail to properly map columns when a weight column is specified. The indices from get_unique_headers should be mapped directly without searching through sel, or get_unique_headers should work with the original headers and selection.
        let mapped_unique_headers: Vec<usize> = all_unique_headers_vec
            .iter()
            .filter_map(|&original_idx| {
                // Find the position of this original index in the selection
                sel.iter().position(|&sel_idx| sel_idx == original_idx)
            })
            .collect();

jqnatividad and others added 3 commits December 22, 2025 06:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jqnatividad jqnatividad requested a review from Copilot December 22, 2025 13:25
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

jqnatividad and others added 3 commits December 22, 2025 08:31
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@jqnatividad jqnatividad merged commit 8fb11cf into master Dec 22, 2025
14 of 15 checks passed
@jqnatividad jqnatividad deleted the 3175-weighted-frequencies branch December 22, 2025 13:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

frequency: add support for weights

2 participants