@jqnatividad
Collaborator

No description provided.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds bivariate statistics computation to the moarstats command, enabling analysis of relationships between pairs of columns in CSV datasets. For each pair of fields, the feature computes five correlation and covariance statistics (Pearson, Spearman, Kendall's tau, sample covariance, and population covariance), plus mutual information.
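
For readers unfamiliar with these statistics, here is a minimal Rust sketch of two of them, covariance (sample and population) and Pearson correlation. It is an illustration only, not the code in src/cmd/moarstats.rs: the function names `mean`, `covariance`, and `pearson` are hypothetical, and the inputs are assumed to be already-parsed `f64` columns of equal length.

```rust
/// Arithmetic mean of a slice; `None` when the slice is empty.
fn mean(v: &[f64]) -> Option<f64> {
    if v.is_empty() {
        None
    } else {
        Some(v.iter().sum::<f64>() / v.len() as f64)
    }
}

/// Covariance of paired observations (hypothetical helper).
/// `sample = true` divides by n - 1; `sample = false` divides by n (population).
fn covariance(x: &[f64], y: &[f64], sample: bool) -> Option<f64> {
    let n = x.len();
    if n != y.len() || n == 0 || (sample && n < 2) {
        return None;
    }
    let (mx, my) = (mean(x)?, mean(y)?);
    // Sum of products of deviations from the respective means.
    let co: f64 = x.iter().zip(y).map(|(a, b)| (a - mx) * (b - my)).sum();
    let denom = if sample { (n - 1) as f64 } else { n as f64 };
    Some(co / denom)
}

/// Pearson correlation: covariance scaled by both standard deviations.
/// Returns `None` when either column has zero variance.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    let cov = covariance(x, y, false)?;
    let var_x = covariance(x, x, false)?;
    let var_y = covariance(y, y, false)?;
    let denom = (var_x * var_y).sqrt();
    if denom == 0.0 {
        None
    } else {
        Some(cov / denom)
    }
}
```

Spearman can be obtained by applying the same `pearson` function to the ranks of each column, which is why the two are usually listed together.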

Key changes:

  • Implements bivariate statistics with parallel chunked processing for large files and sequential processing for smaller datasets
  • Adds multi-dataset join capability to compute bivariate statistics across joined datasets
  • Includes comprehensive test coverage with 9 new test cases covering various scenarios (basic correlation, negative correlation, string fields, mixed types, joins, etc.)
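
The parallel chunked processing mentioned above depends on statistics that can be computed per chunk and then combined. The sketch below shows one common way to do that for covariance, using mergeable co-moment accumulators. It is an assumption about how such a scheme can work, not a description of this PR's implementation: `CoMoment`, `chunked_covariance`, and the one-thread-per-chunk strategy are all hypothetical simplifications.

```rust
/// Running co-moment state for one chunk of (x, y) pairs (hypothetical type).
#[derive(Clone, Copy, Debug, Default)]
struct CoMoment {
    n: f64,
    mean_x: f64,
    mean_y: f64,
    c_xy: f64, // running sum of (x - mean_x) * (y - mean_y)
}

impl CoMoment {
    /// Single-pass update with one observation.
    fn push(&mut self, x: f64, y: f64) {
        self.n += 1.0;
        let dx = x - self.mean_x; // deviation from the old x mean
        self.mean_x += dx / self.n;
        self.mean_y += (y - self.mean_y) / self.n;
        self.c_xy += dx * (y - self.mean_y); // uses the updated y mean
    }

    /// Combine two partial results without revisiting the rows.
    fn merge(self, o: CoMoment) -> CoMoment {
        if self.n == 0.0 {
            return o;
        }
        if o.n == 0.0 {
            return self;
        }
        let n = self.n + o.n;
        let dx = o.mean_x - self.mean_x;
        let dy = o.mean_y - self.mean_y;
        CoMoment {
            n,
            mean_x: self.mean_x + dx * o.n / n,
            mean_y: self.mean_y + dy * o.n / n,
            c_xy: self.c_xy + o.c_xy + dx * dy * self.n * o.n / n,
        }
    }

    fn sample_covariance(&self) -> Option<f64> {
        if self.n < 2.0 {
            None
        } else {
            Some(self.c_xy / (self.n - 1.0))
        }
    }
}

/// Accumulate each chunk on its own scoped thread, then merge the partials.
/// One thread per chunk keeps the sketch short; a thread pool would be the
/// more realistic choice for large files.
fn chunked_covariance(pairs: &[(f64, f64)], chunk_size: usize) -> Option<f64> {
    let partials: Vec<CoMoment> = std::thread::scope(|s| {
        let handles: Vec<_> = pairs
            .chunks(chunk_size.max(1))
            .map(|chunk| {
                s.spawn(move || {
                    let mut acc = CoMoment::default();
                    for &(x, y) in chunk {
                        acc.push(x, y);
                    }
                    acc
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    partials
        .into_iter()
        .fold(CoMoment::default(), CoMoment::merge)
        .sample_covariance()
}
```

Because `merge` is associative, the chunk size changes only how the work is split, not the result (up to floating-point rounding). Rank-based statistics such as Spearman and Kendall's tau do not decompose this simply, so a chunked design generally has to treat them separately.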

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| src/cmd/moarstats.rs | Core implementation of bivariate statistics computation including correlation algorithms, mutual information calculation, multi-dataset join support, parallel/sequential processing strategies, and extensive optimizations (date parsing cache, string interning, batch conversions) |
| tests/test_moarstats.rs | Comprehensive test suite with 9 test cases covering positive/negative correlations, string fields, multiple fields, all statistics, mixed types, joins, and index auto-creation |
| docs/STATS_DEFINITIONS.md | Documentation for bivariate statistics including definitions of Pearson/Spearman/Kendall correlations, covariance, mutual information, and multi-dataset join usage |

@jqnatividad jqnatividad merged commit 0bc45b6 into master Dec 29, 2025
15 of 16 checks passed
@jqnatividad jqnatividad deleted the moarstats-bivariate branch December 29, 2025 05:24
