Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 31, 2025

The NMI (Normalized Mutual Information) computation logic was duplicated between parallel and sequential code paths in moarstats, requiring bug fixes and improvements to be applied in two places.

Changes

  • Extracted compute_nmi_with_threshold helper function encapsulating:

    • Early exit for disabled NMI or zero pairs
    • Cardinality threshold filtering with debug logging
    • Entropy computation from marginal frequency counts
    • MI reuse or computation as needed
    • Final NMI calculation
  • Replaced 118 lines of duplicated code with single function calls in both parallel (lines 2463-2477) and sequential (lines 2796-2810) paths

Example

Before:

// Parallel path - lines 2406-2471
let normalized_mutual_information = if !stats_config.nmi || chunk_stats.total_pairs == 0 {
    None
} else if let Some(threshold) = cardinality_threshold {
    // ... 40+ lines of threshold checking, entropy computation, MI handling ...
} else {
    // ... 15+ lines duplicating entropy computation and MI handling ...
};

// Sequential path - lines 2790-2841
let normalized_mutual_information = if !stats_config.nmi || total_pairs == 0 {
    // ... exact same logic duplicated ...
};

After:

// Both paths now use:
let normalized_mutual_information = compute_nmi_with_threshold(
    stats_config.nmi,
    total_pairs,
    cardinality_threshold,
    field1_info,
    field2_info,
    pair_key,
    field_names,
    &x_counts,
    &y_counts,
    &xy_counts,
    mutual_information,
);

💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

- Created `compute_nmi_with_threshold` helper function to encapsulate NMI computation logic
- Replaced parallel path NMI computation with helper function call
- Replaced sequential path NMI computation with helper function call
- Eliminates code duplication between parallel and sequential paths
- Improves maintainability by centralizing NMI computation logic

Co-authored-by: jqnatividad <1980690+jqnatividad@users.noreply.github.com>
Copilot AI changed the title [WIP] Address feedback on moarstats normalized mutual info Deduplicate NMI computation logic in moarstats bivariate stats Dec 31, 2025
Copilot AI requested a review from jqnatividad December 31, 2025 03:56
Copy link
Collaborator

@jqnatividad jqnatividad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, this code duplication is OK as we're optimizing for speed, and there are too many parameters being passed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants