Expand and scramble test cases by bamader · Pull Request #364 · CDCgov/RecordLinker

bamader · 2025-05-08T18:54:19Z

Description

This PR adds a script that allows us to combinatorially expand and scramble the test cases NBS provided to us. Each original test case is left unchanged, but we generate a random number of duplicates of each case to perturb. In those duplicates, we drop out some fields to simulate missingness and, with configurable likelihoods, apply random edits of varying edit distance to string values in algorithm-relevant fields. All of this is controlled by parameters at the top of the script. The script only needs to be run once to build the data set; it does not need to be rerun each time we run the algorithm tests.

Of note, the script currently copies the match decision of each original test case into every duplicate generated from it, no matter how heavily we then scramble and mangle the duplicate. This is why performance on this set is poor, so that result is expected. Before sharing this set out, we should consider "re-grading" the expanded cases so their labels better reflect when records should and shouldn't match.

Because the script is customizable, we can also generate different expansions to simulate "pretty close" data (low-randomness scrambling) versus "really bad" data (heavy scrambling). Each expansion would need to be graded separately, but this could be useful for showing performance in different contexts.
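For readers skimming the description, the expansion steps can be sketched roughly as follows. This is not the contents of expand_test_data.py: the parameter names (MIN_DUPLICATES, MAX_DUPLICATES, DROPOUT_PROB, EDIT_PROB, MAX_EDIT_DISTANCE, SCRAMBLE_FIELDS), the record layout ("id", "fields", "should_match"), and the choice of uniformly random insert/delete/substitute edits are all hypothetical, chosen only to illustrate duplication, field dropout, bounded string scrambling, and the verbatim label copy described above.

```python
import copy
import random
import string

# Hypothetical tuning parameters; the real script's names and defaults may differ.
MIN_DUPLICATES = 1
MAX_DUPLICATES = 5
DROPOUT_PROB = 0.2        # chance that any given field is blanked out
EDIT_PROB = 0.3           # chance that an eligible string field is scrambled
MAX_EDIT_DISTANCE = 2     # upper bound on single-character edits per field
SCRAMBLE_FIELDS = ("first_name", "last_name", "address", "city")  # hypothetical


def scramble_string(value: str, rng: random.Random) -> str:
    """Apply 1..MAX_EDIT_DISTANCE random single-character edits.

    Each edit changes the Levenshtein distance by at most 1, so the result
    is within MAX_EDIT_DISTANCE edits of the input.
    """
    chars = list(value)
    for _ in range(rng.randint(1, MAX_EDIT_DISTANCE)):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert":
            chars.insert(rng.randint(0, len(chars)), rng.choice(string.ascii_uppercase))
        elif chars and op == "delete":
            del chars[rng.randrange(len(chars))]
        elif chars:
            chars[rng.randrange(len(chars))] = rng.choice(string.ascii_uppercase)
    return "".join(chars)


def expand_case(case: dict, rng: random.Random) -> list[dict]:
    """Return the original case followed by scrambled duplicates of it."""
    expanded = [case]  # the original test case is kept unchanged
    for i in range(rng.randint(MIN_DUPLICATES, MAX_DUPLICATES)):
        dup = copy.deepcopy(case)
        dup["id"] = f"{case['id']}-dup{i + 1}"
        for field, value in list(dup["fields"].items()):
            if rng.random() < DROPOUT_PROB:
                dup["fields"][field] = None  # simulate missingness
            elif field in SCRAMBLE_FIELDS and isinstance(value, str) and rng.random() < EDIT_PROB:
                dup["fields"][field] = scramble_string(value, rng)
        # The label is copied unchanged, however heavily the duplicate was scrambled.
        dup["should_match"] = case["should_match"]
        expanded.append(dup)
    return expanded


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the expanded set is reproducible
    case = {
        "id": "case-1",
        "fields": {"first_name": "JOHN", "last_name": "SMITH", "birth_date": "1980-01-01"},
        "should_match": True,
    }
    for row in expand_case(case, rng):
        print(row["id"], row["fields"], row["should_match"])
```

Seeding a dedicated random.Random instance keeps the generated set reproducible, which fits running the script once and committing its output. The last step in the loop is exactly the labeling issue noted above: any "re-grading" would replace that verbatim copy of should_match.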

Related Issues

Closes #354

<--------------------- REMOVE THE LINES BELOW BEFORE MERGING --------------------->

Checklist

Please review and complete the following checklist before submitting your pull request:

I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
I have reviewed my changes to ensure they are clear, concise, and well-documented.
I have updated the documentation, if applicable.
I have added or updated test cases to cover my changes, if applicable.
I have minimized the number of reviewers to include only those essential for the review.

Checklist for Reviewers

Please review and complete the following checklist during the review process:

The code follows best practices and conventions.
The changes implement the desired functionality or fix the reported issue.
The tests cover the new changes and pass successfully.
Any potential edge cases or error scenarios have been considered.

bamader · 2025-05-08T18:54:45Z

Performance of the algorithm tests on this initial run

codecov · 2025-05-08T18:59:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.51%. Comparing base (1b961c8) to head (b1523f1).
Report is 4 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #364   +/-   ##
=======================================
  Coverage   98.51%   98.51%           
=======================================
  Files          33       33           
  Lines        1947     1947           
=======================================
  Hits         1918     1918           
  Misses         29       29


Expand and scramble test cases

a63168e

bamader requested review from ericbuckley, johanna-skylight and m-goggins as code owners May 8, 2025 18:54

Appease god of linting

b1523f1

ericbuckley approved these changes May 13, 2025


Comment thread tests/algorithm/scripts/expand_test_data.py

Comment thread tests/algorithm/scripts/expand_test_data.py

bamader merged commit 6402da9 into main May 14, 2025
15 checks passed

bamader deleted the expanded-test-cases branch May 14, 2025 12:30

bamader merged 2 commits into main from expanded-test-cases
