Add scrambler for load testing #384

Merged
m-goggins merged 5 commits into main from refactor-expand-test-data on May 22, 2025
Conversation

@m-goggins m-goggins commented May 21, 2025

Collaborator

Description

This PR adds a script to scramble generated records for load testing. It re-uses much of the code @bamader wrote for scrambling the CSV of NBS data and adapts it for the JSON/PIIRecord data we have. To test the code, run python3 -m tests.load.scripts.scramble_data --file="./tests/load/assets/test_data.json" and it will write out a file containing the scrambled records along with the original records from the test_data.json file.
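For readers unfamiliar with the idea, here is a minimal sketch of what scrambling a JSON-style record can look like. It is not the code in this PR: the helper names, the typo-style edits, and the example field names are all hypothetical, chosen only to illustrate producing near-duplicate variants while keeping the original record in the output.

```python
import copy
import random


def scramble_string(value: str, rng: random.Random) -> str:
    """Apply one small typo-style edit to a string (hypothetical helper)."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    if rng.random() < 0.5:
        # Swap two adjacent characters.
        return value[:i] + value[i + 1] + value[i] + value[i + 2:]
    # Drop a single character.
    return value[:i] + value[i + 1:]


def scramble_record(record: dict, rng: random.Random, n_copies: int = 3) -> list[dict]:
    """Return the original record followed by n_copies scrambled variants."""
    results = [record]
    # Only top-level string fields are candidates in this sketch.
    string_fields = [k for k, v in record.items() if isinstance(v, str)]
    for _ in range(n_copies):
        variant = copy.deepcopy(record)
        if string_fields:
            field = rng.choice(string_fields)
            variant[field] = scramble_string(variant[field], rng)
        results.append(variant)
    return results


# Example field names are assumptions, not the PIIRecord schema.
rng = random.Random(0)
record = {"first_name": "Jane", "last_name": "Doe", "birth_date": "1980-01-01"}
variants = scramble_record(record, rng)
```

Passing in a seeded random.Random keeps a run reproducible, which can help when comparing load test results across runs.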

Related Issues

Closes #376

Additional Notes

If there is time after load testing to refactor this, there's a good amount of cleanup/refactoring I'd like to do to make this more adaptable to other file types, e.g., the NBS CSV and other CSVs. But, given the priority for load testing (not scrambling), this script does the job and does it quickly.

Background for why we need the script:
We want to be able to seed the test MPI during load testing with a variety of records that may or may not match each other. We plan to:

  1. Generate 1.5 million records (the "original" data)
  2. Take ~1 million of the records from the original data and scramble/multiply them using this script to get roughly 4-5 million records (the "scrambled" data)
  3. Seed the MPI with the scrambled data
  4. Randomly select records from the original data and send them to the /link endpoint to see how the database performs with a larger MPI
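The plan above can be sketched at toy scale as follows. This is an illustration only: the function names, the placeholder scrambler, and the record shape are hypothetical, and the copies-per-record value is an assumption (keeping each sampled record plus three or four variants is one way 1 million records could grow to roughly 4-5 million). Generating the originals and actually seeding the MPI or calling /link are left out.

```python
import random


def toy_scramble(record, rng):
    """Placeholder scrambler: returns a copy with one character dropped from the name."""
    variant = dict(record)
    name = variant["name"]
    if len(name) > 1:
        i = rng.randrange(len(name))
        variant["name"] = name[:i] + name[i + 1:]
    return variant


def build_load_test_sets(originals, subset_size, copies_per_record, n_queries,
                         scramble=toy_scramble, seed=0):
    """Return (records to seed the MPI with, records to send to /link)."""
    rng = random.Random(seed)
    # Step 2: sample a subset of the originals and expand each into several variants.
    subset = rng.sample(originals, subset_size)
    seed_records = []
    for record in subset:
        seed_records.append(record)  # keep the original alongside its variants
        seed_records.extend(scramble(record, rng) for _ in range(copies_per_record))
    # Step 4: choose random originals to use as /link requests.
    queries = rng.sample(originals, n_queries)
    return seed_records, queries


# Toy stand-in for step 1 (15 records instead of 1.5 million).
originals = [{"id": i, "name": f"person{i}"} for i in range(15)]
seed_records, queries = build_load_test_sets(
    originals, subset_size=10, copies_per_record=3, n_queries=5
)
print(len(seed_records), len(queries))  # 10 * (1 + 3) = 40 seed records, 5 queries
```

Because the queries are drawn from all of the originals while only a subset is scrambled and seeded, some requests should find close matches in the MPI and others should not, which is the mix of matching and non-matching records the plan calls for.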

codecov Bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.51%. Comparing base (bb812c1) to head (941fdf6).
Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #384   +/-   ##
=======================================
  Coverage   98.51%   98.51%           
=======================================
  Files          33       33           
  Lines        1948     1948           
=======================================
  Hits         1919     1919           
  Misses         29       29           

@m-goggins m-goggins marked this pull request as ready for review May 21, 2025 17:31

@bamader bamader left a comment

Collaborator

Everything logically looks good! Just a few comments on cleanup and possible value swaps/imports, but nothing blocking merge once those get addressed.

Comment thread tests/load/scrambler/utils.py
Comment thread tests/load/scrambler/utils.py
Comment thread tests/load/scrambler/json.py Outdated
Comment thread tests/load/scrambler/json.py Outdated
Comment thread tests/load/scrambler/json.py Outdated
Comment thread tests/load/scrambler/json.py
Comment thread tests/load/scripts/scramble_data.py
@m-goggins m-goggins merged commit 7505a75 into main May 22, 2025
15 checks passed
@m-goggins m-goggins deleted the refactor-expand-test-data branch May 22, 2025 15:24
Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Refactor expand_test_data.py to accept PIIRecords

3 participants