Skip to content

Implement RMS#229

Merged
bamader merged 28 commits into
mainfrom
draft-log-odds-window
Apr 10, 2025
Merged

Implement RMS#229
bamader merged 28 commits into
mainfrom
draft-log-odds-window

Conversation

@bamader

@bamader bamader commented Feb 27, 2025

Copy link
Copy Markdown
Collaborator

Implement RMS

Description

This PR provides all code necessary to implement a log-odds based RMS comparison to evaluate match quality. This will replace belongingness ratio as the internal decision-driver the algorithm uses to find the highest quality match.

Related Issues

Closes #236

<--------------------- REMOVE THE LINES BELOW BEFORE MERGING --------------------->

Checklist

Please review and complete the following checklist before submitting your pull request:

  • I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
  • I have reviewed my changes to ensure they are clear, concise, and well-documented.
  • I have updated the documentation, if applicable.
  • I have added or updated test cases to cover my changes, if applicable.
  • I have minimized the number of reviewers to include only those essential for the review.

Checklist for Reviewers

Please review and complete the following checklist during the review process:

  • The code follows best practices and conventions.
  • The changes implement the desired functionality or fix the reported issue.
  • The tests cover the new changes and pass successfully.
  • Any potential edge cases or error scenarios have been considered.

@bamader bamader marked this pull request as draft February 27, 2025 14:46
Comment thread src/recordlinker/assets/initial_algorithms.json Outdated
Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
@codecov

codecov Bot commented Feb 28, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.06%. Comparing base (3ee7ec5) to head (a608798).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #229      +/-   ##
==========================================
+ Coverage   97.80%   98.06%   +0.26%     
==========================================
  Files          32       32              
  Lines        1774     1813      +39     
==========================================
+ Hits         1735     1778      +43     
+ Misses         39       35       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@bamader

bamader commented Feb 28, 2025

Copy link
Copy Markdown
Collaborator Author

@ericbuckley @m-goggins Updated code to make all tests pass (there's still one set of three tests that involved belongingness that I'm not quite sure what to do with, but will continue noodling).

I also ran the algorithm tests using the certain hierarchy I established, and we doubled our performance! If we were getting 33% last time (two out of the six cases), we now get 4 out of the 6 cases correct. We correctly get 3 out of the 5 match cases, and correctly flag the invalid birthdate. The cases we miss are (1) the record has first and last name switched (there's nothing we can do about that, since first and last are blocking fields in one pass [they fail exact matching] and then evaluation fields in another [but they earn 0 points]); and (2) a test case where someone named Tho-mas fails to match to a cluster that contains 2 identical patients except for their first names, Thomas and ThoMas. I believe if we implemented basic string normalization on incoming names (e.g. standardize casing, remove numbers and punctuation) we would catch this case. We don't even have to do it at the persistence level so that data is still preserved as supplied; we could apply it on the fly during record evaluation, just like we proposed with skip values. This would allow us to still correctly model names that might actually have punctuation (e.g. the Arabic surname Al'Charif, also sometimes written Al-Charif) but wouldn't penalize one of those representations over the other. I believe this is a bug we should fix, but wanted to drop my findings here before I actually took off for the day.

@ericbuckley ericbuckley added this to the v25.5.0 milestone Feb 28, 2025
@bamader

bamader commented Mar 5, 2025

Copy link
Copy Markdown
Collaborator Author

Updating to mark all code here complete. All tests have been adjusted to match the new algorithm and everything passes. This is, from my view, "merge ready" code.

@bamader bamader requested a review from ericbuckley March 5, 2025 14:04
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py
Comment thread src/recordlinker/assets/initial_algorithms.json
Comment thread tests/unit/linking/test_link.py Outdated
Comment thread tests/unit/linking/test_link.py Outdated
@bamader bamader closed this Mar 17, 2025
@bamader bamader reopened this Mar 17, 2025
@bamader bamader force-pushed the draft-log-odds-window branch from 2150ab6 to b0e2931 Compare March 19, 2025 18:59
@bamader

bamader commented Mar 24, 2025

Copy link
Copy Markdown
Collaborator Author

Flagging that I've edited the ticket's AC due to scoping down work and the urgency of progressing this to allow our clients to test. There won't be any logical or performance based hits as a result, merely moving work to consolidate and refactor code into a follow-up.

@bamader

bamader commented Mar 24, 2025

Copy link
Copy Markdown
Collaborator Author

Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
@bamader bamader marked this pull request as ready for review March 27, 2025 18:35
@bamader bamader changed the title Experiment: switch to link by log odds window Implement RMS Mar 31, 2025

@ericbuckley ericbuckley left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, thanks for making all the changes per the github issue so far.

Comment thread src/recordlinker/schemas/link.py Outdated
Comment thread src/recordlinker/schemas/link.py Outdated
Comment thread src/recordlinker/schemas/link.py Outdated
Comment thread src/recordlinker/schemas/link.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/linking/link.py
Comment thread src/recordlinker/database/mpi_service.py Outdated
@bamader bamader requested a review from ericbuckley April 9, 2025 12:56

@m-goggins m-goggins left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me! The only thing I'd add is a test to cover the lines that don't have coverage in link.py. Normally, I wouldn't be worried about the test coverage since it's not a big hit for those lines but I do think we want to demonstrate how the match grade is determined when the grades are not the same.

Comment thread src/recordlinker/schemas/link.py
@bamader

bamader commented Apr 10, 2025

Copy link
Copy Markdown
Collaborator Author

@m-goggins Super fair point, and I think it's important to show that grade adaptation is a new feature of the RMS decision mechanism. I've added tests to cover both directions (existing grade is certain so no change in pass 2, and existing grade is only possible so definite change in pass 2) so we should now have total coverage for our link.py case logic.

@bamader bamader requested a review from m-goggins April 10, 2025 13:41

@m-goggins m-goggins left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, really excited to get this feature out the door to NBS! Thanks for adding the additional tests

@bamader bamader merged commit 10deb83 into main Apr 10, 2025
@bamader bamader deleted the draft-log-odds-window branch April 10, 2025 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Relative Match Score

3 participants