missing blocking keys #254

Merged
ericbuckley merged 17 commits into main from feature/230-missing-blocking-keys
Mar 29, 2025

@ericbuckley ericbuckley commented Mar 18, 2025

Collaborator

Description

Changed the get blocking data call to allow for some missing blocking values. If too many values are missing from the query (indicated by checking log odds values), then the blocking pass will be skipped.
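
As a rough illustration of the skip rule described above, here is a minimal sketch. It assumes each blocking key carries a log odds weight and that the pass is skipped when the share of weight present in the query falls below a minimum percentage. The function name, parameters, and exact comparison are hypothetical and are not taken from the recordlinker code.

```python
from typing import Mapping, Optional, Sequence


def should_skip_blocking_pass(
    record: Mapping[str, Optional[str]],
    blocking_keys: Sequence[str],
    log_odds: Mapping[str, float],
    minimum_percentage: float,
) -> bool:
    """Hypothetical check: skip the pass when too much log odds weight is missing."""
    # Total weight of every blocking key configured for this pass.
    total = sum(log_odds[key] for key in blocking_keys)
    if total <= 0:
        # Nothing to compare against, so blocking would not be meaningful.
        return True
    # Weight of the keys that actually have a non-empty value in the query.
    present = sum(log_odds[key] for key in blocking_keys if record.get(key))
    return (present / total) < minimum_percentage


query = {"birthdate": "1980-01-01", "zip": None}
weights = {"birthdate": 10.0, "zip": 5.0}
# Two thirds of the weight is present, so the outcome depends on the threshold.
skip_strict = should_skip_blocking_pass(query, ["birthdate", "zip"], weights, 0.7)
skip_lenient = should_skip_blocking_pass(query, ["birthdate", "zip"], weights, 0.5)
```

With these made-up weights, a stricter threshold skips the pass while a more lenient one still blocks on the single key that is present.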

Related Issues

closes #230

Additional Notes

A couple of things to note on the implementation.

  1. The name of the new "compare_minimum_percentage" value does not match the issue. The issue was written on the assumption that remove kwargs from AlgorithmPass #223 would be completed first, which it has not been. The plan is to adjust the location of this new parameter as part of remove kwargs from AlgorithmPass #223; for now we can continue to use kwargs.
  2. get_block_data was converted into a class, GetBlockData. The Improve Evaluation of Missing Values epic has introduced extra conditions for retrieving blocking data. Checking those conditions requires, or at least benefits from, reusing variables, so storing state between the different functions is an advantage. I experimented with two alternatives, listed below, before landing on this solution. I think all three implementations can encapsulate the logic well and optimize the looping constructs for efficient evaluation. If anyone thinks another implementation would read better, please voice that.
    • A long function with all the logic encapsulated
    • A long function with nested functions
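
To make the trade-off in item 2 concrete, here is a minimal sketch of the callable-class shape, assuming the shared state is the list of missing keys gathered in one step and reused in the next. The class name, fields, and helper methods are hypothetical and do not reproduce GetBlockData.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Mapping, Optional, Sequence


@dataclass
class BlockDataQuery:
    """Hypothetical callable class that shares state between small helper methods."""

    blocking_keys: Sequence[str]
    log_odds: Mapping[str, float]
    minimum_percentage: float
    # State filled in by one helper and reused by another.
    missing_keys: List[str] = field(default_factory=list, init=False)

    def _collect_values(self, record: Mapping[str, Optional[str]]) -> Dict[str, str]:
        # Single loop: gather usable values and remember which keys were missing.
        values: Dict[str, str] = {}
        self.missing_keys = []
        for key in self.blocking_keys:
            value = record.get(key)
            if value:
                values[key] = value
            else:
                self.missing_keys.append(key)
        return values

    def _too_many_missing(self) -> bool:
        # Reuses self.missing_keys instead of looping over the record again.
        total = sum(self.log_odds[key] for key in self.blocking_keys)
        missing = sum(self.log_odds[key] for key in self.missing_keys)
        return total <= 0 or (total - missing) / total < self.minimum_percentage

    def __call__(self, record: Mapping[str, Optional[str]]) -> Optional[Dict[str, str]]:
        values = self._collect_values(record)
        if self._too_many_missing():
            return None  # signal that the blocking pass should be skipped
        return values


weights = {"birthdate": 10.0, "zip": 5.0}
query = BlockDataQuery(["birthdate", "zip"], weights, minimum_percentage=0.5)
result = query({"birthdate": "1980-01-01", "zip": None})
```

A single long function would keep `missing_keys` as a local variable, and nested functions would close over it; the class version trades a little ceremony for helpers that can be tested separately.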

@ericbuckley ericbuckley self-assigned this Mar 18, 2025
codecov Bot commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 96.66667% with 2 lines in your changes missing coverage. Please review.

Project coverage is 97.82%. Comparing base (cae7976) to head (1a354fe).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/recordlinker/routes/algorithm_router.py | 50.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #254      +/-   ##
==========================================
- Coverage   97.85%   97.82%   -0.03%     
==========================================
  Files          33       33              
  Lines        1770     1797      +27     
==========================================
+ Hits         1732     1758      +26     
- Misses         38       39       +1     

@ericbuckley ericbuckley added the api New API feature label Mar 18, 2025
@ericbuckley ericbuckley marked this pull request as ready for review March 18, 2025 16:48
@ericbuckley ericbuckley changed the title Feature/230 missing blocking keys missing blocking keys Mar 19, 2025
Comment thread src/recordlinker/assets/initial_algorithms.json Outdated
Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/routes/algorithm_router.py

@bamader bamader left a comment

Collaborator

Looks good overall. I have some questions around class decisions and naming conventions, but I think this is basically there.

Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/linking/link.py Outdated
Comment thread src/recordlinker/routes/algorithm_router.py
bamader previously approved these changes Mar 27, 2025

@bamader bamader left a comment

Collaborator

I think all discussions have been handled, so thanks for some of the naming/convention changes and the explanations of thought processes.

@ericbuckley ericbuckley requested a review from bamader March 28, 2025 18:03
@ericbuckley ericbuckley merged commit 2dc3243 into main Mar 29, 2025
@ericbuckley ericbuckley deleted the feature/230-missing-blocking-keys branch March 29, 2025 13:32
Development

Successfully merging this pull request may close these issues.

improve blocking with missing payload keys

3 participants