Skip to content

Add API endpoint for merging person clusters#203

Merged
m-goggins merged 39 commits into
mainfrom
feature/161-API-endpoint-for-merging-person-clusters
Feb 12, 2025
Merged

Add API endpoint for merging person clusters#203
m-goggins merged 39 commits into
mainfrom
feature/161-API-endpoint-for-merging-person-clusters

Conversation

@m-goggins

@m-goggins m-goggins commented Feb 7, 2025

Copy link
Copy Markdown
Collaborator

Description

This PR adds a new endpoint to merge person clusters as such POST /person/<person-reference-id>/merge -d '{"persons": ["<person-reference-id1>", "<person-reference-id2>"]}'.

Two questions that I want to raise about the PR:

  1. After the person clusters are merged, e.g., person-reference-id1 and person-reference-id2 into person-reference-id as above, person-reference-id1 and person-reference-id2 are not currently cleaned up (deleted from the database). Originally I wasn't sure if this should occur as a separate step, but after working on the endpoint and our discussion about additional manual work yesterday, I think I should add a cleanup step after merging to remove person-reference-id1 and person-reference-id2. In theory, this cleanup could occur on the routine cleanup that we've talked about but not yet implemented, but I like the "cleanup as you go if you can" approach for now. Thoughts?

  2. Does anyone have strong opinions on creating similar functions that have different inputs and returns like get_person_by_reference_id and get_persons_by_reference_ids? They are used for slightly different purposes but I could condense into a single function if folks feel it improves clarity.

Related Issues

Fixes #161

@codecov

codecov Bot commented Feb 7, 2025

Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 97.29730% with 1 line in your changes missing coverage. Please review.

Project coverage is 97.68%. Comparing base (3f18299) to head (8807895).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/recordlinker/database/mpi_service.py 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #203      +/-   ##
==========================================
- Coverage   97.70%   97.68%   -0.02%     
==========================================
  Files          32       32              
  Lines        1610     1645      +35     
==========================================
+ Hits         1573     1607      +34     
- Misses         37       38       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@m-goggins m-goggins changed the title Feature/161 api endpoint for merging person clusters Add API endpoint for merging person clusters Feb 7, 2025
@m-goggins m-goggins marked this pull request as ready for review February 7, 2025 20:29
@ericbuckley

Copy link
Copy Markdown
Collaborator

@m-goggins re question 1; this is accounted for in use case 12 of the MPI API design. My original thought was, let's make this an explicit DELETE rather than an implicit one. It's true that these old person clusters are orphaned and no longer in use, however we can not definitely say that the user's systems have purged all references to these clusters. By making it an explicit step to DELETE the orphaned person clusters, we have more trust that the user is doing the same with their references.

This isn't a hard requirement, nor am I married to that implementation. It may turn out that automatic cleanup is easier, or we could just provide a flag in the payload to indicate that automatic removal is preferred.

Comment thread src/recordlinker/routes/person_router.py Outdated
Comment thread rfc/001-identifier-triplets.md
Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/routes/person_router.py Outdated

@bamader bamader left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks solid to me, just some nits on language and comment cleanups that I think could help reduce the cognitive burden of reading some of this stuff

Comment thread src/recordlinker/database/mpi_service.py Outdated
Comment thread src/recordlinker/routes/person_router.py Outdated
Comment thread src/recordlinker/routes/person_router.py
Co-authored-by: Eric Buckley <eric.buckley@gmail.com>
@ericbuckley

Copy link
Copy Markdown
Collaborator

@m-goggins re question 1; this is accounted for in use case 12 of the MPI API design. My original thought was, let's make this an explicit DELETE rather than an implicit one. It's true that these old person clusters are orphaned and no longer in use, however we can not definitely say that the user's systems have purged all references to these clusters. By making it an explicit step to DELETE the orphaned person clusters, we have more trust that the user is doing the same with their references.

This isn't a hard requirement, nor am I married to that implementation. It may turn out that automatic cleanup is easier, or we could just provide a flag in the payload to indicate that automatic removal is preferred.

@m-goggins PR is looking great, thanks for all your work on this so far. I wanted to revisit the above with regard to auto removal of the person cluster when its empty. What do you think about updating the payload to accept a flag for this action? Do you think that would be helpful for users?

@m-goggins

Copy link
Copy Markdown
Collaborator Author

What do you think about updating the payload to accept a flag for this action? Do you think that would be helpful for users?

@ericbuckley That sounds like a good approach to me. I can make that change.

bamader
bamader previously approved these changes Feb 11, 2025

@bamader bamader left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving so I don't block, since we've got agreement on new comments.

Comment thread tests/unit/routes/test_person_router.py
Comment thread tests/unit/routes/test_person_router.py Outdated
Co-authored-by: Eric Buckley <eric.buckley@gmail.com>

@ericbuckley ericbuckley left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great @m-goggins thanks for all your work on this!

@m-goggins m-goggins merged commit f0d035a into main Feb 12, 2025
@m-goggins m-goggins deleted the feature/161-API-endpoint-for-merging-person-clusters branch February 12, 2025 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API endpoint for merging of person clusters [MPIAPI]

3 participants