Skip to content

adding indicies to improve blocking#421

Merged
ericbuckley merged 1 commit into
mainfrom
feature/blocking-query-optimization
Jun 10, 2025
Merged

adding indicies to improve blocking#421
ericbuckley merged 1 commit into
mainfrom
feature/blocking-query-optimization

Conversation

@ericbuckley

@ericbuckley ericbuckley commented Jun 10, 2025

Copy link
Copy Markdown
Collaborator

Description

Updating MPI indices to improve blocking queries.

Additional Notes

The changes identified below created a 25x performance boost in running blocking queries on a Postgres database with ~750k person clusters and 5 million patient records.

For reference, the blocking queries generated look similar to the following

SELECT mpi_patient.id, mpi_patient.person_id, mpi_patient.data, mpi_patient.external_patient_id, mpi_patient.external_person_id, mpi_patient.external_person_source, mpi_patient.reference_id 
FROM mpi_patient 
WHERE mpi_patient.person_id IN (
  SELECT DISTINCT mpi_patient.person_id
  FROM mpi_patient
  JOIN mpi_blocking_value AS bv0 ON
    mpi_patient.id = bv0.patient_id AND bv0.blockingkey = 1 AND bv0.value IN ('YYYY-MM-DD')
  JOIN mpi_blocking_value AS bv1 ON
    mpi_patient.id = bv1.patient_id AND bv1.blockingkey = 3 AND bv1.value IN ('X')
  JOIN mpi_blocking_value AS bv2 ON
    mpi_patient.id = bv2.patient_id AND bv2.blockingkey = 10 AND bv2.value IN ('XXXX:SS', 'XXXXMR')
);
  • Adding an index to mpi_patient.person_id: The subquery returns a list of person_id values, and we use the query's where clause to join on that. Adding an index to this column vastly improves performance, as we no longer have to scan every value in the mpi_patient table.
  • Reorder mpi_blocking_value composite index: In the subquery we join on the mpi_blocking_value table, once for every blocking criteria. While patient_id, blockingkey and value are used in the join clause, only the latter two are used first to filter out values based on the blocking data. Moving the order of the composite key to use blockingkey and value first, allows us to reuse the index for both this initial filtering step and the overall join clause used in the subquery.
  • Adding an index to mpi_blocking_value.patient_id: Not actually needed for blocking, however the MPI API has an endpoint that requires deleting mpi_blocking_value rows related to a patient. This endpoint benefits greatly from having an index on patient_id. This was previously available in the composite index, as patient_id was the first value, however when we move it to the end its no longer available as a solo index. Thus, we are adding a separate key to make sure we don't lose performance when delete_blocking_values_for_patient is called.

<--------------------- REMOVE THE LINES BELOW BEFORE MERGING --------------------->

Checklist

Please review and complete the following checklist before submitting your pull request:

  • I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
  • I have reviewed my changes to ensure they are clear, concise, and well-documented.
  • I have updated the documentation, if applicable.
  • I have added or updated test cases to cover my changes, if applicable.
  • I have minimized the number of reviewers to include only those essential for the review.

Checklist for Reviewers

Please review and complete the following checklist during the review process:

  • The code follows best practices and conventions.
  • The changes implement the desired functionality or fix the reported issue.
  • The tests cover the new changes and pass successfully.
  • Any potential edge cases or error scenarios have been considered.

@ericbuckley ericbuckley self-assigned this Jun 10, 2025
@codecov

codecov Bot commented Jun 10, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.51%. Comparing base (f6ef4bf) to head (56e6281).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #421   +/-   ##
=======================================
  Coverage   98.51%   98.51%           
=======================================
  Files          33       33           
  Lines        1948     1948           
=======================================
  Hits         1919     1919           
  Misses         29       29           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ericbuckley ericbuckley marked this pull request as ready for review June 10, 2025 03:00

@bamader bamader left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

25 x = 🔥

@ericbuckley ericbuckley merged commit a47c0f2 into main Jun 10, 2025
17 checks passed
@ericbuckley ericbuckley deleted the feature/blocking-query-optimization branch June 10, 2025 14:03
@ericbuckley ericbuckley added this to the v25.8.0 milestone Jun 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants