address line suffix normalize #258

Merged: ericbuckley merged 9 commits into main from feature/239-address-line-suffix-normalize on Mar 29, 2025

@ericbuckley ericbuckley commented Mar 20, 2025

Collaborator

Description

Add a normalization step for street suffixes, and move state normalization so that it happens during ingestion.
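
For illustration only, here is a minimal sketch of what a street suffix normalization step could look like. It is not the code from this PR: the `SUFFIXES` table, the `normalize_address_line` name, the choice to rewrite only the last matching word, and the output casing are all assumptions. The abbreviations themselves follow the standard USPS suffix abbreviations.

```python
# Hypothetical, abbreviated lookup table (an assumption for this sketch);
# a real table would cover many more suffixes.
SUFFIXES = {
    "street": "St",
    "avenue": "Ave",
    "boulevard": "Blvd",
    "road": "Rd",
    "drive": "Dr",
    "lane": "Ln",
}


def normalize_address_line(line: str) -> str:
    """Abbreviate the last recognized street suffix in an address line."""
    words = line.split()
    # Scan from the end so "45 Lane Avenue" rewrites "Avenue", not "Lane".
    for i in range(len(words) - 1, -1, -1):
        core = words[i].rstrip(".,")
        trailing = words[i][len(core):]
        abbr = SUFFIXES.get(core.lower())
        if abbr is not None:
            # Drop a trailing period ("Street." -> "St") but keep a comma.
            words[i] = abbr + trailing.replace(".", "")
            break
    return " ".join(words)
```

Lines without a recognized suffix are returned with only their whitespace collapsed, so unexpected input is never discarded.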

Related Issues

closes #239

Additional Notes

Previously, state values were normalized in the feature_iter. In line with the direction we're taking for the other fields, it makes sense to do this normalization just once, at ingestion time. One key difference is that we now preserve invalid state values, mainly to avoid losing data when a state is misspelled.
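
As a rough sketch of the "normalize once, keep invalid values" behavior described above (not the actual implementation in this PR), an ingestion-time state normalizer could look like the following. The `normalize_state` name and the tiny `STATE_NAMES` table are assumptions made for the example.

```python
# Hypothetical, abbreviated table of state names (an assumption for this
# sketch); a real table would list every state and territory.
STATE_NAMES = {
    "california": "CA",
    "new york": "NY",
    "texas": "TX",
}
STATE_CODES = set(STATE_NAMES.values())


def normalize_state(value: str) -> str:
    """Map a state name or code to its two-letter code when recognized."""
    cleaned = value.strip()
    # Already a known two-letter code, in any casing.
    if cleaned.upper() in STATE_CODES:
        return cleaned.upper()
    # A known full state name.
    code = STATE_NAMES.get(cleaned.lower())
    if code is not None:
        return code
    # Unrecognized (for example, misspelled): keep the value rather than drop it.
    return cleaned
```

With this shape, a misspelling such as "Californa" passes through unchanged instead of being replaced by an empty value, so later steps can still see what was submitted.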

<--------------------- REMOVE THE LINES BELOW BEFORE MERGING --------------------->

Checklist

Please review and complete the following checklist before submitting your pull request:

  • I have ensured that the pull request is of a manageable size, allowing it to be reviewed within a single session.
  • I have reviewed my changes to ensure they are clear, concise, and well-documented.
  • I have updated the documentation, if applicable.
  • I have added or updated test cases to cover my changes, if applicable.
  • I have minimized the number of reviewers to include only those essential for the review.

Checklist for Reviewers

Please review and complete the following checklist during the review process:

  • The code follows best practices and conventions.
  • The changes implement the desired functionality or fix the reported issue.
  • The tests cover the new changes and pass successfully.
  • Any potential edge cases or error scenarios have been considered.

@ericbuckley ericbuckley self-assigned this Mar 20, 2025

codecov Bot commented Mar 20, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.86%. Comparing base (cae7976) to head (1850113).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #258      +/-   ##
==========================================
+ Coverage   97.85%   97.86%   +0.01%     
==========================================
  Files          33       33              
  Lines        1770     1780      +10     
==========================================
+ Hits         1732     1742      +10     
  Misses         38       38              

@ericbuckley ericbuckley marked this pull request as ready for review March 25, 2025 22:34
Comment thread src/recordlinker/schemas/pii.py
Comment thread src/recordlinker/schemas/pii.py
Comment thread src/recordlinker/schemas/pii.py
Comment thread src/recordlinker/schemas/pii.py

@bamader bamader left a comment

Collaborator

This one looks good to go, but I left a note about wanting to revisit formatting, preprocessing, and additional input sanitization down the road. I think they're small things we can do to boost robustness and peace of mind in the (fairly likely) event that someone tries to submit improperly formatted or odd-valued raw data. It likely also has value when we think about standalone solutions, since it would make us more fault tolerant.

Comment thread src/recordlinker/schemas/pii.py
Comment thread src/recordlinker/schemas/pii.py
@ericbuckley ericbuckley merged commit b5dc520 into main Mar 29, 2025
@ericbuckley ericbuckley deleted the feature/239-address-line-suffix-normalize branch March 29, 2025 13:35

Development

Successfully merging this pull request may close these issues.

Normalize address line suffixes

3 participants