feat(medcat): CU-869ccxgj7 Add AU model bundle support#371

Merged
mart-r merged 6 commits into main from feat/medcat/CU-869ccxgj7-Add-AU-model-bundle-support
Mar 25, 2026

@mart-r
Collaborator

@mart-r mart-r commented Mar 23, 2026

This PR adds AU model bundle support.

The change is twofold:

  1. Fix the regex that prevented AU release version parsing
    • UK extensions use folder names like SnomedCT_UKClinicalRefsetsRF2_PRODUCTION_20260211T000001Z
    • The AU extension uses the format SnomedCT_Release_AU1000036_20260228
    • The previous regex expected the final (release) part to include the time and timezone suffix (T000001Z) as well, which the AU name lacks
  2. Add the missing AU bundle
    • Previously the AU extension was defined
    • But the bundle wasn't
    • So now we have both
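To illustrate the first point, here is a minimal sketch of the parsing problem. Both patterns and the `release_date` helper are hypothetical stand-ins written for this example; they are not the regexes actually used in `preprocess_snomed.py`. The sketch only assumes that the release is the 8-digit date at the end of the folder name, optionally followed by a `T000001Z`-style suffix.

```python
import re
from typing import Optional

# Hypothetical stand-in for the old behaviour: the time/zone suffix is required.
STRICT_PATTERN = re.compile(r"^SnomedCT_.+_(\d{8})T\d{6}Z$")

# Hypothetical stand-in for the fix: the time/zone suffix is optional.
RELAXED_PATTERN = re.compile(r"^SnomedCT_.+_(\d{8})(?:T\d{6}Z)?$")


def release_date(folder_name: str) -> Optional[str]:
    """Return the YYYYMMDD release date from a folder name, or None if absent."""
    match = RELAXED_PATTERN.match(folder_name)
    return match.group(1) if match else None


uk = "SnomedCT_UKClinicalRefsetsRF2_PRODUCTION_20260211T000001Z"
au = "SnomedCT_Release_AU1000036_20260228"

print(release_date(uk))                  # 20260211
print(release_date(au))                  # 20260228
print(STRICT_PATTERN.match(au) is None)  # True: the strict form rejects the AU name
```

Making only the suffix optional keeps the captured group a plain date for both naming schemes.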
This is what we had before
 % python look_at_release.py temp/2026_03_23_snomed_au_integration/data_snapshot/REL_EXTR temp/2026_03_23_snomed_au_integration/preprocessed/preproccessed_snapshot.csv
Traceback (most recent call last):
  File "/Users/martratas/Documents/CogStack/.MedCAT.nosync/cogstack-ops/medcat-snomed-model-creation/temp/2026_03_23_snomed_au_integration/preprocess_to_.py", line 31, in <module>
    main(*sys.argv[1:])
  File "/Users/martratas/Documents/CogStack/.MedCAT.nosync/cogstack-ops/medcat-snomed-model-creation/temp/2026_03_23_snomed_au_integration/preprocess_to_.py", line 27, in main
    preprocess(*args)
  File "/Users/martratas/Documents/CogStack/.MedCAT.nosync/cogstack-ops/medcat-snomed-model-creation/temp/2026_03_23_snomed_au_integration/preprocess_to_.py", line 14, in preprocess
    pp = Snomed(from_path, )
         ^^^^^^^^^^^^^^^^^^^
  File "/Users/martratas/Documents/CogStack/.MedCAT.nosync/cogstack-ops/medcat-snomed-model-creation/.venv_v2_312/lib/python3.12/site-packages/medcat/model_creation/preprocess_snomed.py", line 269, in __init__
    self._check_path_and_release())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/martratas/Documents/CogStack/.MedCAT.nosync/cogstack-ops/medcat-snomed-model-creation/.venv_v2_312/lib/python3.12/site-packages/medcat/model_creation/preprocess_snomed.py", line 589, in _check_path_and_release
    rel = self._determine_release(folder, strict=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/martratas/Documents/CogStack/.MedCAT.nosync/cogstack-ops/medcat-snomed-model-creation/.venv_v2_312/lib/python3.12/site-packages/medcat/model_creation/preprocess_snomed.py", line 332, in _determine_release
    raise UnkownSnomedReleaseException(
medcat.model_creation.preprocess_snomed.UnkownSnomedReleaseException: No version found in 'SnomedCT_Release_AU1000036_20260228'
Behaviour now
% python look_at_release.py ../../cogstack-ops/medcat-snomed-model-creation/temp/2026_03_23_snomed_au_integration/data_snapshot/REL_EXTR/ "abc"
Bundles SupportedBundles.AU_EXT
Releases ['20260228']
Extensions ['AU']
Test script
import sys

from medcat.model_creation.preprocess_snomed import Snomed


def main(from_path: str, to_path: str):
    # to_path is accepted so the command-line call stays the same; it is unused here
    pp = Snomed(from_path)
    print("Bundles", pp.bundle)
    print("Releases", pp.snomed_releases)
    print("Extensions", [ext.name for ext in pp.exts])


if __name__ == "__main__":
    main(*sys.argv[1:])

@mart-r mart-r force-pushed the feat/medcat/CU-869ccxgj7-Add-AU-model-bundle-support branch from d12d3b5 to f53e2f9 Compare March 24, 2026 12:20
Collaborator

@alhendrickson alhendrickson left a comment

Looks good, approved!

The only part I'm overthinking is how restrictive/permissive you want to be.

For example, if you want it to be limited, would it make sense to have two regexes and check the UK vs Australia pattern based on the enum? I'm aware it's overkill.

Otherwise, if you don't want it to be limited, why not just use (.*) for the last group? If there's a folder called SnomedCT_ext_prod_something, should you just read "something" even if it's not a date?
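The two alternatives raised in this comment can be sketched side by side. Everything below is hypothetical and written only to make the trade-off concrete: the `Ext` enum stands in for the real extension enum, and none of the patterns are taken from medcat.

```python
import re
from enum import Enum, auto
from typing import Optional


class Ext(Enum):
    """Hypothetical stand-in for the extension enum."""
    UK = auto()
    AU = auto()


# Option 1 (restrictive): one pattern per extension, selected via the enum.
PATTERNS = {
    Ext.UK: re.compile(r"^SnomedCT_UK\w+_(\d{8})T\d{6}Z$"),
    Ext.AU: re.compile(r"^SnomedCT_Release_AU\d+_(\d{8})$"),
}

# Option 2 (permissive): accept anything as the last group.
PERMISSIVE = re.compile(r"^SnomedCT_[^_]+_[^_]+_(.*)$")


def strict_release(folder_name: str, ext: Ext) -> Optional[str]:
    """Match only the naming scheme expected for the given extension."""
    match = PATTERNS[ext].match(folder_name)
    return match.group(1) if match else None


def permissive_release(folder_name: str) -> Optional[str]:
    """Return whatever follows the third underscore-separated part."""
    match = PERMISSIVE.match(folder_name)
    return match.group(1) if match else None


au = "SnomedCT_Release_AU1000036_20260228"
print(strict_release(au, Ext.AU))   # 20260228
print(strict_release(au, Ext.UK))   # None: an AU name is not a UK release
print(permissive_release("SnomedCT_ext_prod_something"))  # something
```

The restrictive option rejects names that do not fit the chosen extension, while the permissive one hands any trailing text to the caller, date or not.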

@mart-r
Collaborator Author

mart-r commented Mar 25, 2026

> Looks good, approved!
>
> The only part I'm overthinking is how restrictive/permissive you want to be.
>
> For example, if you want it to be limited, would it make sense to have two regexes and check the UK vs Australia pattern based on the enum? I'm aware it's overkill.
>
> Otherwise, if you don't want it to be limited, why not just use (.*) for the last group? If there's a folder called SnomedCT_ext_prod_something, should you just read "something" even if it's not a date?

The last group is used as the release (though it's really just the date) later in _determine_release, so there is an expectation as to what it contains.

This inflexibility guarantees a reasonably predictable release format downstream.
If it were more flexible, though, we could potentially support other naming formats or even fake names at test time.

With that said, I'm inclined to leave it as is for now. I don't want to make a change just in case. If/when we find a solid reason to change it or be more flexible, we can do that.
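As a rough illustration of why a predictable release format matters downstream, the sketch below assumes that some later step treats the release as a YYYYMMDD date. That is an assumption made for this example: how `_determine_release` and its callers actually use the value is not shown in this thread, and `parse_release` is a hypothetical helper.

```python
from datetime import date, datetime


def parse_release(release: str) -> date:
    """Hypothetical downstream step: interpret the release as a YYYYMMDD date."""
    return datetime.strptime(release, "%Y%m%d").date()


# A value captured by a date-only group parses cleanly.
print(parse_release("20260228"))  # 2026-02-28

# A value captured by a permissive (.*) group may not be a date at all.
try:
    parse_release("something")
except ValueError as err:
    print("not a date:", err)
```

With a strict last group, a malformed folder name fails early at matching time; with a permissive one, the failure would surface later, wherever the value is first interpreted.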

@mart-r mart-r merged commit 614af59 into main Mar 25, 2026
41 checks passed
@mart-r mart-r deleted the feat/medcat/CU-869ccxgj7-Add-AU-model-bundle-support branch March 25, 2026 13:27