
bug(medcat): CU-869ckx6dr Allow for better supervised training#374

Merged
adam-sutton-1992 merged 15 commits into main from
bug/medcat/CU-869ckx6dr-allow-for-better-supervised-training
Apr 1, 2026

Conversation

@mart-r
Collaborator

@mart-r mart-r commented Mar 24, 2026

During supervised training, the document and the entity passed to the trained component (currently only the linker) do not align with each other.
There are two distinct issues:

  • Since the document is generated from the regular inference procedure, it has .ner_ents and .linked_ents lists, but they are not aligned with what the annotated dataset specifies.
  • The entity passed to the trained component is always a new instance with fresh state, instead of an existing instance being reused.

This PR fixes the above by:

  • Preparing the .ner_ents and .linked_ents for the document when doing supervised training (they will contain the same entities for now)
  • Reworking how entities are created in this context so that they can be reused (i.e. the MutableEntity passed to the component is the one held in .linked_ents)
    • This involved creating a new method entity_from_tokens_in_doc for the pipe and tokenizers
    • And deprecating the old one (entity_from_tokens)
  • Adding a few tests covering the above and updating existing tests to match the changes
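A minimal, self-contained sketch of the alignment idea, assuming simplified stand-ins: `Entity`, `Doc`, `prepare_for_supervised_training` and the signature given to `entity_from_tokens_in_doc` below are hypothetical and are not MedCAT's actual classes or API. It shows the annotated entities being created once, registered in both `.ner_ents` and `.linked_ents`, and that very same instance being what the trained component receives.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entity:
    """Hypothetical stand-in for MedCAT's MutableEntity."""
    start: int
    end: int
    text: str
    cui: Optional[str] = None  # concept from the annotated dataset


@dataclass
class Doc:
    """Hypothetical stand-in for a pipeline document."""
    text: str
    ner_ents: List[Entity] = field(default_factory=list)
    linked_ents: List[Entity] = field(default_factory=list)


def entity_from_tokens_in_doc(doc: Doc, start: int, end: int) -> Entity:
    """Illustrative only: build an entity AND register it on the document,
    so later code sees the very same instance via doc.linked_ents."""
    ent = Entity(start=start, end=end, text=doc.text[start:end])
    doc.ner_ents.append(ent)
    doc.linked_ents.append(ent)  # same object in both lists (for now)
    return ent


def prepare_for_supervised_training(doc: Doc, annotations: List[dict]) -> List[Entity]:
    """Hypothetical helper: replace inference-time entities with annotated ones."""
    # The annotated dataset is the source of truth during supervised
    # training, so drop whatever regular inference produced.
    doc.ner_ents.clear()
    doc.linked_ents.clear()
    ents = []
    for ann in annotations:
        ent = entity_from_tokens_in_doc(doc, ann["start"], ann["end"])
        ent.cui = ann["cui"]
        ents.append(ent)  # these exact objects are handed to the trained component
    return ents
```

Because the component receives the object stored in `.linked_ents` rather than a fresh copy, any state it sets on the entity stays visible through the document.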

@adam-sutton-1992 adam-sutton-1992 self-requested a review April 1, 2026 10:47
@adam-sutton-1992
Contributor

Looks good to me. For a few of the deprecated changes, is that just to be safe rather than removing them? I can't see them being used by people outside of the library.

@mart-r
Collaborator Author

mart-r commented Apr 1, 2026

For a few of the deprecated changes, is that just to be safe rather than removing them? I can't see them being used by people outside of the library.

I agree, it's very unlikely that anybody is using them. But on the off chance, I've deprecated them for now and will remove them in the next minor version.
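The deprecate-then-remove approach could be sketched with a small hand-rolled decorator built on the standard library's `warnings` and `functools` modules. The `deprecated` helper and the toy `Tokenizer` class below are hypothetical illustrations, not the code merged in this PR; only the method names come from the PR description.

```python
import functools
import warnings


def deprecated(replacement: str, removal: str):
    """Hypothetical helper: emit a DeprecationWarning pointing at the replacement."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in {removal}; "
                f"use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # report the caller's line, not this wrapper
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


class Tokenizer:
    """Hypothetical stand-in for a tokenizer exposing both methods."""

    def entity_from_tokens_in_doc(self, tokens: list, doc: list) -> dict:
        ent = {"tokens": list(tokens)}
        doc.append(ent)  # registered on the doc, so it can be reused later
        return ent

    @deprecated(replacement="entity_from_tokens_in_doc", removal="the next minor version")
    def entity_from_tokens(self, tokens: list) -> dict:
        # Old behaviour: a fresh, detached entity on every call.
        return {"tokens": list(tokens)}
```

Keeping the old method working while warning gives any external callers one release to migrate before it is removed.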

@adam-sutton-1992 adam-sutton-1992 enabled auto-merge (squash) April 1, 2026 11:02
@adam-sutton-1992 adam-sutton-1992 merged commit 7c12cae into main Apr 1, 2026
23 of 26 checks passed
@adam-sutton-1992 adam-sutton-1992 deleted the bug/medcat/CU-869ckx6dr-allow-for-better-supervised-training branch April 1, 2026 11:02

Labels

None yet

Projects

None yet


2 participants