Implement Relative Match Score #236

@ericbuckley

Description

Summary

Replace the belongingness ratio score with a relative match score value that is based on log odds comparisons of different evaluation fields.

Acceptance Criteria

  • Update the algorithm configuration to accept a possible_match_window on each pass
  • Remove the belongingness_ratio parameter from the algorithm configuration
  • Implement the new link.link_record method
  • Merge the link.LinkResult and schemas.LinkResult classes into one, keeping the latter (moved to follow up)
  • Update the schemas.LinkResult class to include the relative_match_score value, the pass_number, and the context of feature evaluations (context moved to follow up)
  • Remove the schemas.Prediction class
  • Create the schemas.MatchGrade enum
  • Add sufficient logging to show the values calculated during the evaluation phase and when selecting a "best pass" from the cluster table
  • Update the documentation in site/design.md to include information on RMS and remove information about the belongingness ratio
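
A minimal sketch of a pass configuration that carries the new possible_match_window in place of belongingness_ratio. The AlgorithmPass class and its blocking_keys and evaluators fields are hypothetical, and the sketch assumes the window is a (lower, upper) pair of RMS thresholds on a 0 to 1 scale; RFC-002 is the authority on the actual semantics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlgorithmPass:
    """Hypothetical shape of one pass in the algorithm configuration."""

    blocking_keys: tuple[str, ...]
    # Feature name -> name of the comparison function used to evaluate it (assumed).
    evaluators: dict[str, str]
    # Assumed (lower, upper) RMS thresholds: scores at or above `upper` grade as
    # CERTAIN, scores in [lower, upper) as POSSIBLE, anything lower as CERTAINLY_NOT.
    possible_match_window: tuple[float, float]
    # There is intentionally no belongingness_ratio field anymore.

    def __post_init__(self) -> None:
        lower, upper = self.possible_match_window
        if not 0.0 <= lower <= upper <= 1.0:
            raise ValueError(
                f"possible_match_window must satisfy 0 <= lower <= upper <= 1, "
                f"got {self.possible_match_window}"
            )


# Example pass; the feature and function names are made up for illustration.
pass_1 = AlgorithmPass(
    blocking_keys=("BIRTHDATE", "SEX"),
    evaluators={"FIRST_NAME": "compare_fuzzy", "LAST_NAME": "compare_exact"},
    possible_match_window=(0.8, 0.925),
)
```

Validating the window when the pass is constructed means a misconfigured algorithm fails at load time rather than silently grading every record as CERTAINLY_NOT.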

Details / Tasks

Details on how the new relative match score (RMS) is calculated can be found in this document. Use it as a guide for implementing the new link_record function, which will accept the same input and produce more or less the same output.
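
The linked document defines the actual calculation. The sketch below only illustrates the general shape of a log-odds based score under three assumptions: each field's weight is a Fellegi-Sunter style log-odds weight log(m/u), a pass's RMS is the weight earned divided by the maximum weight available for the fields it evaluated, and the grade comes from an assumed (lower, upper) possible_match_window. The function names are hypothetical and are not the project's link_record API.

```python
import logging
import math
from collections.abc import Mapping

logger = logging.getLogger(__name__)


def log_odds(m: float, u: float) -> float:
    """Agreement weight for one field: log of P(agree | match) / P(agree | non-match)."""
    return math.log(m / u)


def relative_match_score(
    comparisons: Mapping[str, float], weights: Mapping[str, float]
) -> tuple[float, dict[str, float]]:
    """Score one record against one cluster for a single pass (assumed formula).

    comparisons: feature -> comparison result in [0, 1] (1.0 = full agreement).
    weights: feature -> log-odds weight earned on full agreement.
    Returns the RMS and the per-feature weighted scores (the evaluation context).
    """
    context = {feature: result * weights[feature] for feature, result in comparisons.items()}
    max_points = sum(weights[feature] for feature in comparisons)
    rms = sum(context.values()) / max_points if max_points > 0 else 0.0
    # Log the evaluation-phase values, as the acceptance criteria ask.
    logger.debug("evaluation context=%s rms=%.4f max_points=%.4f", context, rms, max_points)
    return rms, context


def grade(rms: float, possible_match_window: tuple[float, float]) -> str:
    """Map an RMS onto MatchGrade member names using the assumed (lower, upper) window."""
    lower, upper = possible_match_window
    if rms >= upper:
        return "CERTAIN"
    if rms >= lower:
        return "POSSIBLE"
    return "CERTAINLY_NOT"


weights = {"FIRST_NAME": 4.0, "BIRTHDATE": 5.0, "SEX": 1.0}  # made-up weights
rms, context = relative_match_score({"FIRST_NAME": 1.0, "BIRTHDATE": 1.0, "SEX": 0.0}, weights)
print(rms, grade(rms, (0.8, 0.95)))  # 0.9 POSSIBLE
```

Normalizing by the maximum available weight keeps scores comparable across passes that evaluate different sets of fields, which is what allows a single window per pass.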

The schemas.LinkResult class needs to change to capture the differences between a belongingness ratio score and RMS.

  1. Remove the existing belongingness_ratio value
  2. Add the relative_match_score value
  3. Add a pass number to indicate which pass produced the best score for the cluster (1-indexed)
  4. Add a context list: a list of all the feature evaluations for the pass that generated the highest score. Each item in the list will have the feature name and the weighted score associated with that feature.
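
The four changes above can be sketched as a plain data structure. This uses stdlib dataclasses as a stand-in for whatever schema library schemas.LinkResult actually uses; the cluster_id field and the FeatureEvaluation class are hypothetical, and the context list is included even though it was deferred to a follow-up.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FeatureEvaluation:
    """One entry in the context list: a feature and the weighted score it earned."""

    feature: str
    weighted_score: float


@dataclass
class LinkResult:
    """Hypothetical stand-in for the revised schemas.LinkResult."""

    cluster_id: str  # hypothetical identifier of the scored cluster
    relative_match_score: float  # replaces the old belongingness_ratio
    pass_number: int  # 1-indexed pass that produced the best score
    context: list[FeatureEvaluation] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.pass_number < 1:
            raise ValueError("pass_number is 1-indexed and must be >= 1")


result = LinkResult(
    cluster_id="cluster-123",
    relative_match_score=0.9,
    pass_number=2,
    context=[FeatureEvaluation("FIRST_NAME", 4.0), FeatureEvaluation("BIRTHDATE", 5.0)],
)
```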

The MatchGrade enum should allow for ordering; that is, I should be able to show that MatchGrade.CERTAIN > MatchGrade.POSSIBLE. Consider using the total_ordering decorator for the implementation. The hierarchy is: CERTAIN > POSSIBLE > CERTAINLY_NOT.
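
One possible shape for the enum, using functools.total_ordering as suggested: only __lt__ is written by hand and the decorator derives the other comparisons. The string member values are an assumption.

```python
import enum
import functools


@functools.total_ordering
class MatchGrade(enum.Enum):
    """Match grades, declared from weakest to strongest."""

    CERTAINLY_NOT = "certainly_not"
    POSSIBLE = "possible"
    CERTAIN = "certain"

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, MatchGrade):
            return NotImplemented
        # Rank members by declaration order, so CERTAINLY_NOT < POSSIBLE < CERTAIN.
        members = list(MatchGrade)
        return members.index(self) < members.index(other)


print(MatchGrade.CERTAIN > MatchGrade.POSSIBLE)  # True
print(max([MatchGrade.POSSIBLE, MatchGrade.CERTAIN]))  # MatchGrade.CERTAIN
```

Ranking by declaration order keeps the string values free for serialization. An IntEnum would also provide ordering, but its members would then compare equal to plain integers.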

When selecting the best results from the evaluation table, first compare on MatchGrade. If two passes for the same cluster have the same MatchGrade, then select the result with the highest RMS.
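
The selection rule can be sketched as a comparison on the tuple (grade, rms): Python compares tuples element by element, so RMS only matters when the grades tie. To keep the block self-contained it uses a compact IntEnum stand-in for MatchGrade, and PassResult and best_pass_per_cluster are hypothetical names, not the project's evaluation-table schema.

```python
import enum
import logging
from collections.abc import Iterable
from dataclasses import dataclass

logger = logging.getLogger(__name__)


class MatchGrade(enum.IntEnum):
    # Compact stand-in for the ordered MatchGrade enum: IntEnum orders by value.
    CERTAINLY_NOT = 0
    POSSIBLE = 1
    CERTAIN = 2


@dataclass(frozen=True)
class PassResult:
    cluster_id: str  # hypothetical cluster identifier
    pass_number: int  # 1-indexed
    grade: MatchGrade
    rms: float


def best_pass_per_cluster(results: Iterable[PassResult]) -> dict[str, PassResult]:
    """Keep one result per cluster: highest MatchGrade first, then highest RMS."""
    best: dict[str, PassResult] = {}
    for result in results:
        current = best.get(result.cluster_id)
        # Tuple comparison checks grade first and only falls back to RMS on a tie.
        if current is None or (result.grade, result.rms) > (current.grade, current.rms):
            logger.debug(
                "cluster %s: selecting pass %d (grade=%s, rms=%.4f)",
                result.cluster_id, result.pass_number, result.grade.name, result.rms,
            )
            best[result.cluster_id] = result
    return best


results = [
    PassResult("a", 1, MatchGrade.POSSIBLE, 0.91),
    PassResult("a", 2, MatchGrade.CERTAIN, 0.88),
    PassResult("b", 1, MatchGrade.POSSIBLE, 0.82),
    PassResult("b", 2, MatchGrade.POSSIBLE, 0.86),
]
best = best_pass_per_cluster(results)
print({cid: r.pass_number for cid, r in best.items()})  # {'a': 2, 'b': 2}
```

Cluster "a" picks pass 2 despite its lower RMS because CERTAIN outranks POSSIBLE. Because the comparison is strict, a tie on both grade and RMS keeps the earlier pass.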

Background / Context

RFC-002

Testing Considerations

Please include algorithm test results for the 84 test cases NBS shared with us, using both the existing codebase and the RMS implementation.

Metadata

Labels

api (New API feature)

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests