Improve evaluation of missing data fields #235

@ericbuckley

Summary

Evaluating two records on a feature where one of the records has no value penalizes the comparison too heavily. For example, two birthdates that don't match (1980-01-01 and 1990-06-05) should score worse than a record with 1980-01-01 compared against a record that is missing a birthdate. Right now, the mismatched pair gets the better score.

Acceptance Criteria

  • Add a new algorithm config value of "defaults/compare_missing_percentage", default to 0.5
  • Update the link.compare method
  • Update the matchers.compare_* signatures to indicate when data is missing
  • New test cases for the compare method
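
One possible shape for the updated matcher signatures, as a minimal sketch only. The function name `compare_exact`, its parameters, and the `(score, is_missing)` tuple return are assumptions made for illustration; the issue does not specify how the existing `matchers.compare_*` functions should signal missing data.

```python
from typing import Optional, Tuple


def compare_exact(
    incoming: Optional[str],
    existing: Optional[str],
    log_odds: float,
    missing_percentage: float = 0.5,
) -> Tuple[float, bool]:
    """Return (score, is_missing) for a single feature comparison.

    Hypothetical helper: the name, parameters, and tuple return are
    assumptions for illustration, not the project's matchers API.
    """
    # Treat None and empty strings as missing data.
    if not incoming or not existing:
        # Award a fraction of the log odds instead of zero.
        return log_odds * missing_percentage, True
    if incoming == existing:
        return log_odds, False
    # Both values present but different: no points.
    return 0.0, False
```

With a log odds value of 10.0, a match scores 10.0, a mismatch scores 0.0, and a comparison against a missing value scores 5.0, so the missing case now ranks above the mismatch.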

Details / Tasks

The compare method needs to use the new "defaults/compare_missing_percentage" and the "defaults/compare_minimum_percentage" when calculating the score. When comparing the incoming record to an existing one, if either value for a feature is missing, the feature comparison should be awarded compare_missing_percentage of that feature's log odds value.

When comparing two records, if more than compare_minimum_percentage of the comparison is missing, the evaluation as a whole should result in a score of 0.
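
The two rules above could be combined roughly as follows. This is a sketch under stated assumptions, not the `link.compare` implementation: records are plain dicts, every feature uses exact matching, and "more than compare_minimum_percentage is missing" is read as the share of total log odds belonging to features with a missing value. The issue does not say whether the share is weighted by log odds or counted per feature, and it gives no default for `compare_minimum_percentage`, so that parameter is required here.

```python
from typing import Dict, Optional

Record = Dict[str, Optional[str]]


def compare(
    incoming: Record,
    existing: Record,
    log_odds: Dict[str, float],
    minimum_percentage: float,
    missing_percentage: float = 0.5,
) -> float:
    """Score two records; hypothetical sketch of the rules in this issue."""
    total = sum(log_odds.values())
    score = 0.0
    missing_weight = 0.0
    for feature, weight in log_odds.items():
        a, b = incoming.get(feature), existing.get(feature)
        if not a or not b:
            # Either side missing: award a fraction of the log odds.
            score += weight * missing_percentage
            missing_weight += weight
        elif a == b:
            score += weight
        # Present but different values add nothing.
    # Assumption: "missing" is measured as a share of total log odds.
    if total > 0 and missing_weight / total > minimum_percentage:
        return 0.0
    return score


weights = {"birthdate": 10.0, "last_name": 6.0, "zip": 4.0}
incoming = {"birthdate": "1980-01-01", "last_name": "Smith", "zip": "12345"}
mismatch = {"birthdate": "1990-06-05", "last_name": "Smith", "zip": "12345"}
missing = {"birthdate": None, "last_name": "Smith", "zip": "12345"}

mismatch_score = compare(incoming, mismatch, weights, minimum_percentage=0.6)
missing_score = compare(incoming, missing, weights, minimum_percentage=0.6)
```

In this example the missing birthdate carries half of the total log odds, which stays under the 0.6 cutoff, so the record with the missing birthdate outscores the mismatched one. Lowering `minimum_percentage` to 0.4 would zero out the comparison entirely.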

Dependencies

#230

Labels

api (New API feature)
