Understand the certification process

The certification process creates consolidated and certified golden records from various data sources. This automated process applies rules and constraints defined in the model to transform raw source data into trusted, high-quality data that your organization can rely on for business operations and decision-making.

This page provides an overview of the certification process, including its key components, stages, and how it handles different types of records and operations.

Identify the record types and operations involved

The certification process is designed to handle different types of records and operations, depending on the entity type (basic, ID-matched, or fuzzy-matched) and the data origin (source or user-authored records).

The process handles the following record types and operations:

  • Source records: records pushed into the hub by middleware systems on behalf of upstream applications (publishers). Depending on the entity type, these records are either transformed directly into golden records (for basic entities), or matched and consolidated into golden records (for ID- and fuzzy-matched entities).
    note

    Consolidated records are referred to as master records. Golden records derived from these are master-based golden records.

  • User-authored records: these records are created or updated manually by users in MDM applications. Behavior depends on the entity type and application design:
    • For all entity types, users can create or update golden records that exist only in the hub. These are called data-entry-based golden records. For basic entities, this is the only method available.
    • For ID- and fuzzy-matched entities, users can create or update master records, which are then matched and consolidated into golden records. Users can also override golden record values generated from publisher data.

  • Delete operations: user-initiated deletions of golden and master records from entities where deletion is enabled.

  • Matching decisions: actions taken by data stewards for fuzzy-matched entities using duplicate managers, such as confirming, merging, or splitting groups of matching records, and accepting or rejecting suggestions.
info

The certification process is automatically generated based on the rules and constraints you configure in the data model. These rules reflect how entities and publishers should be managed according to your organization’s practices.

Understand the rules involved in the process

The certification process uses several types of rules:

  • Enrichment rules transform and standardize source and consolidated data to ensure completeness and consistency.

  • Data quality constraints and validation rules check data for referential integrity, uniqueness, required fields, and allowed values. Records that fail these checks are flagged for review. These rules apply to both source and consolidated data.

  • Match rules define logic to detect and group similar records into duplicate clusters, or match groups, which are evaluated based on match scores and confidence thresholds.

  • Survivorship rules (for ID- and fuzzy-matched entities) determine how golden record values are computed. These include:
    • A consolidation strategy that defines how values from duplicate records (identified by the matcher) are combined into a single golden record.
    • An override strategy that defines how values entered by users override the consolidated golden record value.

  • Fuzzy lookup rules use fuzzy logic to compare and identify similar but not exact records, facilitating automatic population or suggestion of reference relationships based on record similarity.
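The survivorship rules above can be sketched in a few lines. This is an illustrative example only, assuming records are plain dictionaries: a source-preference consolidation strategy picks each attribute from the highest-ranked source that supplies it, then an override strategy applies user-entered values on top. The source names, rankings, and record shapes are hypothetical, not product configuration.

```python
# Hypothetical source ranking for the consolidation strategy: lower rank wins.
SOURCE_RANK = {"CRM": 1, "ERP": 2, "LEGACY": 3}

def consolidate(match_group):
    """Pick each attribute from the highest-ranked source that supplies it."""
    golden = {}
    attrs = {a for rec in match_group for a in rec if a != "source"}
    for attr in sorted(attrs):
        candidates = [r for r in match_group if r.get(attr) is not None]
        if candidates:
            best = min(candidates, key=lambda r: SOURCE_RANK[r["source"]])
            golden[attr] = best[attr]
    return golden

def apply_overrides(golden, overrides):
    """Override strategy: user-entered values take precedence."""
    return {**golden, **overrides}

group = [
    {"source": "LEGACY", "name": "ACME Corp", "phone": "555-0100"},
    {"source": "CRM", "name": "Acme Corporation", "phone": None},
]
golden = apply_overrides(consolidate(group), {"phone": "555-0199"})
print(golden)  # {'name': 'Acme Corporation', 'phone': '555-0199'}
```

Note how the name survives from the preferred CRM source while the phone, absent from CRM, survives from LEGACY before the user override replaces it.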

Understand the certification process for basic entities

The following diagram illustrates the certification process for basic entities. The steps described below explain each stage of the process. For details about the data tables involved (SA, AE, GD, and GH), see Data tables reference.

  1. Enrich and standardize source data: source records and authored source records undergo enrichment and standardization using SemQL and API enrichers (Java plugin or REST client) that are executed pre-consolidation.

  2. Validate source data: the quality of the enriched and standardized records is checked against various pre-consolidation constraints. Records identified as erroneous are blocked from publication, and their errors are logged for review.

  3. Publish certified data: the set of certified records is finalized and made available for downstream use, as follows:

    • Golden records are created directly from source records after they have been enriched, standardized, and validated.
    • The golden records are published for consumption.
  4. Historize data: the certification process applies the configured historization strategy. When historization is enabled, changes to golden records are captured in their history.

note

For basic entities:

  • There is no distinction between source records and authored source records; both follow the same process.
  • Source data does not pass through enrichers or validations executed post-consolidation.
  • No matching or consolidation occurs since basic entities do not have duplicate detection logic.
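The basic-entity flow above can be summarized as a short pipeline. This is a minimal sketch under the assumption that records are plain dictionaries; the trimming enricher and the mandatory-name constraint are hypothetical examples, not actual model rules.

```python
def enrich(record):
    # Pre-consolidation enrichment: standardize the country code.
    out = dict(record)
    if out.get("country"):
        out["country"] = out["country"].strip().upper()
    return out

def validate(record):
    # Pre-consolidation constraint: 'name' is a mandatory attribute.
    return [] if record.get("name") else ["name is mandatory"]

def certify(source_records):
    golden, blocked = [], []
    for rec in (enrich(r) for r in source_records):
        # Erroneous records are blocked from publication and logged for review.
        (blocked if validate(rec) else golden).append(rec)
    return golden, blocked

published, blocked = certify([
    {"name": "Acme", "country": " fr "},
    {"name": "", "country": "US"},
])
print(published)  # [{'name': 'Acme', 'country': 'FR'}]
```

The second record fails the mandatory-name check, so it is excluded from publication while the first is enriched and published as a golden record.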

Understand the certification process for ID- and fuzzy-matched entities

The following diagram illustrates the certification process for ID- and fuzzy-matched entities. The steps described below explain each stage of the process. For details about the data tables involved (SD, SA, SE, AE, MI, MD, GI, GD, GE, MH, and GH), see Data tables reference.

Certification process for ID- and fuzzy-matched entities

  1. Enrich and standardize source data: source records are enriched and standardized using SemQL and API enrichers (Java plugin or REST client) that are executed pre-consolidation.

  2. Validate source data (pre-consolidation): the quality of the enriched and standardized records is checked against various pre-consolidation constraints. Records identified as erroneous are excluded from further processing, and their errors are logged.

note

For ID- and fuzzy-matched entities, authored source records are not enriched or validated at this stage of the certification process. Instead, they are enriched and validated by the steppers through which users author the data.

  3. Match and find duplicates: the matching process varies based on the entity type:
    • For fuzzy-matched entities:
      • A matcher compares pairs of records and creates match groups (or duplicate clusters).
      • Each match group is either suggested for data steward review or immediately merged and potentially confirmed as a golden record, based on the matcher’s merge policy and auto-confirm policy.
      • User-defined matching decisions on match groups override the matcher's automated choices.
    • For ID-matched entities, records with the same ID are grouped together.
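A matcher's routing of record pairs can be sketched as a score compared against thresholds. This is a hypothetical example of a match rule, not the product's matching engine: it scores a pair on name similarity with Python's `difflib`, then applies invented merge and suggestion thresholds in the way a merge policy might.

```python
from difflib import SequenceMatcher

MERGE_THRESHOLD = 0.9    # at or above: merge automatically
SUGGEST_THRESHOLD = 0.7  # in between: suggest for data steward review

def match_decision(a, b):
    """Score a pair of records and route it per the (hypothetical) policy."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    if score >= MERGE_THRESHOLD:
        return "auto-merge"
    if score >= SUGGEST_THRESHOLD:
        return "suggest"
    return "no-match"

print(match_decision({"name": "Acme Corp"}, {"name": "ACME corp"}))  # auto-merge
```

Pairs that fall between the two thresholds would surface as suggestions for data stewards, whose decisions then override the automated choice.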

  4. Consolidate data: duplicate records within match groups are consolidated into single records, using the consolidation strategy defined in the survivorship rules.

  5. Enrich consolidated data: SemQL and API enrichers (Java plugin or REST client) are executed post-consolidation to further standardize or add data to the consolidated records.

  6. Publish certified data: the integrated records are finalized and made available for downstream use, as follows:
    • Master records resulting from the matching and consolidation of source records are created or updated in the hub.
    • Golden records are created or updated based on survivorship rules and overrides:
      • Golden records derived from the matching and consolidation of source records are created or updated.
      • Any manual changes made to authored source records are applied according to the configured override rules.
      • Golden records originating from manual entries or edits performed directly within the data hub are created or updated.
        note

        These golden records are referred to as data-entry-based golden records. Unlike other golden records derived from external source data, they are generated and maintained entirely within the data hub.

    • Both master and golden records are published for consumption.

  7. Validate golden record data (post-consolidation): the quality of the golden records is checked against post-consolidation constraints. Unlike pre-consolidation validation, erroneous golden records are not blocked from publication. They remain part of the certification process and are published, while their errors are logged for review.

  8. Historize data: the certification process applies the configured historization strategy. When historization is enabled, changes to golden and master records are captured in their respective history.
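A historization strategy can be pictured as an append-only change log. The sketch below is a simplified assumption about how history capture behaves, not the hub's actual storage model: each change to a golden record appends an entry, and unchanged records produce no history.

```python
from datetime import datetime, timezone

def historize(history, record_id, old, new):
    """Append a history entry when a golden record's values change."""
    if old != new:
        history.append({
            "record_id": record_id,
            "changed_at": datetime.now(timezone.utc),
            "before": old,
            "after": new,
        })

gh = []  # golden-record history, schematically
historize(gh, "C001", {"name": "Acme"}, {"name": "Acme Corporation"})
historize(gh, "C001", {"name": "Acme Corporation"}, {"name": "Acme Corporation"})
print(len(gh))  # 1: the unchanged second call produces no history entry
```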

Understand the deletion process

When a golden record is deleted, the deletion process manages all related data according to the propagation rules defined in the model.

  1. Apply defined propagation rules: the deletion process first evaluates the deletePropagation property of each reference relationship and applies the configured propagation rules:
    • Cascade delete: child records directly or indirectly related to the deleted record are also deleted.
      note

      You do not need to manually configure every entity that should be affected by a cascade delete. The integration job automatically identifies the entities that must be included for deletion based on the entities it manages.

    • Nullify delete: references from child records related to the deleted golden record are set to null.
    • Restrict delete: records with related child records are prevented from being deleted. If any restrictions are found, the entire deletion operation is cancelled.
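The three propagation rules can be sketched as one dispatch function. This is an illustrative assumption about the behavior, with hypothetical record and relationship shapes; the policy names mirror the cascade, nullify, and restrict rules described above.

```python
def propagate_delete(record_id, child_ids, policy):
    """Return the record ids to delete and the child references to nullify."""
    if policy == "restrict":
        if child_ids:
            # Any restriction cancels the entire deletion operation.
            raise ValueError("deletion cancelled: related child records exist")
        return {"delete": [record_id], "nullify": []}
    if policy == "cascade":
        return {"delete": [record_id, *child_ids], "nullify": []}
    if policy == "nullify":
        return {"delete": [record_id], "nullify": list(child_ids)}
    raise ValueError(f"unknown policy: {policy!r}")

print(propagate_delete("C001", ["O1", "O2"], "cascade"))
# {'delete': ['C001', 'O1', 'O2'], 'nullify': []}
```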

  2. Propagate deletion to owned master records (for ID- and fuzzy-matched entities only): the deletion is extended to the master records linked to the deleted golden records.

  3. Publish deletion: the deletion is executed in the hub based on the configured deletion method:
    • For soft deletes, the platform keeps a trace of the deleted records, including their values, before removing them from golden and master data.
    • For hard deletes, all traces of the deleted records are removed from every data table. The only remaining trace is the record IDs of the deleted golden and master records, stored without data.
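The contrast between the two deletion methods can be sketched on an in-memory table. The table shapes here are illustrative assumptions, not the hub's actual data tables: a soft delete keeps the record's values in a tombstone, while a hard delete retains only the record ID.

```python
def soft_delete(table, tombstones, record_id):
    # Keep a trace of the record, including its values, before removal.
    tombstones[record_id] = table.pop(record_id)

def hard_delete(table, deleted_ids, record_id):
    # Remove all data; only the ID of the deleted record remains.
    table.pop(record_id, None)
    deleted_ids.add(record_id)

golden = {"C001": {"name": "Acme"}, "C002": {"name": "Globex"}}
trace, ids = {}, set()
soft_delete(golden, trace, "C001")
hard_delete(golden, ids, "C002")
print(trace, ids)  # values kept for C001; only the bare ID kept for C002
```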

  4. Track deletion history: the deletion process applies the configured historization strategy. When historization is enabled, both soft and hard deletes are captured, so you can track deletion events for golden and master records.