Primary Key Improvements #8811

@bbernays

Description

Is your feature request related to a problem? Please describe.

Providers do not consistently document which fields are guaranteed to uniquely identify a response. As a result, CloudQuery is forced to release periodic breaking changes to fix Primary Key violations.

Describe the solution you'd like

Rather than using composite Primary Keys, which require a breaking schema change to fix, we are looking into using a deterministic hashing method to define a single unique column. The value of that column changes whenever the set of columns used to calculate the hash changes. This would make PK changes non-breaking for users of the overwrite-delete-stale mode. For users of the overwrite mode, however, nothing can be done to fix duplicate PK issues, because changing the hash always results in a new row rather than an update.
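To illustrate why the write mode matters, here is a minimal sketch in Python. It is a simplified in-memory model, not the CloudQuery implementation: `row_key`, `sync`, and the `_sync_marker` field are hypothetical names, and stale rows are identified by a plain sync marker purely for illustration.

```python
import hashlib


def row_key(row, columns):
    """Hypothetical helper: hash the chosen columns into a single key."""
    h = hashlib.sha256()
    for col in sorted(columns):
        # Separate names and values so adjacent fields cannot run together.
        h.update(col.encode())
        h.update(b"\x00")
        h.update(str(row[col]).encode())
        h.update(b"\x00")
    return h.hexdigest()


def sync(table, rows, columns, sync_marker, delete_stale):
    """Upsert rows by hashed key; optionally drop rows not seen in this sync."""
    for row in rows:
        table[row_key(row, columns)] = {**row, "_sync_marker": sync_marker}
    if delete_stale:
        stale = [k for k, v in table.items() if v["_sync_marker"] != sync_marker]
        for key in stale:
            del table[key]


rows = [
    {"account": "a1", "region": "us-east-1", "name": "bucket"},
    {"account": "a1", "region": "eu-west-1", "name": "bucket"},
]

# Sync 1 hashes too few columns, so the two rows collide on one key.
# Sync 2 adds "region" to the hash, which changes every key.
overwrite = {}
sync(overwrite, rows, ["account", "name"], 1, delete_stale=False)
sync(overwrite, rows, ["account", "region", "name"], 2, delete_stale=False)

delete_stale = {}
sync(delete_stale, rows, ["account", "name"], 1, delete_stale=True)
sync(delete_stale, rows, ["account", "region", "name"], 2, delete_stale=True)

print(len(overwrite), len(delete_stale))
```

In this model the overwrite table keeps the row written under the old key alongside the two new ones, while the delete-stale table ends up with only the two rows from the latest sync.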

We have added functionality in the plugin-sdk to make the _cq_id deterministic based on the Primary Key fields (if no PK is defined, the value is still random). Proposal for making this the default behavior:

  • Update SDK so that _cq_id is deterministic based on defined Primary Key values (feature is behind an opt-in flag)
  • Update all source plugins to use new SDK
  • Update Postgres destination to not ignore _cq_id and _parent_cq_id during update
  • Introduce a new transformer, transformers.WholeItemAsPK, that enables developers to easily use all fields in the PK hashing. A new issue has been opened to track this work: "feat: Method for Defining hash of Entire record to be the PK" (#9423)
  • Update plugin-sdk to support an attribute in the destination spec that enables users to opt into using the deterministic _cq_id + _parent_cq_id as the only PK(s). Tracked in "Support _cq_id-only PK mode" (#8474)
  • Update all destinations that support overwrite or overwrite-delete-stale to respect the PK flag (not needed, as this is handled in the SDK)
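The _cq_id behavior described above (deterministic when a PK is defined, random otherwise) could be sketched as follows. This is an illustration only, not the plugin-sdk code: the function `cq_id`, the namespace constant, and the way values are serialized are all assumptions, and a name-based UUID is used here simply as one convenient deterministic hash.

```python
import uuid

# Hypothetical fixed namespace; any constant UUID works for the illustration.
CQ_NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def cq_id(row, pk_columns):
    """Return a deterministic ID from the PK values, or a random one if no PK."""
    if not pk_columns:
        # No PK defined: the value stays random, as described above.
        return uuid.uuid4()
    # Sort the columns so the result does not depend on declaration order.
    material = "\x1f".join(f"{col}={row[col]}" for col in sorted(pk_columns))
    return uuid.uuid5(CQ_NAMESPACE, material)


row = {"account": "a1", "region": "us-east-1", "name": "bucket"}

by_pk = cq_id(row, ["account", "name"])
# Using every field in the hash, in the spirit of the proposed whole-item transformer.
by_whole_item = cq_id(row, list(row.keys()))

print(by_pk == cq_id(row, ["name", "account"]))  # same PK values, same ID
print(by_pk == by_whole_item)                    # different columns, different ID
```

Hashing the whole item trades stability for safety: the ID is unique whenever any field differs, but it also changes whenever any field changes.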
