Primary Key Improvements #8811
Description
Is your feature request related to a problem? Please describe.
Providers do not consistently document which fields are guaranteed to uniquely identify a response. As a result, CloudQuery is forced to release periodic breaking changes to fix Primary Key violations.
Describe the solution you'd like
Rather than using composite Primary Keys, which require a breaking schema change to fix, we are looking into using a deterministic hashing method to define a single unique column. Its value would change whenever the set of columns used to calculate the hash changes. With this solution, PK changes would be non-breaking for users of the overwrite-delete-stale mode. For users of the overwrite mode, however, nothing can be done to fix duplicate PK issues, because changing the hash always results in a new row rather than an update.
We have added functionality to the plugin-sdk that makes _cq_id deterministic based on the Primary Key fields (if no PK is defined, the value is still random). Proposal for making this the default behavior:
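To make the difference between the two write modes concrete, here is a minimal Python sketch. It is illustrative only: `deterministic_id`, `sync`, and `run` are hypothetical names, the SHA-256 scheme is an assumption rather than the plugin-sdk's actual hashing, and the in-memory dict stands in for a destination table keyed by the hashed column.

```python
import hashlib
import uuid


def deterministic_id(row, pk_columns):
    """Derive a stable ID from the chosen columns (assumed scheme, not the SDK's)."""
    h = hashlib.sha256()
    for col in sorted(pk_columns):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        for part in (col, repr(row[col])):
            data = part.encode("utf-8")
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    # Format the first 16 bytes of the digest as a UUID-shaped string.
    return str(uuid.UUID(bytes=h.digest()[:16]))


def sync(table, rows, pk_columns, sync_time, delete_stale):
    """Upsert rows keyed by the derived ID; optionally drop rows from older syncs."""
    for row in rows:
        table[deterministic_id(row, pk_columns)] = {**row, "_sync_time": sync_time}
    if delete_stale:
        stale = [k for k, r in table.items() if r["_sync_time"] < sync_time]
        for key in stale:
            del table[key]


def run(delete_stale):
    """Sync the same record twice, changing the hashed columns in between."""
    rows = [{"account_id": "123", "region": "us-east-1", "name": "bucket-a"}]
    table = {}
    sync(table, rows, ["account_id", "name"], 1, delete_stale)
    # The key definition is "fixed" here, so the same record gets a new ID.
    sync(table, rows, ["account_id", "region", "name"], 2, delete_stale)
    return len(table)


print("overwrite:", run(delete_stale=False), "row(s)")
print("overwrite-delete-stale:", run(delete_stale=True), "row(s)")
```

In this sketch the overwrite-style run ends with two rows for one record, because the second sync inserts under a new ID and nothing removes the old one. The delete-stale run ends with one row, because the row written by the earlier sync is removed.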
- Update SDK so that `_cq_id` is deterministic based on defined Primary Key values (feature is behind an opt-in flag)
- Update all source plugins to use new SDK
- Update Postgres destination to not ignore `_cq_id` and `_parent_cq_id` during update
- Introduce new transformer `transformers.WholeItemAsPK` that enables developers to easily use all fields in the PK hashing. A new issue has been opened to track this work: feat: Method for Defining hash of Entire record to be the PK #9423
- Update `plugin-sdk` to support an attribute in the destination spec that enables users to opt into using the deterministic `_cq_id` + `_parent_cq_id` as the only PK(s): Support _cq_id-only PK mode #8474
- Update all destinations that support `overwrite` or `overwrite-delete-stale` to respect PK flag (not needed, as this is handled in the SDK)
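As a rough illustration of hashing the whole record into the key, the Python sketch below serializes an item canonically and hashes the result. It is not the proposed transformer: `whole_item_id` is a hypothetical helper, and the canonical-JSON plus SHA-256 approach is an assumption made for the example.

```python
import hashlib
import json


def whole_item_id(item):
    """Hash every field of the item (assumed scheme, for illustration only)."""
    # Sort keys and fix separators so equal content always serializes identically;
    # default=str covers values that JSON cannot encode natively (e.g. datetimes).
    canonical = json.dumps(item, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"name": "bucket-a", "tags": {"env": "prod"}}
b = {"tags": {"env": "prod"}, "name": "bucket-a"}  # same content, different key order
c = {"name": "bucket-a", "tags": {"env": "dev"}}   # one field changed

print(whole_item_id(a) == whole_item_id(b))  # key order does not affect the ID
print(whole_item_id(a) == whole_item_id(c))  # any field change yields a new ID
```

The trade-off is that any change to any field produces a new ID, so a changed record is written as a new row rather than an update. That suits overwrite-delete-stale, where older rows are cleaned up, better than plain overwrite.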