Deletion of Stale Data #2168
Description
Overview:
In v1 CloudQuery never deletes any data. This means that resources that have been deleted in the source are never updated or removed. This negatively impacts users in a few ways:
- Deleted resources that fail policy evaluations will always be marked as failed
- Users must manually access the CQ database and identify which tables/rows they want to delete
So overall, CloudQuery needs a mechanism for indicating to users that a resource is no longer present in the source. This can be done either by marking a resource as deleted via a boolean/timestamp column or by actually deleting the record. During this process CloudQuery should only interact with resources that were fetched.
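A minimal sketch of the timestamp-marking option, assuming each stored row carries a last-seen time and the scope it was fetched under. The `Scope`, `Record`, `MarkStale`, and `demoRecords` names, the table name, and the account IDs are all hypothetical and only illustrate the idea; this is not CloudQuery's actual schema or API.

```go
package main

import (
	"fmt"
	"time"
)

// Scope identifies one multiplexed slice of a table (hypothetical type),
// for example one AWS account and region.
type Scope struct {
	Table   string
	Account string
	Region  string
}

// Record is a hypothetical stored row.
type Record struct {
	ID        string
	Scope     Scope
	LastSeen  time.Time
	DeletedAt *time.Time // nil while the resource is still considered present
}

// MarkStale soft-deletes records in the given scope that were not seen
// since fetchStart. Records in other scopes are left untouched, so only
// resources that were actually fetched are affected.
func MarkStale(records []Record, scope Scope, fetchStart time.Time) int {
	marked := 0
	for i := range records {
		r := &records[i]
		if r.Scope != scope || r.DeletedAt != nil {
			continue
		}
		if r.LastSeen.Before(fetchStart) {
			t := fetchStart
			r.DeletedAt = &t
			marked++
		}
	}
	return marked
}

// demoRecords builds illustrative data: two rows in the fetched scope
// (one refreshed, one not) and one row in a scope that was not fetched.
func demoRecords() ([]Record, Scope, time.Time) {
	fetchStart := time.Date(2022, 9, 1, 12, 0, 0, 0, time.UTC)
	fetched := Scope{Table: "aws_ec2_instances", Account: "111111111111", Region: "us-east-1"}
	other := Scope{Table: "aws_ec2_instances", Account: "222222222222", Region: "us-east-1"}
	records := []Record{
		{ID: "i-live", Scope: fetched, LastSeen: fetchStart.Add(time.Minute)},
		{ID: "i-gone", Scope: fetched, LastSeen: fetchStart.Add(-24 * time.Hour)},
		{ID: "i-other", Scope: other, LastSeen: fetchStart.Add(-24 * time.Hour)},
	}
	return records, fetched, fetchStart
}

func main() {
	records, scope, start := demoRecords()
	fmt.Println("marked stale:", MarkStale(records, scope, start))
	for _, r := range records {
		fmt.Println(r.ID, "deleted:", r.DeletedAt != nil)
	}
}
```

With a marker column like this, hard deletion becomes an optional follow-up step (remove every row whose `DeletedAt` is set) rather than something the fetch itself has to do.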
Cases:
All examples are for AWS, but apply to any resource that supports dynamic multiplexing
- Single Config File hard coded accounts
  - At the end of fetch the only resources that should be left are those resources that existed in the source accounts
- Single Config File multiple concurrent triggers + overlapping accounts and resources
  - Case does not need to be supported
- Single Config File dynamic accounts (via Orgs)
  - At the end of the fetch, for each multiplexed resource that was successfully fetched we should delete records that were not present in the source
- Multiple Config files and triggers: Same hard coded accounts + unique resources
  - At the end of the fetch, for each multiplexed resource that was successfully fetched we should delete records that were not present in the source
- Multiple Config files and triggers: Different hard coded accounts + overlapping resources
  - At the end of the fetch, for each multiplexed resource that was successfully fetched we should delete records that were not present in the source
- Multiple Config files and concurrent triggers: Same dynamic accounts + unique resources
  - At the end of the fetch, for each multiplexed resource that was successfully fetched we should delete records that were not present in the source
- Multiple Config files and concurrent triggers: Same dynamic accounts + overlapping resources
  - Case does not need to be supported
Biggest Takeaways:
- Deletion of deleted resources is required
- Fetching of concurrent non-overlapping resources is required
- Resources should only be deleted if the list call for that resource was successful
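The "only delete on a successful list" rule can be sketched as a set difference computed per multiplexed scope, skipping any scope whose list call returned an error. The `FetchResult` type, the `StaleIDs` and `demoFetch` helpers, and the scope keys are hypothetical names used for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// FetchResult is a hypothetical summary of one multiplexed list call.
type FetchResult struct {
	Scope string   // e.g. "<table>/<account>"
	Seen  []string // resource IDs returned by the source
	Err   error    // non-nil if the list call failed
}

// StaleIDs returns, per scope, the stored IDs that a successful fetch did
// not return. Scopes whose fetch failed, or that were not fetched at all,
// are never included, because a missing row there does not prove deletion.
func StaleIDs(stored map[string][]string, results []FetchResult) map[string][]string {
	stale := make(map[string][]string)
	for _, res := range results {
		if res.Err != nil {
			continue // list failed: cannot tell "deleted" from "unreachable"
		}
		seen := make(map[string]bool, len(res.Seen))
		for _, id := range res.Seen {
			seen[id] = true
		}
		for _, id := range stored[res.Scope] {
			if !seen[id] {
				stale[res.Scope] = append(stale[res.Scope], id)
			}
		}
	}
	return stale
}

// demoFetch builds illustrative data: scope A lists successfully and is
// missing one stored ID, scope B fails, scope C is not fetched at all.
func demoFetch() (map[string][]string, []FetchResult) {
	stored := map[string][]string{
		"buckets/A": {"b-1", "b-2"},
		"buckets/B": {"b-3"},
		"buckets/C": {"b-4"},
	}
	results := []FetchResult{
		{Scope: "buckets/A", Seen: []string{"b-1"}},
		{Scope: "buckets/B", Err: errors.New("access denied")},
	}
	return stored, results
}

func main() {
	stored, results := demoFetch()
	for scope, ids := range StaleIDs(stored, results) {
		fmt.Println(scope, ids)
	}
}
```

Because each scope is evaluated independently, concurrent fetches over non-overlapping scopes would not interfere with each other under this approach.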
Open Question:
- If dynamic multiplexing returns different data, or if the hard coded accounts change, should CloudQuery assume the data changed because resources were deleted?
  - Permissions might have changed so that CQ is not able to fetch them
  - Configuration might have changed to narrow the data being fetched
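One conservative way to approach this question, shown only as a sketch and not as a decided answer, is to detect accounts that dropped out of the resolved account list between two fetches and leave their records untouched instead of treating them as deleted. The `DroppedAccounts` helper and the account IDs are hypothetical.

```go
package main

import "fmt"

// DroppedAccounts returns accounts that were part of the previous fetch
// but are absent from the current one. Their records are candidates for
// "unknown" rather than "deleted", since the cause may be a permission
// or configuration change.
func DroppedAccounts(previous, current []string) []string {
	inCurrent := make(map[string]bool, len(current))
	for _, a := range current {
		inCurrent[a] = true
	}
	var dropped []string
	for _, a := range previous {
		if !inCurrent[a] {
			dropped = append(dropped, a)
		}
	}
	return dropped
}

func main() {
	previous := []string{"111111111111", "222222222222", "333333333333"}
	current := []string{"111111111111", "333333333333"}
	fmt.Println(DroppedAccounts(previous, current)) // prints [222222222222]
}
```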
Notes:
When referring to CQ deleting records that have been removed from the source, I am not saying that CQ must delete them from the DB, only that the records need to be easily identifiable