[Feature request] Per key cleanup for the GDPR requirements #5059

@lanwen

Description

Is your feature request related to a problem? Please describe.
We are using Pulsar for event sourcing as the source of truth, with an unbounded number of events. Since we store some personal information in the events (such as names or email addresses) and operate in Europe, we have to be compliant with GDPR and be able to remove all data for a specific key.

Describe the solution you'd like
A good approach would be to follow the same path as topic compaction, with the difference that only a limited set of keys would be compacted: an admin tool and/or admin API that allows running compaction for a specific, limited set of keys.
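As a rough sketch of the semantics being requested (the function name and record layout here are illustrative, not Pulsar APIs): an erasure-oriented compaction pass would drop every event whose key is in the requested set, while leaving all other keys untouched.

```python
def compact_keys(events, keys_to_erase):
    """Hypothetical per-key cleanup pass: drop every event whose key is in
    keys_to_erase (GDPR erasure semantics); all other events are kept as-is.

    events is a list of (key, payload) tuples standing in for topic entries.
    """
    return [(key, payload) for (key, payload) in events
            if key not in keys_to_erase]

# Illustrative event log with personal data under two keys.
log = [
    ("user-1", "email=alice@example.com"),
    ("user-2", "name=Bob"),
    ("user-1", "email=alice2@example.com"),
]

# Erase everything recorded for user-1; user-2's events survive untouched.
print(compact_keys(log, {"user-1"}))  # [('user-2', 'name=Bob')]
```

Unlike regular topic compaction (which keeps the latest value per key), an erasure request needs to remove every event for the key, including the latest one.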

Another approach would be a way to clean up tiered storage: offload the topic to S3 after some time and then clean it up there. One possibility is a filtered offload, combined with handling the case where a deletion request arrives for data that has already been offloaded: the data would be loaded, cleaned, and properly re-offloaded with no traces of the removed keys.
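The scrub-and-re-offload step could look roughly like the sketch below. This assumes a hypothetical offloaded segment encoded as JSON lines with a `key` field; the real Pulsar offload format is a ledger layout, so this only illustrates the load/filter/re-encode cycle, not an actual implementation.

```python
import json


def scrub_offloaded_segment(segment_bytes, keys_to_erase):
    """Hypothetical GDPR scrub of an offloaded segment.

    Decodes a segment stored as JSON lines, drops every record belonging to
    an erased key, and re-encodes the remainder for re-offload to S3.
    """
    kept = []
    for line in segment_bytes.decode("utf-8").splitlines():
        record = json.loads(line)
        if record["key"] not in keys_to_erase:
            kept.append(line)
    return "\n".join(kept).encode("utf-8")


# Illustrative segment containing events for two users.
segment = (
    b'{"key": "user-1", "value": "email=alice@example.com"}\n'
    b'{"key": "user-2", "value": "name=Bob"}'
)

# After a deletion request for user-1, only user-2's record remains.
print(scrub_offloaded_segment(segment, {"user-1"}))
```

The important property is that the rewritten object contains no trace of the erased keys, so deleting the old object after re-offload satisfies the erasure request.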

Describe alternatives you've considered
Right now we regularly migrate the whole cluster to clean up all the events that should be removed, which is not fun at all :)

Additional context
Kafka doesn't provide anything like that either. Also, the recommendation to encrypt the data in Pulsar and later simply throw away the encryption key doesn't really work, since some EU regulators (for example in Germany) do not consider it compliant.
