Description
Problem
From my testing and reading of the code, when a topic with segments both in tiered storage and on bookies is deleted, the data on the bookies is eventually deleted (because the bookies notice that the ledger metadata is no longer in ZooKeeper), but the offloaded segments are orphaned and must be deleted manually. Because of how these objects are named, it is hard to find the right segments to delete from tiered storage.
Describe the solution you'd like
I think the brokers should eagerly delete the offloaded data when a topic is deleted. This is the behavior I expected, but I could see how the orphaning of data might actually be a feature that some people rely on. In that case, perhaps there could be a feature flag that controls whether the offloaded data is deleted or left in place. If we add a feature flag, we'll need clear documentation for it, because the choice directly affects tiered storage costs.
I think it would make sense to include the deletion of tiered storage segments when the owning topic is deleted. This is the point where the segments are definitely no longer needed.
Based on reading through the code, I think this deletion would happen at the same time as deleting the ZooKeeper metadata for the offloaded ledger/segment.
One potential issue with this implementation is that there could be a lot of segments to delete, depending on the topic's retention, and that could mean that deleting the topic takes a while. I don't think this particular issue is a problem, as I think users expect deleting a topic to recursively delete the topic's data.
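To make the ordering concrete, here is a rough sketch of what I have in mind: delete every offloaded segment first, and remove the metadata only once all of those deletions have succeeded, so a failure never leaves segments that nothing points to. `OffloadedSegment`, `SegmentStore`, and `TopicCleanup` are hypothetical names for illustration, not existing Pulsar classes; the real change would go through the offloader the managed ledger already uses.

```java
import java.util.List;
import java.util.UUID;
import java.util.concurrent.CompletableFuture;

// Hypothetical value type: one ledger that was offloaded to tiered storage.
final class OffloadedSegment {
    final long ledgerId;
    final UUID uid;

    OffloadedSegment(long ledgerId, UUID uid) {
        this.ledgerId = ledgerId;
        this.uid = uid;
    }
}

// Hypothetical stand-in for the component that talks to tiered storage.
interface SegmentStore {
    CompletableFuture<Void> delete(OffloadedSegment segment);
}

final class TopicCleanup {
    // Starts all segment deletions concurrently. The metadata callback runs
    // only if every deletion succeeds; otherwise the returned future fails
    // and the metadata is left untouched so the deletion can be retried.
    static CompletableFuture<Void> deleteTopicData(List<OffloadedSegment> segments,
                                                   SegmentStore store,
                                                   Runnable deleteMetadata) {
        CompletableFuture<?>[] deletions = segments.stream()
                .map(store::delete)
                .toArray(CompletableFuture[]::new);
        return CompletableFuture.allOf(deletions).thenRun(deleteMetadata);
    }
}
```

With a long retention there could be many segments, so a real implementation would probably want to cap how many deletions are in flight at once rather than starting them all together as this sketch does.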
Describe alternatives you've considered
Alternatively, we could have some type of asynchronous process that runs on a broker, looks for offloaded segments that have been orphaned, and deletes them. I think this solution introduces a lot of complexity (how to find those segments, how often to look for them, and how to coordinate their deletion, to name a few). However, it would remove some potential delays in deleting the managed ledger. I'm not familiar enough with the process of deleting a topic to know if it is bad to keep a topic around until all of the offloaded data is deleted.
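For the "how to find those segments" part, a scan could compare the object keys in the bucket against the set of ledger IDs that still have metadata. The sketch below assumes keys look like `<uuid>-ledger-<ledgerId>` with an optional `-index` suffix; that layout is my assumption about the naming and should be checked against the offloader before relying on it. `OrphanFinder` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OrphanFinder {
    // Assumed key layout: "<uuid>-ledger-<ledgerId>" plus an optional "-index" suffix.
    private static final Pattern KEY =
            Pattern.compile("^([0-9a-fA-F-]{36})-ledger-(\\d{1,18})(-index)?$");

    // Returns the keys whose ledger ID no longer appears in the live metadata.
    // Keys that do not match the assumed layout are ignored, never reported.
    static List<String> findOrphans(List<String> objectKeys, Set<Long> liveLedgerIds) {
        List<String> orphans = new ArrayList<>();
        for (String key : objectKeys) {
            Matcher m = KEY.matcher(key);
            if (m.matches() && !liveLedgerIds.contains(Long.parseLong(m.group(2)))) {
                orphans.add(key);
            }
        }
        return orphans;
    }
}
```

This only matches on ledger ID. It would miss leftovers from a failed offload attempt of a ledger that is still live, and it would be unsafe if several clusters share one bucket, which is part of why I think this approach is the more complex one.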
It may also be relevant to ask how ZooKeeper failures are handled when a topic is deleted. I believe the topic's deletion triggers a deletion of metadata from ZooKeeper. How do we handle network failures there? That might provide an existing pattern to follow when deleting the offloaded segments.
Additional context
Additional problems that the implementation might come up against are potential rate limiting errors. For example, I know that S3 has rate limits that could get in the way. Any implementation will need to handle the possibility of these types of transient failures and have appropriate retries to ensure we don't give up on a segment for a temporary failure.
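As a rough illustration of the retry behavior, here is a minimal sketch of retrying a segment deletion with exponential backoff, giving up immediately on errors that are not transient. `Retry` and `TransientStoreException` are hypothetical names, and the 30 second delay cap is an arbitrary choice. The sketch blocks the calling thread for brevity; in the broker the retries would need to be scheduled asynchronously.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical marker for failures worth retrying, such as a rate-limit response.
class TransientStoreException extends RuntimeException {
    TransientStoreException(String message) {
        super(message);
    }
}

final class Retry {
    // Runs op up to maxAttempts times (maxAttempts must be at least 1).
    // After a transient failure it sleeps, doubling the delay each time up
    // to a 30 second cap. Non-transient failures and the final failed
    // attempt are rethrown to the caller.
    static <T> T withBackoff(Callable<T> op,
                             Predicate<Exception> isTransient,
                             int maxAttempts,
                             long initialDelayMs) throws Exception {
        long delay = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                if (attempt >= maxAttempts || !isTransient.test(e)) {
                    throw e;
                }
                Thread.sleep(delay);
                delay = Math.min(delay * 2, 30_000L);
            }
        }
    }
}
```

A caller would wrap each segment deletion, for example `Retry.withBackoff(() -> deleteSegment(id), e -> e instanceof TransientStoreException, 5, 100)`, where `deleteSegment` is a placeholder for the real call to tiered storage.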
I'm happy to implement this after we have solidified the details of the improvement.