Skip to content

Conversation

@lhotari
Copy link
Member

@lhotari lhotari commented Jun 18, 2025

Motivation

Change TopicPoliciesService interface to fix consistency issues with Topic Policies. Example issues #24393, #21303

Modifications

This is a PIP document "PIP-428: Change TopicPoliciesService interface to fix consistency issues".
The WIP implementation PR is #24427 .

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 4.1.0 milestone Jun 18, 2025
@lhotari lhotari self-assigned this Jun 18, 2025
@github-actions github-actions bot added PIP doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. labels Jun 18, 2025
Copy link
Contributor

@codelipenghui codelipenghui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for driving the proposal. And the proposed solution looks good to me.

For the critical issues that affect production systems, if there is any link or code snippet to show the existing issues (How current implementation looks like), it will be great to help the reviewers to understand the issue quickly.

@lhotari
Copy link
Member Author

lhotari commented Jun 19, 2025

Thanks for driving the proposal. And the proposed solution looks good to me.

For the critical issues that affect production systems, if there is any link or code snippet to show the existing issues (How current implementation looks like), it will be great to help the reviewers to understand the issue quickly.

@codelipenghui Thanks for the feedback. I'll follow up by adding more examples of what could go wrong when an intended update gets lost. #24393 contains examples of lost updates.

@lhotari
Copy link
Member Author

lhotari commented Jun 20, 2025

Thanks for driving the proposal. And the proposed solution looks good to me.
For the critical issues that affect production systems, if there is any link or code snippet to show the existing issues (How current implementation looks like), it will be great to help the reviewers to understand the issue quickly.

@codelipenghui Thanks for the feedback. I'll follow up by adding more examples of what could go wrong when an intended update gets lost. #24393 contains examples of lost updates.

@codelipenghui I added a code example of lost updates with commit 32d9797 .

I'm not sure if more examples would be needed to explain why it's critical for production systems that topic policies work as expected and don't loose changes. If a topic gets configured with incorrect policy, it could lead to multiple issues. One possible issue would be that there's a namespace level setting for shorter retention and there's a requirement that a particular topic which should have a longer retention. Let's say that the user configures multiple different settings for the topic policy and the retention setting gets lost. When this retention setting for topic policy gets lost, data will get lost unexpectedly.

@codelipenghui
Copy link
Contributor

I added a code example of lost updates with commit 32d9797 .

Thanks for providing the test case.

One possible issue would be that there's a namespace level setting for shorter retention and there's a requirement that a particular topic which should have a longer retention. Let's say that the user configures multiple different settings for the topic policy and the retention setting gets lost. When this retention setting for topic policy gets lost, data will get lost unexpectedly.

It should not happen since we have enabled topic compaction for system topics, which means infinite topic retention. And we also disabled message TTL, backlog quota on system topics as well.

I'm not sure if more examples would be needed to explain why it's critical for production systems that topic policies work as expected and don't loose changes.

I'm good with the new test case. I tried to debug and understand why it loses updates. The major reason is read-write consistency issue which you already mentioned in the proposal

Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM.

There is one point that it's not clear when should skipUpdateWhenTopicPolicyDoesntExist be true? It would be better to document when will a plugin's method be called in Pulsar, as well as what are the arguments. This will be good for plugin authors to understand the interface to implement.

@BewareMyPower
Copy link
Contributor

BTW, unlike the interface designed to be called by users, the interface designed to be customized by users and only called internally should have a different document style.

Take SchemaStorage for example, there are two put methods:

CompletableFuture<SchemaVersion> put(String key, byte[] value, byte[] hash);
/**
* Put the schema to the schema storage.
*
* @param key The schema ID
* @param fn The function to calculate the value and hash that need to put to the schema storage
* The input of the function is all the existing schemas that used to do the schemas compatibility check
* @return The schema version of the stored schema
*/
default CompletableFuture<SchemaVersion> put(String key,
Function<CompletableFuture<List<CompletableFuture<StoredSchema>>>,
CompletableFuture<Pair<byte[], byte[]>>> fn) {
return fn.apply(getAll(key)).thenCompose(pair -> put(key, pair.getLeft(), pair.getRight()));
}

Actually the 1st put overload is never used out of the built-in implementation. The 2nd put overload is actually used. But it's very confusing for a downstream plugin's author. Actually I have encountered NPE before since only the 1st method is implemented.

@lhotari
Copy link
Member Author

lhotari commented Jun 21, 2025

Overall LGTM.

There is one point that it's not clear when should skipUpdateWhenTopicPolicyDoesntExist be true? It would be better to document when will a plugin's method be called in Pulsar, as well as what are the arguments. This will be good for plugin authors to understand the interface to implement.

@BewareMyPower Thanks for the feedback. skipUpdateWhenTopicPolicyDoesntExist should be set to true only for updates that are removals from an existing policy and could be omitted when the policy doesn't already exist. I'll take a look if I can clarify this and add better documentation.

@lhotari
Copy link
Member Author

lhotari commented Jun 21, 2025

@BewareMyPower I made some updates to address your feedback.

@lhotari
Copy link
Member Author

lhotari commented Jun 21, 2025

I have started a VOTE thread at https://lists.apache.org/thread/ndh0dhfjqf1htcmodgyyqt6k4bgdfj3x , please vote.

Copy link
Contributor

@BewareMyPower BewareMyPower left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's clear now

@lhotari
Copy link
Member Author

lhotari commented Jun 24, 2025

@lhotari lhotari merged commit 7c472e1 into apache:master Jun 24, 2025
20 checks passed
KannarFr pushed a commit to CleverCloud/pulsar that referenced this pull request Sep 22, 2025
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

doc Your PR contains doc changes, no matter whether the changes are in markdown or code files. PIP

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants