[fix][broker] Invalidate metadata children cache after key deleted#20363
Conversation
|
/pulsarbot rerun-failure-checks |
|
Besides, I remember that ZK cannot remove recursively at a batch but iterating from children to parent. How the cache won't be invalidated by its children getting deleted? Anyway, invalidate cache eagerly won't introduce correctness issue. |
In the scenario described in #14779 (comment), |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #20363 +/- ##
=============================================
+ Coverage 36.93% 72.95% +36.01%
- Complexity 12032 31889 +19857
=============================================
Files 1687 1864 +177
Lines 128761 138403 +9642
Branches 14002 15185 +1183
=============================================
+ Hits 47553 100965 +53412
+ Misses 74932 29427 -45505
- Partials 6276 8011 +1735
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
Thanks for your explanation! Now I know that no children cause the cache be empty and return empty cursor even if there is some. And this patch is valid even for defensive programming (delete node should always invalid cache). Although, if the cursor is a child of the topic znode, won't I'd appreciate it if you can provide a znode view of execution steps to demonstrate the exception. |
Try me break down into two steps to explain how this error could be produced in #14779:
pseudo call stack: BTW, broker will try create a cursor znode which already exists in zk, so it'll receive Here's In summary, this is how I observed the occurrence of BadVersionException. |
|
After a discussion with @Shawyeok I learned that it's about two broker node that:
|
|
Merging... Thanks for your contribution and detailed investigation @Shawyeok! |
|
@Shawyeok Could you add a test for this patch? |
Sure, I'll do it later. |
|
This patch was aimed to fix #14779, I reproduced in my environment which is based on the However, in theory, there is a chance that the new cursor watcher event is not triggered before |
|
As discussed on the mailing list https://lists.apache.org/thread/w4jzk27qhtosgsz7l9bmhf1t7o9mxjhp, there is no plan to release 2.9.6, so I am going to remove the release/2.9.6 label |
…pache#20363) (cherry picked from commit 64831dc)
Motivation
There is a chance if a topic reassign to a broker that owned this topic once, consumer reconnect will failure with
BadVersionException, here's detail: #14779 (comment)Reproduce steps:
Modifications
Cleanup metadata children cache if managed-ledger key deleted
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
doc-not-neededMatching PR in forked repository
PR in forked repository: Shawyeok#7