Fix potential memory leaks when a writable replica is promoted to primary after direct writes of keys with expiration by li-benson · Pull Request #2953 · valkey-io/valkey

li-benson · 2025-12-19T06:59:07Z

The bug mentioned in this #2952 has been fixed.

dvkashapov · 2025-12-19T07:13:42Z

Can't we somewhere in replica write logic call rememberreplicaKeyWithExpire()? That way we won't need to manually clean up anything or am I missing something?

Also, can you please signoff commits using git commit --signoff to pass DCO check?

codecov · 2025-12-19T07:18:12Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.28%. Comparing base (0ee4234) to head (9b55702).
⚠️ Report is 47 commits behind head on unstable.

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #2953      +/-   ##
============================================
- Coverage     74.29%   74.28%   -0.01%     
============================================
  Files           129      129              
  Lines         70974    70977       +3     
============================================
- Hits          52731    52727       -4     
- Misses        18243    18250       +7

Files with missing lines	Coverage Δ
src/server.c	`89.52% <100.00%> (+0.09%)`	⬆️

... and 29 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

li-benson · 2025-12-19T07:26:13Z

Can't we somewhere in replica write logic call rememberreplicaKeyWithExpire()? That way we won't need to manually clean up anything or am I missing something?

Also, can you please signoff commits using git commit --signoff to pass DCO check?

Yes, currently, when a writable replica receives a write operation for a key with an expiration time, it invokes the rememberReplicaKeyWithExpire function. This function stores the key in the replicaKeysWithExpire dictionary, which is used to proactively and periodically clean up these expired keys on the replica. However, when the node is promoted to a primary, this dictionary is no longer required—and the current logic lacks any mechanism to clean it up, resulting in a memory leak.
Our application scenario is as follows: during server maintenance, we first switch all traffic to the replica, and then promote it to a primary node. During this process, we observed a memory leak on the newly promoted primary node, so a fix is required.

dvkashapov · 2025-12-19T07:49:35Z

during server maintenance, we first switch all traffic to the replica, and then promote it to a primary node.

Wow, that's unusual, don't you want to let your client to retry writes to primary after promotion?

But the need to expire those keys makes sense to me. If we plan to have repl_expire_clean_enabled always enabled, maybe we need that logic inside activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW); and add comment to signify that this is needed when writable replica is used?

But other than that - LGTM

li-benson · 2025-12-19T08:17:40Z

during server maintenance, we first switch all traffic to the replica, and then promote it to a primary node.

Wow, that's unusual, don't you want to let your client to retry writes to primary after promotion?

But the need to expire those keys makes sense to me. If we plan to have repl_expire_clean_enabled always enabled, maybe we need that logic inside activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW); and add comment to signify that this is needed when writable replica is used?

But other than that - LGTM

At first, I also considered whether to place this piece of logic into activeExpireCycle(ACTIVE_EXPIRE_CYCLE_SLOW). However, upon further reflection, I realized there are still differences between the two functionalities: activeExpireCycle impacts actual keys (e.g., deleting them), whereas flushReplicaKeysWithExpireList only cleans up useless data, frees leaked memory, and has no impact on keys whatsoever. I believe this functionality does not fall under the scope of active expiration and should not be governed by the active_expire_enabled parameter; instead, it ought to be controlled by a separate parameter repl_expire_clean_enabled. What do you think?

dvkashapov · 2025-12-19T08:23:24Z

If we have some key that shouldn't be here then we need to delete, I don't think we need a parameter here to control that, thats why maybe it's better to put this check alongside activeExpireCycle?

li-benson · 2025-12-19T08:31:58Z

If we have some key that shouldn't be here then we need to delete, I don't think we need a parameter here to control that, thats why maybe it's better to put this check alongside activeExpireCycle?

On second thought, your consideration makes sense. We can remove this newly added parameter as well—it will cut down unnecessary redundancy. I’ll make the changes.

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

dvkashapov

LGTM but DCO is failing, need to signoff again

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

ranshid · 2025-12-23T08:31:55Z

We are getting rid of the dictionary but the keys themselves (which were never replicated) are still in the main dictionary right? these keys could be replicated right after when the old replica will restart and sync with the new promoted primary and there might still be a leakage since there is nothing that will free these keys.
So the way I know the writable replica feature, it is mainly used when RFR is used and there is a need to store some intermediate data (eg some side calculation like a set union or something). Doing this would basically mean that such operations would break in case a failover occurred and they are done outside of a script/transaction context (not sure how important this case is though).

IMO it is fine to keep the 2nd limitation (ie do NOT persist the replica written keys after failover) but #1 is something that needs attention IMO. maybe it is important to flush them and delete them when a replica is promoted, rather than to only delete the expiration dictionary?
It might also be possible to keep the keys and manage expiring them as part of the primary expiration cycle, but it would also require to avoid replicating these keys (which is not trivial at all)

li-benson · 2025-12-23T10:03:16Z

We are getting rid of the dictionary but the keys themselves (which were never replicated) are still in the main dictionary right? these keys could be replicated right after when the old replica will restart and sync with the new promoted primary and there might still be a leakage since there is nothing that will free these keys.

So the way I know the writable replica feature, it is mainly used when RFR is used and there is a need to store some intermediate data (eg some side calculation like a set union or something). Doing this would basically mean that such operations would break in case a failover occurred and they are done outside of a script/transaction context (not sure how important this case is though).

IMO it is fine to keep the 2nd limitation (ie do NOT persist the replica written keys after failover) but #1 is something that needs attention IMO. maybe it is important to flush them and delete them when a replica is promoted, rather than to only delete the expiration dictionary? It might also be possible to keep the keys and manage expiring them as part of the primary expiration cycle, but it would also require to avoid replicating these keys (which is not trivial at all)

Yes, writing expired keys to a replica stores them in both the main dict and replica expiration dict (for proactive expiration). When the replica becomes primary, the replica expiration dict becomes obsolete and unused—undeniably a memory leak.

Are you saying that besides this metadata, the actual replica-written keys & values are also unused by the primary and should be cleaned up (and not replicated to new replicas)? This is reasonable in theory, but seems overly radical to me. We cannot guarantee the primary will never need this data. In our failover scenario: traffic is switched to the replica first, then the replica is promoted to primary—we need this data post-promotion, so deleting it would contradict expectations.

Maybe we can be conservative for the time being?

ranshid · 2025-12-23T10:38:45Z

We are getting rid of the dictionary but the keys themselves (which were never replicated) are still in the main dictionary right? these keys could be replicated right after when the old replica will restart and sync with the new promoted primary and there might still be a leakage since there is nothing that will free these keys.

So the way I know the writable replica feature, it is mainly used when RFR is used and there is a need to store some intermediate data (eg some side calculation like a set union or something). Doing this would basically mean that such operations would break in case a failover occurred and they are done outside of a script/transaction context (not sure how important this case is though).

IMO it is fine to keep the 2nd limitation (ie do NOT persist the replica written keys after failover) but #1 is something that needs attention IMO. maybe it is important to flush them and delete them when a replica is promoted, rather than to only delete the expiration dictionary? It might also be possible to keep the keys and manage expiring them as part of the primary expiration cycle, but it would also require to avoid replicating these keys (which is not trivial at all)

Yes, writing expired keys to a replica stores them in both the main dict and replica expiration dict (for proactive expiration). When the replica becomes primary, the replica expiration dict becomes obsolete and unused—undeniably a memory leak.

Are you saying that besides this metadata, the actual replica-written keys & values are also unused by the primary and should be cleaned up (and not replicated to new replicas)? This is reasonable in theory, but seems overly radical to me. We cannot guarantee the primary will never need this data. In our failover scenario: traffic is switched to the replica first, then the replica is promoted to primary—we need this data post-promotion, so deleting it would contradict expectations.

Maybe we can be conservative for the time being?

I am saying that the "leak" is not really being fixed. not only the keys are kept on the primary follow the failover, they are also potentially replicated to the replicas which are performing full sync follow this failover.
In general there are many holes in the writable replicas design:

Not all written keys expiration is managed only those written to dbid < 64)
these keys persist after failover, but only on full synced replicas. this means that the cluster data would be out of sync following failover and this would not show on the repl offset.

I think there is no "good" solution in the existing implementation. Probably better solution should be that each key written on a replica is also recorded in the replicaKeysWithExpire, even when it has NO expiration (meaning it should be remembered in dbAddInternal). We can maintain separation of these keys by verify and fail the replication or delete the key when a key is attempted to be written to by the replica client and it exists in the "remember dict". We can also flush all these keys right when we become primary (and not via background active expiration job) - maybe in replicationUnsetPrimary?

Maybe we can be conservative for the time being?

We can. I think that for keys with expiration, they should probably get expired eventually, so I guess this will not introduce any degradation.

ranshid

Overall looks simple enough. Left some comments.
One thing I do think about is if this should have been an info field or not?
I know some prefer avoid adding more and more info fields, but IMO it makes it much easier to identify issues and help debug cases when it is exposed in info.
@madolson (IIRC you said you prefer to avoid adding unnecessary info fields) WDYT?

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

ranshid

Overall LGTM, accept the indication via DEBUG command rather than an info field.
I know actually some other maintainers PREFER to avoid adding info fields, so will wait to hear their opinions

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

ranshid · 2026-01-06T09:39:54Z

@li-benson IMO this can be placed in the Memory, Debug or Replication sections of the INFO output. Debug commands are mostly used in order to allow tests to run, but I find this information something that applications can still use in some ways. Personally I feel it can be part of the Memory section, since this is probably the main purpose of this data.

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

li-benson · 2026-01-06T11:49:57Z

@li-benson IMO this can be placed in the Memory, Debug or Replication sections of the INFO output. Debug commands are mostly used in order to allow tests to run, but I find this information something that applications can still use in some ways. Personally I feel it can be part of the Memory section, since this is probably the main purpose of this data.

I just found this information in info Stats: slave_expires_tracked_keys. It's redundant for me to use the debug command to get it, so I'll make a modification.

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

…mary after direct writes of keys with expiration (valkey-io#2953) The bug mentioned in this valkey-io#2952 has been fixed. --------- Signed-off-by: li-benson <1260437731@qq.com>

…mary after direct writes of keys with expiration (valkey-io#2953) The bug mentioned in this valkey-io#2952 has been fixed. --------- Signed-off-by: li-benson <1260437731@qq.com> Signed-off-by: Harkrishn Patro <bunty.hari@gmail.com>

github-actions Bot assigned li-benson Dec 19, 2025

li-benson force-pushed the fix_replciaExpireDict branch from 679a224 to abb7706 Compare December 19, 2025 07:31

dvkashapov self-requested a review December 19, 2025 07:49

Fix potential memory leaks when switching to primary after writing ex…

db85fed

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

li-benson force-pushed the fix_replciaExpireDict branch from abb7706 to db85fed Compare December 19, 2025 08:54

dvkashapov approved these changes Dec 19, 2025

View reviewed changes

Fix potential memory leaks when switching to primary after writing ex…

d89c876

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

li-benson force-pushed the fix_replciaExpireDict branch from b473ef3 to d89c876 Compare December 19, 2025 09:02

ranshid reviewed Jan 1, 2026

View reviewed changes

li-benson added 2 commits January 4, 2026 09:10

Fix potential memory leaks when switching to primary after writing ex…

36fd553

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

Merge branch 'unstable' into fix_replciaExpireDict

58a75b0

ranshid reviewed Jan 4, 2026

View reviewed changes

Comment thread tests/unit/expire.tcl

li-benson added 2 commits January 5, 2026 06:27

Fix potential memory leaks when switching to primary after writing ex…

68cfd08

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

Merge branch 'unstable' into fix_replciaExpireDict

5138aa5

Fix potential memory leaks when switching to primary after writing ex…

f34fd0e

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

li-benson added 2 commits January 6, 2026 11:51

Fix potential memory leaks when switching to primary after writing ex…

ee468ea

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

Fix potential memory leaks when switching to primary after writing ex…

9b55702

…pired keys to replica Signed-off-by: li-benson <1260437731@qq.com>

ranshid approved these changes Jan 6, 2026

View reviewed changes

ranshid merged commit f5d7dc8 into valkey-io:unstable Jan 28, 2026
24 checks passed

Uh oh!

Conversation

li-benson commented Dec 19, 2025

Uh oh!

dvkashapov commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codecov Bot commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

li-benson commented Dec 19, 2025

Uh oh!

dvkashapov commented Dec 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

li-benson commented Dec 19, 2025

Uh oh!

dvkashapov commented Dec 19, 2025

Uh oh!

li-benson commented Dec 19, 2025

Uh oh!

dvkashapov left a comment

Choose a reason for hiding this comment

Uh oh!

ranshid commented Dec 23, 2025

Uh oh!

li-benson commented Dec 23, 2025

Uh oh!

ranshid commented Dec 23, 2025

Uh oh!

ranshid left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ranshid left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ranshid commented Jan 6, 2026

Uh oh!

li-benson commented Jan 6, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dvkashapov commented Dec 19, 2025 •

edited

Loading

codecov Bot commented Dec 19, 2025 •

edited

Loading

dvkashapov commented Dec 19, 2025 •

edited

Loading