Skip to content

Do not migrate function in new atomic slot migration#2547

Merged
enjoy-binbin merged 3 commits into
valkey-io:unstablefrom
enjoy-binbin:slot_migration_function
Aug 29, 2025
Merged

Do not migrate function in new atomic slot migration#2547
enjoy-binbin merged 3 commits into
valkey-io:unstablefrom
enjoy-binbin:slot_migration_function

Conversation

@enjoy-binbin

Copy link
Copy Markdown
Member

If all cluster nodes have functions, slot migration will fail
since the target will return the function already exists error
when doing the FUNCTION LOAD.

And in addition, the target's replica could panic when it executes
the FUNCTION LOAD propagated from the primary (see propagation-error-behavior).

Introduced in #1949.

If all cluster nodes have functions, slot migration will fail
since the target will return the function already exists error
when doing the FUNCTION LOAD.

And in addition, the target's replica could panic when it executes
the FUNCTION LOAD propagated from the primary (see propagation-error-behavior).

Introduced in valkey-io#1949.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@enjoy-binbin enjoy-binbin moved this to In Progress in Valkey 9.0 Aug 26, 2025
@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label Aug 26, 2025
@enjoy-binbin enjoy-binbin requested review from PingXie, madolson, murphyjacob4 and zuiderkwast and removed request for murphyjacob4 August 26, 2025 12:16
@enjoy-binbin enjoy-binbin marked this pull request as ready for review August 26, 2025 12:17
Signed-off-by: Binbin <binloveplay1314@qq.com>
@codecov

codecov Bot commented Aug 26, 2025

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 72.17%. Comparing base (40fe422) to head (89f56ee).
⚠️ Report is 33 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2547      +/-   ##
============================================
+ Coverage     72.04%   72.17%   +0.12%     
============================================
  Files           126      126              
  Lines         70615    70614       -1     
============================================
+ Hits          50872    50963      +91     
+ Misses        19743    19651      -92     
Files with missing lines Coverage Δ
src/aof.c 81.17% <ø> (-0.02%) ⬇️

... and 13 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread tests/unit/cluster/cluster-migrateslots.tcl

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. (I didn't review the test.)

Signed-off-by: Binbin <binloveplay1314@qq.com>
@zuiderkwast

Copy link
Copy Markdown
Contributor

I don't think we need to mention this in the release notes. We just modified (fixed) behavior that's not yet released.

@enjoy-binbin

Copy link
Copy Markdown
Member Author

I don't think we need to mention this in the release notes. We just modified (fixed) behavior that's not yet released.

We release this in RC1, didn't we? So when we release RC2 we need to mention the fix?

@enjoy-binbin enjoy-binbin merged commit 39fade4 into valkey-io:unstable Aug 29, 2025
52 checks passed
@github-project-automation github-project-automation Bot moved this from In Progress to Done in Valkey 9.0 Aug 29, 2025
@enjoy-binbin enjoy-binbin deleted the slot_migration_function branch August 29, 2025 02:18
@enjoy-binbin enjoy-binbin moved this from Done to In Progress in Valkey 9.0 Aug 29, 2025
@enjoy-binbin enjoy-binbin moved this from In Progress to RC2 in Valkey 9.0 Aug 29, 2025
@zuiderkwast

Copy link
Copy Markdown
Contributor

I don't think we need to mention this in the release notes. We just modified (fixed) behavior that's not yet released.

We release this in RC1, didn't we? So when we release RC2 we need to mention the fix?

Good point. But this was not documented behaviour. Should this PR be listed as a bugfix in the RC2 release notes?

@zuiderkwast zuiderkwast added the bug Something isn't working label Aug 30, 2025
@madolson

madolson commented Sep 2, 2025

Copy link
Copy Markdown
Member

Good point. But this was not documented behaviour. Should this PR be listed as a bugfix in the RC2 release notes?

I thought this was intended behavior. How are users supposed to migrate functions? Through the CLI?

@zuiderkwast

Copy link
Copy Markdown
Contributor

@madolson Right. Migrating functions was not intended but it happened and it could cause trouble, so it's a bug in RC 1 and this is bugfix to be released in RC 2. Correct?

Or don't we need to mention it in the release notes of rc2?

@madolson

madolson commented Sep 3, 2025

Copy link
Copy Markdown
Member

We could mention it as a bug fix. I do wonder if we shouldn't just migrate the functions though? Instead of crashing we could send them with replace.

@enjoy-binbin

Copy link
Copy Markdown
Member Author

In our fork, back to 7.0, when we adapted the function, i do converted it to a replace so that the target node will have the function (Noted that we are adding an empty target node to the cluster when doing the migration).

And in the OSS version, at the time (7.0), i had the impression that functions were something that cluster administrators needed to load. For example, when adding a new node, the function needed to be loaded on all nodes (including the new one) (this is why we added function support to the CLI). In another scenario, in our OSS, the target node already exists in the cluster (in most cases), so it should already have the function.

I guess we should identify whether a function needs migration or not. Note that in the old setslot migration, we did not migrate the function either.

@zuiderkwast

Copy link
Copy Markdown
Contributor

@enjoy-binbin Agree, adding functions is more related to adding a new node to the cluster.

We could make valkey-cli --cluster add-node fetch the functions from another in the cluster, such as from the node it sends CLUSTER MEET to. The cli is more or less as a reference implementation for control planes. It's even used by some iiuc.

@madolson madolson moved this from RC2 to Done in Valkey 9.0 Sep 3, 2025
rjd15372 pushed a commit to rjd15372/valkey that referenced this pull request Sep 19, 2025
If all cluster nodes have functions, slot migration will fail
since the target will return the function already exists error
when doing the FUNCTION LOAD.

And in addition, the target's replica could panic when it executes
the FUNCTION LOAD propagated from the primary (see
propagation-error-behavior).

Introduced in valkey-io#1949.

Signed-off-by: Binbin <binloveplay1314@qq.com>
rjd15372 pushed a commit that referenced this pull request Sep 23, 2025
If all cluster nodes have functions, slot migration will fail
since the target will return the function already exists error
when doing the FUNCTION LOAD.

And in addition, the target's replica could panic when it executes
the FUNCTION LOAD propagated from the primary (see
propagation-error-behavior).

Introduced in #1949.

Signed-off-by: Binbin <binloveplay1314@qq.com>
hpatro pushed a commit to hpatro/valkey that referenced this pull request Oct 3, 2025
If all cluster nodes have functions, slot migration will fail
since the target will return the function already exists error
when doing the FUNCTION LOAD.

And in addition, the target's replica could panic when it executes
the FUNCTION LOAD propagated from the primary (see
propagation-error-behavior).

Introduced in valkey-io#1949.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Harkrishn Patro <harkrisp@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working cluster release-notes This issue should get a line item in the release notes

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants