Skip to content

Fix heap-use-after-free in ACL LOAD when client free is deferred#3800

Merged
ranshid merged 4 commits into
valkey-io:unstablefrom
ranshid:fix/acl-load-uaf-protected-client
May 21, 2026
Merged

Fix heap-use-after-free in ACL LOAD when client free is deferred#3800
ranshid merged 4 commits into
valkey-io:unstablefrom
ranshid:fix/acl-load-uaf-protected-client

Conversation

@ranshid

@ranshid ranshid commented May 21, 2026

Copy link
Copy Markdown
Member

Bug

When ACL LOAD deletes a user, it calls freeClientOrCloseLater() on clients authenticated as that user. If freeClient() defers the actual free (because the client has flag.protected set or has pending IO), the client remains in server.clients with c->user still pointing to the old user object.

After raxFreeWithCallback(old_users, ACLFreeUserVoid) frees all old user objects, c->user becomes a dangling pointer. Any subsequent code that dereferences c->user on the deferred client triggers a heap-use-after-free.

Crash sequence

  1. Client A runs ACL LOAD
  2. Client B is authenticated as a user that was removed from the ACL file
  3. Client B has flag.protected set (e.g. has pending IO (IO threads)
  4. ACLLoadFromFilefreeClientOrCloseLater(B, 0)freeClient(B) → defers to freeClientAsync because B is protected
  5. raxFreeWithCallback(old_users, ...) frees all old user objects including B's user
  6. B is still in server.clients with c->user pointing to freed memory
  7. CLIENT LIST or any path that calls catClientInfoString dereferences c->user->nameUAF

Fix

Set c->user to default user before calling freeClientOrCloseLater() in ACLLoadFromFile. The existing NULL guard in catClientInfoString (client->user ? client->user->name : "(superuser)") handles this safely.

Test

Added DEBUG PROTECT-CLIENT <client-id> subcommand that calls protectClient() on the target, forcing freeClient to defer to async. This enables deterministic reproduction without timing races.

The test:

  1. Creates a user, connects a client as that user
  2. Protects the client via DEBUG PROTECT-CLIENT
  3. Removes the user from the ACL file and runs ACL LOAD
  4. Runs CLIENT LIST which dereferences c->user->name

Without the fix, this crashes under ASAN with:

ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile

When ACL LOAD deletes a user, it calls freeClientOrCloseLater() on
clients authenticated as that user. If freeClient() defers the free
(because the client is protected or has pending IO), the client remains
in server.clients with c->user still pointing to the old user object.
After raxFreeWithCallback(old_users, ...) frees all old users, c->user
becomes a dangling pointer.

Any subsequent code that dereferences c->user on the deferred client
(e.g. CLIENT LIST via catClientInfoString, or clientsCron sub-functions)
triggers a heap-use-after-free.

Fix by setting c->user = NULL before calling freeClientOrCloseLater(),
so the pointer is scrubbed before the old user objects are freed. The
existing NULL guard in catClientInfoString (client->user ? ... :
"(superuser)") handles this safely.

Added DEBUG PROTECT-CLIENT <id> subcommand to enable deterministic
reproduction by forcing freeClient to defer via protectClient().

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Clears a client's c->user before disconnecting it during ACL LOAD when pub/sub permissions change, adds DEBUG PROTECT-CLIENT <id> to protect clients during tests, and adds a regression test verifying protected clients survive an ACL LOAD that removes their user.

Changes

ACL Load Client User Pointer Fix

Layer / File(s) Summary
Debug command for client protection
src/debug.c
Adds PROTECT-CLIENT <id> to DEBUG help and implements a handler that parses an unsigned client id, looks up the client, and calls protectClient(); replies with OK, No such client, or Invalid client id.
ACL load user pointer cleanup
src/acl.c
When ACL LOAD disconnects a pub/sub client due to permission revocation, sets c->user = DefaultUser before calling freeClientOrCloseLater() to avoid dangling user pointers.
Regression test for ACL load fix
tests/unit/acl.tcl
Adds a {needs:debug} test that creates a temporary ACL user, protects an authenticated client via DEBUG PROTECT-CLIENT, removes the user from the ACL file, runs ACL LOAD, and verifies the protected client remains in CLIENT LIST.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main fix: resolving a heap-use-after-free issue in ACL LOAD when client deallocation is deferred.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description thoroughly explains the bug, crash sequence, fix, and test, all directly related to the changeset in src/acl.c, src/debug.c, and tests/unit/acl.tcl.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@dvkashapov dvkashapov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I should have thought about that in #3182 xD

@ranshid ranshid marked this pull request as ready for review May 21, 2026 04:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/debug.c`:
- Around line 1072-1080: The DEBUG subcommand list doesn't include the new
"protect-client" subcommand; update the DEBUG HELP output to advertise it by
adding a help line for "protect-client" to the same help table/string used to
generate DEBUG HELP (look for the HELP handling in debug.c—the function/array
that emits the DEBUG subcommands list, e.g., the DEBUG help text or
debugCommand/help branch). Add a concise entry matching the style of existing
entries, e.g. "protect-client <id> -- protect a client from eviction/closing",
and ensure formatting and ordering are consistent with the other DEBUG help
lines.
- Around line 1072-1074: The current parse of the client id using
strtoull(objectGetVal(c->argv[2]), NULL, 10) can accept partial numeric strings
like "12x"; change it to use a char *endptr and errno checks: call strtoull with
&endptr, set errno=0 beforehand, then ensure endptr points to the string
terminator (no extra characters), the start is not equal to endptr (must parse
at least one digit), and errno is not ERANGE (overflow). If any check fails,
return an error response and do not call lookupClientByID; update the
"protect-client" branch that reads c->argv[2] accordingly.

In `@tests/unit/acl.tcl`:
- Around line 1140-1141: The test currently only checks that the CLIENT LIST
output (variable cl from r CLIENT LIST) is non-empty, which can pass even if the
protected client is missing; update the assertion to explicitly verify the
protected client id is present in cl (e.g., assert {[string match -nocase
"*$protected_client_id*" $cl]} or use [string first $protected_client_id $cl] >=
0) and include a clear error message like "expected protected client
$protected_client_id in CLIENT LIST"; replace the existing assert {[string
length $cl] > 0} with this explicit presence check referencing cl and the
protected client id variable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: e8c294db-3232-41a6-89d3-d9ee30aef147

📥 Commits

Reviewing files that changed from the base of the PR and between 4b25a70 and ba81baf.

📒 Files selected for processing (3)
  • src/acl.c
  • src/debug.c
  • tests/unit/acl.tcl

Comment thread src/debug.c
Comment thread src/debug.c
Comment thread tests/unit/acl.tcl Outdated

@enjoy-binbin enjoy-binbin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me, did not review the test code.

@ranshid ranshid force-pushed the fix/acl-load-uaf-protected-client branch from ba81baf to 2ca3e90 Compare May 21, 2026 05:03
…rtion

- Add PROTECT-CLIENT to DEBUG HELP output
- Validate client ID with endptr/errno checks to reject partial
  numeric strings like "12x"
- Strengthen test assertion to verify the protected client ID is
  present in CLIENT LIST output, not just that output is non-empty

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@ranshid ranshid force-pushed the fix/acl-load-uaf-protected-client branch from 2ca3e90 to f8b51c7 Compare May 21, 2026 05:05

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/acl.c`:
- Around line 2673-2674: Currently the code sets c->user = NULL before calling
freeClientOrCloseLater(c, 0), which momentarily makes the connection pass ACL
checks (NULL == allow-all); instead, rebind the client to the
unauthenticated/default ACL user (the same mechanism used in
ACLFreeUserAndKillClients) before calling freeClientOrCloseLater so the client
is not treated as privileged—i.e., replace c->user = NULL with code that assigns
the unauthenticated/default user (and handle refcounts if applicable) and then
call freeClientOrCloseLater(c, 0).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: aba235f7-73af-4cdf-840d-f6a610f11aec

📥 Commits

Reviewing files that changed from the base of the PR and between ba81baf and 2ca3e90.

📒 Files selected for processing (3)
  • src/acl.c
  • src/debug.c
  • tests/unit/acl.tcl

Comment thread src/acl.c Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/unit/acl.tcl`:
- Line 1141: The test currently uses assert_match "*id=$cid*" which can match
IDs like "id=${cid}7"; update the pattern to require a trailing space so the
client id is matched as a full token (e.g. change the glob to "*id=$cid *" or an
equivalent regex with a word boundary) in the assert_match call that references
$cid to avoid false positives.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d015ac0c-4f98-48aa-ae5d-62ff3a31719c

📥 Commits

Reviewing files that changed from the base of the PR and between 2ca3e90 and f8b51c7.

📒 Files selected for processing (2)
  • src/debug.c
  • tests/unit/acl.tcl

Comment thread tests/unit/acl.tcl Outdated
ranshid added 2 commits May 21, 2026 08:10
As noted in review, setting c->user = NULL momentarily makes the
client pass ACL checks as unrestricted. Use clientSetUser(c,
DefaultUser, 0) instead, consistent with ACLFreeUserAndKillClients.

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Add trailing space to glob pattern to avoid false positive on
client IDs that are prefixes of other IDs (e.g. id=5 matching id=57).

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@codecov

codecov Bot commented May 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 76.92308% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.63%. Comparing base (54fbe4a) to head (a039676).
⚠️ Report is 2 commits behind head on unstable.

Files with missing lines Patch % Lines
src/debug.c 75.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #3800      +/-   ##
============================================
- Coverage     76.68%   76.63%   -0.06%     
============================================
  Files           162      162              
  Lines         80697    80710      +13     
============================================
- Hits          61885    61854      -31     
- Misses        18812    18856      +44     
Files with missing lines Coverage Δ
src/acl.c 92.53% <100.00%> (+<0.01%) ⬆️
src/debug.c 55.16% <75.00%> (+0.20%) ⬆️

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ranshid ranshid merged commit aa7488f into valkey-io:unstable May 21, 2026
62 of 63 checks passed
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 8.1 May 21, 2026
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 8.0 May 21, 2026
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.1 May 21, 2026
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 9.0 May 21, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request May 21, 2026
### Bug

When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

### Crash sequence

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

### Fix

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

### Test

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 21, 2026
When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 21, 2026
When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@ranshid ranshid added the bug Something isn't working label May 21, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request May 29, 2026
### Bug

When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

### Crash sequence

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

### Fix

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

### Test

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
zuiderkwast pushed a commit that referenced this pull request Jun 2, 2026
When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Jun 2, 2026
@zuiderkwast zuiderkwast moved this from To be backported to 8.1.8 (WIP) in Valkey 8.1 Jun 2, 2026
zuiderkwast pushed a commit that referenced this pull request Jun 2, 2026
When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 3, 2026
When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 10, 2026
### Bug

When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

### Crash sequence

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

### Fix

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

### Test

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
ranshid added a commit that referenced this pull request Jun 11, 2026
# Backport sweep for 9.1

Automated cherry-picks from PRs marked "To be backported".

## Applied

| Source PR | Title | Detail |
|---|---|---|
| #3743 | Fix buffered_reply assert in HFE commands with module keyspace
notifications | cherry-picked in a prior sweep |
| #3766 | Fix flaky block_keyspace_notification test for HGETDEL notify
race | cherry-picked in a prior sweep |
| #3800 | Fix heap-use-after-free in ACL LOAD when client free is
deferred | cherry-picked in a prior sweep |
| #3723 | Fix double-finish and RESP reply violation in cluster slot
migration | cherry-picked in a prior sweep |
| #3872 | Redacting customer information when hide_user_data_from_log is
true in rdb.c, networking.c, debug.c and t_hash | cherry-picked in a
prior sweep |
| #3846 | Fix use-after-free in VM_RegisterClusterMessageReceiver |
cherry-picked in a prior sweep |
| #3806 | Add ALL_DBS flag to CLUSTER FLUSHSLOT for database-level ACL |
cherry-picked in a prior sweep |
| #3847 | Harden SENTINEL commands and config rewrite against
control-character injection | |
| #3801 | Validate every DB clause in COPY against ACL db= permissions |
|
| #3851 | Replace AUTOMATION_PAT with Valkeyrie Bot GitHub App token | |
| #3848 | Fix cluster AUX-field control-character and delimiter
injection | |
| #3544 | Revert "IO-Threads redesign cleanup work (#3544)" |
cherry-picked in a prior sweep |
| #3888 | Report exact dbid for COPY in ACL LOG when db= access is
denied | conflicts resolved by Claude Code |
| #3942 | Fix shard_id format specifier in UPDATE message log |  |
| #3941 | Avoid random() % 0 undefined behaviour when
cluster-node-timeout < 30 | |

---
*Generated by valkey-ci-agent using Claude Code.*

---------

Signed-off-by: Binbin <binloveplay1314@qq.com>
Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
Signed-off-by: chx9 <lovelypiska@outlook.com>
Signed-off-by: zackcam <zackcam@amazon.com>
Signed-off-by: Eran Ifrah <eifrah@amazon.com>
Signed-off-by: Jules Lasarte <lasartej@amazon.com>
Signed-off-by: jjuleslasarte <jules.lasarte@gmail.com>
Signed-off-by: akash kumar <akumdev@amazon.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: lovelypiska <lovelypiska@outlook.com>
Co-authored-by: zackcam <zackcam@amazon.com>
Co-authored-by: eifrah-aws <eifrah@amazon.com>
Co-authored-by: jjuleslasarte <jules.lasarte@gmail.com>
Co-authored-by: Jules Lasarte <lasartej@amazon.com>
Co-authored-by: Akash Kumar <45854686+akashkgit@users.noreply.github.com>
@sarthakaggarwal97 sarthakaggarwal97 moved this from To be backported to Done in Valkey 9.1 Jun 15, 2026
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 18, 2026
### Bug

When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

### Crash sequence

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

### Fix

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

### Test

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
sarthakaggarwal97 pushed a commit that referenced this pull request Jun 18, 2026
When `ACL LOAD` deletes a user, it calls `freeClientOrCloseLater()` on
clients authenticated as that user. If `freeClient()` defers the actual
free (because the client has `flag.protected` set or has pending IO),
the client remains in `server.clients` with `c->user` still pointing to
the old user object.

After `raxFreeWithCallback(old_users, ACLFreeUserVoid)` frees all old
user objects, `c->user` becomes a dangling pointer. Any subsequent code
that dereferences `c->user` on the deferred client triggers a
heap-use-after-free.

1. Client A runs `ACL LOAD`
2. Client B is authenticated as a user that was removed from the ACL
file
3. Client B has `flag.protected` set (e.g. has pending IO (IO threads)
4. `ACLLoadFromFile` → `freeClientOrCloseLater(B, 0)` → `freeClient(B)`
→ defers to `freeClientAsync` because B is protected
5. `raxFreeWithCallback(old_users, ...)` frees all old user objects
including B's user
6. B is still in `server.clients` with `c->user` pointing to freed
memory
7. `CLIENT LIST` or any path that calls `catClientInfoString`
dereferences `c->user->name` → **UAF**

Set `c->user` to default user before calling `freeClientOrCloseLater()`
in `ACLLoadFromFile`. The existing NULL guard in `catClientInfoString`
(`client->user ? client->user->name : "(superuser)"`) handles this
safely.

Added `DEBUG PROTECT-CLIENT <client-id>` subcommand that calls
`protectClient()` on the target, forcing `freeClient` to defer to async.
This enables deterministic reproduction without timing races.

The test:
1. Creates a user, connects a client as that user
2. Protects the client via `DEBUG PROTECT-CLIENT`
3. Removes the user from the ACL file and runs `ACL LOAD`
4. Runs `CLIENT LIST` which dereferences `c->user->name`

Without the fix, this crashes under ASAN with:
```
ERROR: AddressSanitizer: heap-use-after-free in catClientInfoString
READ of size 8
freed by: raxRecursiveFree → raxFreeWithCallback → ACLLoadFromFile
```

---------

Signed-off-by: Ran Shidlansik <ranshid@amazon.com>
@valkeyrie-ops valkeyrie-ops Bot moved this from To be backported to Done in Valkey 8.0 Jun 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working release-notes This issue should get a line item in the release notes

Projects

Status: Done
Status: 8.1.8
Status: To be backported
Status: Done

Development

Successfully merging this pull request may close these issues.

5 participants