Skip to content

client struct: lazy init components and optimize struct layout#1405

Merged
ranshid merged 7 commits into
valkey-io:unstablefrom
uriyage:client-optimize
Jan 8, 2025
Merged

client struct: lazy init components and optimize struct layout#1405
ranshid merged 7 commits into
valkey-io:unstablefrom
uriyage:client-optimize

Conversation

@uriyage

@uriyage uriyage commented Dec 8, 2024

Copy link
Copy Markdown
Contributor

Refactor client structure to use modular data components

Current State

The client structure allocates memory for replication / pubsub / multi-keys / module / blocked data for every client, despite these features being used by only a small subset of clients. In addition the current field layout in the client struct is suboptimal, with poor alignment and unnecessary padding between fields, leading to a larger than necessary memory footprint of 896 bytes per client. Furthermore, fields that are frequently accessed together during operations are scattered throughout the struct, resulting in poor cache locality.

This PR's Change

  1. Lazy Initialization
  • Components are only allocated when first used:
    • PubSubData: Created on first SUBSCRIBE/PUBLISH operation
    • ReplicationData: Initialized only for replica connections
    • ModuleData: Allocated when module interaction begins
    • BlockingState: Created when first blocking command is issued
    • MultiState: Initialized on MULTI command
  1. Memory Layout Optimization:

    • Grouped related fields for better locality
    • Moved rarely accessed fields (e.g., client->name) to struct end
    • Optimized field alignment to eliminate padding
  2. Additional changes:

    • Moved watched_keys to be static allocated in the mstate struct
    • Relocated replication init logic to replication.c

Key Benefits

  • Efficient Memory Usage:
  • 45% smaller base client structure - Basic clients now use 528 bytes (down from 896).
  • Better memory locality for related operations
  • Performance improvement in high throughput scenarios. No performance regressions in other cases.

Performance Impact

Tested with 650 clients and 512 bytes values.

Single Thread Performance

Operation Dataset New (ops/sec) Old (ops/sec) Change %
SET 1 key 261,799 258,261 +1.37%
SET 3M keys 209,134 ~209,000 ~0%
GET 1 key 281,564 277,965 +1.29%
GET 3M keys 231,158 228,410 +1.20%

8 IO Threads Performance

Operation Dataset New (ops/sec) Old (ops/sec) Change %
SET 1 key 1,331,578 1,331,626 -0.00%
SET 3M keys 1,254,441 1,152,645 +8.83%
GET 1 key 1,293,149 1,289,503 +0.28%
GET 3M keys 1,152,898 1,101,791 +4.64%

Pipeline Performance (3M keys)

Operation Pipeline Size New (ops/sec) Old (ops/sec) Change %
SET 10 548,964 538,498 +1.94%
SET 20 606,148 594,872 +1.89%
SET 30 631,122 616,606 +2.35%
GET 10 628,482 624,166 +0.69%
GET 20 687,371 681,659 +0.84%
GET 30 725,855 721,102 +0.66%

Observations:

  1. Single-threaded operations show consistent improvements (1-1.4%)
  2. Multi-threaded performance shows significant gains for large datasets:
    • SET with 3M keys: +8.83% improvement
    • GET with 3M keys: +4.64% improvement
  3. Pipeline operations show consistent improvements:
    • SET operations: +1.89% to +2.35%
    • GET operations: +0.66% to +0.84%
  4. No performance regressions observed in any test scenario

Related issue:#761

@codecov

codecov Bot commented Dec 8, 2024

Copy link
Copy Markdown

Codecov Report

Attention: Patch coverage is 88.08594% with 61 lines in your changes missing coverage. Please review.

Project coverage is 70.79%. Comparing base (f20d629) to head (ff1b620).
Report is 30 commits behind head on unstable.

Files with missing lines Patch % Lines
src/module.c 4.08% 47 Missing ⚠️
src/replication.c 95.04% 10 Missing ⚠️
src/rdb.c 50.00% 2 Missing ⚠️
src/blocked.c 98.38% 1 Missing ⚠️
src/networking.c 98.36% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1405      +/-   ##
============================================
- Coverage     70.81%   70.79%   -0.02%     
============================================
  Files           118      119       +1     
  Lines         63561    64728    +1167     
============================================
+ Hits          45010    45826     +816     
- Misses        18551    18902     +351     
Files with missing lines Coverage Δ
src/acl.c 88.77% <ø> (-0.06%) ⬇️
src/aof.c 80.20% <100.00%> (+0.07%) ⬆️
src/cluster.c 88.65% <100.00%> (+0.14%) ⬆️
src/cluster_legacy.c 86.76% <ø> (+0.06%) ⬆️
src/multi.c 97.30% <100.00%> (+0.87%) ⬆️
src/pubsub.c 96.82% <100.00%> (-0.23%) ⬇️
src/script.c 89.06% <100.00%> (+1.41%) ⬆️
src/server.c 87.45% <100.00%> (+0.06%) ⬆️
src/server.h 100.00% <ø> (ø)
src/timeout.c 90.00% <100.00%> (ø)
... and 6 more

... and 46 files with indirect coverage changes

@zuiderkwast

Copy link
Copy Markdown
Contributor
  • 45% smaller base client structure - Basic clients now use 528 bytes (down from 896).

Amazing!

@zuiderkwast zuiderkwast self-requested a review December 9, 2024 10:29

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good! It's a large diff but most of the changed lines are trivial changes. I can't verify everything so I have to trust you and the tests for the correctness.

I noticed in many helper functions, we're using these client sub-structs, for example c->mstate->something, without checking that c->mstate exists. It should always exist in these cases so it's not an error, but would it make sense to add serverAssert(c->mstate != NULL) in all these places?

If we have code coverage by tests, we will find these errors anyway, so IMHO it doens't matter too match if it's a null-pointer crash or an assert. What is your reasoning about this?

Comment thread src/module.c Outdated
Comment thread src/module.c Outdated
Comment thread src/multi.c Outdated
Comment thread src/server.h Outdated
Comment thread src/multi.c

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nits now.

Comment thread src/networking.c Outdated
Comment thread src/networking.c Outdated
Comment thread src/networking.c Outdated
Comment thread src/networking.c Outdated
@zuiderkwast zuiderkwast added the release-notes This issue should get a line item in the release notes label Dec 17, 2024

@ranshid ranshid left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the overall change looks O.K, but some cases are not 100% trivial IMO and the change impacts almost all logical flows. I would suggest we run the daily tests to verify this change.

Comment thread src/replication.c
Comment thread src/networking.c
@ranshid ranshid added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jan 6, 2025
Uri Yagelnik and others added 6 commits January 6, 2025 08:23
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: Uri Yagelnik <uriy@amazon.com>
@ranshid ranshid merged commit 6c09eea into valkey-io:unstable Jan 8, 2025
kronwerk pushed a commit to kronwerk/valkey that referenced this pull request Jan 27, 2025
…y-io#1405)

# Refactor client structure to use modular data components

## Current State
The client structure allocates memory for replication / pubsub /
multi-keys / module / blocked data for every client, despite these
features being used by only a small subset of clients. In addition the
current field layout in the client struct is suboptimal, with poor
alignment and unnecessary padding between fields, leading to a larger
than necessary memory footprint of 896 bytes per client. Furthermore,
fields that are frequently accessed together during operations are
scattered throughout the struct, resulting in poor cache locality.

## This PR's Change

1.  Lazy Initialization 
- **Components are only allocated when first used:**
  - PubSubData: Created on first SUBSCRIBE/PUBLISH operation
  - ReplicationData: Initialized only for replica connections
  - ModuleData: Allocated when module interaction begins
  - BlockingState: Created when first blocking command is issued
  - MultiState: Initialized on MULTI command

2. Memory Layout Optimization:
   - Grouped related fields for better locality
   - Moved rarely accessed fields (e.g., client->name) to struct end
   - Optimized field alignment to eliminate padding

3. Additional changes:
   - Moved watched_keys to be static allocated in the `mstate` struct
   - Relocated replication init logic to replication.c
  

### Key Benefits
- **Efficient Memory Usage:**
- 45% smaller base client structure - Basic clients now use 528 bytes
(down from 896).
- Better memory locality for related operations
- Performance improvement in high throughput scenarios. No performance
regressions in other cases.


### Performance Impact

Tested with 650 clients and 512 bytes values.

#### Single Thread Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 261,799      | 258,261      | +1.37%    |
| SET        | 3M keys | 209,134      | ~209,000     | ~0%       |
| GET        | 1 key   | 281,564      | 277,965      | +1.29%    |
| GET        | 3M keys | 231,158      | 228,410      | +1.20%    |

#### 8 IO Threads Performance
| Operation   | Dataset | New (ops/sec) | Old (ops/sec) | Change % |
|------------|---------|---------------|---------------|-----------|
| SET        | 1 key   | 1,331,578    | 1,331,626    | -0.00%    |
| SET        | 3M keys | 1,254,441    | 1,152,645    | +8.83%    |
| GET        | 1 key   | 1,293,149    | 1,289,503    | +0.28%    |
| GET        | 3M keys | 1,152,898    | 1,101,791    | +4.64%    |

#### Pipeline Performance (3M keys)
| Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % |
|-----------|--------------|---------------|---------------|-----------|
| SET       | 10          | 548,964      | 538,498      | +1.94%    |
| SET       | 20          | 606,148      | 594,872      | +1.89%    |
| SET       | 30          | 631,122      | 616,606      | +2.35%    |
| GET       | 10          | 628,482      | 624,166      | +0.69%    |
| GET       | 20          | 687,371      | 681,659      | +0.84%    |
| GET       | 30          | 725,855      | 721,102      | +0.66%    |

### Observations:
1. Single-threaded operations show consistent improvements (1-1.4%)
2. Multi-threaded performance shows significant gains for large
datasets:
   - SET with 3M keys: +8.83% improvement
   - GET with 3M keys: +4.64% improvement
3. Pipeline operations show consistent improvements:
   - SET operations: +1.89% to +2.35%
   - GET operations: +0.66% to +0.84%
4. No performance regressions observed in any test scenario


Related issue:valkey-io#761

---------

Signed-off-by: Uri Yagelnik <uriy@amazon.com>
Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 7, 2025
yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 7, 2025
yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 7, 2025
yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 10, 2025
This pr will fix valkey-io#1683

After valkey-io#1405, cause this
coredump.

Signed-off-by: bodong.ybd <bodong.ybd@alibaba-inc.com>
enjoy-binbin added a commit that referenced this pull request Feb 10, 2025
After #1405, `client trackinginfo` will crash when tracking is off
```
/lib64/libpthread.so.0(+0xf630)[0x7fab74931630]
./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db]
./src/valkey-server *:6379(call+0x5ba)[0x5a791a]
./src/valkey-server *:6379(processCommand+0x968)[0x5a8938]
./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d]
./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9]
./src/valkey-server *:6379[0x46fa4d]
./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9]
./src/valkey-server *:6379(main+0x4e1)[0x455821]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555]
./src/valkey-server *:6379[0x4564f2]
```

The reason is that we did not init pubsub_data by default, we only
init it when tracking on.

Fixes #1683.

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
…#1684)

After valkey-io#1405, `client trackinginfo` will crash when tracking is off
```
/lib64/libpthread.so.0(+0xf630)[0x7fab74931630]
./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db]
./src/valkey-server *:6379(call+0x5ba)[0x5a791a]
./src/valkey-server *:6379(processCommand+0x968)[0x5a8938]
./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d]
./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9]
./src/valkey-server *:6379[0x46fa4d]
./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9]
./src/valkey-server *:6379(main+0x4e1)[0x455821]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555]
./src/valkey-server *:6379[0x4564f2]
```

The reason is that we did not init pubsub_data by default, we only
init it when tracking on.

Fixes valkey-io#1683.

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
xbasel pushed a commit to xbasel/valkey that referenced this pull request Mar 27, 2025
…#1684)

After valkey-io#1405, `client trackinginfo` will crash when tracking is off
```
/lib64/libpthread.so.0(+0xf630)[0x7fab74931630]
./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db]
./src/valkey-server *:6379(call+0x5ba)[0x5a791a]
./src/valkey-server *:6379(processCommand+0x968)[0x5a8938]
./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d]
./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9]
./src/valkey-server *:6379[0x46fa4d]
./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9]
./src/valkey-server *:6379(main+0x4e1)[0x455821]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555]
./src/valkey-server *:6379[0x4564f2]
```

The reason is that we did not init pubsub_data by default, we only
init it when tracking on.

Fixes valkey-io#1683.

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
murphyjacob4 pushed a commit to enjoy-binbin/valkey that referenced this pull request Apr 13, 2025
…#1684)

After valkey-io#1405, `client trackinginfo` will crash when tracking is off
```
/lib64/libpthread.so.0(+0xf630)[0x7fab74931630]
./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db]
./src/valkey-server *:6379(call+0x5ba)[0x5a791a]
./src/valkey-server *:6379(processCommand+0x968)[0x5a8938]
./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d]
./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9]
./src/valkey-server *:6379[0x46fa4d]
./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9]
./src/valkey-server *:6379(main+0x4e1)[0x455821]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555]
./src/valkey-server *:6379[0x4564f2]
```

The reason is that we did not init pubsub_data by default, we only
init it when tracking on.

Fixes valkey-io#1683.

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>
enjoy-binbin added a commit to enjoy-binbin/valkey that referenced this pull request Mar 13, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in valkey-io#1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
@github-project-automation github-project-automation Bot moved this to To be backported in Valkey 8.1 Mar 13, 2026
@enjoy-binbin enjoy-binbin moved this from To be backported to Done in Valkey 8.1 Mar 13, 2026
enjoy-binbin added a commit that referenced this pull request Mar 17, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
JimB123 pushed a commit that referenced this pull request Mar 19, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request Apr 16, 2026
…#3359)

The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in valkey-io#1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
madolson pushed a commit that referenced this pull request Apr 27, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 7, 2026
…#3359)

The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in valkey-io#1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
sarthakaggarwal97 pushed a commit to sarthakaggarwal97/valkey that referenced this pull request May 7, 2026
…#3359)

The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in valkey-io#1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request May 18, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast pushed a commit that referenced this pull request Jun 2, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
zuiderkwast pushed a commit that referenced this pull request Jun 2, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 5, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
valkeyrie-ops Bot pushed a commit that referenced this pull request Jun 10, 2026
The multiStateMemOverhead() function was incorrectly calculating the
memory overhead for watched keys. It used sizeof(c->mstate->watched_keys)
which is the size of the list structure itself, instead of sizeof(watchedKey)
which is the actual per-key overhead.

This was introduced in #1405.

Signed-off-by: Binbin <binloveplay1314@qq.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes This issue should get a line item in the release notes run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP)

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants