client struct: lazy init components and optimize struct layout by uriyage · Pull Request #1405 · valkey-io/valkey

uriyage · 2024-12-08T05:30:16Z

Refactor client structure to use modular data components

Current State

The client structure allocates memory for replication / pubsub / multi-keys / module / blocked data for every client, despite these features being used by only a small subset of clients. In addition the current field layout in the client struct is suboptimal, with poor alignment and unnecessary padding between fields, leading to a larger than necessary memory footprint of 896 bytes per client. Furthermore, fields that are frequently accessed together during operations are scattered throughout the struct, resulting in poor cache locality.

This PR's Change

Lazy Initialization

Components are only allocated when first used:
- PubSubData: Created on first SUBSCRIBE/PUBLISH operation
- ReplicationData: Initialized only for replica connections
- ModuleData: Allocated when module interaction begins
- BlockingState: Created when first blocking command is issued
- MultiState: Initialized on MULTI command

Memory Layout Optimization:
- Grouped related fields for better locality
- Moved rarely accessed fields (e.g., client->name) to struct end
- Optimized field alignment to eliminate padding
Additional changes:
- Moved watched_keys to be static allocated in the mstate struct
- Relocated replication init logic to replication.c

Key Benefits

Efficient Memory Usage:
45% smaller base client structure - Basic clients now use 528 bytes (down from 896).
Better memory locality for related operations
Performance improvement in high throughput scenarios. No performance regressions in other cases.

Performance Impact

Tested with 650 clients and 512 bytes values.

Single Thread Performance

Operation	Dataset	New (ops/sec)	Old (ops/sec)	Change %
SET	1 key	261,799	258,261	+1.37%
SET	3M keys	209,134	~209,000	~0%
GET	1 key	281,564	277,965	+1.29%
GET	3M keys	231,158	228,410	+1.20%

8 IO Threads Performance

Operation	Dataset	New (ops/sec)	Old (ops/sec)	Change %
SET	1 key	1,331,578	1,331,626	-0.00%
SET	3M keys	1,254,441	1,152,645	+8.83%
GET	1 key	1,293,149	1,289,503	+0.28%
GET	3M keys	1,152,898	1,101,791	+4.64%

Pipeline Performance (3M keys)

Operation	Pipeline Size	New (ops/sec)	Old (ops/sec)	Change %
SET	10	548,964	538,498	+1.94%
SET	20	606,148	594,872	+1.89%
SET	30	631,122	616,606	+2.35%
GET	10	628,482	624,166	+0.69%
GET	20	687,371	681,659	+0.84%
GET	30	725,855	721,102	+0.66%

Observations:

Single-threaded operations show consistent improvements (1-1.4%)
Multi-threaded performance shows significant gains for large datasets:
- SET with 3M keys: +8.83% improvement
- GET with 3M keys: +4.64% improvement
Pipeline operations show consistent improvements:
- SET operations: +1.89% to +2.35%
- GET operations: +0.66% to +0.84%
No performance regressions observed in any test scenario

Related issue:#761

codecov · 2024-12-08T05:45:42Z

Codecov Report

Attention: Patch coverage is 88.08594% with 61 lines in your changes missing coverage. Please review.

Project coverage is 70.79%. Comparing base (f20d629) to head (ff1b620).
Report is 30 commits behind head on unstable.

Files with missing lines	Patch %	Lines
src/module.c	4.08%	47 Missing ⚠️
src/replication.c	95.04%	10 Missing ⚠️
src/rdb.c	50.00%	2 Missing ⚠️
src/blocked.c	98.38%	1 Missing ⚠️
src/networking.c	98.36%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1405      +/-   ##
============================================
- Coverage     70.81%   70.79%   -0.02%     
============================================
  Files           118      119       +1     
  Lines         63561    64728    +1167     
============================================
+ Hits          45010    45826     +816     
- Misses        18551    18902     +351

Files with missing lines	Coverage Δ
src/acl.c	`88.77% <ø> (-0.06%)`	⬇️
src/aof.c	`80.20% <100.00%> (+0.07%)`	⬆️
src/cluster.c	`88.65% <100.00%> (+0.14%)`	⬆️
src/cluster_legacy.c	`86.76% <ø> (+0.06%)`	⬆️
src/multi.c	`97.30% <100.00%> (+0.87%)`	⬆️
src/pubsub.c	`96.82% <100.00%> (-0.23%)`	⬇️
src/script.c	`89.06% <100.00%> (+1.41%)`	⬆️
src/server.c	`87.45% <100.00%> (+0.06%)`	⬆️
src/server.h	`100.00% <ø> (ø)`
src/timeout.c	`90.00% <100.00%> (ø)`
... and 6 more

... and 46 files with indirect coverage changes

zuiderkwast · 2024-12-09T10:29:22Z

45% smaller base client structure - Basic clients now use 528 bytes (down from 896).

Amazing!

zuiderkwast

Very good! It's a large diff but most of the changed lines are trivial changes. I can't verify everything so I have to trust you and the tests for the correctness.

I noticed in many helper functions, we're using these client sub-structs, for example c->mstate->something, without checking that c->mstate exists. It should always exist in these cases so it's not an error, but would it make sense to add serverAssert(c->mstate != NULL) in all these places?

If we have code coverage by tests, we will find these errors anyway, so IMHO it doens't matter too match if it's a null-pointer crash or an assert. What is your reasoning about this?

zuiderkwast

Just a few nits now.

ranshid

I think the overall change looks O.K, but some cases are not 100% trivial IMO and the change impacts almost all logical flows. I would suggest we run the daily tests to verify this change.

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

…y-io#1405) # Refactor client structure to use modular data components ## Current State The client structure allocates memory for replication / pubsub / multi-keys / module / blocked data for every client, despite these features being used by only a small subset of clients. In addition the current field layout in the client struct is suboptimal, with poor alignment and unnecessary padding between fields, leading to a larger than necessary memory footprint of 896 bytes per client. Furthermore, fields that are frequently accessed together during operations are scattered throughout the struct, resulting in poor cache locality. ## This PR's Change 1. Lazy Initialization - **Components are only allocated when first used:** - PubSubData: Created on first SUBSCRIBE/PUBLISH operation - ReplicationData: Initialized only for replica connections - ModuleData: Allocated when module interaction begins - BlockingState: Created when first blocking command is issued - MultiState: Initialized on MULTI command 2. Memory Layout Optimization: - Grouped related fields for better locality - Moved rarely accessed fields (e.g., client->name) to struct end - Optimized field alignment to eliminate padding 3. Additional changes: - Moved watched_keys to be static allocated in the `mstate` struct - Relocated replication init logic to replication.c ### Key Benefits - **Efficient Memory Usage:** - 45% smaller base client structure - Basic clients now use 528 bytes (down from 896). - Better memory locality for related operations - Performance improvement in high throughput scenarios. No performance regressions in other cases. ### Performance Impact Tested with 650 clients and 512 bytes values. #### Single Thread Performance | Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % | |------------|---------|---------------|---------------|-----------| | SET | 1 key | 261,799 | 258,261 | +1.37% | | SET | 3M keys | 209,134 | ~209,000 | ~0% | | GET | 1 key | 281,564 | 277,965 | +1.29% | | GET | 3M keys | 231,158 | 228,410 | +1.20% | #### 8 IO Threads Performance | Operation | Dataset | New (ops/sec) | Old (ops/sec) | Change % | |------------|---------|---------------|---------------|-----------| | SET | 1 key | 1,331,578 | 1,331,626 | -0.00% | | SET | 3M keys | 1,254,441 | 1,152,645 | +8.83% | | GET | 1 key | 1,293,149 | 1,289,503 | +0.28% | | GET | 3M keys | 1,152,898 | 1,101,791 | +4.64% | #### Pipeline Performance (3M keys) | Operation | Pipeline Size | New (ops/sec) | Old (ops/sec) | Change % | |-----------|--------------|---------------|---------------|-----------| | SET | 10 | 548,964 | 538,498 | +1.94% | | SET | 20 | 606,148 | 594,872 | +1.89% | | SET | 30 | 631,122 | 616,606 | +2.35% | | GET | 10 | 628,482 | 624,166 | +0.69% | | GET | 20 | 687,371 | 681,659 | +0.84% | | GET | 30 | 725,855 | 721,102 | +0.66% | ### Observations: 1. Single-threaded operations show consistent improvements (1-1.4%) 2. Multi-threaded performance shows significant gains for large datasets: - SET with 3M keys: +8.83% improvement - GET with 3M keys: +4.64% improvement 3. Pipeline operations show consistent improvements: - SET operations: +1.89% to +2.35% - GET operations: +0.66% to +0.84% 4. No performance regressions observed in any test scenario Related issue:valkey-io#761 --------- Signed-off-by: Uri Yagelnik <uriy@amazon.com> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>

This pr will fix valkey-io#1683 After valkey-io#1405, cause this coredump.

This pr will fix valkey-io#1683 After valkey-io#1405, cause this coredump. Signed-off-by: bodong.ybd <bodong.ybd@alibaba-inc.com>

After #1405, `client trackinginfo` will crash when tracking is off ``` /lib64/libpthread.so.0(+0xf630)[0x7fab74931630] ./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db] ./src/valkey-server *:6379(call+0x5ba)[0x5a791a] ./src/valkey-server *:6379(processCommand+0x968)[0x5a8938] ./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d] ./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9] ./src/valkey-server *:6379[0x46fa4d] ./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9] ./src/valkey-server *:6379(main+0x4e1)[0x455821] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555] ./src/valkey-server *:6379[0x4564f2] ``` The reason is that we did not init pubsub_data by default, we only init it when tracking on. Fixes #1683. Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>

…#1684) After valkey-io#1405, `client trackinginfo` will crash when tracking is off ``` /lib64/libpthread.so.0(+0xf630)[0x7fab74931630] ./src/valkey-server *:6379(clientTrackingInfoCommand+0x12b)[0x57f8db] ./src/valkey-server *:6379(call+0x5ba)[0x5a791a] ./src/valkey-server *:6379(processCommand+0x968)[0x5a8938] ./src/valkey-server *:6379(processInputBuffer+0x18d)[0x58381d] ./src/valkey-server *:6379(readQueryFromClient+0x59)[0x585ea9] ./src/valkey-server *:6379[0x46fa4d] ./src/valkey-server *:6379(aeMain+0x89)[0x5bf3e9] ./src/valkey-server *:6379(main+0x4e1)[0x455821] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7fab74576555] ./src/valkey-server *:6379[0x4564f2] ``` The reason is that we did not init pubsub_data by default, we only init it when tracking on. Fixes valkey-io#1683. Co-authored-by: Binbin <binloveplay1314@qq.com> Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Co-authored-by: ranshid <88133677+ranshid@users.noreply.github.com>

The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in valkey-io#1405. Signed-off-by: Binbin <binloveplay1314@qq.com>

The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in #1405. Signed-off-by: Binbin <binloveplay1314@qq.com>

…#3359) The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in valkey-io#1405. Signed-off-by: Binbin <binloveplay1314@qq.com>

The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in #1405. Signed-off-by: Binbin <binloveplay1314@qq.com>

…#3359) The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in valkey-io#1405. Signed-off-by: Binbin <binloveplay1314@qq.com>

The multiStateMemOverhead() function was incorrectly calculating the memory overhead for watched keys. It used sizeof(c->mstate->watched_keys) which is the size of the list structure itself, instead of sizeof(watchedKey) which is the actual per-key overhead. This was introduced in #1405. Signed-off-by: Binbin <binloveplay1314@qq.com>

uriyage mentioned this pull request Dec 9, 2024

Followup items from #758 Async IO threads #761

Open

10 tasks

zuiderkwast self-requested a review December 9, 2024 10:29

zuiderkwast reviewed Dec 12, 2024

View reviewed changes

Comment thread src/module.c Outdated

Comment thread src/module.c Outdated

Comment thread src/multi.c Outdated

Comment thread src/server.h Outdated

Comment thread src/multi.c

zuiderkwast reviewed Dec 15, 2024

View reviewed changes

Comment thread src/networking.c Outdated

Comment thread src/networking.c Outdated

Comment thread src/networking.c Outdated

Comment thread src/networking.c Outdated

uriyage force-pushed the client-optimize branch from b84c52d to b4755cb Compare December 16, 2024 08:20

zuiderkwast approved these changes Dec 17, 2024

View reviewed changes

zuiderkwast added the release-notes This issue should get a line item in the release notes label Dec 17, 2024

ranshid reviewed Jan 6, 2025

View reviewed changes

Comment thread src/replication.c

Comment thread src/networking.c

ranshid added the run-extra-tests Run extra tests on this PR (Runs all tests from daily except valgrind and RESP) label Jan 6, 2025

Uri Yagelnik and others added 6 commits January 6, 2025 08:23

client struct: lazy init components and optimize struct layout

f410876

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

Addressed PR comments

1b5065e

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

Update src/networking.c

6b76216

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>

Update src/networking.c

6cecf30

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>

Update src/networking.c

3e980ca

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: uriyage <78144248+uriyage@users.noreply.github.com>

Update src/networking.c

d4ebb91

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

uriyage force-pushed the client-optimize branch from b4755cb to d4ebb91 Compare January 6, 2025 08:36

Fix merge conflict

765a486

Signed-off-by: Uri Yagelnik <uriy@amazon.com>

ranshid approved these changes Jan 8, 2025

View reviewed changes

ranshid merged commit 6c09eea into valkey-io:unstable Jan 8, 2025

zuiderkwast mentioned this pull request Jan 8, 2025

Optimize memory usage per client #698

Closed

yangbodong22011 mentioned this pull request Feb 7, 2025

Fix client trackinginfo crash when tracking off by default #1684

Merged

yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 7, 2025

Fix client trackinginfo coredump when tacking off by default

624c113

This pr will fix valkey-io#1683 After valkey-io#1405, cause this coredump.

yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 7, 2025

Fix client trackinginfo coredump when tacking off by default

fd7e6fd

This pr will fix valkey-io#1683 After valkey-io#1405, cause this coredump.

yangbodong22011 added a commit to yangbodong22011/valkey that referenced this pull request Feb 7, 2025

Fix client trackinginfo coredump when tacking off by default

4087b28

This pr will fix valkey-io#1683 After valkey-io#1405, cause this coredump.

zuiderkwast mentioned this pull request Apr 4, 2025

Fix engine crash on module client blocking during keyspace events #1819

Merged

enjoy-binbin mentioned this pull request Mar 13, 2026

Fix incorrect memory overhead calculation for watched keys #3359

Merged

enjoy-binbin added this to Valkey 8.1 Mar 13, 2026

github-project-automation Bot moved this to To be backported in Valkey 8.1 Mar 13, 2026

enjoy-binbin moved this from To be backported to Done in Valkey 8.1 Mar 13, 2026

YangboLong mentioned this pull request May 4, 2026

client struct optimization and lazy components allocation redis/redis#15161

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

client struct: lazy init components and optimize struct layout#1405

client struct: lazy init components and optimize struct layout#1405
ranshid merged 7 commits into
valkey-io:unstablefrom
uriyage:client-optimize

uriyage commented Dec 8, 2024 •

edited

Loading

Uh oh!

codecov Bot commented Dec 8, 2024 •

edited

Loading

Uh oh!

zuiderkwast commented Dec 9, 2024

Uh oh!

zuiderkwast left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zuiderkwast left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ranshid left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

uriyage commented Dec 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Refactor client structure to use modular data components

Current State

This PR's Change

Key Benefits

Performance Impact

Single Thread Performance

8 IO Threads Performance

Pipeline Performance (3M keys)

Observations:

Uh oh!

codecov Bot commented Dec 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

zuiderkwast commented Dec 9, 2024

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zuiderkwast left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

ranshid left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

uriyage commented Dec 8, 2024 •

edited

Loading

codecov Bot commented Dec 8, 2024 •

edited

Loading