Skip to content

Fix string embedding decision with keys#2087

Merged
ranshid merged 1 commit into
valkey-io:unstablefrom
poiuj:fix-embedding
May 15, 2025
Merged

Fix string embedding decision with keys#2087
ranshid merged 1 commit into
valkey-io:unstablefrom
poiuj:fix-embedding

Conversation

@poiuj

@poiuj poiuj commented May 14, 2025

Copy link
Copy Markdown
Contributor

This commit fixes the size calculation for embedded strings with keys in createStringObjectWithKeyAndExpire(). The previous implementation used a fixed size of 3 bytes for the key's header, which was incorrect. The new implementation properly calculates the required size based on the key's length and SDS type.

Added a unit test (test_embedded_string_with_key) to verify that:

  1. Short values with long keys are properly encoded as embedded strings
  2. Values that would cause the total size to exceed 64 bytes are encoded as raw

@zuiderkwast zuiderkwast left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, you're right!

So what's the impact of this bug? We sometimes use embedded strings when the total allocation size needed is 65 or 66?

@poiuj

poiuj commented May 14, 2025

Copy link
Copy Markdown
Contributor Author

That's right. And when we need 65 or 66 bytes, we actually get 80 bytes as this is the next jemalloc bucket.

@ranshid ranshid left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Vadym!

The change LGTM. added some minor addition which IMO makes the code more clear.

Comment thread src/object.c Outdated
size += sdsReqSize(key_len, sdsReqType(key_len)) + 1; /* 1 byte for prefixed sds hdr size */
}
size += (expire != -1) * sizeof(long long);
size += 4 + len; /* embstr header (3) + nullterm (1) */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although no problem here, I would love to see this part changed as well:
size += sdsReqSize(len, SDS_TYPE_8)

Suggested change
size += 4 + len; /* embstr header (3) + nullterm (1) */
size += sdsReqSize(len, SDS_TYPE_8); /* embstr header (3) + nullterm (1) */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, and in that case, we don't need the comment on the right.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Let met update the PR.

This commit fixes the size calculation for embedded strings with keys in
createStringObjectWithKeyAndExpire(). The previous implementation used a
fixed size of 3 bytes for the key's header, which was incorrect. The new
implementation properly calculates the required size based on the key's
length and SDS type.

Added a unit test (test_embedded_string_with_key) to verify that:
1. Short values with long keys are properly encoded as embedded strings
2. Values that would cause the total size to exceed 64 bytes are encoded as raw

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
@ranshid

ranshid commented May 15, 2025

Copy link
Copy Markdown
Member

@zuiderkwast I think this should be backported right?

@zuiderkwast

zuiderkwast commented May 15, 2025

Copy link
Copy Markdown
Contributor

@zuiderkwast I think this should be backported right?

We can if you want, but I don't think it's very important. We save 8 bytes in very few specific cases. It's not a regression because 8.1 saves memory anyway compared too earlier versions.

Only when key >= 32, value < 7 and key + value < 48 (or 40 if there is an expire), we save 8 bytes here. In this case we get a 64 bytes allocation jemalloc bucket for the robj and 8 bytes for the value.

When the value is >= 7, we need a 16 bytes bucket for it, so the sum is 80 bytes again.

@codecov

codecov Bot commented May 15, 2025

Copy link
Copy Markdown

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.22%. Comparing base (9e87473) to head (1f818f7).
Report is 17 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #2087      +/-   ##
============================================
+ Coverage     71.05%   71.22%   +0.17%     
============================================
  Files           122      122              
  Lines         66143    66024     -119     
============================================
+ Hits          46997    47025      +28     
+ Misses        19146    18999     -147     
Files with missing lines Coverage Δ
src/object.c 81.35% <100.00%> (-0.17%) ⬇️

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ranshid ranshid merged commit 11a2b6c into valkey-io:unstable May 15, 2025
Comment thread src/unit/test_object.c
TEST_ASSERT(embstr_obj->encoding == OBJ_ENCODING_EMBSTR);

robj *raw_obj = objectSetKeyAndExpire(longer_val_obj, key, -1);
TEST_ASSERT(raw_obj->encoding == OBJ_ENCODING_RAW);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A pointer is 32 bits so the robj struct is smaller.

@zuiderkwast

zuiderkwast commented May 22, 2025

Copy link
Copy Markdown
Contributor

Bump. This PR broke the daily build. I'm willing to revert this PR if it isn't fixed. Sorry that was rude. If you're not able to fix it, someone else will.

@zuiderkwast

Copy link
Copy Markdown
Contributor

I have a fix here:

@poiuj

poiuj commented May 25, 2025

Copy link
Copy Markdown
Contributor Author

@zuiderkwast thanks for the fix!
Sorry, didn't get the notification.

shanwan1 pushed a commit to shanwan1/valkey that referenced this pull request Jun 13, 2025
This commit fixes the size calculation for embedded strings with keys in
createStringObjectWithKeyAndExpire(). The previous implementation used a
fixed size of 3 bytes for the key's header, which was incorrect. The new
implementation properly calculates the required size based on the key's
length and SDS type.

Added a unit test (test_embedded_string_with_key) to verify that:
1. Short values with long keys are properly encoded as embedded strings
2. Values that would cause the total size to exceed 64 bytes are encoded
as raw

Signed-off-by: Vadym Khoptynets <vadymkh@amazon.com>
Signed-off-by: shanwan1 <shanwan1@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants