Skip to content

Conversation

@cbi42
Copy link
Contributor

@cbi42 cbi42 commented Jan 16, 2025

Summary: CompactRange() currently picks input files until compaction just exceeds max_compaction_bytes. This can cause an overly large compaction in some cases. For example, consider the following example. If the size of L6 files are large, picking an additional L5 file F3 (after picking F2) can cause the compaction to be too big.

L5 F1[1,2] F2[3,4]                                        F3[1000, 1001]
L6                  [5,8][9,12]...[998,999]

This PR updates the file picking logic to try to keep compaction size under max_compaction_bytes.

Test plan: a new unit test to test the above example

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@cbi42 cbi42 requested review from archang19 and jaykorean January 16, 2025 22:02
Copy link
Contributor

@archang19 archang19 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my concern about not including overlap from levels between the input and output levels is not valid, then this PR looks good to me. If it is valid, then we can update this PR and add another test for that scenario

@cbi42 cbi42 force-pushed the compact-range-max-compaction-bytes branch from cf2dd97 to 8d3aa39 Compare January 17, 2025 00:58
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@jaykorean jaykorean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a couple of nit comments. I think we'd want a release note this for this behavior change. Thanks!

output_level = vstorage->base_level();
assert(output_level > 0);
}
for (int i = input_level + 1; i < output_level; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this loop won't do anything in prod build. wrap it under #ifndef NDEBUG? Or is the compiler smart enough not to include this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it should be optimized away.

ASSERT_EQ(5U, compaction->input(1, 0)->fd.GetNumber());
}

TEST_F(CompactionPickerTest, CompactRangeMaxCompactionBytes) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be a follow up: It'd be nice to include couple of corner cases in the test where

  1. First input file where input file size > max_compaction_bytes
  2. First input file where input file size < max_compaction_bytes but including the output from the first input > max_compaction_bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In both these two cases, compaction will have to be larger than max_compaction_bytes. The other unit test in db_compaction_test.cc should cover the case where max_compaction_bytes is smaller than an input file.

smallest = &inputs[i]->smallest;
}
for (size_t i = 1; i < inputs.size(); ++i) {
// Consider whether to include inputs[i]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "Check whether to include inputs[i] by checking ..." (just to be more clear)

@cbi42 cbi42 force-pushed the compact-range-max-compaction-bytes branch from ecc0670 to b2fde71 Compare January 17, 2025 18:02
@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@cbi42
Copy link
Contributor Author

cbi42 commented Jan 17, 2025

LGTM. Just a couple of nit comments. I think we'd want a release note this for this behavior change. Thanks!

Added a release note.

@cbi42
Copy link
Contributor Author

cbi42 commented Jan 17, 2025

Thanks @archang19 @jaykorean for the quick review.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@cbi42 merged this pull request in 0013aca.

ybtsdst pushed a commit to ybtsdst/rocksdb that referenced this pull request Apr 27, 2025
…cebook#13306)

Summary:
CompactRange() currently [picks](https://github.com/facebook/rocksdb/blob/6e97a813dc8c4b574fa0743df3099b09e87af7e0/db/compaction/compaction_picker.cc?fbclid=IwZXh0bgNhZW0CMTEAAR08agyVwgQW-Tq5XApF52y9gw5UzmfIn3cEG44yvClFRIsTDH7zykfcb9Q_aem_23mjVBO1jSxQZ4_M4UyntA#L747-L753) input files until compaction just exceeds `max_compaction_bytes`. This can cause an overly large compaction in some cases. For example, consider the following example. If the size of L6 files are large, picking an additional L5 file F3 (after picking F2) can cause the compaction to be too big.
```
L5 F1[1,2] F2[3,4]                                        F3[1000, 1001]
L6                  [5,8][9,12]...[998,999]
```
This PR updates the file picking logic to try to keep compaction size under `max_compaction_bytes`.

Pull Request resolved: facebook#13306

Test Plan: a new unit test to test the above example

Reviewed By: jaykorean, archang19

Differential Revision: D68290846

Pulled By: cbi42

fbshipit-source-id: ffb4647002b47e5a92dd0a06afd4b4a4fbf94b7a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants