
implement the base library for compacting logs #17632

Merged: ti-chi-bot[bot] merged 7 commits into tikv:master from YuJuncen:compact-log-backup-core on Nov 18, 2024

Contributor

@YuJuncen YuJuncen commented Oct 11, 2024

What is changed and how it works?

Issue Number: Close #17631

What's Changed:

Added a new crate named `compact-log-backup`. It merges log files generated by log backup and rewrites them as SSTs.

The directory hierarchy:

./components/compact-log-backup/
├── Cargo.toml
└── src
    ├── compaction ← Everything about compacting logs: the metadata and the logic for executing a compaction.
    │   ├── collector.rs ← Given some files, which compactions will we make?
    │   ├── exec.rs ← How do we read and sort log entries and turn them into SSTs?
    │   ├── meta.rs ← How to describe a compaction?
    │   └── mod.rs
    ├── errors.rs ← Error definitions.
    ├── exec_hooks ← Common hooks.
    │   ├── checkpoint.rs ← Skip compactions that are already done.
    │   ├── consistency.rs ← Lock the storage to keep it consistent.
    │   ├── mod.rs
    │   ├── observability.rs ← Expose metrics and print logs.
    │   └── save_meta.rs ← Save the compaction metadata to the external storage.
    ├── execute
    │   ├── hooking.rs ← Which hooks do we have?
    │   ├── mod.rs ← The controller of a compaction over time.
    │   └── test.rs
    ├── lib.rs
    ├── source.rs ← Data source, for now only logs are supported.
    ├── statistic.rs 
    ├── storage.rs ← Helpers for reading the backup storage.
    ├── test_util.rs
    └── util.rs
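
The layout above separates the compaction controller (`execute`) from pluggable hooks (`exec_hooks`). As a rough illustration of how such a split can work, here is a minimal sketch; the trait, its method names, and the `Both`, `Recorder`, and `SkipDone` types are all hypothetical and are not the crate's actual API.

```rust
/// Hypothetical hook interface: every method has a no-op default, so a
/// hook only overrides the events it cares about.
pub trait Hooks {
    /// Called before a subcompaction starts; returning `false` skips it.
    fn before_subcompaction(&mut self, _name: &str) -> bool {
        true
    }

    /// Called after a subcompaction finishes.
    fn after_subcompaction(&mut self, _name: &str) {}
}

/// Runs two hooks in order, so hooks compose into a chain.
pub struct Both<A, B>(pub A, pub B);

impl<A: Hooks, B: Hooks> Hooks for Both<A, B> {
    fn before_subcompaction(&mut self, name: &str) -> bool {
        // Both hooks are always consulted; the subcompaction runs only
        // if neither of them asks to skip it.
        let a = self.0.before_subcompaction(name);
        let b = self.1.before_subcompaction(name);
        a && b
    }

    fn after_subcompaction(&mut self, name: &str) {
        self.0.after_subcompaction(name);
        self.1.after_subcompaction(name);
    }
}

/// Hypothetical observability-like hook: records what has finished.
#[derive(Default)]
pub struct Recorder {
    pub finished: Vec<String>,
}

impl Hooks for Recorder {
    fn after_subcompaction(&mut self, name: &str) {
        self.finished.push(name.to_string());
    }
}

/// Hypothetical checkpoint-like hook: skips names already marked as done.
pub struct SkipDone {
    pub done: Vec<String>,
}

impl Hooks for SkipDone {
    fn before_subcompaction(&mut self, name: &str) -> bool {
        !self.done.iter().any(|d| d == name)
    }
}

/// Drives the names through the hooks and returns the ones actually run.
pub fn run_all<H: Hooks>(names: &[&str], hooks: &mut H) -> Vec<String> {
    let mut ran = Vec::new();
    for name in names {
        if hooks.before_subcompaction(name) {
            ran.push(name.to_string());
            hooks.after_subcompaction(name);
        }
    }
    ran
}
```

With `Both(SkipDone { .. }, Recorder::default())`, the controller in `run_all` stays unaware of which hooks are installed; it only asks whether to proceed and reports completion.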

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Release note

None

Signed-off-by: hillium <yujuncen@pingcap.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Oct 11, 2024
Signed-off-by: hillium <yujuncen@pingcap.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
Signed-off-by: hillium <yujuncen@pingcap.com>
};

#[derive(Clone)]
struct CompactionSpy(Sender<SubcompactionResult>);
Contributor

Should we move `CompactionSpy` into `mod test`, since it is only used in tests?

Contributor Author

I believe the file itself is already in `mod test`. Do you mean the `test_util` mod?

self.items.drain().map(|(key, c)| {
// Hacking: update the statistic when we really yield the compaction.
// (At `poll_next`.)
c.form(&key, &self.cfg)
Contributor

We need to update `self.stat` here, too.

Contributor Author

The stat is updated in `poll_next`, so it only changes when the user actually reads the compaction from the stream.
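
To make that reply concrete, here is a minimal sketch of the same idea using a plain `Iterator` rather than the stream and `poll_next` from the PR. `Collected`, `Stat`, and `LazyStatIter` are hypothetical names chosen for illustration only.

```rust
/// Hypothetical stand-in for a compaction produced by the collector.
#[derive(Debug, Clone, PartialEq)]
pub struct Collected {
    pub name: String,
    pub size: u64,
}

/// Hypothetical statistics, counted only for items that were really yielded.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Stat {
    pub emitted: u64,
    pub bytes: u64,
}

/// Wraps an inner iterator and updates `stat` inside `next`, which plays
/// the role that `poll_next` plays for a stream.
pub struct LazyStatIter<I> {
    inner: I,
    pub stat: Stat,
}

impl<I> LazyStatIter<I> {
    pub fn new(inner: I) -> Self {
        Self {
            inner,
            stat: Stat::default(),
        }
    }
}

impl<I: Iterator<Item = Collected>> Iterator for LazyStatIter<I> {
    type Item = Collected;

    fn next(&mut self) -> Option<Collected> {
        let item = self.inner.next()?;
        // The statistic changes here, at yield time, and not when the
        // item was prepared upstream (for example in a `map` closure).
        self.stat.emitted += 1;
        self.stat.bytes += item.size;
        Some(item)
    }
}
```

Items that are prepared but never consumed leave `stat` untouched, which matches the behaviour described in the reply.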


fn before_a_subcompaction_start(&mut self, _cid: CId, cx: SubcompactionStartCtx<'_>) {
let hash = cx.subc.crc64();
if self.loaded.contains(&hash) {
Contributor

What if two subcompactions have the same `crc64`?

Contributor Author

Then the second one will not be executed. Fortunately, its input will probably be preserved, because the final subcompaction won't be written.
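
The skip-by-digest behaviour discussed here can be sketched as follows. This is an illustration under stated assumptions: `Subcompaction`, `Checkpoint`, `mark_done`, and `should_skip` are hypothetical names, and `DefaultHasher` from the standard library stands in for the `crc64()` call only to avoid external crates. It is not CRC-64.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical description of a subcompaction: just its input file names.
#[derive(Hash)]
pub struct Subcompaction {
    pub inputs: Vec<String>,
}

impl Subcompaction {
    /// Stand-in for the `crc64()` call in the PR. `DefaultHasher` is used
    /// only to keep the sketch dependency-free; it is not CRC-64.
    pub fn digest(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.hash(&mut h);
        h.finish()
    }
}

/// Hypothetical checkpoint: digests of subcompactions already finished.
#[derive(Default)]
pub struct Checkpoint {
    loaded: HashSet<u64>,
}

impl Checkpoint {
    pub fn mark_done(&mut self, subc: &Subcompaction) {
        self.loaded.insert(subc.digest());
    }

    /// True when the digest is already known. A distinct subcompaction
    /// whose digest collides with a finished one is skipped as well,
    /// which is the case raised in the review.
    pub fn should_skip(&self, subc: &Subcompaction) -> bool {
        self.loaded.contains(&subc.digest())
    }
}
```

Because only the 64-bit digest is stored, the check cannot tell a repeated subcompaction from a colliding one; the trade-off is a small, cheap checkpoint in exchange for a very unlikely false skip.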

let key = *o.key();
let u = o.get_mut();
u.add_file(file);
if u.size > self.cfg.subcompaction_size_threshold {
Contributor

Could `self.items` grow large enough to cause an OOM if each entry of `self.items` is small?

Contributor Author

Perhaps. We may add something like a memory quota in the future, but for now we cannot do better.
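
The concern in this thread is easier to see with a small model of the collector. The sketch below is an assumption-laden simplification: `LogFile`, `Pending`, `Collector`, and the method names are hypothetical, and the grouping key is reduced to a bare `u64`.

```rust
use std::collections::HashMap;

/// Hypothetical input file: a grouping key (e.g. a region id) and a size.
#[derive(Debug, Clone, PartialEq)]
pub struct LogFile {
    pub key: u64,
    pub size: u64,
}

/// Hypothetical pending group of files that share a key.
#[derive(Debug, Default, PartialEq)]
pub struct Pending {
    pub files: Vec<LogFile>,
    pub size: u64,
}

pub struct Collector {
    threshold: u64,
    items: HashMap<u64, Pending>,
}

impl Collector {
    pub fn new(threshold: u64) -> Self {
        Self {
            threshold,
            items: HashMap::new(),
        }
    }

    /// Adds a file to its group. Returns the group once its accumulated
    /// size exceeds the threshold; otherwise the group stays in memory.
    pub fn add(&mut self, file: LogFile) -> Option<Pending> {
        let key = file.key;
        let group = self.items.entry(key).or_default();
        group.size += file.size;
        group.files.push(file);
        if group.size > self.threshold {
            return self.items.remove(&key);
        }
        None
    }

    /// Number of groups still buffered; this is what grows when many
    /// keys each receive only small files.
    pub fn pending_groups(&self) -> usize {
        self.items.len()
    }

    /// Flushes every remaining group, similar to the `drain()` call
    /// quoted earlier in the review.
    pub fn finish(&mut self) -> Vec<Pending> {
        self.items.drain().map(|(_, p)| p).collect()
    }
}
```

A group leaves the map only when it crosses the threshold or when `finish` runs, so the number of buffered groups is bounded by the number of distinct keys, not by total size. A memory quota would need an extra eviction rule on top of this.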

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Oct 21, 2024
Signed-off-by: hillium <yujuncen@pingcap.com>
Contributor

@3pointer 3pointer left a comment

rest LGTM


/// Finishing one tiny task. This will yield the current carrier thread
/// when needed.
pub fn step(&mut self) -> Step {
Contributor

Can we use a wrapper around `tokio::task::yield_now().await` here?
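
For context, a step counter that periodically gives up the thread can be sketched with the standard library alone. Everything below is hypothetical (`StepOutcome`, `CoopBudget`, `sum_cooperatively`) and does not reproduce the `Step` type from the PR; in async code the yield point would be an awaited `tokio::task::yield_now()` instead of `std::thread::yield_now()`.

```rust
/// Hypothetical outcome of one unit of work.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum StepOutcome {
    /// Keep going on the current thread.
    Continue,
    /// The budget is used up; the caller should yield before continuing.
    ShouldYield,
}

/// Counts units of work and asks for a yield every `every` steps.
pub struct CoopBudget {
    every: u32,
    done: u32,
}

impl CoopBudget {
    pub fn new(every: u32) -> Self {
        assert!(every > 0, "budget must be positive");
        Self { every, done: 0 }
    }

    /// Records one finished unit of work.
    pub fn step(&mut self) -> StepOutcome {
        self.done += 1;
        if self.done >= self.every {
            self.done = 0;
            StepOutcome::ShouldYield
        } else {
            StepOutcome::Continue
        }
    }
}

/// Blocking-thread variant for illustration: sums the values and calls
/// `std::thread::yield_now()` whenever the budget runs out. Returns the
/// sum and the number of times the thread was yielded.
pub fn sum_cooperatively(values: &[u64], budget: &mut CoopBudget) -> (u64, u32) {
    let mut sum = 0;
    let mut yields = 0;
    for v in values {
        sum += v;
        if budget.step() == StepOutcome::ShouldYield {
            std::thread::yield_now();
            yields += 1;
        }
    }
    (sum, yields)
}
```

Keeping the counter separate from the yield call means the same budget logic can back either a synchronous loop or an async wrapper.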

@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Nov 18, 2024
Contributor

ti-chi-bot bot commented Nov 18, 2024

[LGTM Timeline notifier]

Timeline:

  • 2024-10-21 05:43:37.192398691 +0000 UTC m=+243417.889189295: ☑️ agreed by Leavrth.
  • 2024-11-18 03:15:37.603555487 +0000 UTC m=+844499.794424483: ☑️ agreed by 3pointer.

Contributor Author

/hold

@ti-chi-bot ti-chi-bot bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2024
Signed-off-by: hillium <yujuncen@pingcap.com>
Contributor Author

/unhold

@ti-chi-bot ti-chi-bot bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 18, 2024
Contributor

ti-chi-bot bot commented Nov 18, 2024

@YuJuncen: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

Contributor

ti-chi-bot bot commented Nov 18, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3pointer, iosmanthus, Leavrth

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the approved label Nov 18, 2024
@ti-chi-bot ti-chi-bot bot merged commit 9eb3eda into tikv:master Nov 18, 2024
@ti-chi-bot ti-chi-bot bot added this to the Pool milestone Nov 18, 2024
YuJuncen added a commit to YuJuncen/tikv that referenced this pull request Dec 4, 2025
close tikv#17631

Added a new crate named `compact-log-backup`. Now it can merge some log files generated by log backup and make them become SSTs.

Signed-off-by: hillium <yujuncen@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
YuJuncen added a commit that referenced this pull request Dec 4, 2025
* br: batch download and merge download sst before ingest (#19062)

close #19086

Add a new RPC method called batch-download to download SSTs in batches.

Signed-off-by: RidRisR <79858083+RidRisR@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* fix build

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* make format

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* implement the base library for compacting logs (#17632)

close #17631

Added a new crate named `compact-log-backup`. Now it can merge some log files generated by log backup and make them become SSTs.

Signed-off-by: hillium <yujuncen@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* added `compact-log-backup` to `tikv-ctl` (#17845)

close #17844

Signed-off-by: hillium <yujuncen@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: record `min_input_ts` and `max_input_ts` in Compaction (#18085)

close #18084

`min_input_ts` and `max_input_ts` will be present in a log file compaction.

Signed-off-by: hillium <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: fix typo (#18090)

ref #15990

Fixed a typo: `Migartion` -> `Migration`.

Signed-off-by: hillium <yu745514916@live.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: filter out meta files by migration (#18123)

close #18122

Now, `StreamMetaStorage` is able to filter out files by meta edits.

Signed-off-by: hillium <yu745514916@live.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: added minimal compactions size (#18235)

close #18234

Added `--minimal-compact-size` to `compact-log-backup`.

Signed-off-by: hillium <yujuncen@pingcap.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* log backup: fix several issues during compact log backup.  (#18298)

close #18308

log backup compact: fix several issues during compact a log backup

Signed-off-by: 3pointer <luancheng@pingcap.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: correct version assignment in subcompaction metadata (#18389)

close #18390

Fixed a bug that caused the time range of compaction-generated SSTs to be too large.

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: add new field to track fully compacted data KV files and fix metafile filtering (#18837)

close #18843

compact_log_backup: add new field to track fully compacted data KV files and fix metafile filtering

Signed-off-by: 3pointer <luancheng@pingcap.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: use max ts among all storage checkpoint ts (#18848)

close #18847

Now, `consistency` hook checks the storage checkpoint by the max value among them.

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: fix compact meta edit filter (#18842)

close #18843

Merge the same meta edit from different migrations instead of replacing.

Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: offload reading meta to diff cpus (#18885)

close #18884

This PR spawns read s3 file tasks to remote threads.

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* compact_log_backup: read meta from checkpoint (#19068)

close #19069

This PR makes `compact-log-backup` fill the migration with subcompactions skipped by checkpoint.

Signed-off-by: hillium <yu745514916@live.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

* fix build

Signed-off-by: Juncen Yu <yujuncen@pingcap.com>

---------

Signed-off-by: RidRisR <79858083+RidRisR@users.noreply.github.com>
Signed-off-by: Juncen Yu <yujuncen@pingcap.com>
Signed-off-by: hillium <yu745514916@live.com>
Signed-off-by: 3pointer <luancheng@pingcap.com>
Signed-off-by: Jianjun Liao <jianjun.liao@outlook.com>
Signed-off-by: 山岚 <36239017+YuJuncen@users.noreply.github.com>
Co-authored-by: ris <79858083+RidRisR@users.noreply.github.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: 3pointer <luancheng@pingcap.com>
Co-authored-by: Jianjun Liao <36503113+Leavrth@users.noreply.github.com>
YuJuncen added a commit to YuJuncen/tikv that referenced this pull request Dec 5, 2025

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

compact_log_backup: implement the basic library for compacting

4 participants