Conversation

@Lyndon-Li (Contributor) commented Jun 4, 2024:

  1. Fix the problem that `runTaskEpochMaintenanceQuick` never gets called in `runQuickMaintenance`.
  2. Add a hard limit on top of `epochAdvanceOnTotalSizeBytesThreshold` in `shouldAdvance`. The epoch is advanced when the total index size in the epoch exceeds the hard limit. In this way, we keep the indexes in an epoch at a reasonable number/size, so that index compaction for that epoch won't consume extremely high CPU and memory (a sketch of the intended logic follows below).
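For context, here is a minimal sketch of the intended `shouldAdvance` logic. Only `totalSizeBytesThresholdHard` comes from the diff discussed below; the other parameter names and the pre-existing criteria are illustrative assumptions, not kopia's exact code.

```go
package epoch

import (
	"time"

	"github.com/kopia/kopia/repo/blob"
)

// shouldAdvance decides whether the current write epoch should be closed.
// Illustrative sketch; only the hard-limit check reflects this PR's diff.
func shouldAdvance(indexes []blob.Metadata, minEpochDuration time.Duration,
	countThreshold int, totalSizeBytesThreshold, totalSizeBytesThresholdHard int64) bool {
	var totalSize int64

	oldest := time.Now()

	for _, bm := range indexes {
		totalSize += bm.Length

		if bm.Timestamp.Before(oldest) {
			oldest = bm.Timestamp
		}
	}

	// Proposed hard limit: advance regardless of epoch age once the total
	// index size is so large that compacting the epoch later would take
	// excessive CPU and memory.
	if totalSize >= totalSizeBytesThresholdHard {
		return true
	}

	// Pre-existing (illustrative) criteria: never advance before the epoch
	// has reached its minimum duration, then advance on count or size.
	if time.Since(oldest) < minEpochDuration {
		return false
	}

	return len(indexes) >= countThreshold || totalSize >= totalSizeBytesThreshold
}
```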

@Lyndon-Li marked this pull request as ready for review on June 4, 2024 07:45.
@Lyndon-Li changed the title from "Index compaction enhancement" to "refactor(general): Index compaction enhancement" on June 4, 2024.
@Lyndon-Li force-pushed the index-compaction-enhancement branch from 2bc321e to 5b2c2c3 on June 4, 2024 07:58.
@Lyndon-Li force-pushed the index-compaction-enhancement branch from 5b2c2c3 to b47ffba on June 4, 2024 07:59.
Comment on lines +40 to +43:

```go
if totalSize >= totalSizeBytesThresholdHard {
	return true
}
```

@julio-lopez (Collaborator):

@Lyndon-Li Please remove this check from this PR (and the corresponding tests). At the moment, we are not going to change the conditions for compaction. And if we were to do it, it should be done separately in its own PR.

Suggested change (remove these lines):

```go
if totalSize >= totalSizeBytesThresholdHard {
	return true
}
```

@Lyndon-Li (Contributor, author):

If we don't add this change, I am afraid the epoch maintenance in quick maintenance doesn't make any difference: minEpochDuration is 24 hours, so even though quick maintenance launches epoch maintenance every hour, it actually does nothing, because the epoch is not advanced.

@julio-lopez (Collaborator):

Running epoch maintenance in quick maintenance would make a difference. Specifically, regarding epoch advancement, it would advance the epoch within an hour of its reaching the minimum epoch duration. Otherwise, in the worst case, the epoch would be advanced close to 24 hours after reaching the minimum epoch age. For example, an epoch that becomes eligible just after a daily full-maintenance run would otherwise sit un-advanced until the next run, almost 24 hours later; hourly quick maintenance caps that delay at roughly one hour.

@Lyndon-Li (Contributor, author):

Yes, but that is not enough: the epoch advancing is not timely once a burst of index generation happens. The consequence is that the next index compaction takes a huge amount of memory.

@julio-lopez (Collaborator):

You can lower the minimum epoch duration for the repository.

@Lyndon-Li (Contributor, author):

May I know the concern here? Are we expecting the epochs to always be regular in terms of time?

@julio-lopez (Collaborator):

Is it safe?

@Lyndon-Li (Contributor, author) commented Oct 22, 2024:

@julio-lopez

> Is it safe?

Sorry, I didn't see this comment the first time.

Below is my thinking:

  • Index compaction won't delete the superseded indexes immediately. After they are compacted in a quick maintenance, they will be deleted in the following FULL maintenance ONLY once they are old enough (controlled by CleanupSafetyMargin, 4 hours by default; see the sketch below).
  • Therefore, adding the hard limit only ostensibly increases the risk of data loss; the risk is actually determined by CleanupSafetyMargin. If data safety is the concern, users should increase CleanupSafetyMargin.
  • Additionally, at present we support a customized epoch duration as short as 10 minutes, so users may already have experience with advancing the epoch over a short interval.
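To make the first point concrete, here is a minimal sketch of the safety gate described above. The helper name and signature are hypothetical, not kopia's actual code; only the CleanupSafetyMargin concept and its 4-hour default come from the discussion.

```go
package maintenance

import "time"

// canDeleteSupersededIndex is a hypothetical illustration: a superseded
// index blob becomes eligible for deletion during FULL maintenance only
// after it has been superseded for longer than CleanupSafetyMargin
// (4 hours by default), so readers still holding references to the old
// index have time to finish.
func canDeleteSupersededIndex(supersededAt time.Time, cleanupSafetyMargin time.Duration) bool {
	return time.Since(supersededAt) >= cleanupSafetyMargin
}
```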

Data safety always has top priority, but scalability and performance are also important. I am not sure whether the changes in this PR are 100% correct, but they offer a possible solution that considers the two aspects together.
So I would appreciate your further comments.

@julio-lopez (Collaborator):

@Lyndon-Li

You raise an important point about the safety of deleting compacted indexes. Indeed, a safety margin is needed so that clients that are in the process of reading an old index snapshot can finish reading it.

The proposed change adds a new criterion for advancing the epoch (totalSize >= totalSizeBytesThresholdHard). The original protocol design uses two other criteria to reach consensus about: (a) when it is time to advance an epoch; and (b) when it is safe to assume that the epoch will no longer be modified, so it can be compacted.

The new criterion appears to violate one of the original assumptions (see the sketch below). It may be that the proposed change is safe, but that is not yet clear, at least not to me.
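An illustrative sketch of the tension, not kopia's actual implementation: the original criteria are purely time-based, so every client can evaluate them independently and reach the same conclusion, whereas a size-based trigger depends on one client's local view of the blob store.

```go
package epoch

import "time"

// Illustrative only. Time-based criteria are deterministic given the
// epoch's start/close timestamps, so independent clients agree on them:
//
//   (a) advance: the epoch has existed for at least minEpochDuration
//   (b) compact: the epoch has been closed long enough (allowing for
//       clock skew and index-refresh intervals) that no writer can
//       still be adding indexes to it
//
// A trigger on totalSize >= totalSizeBytesThresholdHard is evaluated
// against a client's possibly-stale listing of index blobs, so two
// clients may disagree about whether the epoch has already advanced.
func safeToCompact(epochClosedAt time.Time, maxClockSkew, refreshInterval time.Duration) bool {
	return time.Since(epochClosedAt) >= maxClockSkew+refreshInterval
}
```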

@Lyndon-Li (Contributor, author) commented Oct 23, 2024:

OK, I get your points and understand your concern about data safety.

It is also a fact that the current epoch-advancing behavior places a bursty burden on repository maintenance (and on other operations that need to read the compacted indexes), because index creation is never spread linearly over time; it arrives in bursts along with backups.
Let's see whether this causes problems in other cases, and then we can come back to discuss it.

julio-lopez added a commit that referenced this pull request on Oct 21, 2024. Changes:

* test that quick maintenance runs when the epoch manager is enabled

* fix(general): run epoch maintenance for quick maintenance
  Change based on a known-to-be-safe portion of the changes proposed in #3901

* cleanup: pass the epoch manager to `runTaskEpochMaintenanceQuick` (a sketch follows below)
  The caller needs to get the epoch manager to determine whether or not
  the epoch manager is enabled. The caller now passes the epoch manager
  to `runTaskEpochMaintenanceQuick`.

* wrap the error inside `runTaskEpochMaintenanceQuick`
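A minimal sketch of what the last three items describe together. The interface, the signature, and the method name are assumptions for illustration, not the commit's actual code.

```go
package maintenance

import (
	"context"

	"github.com/pkg/errors"
)

// epochMaintainer abstracts the single capability this sketch needs; in
// kopia the concrete type would be the epoch manager. Hypothetical
// interface, not kopia's actual API.
type epochMaintainer interface {
	MaintainQuick(ctx context.Context) error
}

// runTaskEpochMaintenanceQuick now receives the epoch manager from the
// caller, which has already used it to determine whether epoch-based
// index management is enabled; the error is wrapped here rather than at
// every call site.
func runTaskEpochMaintenanceQuick(ctx context.Context, em epochMaintainer) error {
	if em == nil {
		// Epoch manager not enabled for this repository; nothing to do.
		return nil
	}

	if err := em.MaintainQuick(ctx); err != nil {
		return errors.Wrap(err, "error running quick epoch maintenance")
	}

	return nil
}
```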
@julio-lopez (Collaborator):

Mostly superseded by #4185. Closing for now.

Credits to @Lyndon-Li for catching the issue and proposing the solution. Thanks again.

alvistar pushed a commit to alvistar/kopia that referenced this pull request on Oct 29, 2024 (same changes as above, referencing kopia#3901).
mcamou pushed a commit to mcamou/kopia that referenced this pull request on Oct 30, 2024 (same changes as above, referencing kopia#3901).