Skip to content

Conversation

@Lyndon-Li
Copy link
Contributor

Maintenance is critical for healthy of the repository.
On the other hand, Maintenance is complex, because it runs multiple sub tasks each may generate different results according to the maintenance policy. The results may include deleting/combining/adding data/metadata to the repository.

It is worthy to add more observability for these tasks for below reasons:

  • It is helpful for troubleshooting. Any data change to the repository is critical, the observability info helps to understand what happened during the maintenance and why that happened
  • It is helpful for users to understand/predict the repo's behavior. The repo data may be stored in a public cloud for which costs are sensitive to scale/duration of data stored. On the other hand, repository has its own policy to manage the data, so the data is not deleted until it is safe enough according to the policy. The observability info helps users to understand how much data is in-use, how much data is out of use and when it is deleted

There will be a serial of PRs to add observability info for each sub task.
The current PR add the stats info for AdvanceEpoch sub task.

@Lyndon-Li Lyndon-Li marked this pull request as ready for review October 30, 2025 08:10
Copy link
Collaborator

@julio-lopez julio-lopez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see small inline comment, it can be done in a separate PR

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds observability statistics for the AdvanceEpoch maintenance subtask to improve troubleshooting and help users understand repository behavior. The enhancement provides detailed information about epoch advancement operations, including whether an epoch was actually advanced and the current epoch number.

Key changes:

  • Introduces AdvanceEpochStats struct to track epoch advancement metrics
  • Modifies MaybeAdvanceWriteEpoch to return statistics instead of just an error
  • Updates all test cases to handle the new return value structure

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
repo/maintenancestats/stats_advance_epoch.go New file defining the AdvanceEpochStats struct with JSON serialization and summary methods
repo/maintenancestats/builder_test.go Added test cases for serializing and deserializing AdvanceEpochStats
repo/maintenancestats/builder.go Updated BuildFromExtra to handle the new advanceEpochStatsKind
repo/maintenance/maintenance_run.go Modified runTaskEpochAdvance to capture and return statistics from MaybeAdvanceWriteEpoch
internal/epoch/epoch_manager_test.go Updated all test cases to handle new return signature and validate returned statistics
internal/epoch/epoch_manager.go Changed MaybeAdvanceWriteEpoch signature to return AdvanceEpochStats along with error

Comment on lines +27 to +32
var message string
if as.WasAdvanced {
message = fmt.Sprintf("Advanced epoch to %v", as.CurrentEpoch)
}

return message
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Summary method returns an empty string when WasAdvanced is false. Consider returning a meaningful message like 'No epoch advancement needed' for better user experience and consistency.

Suggested change
var message string
if as.WasAdvanced {
message = fmt.Sprintf("Advanced epoch to %v", as.CurrentEpoch)
}
return message
if as.WasAdvanced {
return fmt.Sprintf("Advanced epoch to %v", as.CurrentEpoch)
}
return "No epoch advancement needed"

Copilot uses AI. Check for mistakes.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be addressed in a separate PR.
Let's let CI finish, we can merge once it passes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the maintenance which are with various sub tasks is already complex, I want to make the the output simple --- if a sub task doesn't make any change, just give no output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@julio-lopez What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either way is fine, your call.

Comment on lines +758 to 769
result := &maintenancestats.AdvanceEpochStats{
CurrentEpoch: cs.WriteEpoch,
}

if shouldAdvance(cs.UncompactedEpochSets[cs.WriteEpoch], p.MinEpochDuration, p.EpochAdvanceOnCountThreshold, p.EpochAdvanceOnTotalSizeBytesThreshold) {
return errors.Wrap(e.advanceEpochMarker(ctx, cs), "error advancing epoch")
if err := e.advanceEpochMarker(ctx, cs); err != nil {
return nil, errors.Wrap(err, "error advancing epoch")
}

result.CurrentEpoch++
result.WasAdvanced = true
}
Copy link

Copilot AI Oct 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CurrentEpoch field is set to cs.WriteEpoch initially and then incremented when advanced. This creates a dependency on the order of operations. Consider setting CurrentEpoch directly to the final value to make the logic clearer and less error-prone.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me change it to:

result.CurrentEpoch = cs.WriteEpoch + 1

Since the CI has started, I can make this change in the next PR.

@codecov
Copy link

codecov bot commented Oct 31, 2025

Codecov Report

❌ Patch coverage is 73.07692% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.02%. Comparing base (cb455c6) to head (1702175).
⚠️ Report is 722 commits behind head on master.

Files with missing lines Patch % Lines
repo/maintenancestats/stats_advance_epoch.go 41.66% 6 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4937      +/-   ##
==========================================
+ Coverage   75.86%   78.02%   +2.15%     
==========================================
  Files         470      541      +71     
  Lines       37301    31288    -6013     
==========================================
- Hits        28299    24411    -3888     
+ Misses       7071     4836    -2235     
- Partials     1931     2041     +110     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@julio-lopez julio-lopez merged commit ed7ef85 into kopia:master Oct 31, 2025
22 of 23 checks passed
@julio-lopez
Copy link
Collaborator

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants