feat: add checksum gauge metric and move checksumInterval to storage level #890
Merged
mattisonchao merged 11 commits into main on Feb 23, 2026
…level Move ChecksumInterval config from DatabaseOptions to StorageOptions as checksum recording is a storage-level concern. Add SyncGauge metric type and record oxia_dataserver_db_checksum on both leader and follower after applying RecordChecksumRequest entries through the WAL. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Record the WAL commit offset as a dynamic attribute on the checksum gauge so replicas can be compared at the same offset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TestControlRequestRecordChecksum to verify the db_checksum gauge metric is correctly exposed via the Prometheus endpoint with expected labels (shard, namespace, commit_offset) and a non-zero value. Refactor mock.NewServer into NewServerWithOptions to allow tests to customize dataserver options (e.g. checksum interval). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add a second round of puts + checksum recording to the integration test to verify that the checksum gauge produces a different value with a higher commit offset after new writes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add t.Helper() call and rename unused parameter to _. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Change the default checksum scheduler interval from 1m to 5m. Allow disabling the scheduler by setting the interval to 0s or a negative value (e.g. -1s). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… request oneof Add nil check on lc.db in leader's IsFeatureEnabled to prevent panic if called after close. Use switch on proto oneof type instead of if/if chains in ControlProposal.Apply and ApplyLogEntry. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
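To illustrate the switch-on-oneof style mentioned in this commit, here is a small self-contained sketch. It does not use the generated proto code: `controlValue`, `featureEnable`, `recordChecksum`, `applyResponse`, and `db` are hypothetical stand-ins that only mimic the shape of a protobuf oneof in Go.

```go
package main

import "fmt"

// controlValue mimics the marker interface that Go protobuf code
// generation produces for a oneof field. All names here are hypothetical.
type controlValue interface{ isControlValue() }

type featureEnable struct{ Feature string }
type recordChecksum struct{}

func (*featureEnable) isControlValue()  {}
func (*recordChecksum) isControlValue() {}

// applyResponse carries the checksum only for checksum requests.
type applyResponse struct {
	Checksum    uint32
	HasChecksum bool
}

// db is a stand-in for the storage layer; checksum is an assumed field.
type db struct{ checksum uint32 }

// apply dispatches on the concrete oneof type with a single type switch,
// so each request kind has exactly one branch and unknown kinds fail loudly.
func apply(d *db, v controlValue) (applyResponse, error) {
	switch req := v.(type) {
	case *featureEnable:
		_ = req.Feature // a real handler would persist the feature flag
		return applyResponse{}, nil
	case *recordChecksum:
		return applyResponse{Checksum: d.checksum, HasChecksum: true}, nil
	default:
		return applyResponse{}, fmt.Errorf("unknown control request %T", v)
	}
}

func main() {
	d := &db{checksum: 99}
	resp, err := apply(d, &recordChecksum{})
	fmt.Println(resp.HasChecksum, resp.Checksum, err)
	_, err = apply(d, nil)
	fmt.Println(err != nil)
}
```

Compared with a chain of independent `if` checks, the type switch makes the branches mutually exclusive and gives one obvious place (the `default` case) to reject a request type that has no handler.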
PR Review Summary
Overview
This PR implements the checksum gauge metric end-to-end: from proto definition through WAL replication to Prometheus-visible metrics on both leader and follower paths, with a periodic scheduler and integration test coverage.
Reviewed Areas
Bugs Found & Fixed During Review
No Outstanding Issues
🤖 Generated with Claude Code
Motivation
The RecordChecksumRequest WAL entry was a no-op marker that never actually recorded the DB checksum as a metric.

Modification
- Add SyncGauge metric type (non-callback based) to common/metric/gauge.go
- Add RecordChecksumRequest to the ControlRequest proto oneof
- Add oxia_dataserver_db_checksum gauge to both leader and follower controllers
- ControlProposal.Apply() and ApplyLogEntry() now read and return the DB checksum via ApplyResponse.Checksum when processing a RecordChecksumRequest
- Record the gauge after proposal.Apply() in the generic propose path
- Record the gauge after ApplyLogEntry() in the committed entries path
- Add checksum_scheduler.go to periodically trigger checksum recording
- Add scheduler.checksum.interval config option
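The idea behind a non-callback gauge that carries the commit offset can be sketched as below. This is an illustration under stated assumptions, not the contents of `common/metric/gauge.go`: the type `syncGaugeSketch`, its methods, and the helper `replicasAgree` are hypothetical, and the sketch uses only the standard library rather than a metrics SDK.

```go
package main

import (
	"fmt"
	"sync"
)

// checksumSample pairs a checksum with the WAL commit offset at which
// it was taken, so that replicas can be compared at the same offset.
type checksumSample struct {
	Value        int64
	CommitOffset int64
}

// syncGaugeSketch is a hypothetical push-style gauge: the caller records
// a value at the moment it is known, instead of registering a callback
// that the metrics reader polls later.
type syncGaugeSketch struct {
	mu     sync.Mutex
	labels map[string]string // static labels, e.g. shard and namespace
	last   checksumSample
	set    bool
}

func newSyncGaugeSketch(labels map[string]string) *syncGaugeSketch {
	cp := make(map[string]string, len(labels))
	for k, v := range labels {
		cp[k] = v
	}
	return &syncGaugeSketch{labels: cp}
}

// Record stores the latest checksum and the offset it belongs to.
func (g *syncGaugeSketch) Record(value, commitOffset int64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.last = checksumSample{Value: value, CommitOffset: commitOffset}
	g.set = true
}

// Snapshot returns the last recorded sample, or false if none exists yet.
func (g *syncGaugeSketch) Snapshot() (checksumSample, bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.last, g.set
}

// replicasAgree compares two samples only when they share an offset.
func replicasAgree(a, b checksumSample) (sameOffset, equal bool) {
	if a.CommitOffset != b.CommitOffset {
		return false, false
	}
	return true, a.Value == b.Value
}

func main() {
	labels := map[string]string{"shard": "0", "namespace": "default"}
	leader, follower := newSyncGaugeSketch(labels), newSyncGaugeSketch(labels)
	leader.Record(12345, 7)
	follower.Record(12345, 7)
	l, _ := leader.Snapshot()
	f, _ := follower.Snapshot()
	fmt.Println(replicasAgree(l, f))
}
```

Recording the offset alongside the value matters because a follower may lag the leader; two checksums are only meaningful to compare when they were taken after applying the same WAL entry.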