
feat: add WAL checksum gauge metric #891

Merged
mattisonchao merged 1 commit into main from feat/wal-checksum-metric
Feb 23, 2026

@mattisonchao
Member

Motivation

Add the oxia_dataserver_wal_checksum metric specified in Discussion #849 to complement the existing oxia_dataserver_db_checksum for shard consistency validation. The WAL already computes chained CRC values at the codec level but does not expose them as a metric.

Modification

  • Thread the chained CRC through the WAL read path: ReadRecordWithValidation (codec) → Read (segment) → readAtIndex (WAL) → ReadNext (reader) → controllers
  • Change AppendAndSync callback to carry the entry CRC, capturing LastCrc() after append under the WAL lock
  • Add walChecksumGauge to both leader and follower controllers, recording the WAL CRC alongside the existing DB checksum gauge
  • Update integration test to verify oxia_dataserver_wal_checksum appears with correct attributes and non-zero values

🤖 Generated with Claude Code

Thread the chained CRC computed at the WAL codec level through the read
path (codec → segment → WAL → reader → controllers) and expose it as a
Prometheus gauge metric `oxia_dataserver_wal_checksum`, complementing
the existing `oxia_dataserver_db_checksum`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattisonchao
Member Author

Review

LGTM. Reviewed the full diff against main (21 files, +154/-109).

What's good

  • Efficient CRC threading: The chained CRC flows through the existing read path (codec → segment → WAL → reader → controllers) with zero redundant reads. Callers that don't need the CRC simply discard it with _.
  • AppendAndSync callback design: Capturing LastCrc() after appendAsync0 while still holding t.Lock() is correct — no concurrent append can change the CRC before it's bound in the closure. The doSync/runSync batching is left untouched.
  • V1 codec: Returning 0 for CRC is the right behavior since V1 has no CRC support.
  • Integration test: parseChecksumMetrics is properly parameterized, and both oxia_dataserver_db_checksum and oxia_dataserver_wal_checksum are asserted for non-zero values that change after additional writes.

Notes

  • The WAL checksum gauge is only recorded when resp.Checksum != nil (i.e., when the DB checksum feature is enabled). This ties the WAL metric visibility to the DB checksum feature flag, which seems intentional since they serve the same consistency validation purpose.
  • All 21 files are consistent in naming (entryCrc) and interface signatures.
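
The gating described in the first note can be sketched as below. The gauge, response, and recordChecksums names are hypothetical and only mimic the shape the note describes; the real controllers use the project's own metrics types.

```go
package main

import "fmt"

// gauge is a hypothetical last-value holder standing in for a real metrics gauge.
type gauge struct {
	set   bool
	value uint32
}

func (g *gauge) record(v uint32) { g.set, g.value = true, v }

// response mimics the shape described in the note: Checksum is nil
// when the DB checksum feature is disabled.
type response struct {
	Checksum *uint32
}

// recordChecksums records both gauges together, or neither.
func recordChecksums(resp response, entryCrc uint32, dbGauge, walGauge *gauge) {
	if resp.Checksum == nil {
		return // feature off: the WAL gauge stays unset as well
	}
	dbGauge.record(*resp.Checksum)
	walGauge.record(entryCrc)
}

func main() {
	var db, wal gauge
	recordChecksums(response{}, 42, &db, &wal)
	fmt.Println(db.set, wal.set)

	sum := uint32(7)
	recordChecksums(response{Checksum: &sum}, 42, &db, &wal)
	fmt.Println(db.value, wal.value)
}
```

With this shape, a dashboard that sees the DB checksum series can expect the WAL series to be present as well.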

@mattisonchao mattisonchao merged commit 7a88a34 into main Feb 23, 2026
11 of 12 checks passed
@mattisonchao mattisonchao deleted the feat/wal-checksum-metric branch February 23, 2026 15:34