chore(ci): Graduate sanitizer CI jobs from Phase 0 to Phase 1 across ecosystem #394

Description

@kcenon

What

All 7 ecosystem projects currently run ASan, TSan, and UBSan sanitizer jobs in CI in non-blocking mode (labeled "Phase 0"): six set continue-on-error: true, and common_system appends || true to its ctest invocation. Sanitizer failures are therefore reported but do not block merges. Multiple project READMEs claim "zero sanitizer issues" and "ThreadSanitizer clean", but CI does not enforce these claims.

Current state across ecosystem:

Project            ASan                     TSan                     UBSan                    Blocking?
common_system      ctest || true            ctest || true            ctest || true            No
thread_system      continue-on-error: true  continue-on-error: true  continue-on-error: true  No
logger_system      continue-on-error: true  continue-on-error: true  continue-on-error: true  No
container_system   continue-on-error: true  continue-on-error: true  continue-on-error: true  No
monitoring_system  continue-on-error: true  continue-on-error: true  continue-on-error: true  No
database_system    continue-on-error: true  continue-on-error: true  continue-on-error: true  No
network_system     continue-on-error: true  continue-on-error: true  continue-on-error: true  No

Why

  • False confidence: README claims of "zero issues" without CI enforcement create a misleading quality signal
  • Regression risk: New code can introduce data races, UB, or memory errors without any CI gate
  • Production safety: These are C++ libraries handling threading, networking, and database operations, so sanitizer-clean builds are essential
  • Ecosystem credibility: Users evaluating these libraries for production use will check CI enforcement

Where

  • Each project's .github/workflows/ci.yml or .github/workflows/sanitizers.yml
  • Specific lines vary per project but all use continue-on-error: true or || true

How

Phase 1 (this issue): Make sanitizers blocking for new regressions

  1. Remove continue-on-error: true from sanitizer jobs in all 7 projects
  2. Replace ctest ... || true with ctest ... (allow failures to propagate)
  3. Add suppression files for any known pre-existing issues that cannot be fixed immediately
  4. Document known suppressions in each project's docs/KNOWN_ISSUES.md
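
Steps 1 and 2 can be verified mechanically. The following is a minimal sketch, assuming a plain text scan of a workflow file is good enough (it does not parse YAML, so it would also flag a match inside a comment). The function name and the sample workflow are hypothetical and are not taken from any of the 7 projects.

```python
import re

# The two constructs this issue names as making a sanitizer step non-blocking.
NON_BLOCKING_PATTERNS = (
    re.compile(r"continue-on-error:\s*true\b"),
    re.compile(r"\|\|\s*true\s*$"),
)


def find_non_blocking_lines(workflow_text):
    """Return (line_number, stripped_line) for each line that masks failures."""
    hits = []
    for number, line in enumerate(workflow_text.splitlines(), start=1):
        if any(pattern.search(line) for pattern in NON_BLOCKING_PATTERNS):
            hits.append((number, line.strip()))
    return hits


# Hypothetical workflow excerpt, used only to exercise the scan.
SAMPLE_WORKFLOW = """\
jobs:
  tsan:
    runs-on: ubuntu-latest
    continue-on-error: true
    steps:
      - name: Run tests
        run: ctest --output-on-failure || true
"""

if __name__ == "__main__":
    for number, line in find_non_blocking_lines(SAMPLE_WORKFLOW):
        print(f"line {number}: {line}")
```

A scan like this could run as a lightweight check in each repository, so that a reintroduced continue-on-error: true or || true is caught in review rather than after merge.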

Phase 2 (follow-up): Eliminate all suppressions

  • Address each suppressed issue and remove suppressions one by one
  • Target: zero sanitizer suppressions across all projects

Recommended rollout order (by dependency tier):

  1. common_system (Tier 0 — foundation, fewest suppressions expected)
  2. thread_system, container_system (Tier 1)
  3. logger_system (Tier 2)
  4. monitoring_system, database_system (Tier 3)
  5. network_system (Tier 4 — most complex, likely most suppressions)

Acceptance Criteria

  • All sanitizer CI jobs across all 7 projects fail the build on sanitizer errors
  • No continue-on-error: true or || true on sanitizer-related CI steps
  • Known pre-existing issues documented in suppression files with issue references
  • At least common_system and container_system run clean (zero suppressions)
  • README sanitizer claims match CI enforcement reality
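
The criterion that suppressions carry issue references could also be checked by script. The sketch below assumes a convention this issue does not specify: a comment line containing a reference such as #101 documents every suppression entry that follows it, up to the next blank line. The function name, the symbol names, and the issue number in the sample are hypothetical.

```python
import re

# An issue reference such as "#101" inside a comment line.
ISSUE_REFERENCE = re.compile(r"#\d+")


def find_undocumented_suppressions(suppression_text):
    """Return suppression entries that no preceding comment ties to an issue.

    Assumed convention: a comment with an issue reference covers all entries
    that follow it until the next blank line.
    """
    undocumented = []
    has_reference = False
    for raw_line in suppression_text.splitlines():
        line = raw_line.strip()
        if not line:
            has_reference = False  # a blank line ends the documented block
        elif line.startswith("#"):
            # Skip the leading comment marker before looking for "#<number>".
            if ISSUE_REFERENCE.search(line[1:]):
                has_reference = True
        elif not has_reference:
            undocumented.append(line)
    return undocumented


# Hypothetical TSan-style suppression file content.
SAMPLE_SUPPRESSIONS = """\
# Known race in a hypothetical worker pool, tracked in #101
race:example::worker_pool::drain

race:example::legacy_queue::push
"""

if __name__ == "__main__":
    for entry in find_undocumented_suppressions(SAMPLE_SUPPRESSIONS):
        print(f"missing issue reference: {entry}")
```

For the sample above, only the second entry is reported, because the blank line ends the block covered by the comment.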

Metadata

Labels

  • ci-cd (CI/CD and build automation)
  • priority:medium (Medium priority issue)
  • testing (Testing related issues)
