Skip to content

Adding support for test failure detection in valkey#24

Merged
sarthakaggarwal97 merged 36 commits into
valkey-io:mainfrom
BChan-0:main
Jun 23, 2026
Merged

Adding support for test failure detection in valkey#24
sarthakaggarwal97 merged 36 commits into
valkey-io:mainfrom
BChan-0:main

Conversation

@BChan-0

@BChan-0 BChan-0 commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Test Failure Detector (Original: PR 3358)

Monitors the Daily CI workflow on valkey-io/valkey, detects test failures, and automatically creates or updates GitHub issues to track them.

Primary Changes from PR 3358

  • Find and read workflows/artifacts cross-repo (GitHub App token), cannot run immediately after Valkey workflow yet
  • Python modules instead of inline JS: download.py, parse_failures.py, manage_issues.py, main.py
  • Typed data models instead of untyped JS object literals: UniqueFailure, JobReference
  • Test suite / unit tests for the detector: test_download.py, test_failure_parser.py, test_issue_manager.py (for testing)
  • Manual input for non-recent workflows (+repo, branch, dry run) (for testing)
  • Job summary (for testing)

How it works

  1. Find the run — locates the most recent completed (non-cancelled) Daily workflow run on the unstable branch, or uses a manually input run ID
  2. Download artifact — fetches the all-test-failures artifact from the CI workflow. Uses an HTTP handler to strip the Authorization header on the redirect to Azure blob storage
  3. Get job URLs — fetches job metadata from the run to build CI links for each failure, with normalized name variants for fuzzy matching against artifact names
  4. Parse and deduplicate — iterates the nested JSON ({job → suite → [failures]}) and groups by {test_name, test_file} such that a test failing across multiple jobs becomes one unique failure with multiple job references
  5. Create or update issues — for each unique failure:
    • If an open issue with matching title ([TEST-FAILURE] {test_name} in {test_file}) already exists: updates the environments list and adds a recurrence comment with the date
    • Otherwise: creates a new issue with the test-failure label, error stack trace, CI links, and environment list

A GitHub Actions job summary is emitted at every exit path with a table of metrics (failures detected, issues created/updated).

Prerequisites: Cross-repo Authentication

The workflow generates a GitHub App installation token scoped to the valkey-io org using the same App secrets as the backport workflow (VALKEYRIE_BOT_APP_ID + VALKEYRIE_BOT_PRIVATE_KEY). This token provides actions:read (to download artifacts) and issues:write (to create/update issues) on valkey-io/valkey.

Usage

Scheduled (automatic)

Runs daily at 23:00 UTC via cron. The workflow runs on valkey-io/valkey-ci-agent and uses a GitHub App token to read artifacts from and write issues to valkey-io/valkey. Valkey Daily CI runs daily at 00:00 UTC, with runs typically completing within 4-7 hours, with slight exception (from valkey-io/valkey's history of completed runs, <10 runs exceed 7 hours, with the longest lasting 10h 02m). As valkey's test suite grows, the run time for daily will increase, so attempting to capture runs at an "earliest" time would require frequent maintenance. In the other direction, the The Failure Detector's runtime will remain nearly constant (from valkey-io and forked history of completed runs, has never exceeded 30s of runtime and runner availability is less severe on valkey-ci-agent), so it is safer for cron to be closer to the start of the Daily CI workflow as opposed to the end. As such, the Test Failure Detector should always capture the current day's workflow. In any case of a skip, manual dispatch is available. Observation of runner availability will continue post-merge for confirmation of this arrangement in practice.

Manual dispatch

gh workflow run test-failure-detector-sweep.yml \
  --repo valkey-io/valkey-ci-agent \
  --field repo=valkey-io/valkey \
  --field run_id=12345678 \
  --field dry_run=true
  • repo — target repository to scan (default: valkey-io/valkey)
  • run_id — specific workflow run ID to analyze (empty = latest Daily run)
  • dry_run — parse and report only, don't create/update issues

Relevant Tests (all passing) (examples run from terminal instead of dispatch)

  • tests/test_testfailuredetector_download.py

  • tests/test_testfailuredetector_failure_parser.py

  • tests/test_testfailuredetector_issue_manager.py

  • tests/test_issue_dedup.py

  • See commenting with environment update function example: Issue Link

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces a complete Test Failure Detector workflow for Valkey CI that monitors the Daily CI pipeline, downloads test-failure artifacts, deduplicates failures by test identity, and manages labeled GitHub issues with environment tracking and recurrence comments across runs.

Changes

Test Failure Detector

Layer / File(s) Summary
Data model and failure parsing
scripts/test_failure_detector/parse_failures.py, tests/test_testfailuredetector_failure_parser.py
Defines JobReference and UniqueFailure dataclasses; parse_and_deduplicate() validates nested JSON, skips malformed entries, deduplicates by {test_name, test_file}, and attaches job URLs from a mapping. Tests verify deduplication across jobs/suites, URL attachment, field validation, and display_name formatting.
Artifact discovery and download
scripts/test_failure_detector/download.py, tests/test_artifact_fetcher.py, tests/test_testfailuredetector_download.py
get_latest_daily_run() retrieves the first completed non-cancelled run; download_all_test_failures() uses ArtifactClient to fetch and extract all-test-failures.json; get_job_urls() maps job names to HTML URLs and adds normalized variants. Tests cover run filtering, artifact extraction, missing-artifact cases, and job name normalization.
Issue rendering and environment management
scripts/test_failure_detector/issue_renderer.py, tests/test_testfailuredetector_issue_manager.py
fingerprint_for() generates deterministic issue keys; render_for() produces IssueContent with title/body/comment/labels; merge_environments() returns a body_transform that extracts existing environments and appends new ones. Tests verify title/body/comment construction, fingerprinting, and environment extraction/updates.
Issue dedup publisher body transform
scripts/common/issue_dedup.py, tests/test_issue_dedup.py
Extends IssueDedupPublisher.upsert() with optional body_transform callback that runs only on the update path before marker processing. Tests validate transform application on update and non-application on create.
Issue management orchestration
scripts/test_failure_detector/manage_issues.py, tests/test_testfailuredetector_issue_manager.py
Simplifies to a thin layer that calls IssueDedupPublisher.upsert() per UniqueFailure with render/fingerprint/body_transform from issue_renderer; accepts optional run_id for idempotency; returns {created, updated, skipped} summary. Tests verify publisher interaction and idempotency-key handling.
Pipeline orchestration and CLI
scripts/test_failure_detector/main.py
Adds run() entrypoint and main() CLI that resolve workflow run_id (latest or explicit), download/parse failures, support dry-run mode (log-only), call process_failures, emit GitHub Actions job-summary table, and expose argparse for repo/run-id/workflow-name/branch/dry-run/verbose.
Workflow definition and documentation
.github/workflows/test-failure-detector-sweep.yml, README.md, docs/architecture.md, scripts/test_failure_detector/__init__.py
Adds daily/manual scheduled workflow that generates GitHub App token, sets up Python, and invokes detector CLI; updates README with active detector listing and detailed flow docs; updates architecture docs with module diagram and detector flow; adds package docstring.

Sequence Diagram(s)

sequenceDiagram
  participant Workflow as GitHub Actions
  participant Main as main.py
  participant Download as download.py
  participant Parser as parse_failures.py
  participant Render as issue_renderer.py
  participant IssueDedup as IssueDedupPublisher
  
  Workflow->>Main: run(token, repo, run_id, dry_run)
  Main->>Download: get_latest_daily_run()
  Download-->>Main: WorkflowRun
  Main->>Download: download_all_test_failures()
  Download-->>Main: all_test_failures bytes
  Main->>Download: get_job_urls()
  Download-->>Main: job_urls map
  Main->>Parser: parse_and_deduplicate()
  Parser-->>Main: list[UniqueFailure]
  alt dry_run mode
    Main-->>Workflow: job_summary (failures, 0 issues)
  else normal mode
    Main->>Render: render_for(), merge_environments()
    loop per UniqueFailure
      Main->>IssueDedup: upsert(fingerprint, render, body_transform)
      IssueDedup-->>Main: created|updated|skipped
    end
    Main-->>Workflow: job_summary (failures, created, updated)
  end
Loading

Possibly related PRs

  • valkey-io/valkey-ci-agent#2: Introduces IssueDedupPublisher and marker-based deduplication; this PR extends upsert() with body_transform hook and integrates it into test-failure issue updates.

Suggested reviewers

  • madolson
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.51% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The PR description is comprehensively related to the changeset, providing detailed context about the Test Failure Detector implementation, architecture, prerequisites, usage, and testing.
Title check ✅ Passed The title clearly summarizes the main addition to the codebase—introducing test failure detection for Valkey CI—and is directly relevant to all the changes in the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (6)
.github/workflows/test-failure-detector-sweep.yml (5)

39-39: 💤 Low value

Consider pinning checkout and setup-python actions to commit SHAs.

Static analysis flags the use of tag-based action references (@v6) instead of commit SHAs. While you've already pinned create-github-app-token to a SHA (line 55), the checkout and setup-python actions remain unpinned.

Pinning to commit SHAs provides supply-chain attack protection but requires more maintenance. This is a repository-wide policy decision.

Also applies to: 45-45

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-failure-detector-sweep.yml at line 39, Replace
tag-based GitHub Action references with pinned commit SHAs for supply-chain
safety: update the uses entries that reference "actions/checkout@v6" and
"actions/setup-python@v4" to their corresponding full commit SHA values
(matching the versions you intend to use), ensuring the workflow still functions
and updating any dependent inputs if necessary; verify the pinned SHAs are from
upstream actions/checkout and actions/setup-python repositories and run the
workflow to confirm no regressions.

24-36: ⚡ Quick win

Clarify permissions intent or remove unused grant.

The workflow-level issues: write (line 26) is overridden by the job-level permissions block (lines 35-36), which grants only contents: read. Since the detector uses the GitHub App token (TARGET_TOKEN) for issue operations rather than the default GITHUB_TOKEN, the workflow-level issues: write grant is unused.

Either remove issues: write from lines 24-26, or add a comment explaining that the App token (not GITHUB_TOKEN) handles issue operations.

📝 Option 1: Remove unused permission
 permissions: 
   contents: read
-  issues: write
📝 Option 2: Document the intent
 permissions: 
   contents: read
-  issues: write
+  issues: write  # Unused by workflow; detector uses GitHub App token for cross-repo issue ops
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-failure-detector-sweep.yml around lines 24 - 36, The
workflow defines a workflow-level permissions block granting "issues: write" but
the job "detect-failures" overrides permissions to only "contents: read" and the
detector actually uses the GitHub App token (TARGET_TOKEN) for issue operations,
so remove the unused workflow-level "issues: write" entry or add a clear comment
above the workflow-level permissions explaining that issue writes are handled by
the App token (TARGET_TOKEN) and not by GITHUB_TOKEN; update the "permissions"
block or add the comment near the top-level permissions and keep the job-level
permissions in the "detect-failures" job as-is to avoid confusion.

66-66: 💤 Low value

Consider simplifying the DRY_RUN expression for clarity.

The current expression ${{ github.event_name == 'workflow_dispatch' && inputs.dry_run == true && 'true' || 'false' }} is correct but hard to parse. The logic ensures dry-run mode is only honored for manual dispatch (scheduled runs always create issues).

Consider splitting the logic or adding a comment to document the intent.

♻️ Alternative: add explanatory comment
+          # dry_run is only honored for manual dispatch; scheduled runs always create issues
           DRY_RUN: ${{ github.event_name == 'workflow_dispatch' && inputs.dry_run == true && 'true' || 'false' }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-failure-detector-sweep.yml at line 66, The DRY_RUN
expression is hard to read; replace the current expression with a ternary for
clarity and add a short comment: set DRY_RUN to ${{ github.event_name ==
'workflow_dispatch' && inputs.dry_run == 'true' ? 'true' : 'false' }} (and add a
one-line comment above explaining "only honor inputs.dry_run for manual
workflow_dispatch runs; scheduled runs always create issues"). Reference the
DRY_RUN env variable and the existing github.event_name and inputs.dry_run
symbols when making the change.

13-16: ⚡ Quick win

Use type: number for the run_id input.

The run_id input represents a workflow run ID, which is numeric. GitHub Actions supports type: number for workflow_dispatch inputs, which better documents the expected value type than type: string.

♻️ Proposed fix
       run_id:
         description: "Specific workflow run ID to analyze (empty = latest completed Daily run)"
         required: false
-        type: string
-        default: ""
+        type: number

Note: With type: number, omitting the input will pass null instead of "", so remove the empty string check on line 76.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-failure-detector-sweep.yml around lines 13 - 16,
Change the workflow_dispatch input "run_id" to use type: number instead of type:
string by updating the run_id input definition (replace type: string with type:
number); then remove or adjust the code that checks for an empty string value
for run_id (the check that compares run_id == ""), since with type: number an
omitted input will be null rather than ""; ensure any subsequent logic expects a
number or null and handle null appropriately.

29-34: 💤 Low value

Consider adding a job timeout for defense in depth.

The job currently has no timeout-minutes set, relying on the GitHub Actions default (360 minutes). While this workflow is expected to complete in minutes (artifact download + parsing + API calls), adding a modest timeout (e.g., 30-60 minutes) would catch hangs or API degradation scenarios.

🛡️ Proposed addition
   detect-failures:
     name: "Detect test failures"
+    timeout-minutes: 60
     if: >
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-failure-detector-sweep.yml around lines 29 - 34, Add
a defensive job timeout to the detect-failures job by setting the
timeout-minutes property (e.g., 30 or 60) under the detect-failures job
definition so the workflow won't hang indefinitely; update the job block that
defines name: "Detect test failures" / runs-on: ubuntu-latest to include
timeout-minutes: 30 (or 60) directly beneath runs-on.
tests/test_issue_manager.py (1)

109-115: ⚡ Quick win

Add a regression case for bodies without an Environments line.

Current coverage only proves replacement works when the marker is already present. Please add a case where an existing issue body has no **Environments:** section, then verify process_failures() appends the new environments instead of silently leaving the body unchanged.

Also applies to: 121-220

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_issue_manager.py` around lines 109 - 115, Add a regression test
for when the issue body lacks an existing "**Environments:**" line: create a new
test (e.g., in class TestUpdateEnvironments) that calls
_update_environments_in_body with a body like "Some text\nMore text" and
environments ["job-a", "job-b"] and assert the returned body now contains a new
"**Environments:** `job-a`, `job-b`" line while preserving the original text;
also add a parallel case calling process_failures() (or the higher-level wrapper
used in tests) to ensure it appends the environments instead of leaving the body
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/test_failure_detector/download.py`:
- Around line 135-143: The ZIP extraction currently picks the first .json entry
and lets ZIP errors propagate; change it to explicitly look for
"all-test-failures.json" (instead of using json_name = next(...)) and only read
that member via zf.read("all-test-failures.json"); catch and handle invalid
archives and missing members (zipfile.BadZipFile, KeyError, IOError) by logging
a warning like "Artifact missing or corrupt: all-test-failures.json" and
returning None so malformed/corrupt cross-repo inputs are treated as misses;
update the block around zipfile.ZipFile, names, json_name and zf.read
accordingly.

In `@scripts/test_failure_detector/main.py`:
- Around line 72-77: When get_latest_daily_run(gh, repo_full_name,
workflow_name, branch) returns None the function currently logs and returns 1
without emitting the CI summary; before returning add a call to
emit_job_summary(...) to record the failure path (include relevant context like
workflow_name, repo_full_name, branch and the error message), e.g. invoke
emit_job_summary(...) right after logger.error("No qualifying workflow run
found.") so the summary is always produced on this early exit; ensure the
function signature and any required parameters for emit_job_summary are provided
from the surrounding scope.

In `@scripts/test_failure_detector/manage_issues.py`:
- Around line 25-42: The ensure_label_exists() bootstrap races when two
processes both see the label missing and one create call fails; wrap the
repo.create_label call (the retry_github_call invocation that uses
repo.create_label and _LABEL_NAME) in a try/except that catches GithubException
from the create and ignores it when it indicates the label already exists (e.g.,
status codes 409/422 or an "already exists" message), logging that another
process created the label, and re-raise the exception for any other status; keep
using retry_github_call for the actual create and only handle the
conflict/duplicate-create error around that call.
- Around line 105-108: The current _update_environments_in_body(body, all_envs)
replaces an existing "**Environments:**" line but does nothing if that marker is
missing; change it to either replace the marker when present or append a new
"**Environments:** ..." line when absent so new environments are persisted for
migrated/manual issues. Update the function _update_environments_in_body to
check for the marker (using the same regex or search) and if found use re.sub as
now, otherwise append the constructed new_env_line (with proper spacing/newline)
in a sensible location (e.g., end of body or before a known section header) so
callers that invoke it (the code that updates matched issues) will always
persist the updated all_envs list.

In `@scripts/test_failure_detector/parse_failures.py`:
- Around line 46-48: Check that the top-level artifact all_failures is a dict
before iterating (the loop using "for job_name, suites in
all_failures.items()"); if it is not a dict (e.g., list/string/null), log an
informative message and return [] to avoid crashing the sweep; update the code
around where grouped: dict[str, UniqueFailure] = {} and the for loop to perform
an isinstance(all_failures, dict) guard, emit a clear error/log including the
unexpected type/value, and return [] for invalid payloads.

---

Nitpick comments:
In @.github/workflows/test-failure-detector-sweep.yml:
- Line 39: Replace tag-based GitHub Action references with pinned commit SHAs
for supply-chain safety: update the uses entries that reference
"actions/checkout@v6" and "actions/setup-python@v4" to their corresponding full
commit SHA values (matching the versions you intend to use), ensuring the
workflow still functions and updating any dependent inputs if necessary; verify
the pinned SHAs are from upstream actions/checkout and actions/setup-python
repositories and run the workflow to confirm no regressions.
- Around line 24-36: The workflow defines a workflow-level permissions block
granting "issues: write" but the job "detect-failures" overrides permissions to
only "contents: read" and the detector actually uses the GitHub App token
(TARGET_TOKEN) for issue operations, so remove the unused workflow-level
"issues: write" entry or add a clear comment above the workflow-level
permissions explaining that issue writes are handled by the App token
(TARGET_TOKEN) and not by GITHUB_TOKEN; update the "permissions" block or add
the comment near the top-level permissions and keep the job-level permissions in
the "detect-failures" job as-is to avoid confusion.
- Line 66: The DRY_RUN expression is hard to read; replace the current
expression with a ternary for clarity and add a short comment: set DRY_RUN to
${{ github.event_name == 'workflow_dispatch' && inputs.dry_run == 'true' ?
'true' : 'false' }} (and add a one-line comment above explaining "only honor
inputs.dry_run for manual workflow_dispatch runs; scheduled runs always create
issues"). Reference the DRY_RUN env variable and the existing github.event_name
and inputs.dry_run symbols when making the change.
- Around line 13-16: Change the workflow_dispatch input "run_id" to use type:
number instead of type: string by updating the run_id input definition (replace
type: string with type: number); then remove or adjust the code that checks for
an empty string value for run_id (the check that compares run_id == ""), since
with type: number an omitted input will be null rather than ""; ensure any
subsequent logic expects a number or null and handle null appropriately.
- Around line 29-34: Add a defensive job timeout to the detect-failures job by
setting the timeout-minutes property (e.g., 30 or 60) under the detect-failures
job definition so the workflow won't hang indefinitely; update the job block
that defines name: "Detect test failures" / runs-on: ubuntu-latest to include
timeout-minutes: 30 (or 60) directly beneath runs-on.

In `@tests/test_issue_manager.py`:
- Around line 109-115: Add a regression test for when the issue body lacks an
existing "**Environments:**" line: create a new test (e.g., in class
TestUpdateEnvironments) that calls _update_environments_in_body with a body like
"Some text\nMore text" and environments ["job-a", "job-b"] and assert the
returned body now contains a new "**Environments:** `job-a`, `job-b`" line while
preserving the original text; also add a parallel case calling
process_failures() (or the higher-level wrapper used in tests) to ensure it
appends the environments instead of leaving the body unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: d2f84d0b-8c49-4872-8f1a-2308f045bf68

📥 Commits

Reviewing files that changed from the base of the PR and between 87b3aaa and 182577a.

📒 Files selected for processing (12)
  • .github/workflows/test-failure-detector-sweep.yml
  • README.md
  • docs/architecture.md
  • scripts/test_failure_detector/__init__.py
  • scripts/test_failure_detector/download.py
  • scripts/test_failure_detector/main.py
  • scripts/test_failure_detector/manage_issues.py
  • scripts/test_failure_detector/parse_failures.py
  • tests/test_artifact_fetcher.py
  • tests/test_download.py
  • tests/test_failure_parser.py
  • tests/test_issue_manager.py

Comment thread scripts/test_failure_detector/download.py Outdated
Comment thread scripts/test_failure_detector/main.py
Comment thread scripts/test_failure_detector/manage_issues.py Outdated
Comment thread scripts/test_failure_detector/manage_issues.py Outdated
Comment thread scripts/test_failure_detector/parse_failures.py Outdated
BChan-0 and others added 21 commits June 4, 2026 16:36
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
…al trigger), running with BChan-0/valkey daily ci

Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
valkey-io -> BChan-0/valkey

Signed-off-by: Bonnie Chan <bonchan35@gmail.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
change line 30 to test on forked repo
valkey-io/valkey -> BChan-0/valkey

Signed-off-by: Bonnie Chan <bonchan35@gmail.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/test-failure-detector-sweep.yml:
- Around line 39-45: Replace mutable action tags with their corresponding commit
SHAs and restrict the GitHub App token scope: change usages of
actions/checkout@v6 and actions/setup-python@v6 to the exact commit SHAs (pin to
the specific SHA for those actions), and update the
actions/create-github-app-token step to include a repositories: entry scoped to
the target repo (e.g., use the inputs.repo or 'valkey-io/valkey') so the token
is not granted access to all repos under owner: valkey-io.
- Around line 56-60: The create-github-app-token step currently provides owner:
valkey-io which yields a token scoped to all installations; update the step that
calls actions/create-github-app-token to include repositories: ${{ inputs.repo
|| 'valkey-io/valkey' }} so the generated installation token is limited to the
targeted repo (matching the later --repo "${{ inputs.repo || 'valkey-io/valkey'
}}" usage) and enforces least privilege while preserving workflow_dispatch
behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 168eb16a-d63e-43eb-b209-b25863700f12

📥 Commits

Reviewing files that changed from the base of the PR and between 182577a and 88c584f.

📒 Files selected for processing (12)
  • .github/workflows/test-failure-detector-sweep.yml
  • README.md
  • docs/architecture.md
  • scripts/test_failure_detector/__init__.py
  • scripts/test_failure_detector/download.py
  • scripts/test_failure_detector/main.py
  • scripts/test_failure_detector/manage_issues.py
  • scripts/test_failure_detector/parse_failures.py
  • tests/test_artifact_fetcher.py
  • tests/test_download.py
  • tests/test_failure_parser.py
  • tests/test_issue_manager.py
✅ Files skipped from review due to trivial changes (3)
  • docs/architecture.md
  • scripts/test_failure_detector/init.py
  • README.md
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/test_download.py
  • scripts/test_failure_detector/parse_failures.py
  • scripts/test_failure_detector/main.py
  • tests/test_artifact_fetcher.py
  • tests/test_issue_manager.py
  • scripts/test_failure_detector/download.py
  • scripts/test_failure_detector/manage_issues.py
  • tests/test_failure_parser.py

Comment thread .github/workflows/test-failure-detector-sweep.yml Outdated
Comment thread .github/workflows/test-failure-detector-sweep.yml
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
@hanxizh9910

Copy link
Copy Markdown

I will take a look!

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
.github/workflows/test-failure-detector-sweep.yml (1)

28-37: ⚡ Quick win

Consider adding a timeout to prevent runaway jobs.

The job inherits the default GitHub Actions timeout of 360 minutes, but the GitHub App token expires after 60 minutes. Adding an explicit timeout-minutes: 60 (or 30) would align the job lifecycle with the token lifetime and prevent unexpected long-running scenarios.

⏱️ Suggested addition
 jobs:
   detect-failures:
     name: "Detect test failures"
+    timeout-minutes: 60
     if: >
       (github.event_name == 'workflow_dispatch' ||
       github.repository == 'valkey-io/valkey-ci-agent')

Based on learnings: In the valkey-io/valkey-ci-agent repository, the fuzzer monitor workflow intentionally sets timeout-minutes: 180 as a worst-case safety net, with real runs expected to complete in 10–20 minutes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/test-failure-detector-sweep.yml around lines 28 - 37, The
detect-failures job currently has no explicit timeout and can run up to the
GitHub default (360 minutes); add a timeout-minutes field to the job definition
(the jobs.detect-failures block) to align with the GitHub App token lifetime —
e.g., add timeout-minutes: 60 (or 30) directly under the detect-failures job
metadata (near name/runs-on/permissions) so the job will be forcibly cancelled
if it exceeds that limit.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@scripts/test_failure_detector/download.py`:
- Around line 191-195: The normalized alias can overwrite an existing exact
job-name mapping (job_url_map) causing wrong URLs; ensure you first store the
exact mapping (job_url_map[job.name] = job.html_url) and only add the normalized
alias (normalized) if it differs from job.name and does not already exist as a
key in job_url_map (i.e., check normalized not in job_url_map before assigning
job_url_map[normalized] = job.html_url) so exact names always take precedence.

---

Nitpick comments:
In @.github/workflows/test-failure-detector-sweep.yml:
- Around line 28-37: The detect-failures job currently has no explicit timeout
and can run up to the GitHub default (360 minutes); add a timeout-minutes field
to the job definition (the jobs.detect-failures block) to align with the GitHub
App token lifetime — e.g., add timeout-minutes: 60 (or 30) directly under the
detect-failures job metadata (near name/runs-on/permissions) so the job will be
forcibly cancelled if it exceeds that limit.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 6969c79d-14c2-45cd-8195-bf8a8dfbcc6a

📥 Commits

Reviewing files that changed from the base of the PR and between 88c584f and 4334524.

📒 Files selected for processing (12)
  • .github/workflows/test-failure-detector-sweep.yml
  • README.md
  • docs/architecture.md
  • scripts/test_failure_detector/__init__.py
  • scripts/test_failure_detector/download.py
  • scripts/test_failure_detector/main.py
  • scripts/test_failure_detector/manage_issues.py
  • scripts/test_failure_detector/parse_failures.py
  • tests/test_artifact_fetcher.py
  • tests/test_download.py
  • tests/test_failure_parser.py
  • tests/test_issue_manager.py
✅ Files skipped from review due to trivial changes (2)
  • scripts/test_failure_detector/init.py
  • docs/architecture.md
🚧 Files skipped from review as they are similar to previous changes (8)
  • tests/test_failure_parser.py
  • tests/test_download.py
  • README.md
  • scripts/test_failure_detector/manage_issues.py
  • tests/test_issue_manager.py
  • scripts/test_failure_detector/parse_failures.py
  • scripts/test_failure_detector/main.py
  • tests/test_artifact_fetcher.py

Comment thread scripts/test_failure_detector/download.py
@BChan-0

BChan-0 commented Jun 13, 2026

Copy link
Copy Markdown
Contributor Author

Could you link a successful workflow run or sample issues created during testing? I'd like to confirm the end-to-end flow works, especially that duplicate issues aren't created when the same run is processed twice.

@hanxizh9910 Here's a test issue chain on my fork: BChan-0/valkey#7
Right now, I have the script using fingerprints and idempotency from PR 2's dedup logic (/scripts/common). This PR just handles more dedup in the issue bodies, mostly for environments.

Comment thread .github/workflows/test-failure-detector-sweep.yml Outdated
Comment thread .github/workflows/test-failure-detector-sweep.yml Outdated
@hanxizh9910

Copy link
Copy Markdown

Could you link a successful workflow run or sample issues created during testing? I'd like to confirm the end-to-end flow works, especially that duplicate issues aren't created when the same run is processed twice.

@hanxizh9910 Here's a test issue chain on my fork: BChan-0/valkey#7 Right now, I have the script using fingerprints and idempotency from PR 2's dedup logic (/scripts/common). This PR just handles more dedup in the issue bodies, mostly for environments.

@BChan-0 Thanks for this! I noticed the issue was created today (~5:43 PM Pacific), but the CI links point to runs from two days ago, is it expected?

Also, could you verify the update part works as well? Specifically: when a test fails in some new jobs on the next run, does it add a new comment and update the Environments section?

@BChan-0

BChan-0 commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@BChan-0 Thanks for this! I noticed the issue was created today (~5:43 PM Pacific), but the CI links point to runs from two days ago, is it expected?

Also, could you verify the update part works as well? Specifically: when a test fails in some new jobs on the next run, does it add a new comment and update the Environments section?

@hanxizh9910 Yes, the linked CI runs are correct. I have Daily intentionally skipping and only running environments manually for triggering dummy-flaky runs, so the latest valid test was at that time.

Here's a new issue chain where you can see the environment update working more clearly. The logic is similar to 3358, just housed differently.

@asagege

asagege commented Jun 15, 2026

Copy link
Copy Markdown
Collaborator

@BChan-0 Thanks for this! I noticed the issue was created today (~5:43 PM Pacific), but the CI links point to runs from two days ago, is it expected?
Also, could you verify the update part works as well? Specifically: when a test fails in some new jobs on the next run, does it add a new comment and update the Environments section?

@hanxizh9910 Yes, the linked CI runs are correct. I have Daily intentionally skipping and only running environments manually for triggering dummy-flaky runs, so the latest valid test was at that time.

Here's a new issue chain where you can see the environment update working more clearly. The logic is similar to 3358, just housed differently.

You can put this example in the Test section in the PR description. There is always a Test section in a PR to say that you wrote test1, test2, test3...they all passed 10/10 (in case they have flakiness). Also in some special cases (like skipping), add a good example with CI runs link for each of them

@asagege asagege left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments for README

Comment thread README.md Outdated
Comment thread README.md Outdated
BChan-0 and others added 3 commits June 15, 2026 23:58
Signed-off-by: Bonnie Chan <bonniecv@amazon.com>
Signed-off-by: Bonnie Chan <bonchan35@gmail.com>
…ing to only valkey repo

Signed-off-by: Bonnie Chan <bonniecv@amazon.com>

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer AI Review:

Findings

  • P1 duplicate issue regression: The new publisher dedups only by hidden marker (scripts/common/issue_dedup.py:87, scripts/common/issue_dedup.py:151), but existing open test-failure issues from the old Valkey workflow do not have valkey-ci-agent:test-failure markers. I verified valkey-io/valkey#3992 and #3979 lack the marker, and there are 21 open test-failure issues today. First recurrence after migration will create duplicate issues instead of updating them. Add a migration/backcompat path: lookup by existing [TEST-FAILURE] ... title + label, then inject the marker on update.

  • P1 not mergeable with latest main: PR head b1f32472 conflicts with latest upstream main db76ce22. gh pr view reports mergeable: CONFLICTING; git merge-tree shows conflicts in README.md and docs/architecture.md.

  • P2 unsafe raw fingerprints: scripts/test_failure_detector/issue_renderer.py:28 embeds raw test names and paths into HTML comments and GitHub search queries. Long names, quotes, newlines, or comment-breaking text can break marker search/dedup. Existing shared fuzzer code uses hashed fingerprints via scripts/common/incidents.py; this detector should do the same, with the title fallback above for migration.

  • P3 diff check failure: .github/workflows/test-failure-detector-sweep.yml:66 has trailing whitespace. git diff --check upstream/main...HEAD fails.

@asagege asagege left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments

Comment thread tests/test_artifact_fetcher.py Outdated
Comment thread scripts/test_failure_detector/issue_renderer.py
Comment thread scripts/test_failure_detector/parse_failures.py Outdated
Comment thread .github/workflows/test-failure-detector-sweep.yml
Comment thread .github/workflows/test-failure-detector-sweep.yml Outdated
BChan-0 added 3 commits June 17, 2026 23:46
Resolve README.md and docs/architecture.md by keeping both feature sets:
the Test Failure Detector (this branch) and upstream's CI Fix bot, plus
upstream's backport poll / mark-done / revert workflows.

- README: both workflow-table rows and both feature sections; removed a
  duplicate Test Failure Detector section; adopted upstream's hyphen
  house-style in the new prose.
- architecture.md: both flow sections, both layer entries and entry-point
  lists; Planned Workflows keeps PR Reviewer, Autonomous CI-fix poller, and
  Additional Daily CI Analysis.

Signed-off-by: Bonnie Chan <213245020+BChan-0@users.noreply.github.com>
…ration fallback

Signed-off-by: Bonnie Chan <213245020+BChan-0@users.noreply.github.com>
Signed-off-by: Bonnie Chan <213245020+BChan-0@users.noreply.github.com>
@BChan-0 BChan-0 force-pushed the main branch 3 times, most recently from 57afb88 to 34d036e Compare June 18, 2026 20:05

@asagege asagege left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hanxizh9910 hanxizh9910 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me mostly. Did we test if the issue dedup works with existing issues created by the existing framework?

@BChan-0

BChan-0 commented Jun 23, 2026

Copy link
Copy Markdown
Contributor Author

Looks good to me mostly. Did we test if the issue dedup works with existing issues created by the existing framework?

@sarthakaggarwal97 Thanks for the review. I had implemented tests for that case based on your earlier AI Review, but I just double-checked to confirm for your comment and found that the code for it was actually stashed from my main branch when I was resolving another issue. I've committed everything back into the branch now.

The dedup uses title_fallback to match by exact issue title (and then updates + marks it) when no issues with matching markers are found. If you'd like to see real cases of it, I'm running the Daily CI and detector on schedule in my forks starting today and can link the runs (I can reopen an old closed issue prior from fingerprinting being added to test this dedup).

…printing with test suite

Signed-off-by: Bonnie Chan <213245020+BChan-0@users.noreply.github.com>

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

sarthakaggarwal97 pushed a commit to valkey-io/valkey that referenced this pull request Jun 23, 2026
#### Purpose

This workflow was originally introduced in PR
[#3358](#3358), where we detect
the failures in our scheduled `daily` runs and create / update github
issues.

We want to do more things with AI with respect to tests failures. It
could include potentially finding the root cause, any PR that broke the
tests, some helpful dashboard to track daily tests, maybe some analysis
or possible fix as well.
To achieve that, we are moving this issue management out of this
repository and into `valkey-ci-agent`.

The Daily workflow in this repository still records per-job test
failures, consolidates them into `all-test-failures.json`, and uploads
the `all-test-failures` artifact. The workflow being removed here was
only responsible for consuming that artifact and creating or updating
GitHub issues.

#### Changes

Remove `.github/workflows/test-failure-detector.yml`.

Issue creation and updates are now handled by the Test Failure Detector
workflow in `valkey-ci-agent` through this PR
[#24](valkey-io/valkey-ci-agent#24).

#### Notes

This should be merged together with the corresponding `valkey-ci-agent`
change so scheduled test-failure detection continues without a gap.

Signed-off-by: Bonnie Chan <bonchan35@gmail.com>

@sarthakaggarwal97 sarthakaggarwal97 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merged valkey-io/valkey#3919. Proceeding with this merge as well. Thanks @BChan-0 for the effort.

@sarthakaggarwal97 sarthakaggarwal97 changed the title Migration and Modifications of Test Failure Detection (PR 3358) from valkey-io/valkey Adding support for test failure detection in valkey Jun 23, 2026
@sarthakaggarwal97 sarthakaggarwal97 merged commit 1a96dbf into valkey-io:main Jun 23, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants