fix(bench): predictor DB-localization rule (Runtime gap) #2788

Merged: YauhenBichel merged 2 commits into main from fix/2074-bench-runtime-lost-mode-triage on Jun 9, 2026

@YauhenBichel

Collaborator

Fixes #2074

Describe the changes you have made in this PR -

Add a Database-localization rule to the predictor system prompt to fix the dominant Runtime-stratum failure pattern surfaced by the loss-mode triage on dev-2026-06-09T13:10:13Z. When DB symptoms appear (MySQL connection refused, port mismatch, auth failure, pool exhaustion), the predictor now localizes onto the DB service (app/tsdb-mysql, app/redis-cart) instead of the upstream caller that surfaces the failure.

Root cause from the triage

Runtime loss-mode breakdown on the post-vocab-fix full-N (n=423 cells × 3 modes):

  • OBJECT_MISS (wrong service): 57.9% of opensre+llm Runtime losses
  • OBJECT_HIT_RC_MISS (predictor drift): 24.0%
  • TOP3_a3: 18.0%

Within OBJECT_MISS, tsdb-mysql accounted for 37 of 135 cells (27%). In 20 of those 37, the predictor picked app/ts-inside-payment-service (the immediate DB caller). Within OBJECT_HIT_RC_MISS, the dominant pattern was mysql_invalid_port → db_connection_exhaustion (25 cells). This is the same DB-localization problem at the root_cause level: the LLM reaches for the generic exhaustion bucket rather than the specific port-misconfiguration cause.

Combined: 62 of 233 Runtime losses (37 + 25, or 27%) are a form of DB-localization error.
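The combined figure can be re-derived from the counts quoted above. The snippet below is only an arithmetic check. The variable names are illustrative and do not come from the benchmark code.

```python
# Counts quoted in the triage summary; variable names are illustrative only.
runtime_losses = 233            # total opensre+llm Runtime losses
object_miss = 135               # OBJECT_MISS cells (wrong service)
tsdb_mysql_object_miss = 37     # OBJECT_MISS cells where the true object was tsdb-mysql
port_to_exhaustion = 25         # mysql_invalid_port -> db_connection_exhaustion cells

# DB-localization errors span both the object level and the root_cause level.
db_localization_losses = tsdb_mysql_object_miss + port_to_exhaustion

object_miss_share = object_miss / runtime_losses
tsdb_share_of_object_miss = tsdb_mysql_object_miss / object_miss
db_localization_share = db_localization_losses / runtime_losses

print(f"OBJECT_MISS share of Runtime losses: {object_miss_share:.1%}")
print(f"tsdb-mysql share of OBJECT_MISS:     {tsdb_share_of_object_miss:.0%}")
print(f"DB-localization share of losses:     {db_localization_share:.0%}")
```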

Code Understanding and AI Usage

Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?

  • No, I wrote all the code myself
  • Yes, I used AI assistance (continue below)

If you used AI assistance:

  • I have reviewed every single line of the AI-generated code
  • I can explain the purpose and logic of each function/component I added
  • I have tested edge cases and understand how the code handles them
  • I have modified the AI output to follow this project's coding standards and conventions

Explain your implementation approach:

| File | Change |
| --- | --- |
| tests/benchmarks/cloudopsbench/predictor/llm_call.py | Add a Database-localization rule block between the namespace-scope rule and the performance-fault disambiguation rules in _build_system_prompt(). The framing mirrors the existing namespace-scope rule (faults live at their origin level). |

The rule has three parts:

  1. Localization principle: when DB symptoms are present and a DB service is named, fault_object MUST be the DB service, not the upstream caller. The rule explicitly calls out the ts-inside-payment-service substitution as the wrong-localization pattern.
  2. MySQL root_cause disambiguation: distinguishes mysql_invalid_port (port mismatch, "connection refused" on non-3306), mysql_invalid_credentials (auth failure), and db_connection_exhaustion (actual pool saturation). Targets the 25-cell mysql_invalid_port → db_connection_exhaustion confusion.
  3. Tiebreaker: when uncertain between mysql_invalid_port and db_connection_exhaustion, prefer mysql_invalid_port, because exhaustion is the over-fired generic bucket on this corpus.
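As a rough illustration of how a rule block like this could sit between two existing blocks in a prompt builder, here is a minimal sketch. The constant names, the function name build_system_prompt_sketch, and the rule wording are all assumptions made for this example. They paraphrase the three parts above and are not the text actually added to _build_system_prompt().

```python
# Hypothetical placeholders for the two neighbouring rule blocks.
NAMESPACE_SCOPE_RULE = "Namespace-scope rule: <existing text elided>"
PERFORMANCE_FAULT_RULES = "Performance-fault disambiguation rules: <existing text elided>"

# Paraphrase of the three parts described in this PR; wording is illustrative only.
DB_LOCALIZATION_RULE = """\
Database-localization rule:
- If the evidence shows DB symptoms (connection refused, port mismatch,
  auth failure, pool exhaustion) and names a DB service, fault_object MUST be
  that DB service (e.g. app/tsdb-mysql), not the upstream caller
  (e.g. app/ts-inside-payment-service).
- MySQL root_cause: mysql_invalid_port for port mismatch or refused
  connections on a non-3306 port; mysql_invalid_credentials for auth failure;
  db_connection_exhaustion only for actual pool saturation.
- Tiebreaker: if unsure between mysql_invalid_port and
  db_connection_exhaustion, prefer mysql_invalid_port."""


def build_system_prompt_sketch() -> str:
    """Join the rule blocks in the order described in the PR (sketch only)."""
    blocks = [NAMESPACE_SCOPE_RULE, DB_LOCALIZATION_RULE, PERFORMANCE_FAULT_RULES]
    return "\n\n".join(blocks)


prompt = build_system_prompt_sketch()
print(prompt)
```

Keeping each rule as its own block makes the ordering explicit and lets a rule be added or reverted without touching its neighbours.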

Checklist before requesting a review

  • I have added proper PR title and linked to the issue
  • I have performed a self-review of my code
  • I can explain the purpose of every function, class, and logic block I added
  • I understand why my changes work and have tested them thoroughly
  • I have considered potential edge cases and how my code handles them
  • If it is a core feature, I have added thorough tests
  • My code follows the project's style guidelines and conventions

Note: Please check Allow edits from maintainers if you would like us to assist in the PR.

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

Greptile code review

This repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md.

Run a review — add a PR comment with:

@greptile review

Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5.

Optional: automate with the greploop skill.

@YauhenBichel

Collaborator Author

@greptile review

@greptile-apps

greptile-apps Bot commented Jun 9, 2026

Contributor

Greptile Summary

Adds a "Database-localization rule" block to the predictor system prompt in _build_system_prompt() to fix the dominant Runtime-stratum failure pattern identified by loss-mode triage: the LLM was localizing DB faults onto upstream callers instead of the DB service itself. The new block is inserted between the namespace-scope rule and the performance-fault disambiguation rules, following the same structural pattern as the existing rules.

  • DB-localization principle: when DB symptoms are present and a DB service (tsdb-mysql, redis-cart) is named, fault_object must be the DB service — not the upstream caller experiencing the downstream effect.
  • MySQL root_cause disambiguation: provides explicit evidence cues to distinguish mysql_invalid_port, mysql_invalid_credentials, and db_connection_exhaustion, with a tiebreaker preferring mysql_invalid_port over the over-fired generic exhaustion bucket.
  • Redis root_cause disambiguation: mirrors the MySQL section with cues for missing_secret_binding, db_readonly_mode, and db_connection_exhaustion.

Confidence Score: 5/5

Safe to merge — the change is purely additive prompt text with no modifications to parsing, validation, or any runtime code path.

All changes are confined to the static string returned by _build_system_prompt(). No control-flow, schema validation, scoring logic, or external calls are touched. The new rule block follows the exact structural pattern of existing rules (namespace-scope, performance-fault), and the MySQL tiebreaker is grounded in the triage data cited in the PR description.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| tests/benchmarks/cloudopsbench/predictor/llm_call.py | Adds 52 lines of prompt text for DB-localization, MySQL root-cause disambiguation with tiebreaker, and Redis root-cause disambiguation; no logic changes to parsing, validation, or runtime code paths. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM receives alert + investigation summary] --> B{Investigation summary present?}
    B -- Yes --> C[Treat as AUTHORITATIVE for rank-1]
    B -- No --> D[Reason from alert alone]
    C --> E{Root cause type?}
    D --> E

    E -- namespace_* token --> F["fault_object = namespace/X - Scope Rule"]
    E -- DB symptoms named --> G{Which DB service?}
    E -- Performance anomaly --> H["fault_object = service with highest saturation / latency - Performance-fault Rule"]
    E -- Other --> I[Standard ranking]

    G -- tsdb-mysql / MySQL error --> J["fault_object = app/tsdb-mysql - DB-localization Rule"]
    G -- redis-cart / Redis error --> K["fault_object = app/redis-cart - DB-localization Rule"]

    J --> L{MySQL root_cause?}
    L -- port mismatch / refused on non-3306 --> M[mysql_invalid_port]
    L -- access denied / auth failed --> N[mysql_invalid_credentials]
    L -- too many connections / pool at limit --> O[db_connection_exhaustion]
    L -- Uncertain --> M

    K --> P{Redis root_cause?}
    P -- NOAUTH / WRONGPASS / no requirepass --> Q[missing_secret_binding]
    P -- READONLY replica --> R[db_readonly_mode]
    P -- max clients reached / maxclients --> S[db_connection_exhaustion]
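The DB branch of the flowchart can also be read as a small decision table. The sketch below encodes it as deterministic Python purely to make the branching concrete. The actual change is prompt text interpreted by an LLM, so the function name localize_db_fault, the cue lists, and the substring matching are assumptions for illustration. In particular, the port check is simplified and does not inspect the port number.

```python
from typing import List, Optional, Tuple

# Cue phrases taken from the flowchart labels, lower-cased for matching.
MYSQL_CUES: List[Tuple[str, Tuple[str, ...]]] = [
    ("mysql_invalid_credentials", ("access denied", "auth failed")),
    ("db_connection_exhaustion", ("too many connections", "pool at limit")),
    ("mysql_invalid_port", ("port mismatch", "connection refused")),
]
REDIS_CUES: List[Tuple[str, Tuple[str, ...]]] = [
    ("missing_secret_binding", ("noauth", "wrongpass")),
    ("db_readonly_mode", ("readonly",)),
    ("db_connection_exhaustion", ("max clients reached", "maxclients")),
]


def _match(cues: List[Tuple[str, Tuple[str, ...]]], evidence: str) -> Optional[str]:
    """Return the first root_cause whose cue phrase appears in the evidence."""
    text = evidence.lower()
    for root_cause, needles in cues:
        if any(needle in text for needle in needles):
            return root_cause
    return None


def localize_db_fault(service: str, evidence: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: map a named DB service and evidence to (fault_object, root_cause)."""
    if service == "tsdb-mysql":
        # Tiebreaker from the flowchart: an uncertain MySQL case falls to mysql_invalid_port.
        return "app/tsdb-mysql", _match(MYSQL_CUES, evidence) or "mysql_invalid_port"
    if service == "redis-cart":
        # The flowchart shows no default for Redis, so an unmatched case stays None.
        return "app/redis-cart", _match(REDIS_CUES, evidence)
    raise ValueError(f"not a DB service covered by this sketch: {service}")


print(localize_db_fault("tsdb-mysql", "Connection refused on port 3307"))
print(localize_db_fault("redis-cart", "NOAUTH Authentication required"))
```

In both branches the fault_object is the DB service itself, regardless of which upstream caller surfaced the error.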

Reviews (3): Last reviewed commit: "added redis" | Re-trigger Greptile

Comment thread tests/benchmarks/cloudopsbench/predictor/llm_call.py Outdated
@YauhenBichel

Collaborator Author

@greptile review

@YauhenBichel YauhenBichel marked this pull request as ready for review June 9, 2026 17:47
@YauhenBichel YauhenBichel merged commit be6d2a5 into main Jun 9, 2026
17 checks passed
@YauhenBichel YauhenBichel deleted the fix/2074-bench-runtime-lost-mode-triage branch June 9, 2026 17:49
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

🧠 @YauhenBichel opened a PR. Maintainers feared them. CI genuflected. It merged. 🚨



Development

Successfully merging this pull request may close these issues.

Benchmark opensre+LLM vs LLM-alone (Cloudopsbench)