feat(bench): full-N readiness + SHA capture fix#2799
Conversation
Greptile code reviewThis repo uses Greptile for automated review. Before merge, aim for Confidence Score: 5/5 with zero unresolved review threads — see CONTRIBUTING.md. Run a review — add a PR comment with: Give it ~5-10 minutes (sometimes longer) for results, then fix feedback and re-trigger until you reach Confidence Score: 5/5. Optional: automate with the greploop skill. |
terraform-bench plan
Plan outputTerraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# aws_ecs_task_definition.bench must be replaced
-/+ resource "aws_ecs_task_definition" "bench" {
~ arn = "arn:aws:ecs:us-east-1:395261708130:task-definition/opensre-bench:34" -> (known after apply)
~ arn_without_revision = "arn:aws:ecs:us-east-1:395261708130:task-definition/opensre-bench" -> (known after apply)
~ container_definitions = jsonencode(
~ [
~ {
~ image = "395261708130.dkr.ecr.us-east-1.amazonaws.com/opensre-bench:3d20513" -> "395261708130.dkr.ecr.us-east-1.amazonaws.com/opensre-bench:bootstrap"
- mountPoints = []
name = "bench"
- portMappings = []
- systemControls = []
- volumesFrom = []
# (4 unchanged attributes hidden)
},
] # forces replacement
)
~ enable_fault_injection = false -> (known after apply)
~ id = "opensre-bench" -> (known after apply)
~ revision = 34 -> (known after apply)
- tags = {} -> null
# (10 unchanged attributes hidden)
}
Plan: 1 to add, 0 to change, 1 to destroy.
Changes to Outputs:
~ task_definition_arn = "arn:aws:ecs:us-east-1:395261708130:task-definition/opensre-bench:34" -> (known after apply)
Updated by |
|
@greptile review |
Greptile SummaryThis PR fixes the SHA capture regression that caused Fargate runs to stamp
Confidence Score: 5/5Safe to merge; all changes are scoped to the benchmark framework and its configuration — no production application code is touched. The three-layer SHA fix (env var resolution, Dockerfile ARG/ENV, runtime regex gate) is coherent and well-tested. The pre-registration and run configs are documentation-heavy and self-contained. The only findings are a stale docstring example, a cheap-before-expensive check ordering preference, and a missing dirty-suffix test case — none affect runtime correctness. No files require special attention; the runner.py changes have dedicated test coverage in the new test_runner_sha_integrity.py. Important Files Changed
Sequence DiagramsequenceDiagram
participant WF as GitHub Actions Workflow
participant Docker as Dockerfile.bench
participant Runner as BenchmarkRunner.run()
participant GitSHA as _git_sha()
participant Gate as _validate_promotable_sha()
WF->>Docker: "--build-arg OPENSRE_SHA=github.sha"
Docker->>Docker: ARG OPENSRE_SHA / ENV OPENSRE_SHA
Note over Docker: Runtime container has OPENSRE_SHA set
Runner->>GitSHA: _git_sha() called in __init__
GitSHA->>GitSHA: os.environ.get(OPENSRE_SHA)
alt OPENSRE_SHA set (Fargate path)
GitSHA-->>Runner: returns full 40-char SHA
else OPENSRE_SHA empty (local dev path)
GitSHA->>GitSHA: git rev-parse --short HEAD
GitSHA-->>Runner: returns short SHA or (no-git)
end
Runner->>Runner: integrity.pre_flight()
Runner->>Gate: _validate_promotable_sha(self._opensre_sha)
alt "SHA matches ^[0-9a-f]{7,40}$"
Gate-->>Runner: passes
Runner->>Runner: "_run_inner(dev_mode=False)"
else Invalid SHA
Gate-->>Runner: raises IntegrityViolation
end
Reviews (3): Last reviewed commit: "fixed git SHA related issues" | Re-trigger Greptile |
|
@greptile review |
|
🛸 Aliens watching our repo just upgraded @YauhenBichel's threat level to: do not engage — too competent. 👽 👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome. |

Fixes #2074
Describe the changes you have made in this PR -
Three things in one PR, all in service of getting the DB-evidence
pipeline full-N runnable and promotable on #2074:
V2 prompt — alert-anchored upstream-attribution rule. When the
alert names a service and describes a performance / latency /
network / resource problem, trust the alert's named service.
Downstream services with loud error logs are usually victims of the
slow upstream, not causes. Added to the trimmed bench prompt
alongside the existing dependency-traversal rule.
Full-N pre-registration + run config. Locks predictions for each
fault category (Admission +20pp, Performance -5pp, aggregate +5pp)
before seeing data. Decision matrix: ship if all four gates pass;
roll back if Admission or Performance hardens beyond the threshold.
Includes the explicit "do not iterate prompts" rollback rule learned
from the V3 smoke regression.
SHA capture bug fix. The previous full-N stamped
opensre_sha=(no-git)despitedev_mode=false. The Fargatecontainer has no
.gitdirectory, and the integrity gate didn'tenforce the pre-reg's
committed_checkout_required: true. Now:_git_sha()reads theOPENSRE_SHAenv var first.--build-arg OPENSRE_SHA=<tag>.Dockerfile.benchexposes it as a runtime env var.BenchmarkRunner.run()rejects(no-git)/(unknown)SHAs atstart so unverifiable artifacts cannot reach a report
Code Understanding and AI Usage
Did you use AI assistance (ChatGPT, Claude, Copilot, etc.) to write any part of this code?
If you used AI assistance:
Explain your implementation approach:
the score worse. The V2 rule (this PR) is what survived. The test
pins that V3 stays gone.
investigation_a1/translation_loss) shipped separately in feat(bench): investigation_a1 + translation_loss metrics #2798.~/DevBox/tracer-cloud/opensre-notes/as a draft; not in this PR.
Checklist before requesting a review
Note: Please check Allow edits from maintainers if you would like us to assist in the PR.