Agent Persona Exploration - 2026-03-30 #23507
Closed
This discussion has been marked as outdated by Agent Persona Explorer. A newer discussion is available at Discussion #23631.
Persona Overview
`developer.instructions` (agentic-workflows custom agent)

Personas & Scenarios Tested
Key Findings
Engine selection: `claude` for reasoning-heavy analysis, `codex` for sequential log investigation, `copilot` for lighter classification/writing tasks.
Workflows used `strict: true` and scoped bash allowlists, wrote only via `safe-outputs`, and applied `noop` early-exit conditions.
Credentials stay in `steps:` pre-processing blocks and are never accessible to the agent.

Top Patterns
PR workflows use `pull_request` with `[opened, synchronize, reopened/ready_for_review]`. Scheduled workflows pair `schedule:` with `workflow_dispatch:` for manual re-runs. Event-driven workflows use `workflow_run` with an `if:` conclusion guard.
`repo-memory` is recommended for cross-run persistence (baselines, trend tracking). `web-fetch` is used where external API lookups are needed.
`strict: true` on every workflow; `concurrency: cancel-in-progress: true` on all PR workflows; `hide-older-comments: true` on PR comment outputs; `max:` limits on all safe-outputs; `noop` condition for empty/no-op runs.

View High Quality Responses (Top 3)
Cost Anomaly Report (DevOps-2): Most architecturally sophisticated. Introduced a two-phase design where AWS credentials exist only in the `steps:` pre-processing block; the agent reads a pre-fetched JSON file and never has network access to AWS. Included a complete IAM policy with minimum permissions, an OIDC trust policy pinned to a specific repo, and first-run behavior documentation. Used `repo-memory` for an 8-week rolling baseline.

Flaky Test Detector (QA-2): Strongest cross-run reasoning design. Used `repo-memory` for week-over-week trend tracking, included a code-change correlation filter to distinguish "broken by regression" from "truly flaky", applied two-layer noise filtering (≥5 runs AND ≥10% failure rate), and correctly set `close-older-issues: false` (flaky tests are engineering debt, not auto-resolvable reports).

Deployment Failure Analyst (DevOps-1): Introduced pre-download `steps:` to fetch logs before the agent starts, reducing the agent's token usage and maintaining a security boundary. Engine choice of `codex` (vs. `claude`) was well-justified for sequential, tool-heavy investigation tasks.

View Areas for Improvement
1. Inconsistent `paths:` filter adoption on PR workflows

The bundle size workflow correctly used `paths:` to avoid running on documentation-only PRs, but the schema migration and test coverage workflows did not. For tasks that only matter when specific file types change, `paths:` filters reduce CI cost and agent noise significantly. The agent documentation (`.github/aw/create-agentic-workflow.md` or similar) could more strongly recommend `paths:` for PR-triggered workflows.

2. Inconsistent `min-integrity` guidance

The schema migration workflow used `min-integrity: none` (correct for fork PRs) while the weekly digest used `min-integrity: approved`. The agent makes reasonable choices per context, but explicit guidance on when to use each value would reduce per-scenario variability. Consider adding a decision table to the workflow authoring guide in `.github/aw/*.md`.

3. First-run behavior for `repo-memory` workflows not consistently documented in prompts

The cost anomaly workflow explicitly documented first-run behavior ("baseline won't exist yet — skip baseline comparison, write initial snapshot"). The flaky test and performance workflows did not include equivalent handling, which could cause agent confusion on the first run. A prompt pattern for graceful `repo-memory` cold-starts would be a useful addition to the documentation in `.github/aw/*.md`.

Recommendations
Add `paths:` filter examples to the workflow creation guide (`.github/aw/*.md`): show PR-triggered workflows using `paths:` to scope execution to relevant file types. This is a high-leverage optimization for reducing CI cost and agent token consumption.

Add a `min-integrity` decision table to the documentation: clarify when to use `none` (any contributor, fork PRs), `low`, and `approved`. The current behavior is correct but inconsistent across scenarios; explicit guidance would standardize it.

Add a `repo-memory` cold-start pattern to the prompt library: a reusable snippet showing how to handle the first run when no baseline/history file exists yet. This prevents agent confusion in week 1 of any memory-enabled scheduled workflow.
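
The cold-start behavior quoted from the cost anomaly workflow (skip the baseline comparison, write an initial snapshot) can be sketched as plain branching logic. This is an illustration only, not the `repo-memory` API or any workflow's actual code: the function name `update_baseline`, the JSON list file layout, and the 25% anomaly threshold are all hypothetical. Only the 8-week window comes from the report.

```python
import json
from pathlib import Path
from statistics import mean

WINDOW = 8  # weeks of history to keep (the report mentions an 8-week rolling baseline)


def update_baseline(path: Path, current: float, threshold: float = 0.25) -> dict:
    """Compare `current` to the stored baseline, then persist the new snapshot.

    On the first run the history file does not exist, so the comparison is
    skipped and only the initial snapshot is written.
    """
    # Hypothetical layout: the file holds a JSON list of past weekly values.
    history = json.loads(path.read_text()) if path.exists() else []

    if not history:
        # Cold start: no baseline yet, so report that instead of an anomaly.
        result = {"status": "cold-start", "baseline": None, "anomaly": False}
    else:
        baseline = mean(history)
        # Hypothetical rule: flag values more than `threshold` above baseline.
        anomaly = current > baseline * (1 + threshold)
        result = {"status": "compared", "baseline": baseline, "anomaly": anomaly}

    # Append the new value and keep only the most recent WINDOW entries.
    history = (history + [current])[-WINDOW:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(history))
    return result
```

A prompt snippet for the library would state the same two branches in prose: if the history file is missing or empty, record the current value and report that no comparison was possible; otherwise compare against the stored baseline before updating it.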