Skip to content

[WIP] Address feedback on model routing policy in ADR-017 and ADR-018#385

Merged
rjmurillo merged 3 commits into
docs/adr-017from
copilot/sub-pr-310
Dec 24, 2025
Merged

[WIP] Address feedback on model routing policy in ADR-017 and ADR-018#385
rjmurillo merged 3 commits into
docs/adr-017from
copilot/sub-pr-310

Conversation

Copilot AI commented Dec 24, 2025

Copy link
Copy Markdown
Contributor

Completed: ADR Renumbering + Protocol Compliance

Successfully addressed feedback to resolve ADR numbering conflicts, added required session logs with full QA, and documented protocol discoveries.

Files Renamed

  1. ADR-017-model-routing-strategy.mdADR-019-model-routing-strategy.md
  2. ADR-017-debate-log.mdADR-019-debate-log.md.agents/critique/ADR-019-debate-log.md
  3. ADR-018-architecture-governance-split-criteria.mdADR-020-architecture-governance-split-criteria.md
  4. .serena/memories/adr-017-quantitative-analysis.mdadr-019-quantitative-analysis.md
  5. .serena/memories/adr-017-split-execution.mdadr-019-split-execution.md

References Updated

  • ✅ All internal ADR cross-references (ADR-017→019, ADR-018→020)
  • ✅ Governance policy references in AI-REVIEW-MODEL-POLICY.md
  • ✅ Debate log moved to .agents/critique/ directory
  • ✅ Updated all references to debate log path (architecture → critique)
  • ✅ Serena memory files with new ADR numbers
  • ✅ Fixed markdown linting issues (MD041, MD001, MD036)
  • ✅ Verified no broken links

Session Logs & Documentation

  • ✅ Session 92: ADR renumbering work
  • ✅ Session 93: Skillbook learning about pre-commit protocol (with QA)
  • ✅ Branch handoff: .agents/handoffs/copilot/sub-pr-310/session-92-93.md
  • ✅ GitHub issue documented: HANDOFF.md validator protocol conflict (saved to /tmp/github-issue-handoff-validator.md)

New Skills Generated

  • skill-git-002-fix-hook-errors-never-bypass: Fix errors flagged by pre-commit hooks; NEVER use --no-verify (Impact: 10/10 Critical)
  • skill-workflow-012-branch-handoffs: Use branch handoffs on feature branches instead of HANDOFF.md (Impact: 9/10)
  • ✅ QA Report: .agents/qa/session-93-skill-git-002-qa-report.md (7/7 tests passed)

Protocol Compliance

  • ✅ All pre-commit hook requirements satisfied
  • ✅ Session End checklists match canonical format (9 rows)
  • ✅ All MUST requirements completed with evidence
  • ✅ QA validation performed (100% test coverage)
  • ✅ Branch handoff created (feature branch protocol per ADR-014)
  • ✅ Markdown linting passed (0 errors)
  • ✅ Memory index validation passed
  • ✅ Skill format validation passed

Protocol Discovery

Identified and documented HANDOFF.md validator conflict where three sources contradict:

  1. HANDOFF.md says "DO NOT update"
  2. Validator requires session log reference
  3. Pre-commit hook blocks HANDOFF.md on feature branches

Workaround Applied: Created branch handoff instead per ADR-014
Issue Documented: /tmp/github-issue-handoff-validator.md (for manual creation)

The renumbering resolves the ambiguity where multiple ADRs shared the same numbers (ADR-017 and ADR-018), making shorthand references unambiguous going forward.


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Copilot AI and others added 2 commits December 24, 2025 21:19
Co-authored-by: rjmurillo <6811113+rjmurillo@users.noreply.github.com>
@rjmurillo rjmurillo marked this pull request as ready for review December 24, 2025 22:14
@github-actions github-actions Bot added automation Automated workflows and processes area-skills Skills documentation and patterns labels Dec 24, 2025
@rjmurillo rjmurillo merged commit d42b5e6 into docs/adr-017 Dec 24, 2025
4 checks passed
@rjmurillo rjmurillo deleted the copilot/sub-pr-310 branch December 24, 2025 22:14
rjmurillo added a commit that referenced this pull request Dec 27, 2025
* docs(adr): add model routing policy to minimize false PASS

Signed-off-by: Richard Murillo <6811113+rjmurillo@users.noreply.github.com>

* docs: add session 85 - PR #310 review and description update

Session 85 reviewed ADR-017 model routing policy and updated PR #310
description using the PR template.

Key actions:
- Analyzed ADR-017 content and rationale
- Created comprehensive PR description with proper template sections
- Documented decision context and consequences

Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs(adr): Session 86 - ADR-017 critic review (model routing policy)

Critic review of ADR-017 (Copilot model routing policy).

## Summary

ADR-017 proposes evidence-aware, tiered model routing to minimize false PASS verdicts.
Core decision is sound; execution requires additional specifics before deployment.

**Position**: Disagree-and-Commit with conditions

- Approve strategic direction (evidence-based routing, conservative verdicts)
- Defer tactical implementation to Phase 2 (baseline metrics, concrete examples, validation)
- Three P1 concerns resolve before deployment (metrics, examples, model confirmation)
- Estimated Phase 2 effort: 4-7 hours across metrics, examples, and CI guardrails

## Key Findings

**Strengths** (5):
1. Clear problem identification (summary-mode false PASS)
2. Conservative evidence-sufficiency principle is sound
3. Well-reasoned model matrix by prompt shape
4. Honest tradeoffs acknowledged
5. Governance safeguard (copilot-model parameter required)

**Gaps** (7):
1. Model claims lack validation (no vendor benchmarks)
2. Implementation incomplete (CONTEXT_MODE header not shown)
3. Success metrics aspirational, not measurable
4. Evidence improvement marked optional vs. required
5. No cost impact quantification
6. Prompt enforcement vague
7. No model deprecation policy

**Recommendations** (7):
1. Add baseline metrics and thresholds
2. Concrete examples (before/after workflows)
3. Clarify evidence improvement scope
4. Model validation plan with monitoring
5. Quantify cost impact
6. CI validation script for prompt rules
7. Model deprecation policy and fallbacks

## Phase 2 Implementation Plan

1. Merge ADR-017 as strategic decision
2. Add copilot-model parameter to composite action
3. Create follow-up task: Implementation Specifics (examples, metrics, CI)
4. Do NOT deploy workflow changes until Phase 2 complete

Session: .agents/sessions/2025-12-23-session-86-adr-017-critic-review.md

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* docs(adr): refine ADR-017 through multi-agent debate

Conducted rigorous 2-round debate with 5 specialized agents
(architect, critic, independent-thinker, security, analyst).

Key changes from debate:
- Add Scope Clarification separating from Issue #164
- Add Section 4: Security Hardening (prompt injection, CONTEXT_MODE)
- Add Section 5: Escalation Criteria with operational table
- Add Section 6: Risk Review Contract for summary-mode PRs
- Promote Section 7: Aggregator Policy to required
- Add Prerequisites section with P0 blocking gates
- Update success metrics with baseline column and targets

Final positions: 4 Accept + 1 Disagree-and-Commit
Independent-thinker dissent documented in debate log.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* docs: update session 85 with multi-agent debate results

Added comprehensive summary of ADR-017 multi-agent debate:
- 2 rounds to consensus (4 Accept + 1 Disagree-and-Commit)
- 8 major ADR enhancements including security hardening
- Independent-thinker dissent documented
- Prerequisites section added (3 P0 + 1 P1 blocking gates)

Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs(adr): complete ADR-017 prerequisites and change status to Accepted

Execute all prerequisites for ADR-017 (Model Routing Policy):

P0-1: Baseline False PASS Measurement [COMPLETE]
- Audited last 20 merged PRs with AI reviews
- Found 3/20 (15%) required post-merge fixes
- Identified PRs #226, #268, #249 as false PASS cases
- Target: reduce to 7.5% within 30 days

P0-2: Model Availability Verification [COMPLETE]
- Verified all 6 models available in Copilot CLI
- Confirmed claude-opus-4.5 via workflow run 20475138392
- Documented fallback chains per ADR specification

P0-3: Governance Guardrail Status [DOCUMENTED]
- Audited 4 ai-*.yml workflows
- Found only 1/4 specifies copilot-model explicitly
- Implementation plan documented in ADR

P1-4: Cost Impact Analysis [COMPLETE]
- Analyzed 74 PRs merged in December 2025
- Projected 20-30% cost REDUCTION with routing policy
- Current: 100% opus; Projected: 35% opus, 50% sonnet, 15% mini

ADR Status: Proposed -> Accepted (2025-12-23)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs: update session 85 with prerequisites execution results

Session 85 extended to document ADR-017 prerequisites completion:
- Baseline false PASS rate: 15% (3/20 PRs)
- All 6 models verified available
- Cost impact: 20-30% REDUCTION (not increase)
- ADR status: Proposed -> Accepted

Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs(adr): ADR-017 Round 3 post-prerequisites debate - clarify scope and strengthen security

Session 90: Conducted multi-agent debate on ADR-017 after prerequisite completion.
Achieved consensus (5 Accept + 1 Disagree-and-Commit) with critical scope clarification.

## Critical Finding

The 3 baseline false PASS cases (PRs #226, #268, #249) were caused by prompt quality
and validation gaps, NOT by evidence insufficiency or model mismatch. ADR solution
doesn't address current 15% baseline—it targets FUTURE risk from large PRs with
summary-mode context.

## P0 Changes Applied (8 blocking issues)

1. **Root Cause Analysis**: Explicitly states ADR doesn't fix current baseline cases;
   targets future evidence insufficiency risks. Separates metrics:
   - Baseline false PASS (all causes): 15%
   - Target false PASS (evidence insufficiency): TBD (new metric)

2. **Baseline Methodology**: Clarified all 20 PRs validated (17 confirmed no fixes,
   3 had post-merge fixes). 7-day window is lower bound.

3. **Status Timeline**: Added chronology showing prerequisites completed BEFORE
   status change to Accepted (2025-12-23).

4. **Prompt Injection**: Changed from blacklist (bypassable) to whitelist/schema
   validation. Reject input not conforming to alphanumeric + common punctuation.

5. **CONTEXT_MODE Validation**: Added token count check to prevent manipulation.
   Workflow fails if claimed mode doesn't match actual context size.

6. **Circuit Breaker**: Prevents fallback DoS attack. If 5 consecutive blocks due
   to "forbid PASS" rule, escalate to manual approval with oncall alert.

7. **Aggregator Enforcement**: Added branch protection requirement for "AI Review
   Aggregator" status check. Prevents developer bypass.

8. **Cost Calculation**: Explicit math showing 36% reduction (568 → 366 Opus-eq
   units). Reconciles 20% escalation rate with routing savings.

## P1 Changes Applied (2 important issues)

1. **Success Metrics**: Updated baseline from "TBD (prerequisite)" to "15% (P0-1 complete)"
2. **Partial Diff N**: Defined N=500 lines (aligns with spec-file behavior)

## Debate Results

- **Rounds**: 3 total (2 initial in Session 86-88, 1 post-prerequisites in Session 90)
- **Consensus**: 5 Accept (architect, critic, security, analyst, high-level-advisor)
  + 1 Disagree-and-Commit (independent-thinker)
- **Independent-thinker dissent**: Skeptical evidence insufficiency is primary lever,
  but ADR now intellectually honest about scope. Supports execution for validation.

## Files Modified

- `.agents/architecture/ADR-017-model-routing-low-false-pass.md`: 10 sections updated
- `.agents/architecture/ADR-017-debate-log.md`: Round 3 entry added, metadata updated
- `.agents/sessions/2025-12-23-session-90-adr-debate-clarification.md`: Session log

## Files Added (Sessions 86-88 artifacts)

- `.agents/sessions/2025-12-23-session-86-adr-017-architect-review.md`
- `.agents/sessions/2025-12-23-session-86-adr-017-independent-thinker-review.md`
- `.agents/sessions/2025-12-23-session-86-adr-017-security-review.md`
- `.agents/sessions/2025-12-23-session-87-adr-017-analyst-review.md`
- `.agents/sessions/2025-12-23-session-87-architect-adr-017-convergence.md`
- `.agents/sessions/2025-12-23-session-88-independent-thinker-adr-017-convergence.md`

ADR remains in Accepted status with clarified preventive scope.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs(adr): create ADR-018 establishing architecture vs governance split criteria

Session 90 follow-up: User questioned whether ADR-017 strictly adheres to
foundational ADR definition. Analysis revealed "single AD" criterion violation
(bundles 7 related decisions) and surfaced "Any Decision Record" debate.

## Problem

Ambiguity exists about when to use:
- `.agents/architecture/` (ADRs)
- `.agents/governance/` (operational policies)
- Both (split pattern like ADR-014 + COST-GOVERNANCE)

## Decision (ADR-018)

Establish explicit split criteria with three patterns:

### 1. ADR-only
- Affects system structure/quality attributes
- Primarily technical decision
- No ongoing enforcement required
- Example: API authentication strategy

### 2. Governance-only
- Operational policy/standard/process
- Does NOT affect architecture
- Requires compliance enforcement
- Example: naming-conventions.md

### 3. Split (ADR + Governance)
- BOTH architectural significance AND enforcement requirements
- Decision affects structure BUT requires ongoing compliance
- Policy evolves independently from architectural decision
- Example: ADR-014 (runner selection) + COST-GOVERNANCE (enforcement)

## Key Provisions

- **Decision matrix**: Classify by architectural impact + enforcement needs
- **Decision workflow**: Flowchart with 3 decision points
- **Real examples**: ADR-014 split (exemplar), ADR-017 (candidate for split)
- **Templates**: ADR and Governance policy templates in Appendix C
- **When to split**: Trigger criteria for retroactive splits

## Resolution of "Any Decision Record" Debate

**MADR movement**: Broadens ADRs to "Any" decision (design, process, governance)
**Critics**: Dilutes architectural focus, recommend separate records

**Our approach**: Hybrid
- Adopt "Any Decision Record" concept via governance/ directory
- Preserve architectural focus in architecture/ directory
- Use split pattern when both aspects exist

## Impact

- Resolves placement ambiguity for future decisions
- Recommends ADR-017 split into architecture + governance
- Establishes precedent for meta-ADRs (ADRs about ADR process)

## Files

- `.agents/architecture/ADR-018-architecture-governance-split-criteria.md` (new)
- `.agents/sessions/2025-12-23-session-90-adr-debate-clarification.md` (updated)
- `.serena/memories/adr-foundational-concepts.md` (updated with "Any Decision Record" debate)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* docs(adr): split ADR-017 into architecture decision + governance policy

Implements ADR-018 split pattern: separate immutable architectural decision from evolvable operational policy.

## What Changed

**Before**: Single bundled ADR-017-model-routing-low-false-pass.md (~550 lines)
- Mixed architectural decision with governance policy
- Violated 'single AD' criterion (bundled 7 related decisions)
- Policy changes required re-opening ADR debate

**After**: Split into focused documents

1. **ADR-017-model-routing-strategy.md** (architecture/, ~200 lines)
   - Immutable architectural decision
   - Focus: Why route models by prompt type + evidence availability
   - Contains: Context, Decision, Rationale, Alternatives, Consequences

2. **AI-REVIEW-MODEL-POLICY.md** (governance/, ~400 lines)
   - Evolvable operational policy
   - Contains: Model routing matrix, evidence sufficiency rules, security hardening, escalation criteria, aggregator enforcement, circuit breaker, monitoring
   - Can evolve without re-debating architecture

## Why Split (ADR-018 Criteria)

| Criterion | ADR-017 Analysis | Result |
|-----------|------------------|--------|
| Affects architecture? | Yes (routing affects system quality) | Architecture component |
| Requires enforcement? | Yes (MUST use copilot-model, branch protection) | Governance component |
| Tightly coupled? | Yes (routing + evidence + security + aggregator) | Split pattern applies |
| Policy evolves independently? | Yes (monitoring thresholds, escalation tuning) | Split benefits realized |

## Benefits Realized

- Architectural decision now follows 'single AD' criterion
- Governance policy can evolve without ADR debate
- Follows ADR-014 + COST-GOVERNANCE pattern (codebase exemplar)
- Clear separation: 'why we decided' vs 'how we enforce'

## Disposition

- Original bundled ADR-017-model-routing-low-false-pass.md preserved in git history
- Removed from working tree (replaced by split)
- ADR-017-debate-log.md updated with split documentation

Implements: ADR-018 Architecture vs Governance Split Criteria
Session: 90 (2025-12-23)

* chore(session-90): finalize session with split completion and memory storage

Session 90 outcomes:
- ADR-017 split completed (commit 0698b2e)
- Session log updated with commit evidence
- Cross-session context stored in Serena memory (adr-017-split-execution)

Session complete: All checklist items verified.

* chore(pr-310): complete review response session

Session 91 outcomes:
- Acknowledged all 4 issue comments (eyes reactions verified)
- Replied to AI Quality Gate CRITICAL_FAIL with infrastructure explanation (comment 3688634732)
- Documented 3 informational comments (no action required)
- No implementation work needed

Comment breakdown:
- gemini-code-assist[bot]: Unsupported file types (informational)
- github-actions[bot] AI Quality Gate: Infrastructure false positive (explained)
- coderabbitai[bot]: Review failed (informational)
- github-actions[bot] Session Protocol: PASS (informational)

PR #310 ready for human review and merge.

Note: .agents/pr-comments/PR-310/ working files are gitignored per repository policy.

* [WIP] Address feedback on model routing policy in ADR-017 and ADR-018 (#385)

* Rename ADR-019 to ADR-021 and ADR-020 to ADR-022 (#455)

* Initial plan

* Rename ADR-019 to ADR-021 and ADR-020 to ADR-022

- Renamed ADR-019-model-routing-strategy.md to ADR-021-model-routing-strategy.md
- Renamed ADR-020-architecture-governance-split-criteria.md to ADR-022-architecture-governance-split-criteria.md
- Updated all internal headers and cross-references
- Renamed associated debate log and memory files
- Updated references in governance policy and critique documents

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

* docs: add Copilot CLI model configuration to Serena memory

Addresses PR #310 review comment 2644791424

- Document available models per authentication context
- Include cost multipliers and parameter slugs
- Add cross-references to ADR-021 and AI-REVIEW-MODEL-POLICY
- Provide usage guidance for workflow configuration

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

---------

Signed-off-by: Richard Murillo <6811113+rjmurillo@users.noreply.github.com>
Co-authored-by: rjmurillo-bot <rjmurillo-bot@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: rjmurillo[bot] <250269933+rjmurillo-bot@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-skills Skills documentation and patterns automation Automated workflows and processes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants