Enhancement: LLM-assisted changelog filter in tool-release-watch #802

@avifenesh

Description

Problem

`tool-release-watch.yml` currently opens an issue per tool release with the full upstream changelog attached. For tier-S tools with large release notes (Claude Code, Codex, OpenCode), this means a human has to manually scan for schema/config-relevant items - which is exactly the work done in the 2026-04-25 triage batch (closes #778-#782, #795-#797).

Proposal

Extend the watcher with an LLM-assisted filter:

  1. Per-tool `changes_of_interest` schema in `.github/tool-release-baselines.json`:
    • Which config files agnix validates (already implied by `validators`)
    • Which change-types matter (schema additions, frontmatter fields, hook events, MCP config shape, settings keys)
    • Which don't (UI, model list, perf, telemetry, provider additions)
  2. After fetching the release notes, call the LLM (GLM, infra already present for kiro/windsurf via `scripts/glm-extract.js`) with a per-tool prompt that:
    • Enumerates the relevant config surfaces
    • Asks for an agnix-focused summary ("which items in these notes affect a validator?")
    • Returns structured markdown - `## Relevant / ## Irrelevant / ## Rule candidates`
  3. Open the issue with the LLM summary. Keep the raw changelog in a `Details` block at the bottom.
  4. Fallback: if LLM call fails (API error, empty response, timeout), open issue with raw changelog as today.
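A hypothetical `changes_of_interest` entry in `.github/tool-release-baselines.json` could look like this (field names are illustrative, not an existing schema):

```json
{
  "claude-code": {
    "changes_of_interest": {
      "config_surfaces": ["settings.json", "hooks", "MCP config", "frontmatter"],
      "relevant": ["schema additions", "frontmatter fields", "hook events", "settings keys"],
      "irrelevant": ["UI", "model list", "perf", "telemetry", "provider additions"]
    }
  }
}
```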

Why

  • During the 2026-04-25 triage, 4 of 8 tools (OpenCode, Cline, Cursor, Roo) had zero schema impact but still required reading full release notes.
  • The 4 that did have impact (Claude Code, Codex, Kiro, Gemini) had ~2-5 relevant items buried in 15-50 bullet points.
  • GLM takes ~22s per call at `glm-5` - far less time than a human read of the full changelog.

Research needed

  • Shape per tool: each upstream has a different release-notes format (GitHub releases body, HTML changelog, RSS). Some are verbose (Codex, Claude Code), some terse (Roo, Cline). Prompt + token budget needs tuning per source.
  • Detect when LLM is unnecessary (e.g. Roo's release body is just "Release v3.53.0" with a link - no point calling LLM on that).
  • Cache per (tool, version) so workflow reruns don't re-query.
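The skip-and-cache ideas above can be sketched as follows. This is a sketch only; function names and the 80-character threshold are illustrative, not the actual watcher code:

```javascript
// Decide whether a release body is worth an LLM call. A body like
// "Release v3.53.0" plus a bare link carries no schema signal.
function shouldCallLlm(body) {
  if (!body) return false;
  const stripped = body
    .replace(/https?:\/\/\S+/g, "") // drop bare links
    .replace(/^#+\s.*$/gm, "")      // drop markdown headings
    .trim();
  // Fewer than ~80 chars of real prose: nothing for the LLM to triage.
  return stripped.length >= 80;
}

// One summary per (tool, version), so workflow reruns don't re-query.
function cacheKey(tool, version) {
  return `${tool}@${version}`;
}
```

The threshold would need tuning per source, since terse-but-meaningful bodies (Cline, Roo) sit close to the noise floor.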

Existing infra to reuse

  • `scripts/glm-extract.js` already has GLM client, prompt template, fallback-on-error semantics.
  • Extend it from "extract release notes from HTML" to "summarize release notes for agnix schema impact".
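The fallback-on-error semantics could be wrapped roughly like this. `callGlm` is a stand-in for whatever `scripts/glm-extract.js` exposes; the name and signature are assumptions:

```javascript
// Try the GLM summary; on any failure (API error, timeout, empty response)
// fall back to the raw changelog so the issue body is never worse than today's.
async function buildIssueBody(callGlm, prompt, rawChangelog) {
  try {
    const summary = await callGlm(prompt, rawChangelog);
    if (!summary || !summary.trim()) throw new Error("empty LLM response");
    // LLM summary up top, full changelog preserved in a collapsed block.
    return `${summary}\n\n<details><summary>Raw changelog</summary>\n\n${rawChangelog}\n\n</details>`;
  } catch {
    return rawChangelog; // identical to today's behavior
  }
}
```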

Success criteria

  • After an upstream release, the opened issue body lists only validator-relevant items + rule candidates.
  • Human triage drops from "scan full changelog" to "accept or reject LLM triage".
  • Zero regressions - if LLM fails, behavior is identical to today.
