Improve Waza with real agentic/product capabilities #322

Description

@spboyer

Summary

We evaluated microsoft/waza against the Agentic Repository Rubric and identified a few real product improvements that would strengthen the framework itself. This issue is not about gaming the leaderboard or adding cosmetic automation; it is about shipping capabilities that make Waza more useful for users.

Goal

Improve Waza in ways that are directly valuable to the product and its users, while also increasing the maturity of the repository’s own agentic workflows as a byproduct.

Proposed work

  • Replace the squad-ci.yml, squad-preview.yml, squad-release.yml, and squad-insider-release.yml stubs with real build, test, validation, and release steps.
  • Add a genuine failure-handling path for evaluation runs (for example: capture failing artifacts, surface a concise triage summary, and open a follow-up issue or safe remediation PR when appropriate).
  • Add a recurring improvement loop that turns telemetry or evaluation output into actionable regression tasks or benchmark updates.
  • Expand runtime observability so that eval results, agent activity, and validation output are easier to inspect and compare over time.
  • Increase safe, measurable agent-assisted throughput only where human review and branch protections are still in place.
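
To make the failure-handling and improvement-loop items more concrete, here is a minimal Python sketch of a triage summary and a regression check. It is only an illustration: `EvalResult`, its fields, `summarize_failures`, `find_regressions`, and the `tolerance` threshold are all hypothetical names chosen for this example and do not reflect Waza's actual result schema or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Hypothetical record for one evaluation task (not Waza's real schema)."""
    task: str
    passed: bool
    score: float
    artifact_path: str = ""  # assumed location of captured failure artifacts


def summarize_failures(results, limit=5):
    """Return a short, human-readable triage summary of failing tasks."""
    # Lowest-scoring failures first, so the worst cases lead the summary.
    failures = sorted((r for r in results if not r.passed), key=lambda r: r.score)
    lines = [f"{len(failures)} of {len(results)} tasks failed"]
    for r in failures[:limit]:
        artifact = r.artifact_path or "no artifact captured"
        lines.append(f"- {r.task}: score {r.score:.2f} ({artifact})")
    if len(failures) > limit:
        lines.append(f"... and {len(failures) - limit} more")
    return "\n".join(lines)


def find_regressions(baseline, current, tolerance=0.05):
    """List tasks whose score dropped by more than `tolerance` since the baseline run."""
    previous = {r.task: r.score for r in baseline}
    return sorted(
        r.task
        for r in current
        if r.task in previous and previous[r.task] - r.score > tolerance
    )


if __name__ == "__main__":
    baseline = [EvalResult("parse", True, 0.90), EvalResult("plan", True, 0.80)]
    current = [
        EvalResult("parse", True, 0.91),
        EvalResult("plan", False, 0.40, "artifacts/plan.log"),
    ]
    print(summarize_failures(current))
    print(find_regressions(baseline, current))
```

In a sketch like this, the summary text could become the body of a follow-up issue, and the regression list could seed regression tasks or benchmark updates. Whether to open an issue or a remediation PR automatically would still be gated by human review, in line with the non-goals below.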

Non-goals

  • Do not add fake automation just to raise the rubric score.
  • Do not weaken human review or branch protections to make the score look better.
  • Do not change the rubric or leaderboard to hide gaps.

Notes

If these items turn out to be independent enough, this issue can be split into child issues later. The expectation is that each item should result in a real product or platform improvement, not just a scoring artifact.
