Problem
We have no visibility into how GitHub Copilot CLI interacts with azd. There is no coverage for measuring LLM interactions, command discoverability, or human usability patterns.
Proposal
Add a comprehensive evaluation and testing framework at cli/azd/test/eval/ covering:
- LLM eval (how well an AI agent uses azd)
- Non-LLM unit tests (how well azd surfaces information for human and AI consumption)
See PR #7202 for implementation.