copilot-sdk executor does not inject SKILL.md as system context #190

Description

@spboyer

Problem

When running evals with executor: copilot-sdk, the model responds as a generic assistant — it has no awareness of the skill's SKILL.md content. The skill files (SKILL.md + references/) are not injected into the model's system prompt or context.

Evidence

Running waza run aspire-orchestration --model gpt-4.1 --verbose against an Aspire skill that teaches the model to use aspire start instead of dotnet run:

[PROMPT] Start my Aspire app
[RESPONSE] I couldn't find an obvious Aspire app entrypoint... 
Please let me know the name of your main project file...
[PROMPT] How do I restart just the API service?
[RESPONSE] To restart just the API service, you typically use systemctl...

The model should be responding with aspire start and aspire resource <name> restart — the exact guidance in the SKILL.md. Instead, it gives generic answers because it never sees the skill content.
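A rough way to tell a skill-aware response from a generic one is to check whether the expected commands appear in it. The sketch below is illustrative only: `missing_guidance` is a hypothetical helper, not part of waza or its graders, and the sample response is the second one from the transcript above.

```python
def missing_guidance(response: str, expected_phrases: list[str]) -> list[str]:
    """Return the expected phrases that do not appear in the response.

    Hypothetical helper for illustration; matching is case-insensitive
    substring search, which is deliberately crude.
    """
    lowered = response.lower()
    return [p for p in expected_phrases if p.lower() not in lowered]


# Second response from the transcript above (truncated as in the log).
generic = "To restart just the API service, you typically use systemctl..."

# The SKILL.md guidance is `aspire resource <name> restart`, so a
# skill-aware answer should at least mention "aspire resource".
print(missing_guidance(generic, ["aspire resource"]))
```

An empty list would mean every expected phrase was found; for the generic response here, the skill's command is reported as missing.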

Expected Behavior

The executor should inject the skill's SKILL.md (and optionally referenced files from references/) into the model's system prompt or as context, so the model's responses are guided by the skill's rules and decision tables.

The verbose output shows that the Skill Directories are detected:

Skill Directories:
  - /path/to/skills/aspire-orchestration

But the content from those directories is not making it into the LLM call.
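A minimal sketch of the injection described above, assuming the executor can pass an arbitrary string as the system prompt. This is not waza's or the Copilot SDK's actual code: `build_system_prompt`, its parameters, and the `## Reference:` header format are assumptions made for illustration. Only the SKILL.md file name and the references/ directory come from this issue.

```python
from pathlib import Path


def build_system_prompt(skill_dir: str, include_references: bool = True) -> str:
    """Concatenate SKILL.md and, optionally, files under references/.

    Hypothetical helper: the name, signature, and section headers are
    illustrative and not taken from waza.
    """
    root = Path(skill_dir)
    skill_file = root / "SKILL.md"
    if not skill_file.is_file():
        raise FileNotFoundError(f"no SKILL.md in {root}")

    # SKILL.md always comes first so its rules lead the prompt.
    sections = [skill_file.read_text(encoding="utf-8").strip()]

    refs = root / "references"
    if include_references and refs.is_dir():
        # Sorted for a deterministic prompt across runs.
        for path in sorted(refs.rglob("*")):
            if path.is_file():
                body = path.read_text(encoding="utf-8", errors="replace").strip()
                name = path.relative_to(root).as_posix()
                sections.append(f"## Reference: {name}\n\n{body}")

    return "\n\n".join(sections)
```

The resulting string would be sent as the system message for each detected skill directory, for example `build_system_prompt("/path/to/skills/aspire-orchestration")`. A real implementation would likely also need a size limit so large references/ trees do not exceed the model's context window.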

Reproduction

# 1. Any skill repo with SKILL.md
waza run <skill-name> --model gpt-4.1 --verbose --task <any-task>

# 2. Observe: responses are generic, not skill-aware
# 3. The SKILL.md content is never shown in the prompt

Environment

  • waza v0.23.0
  • macOS (darwin/arm64)
  • executor: copilot-sdk
  • model: gpt-4.1

Workaround

None currently. The mock executor doesn't have this issue because it doesn't call an LLM, but it can't validate real model behavior.

Impact

This blocks eval-driven skill development — the core waza workflow. Without skill injection, evals only measure the base model's knowledge, not the skill's effectiveness at improving the model's responses.
