Skip to content

fix: discover skills under .github/skills/ directory#60

Closed
spboyer wants to merge 1 commit into
mainfrom
squad/52-github-skills-discovery
Closed

fix: discover skills under .github/skills/ directory#60
spboyer wants to merge 1 commit into
mainfrom
squad/52-github-skills-discovery

Conversation

@spboyer

@spboyer spboyer commented Mar 4, 2026

Copy link
Copy Markdown
Member

Closes #52

Summary

This PR enables waza to discover skills in the .github/skills/ directory in addition to the configured skills/ directory.

Changes

Phase 1: internal/workspace/workspace.go

Modified DetectContext() to search both the configured skillsDir (default skills/) and .github/skills/:

  • After checking the configured skills directory, also checks .github/skills/ if it exists
  • Skills from both locations are merged with deduplication
  • If the same skill name exists in both locations, the one from the configured skillsDir takes precedence
  • Added mergeSkills() helper function for deduplication logic

Phase 2: internal/discovery/discovery.go

Modified Discover() to exempt .github from the hidden-directory skip:

  • Changed the hidden directory check to skip all directories starting with . except .github
  • This allows Discover() to find skills under .github/skills/

Testing

Added comprehensive test coverage:

internal/workspace/workspace_test.go:

  • TestDetectContext_GitHubSkillsDir — skills in .github/skills/ are auto-discovered
  • TestDetectContext_BothSkillsDirs — skills in both skills/ and .github/skills/ are merged
  • TestDetectContext_GitHubSkillsDirDedup — same skill name in both dirs, configured skillsDir wins

internal/discovery/discovery_test.go:

  • TestDiscoverGitHubSkillsDirDiscover() finds skills under .github/skills/
  • TestDiscoverOtherHiddenDirsStillSkipped.hidden/ directories other than .github remain skipped

All tests pass ✅

Verification

go test ./internal/workspace/ ./internal/discovery/ -v  # All pass
go build ./...  # Successful compilation

Co-authored-by: richardpark-msft <51494936+richardpark-msft@users.noreply.github.com>
Copilot AI review requested due to automatic review settings March 4, 2026 22:26
@github-actions github-actions Bot enabled auto-merge (squash) March 4, 2026 22:26

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the prompt grader to use the internal execution.CopilotEngine abstraction (instead of directly using the Copilot SDK client), and extends ExecutionRequest to pass caller-provided tools through to Copilot sessions.

Changes:

  • Refactor internal/graders/prompt_grader.go (and tests) to execute prompts via execution.CopilotEngine, including tool-based grading.
  • Add Tools []copilot.Tool to internal/execution.ExecutionRequest and plumb it through session creation/resume in CopilotEngine.Execute().
  • Introduce a default prompt-grader timeout constant and update logging/output handling accordingly.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
internal/graders/prompt_grader_test.go Updates prompt-grader tests to use execution.CopilotEngine and record session IDs from ExecutionResponse.
internal/graders/prompt_grader.go Refactors prompt grading/pairwise judging to use execution.CopilotEngine and pass tools via ExecutionRequest.
internal/execution/engine.go Extends ExecutionRequest with a Tools field.
internal/execution/copilot.go Passes ExecutionRequest.Tools into Copilot SDK session configs (create/resume).
Comments suppressed due to low confidence (1)

internal/graders/prompt_grader.go:361

  • runPairwiseOnce uses a CopilotEngine without calling Initialize() first. With AutoStart disabled on the client, this can fail at runtime; initialize the engine before calling Execute().
	engine := execution.NewCopilotEngineBuilder("", nil).Build()
	defer func() {
		shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		defer cancel()
		if err := engine.Shutdown(shutdownCtx); err != nil {
			slog.ErrorContext(ctx, "error shutting down engine for pairwise grader", "error", err)
		}
	}()

	prompt := buildPairwisePrompt(p.args.Prompt, outputA, outputB, labelA, labelB)

	resp, err := engine.Execute(ctx, &execution.ExecutionRequest{
		ModelID: p.args.Model,
		Message: prompt,
		Tools:   tools,
		Timeout: promptGraderDefaultTimeout,
	})

Comment on lines 167 to 187
@@ -184,24 +175,15 @@ func TestUsingPreviousSessionID(t *testing.T) {

randomString = hex.EncodeToString(numBytes[:])

resp, err := session.SendAndWait(context.Background(), copilot.MessageOptions{
Prompt: "Remember this random string: " + randomString,
resp, err := engine.Execute(context.Background(), &execution.ExecutionRequest{
Message: "Remember this random string: " + randomString,
Timeout: 120 * time.Second,
})
require.NoError(t, err)

t.Logf("Content: %s", *resp.Data.Content)
t.Logf("Content: %s", resp.FinalOutput)

resp, err = session.SendAndWait(context.Background(), copilot.MessageOptions{
Prompt: "what was the random string?",
})
require.NoError(t, err)

if resp.Data.Content != nil {
t.Logf("Content: %s", *resp.Data.Content)
}

err = client.Stop()
require.NoError(t, err)
sessionID = resp.SessionID
}

Copilot AI Mar 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses CopilotEngine.Execute() without calling engine.Initialize(), which can fail because AutoStart is disabled in the engine builder. Also, leaving the engine running to preserve the session leaks the copilot process and temp workspaces; consider adding a way to stop the client without deleting the session, and use it here.

Copilot uses AI. Check for mistakes.
Comment on lines 46 to +51
// PermissionHandler called when the copilot SDK wants to determine if a tool can be used.
// Default: allows all tools.
PermissionHandler copilot.PermissionHandlerFunc

// Tools is a list of caller-implemented tools exposed to the agent during the session.
Tools []copilot.Tool

Copilot AI Mar 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description/title indicate changes to skill discovery under .github/skills/, but the changes in this PR add tool support to ExecutionRequest / CopilotEngine and refactor prompt grader usage. If the intent is to fix .github/skills/ discovery (issue #52), the workspace/discovery changes and tests described in the PR summary appear to be missing.

Copilot uses AI. Check for mistakes.
LogLevel: "error",
})
if p.args.ContinueSession && gradingContext.SessionID == "" {
return nil, errors.New("no session id set, can't continue session in prmopt grading")

Copilot AI Mar 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in error message: "prmopt" should be "prompt" (this string is asserted in tests too).

Suggested change
return nil, errors.New("no session id set, can't continue session in prmopt grading")
return nil, errors.New("no session id set, can't continue session in prompt grading")

Copilot uses AI. Check for mistakes.
Comment on lines +77 to 99
engine := execution.NewCopilotEngineBuilder("", nil).Build()
defer func() {
if err := client.Stop(); err != nil {
slog.ErrorContext(ctx, "error stopping client for prompt grader")
shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := engine.Shutdown(shutdownCtx); err != nil {
slog.ErrorContext(ctx, "error shutting down engine for prompt grader", "error", err)
}
}()

var session *copilot.Session
var err error
wazaTools := newWazaGraderTools()
req := &execution.ExecutionRequest{
ModelID: p.args.Model,
Message: p.args.Prompt,
Tools: wazaTools.Tools,
Timeout: promptGraderDefaultTimeout,
}

if p.args.ContinueSession {
if gradingContext.SessionID == "" {
return nil, errors.New("no session id set, can't continue session in prmopt grading")
}

// resume the previous session, but use a different model for the judge.
session, err = client.ResumeSessionWithOptions(ctx,
gradingContext.SessionID,
&copilot.ResumeSessionConfig{
OnPermissionRequest: copilot.PermissionHandler.ApproveAll,
Model: p.args.Model,
Streaming: true,
Tools: wazaTools.Tools,
})
} else {
session, err = client.CreateSession(ctx, &copilot.SessionConfig{
OnPermissionRequest: copilot.PermissionHandler.ApproveAll,
Model: p.args.Model,
Streaming: true,
Tools: wazaTools.Tools,
})
req.SessionID = gradingContext.SessionID
}

resp, err := engine.Execute(ctx, req)
if err != nil {
return nil, fmt.Errorf("failed to start up copilot session for prompt grading: %w", err)

Copilot AI Mar 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CopilotEngineBuilder configures the underlying Copilot client with AutoStart disabled, so calling Execute() without first calling engine.Initialize() will fail to start the copilot process. Please initialize the engine (ideally with a bounded timeout) before engine.Execute().

This issue also appears on line 345 of the same file.

Copilot uses AI. Check for mistakes.
Comment on lines +77 to 84
engine := execution.NewCopilotEngineBuilder("", nil).Build()
defer func() {
if err := client.Stop(); err != nil {
slog.ErrorContext(ctx, "error stopping client for prompt grader")
shutdownCtx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := engine.Shutdown(shutdownCtx); err != nil {
slog.ErrorContext(ctx, "error shutting down engine for prompt grader", "error", err)
}
}()

Copilot AI Mar 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gradeIndependent always defers engine.Shutdown(). Shutdown() deletes sessions via client.DeleteSession, which will invalidate gradingContext.SessionID when ContinueSession is true and can break subsequent graders that also need to continue the same session within the same benchmark run. Consider avoiding session deletion for continued sessions (e.g., reuse the runner's engine, add a non-deleting stop/close path, or revert to stopping the SDK client without DeleteSession).

Copilot uses AI. Check for mistakes.
})
if err != nil {
return nil, fmt.Errorf("failed to send pairwise prompt: %w", err)
return nil, fmt.Errorf("failed to create session for pairwise grading: %w", err)

Copilot AI Mar 4, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message on engine.Execute() failure says "failed to create session for pairwise grading", but at this point the failure could be from initialization/startup or from sending the message. Consider adjusting the message to reflect the operation being performed (executing the pairwise prompt).

Suggested change
return nil, fmt.Errorf("failed to create session for pairwise grading: %w", err)
return nil, fmt.Errorf("failed to execute pairwise grading prompt: %w", err)

Copilot uses AI. Check for mistakes.
@spboyer

spboyer commented Mar 4, 2026

Copy link
Copy Markdown
Member Author

Closing — the diff contains CopilotEngine migration code (PR #55) instead of the .github/skills discovery fix. Branch state got crossed during concurrent agent execution. Will re-create with correct changes.

@spboyer spboyer closed this Mar 4, 2026
auto-merge was automatically disabled March 4, 2026 23:04

Pull request was closed

@spboyer spboyer deleted the squad/52-github-skills-discovery branch March 4, 2026 23:05
chlowell pushed a commit to chlowell/waza that referenced this pull request Mar 5, 2026
Add microsoft/skills CI integration workflow and docs. Supports both flat and plugin bundle skill layouts. Closes microsoft#60.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Waza doesn't understand skills under the .github directory

3 participants