Skip to content

fix(github_copilot_provider): better session management#204

Closed
Lixeer wants to merge 0 commit intosipeed:mainfrom
Lixeer:main
Closed

fix(github_copilot_provider): better session management#204
Lixeer wants to merge 0 commit intosipeed:mainfrom
Lixeer:main

Conversation

@Lixeer
Copy link
Contributor

@Lixeer Lixeer commented Feb 15, 2026

No description provided.

@Leeaandrob
Copy link
Collaborator

@Zepan Improves session management for the GitHub Copilot provider. Since the Copilot provider was recently merged (#178), follow-up fixes are expected.

Recommendation: Merge. +43/-20, targeted improvement to a recently-added provider.

@Lixeer
Copy link
Contributor Author

Lixeer commented Feb 16, 2026

Due to the recent frequency of pull requests, it will still take some time to merge this feature. In the meantime, you can temporarily use the code from my fork, or leave your email address and I will send you a binary file.

Copy link
Collaborator

@Leeaandrob Leeaandrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #204 Deep Review — @Lixeer

Thanks for this PR, @Lixeer. The original code had a critical bug: defer client.Stop() in the constructor immediately closed the gRPC connection when NewGitHubCopilotProvider() returned, making every subsequent Chat() call fail on a dead connection. Your fix correctly stores the client in the struct and keeps it alive.


Verification

  • Checked out branch locally: Lixeer/main
  • go vet ./pkg/providers/... — clean
  • go test ./pkg/providers/... -race — all 49 tests passing, no races
  • go build ./pkg/providers/... — clean
  • Confirmed LLMProvider interface (types.go:38-41) requires only Chat() and GetDefaultModel() — no Close() method
  • Confirmed no provider lifecycle/cleanup exists anywhere in the codebase (main.go, agent/loop.go)
  • Traced CreateProvider() call at http_provider.go:324 — returns LLMProvider interface, Close() inaccessible via interface

Summary of Findings

HIGH (Should Fix)

  • H1: Close() is unreachable — not in LLMProvider interface, never called
  • H2: Close() has data race with Chat() — no mutex protection

MEDIUM

  • M1: json.Marshal error silently ignored in Chat()
  • M2: Zero test coverage for this provider

POSITIVE

  1. Correctly fixes the critical defer client.Stop() bug that made the provider non-functional
  2. Proper error wrapping with %w for debugging and error chains
  3. Nil-safe content extraction: if resp != nil && resp.Data.Content != nil
  4. Default case in switch properly rejects unknown connect modes
  5. Explicit error return for unimplemented stdio mode (was silently no-op)
  6. sync.Mutex on Chat() is correct — Copilot session is not goroutine-safe
  7. Send()SendAndWait() upgrade gives proper request-response semantics
  8. CreateSession failure now cleans up the client before returning error

Verdict: REQUEST CHANGES

What needs to change before merge:

  • Make Close() reachable — either add it to LLMProvider interface or implement io.Closer and add a type assertion + close call in the shutdown path (see H1)
  • Protect Close() with the mutex to avoid data race with concurrent Chat() (see H2)
  • Handle json.Marshal error in Chat() (see M1)

Estimated effort: ~1-2 hours. Happy to re-review.

p.client.Stop()
p.client = nil
p.session = nil
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[H1 — HIGH: Close() Is Unreachable Dead Code] @Lixeer — The Close() method is well-implemented, but it can never be called:

  1. LLMProvider interface (types.go:38-41) only requires Chat() and GetDefaultModel() — no Close()
  2. CreateProvider() at http_provider.go:324 returns LLMProvider — the concrete *GitHubCopilotProvider type is erased
  3. No code in the codebase calls Close() on any provider — there is no provider shutdown lifecycle
  4. Result: the gRPC client leaks on every program exit

Suggested fix (pick one):

Option A — Extend the interface (cleanest, forward-looking):

// types.go
type LLMProvider interface {
    Chat(ctx context.Context, messages []Message, tools []ToolDefinition, model string, options map[string]interface{}) (*LLMResponse, error)
    GetDefaultModel() string
}

type CloseableProvider interface {
    LLMProvider
    Close()
}

Then in the shutdown path (main.go gateway/agent shutdown):

if cp, ok := provider.(CloseableProvider); ok {
    cp.Close()
}

Option B — Implement io.Closer (idiomatic Go):

func (p *GitHubCopilotProvider) Close() error {
    // ... (return nil)
}

Then type-assert to io.Closer in shutdown.

if p.client != nil {
p.client.Stop()
p.client = nil
p.session = nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[H2 — HIGH: Data Race in Close()] @LixeerClose() reads and writes p.client and p.session without holding p.mu, but Chat() (line 82-83) holds p.mu and accesses p.session. This is a data race:

Goroutine A (Chat):      p.mu.Lock() → uses p.session → p.mu.Unlock()
Goroutine B (Close):     p.client.Stop() → p.client = nil → p.session = nil  ← NO LOCK

If Close() runs concurrently with Chat(), it can nil out p.session while Chat() is using it, causing a nil pointer dereference panic.

Suggested fix:

func (p *GitHubCopilotProvider) Close() {
    p.mu.Lock()
    defer p.mu.Unlock()
    if p.client != nil {
        p.client.Stop()
        p.client = nil
        p.session = nil
    }
}

@@ -64,18 +79,26 @@ func (p *GitHubCopilotProvider) Chat(ctx context.Context, messages []Message, to
}

fullcontent, _ := json.Marshal(out)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[M1 — MEDIUM: json.Marshal Error Silently Ignored] @Lixeerfullcontent, _ := json.Marshal(out) discards the error. While tempMessage with string fields is unlikely to fail marshaling, defensive coding dictates handling it:

fullcontent, err := json.Marshal(out)
if err != nil {
    return nil, fmt.Errorf("marshal messages: %w", err)
}

This is especially important since the marshaled content is sent directly as a prompt — a silent nil/empty byte slice would produce a confusing empty response from Copilot with no indication of what went wrong.

})
if err := client.Start(context.Background()); err != nil {
return nil, fmt.Errorf("Can't connect to Github Copilot, https://github.com/github/copilot-sdk/blob/main/docs/getting-started.md#connecting-to-an-external-cli-server for details")
return nil, fmt.Errorf("can't connect to Github Copilot: %w; see docs for details", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Positive] @Lixeer — This is the key fix of the PR. The original code had defer client.Stop() here, which killed the gRPC connection immediately when the constructor returned. Every subsequent Chat() call was operating on a dead connection. By removing the defer and storing the client in the struct, the connection now lives for the lifetime of the provider. Well identified and correctly fixed.

@Leeaandrob
Copy link
Collaborator

@Lixeer please re submit the PR using the PR template

@Lixeer
Copy link
Contributor Author

Lixeer commented Feb 17, 2026

soon

@Lixeer Lixeer force-pushed the main branch 3 times, most recently from 80d8dc2 to 0919bbe Compare February 18, 2026 03:38
@Lixeer
Copy link
Contributor Author

Lixeer commented Feb 18, 2026

@Zepan @Leeaandrob thank for your deep code review I fix the code and re sumbit it , review and merge the new feature to repo please.

@Lixeer
Copy link
Contributor Author

Lixeer commented Feb 18, 2026

📝 Description

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Linked Issue

#98

📚 Technical Context (Skip for Docs)

🧪 Test Environment & Hardware

  • Hardware: AMD/Intel PC platform
  • OS: Window
  • Model/Provider: OpenAI GPT-4o
  • Channels: [e.g. Discord, Telegram, Feishu, ...]

📸 Proof of Work (Optional for Docs)

Click to view Logs/Screenshots ✅ Runtime testing passed: Executed existing test suite, all tests passed

⏳ Unit tests: Not yet written (planned for future updates)

☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical fix. The original defer client.Stop() in the constructor was killing the gRPC connection immediately after initialization — the Copilot provider was completely non-functional. This PR correctly persists the client, adds proper lifecycle management via CloseableProvider, thread safety via mutex, and fixes all silently-ignored errors. The SendAndWait switch is the correct API. LGTM.

@ilteoood
Copy link

Is there a chance that this gets merged soon?
As is, at the moment the github copilot provider is unusable due to the defer client.stop() which closes immediately the connection with the cli.

@Lixeer
Copy link
Contributor Author

Lixeer commented Feb 22, 2026

Due to the refactoring in the new version, my pull request has conflicts with the main branch and is not fully compatible. I will submit a new pull request. Please wait patiently.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants