fix(github_copilot_provider): better session management#204
fix(github_copilot_provider): better session management#204Lixeer wants to merge 0 commit intosipeed:mainfrom
Conversation
|
Due to the recent frequency of pull requests, it will still take some time to merge this feature. In the meantime, you can temporarily use the code from my fork, or leave your email address and I will send you a binary file. |
Leeaandrob
left a comment
There was a problem hiding this comment.
PR #204 Deep Review — @Lixeer
Thanks for this PR, @Lixeer. The original code had a critical bug: defer client.Stop() in the constructor immediately closed the gRPC connection when NewGitHubCopilotProvider() returned, making every subsequent Chat() call fail on a dead connection. Your fix correctly stores the client in the struct and keeps it alive.
Verification
- Checked out branch locally:
Lixeer/main go vet ./pkg/providers/...— cleango test ./pkg/providers/... -race— all 49 tests passing, no racesgo build ./pkg/providers/...— clean- Confirmed
LLMProviderinterface (types.go:38-41) requires onlyChat()andGetDefaultModel()— noClose()method - Confirmed no provider lifecycle/cleanup exists anywhere in the codebase (main.go, agent/loop.go)
- Traced
CreateProvider()call at http_provider.go:324 — returnsLLMProviderinterface,Close()inaccessible via interface
Summary of Findings
HIGH (Should Fix)
- H1:
Close()is unreachable — not inLLMProviderinterface, never called - H2:
Close()has data race withChat()— no mutex protection
MEDIUM
- M1:
json.Marshalerror silently ignored inChat() - M2: Zero test coverage for this provider
POSITIVE
- Correctly fixes the critical
defer client.Stop()bug that made the provider non-functional - Proper error wrapping with
%wfor debugging and error chains - Nil-safe content extraction:
if resp != nil && resp.Data.Content != nil - Default case in switch properly rejects unknown connect modes
- Explicit error return for unimplemented stdio mode (was silently no-op)
sync.MutexonChat()is correct — Copilot session is not goroutine-safeSend()→SendAndWait()upgrade gives proper request-response semanticsCreateSessionfailure now cleans up the client before returning error
Verdict: REQUEST CHANGES
What needs to change before merge:
- Make
Close()reachable — either add it toLLMProviderinterface or implementio.Closerand add a type assertion + close call in the shutdown path (see H1) - Protect
Close()with the mutex to avoid data race with concurrentChat()(see H2) - Handle
json.Marshalerror inChat()(see M1)
Estimated effort: ~1-2 hours. Happy to re-review.
| p.client.Stop() | ||
| p.client = nil | ||
| p.session = nil | ||
| } |
There was a problem hiding this comment.
[H1 — HIGH: Close() Is Unreachable Dead Code] @Lixeer — The Close() method is well-implemented, but it can never be called:
LLMProviderinterface (types.go:38-41) only requiresChat()andGetDefaultModel()— noClose()CreateProvider()at http_provider.go:324 returnsLLMProvider— the concrete*GitHubCopilotProvidertype is erased- No code in the codebase calls
Close()on any provider — there is no provider shutdown lifecycle - Result: the gRPC client leaks on every program exit
Suggested fix (pick one):
Option A — Extend the interface (cleanest, forward-looking):
// types.go
type LLMProvider interface {
Chat(ctx context.Context, messages []Message, tools []ToolDefinition, model string, options map[string]interface{}) (*LLMResponse, error)
GetDefaultModel() string
}
type CloseableProvider interface {
LLMProvider
Close()
}Then in the shutdown path (main.go gateway/agent shutdown):
if cp, ok := provider.(CloseableProvider); ok {
cp.Close()
}Option B — Implement io.Closer (idiomatic Go):
func (p *GitHubCopilotProvider) Close() error {
// ... (return nil)
}Then type-assert to io.Closer in shutdown.
| if p.client != nil { | ||
| p.client.Stop() | ||
| p.client = nil | ||
| p.session = nil |
There was a problem hiding this comment.
[H2 — HIGH: Data Race in Close()] @Lixeer — Close() reads and writes p.client and p.session without holding p.mu, but Chat() (line 82-83) holds p.mu and accesses p.session. This is a data race:
Goroutine A (Chat): p.mu.Lock() → uses p.session → p.mu.Unlock()
Goroutine B (Close): p.client.Stop() → p.client = nil → p.session = nil ← NO LOCK
If Close() runs concurrently with Chat(), it can nil out p.session while Chat() is using it, causing a nil pointer dereference panic.
Suggested fix:
func (p *GitHubCopilotProvider) Close() {
p.mu.Lock()
defer p.mu.Unlock()
if p.client != nil {
p.client.Stop()
p.client = nil
p.session = nil
}
}| @@ -64,18 +79,26 @@ func (p *GitHubCopilotProvider) Chat(ctx context.Context, messages []Message, to | |||
| } | |||
|
|
|||
| fullcontent, _ := json.Marshal(out) | |||
There was a problem hiding this comment.
[M1 — MEDIUM: json.Marshal Error Silently Ignored] @Lixeer — fullcontent, _ := json.Marshal(out) discards the error. While tempMessage with string fields is unlikely to fail marshaling, defensive coding dictates handling it:
fullcontent, err := json.Marshal(out)
if err != nil {
return nil, fmt.Errorf("marshal messages: %w", err)
}This is especially important since the marshaled content is sent directly as a prompt — a silent nil/empty byte slice would produce a confusing empty response from Copilot with no indication of what went wrong.
| }) | ||
| if err := client.Start(context.Background()); err != nil { | ||
| return nil, fmt.Errorf("Can't connect to Github Copilot, https://github.com/github/copilot-sdk/blob/main/docs/getting-started.md#connecting-to-an-external-cli-server for details") | ||
| return nil, fmt.Errorf("can't connect to Github Copilot: %w; see docs for details", err) |
There was a problem hiding this comment.
[Positive] @Lixeer — This is the key fix of the PR. The original code had defer client.Stop() here, which killed the gRPC connection immediately when the constructor returned. Every subsequent Chat() call was operating on a dead connection. By removing the defer and storing the client in the struct, the connection now lives for the lifetime of the provider. Well identified and correctly fixed.
|
@Lixeer please re submit the PR using the PR template |
|
soon |
80d8dc2 to
0919bbe
Compare
|
@Zepan @Leeaandrob thank for your deep code review I fix the code and re sumbit it , review and merge the new feature to repo please. |
📝 Description🗣️ Type of Change
🤖 AI Code Generation
🔗 Linked Issue📚 Technical Context (Skip for Docs)
🧪 Test Environment & Hardware
📸 Proof of Work (Optional for Docs)Click to view Logs/Screenshots✅ Runtime testing passed: Executed existing test suite, all tests passed⏳ Unit tests: Not yet written (planned for future updates) ☑️ Checklist
|
nikolasdehor
left a comment
There was a problem hiding this comment.
Critical fix. The original defer client.Stop() in the constructor was killing the gRPC connection immediately after initialization — the Copilot provider was completely non-functional. This PR correctly persists the client, adds proper lifecycle management via CloseableProvider, thread safety via mutex, and fixes all silently-ignored errors. The SendAndWait switch is the correct API. LGTM.
|
Is there a chance that this gets merged soon? |
|
Due to the refactoring in the new version, my pull request has conflicts with the main branch and is not fully compatible. I will submit a new pull request. Please wait patiently. |
No description provided.