v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API) #12
- Add Gemini embedder using gemini-embedding-001 (768 dimensions)
- Add Gemini LLM client using gemini-2.0-flash-lite for triage
- Implement prompt templates for issue analysis and response generation
- Add unit tests for prompt building and response parsing
- Update go.mod with Google Generative AI SDK dependencies

All tests pass.

- Add Qdrant client implementing VectorStore interface
- Add operations: CreateCollection, Upsert, Search, Delete
- Add unit tests for payload converters
- Update go.mod with Qdrant Go SDK dependencies (go mod tidy)

All tests pass.

- Update Dependencies struct to hold Embedder, LLMClient, and VectorStore
- Update step constructors to accept Dependencies via dependency injection
- Implement logic in VectorDBPrep, Indexer, SimilaritySearch, Triage, and ResponseBuilder using real clients
- Add deterministic UUID generation for Qdrant points
- Add for centralized step registration
- Fix import cycles between integrations and pipeline packages

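The commit message does not say how the deterministic point IDs are derived. One common approach, sketched here purely as an assumption, is a name-based (version 5) UUID computed from the repository and issue number, so that re-indexing the same issue overwrites the same point. The `pointID` function, the `repo#number` name format, and the choice of the RFC 4122 URL namespace are all hypothetical.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespace is the RFC 4122 URL namespace UUID; picking it here is an
// assumption, any fixed 16-byte namespace would work.
var namespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// pointID (hypothetical) returns a version 5 UUID string derived from the
// repository name and issue number. The same inputs always give the same ID.
func pointID(repo string, number int) string {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(fmt.Sprintf("%s#%d", repo, number)))
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant

	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	fmt.Println(pointID("org/repo", 12))
	fmt.Println(pointID("org/repo", 12) == pointID("org/repo", 12))
}
```

Because the ID depends only on its inputs, an upsert for an already indexed issue would replace the existing point instead of creating a duplicate.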
- Create `internal/integrations/github` package
- Implement authentication and client wrapper for google/go-github
- Add GitHub Client to pipeline Dependencies
- Update ActionExecutor to use GitHub client for commenting and labeling
- Resolves #4

Pull request overview
This pull request implements v0.0.2v, which integrates foundational AI and database services into the Simili-Bot modular pipeline architecture established in v0.0.1v. The PR adds support for Gemini AI (embeddings and LLM), Qdrant vector store, and GitHub API client, wiring these integrations into existing pipeline steps.
Changes:
- Integrated Gemini AI for embeddings (gemini-embedding-001) and LLM-based triage (gemini-2.0-flash-lite)
- Implemented Qdrant vector store client for semantic similarity search
- Created GitHub API client for issue operations (comments, labels)
- Refactored all pipeline steps to use dependency injection
- Added file headers to all Go source files
- Updated configuration to support embedding dimensions
Reviewed changes
Copilot reviewed 30 out of 32 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| internal/integrations/gemini/embedder.go | Implements Gemini embedding client with 768-dimensional vectors |
| internal/integrations/gemini/llm.go | LLM client for issue triage analysis |
| internal/integrations/gemini/prompts.go | Prompt templates for triage and response generation |
| internal/integrations/gemini/gemini_test.go | Unit tests for Gemini integration |
| internal/integrations/qdrant/client.go | Qdrant vector store client implementation |
| internal/integrations/qdrant/types.go | Type definitions for Qdrant operations |
| internal/integrations/qdrant/client_test.go | Unit tests for Qdrant client |
| internal/integrations/github/client.go | GitHub API client for issue operations |
| internal/integrations/github/auth.go | GitHub authentication setup |
| internal/steps/*.go | Updated all pipeline steps to accept Dependencies parameter |
| internal/steps/register.go | New step registration system with dependency injection |
| internal/core/pipeline/registry.go | Added Dependencies struct for client injection |
| internal/core/pipeline/pipeline.go | Added SuggestedLabels field to Result struct |
| internal/core/config/config.go | Added Dimensions field to EmbeddingConfig |
| go.mod | Updated dependencies and Go version |
| DOCS/0.0.2v/*.md | Added planning and documentation |

    resNumberFloat, ok := res.Payload["number"].(float64)
    if ok && int(resNumberFloat) == ctx.Issue.Number && res.Payload["repo"] == ctx.Issue.Repo {
        continue
    }

    issue := pipeline.SimilarIssue{
        Number:     int(resNumberFloat),
        Title:      fmt.Sprintf("%v", res.Payload["title"]),
        URL:        fmt.Sprintf("%v", res.Payload["url"]),
        State:      fmt.Sprintf("%v", res.Payload["state"]),
        Similarity: float64(res.Score),
    }
    foundIssues = append(foundIssues, issue)

In the similarity search, there's unsafe type conversion without proper error handling. The code assumes res.Payload["number"] is a float64 and casts it to int without checking if the conversion succeeds. If the payload contains a different type or the key is missing, this will cause a runtime panic or incorrect behavior. Consider adding proper type assertion checks with ok pattern for all payload field accesses.
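As a rough illustration of the "ok pattern for all payload field accesses" suggested here, the sketch below wraps each lookup in a small helper that reports whether the key held the expected type. The helper names `payloadInt` and `payloadString` are hypothetical and not part of the PR; the float64 assumption mirrors the type assertion in the quoted code.

```go
package main

import "fmt"

// payloadInt (hypothetical) reads a number stored as float64 and converts it
// to int. It returns false if the key is missing or holds another type.
func payloadInt(payload map[string]interface{}, key string) (int, bool) {
	f, ok := payload[key].(float64)
	if !ok {
		return 0, false
	}
	return int(f), true
}

// payloadString (hypothetical) returns the string stored under key, or
// "" and false if the key is missing or holds another type.
func payloadString(payload map[string]interface{}, key string) (string, bool) {
	s, ok := payload[key].(string)
	return s, ok
}

func main() {
	payload := map[string]interface{}{
		"number": float64(42),
		"title":  "Crash on start",
	}

	if n, ok := payloadInt(payload, "number"); ok {
		fmt.Println("issue number:", n)
	}
	if _, ok := payloadString(payload, "url"); !ok {
		fmt.Println("url missing, skipping this result")
	}
}
```

A caller could skip any search result for which one of these lookups fails, rather than building a `SimilarIssue` with a zero issue number.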

    res, err := em.EmbedContentWithTitle(ctx, "", parts...)
    if err != nil {
        return nil, fmt.Errorf("failed to generate batch embeddings: %w", err)
    }

    if res.Embedding == nil || len(res.Embedding.Values) == 0 {
        return nil, fmt.Errorf("empty embedding returned")
    }

    // Note: Gemini returns a single embedding for batch requests
    // For individual embeddings, we need to call Embed for each text
    embeddings := make([][]float32, len(texts))
    for i, text := range texts {
        embedding, err := e.Embed(ctx, text)
        if err != nil {
            return nil, fmt.Errorf("failed to embed text %d: %w", i, err)
        }
        embeddings[i] = embedding
    }

    return embeddings, nil

The EmbedBatch function has an inefficient implementation. Lines 75-82 call EmbedContentWithTitle but never use the returned embedding, and lines 86-93 then call Embed once per text, which defeats the purpose of batch processing. Either remove the unused EmbedContentWithTitle call (lines 75-82) or implement real batch embedding if the API supports it. As written, the function wastes one API call and offers no performance benefit over calling Embed in a loop.

    func (c *Client) TransferIssue(ctx context.Context, org, repo string, number int, targetRepo string) error {
        // targetRepo is expected to be "owner/name" or just "name" (implies same owner).
        // We need to parse it. (TODO: Add parser helper or assume "owner/name" format)

        // Check if target is "owner/name"
        // For simplicity, let's assume valid input for now.

        // Actually go-github Issues.Transfer takes (ctx, owner, repo, number, input).
        // Input has NewOwner and NewName.

        // We'll need to implement parsing logic later or assume strict input.
        // Leaving unimplemented for now or just simple logic.
        return fmt.Errorf("transfer not implemented yet")
    }

The TransferIssue function is not implemented and simply returns an error. However, the ActionExecutor step in action_executor.go (lines 69-73) attempts to handle transfer logic. This could lead to confusion where transfers are silently logged as "scheduled" but never actually execute. Consider either fully implementing the TransferIssue method or updating the ActionExecutor to check if the method is implemented before attempting to use it.
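The parsing step that the code comments leave as a TODO could be sketched as follows. The helper name `parseTargetRepo` is hypothetical; it only handles the two input forms named in the comments ("owner/name" or just "name") and does not perform the transfer itself.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTargetRepo (hypothetical) splits a transfer target into owner and
// repository name. A bare "name" falls back to defaultOwner.
func parseTargetRepo(defaultOwner, target string) (owner, name string, err error) {
	target = strings.TrimSpace(target)
	if target == "" {
		return "", "", fmt.Errorf("target repository is empty")
	}

	parts := strings.Split(target, "/")
	switch len(parts) {
	case 1:
		return defaultOwner, parts[0], nil
	case 2:
		if parts[0] == "" || parts[1] == "" {
			return "", "", fmt.Errorf("invalid target repository %q", target)
		}
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("invalid target repository %q", target)
	}
}

func main() {
	for _, target := range []string{"tools", "other-org/tools", "a/b/c"} {
		owner, name, err := parseTargetRepo("acme", target)
		fmt.Println(owner, name, err)
	}
}
```

Returning an error for malformed targets lets the caller report the problem instead of logging a transfer that never happens.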

    ID       string                 `json:"id"`
    Vector   []float32              `json:"vector"`
    Payload  map[string]interface{} `json:"payload"`
    MetaData map[string]interface{} `json:"metadata"`

The Point struct has both Payload and MetaData fields (line 13-14), but in the codebase, only Payload is actually used. The MetaData field is never populated or accessed in client.go. This could cause confusion. Consider removing the unused MetaData field or documenting the intended distinction between Payload and MetaData.

    - ID       string                 `json:"id"`
    - Vector   []float32              `json:"vector"`
    - Payload  map[string]interface{} `json:"payload"`
    - MetaData map[string]interface{} `json:"metadata"`
    + ID      string                 `json:"id"`
    + Vector  []float32              `json:"vector"`
    + Payload map[string]interface{} `json:"payload"`

    func contains(s, substr string) bool {
        return len(s) > 0 && len(substr) > 0 && (s == substr || len(s) >= len(substr) && (s[:len(substr)] == substr || s[len(s)-len(substr):] == substr || containsMiddle(s, substr)))
    }

    func containsMiddle(s, substr string) bool {
        for i := 0; i <= len(s)-len(substr); i++ {
            if s[i:i+len(substr)] == substr {
                return true
            }
        }
        return false
    }

The contains helper function has overly complex logic that could be simplified. The function at line 156 attempts to check if substr is at the beginning, middle, or end of s, but this is unnecessarily verbose. Go's standard library provides strings.Contains which does exactly this. Replace this custom implementation with strings.Contains(s, substr) for better readability and maintainability.
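A minimal sketch of the replacement follows. One behavioral difference is worth keeping in mind: the custom helper returns false for an empty `substr`, whereas `strings.Contains(s, "")` returns true. The wrapper `containsNonEmpty` is a hypothetical name, shown only for callers that depend on the old behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// containsNonEmpty (hypothetical) keeps the old helper's behavior of
// returning false when substr is empty, and otherwise defers to the
// standard library.
func containsNonEmpty(s, substr string) bool {
	return substr != "" && strings.Contains(s, substr)
}

func main() {
	fmt.Println(strings.Contains("rate limit exceeded", "limit"))
	fmt.Println(strings.Contains("rate limit exceeded", ""))
	fmt.Println(containsNonEmpty("rate limit exceeded", ""))
}
```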

    // CreateComment posts a comment on an issue.
    func (c *Client) CreateComment(ctx context.Context, org, repo string, number int, body string) error {
        comment := &github.IssueComment{
            Body: github.String(body),
        }
        _, _, err := c.client.Issues.CreateComment(ctx, org, repo, number, comment)
        if err != nil {
            return fmt.Errorf("failed to create comment: %w", err)
        }
        return nil
    }

    // AddLabels adds labels to an issue.
    func (c *Client) AddLabels(ctx context.Context, org, repo string, number int, labels []string) error {
        _, _, err := c.client.Issues.AddLabelsToIssue(ctx, org, repo, number, labels)
        if err != nil {
            return fmt.Errorf("failed to add labels: %w", err)
        }
        return nil
    }

The GitHub client methods lack input validation. For example, CreateComment doesn't validate that the body parameter is not empty, and AddLabels doesn't check if the labels slice is empty. While the GitHub API may handle these cases, adding basic validation would provide better error messages and prevent unnecessary API calls.
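The kind of basic validation described here might look like the sketch below, which could run before any API call is made. The function names `validateCommentBody` and `validateLabels` and the exact rules (non-blank body, at least one non-blank label) are assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateCommentBody (hypothetical) rejects empty or whitespace-only bodies.
func validateCommentBody(body string) error {
	if strings.TrimSpace(body) == "" {
		return errors.New("comment body must not be empty")
	}
	return nil
}

// validateLabels (hypothetical) rejects an empty slice and blank label names.
func validateLabels(labels []string) error {
	if len(labels) == 0 {
		return errors.New("at least one label is required")
	}
	for i, label := range labels {
		if strings.TrimSpace(label) == "" {
			return fmt.Errorf("label %d is empty", i)
		}
	}
	return nil
}

func main() {
	fmt.Println(validateCommentBody("   "))
	fmt.Println(validateLabels([]string{"bug", ""}))
	fmt.Println(validateLabels([]string{"bug", "enhancement"}))
}
```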

    func (c *Client) CreateCollection(name string, dimension int) error {
        ctx, cancel := c.ctxWithAuth()
        defer cancel()

        // Check if exists first
        exists, err := c.CollectionExists(name)
        if err != nil {
            return err
        }
        if exists {
            return nil

The CreateCollection method calls CollectionExists which creates a new context with timeout (line 104). However, CreateCollection also has its own context with timeout (line 71). If CollectionExists takes too long and times out, the parent CreateCollection context will still be waiting, potentially leading to resource leaks. The contexts created in ctxWithAuth use context.Background() instead of inheriting from a parent context, which prevents proper cancellation propagation. Consider accepting a context parameter in these methods or ensure contexts are properly linked.

    // NewClient creates a new GitHub client using the provided token.
    // If token is empty, it returns an unauthenticated client.
    func NewClient(ctx context.Context, token string) *Client {
        var tc *http.Client

        if token != "" {
            ts := oauth2.StaticTokenSource(
                &oauth2.Token{AccessToken: token},
            )
            tc = oauth2.NewClient(ctx, ts)
        }

        client := github.NewClient(tc)

        return &Client{
            client: client,
        }
    }

The GitHub client integration (auth.go and client.go) has no test coverage. Given that other integrations in this codebase have test files (gemini_test.go, client_test.go in qdrant), the GitHub client should also have tests to maintain consistency and ensure reliability of API interactions.

    // parseTriageResponse parses the LLM response into a TriageResult.
    // This is a simple parser - in production, you might want structured output.
    func parseTriageResponse(response string) *TriageResult {
        result := &TriageResult{
            Quality:         "good", // Default
            SuggestedLabels: []string{},
            Reasoning:       response,
        }

        lower := strings.ToLower(response)

        // Parse quality
        if strings.Contains(lower, "poor quality") || strings.Contains(lower, "quality: poor") {
            result.Quality = "poor"
        } else if strings.Contains(lower, "needs improvement") || strings.Contains(lower, "quality: needs-improvement") {
            result.Quality = "needs-improvement"
        }

        // Parse labels (look for common patterns)
        labels := []string{}
        if strings.Contains(lower, "bug") {
            labels = append(labels, "bug")
        }
        if strings.Contains(lower, "feature") || strings.Contains(lower, "enhancement") {
            labels = append(labels, "enhancement")
        }
        if strings.Contains(lower, "documentation") || strings.Contains(lower, "docs") {
            labels = append(labels, "documentation")
        }
        if strings.Contains(lower, "question") {
            labels = append(labels, "question")
        }
        result.SuggestedLabels = labels

        // Parse duplicate status
        if strings.Contains(lower, "duplicate") || strings.Contains(lower, "similar to") {
            result.IsDuplicate = true
            result.DuplicateReason = "LLM detected potential duplicate"
        }

        return result
    }

The parseTriageResponse function uses a brittle string-matching approach to parse LLM output. This is prone to errors if the LLM output format varies even slightly. Consider using structured output formats (like JSON) that the Gemini API supports, or at least add more robust parsing with regex patterns and error handling. The comment on line 129 acknowledges this is simple but production code should use structured output.
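If the model is asked to reply with a JSON object, parsing could be reduced to a decode plus a check of the allowed values, as in the sketch below. The JSON key names and the lowercase `triageResult` type are assumptions that mirror the fields visible in the quoted code; how the prompt or API is configured to produce JSON is not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// triageResult mirrors the fields used in the quoted parser; the JSON key
// names are hypothetical.
type triageResult struct {
	Quality         string   `json:"quality"`
	SuggestedLabels []string `json:"suggested_labels"`
	IsDuplicate     bool     `json:"is_duplicate"`
	Reasoning       string   `json:"reasoning"`
}

// parseTriageJSON decodes a JSON reply and rejects unknown quality values
// instead of silently defaulting to "good".
func parseTriageJSON(response string) (*triageResult, error) {
	var r triageResult
	if err := json.Unmarshal([]byte(response), &r); err != nil {
		return nil, fmt.Errorf("triage response is not valid JSON: %w", err)
	}
	switch r.Quality {
	case "good", "needs-improvement", "poor":
		// accepted values, taken from the quoted parser
	default:
		return nil, fmt.Errorf("unexpected quality %q", r.Quality)
	}
	return &r, nil
}

func main() {
	reply := `{"quality":"poor","suggested_labels":["bug"],"is_duplicate":false,"reasoning":"No reproduction steps."}`
	r, err := parseTriageJSON(reply)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(r.Quality, r.SuggestedLabels, r.IsDuplicate)
}
```

Unlike substring matching, this approach does not label an issue "bug" just because the word appears somewhere in the reasoning text.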

    opts := []grpc.DialOption{
        grpc.WithTransportCredentials(insecure.NewCredentials()),
    }

The Qdrant client hardcodes insecure gRPC credentials (line 38). While the comment on line 36 mentions this is "fine for local development/testing", this is production code that could be deployed to production environments. Consider making TLS configuration conditional based on the environment or URL scheme, or at minimum add a clear warning in the function documentation that this client should not be used in production without modification.
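The "conditional based on URL scheme" option could start from a small decision helper like the one below, which uses only the standard library. The name `needsTLS` and the rule (https means TLS, http means plaintext, anything else is an error) are assumptions; the caller would then choose between the insecure credentials shown above and TLS transport credentials.

```go
package main

import (
	"fmt"
	"net/url"
)

// needsTLS (hypothetical) decides from the URL scheme whether the
// connection should use TLS.
func needsTLS(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, fmt.Errorf("invalid Qdrant URL %q: %w", rawURL, err)
	}
	switch u.Scheme {
	case "https":
		return true, nil
	case "http":
		return false, nil
	default:
		return false, fmt.Errorf("unsupported scheme %q in Qdrant URL", u.Scheme)
	}
}

func main() {
	for _, raw := range []string{"http://localhost:6334", "https://example.invalid:6334"} {
		secure, err := needsTLS(raw)
		fmt.Println(raw, secure, err)
	}
}
```

Rejecting unknown schemes, rather than defaulting to plaintext, avoids sending traffic unencrypted because of a typo in configuration.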
Fixed 11 issues identified in PR #12 review:

1. Added proper type assertion checks in similarity search to prevent panics
2. Removed unused EmbedBatch API call for efficiency
3. Implemented TransferIssue validation and clear error handling
4. Removed unused MetaData field from Point struct
5. Replaced custom contains() with standard strings.Contains
6. Added Close() method to Dependencies for proper resource cleanup
7. Added input validation to GitHub client methods
8. Fixed context propagation in Qdrant client methods
9. Added test coverage for GitHub client
10. Implemented structured JSON output for LLM response parsing
11. Added conditional TLS support for Qdrant cloud connections

All changes maintain backward compatibility and pass existing tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Summary
This pull request implements the foundational integrations for Simili-Bot v0.0.2v, including Google Gemini, Qdrant Vector Store, and the GitHub API. It establishes the core pipeline architecture and wiring.
Changes
Core Integrations
- `Embedder` interface using `gemini-embedding-001` and `LLMClient` using `gemini-2.0-flash-lite`.
- `VectorStore` client for vector operations (Upsert, Search, Delete).
- `google/go-github` for issue management and commenting.

Pipeline Architecture
- `gemini` package to eliminate import cycles.
- `Dependencies` struct).
- `VectorDBPrep`, `Indexer`, `SimilaritySearch`, `Triage`, `ActionExecutor`) to use injected clients.

Documentation
Related Issues
Closes #2, #3, #4