v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API) #12

Merged
Kavirubc merged 8 commits into main from core-0.0.2v-feature-implementation
Feb 2, 2026

Kavirubc commented Feb 2, 2026

Summary

This pull request implements the foundational integrations for Simili-Bot v0.0.2v, including Google Gemini, Qdrant Vector Store, and the GitHub API. It establishes the core pipeline architecture and wiring.

Changes

Core Integrations

  • Gemini: Implemented Embedder interface using gemini-embedding-001 and LLMClient using gemini-2.0-flash-lite.
  • Qdrant: Implemented VectorStore client for vector operations (Upsert, Search, Delete).
  • GitHub: Implemented API client using google/go-github for issue management and commenting.

Pipeline Architecture

  • Refactored gemini package to eliminate import cycles.
  • Introduced dependency injection in the step registry (Dependencies struct).
  • Updated pipeline steps (VectorDBPrep, Indexer, SimilaritySearch, Triage, ActionExecutor) to use injected clients.

Documentation

  • Standardized file headers across the codebase.

Related Issues

Closes #2, #3, #4

- Add Gemini embedder using gemini-embedding-001 (768 dimensions)
- Add Gemini LLM client using gemini-2.0-flash-lite for triage
- Implement prompt templates for issue analysis and response generation
- Add unit tests for prompt building and response parsing
- Update go.mod with Google Generative AI SDK dependencies

All tests pass.

- Add Qdrant client implementing VectorStore interface
- Add operations: CreateCollection, Upsert, Search, Delete
- Add unit tests for payload converters
- Update go.mod with Qdrant Go SDK dependencies (go mod tidy)

All tests pass.

- Update Dependencies struct to hold Embedder, LLMClient, and VectorStore
- Update step constructors to accept Dependencies via dependency injection
- Implement logic in VectorDBPrep, Indexer, SimilaritySearch, Triage, and ResponseBuilder using real clients
- Add deterministic UUID generation for Qdrant points
- Add  for centralized step registration
- Fix import cycles between integrations and pipeline packages
- Create `internal/integrations/github` package
- Implement authentication and client wrapper for google/go-github
- Add GitHub Client to pipeline Dependencies
- Update ActionExecutor to use GitHub client for commenting and labeling
- Resolves #4
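
The commit notes above mention deterministic UUID generation for Qdrant points but do not say how it is done. One common approach is a name-based (version 5) UUID, sketched here with the standard library only. The `pointID` name, the `repo#number` key format, and the choice of the RFC 4122 URL namespace are assumptions made for illustration.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespace is the RFC 4122 URL namespace UUID; using it here is an assumption.
var namespace = [16]byte{
	0x6b, 0xa7, 0xb8, 0x11, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// pointID derives a version 5 UUID from a repository name and issue number,
// so the same issue always maps to the same point ID.
func pointID(repo string, number int) string {
	h := sha1.New()
	h.Write(namespace[:])
	fmt.Fprintf(h, "%s#%d", repo, number) // hypothetical key format
	sum := h.Sum(nil)

	var u [16]byte
	copy(u[:], sum[:16])
	u[6] = (u[6] & 0x0f) | 0x50 // set version 5
	u[8] = (u[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func main() {
	fmt.Println(pointID("owner/repo", 12))
	fmt.Println(pointID("owner/repo", 12) == pointID("owner/repo", 12))
}
```

A deterministic ID makes re-indexing an issue an overwrite rather than a duplicate insert.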
Kavirubc marked this pull request as ready for review February 2, 2026 05:00
Kavirubc requested a review from Copilot February 2, 2026 05:01

Copilot AI left a comment

Pull request overview

This pull request implements v0.0.2v, which integrates foundational AI and database services into the Simili-Bot modular pipeline architecture established in v0.0.1v. The PR adds support for Gemini AI (embeddings and LLM), Qdrant vector store, and GitHub API client, wiring these integrations into existing pipeline steps.

Changes:

  • Integrated Gemini AI for embeddings (gemini-embedding-001) and LLM-based triage (gemini-2.0-flash-lite)
  • Implemented Qdrant vector store client for semantic similarity search
  • Created GitHub API client for issue operations (comments, labels)
  • Refactored all pipeline steps to use dependency injection
  • Added file headers to all Go source files
  • Updated configuration to support embedding dimensions

Reviewed changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| `internal/integrations/gemini/embedder.go` | Implements Gemini embedding client with 768-dimensional vectors |
| `internal/integrations/gemini/llm.go` | LLM client for issue triage analysis |
| `internal/integrations/gemini/prompts.go` | Prompt templates for triage and response generation |
| `internal/integrations/gemini/gemini_test.go` | Unit tests for Gemini integration |
| `internal/integrations/qdrant/client.go` | Qdrant vector store client implementation |
| `internal/integrations/qdrant/types.go` | Type definitions for Qdrant operations |
| `internal/integrations/qdrant/client_test.go` | Unit tests for Qdrant client |
| `internal/integrations/github/client.go` | GitHub API client for issue operations |
| `internal/integrations/github/auth.go` | GitHub authentication setup |
| `internal/steps/*.go` | Updated all pipeline steps to accept Dependencies parameter |
| `internal/steps/register.go` | New step registration system with dependency injection |
| `internal/core/pipeline/registry.go` | Added Dependencies struct for client injection |
| `internal/core/pipeline/pipeline.go` | Added SuggestedLabels field to Result struct |
| `internal/core/config/config.go` | Added Dimensions field to EmbeddingConfig |
| `go.mod` | Updated dependencies and Go version |
| `DOCS/0.0.2v/*.md` | Added planning and documentation |

Comment on lines +70 to +82
resNumberFloat, ok := res.Payload["number"].(float64)
if ok && int(resNumberFloat) == ctx.Issue.Number && res.Payload["repo"] == ctx.Issue.Repo {
    continue
}

issue := pipeline.SimilarIssue{
    Number:     int(resNumberFloat),
    Title:      fmt.Sprintf("%v", res.Payload["title"]),
    URL:        fmt.Sprintf("%v", res.Payload["url"]),
    State:      fmt.Sprintf("%v", res.Payload["state"]),
    Similarity: float64(res.Score),
}
foundIssues = append(foundIssues, issue)

Copilot AI Feb 2, 2026

The similarity search converts payload fields without handling failures. The comma-ok assertion on res.Payload["number"] avoids a panic, but its result is only used for the self-match check: when the value is missing or is not a float64, the code still builds a SimilarIssue with Number set to 0. The title, url, and state fields are formatted with %v, so a missing key becomes the literal string "<nil>". Consider checking the ok result for every payload field and skipping or reporting results that fail.
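
One way to apply the comma-ok pattern to every field is sketched below. The `SimilarIssue` stand-in and the `issueFromPayload` helper are hypothetical; the sketch also assumes, as the snippet under review does, that numeric payload values arrive as float64.

```go
package main

import "fmt"

// SimilarIssue is a stand-in for the pipeline type used in the reviewed snippet.
type SimilarIssue struct {
	Number int
	Title  string
	URL    string
	State  string
}

// issueFromPayload converts a payload map into a SimilarIssue and returns an
// error instead of silently producing zero values or "<nil>" strings.
func issueFromPayload(payload map[string]interface{}) (SimilarIssue, error) {
	num, ok := payload["number"].(float64)
	if !ok {
		return SimilarIssue{}, fmt.Errorf("payload field %q is missing or not a float64", "number")
	}
	issue := SimilarIssue{Number: int(num)}

	fields := []struct {
		key string
		dst *string
	}{
		{"title", &issue.Title},
		{"url", &issue.URL},
		{"state", &issue.State},
	}
	for _, f := range fields {
		s, ok := payload[f.key].(string)
		if !ok {
			return SimilarIssue{}, fmt.Errorf("payload field %q is missing or not a string", f.key)
		}
		*f.dst = s
	}
	return issue, nil
}

func main() {
	payload := map[string]interface{}{
		"number": float64(42),
		"title":  "Crash on start",
		"url":    "https://example.com/issues/42",
		"state":  "open",
	}
	issue, err := issueFromPayload(payload)
	fmt.Println(issue, err)

	_, err = issueFromPayload(map[string]interface{}{"title": "no number"})
	fmt.Println(err)
}
```

The caller can then decide whether a malformed search result should be skipped or logged.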

Comment on lines +75 to +95
res, err := em.EmbedContentWithTitle(ctx, "", parts...)
if err != nil {
    return nil, fmt.Errorf("failed to generate batch embeddings: %w", err)
}

if res.Embedding == nil || len(res.Embedding.Values) == 0 {
    return nil, fmt.Errorf("empty embedding returned")
}

// Note: Gemini returns a single embedding for batch requests
// For individual embeddings, we need to call Embed for each text
embeddings := make([][]float32, len(texts))
for i, text := range texts {
    embedding, err := e.Embed(ctx, text)
    if err != nil {
        return nil, fmt.Errorf("failed to embed text %d: %w", i, err)
    }
    embeddings[i] = embedding
}

return embeddings, nil

Copilot AI Feb 2, 2026

EmbedBatch is implemented inefficiently. Lines 75-82 call EmbedContentWithTitle and then discard the result, and lines 86-93 call Embed once per text, which defeats the purpose of batch processing. Either remove the unused EmbedContentWithTitle call (lines 75-82) or implement real batch embedding if the API supports it. As written, the function wastes one API call and is no faster than calling Embed in a loop.

Comment on lines +59 to +72
func (c *Client) TransferIssue(ctx context.Context, org, repo string, number int, targetRepo string) error {
    // targetRepo is expected to be "owner/name" or just "name" (implies same owner).
    // We need to parse it. (TODO: Add parser helper or assume "owner/name" format)

    // Check if target is "owner/name"
    // For simplicity, let's assume valid input for now.

    // Actually go-github Issues.Transfer takes (ctx, owner, repo, number, input).
    // Input has NewOwner and NewName.

    // We'll need to implement parsing logic later or assume strict input.
    // Leaving unimplemented for now or just simple logic.
    return fmt.Errorf("transfer not implemented yet")
}

Copilot AI Feb 2, 2026

The TransferIssue function is not implemented and simply returns an error. However, the ActionExecutor step in action_executor.go (lines 69-73) attempts to handle transfer logic. This could lead to confusion where transfers are silently logged as "scheduled" but never actually execute. Consider either fully implementing the TransferIssue method or updating the ActionExecutor to check if the method is implemented before attempting to use it.
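
The TODO in the snippet asks for a parser that accepts either "owner/name" or a bare "name". A minimal sketch of such a helper follows; `parseTargetRepo` is a hypothetical name, and the sketch covers only the parsing step, not the transfer call itself.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTargetRepo splits a target of the form "owner/name" or "name".
// A bare name is assumed to live under defaultOwner.
func parseTargetRepo(defaultOwner, target string) (owner, name string, err error) {
	target = strings.TrimSpace(target)
	if target == "" {
		return "", "", fmt.Errorf("target repository is empty")
	}
	parts := strings.Split(target, "/")
	switch len(parts) {
	case 1:
		owner, name = defaultOwner, parts[0]
	case 2:
		owner, name = parts[0], parts[1]
	default:
		return "", "", fmt.Errorf("invalid target repository %q: want \"name\" or \"owner/name\"", target)
	}
	if owner == "" || name == "" {
		return "", "", fmt.Errorf("invalid target repository %q: empty owner or name", target)
	}
	return owner, name, nil
}

func main() {
	fmt.Println(parseTargetRepo("acme", "tools"))
	fmt.Println(parseTargetRepo("acme", "other/tools"))
	fmt.Println(parseTargetRepo("acme", "a/b/c"))
}
```

Returning an explicit error for malformed input lets the ActionExecutor report a failed transfer rather than logging it as scheduled.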

Comment thread internal/integrations/qdrant/types.go Outdated
Comment on lines +11 to +14
ID       string                 `json:"id"`
Vector   []float32              `json:"vector"`
Payload  map[string]interface{} `json:"payload"`
MetaData map[string]interface{} `json:"metadata"`

Copilot AI Feb 2, 2026

The Point struct has both Payload and MetaData fields (lines 13-14), but only Payload is used in the codebase. The MetaData field is never populated or accessed in client.go, which could cause confusion. Consider removing the unused MetaData field or documenting the intended distinction between Payload and MetaData.

Suggested change
- ID       string                 `json:"id"`
- Vector   []float32              `json:"vector"`
- Payload  map[string]interface{} `json:"payload"`
- MetaData map[string]interface{} `json:"metadata"`
+ ID      string                 `json:"id"`
+ Vector  []float32              `json:"vector"`
+ Payload map[string]interface{} `json:"payload"`

Comment on lines +155 to +166
func contains(s, substr string) bool {
    return len(s) > 0 && len(substr) > 0 && (s == substr || len(s) >= len(substr) && (s[:len(substr)] == substr || s[len(s)-len(substr):] == substr || containsMiddle(s, substr)))
}

func containsMiddle(s, substr string) bool {
    for i := 0; i <= len(s)-len(substr); i++ {
        if s[i:i+len(substr)] == substr {
            return true
        }
    }
    return false
}

Copilot AI Feb 2, 2026

The contains helper function has overly complex logic that could be simplified. The function at line 156 attempts to check if substr is at the beginning, middle, or end of s, but this is unnecessarily verbose. Go's standard library provides strings.Contains which does exactly this. Replace this custom implementation with strings.Contains(s, substr) for better readability and maintainability.

Comment on lines +30 to +49
// CreateComment posts a comment on an issue.
func (c *Client) CreateComment(ctx context.Context, org, repo string, number int, body string) error {
    comment := &github.IssueComment{
        Body: github.String(body),
    }
    _, _, err := c.client.Issues.CreateComment(ctx, org, repo, number, comment)
    if err != nil {
        return fmt.Errorf("failed to create comment: %w", err)
    }
    return nil
}

// AddLabels adds labels to an issue.
func (c *Client) AddLabels(ctx context.Context, org, repo string, number int, labels []string) error {
    _, _, err := c.client.Issues.AddLabelsToIssue(ctx, org, repo, number, labels)
    if err != nil {
        return fmt.Errorf("failed to add labels: %w", err)
    }
    return nil
}

Copilot AI Feb 2, 2026

The GitHub client methods lack input validation. For example, CreateComment doesn't validate that the body parameter is not empty, and AddLabels doesn't check if the labels slice is empty. While the GitHub API may handle these cases, adding basic validation would provide better error messages and prevent unnecessary API calls.

Comment thread internal/integrations/qdrant/client.go Outdated
Comment on lines +70 to +80
func (c *Client) CreateCollection(name string, dimension int) error {
    ctx, cancel := c.ctxWithAuth()
    defer cancel()

    // Check if exists first
    exists, err := c.CollectionExists(name)
    if err != nil {
        return err
    }
    if exists {
        return nil

Copilot AI Feb 2, 2026

CreateCollection creates its own context with a timeout (line 71) and then calls CollectionExists, which creates a second, independent context with a timeout (line 104). Because ctxWithAuth derives each context from context.Background() instead of a caller-supplied parent, cancellation cannot propagate: the caller has no way to cancel either operation, and the two timeouts run independently of each other. Consider accepting a context parameter in these methods, or otherwise make sure the contexts are properly linked.

Comment on lines +16 to +33
// NewClient creates a new GitHub client using the provided token.
// If token is empty, it returns an unauthenticated client.
func NewClient(ctx context.Context, token string) *Client {
    var tc *http.Client

    if token != "" {
        ts := oauth2.StaticTokenSource(
            &oauth2.Token{AccessToken: token},
        )
        tc = oauth2.NewClient(ctx, ts)
    }

    client := github.NewClient(tc)

    return &Client{
        client: client,
    }
}

Copilot AI Feb 2, 2026

The GitHub client integration (auth.go and client.go) has no test coverage. Given that other integrations in this codebase have test files (gemini_test.go, client_test.go in qdrant), the GitHub client should also have tests to maintain consistency and ensure reliability of API interactions.

Comment thread internal/integrations/gemini/llm.go Outdated
Comment on lines +128 to +169
// parseTriageResponse parses the LLM response into a TriageResult.
// This is a simple parser - in production, you might want structured output.
func parseTriageResponse(response string) *TriageResult {
    result := &TriageResult{
        Quality:         "good", // Default
        SuggestedLabels: []string{},
        Reasoning:       response,
    }

    lower := strings.ToLower(response)

    // Parse quality
    if strings.Contains(lower, "poor quality") || strings.Contains(lower, "quality: poor") {
        result.Quality = "poor"
    } else if strings.Contains(lower, "needs improvement") || strings.Contains(lower, "quality: needs-improvement") {
        result.Quality = "needs-improvement"
    }

    // Parse labels (look for common patterns)
    labels := []string{}
    if strings.Contains(lower, "bug") {
        labels = append(labels, "bug")
    }
    if strings.Contains(lower, "feature") || strings.Contains(lower, "enhancement") {
        labels = append(labels, "enhancement")
    }
    if strings.Contains(lower, "documentation") || strings.Contains(lower, "docs") {
        labels = append(labels, "documentation")
    }
    if strings.Contains(lower, "question") {
        labels = append(labels, "question")
    }
    result.SuggestedLabels = labels

    // Parse duplicate status
    if strings.Contains(lower, "duplicate") || strings.Contains(lower, "similar to") {
        result.IsDuplicate = true
        result.DuplicateReason = "LLM detected potential duplicate"
    }

    return result
}

Copilot AI Feb 2, 2026

The parseTriageResponse function uses a brittle string-matching approach to parse LLM output. This is prone to errors if the LLM output format varies even slightly. Consider using structured output formats (like JSON) that the Gemini API supports, or at least add more robust parsing with regex patterns and error handling. The comment on line 129 acknowledges this is simple but production code should use structured output.
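
A sketch of the structured alternative, assuming the model has been prompted or configured to return a single JSON object. The struct fields mirror the names in the snippet above; the JSON key names and the `parseTriageJSON` helper are assumptions, and how the JSON is requested from the model is outside this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TriageResult mirrors the fields used by parseTriageResponse.
// The JSON key names are hypothetical.
type TriageResult struct {
	Quality         string   `json:"quality"`
	SuggestedLabels []string `json:"suggested_labels"`
	Reasoning       string   `json:"reasoning"`
	IsDuplicate     bool     `json:"is_duplicate"`
	DuplicateReason string   `json:"duplicate_reason"`
}

// parseTriageJSON decodes a JSON response and validates the quality value,
// returning an error instead of guessing from free text.
func parseTriageJSON(response string) (*TriageResult, error) {
	var result TriageResult
	if err := json.Unmarshal([]byte(strings.TrimSpace(response)), &result); err != nil {
		return nil, fmt.Errorf("triage response is not valid JSON: %w", err)
	}
	switch result.Quality {
	case "good", "needs-improvement", "poor":
		// accepted values, taken from the string parser above
	default:
		return nil, fmt.Errorf("unexpected quality %q", result.Quality)
	}
	if result.SuggestedLabels == nil {
		result.SuggestedLabels = []string{}
	}
	return &result, nil
}

func main() {
	raw := `{"quality":"poor","suggested_labels":["bug"],"reasoning":"No repro steps.","is_duplicate":false}`
	result, err := parseTriageJSON(raw)
	fmt.Printf("%+v %v\n", result, err)

	_, err = parseTriageJSON("This is not a bug, but it is poor quality.")
	fmt.Println(err)
}
```

With explicit fields, a response that merely mentions "bug" or "duplicate" in passing no longer sets a label or the duplicate flag.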

Comment thread internal/integrations/qdrant/client.go Outdated
Comment on lines +37 to +39
opts := []grpc.DialOption{
    grpc.WithTransportCredentials(insecure.NewCredentials()),
}

Copilot AI Feb 2, 2026

The Qdrant client hardcodes insecure gRPC credentials (line 38). The comment on line 36 says this is "fine for local development/testing", but the same code could be deployed to production environments. Consider making TLS conditional on the environment or URL scheme, or at minimum add a clear warning in the function documentation that this client should not be used in production without modification.
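
The URL-scheme option can be sketched with the standard library alone. The `useTLS` helper and the scheme convention (https means TLS, http means plaintext) are assumptions; in the real client the result would choose between `credentials.NewTLS(...)` and `insecure.NewCredentials()` from the gRPC packages.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// useTLS decides from the configured URL whether the connection should use TLS.
// An unknown or missing scheme is an error, so the choice is never implicit.
func useTLS(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, fmt.Errorf("invalid Qdrant URL %q: %w", rawURL, err)
	}
	switch strings.ToLower(u.Scheme) {
	case "https":
		return true, nil
	case "http":
		return false, nil
	default:
		return false, fmt.Errorf("unsupported or missing scheme in %q", rawURL)
	}
}

func main() {
	for _, raw := range []string{
		"https://example.cloud.qdrant.io:6334",
		"http://localhost:6334",
		"localhost:6334",
	} {
		secure, err := useTLS(raw)
		fmt.Println(raw, secure, err)
	}
}
```

Failing on an ambiguous URL keeps a misconfigured deployment from silently falling back to an unencrypted connection.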

Kavirubc changed the title from "[WIP] v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API)" to "v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API)" Feb 2, 2026
Kavirubc and others added 2 commits February 2, 2026 11:31
Fixed 11 issues identified in PR #12 review:

1. Added proper type assertion checks in similarity search to prevent panics
2. Removed unused EmbedBatch API call for efficiency
3. Implemented TransferIssue validation and clear error handling
4. Removed unused MetaData field from Point struct
5. Replaced custom contains() with standard strings.Contains
6. Added Close() method to Dependencies for proper resource cleanup
7. Added input validation to GitHub client methods
8. Fixed context propagation in Qdrant client methods
9. Added test coverage for GitHub client
10. Implemented structured JSON output for LLM response parsing
11. Added conditional TLS support for Qdrant cloud connections

All changes maintain backward compatibility and pass existing tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Kavirubc merged commit 13eb629 into main Feb 2, 2026
3 checks passed
Kavirubc deleted the core-0.0.2v-feature-implementation branch February 2, 2026 06:28
Development

Successfully merging this pull request may close these issues.

  • [v0.0.2] Implement GitHub API Client
  • [v0.0.2] Implement VectorStore Integration (Qdrant)
  • [v0.0.2] Implement Embedder Integration (Gemini/OpenAI)