v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API) by Kavirubc · Pull Request #12 · similigh/simili-bot

Kavirubc · 2026-02-02T04:25:14Z

Summary

This pull request implements the foundational integrations for Simili-Bot v0.0.2v, including Google Gemini, Qdrant Vector Store, and the GitHub API. It establishes the core pipeline architecture and wiring.

Changes

Core Integrations

Gemini: Implemented Embedder interface using gemini-embedding-001 and LLMClient using gemini-2.0-flash-lite.
Qdrant: Implemented VectorStore client for vector operations (Upsert, Search, Delete).
GitHub: Implemented API client using google/go-github for issue management and commenting.

Pipeline Architecture

Refactored gemini package to eliminate import cycles.
Introduced dependency injection in the step registry (Dependencies struct).
Updated pipeline steps (VectorDBPrep, Indexer, SimilaritySearch, Triage, ActionExecutor) to use injected clients.

Documentation

Standardized file headers across the codebase.

Related Issues

Closes #2, #3, #4

- Add Gemini embedder using gemini-embedding-001 (768 dimensions)
- Add Gemini LLM client using gemini-2.0-flash-lite for triage
- Implement prompt templates for issue analysis and response generation
- Add unit tests for prompt building and response parsing
- Update go.mod with Google Generative AI SDK dependencies

All tests pass.

- Add Qdrant client implementing VectorStore interface
- Add operations: CreateCollection, Upsert, Search, Delete
- Add unit tests for payload converters
- Update go.mod with Qdrant Go SDK dependencies (go mod tidy)

All tests pass.

- Update Dependencies struct to hold Embedder, LLMClient, and VectorStore
- Update step constructors to accept Dependencies via dependency injection
- Implement logic in VectorDBPrep, Indexer, SimilaritySearch, Triage, and ResponseBuilder using real clients
- Add deterministic UUID generation for Qdrant points
- Add for centralized step registration
- Fix import cycles between integrations and pipeline packages

- Create `internal/integrations/github` package
- Implement authentication and client wrapper for google/go-github
- Add GitHub Client to pipeline Dependencies
- Update ActionExecutor to use GitHub client for commenting and labeling
- Resolves #4

Copilot

Pull request overview

This pull request implements v0.0.2v, which integrates foundational AI and database services into the Simili-Bot modular pipeline architecture established in v0.0.1v. The PR adds support for Gemini AI (embeddings and LLM), Qdrant vector store, and GitHub API client, wiring these integrations into existing pipeline steps.

Changes:

Integrated Gemini AI for embeddings (gemini-embedding-001) and LLM-based triage (gemini-2.0-flash-lite)
Implemented Qdrant vector store client for semantic similarity search
Created GitHub API client for issue operations (comments, labels)
Refactored all pipeline steps to use dependency injection
Added file headers to all Go source files
Updated configuration to support embedding dimensions

Reviewed changes

Copilot reviewed 30 out of 32 changed files in this pull request and generated 11 comments.


File	Description
internal/integrations/gemini/embedder.go	Implements Gemini embedding client with 768-dimensional vectors
internal/integrations/gemini/llm.go	LLM client for issue triage analysis
internal/integrations/gemini/prompts.go	Prompt templates for triage and response generation
internal/integrations/gemini/gemini_test.go	Unit tests for Gemini integration
internal/integrations/qdrant/client.go	Qdrant vector store client implementation
internal/integrations/qdrant/types.go	Type definitions for Qdrant operations
internal/integrations/qdrant/client_test.go	Unit tests for Qdrant client
internal/integrations/github/client.go	GitHub API client for issue operations
internal/integrations/github/auth.go	GitHub authentication setup
internal/steps/*.go	Updated all pipeline steps to accept Dependencies parameter
internal/steps/register.go	New step registration system with dependency injection
internal/core/pipeline/registry.go	Added Dependencies struct for client injection
internal/core/pipeline/pipeline.go	Added SuggestedLabels field to Result struct
internal/core/config/config.go	Added Dimensions field to EmbeddingConfig
go.mod	Updated dependencies and Go version
DOCS/0.0.2v/*.md	Added planning and documentation


Copilot · 2026-02-02T05:07:17Z

+		resNumberFloat, ok := res.Payload["number"].(float64)
+		if ok && int(resNumberFloat) == ctx.Issue.Number && res.Payload["repo"] == ctx.Issue.Repo {
+			continue
+		}
+
+		issue := pipeline.SimilarIssue{
+			Number:     int(resNumberFloat),
+			Title:      fmt.Sprintf("%v", res.Payload["title"]),
+			URL:        fmt.Sprintf("%v", res.Payload["url"]),
+			State:      fmt.Sprintf("%v", res.Payload["state"]),
+			Similarity: float64(res.Score),
+		}
+		foundIssues = append(foundIssues, issue)


The similarity search converts payload fields without fully handling failure. The type assertion on res.Payload["number"] uses the comma-ok form, so it does not panic, but when the key is missing or holds a different type, ok is false, resNumberFloat is 0, and the result is still appended with Number 0. The remaining fields are formatted with %v, so a missing key becomes the string "<nil>". Consider skipping results whose assertion fails and applying the ok pattern to every payload field access.
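One way to apply the ok pattern to every field is sketched below. It is a minimal illustration, not the code from this PR: `issueFromPayload` and `payloadString` are hypothetical helpers, `SimilarIssue` is a local copy of the fields shown in the hunk above, and the payload is assumed to store the issue number as a float64, as the original assertion implies.

```go
package main

import "fmt"

// SimilarIssue is a local copy of the fields used in the reviewed hunk.
type SimilarIssue struct {
	Number     int
	Title      string
	URL        string
	State      string
	Similarity float64
}

// payloadString returns the string stored under key, or false when the key
// is missing or holds a non-string value.
func payloadString(p map[string]interface{}, key string) (string, bool) {
	s, ok := p[key].(string)
	return s, ok
}

// issueFromPayload (hypothetical helper) builds a SimilarIssue only when the
// issue number is present and numeric. Optional string fields fall back to
// the empty string instead of "<nil>".
func issueFromPayload(p map[string]interface{}, score float32) (SimilarIssue, bool) {
	n, ok := p["number"].(float64)
	if !ok {
		return SimilarIssue{}, false
	}
	title, _ := payloadString(p, "title")
	url, _ := payloadString(p, "url")
	state, _ := payloadString(p, "state")
	return SimilarIssue{
		Number:     int(n),
		Title:      title,
		URL:        url,
		State:      state,
		Similarity: float64(score),
	}, true
}

func main() {
	good := map[string]interface{}{"number": float64(42), "title": "Crash on start", "state": "open"}
	bad := map[string]interface{}{"title": "payload without a number"}
	for _, p := range []map[string]interface{}{good, bad} {
		if issue, ok := issueFromPayload(p, 0.9); ok {
			fmt.Printf("keep #%d %q\n", issue.Number, issue.Title)
		} else {
			fmt.Println("skip result with malformed payload")
		}
	}
}
```

Returning a boolean lets the caller `continue` past malformed results rather than reporting a similar issue numbered 0.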

Copilot · 2026-02-02T05:07:18Z

+	res, err := em.EmbedContentWithTitle(ctx, "", parts...)
+	if err != nil {
+		return nil, fmt.Errorf("failed to generate batch embeddings: %w", err)
+	}
+
+	if res.Embedding == nil || len(res.Embedding.Values) == 0 {
+		return nil, fmt.Errorf("empty embedding returned")
+	}
+
+	// Note: Gemini returns a single embedding for batch requests
+	// For individual embeddings, we need to call Embed for each text
+	embeddings := make([][]float32, len(texts))
+	for i, text := range texts {
+		embedding, err := e.Embed(ctx, text)
+		if err != nil {
+			return nil, fmt.Errorf("failed to embed text %d: %w", i, err)
+		}
+		embeddings[i] = embedding
+	}
+
+	return embeddings, nil


The EmbedBatch function has an inefficient implementation. Lines 75-82 call EmbedContentWithTitle and check the result for errors, but the returned embedding is never used. Lines 86-93 then make an individual Embed call for each text, which defeats the purpose of batch processing. Either remove the unused EmbedContentWithTitle call (lines 75-82) or properly implement batch embedding if the API supports it. The current implementation wastes an API call and doesn't provide any performance benefit over calling Embed in a loop.

Copilot · 2026-02-02T05:07:18Z

+func (c *Client) TransferIssue(ctx context.Context, org, repo string, number int, targetRepo string) error {
+	// targetRepo is expected to be "owner/name" or just "name" (implies same owner).
+	// We need to parse it. (TODO: Add parser helper or assume "owner/name" format)
+
+	// Check if target is "owner/name"
+	// For simplicity, let's assume valid input for now.
+
+	// Actually go-github Issues.Transfer takes (ctx, owner, repo, number, input).
+	// Input has NewOwner and NewName.
+
+	// We'll need to implement parsing logic later or assume strict input.
+	// Leaving unimplemented for now or just simple logic.
+	return fmt.Errorf("transfer not implemented yet")
+}


The TransferIssue function is not implemented and simply returns an error. However, the ActionExecutor step in action_executor.go (lines 69-73) attempts to handle transfer logic. This could lead to confusion where transfers are silently logged as "scheduled" but never actually execute. Consider either fully implementing the TransferIssue method or updating the ActionExecutor to check if the method is implemented before attempting to use it.
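The parsing step that the TODO in TransferIssue defers could be sketched as follows. `parseTargetRepo` is a hypothetical helper, not part of this PR, and it only covers the two input shapes named in the code comment ("owner/name" or just "name" with the same owner); the actual transfer call is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTargetRepo (hypothetical helper) splits a transfer target into owner
// and repository name. A bare "name" falls back to defaultOwner.
func parseTargetRepo(target, defaultOwner string) (owner, name string, err error) {
	target = strings.TrimSpace(target)
	if target == "" {
		return "", "", fmt.Errorf("target repository is empty")
	}
	parts := strings.Split(target, "/")
	switch len(parts) {
	case 1:
		return defaultOwner, parts[0], nil
	case 2:
		if parts[0] == "" || parts[1] == "" {
			return "", "", fmt.Errorf("invalid target repository %q", target)
		}
		return parts[0], parts[1], nil
	default:
		return "", "", fmt.Errorf("invalid target repository %q", target)
	}
}

func main() {
	for _, target := range []string{"similigh/simili-bot", "docs", "a/b/c"} {
		owner, name, err := parseTargetRepo(target, "similigh")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("owner=%s name=%s\n", owner, name)
	}
}
```

Rejecting malformed targets up front gives ActionExecutor a concrete error to log instead of reporting a transfer as scheduled.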

Copilot · 2026-02-02T05:07:18Z

+	ID       string                 `json:"id"`
+	Vector   []float32              `json:"vector"`
+	Payload  map[string]interface{} `json:"payload"`
+	MetaData map[string]interface{} `json:"metadata"`


The Point struct has both Payload and MetaData fields (lines 13-14), but only Payload is actually used in the codebase. The MetaData field is never populated or accessed in client.go, which could cause confusion. Consider removing the unused MetaData field or documenting the intended distinction between Payload and MetaData.

Suggested change

-	ID       string                 `json:"id"`
-	Vector   []float32              `json:"vector"`
-	Payload  map[string]interface{} `json:"payload"`
-	MetaData map[string]interface{} `json:"metadata"`
+	ID       string                 `json:"id"`
+	Vector   []float32              `json:"vector"`
+	Payload  map[string]interface{} `json:"payload"`

Copilot · 2026-02-02T05:07:19Z

+func contains(s, substr string) bool {
+	return len(s) > 0 && len(substr) > 0 && (s == substr || len(s) >= len(substr) && (s[:len(substr)] == substr || s[len(s)-len(substr):] == substr || containsMiddle(s, substr)))
+}
+
+func containsMiddle(s, substr string) bool {
+	for i := 0; i <= len(s)-len(substr); i++ {
+		if s[i:i+len(substr)] == substr {
+			return true
+		}
+	}
+	return false
+}


The contains helper function has overly complex logic that could be simplified. The function at line 156 attempts to check if substr is at the beginning, middle, or end of s, but this is unnecessarily verbose. Go's standard library provides strings.Contains which does exactly this. Replace this custom implementation with strings.Contains(s, substr) for better readability and maintainability.
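One detail is worth checking before swapping the helper out: the custom contains returns false when either argument is empty, while strings.Contains reports true for an empty substring. The sketch below shows the difference and a thin wrapper that keeps the old behavior; `containsNonEmpty` is a hypothetical name, and whether the tests rely on the empty-string case is not visible from this page.

```go
package main

import (
	"fmt"
	"strings"
)

// containsNonEmpty wraps strings.Contains while keeping the original helper's
// behavior of returning false when either argument is empty.
func containsNonEmpty(s, substr string) bool {
	return s != "" && substr != "" && strings.Contains(s, substr)
}

func main() {
	fmt.Println(strings.Contains("timeout error", "error")) // true
	fmt.Println(strings.Contains("timeout error", ""))      // true: empty substring always matches
	fmt.Println(containsNonEmpty("timeout error", ""))      // false: matches the custom helper
}
```

If no caller passes empty strings, a direct strings.Contains call is the simpler replacement.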

Copilot · 2026-02-02T05:07:19Z

+// CreateComment posts a comment on an issue.
+func (c *Client) CreateComment(ctx context.Context, org, repo string, number int, body string) error {
+	comment := &github.IssueComment{
+		Body: github.String(body),
+	}
+	_, _, err := c.client.Issues.CreateComment(ctx, org, repo, number, comment)
+	if err != nil {
+		return fmt.Errorf("failed to create comment: %w", err)
+	}
+	return nil
+}
+
+// AddLabels adds labels to an issue.
+func (c *Client) AddLabels(ctx context.Context, org, repo string, number int, labels []string) error {
+	_, _, err := c.client.Issues.AddLabelsToIssue(ctx, org, repo, number, labels)
+	if err != nil {
+		return fmt.Errorf("failed to add labels: %w", err)
+	}
+	return nil
+}


The GitHub client methods lack input validation. For example, CreateComment doesn't validate that the body parameter is not empty, and AddLabels doesn't check if the labels slice is empty. While the GitHub API may handle these cases, adding basic validation would provide better error messages and prevent unnecessary API calls.
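A minimal sketch of such checks, run before any API call, is shown below. `validateCommentBody` and `cleanLabels` are hypothetical helpers introduced for illustration; treating whitespace-only input as empty is an assumption, not something the review specifies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateCommentBody rejects comment bodies that are empty or whitespace only.
func validateCommentBody(body string) error {
	if strings.TrimSpace(body) == "" {
		return errors.New("comment body is empty")
	}
	return nil
}

// cleanLabels trims each label, drops blanks, and fails when nothing remains,
// so the caller can skip the API request entirely.
func cleanLabels(labels []string) ([]string, error) {
	cleaned := make([]string, 0, len(labels))
	for _, l := range labels {
		if l = strings.TrimSpace(l); l != "" {
			cleaned = append(cleaned, l)
		}
	}
	if len(cleaned) == 0 {
		return nil, errors.New("no labels to add")
	}
	return cleaned, nil
}

func main() {
	fmt.Println(validateCommentBody("   "))
	fmt.Println(cleanLabels([]string{" bug ", "", "question"}))
}
```

Inside CreateComment and AddLabels these checks would run first and return the error wrapped with the same "failed to ..." prefix the methods already use.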

Copilot · 2026-02-02T05:07:19Z

+func (c *Client) CreateCollection(name string, dimension int) error {
+	ctx, cancel := c.ctxWithAuth()
+	defer cancel()
+
+	// Check if exists first
+	exists, err := c.CollectionExists(name)
+	if err != nil {
+		return err
+	}
+	if exists {
+		return nil


The CreateCollection method calls CollectionExists which creates a new context with timeout (line 104). However, CreateCollection also has its own context with timeout (line 71). If CollectionExists takes too long and times out, the parent CreateCollection context will still be waiting, potentially leading to resource leaks. The contexts created in ctxWithAuth use context.Background() instead of inheriting from a parent context, which prevents proper cancellation propagation. Consider accepting a context parameter in these methods or ensure contexts are properly linked.

Copilot · 2026-02-02T05:07:20Z

+// NewClient creates a new GitHub client using the provided token.
+// If token is empty, it returns an unauthenticated client.
+func NewClient(ctx context.Context, token string) *Client {
+	var tc *http.Client
+
+	if token != "" {
+		ts := oauth2.StaticTokenSource(
+			&oauth2.Token{AccessToken: token},
+		)
+		tc = oauth2.NewClient(ctx, ts)
+	}
+
+	client := github.NewClient(tc)
+
+	return &Client{
+		client: client,
+	}
+}


The GitHub client integration (auth.go and client.go) has no test coverage. Given that other integrations in this codebase have test files (gemini_test.go, client_test.go in qdrant), the GitHub client should also have tests to maintain consistency and ensure reliability of API interactions.

Copilot · 2026-02-02T05:07:20Z

+// parseTriageResponse parses the LLM response into a TriageResult.
+// This is a simple parser - in production, you might want structured output.
+func parseTriageResponse(response string) *TriageResult {
+	result := &TriageResult{
+		Quality:         "good", // Default
+		SuggestedLabels: []string{},
+		Reasoning:       response,
+	}
+
+	lower := strings.ToLower(response)
+
+	// Parse quality
+	if strings.Contains(lower, "poor quality") || strings.Contains(lower, "quality: poor") {
+		result.Quality = "poor"
+	} else if strings.Contains(lower, "needs improvement") || strings.Contains(lower, "quality: needs-improvement") {
+		result.Quality = "needs-improvement"
+	}
+
+	// Parse labels (look for common patterns)
+	labels := []string{}
+	if strings.Contains(lower, "bug") {
+		labels = append(labels, "bug")
+	}
+	if strings.Contains(lower, "feature") || strings.Contains(lower, "enhancement") {
+		labels = append(labels, "enhancement")
+	}
+	if strings.Contains(lower, "documentation") || strings.Contains(lower, "docs") {
+		labels = append(labels, "documentation")
+	}
+	if strings.Contains(lower, "question") {
+		labels = append(labels, "question")
+	}
+	result.SuggestedLabels = labels
+
+	// Parse duplicate status
+	if strings.Contains(lower, "duplicate") || strings.Contains(lower, "similar to") {
+		result.IsDuplicate = true
+		result.DuplicateReason = "LLM detected potential duplicate"
+	}
+
+	return result
+}


The parseTriageResponse function uses a brittle string-matching approach to parse LLM output. This is prone to errors if the LLM output format varies even slightly. Consider using structured output formats (like JSON) that the Gemini API supports, or at least add more robust parsing with regex patterns and error handling. The comment on line 129 acknowledges this is simple but production code should use structured output.
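To make the brittleness concrete: with substring matching, a response containing "not a duplicate" still sets IsDuplicate, because it contains the word "duplicate". A structured alternative is sketched below, assuming the model is asked to reply with a JSON object. The JSON field names and `parseTriageJSON` are assumptions made for illustration; the struct fields mirror those used in the reviewed function, and the Gemini request configuration is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TriageResult mirrors the fields set by parseTriageResponse. The JSON tags
// are hypothetical and would have to match the prompt's requested schema.
type TriageResult struct {
	Quality         string   `json:"quality"`
	SuggestedLabels []string `json:"suggested_labels"`
	Reasoning       string   `json:"reasoning"`
	IsDuplicate     bool     `json:"is_duplicate"`
	DuplicateReason string   `json:"duplicate_reason"`
}

// parseTriageJSON decodes a JSON response and rejects unknown quality values
// instead of silently defaulting to "good".
func parseTriageJSON(response string) (*TriageResult, error) {
	var r TriageResult
	if err := json.Unmarshal([]byte(response), &r); err != nil {
		return nil, fmt.Errorf("decode triage response: %w", err)
	}
	switch r.Quality {
	case "good", "needs-improvement", "poor":
	default:
		return nil, fmt.Errorf("unexpected quality %q", r.Quality)
	}
	return &r, nil
}

func main() {
	r, err := parseTriageJSON(`{"quality":"good","suggested_labels":["bug"],"reasoning":"not a duplicate","is_duplicate":false}`)
	fmt.Println(r.IsDuplicate, r.SuggestedLabels, err)

	_, err = parseTriageJSON("This looks like a bug.")
	fmt.Println(err != nil)
}
```

Returning an error for free-text responses lets the Triage step decide whether to retry or fall back, rather than guessing labels from keywords.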

Copilot · 2026-02-02T05:07:20Z

+	opts := []grpc.DialOption{
+		grpc.WithTransportCredentials(insecure.NewCredentials()),
+	}


The Qdrant client hardcodes insecure gRPC credentials (line 38). While the comment on line 36 mentions this is "fine for local development/testing", this code could be deployed to production environments. Consider making TLS configuration conditional based on the environment or URL scheme, or at minimum add a clear warning in the function documentation that this client should not be used in production without modification.
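The URL-scheme variant of that suggestion could start from a small decision function like the one below. This is a sketch under assumptions: `useTLS` is a hypothetical helper, the endpoint is assumed to be configured as a full URL with an http or https scheme, and the gRPC dial options themselves are not shown. The caller would pick TLS transport credentials when it returns true and keep insecure.NewCredentials() otherwise.

```go
package main

import (
	"fmt"
	"net/url"
)

// useTLS (hypothetical helper) reports whether the configured endpoint asks
// for an encrypted connection, based only on its URL scheme.
func useTLS(endpoint string) (bool, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return false, fmt.Errorf("parse endpoint %q: %w", endpoint, err)
	}
	switch u.Scheme {
	case "https":
		return true, nil
	case "http":
		return false, nil
	default:
		return false, fmt.Errorf("unsupported scheme %q in endpoint %q", u.Scheme, endpoint)
	}
}

func main() {
	for _, endpoint := range []string{"http://localhost:6334", "https://qdrant.example.invalid:6334"} {
		tls, err := useTLS(endpoint)
		fmt.Println(endpoint, tls, err)
	}
}
```

Failing on an unrecognized scheme avoids silently falling back to an unencrypted connection.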

Fixed 11 issues identified in PR #12 review:

1. Added proper type assertion checks in similarity search to prevent panics
2. Removed unused EmbedBatch API call for efficiency
3. Implemented TransferIssue validation and clear error handling
4. Removed unused MetaData field from Point struct
5. Replaced custom contains() with standard strings.Contains
6. Added Close() method to Dependencies for proper resource cleanup
7. Added input validation to GitHub client methods
8. Fixed context propagation in Qdrant client methods
9. Added test coverage for GitHub client
10. Implemented structured JSON output for LLM response parsing
11. Added conditional TLS support for Qdrant cloud connections

All changes maintain backward compatibility and pass existing tests.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Kavirubc added 3 commits February 2, 2026 09:45

Init 0.0.2v Plan

41e6650

docs: add file headers to existing codebase

d2c3c0d

This was linked to issues Feb 2, 2026

[v0.0.2] Implement VectorStore Integration (Qdrant) #3

Closed

[v0.0.2] Implement GitHub API Client #4

Closed

Kavirubc added 3 commits February 2, 2026 10:12

feat: implement Qdrant vector store integration

bd4dc40


Kavirubc marked this pull request as ready for review February 2, 2026 05:00

Kavirubc requested a review from Copilot February 2, 2026 05:01

Copilot started reviewing on behalf of Kavirubc February 2, 2026 05:01

Copilot AI reviewed Feb 2, 2026


Kavirubc changed the title ~~[WIP] v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API)~~ v0.0.2v: Foundation Integration (Gemini + Qdrant + GitHub API) Feb 2, 2026

Kavirubc and others added 2 commits February 2, 2026 11:31

feat: create and switch to a new session for bug fixes and testing.

0ab1262

Kavirubc merged commit 13eb629 into main Feb 2, 2026
3 checks passed

Kavirubc deleted the core-0.0.2v-feature-implementation branch February 2, 2026 06:28

This was referenced Feb 17, 2026

refactor: rename internal/integrations/gemini to provider-neutral package #68

Closed

ci: add E2E integration test pipeline #71

Merged

[0.2.0][CI] Add E2E integration test pipeline #72

Closed

Sachindu-Nethmin mentioned this pull request Mar 2, 2026

Feature/add grace period minutes #94

Merged

17 tasks

Kavirubc merged 8 commits into main from core-0.0.2v-feature-implementation

2 participants
