Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (9)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Add PRCollection field to QdrantConfig with yaml tag pr_collection.
Field is intentionally excluded from Validate() — empty value means
PRs are indexed in the main collection (backward compatibility).
mergeConfigs updated to propagate the field from child → parent.
New tests cover optional validation and env-var expansion via
${QDRANT_PR_COLLECTION}.
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
GetPullRequest wraps PullRequests.Get for full PR metadata. ListPullRequestFiles paginates PullRequests.ListFiles to collect all changed file paths regardless of PR size. Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
…#43) - processPullRequest() fetches full PR details + changed files, builds PR-specific embedding content, and upserts to pr_collection - buildPREmbeddingContent() is a testable helper that produces "Title / Body / Changed Files" formatted text - Producer routes IsPullRequest() items to a separate prJobs channel (with its own worker pool) when PRCollection is configured; falls through to the main issues channel otherwise for backward compat - PR collection is auto-created alongside the issues collection on startup Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
New command fetches a PR, embeds its content, and searches both the issues collection and the optional pr_collection for duplicates. - Flags: --repo, --number (required), --token, --dry-run, --top-k, --threshold - mergeSearchResults() deduplicates hits from both collections by (type, number) key, excludes the queried PR from results, and sorts by score descending - Optional LLM verdict runs DetectDuplicate() on the top-3 candidates - Outputs structured JSON to stdout - Unit tests cover JSON round-trip, merge dedup/sort, and self-exclusion - index_test.go covers buildPREmbeddingContent with/without file lists Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
- .github/simili.yaml: add pr_collection field pointing to
\${QDRANT_PR_COLLECTION}; empty env var disables dedicated PR indexing
- e2e_test.go: TestPRDuplicateDetection verifies the config layer
accepts pr_collection without requiring a live Qdrant instance
Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
2096442 to
e3fdec8
Compare
Simili Triage ReportNote Quality Score: 9.5/10 (Excellent) Classification
Quality Improvements
Similar Threads
Warning Possible Duplicate (Confidence: 99%) ⏳ This pull request will be automatically closed in 72 hours if no objections are raised. If you believe this is not a duplicate, please leave a comment explaining why. Generated by Simili Bot |
There was a problem hiding this comment.
Pull request overview
Adds optional dedicated Qdrant collection support for indexing GitHub pull requests, plus a simili pr-duplicate CLI to search for semantically similar issues/PRs across collections.
Changes:
- Introduces
qdrant.pr_collectionconfig field (optional) and merges it through inheritance/env expansion. - Extends GitHub integration +
simili indexto fetch PR metadata + changed files and upsert PR embeddings into a dedicated collection when configured. - Adds
simili pr-duplicatecommand (with tests) to embed a PR and query/deduplicate candidates across issues + PR collections.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
internal/core/config/config.go |
Adds PRCollection to QdrantConfig and merges it during inheritance. |
internal/core/config/config_test.go |
Tests that PRCollection is optional and supports env expansion. |
internal/integrations/github/client.go |
Adds GitHub API helpers for fetching PR details and listing PR files (paginated). |
cmd/simili/commands/index.go |
Creates PR collection when configured; adds PR worker path (processPullRequest) and PR embedding content builder. |
cmd/simili/commands/index_test.go |
Unit tests for buildPREmbeddingContent. |
cmd/simili/commands/pr_duplicate.go |
New pr-duplicate command: fetch PR, embed, search both collections, merge/dedup results, optional LLM verdict, output JSON. |
cmd/simili/commands/pr_duplicate_test.go |
Unit tests for output JSON marshalling and search result merging logic. |
tests/integration/e2e_test.go |
Adds an integration test that validates config wiring for pr_collection. |
.github/simili.yaml |
Adds optional pr_collection config field placeholder. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| addHit := func(hit *qdrant.SearchResult) { | ||
| itemType, _ := hit.Payload["type"].(string) | ||
| if itemType == "" { | ||
| itemType = "issue" | ||
| } | ||
|
|
||
| number := payloadInt(hit.Payload, "issue_number", "pr_number") | ||
| title, _ := hit.Payload["title"].(string) //nolint:misspell | ||
| url, _ := hit.Payload["url"].(string) | ||
|
|
||
| // Exclude the PR being checked from its own results. | ||
| if itemType == "pull_request" && number == currentPRNumber { | ||
| return | ||
| } |
There was a problem hiding this comment.
mergeSearchResults only extracts the candidate number from payload keys issue_number / pr_number. However, other index paths in this repo store the identifier under number (e.g. internal/steps/indexer.go), which would make payloadInt return 0 and produce bogus candidates like issue #0 and incorrect deduping/exclusion. Consider also skipping hits where no valid number could be extracted (similar to internal/steps/similarity.go).
| // TestPRDuplicateDetection verifies that the pr_collection field is wired correctly | ||
| // through the config layer and passes validation without a real Qdrant instance. | ||
| func TestPRDuplicateDetection(t *testing.T) { | ||
| cfg := &config.Config{ | ||
| Qdrant: config.QdrantConfig{ | ||
| URL: "https://example.qdrant.io:6334", | ||
| APIKey: "qdrant-key", | ||
| Collection: "simili_bot_v1", | ||
| PRCollection: "simili_prs_v1", | ||
| }, | ||
| Embedding: config.EmbeddingConfig{ | ||
| APIKey: "embedding-key", | ||
| }, | ||
| } | ||
|
|
||
| if err := cfg.Validate(); err != nil { | ||
| t.Fatalf("Config with pr_collection should be valid: %v", err) | ||
| } | ||
| if cfg.Qdrant.PRCollection != "simili_prs_v1" { | ||
| t.Errorf("Expected PRCollection 'simili_prs_v1', got %q", cfg.Qdrant.PRCollection) | ||
| } | ||
| } |
There was a problem hiding this comment.
TestPRDuplicateDetection doesn’t exercise PR duplicate detection logic (it only checks config validation / field assignment) and largely duplicates the new unit tests in internal/core/config/config_test.go. This makes the integration suite misleading; consider renaming it to reflect what it asserts and/or moving/removing it in favor of the existing config-layer tests.
| top := out.Candidates | ||
| if len(top) > 3 { | ||
| top = top[:3] | ||
| } | ||
| similar := make([]ai.SimilarIssueInput, len(top)) | ||
| for i, c := range top { | ||
| similar[i] = ai.SimilarIssueInput{ | ||
| Number: c.Number, | ||
| Title: c.Title, | ||
| URL: c.URL, | ||
| Similarity: c.Score, | ||
| } | ||
| } | ||
| dupResult, dupErr := llmClient.DetectDuplicate(ctx, &ai.DuplicateCheckInput{ | ||
| CurrentIssue: &ai.IssueInput{ | ||
| Title: pr.GetTitle(), | ||
| Body: pr.GetBody(), | ||
| }, | ||
| SimilarIssues: similar, | ||
| }) | ||
| if dupErr == nil { | ||
| out.DuplicateDetected = dupResult.IsDuplicate | ||
| out.DuplicateOf = dupResult.DuplicateOf | ||
| out.Confidence = dupResult.Confidence | ||
| out.Reasoning = dupResult.Reasoning |
There was a problem hiding this comment.
The optional LLM verdict reuses ai.DetectDuplicate, whose prompt/contract is explicitly issue-focused (e.g. “duplicate detection system for GitHub issues” and DuplicateOf is documented as an issue number). Since pr-duplicate merges both PR and issue candidates, the LLM result can be ambiguous/misleading (e.g. duplicate_of: 5 without indicating whether that’s an issue or PR). Consider either restricting the LLM verdict to issue candidates only, or extending the duplicate-check input/output to include the candidate type and updating the prompt accordingly.
| top := out.Candidates | |
| if len(top) > 3 { | |
| top = top[:3] | |
| } | |
| similar := make([]ai.SimilarIssueInput, len(top)) | |
| for i, c := range top { | |
| similar[i] = ai.SimilarIssueInput{ | |
| Number: c.Number, | |
| Title: c.Title, | |
| URL: c.URL, | |
| Similarity: c.Score, | |
| } | |
| } | |
| dupResult, dupErr := llmClient.DetectDuplicate(ctx, &ai.DuplicateCheckInput{ | |
| CurrentIssue: &ai.IssueInput{ | |
| Title: pr.GetTitle(), | |
| Body: pr.GetBody(), | |
| }, | |
| SimilarIssues: similar, | |
| }) | |
| if dupErr == nil { | |
| out.DuplicateDetected = dupResult.IsDuplicate | |
| out.DuplicateOf = dupResult.DuplicateOf | |
| out.Confidence = dupResult.Confidence | |
| out.Reasoning = dupResult.Reasoning | |
| // Build up to 3 similar *issues* only, since the LLM duplicate | |
| // detector is issue-focused and its contract assumes issues. | |
| similar := make([]ai.SimilarIssueInput, 0, 3) | |
| for _, c := range out.Candidates { | |
| if c.Type != "issue" { | |
| continue | |
| } | |
| similar = append(similar, ai.SimilarIssueInput{ | |
| Number: c.Number, | |
| Title: c.Title, | |
| URL: c.URL, | |
| Similarity: c.Score, | |
| }) | |
| if len(similar) == 3 { | |
| break | |
| } | |
| } | |
| if len(similar) > 0 { | |
| dupResult, dupErr := llmClient.DetectDuplicate(ctx, &ai.DuplicateCheckInput{ | |
| CurrentIssue: &ai.IssueInput{ | |
| Title: pr.GetTitle(), | |
| Body: pr.GetBody(), | |
| }, | |
| SimilarIssues: similar, | |
| }) | |
| if dupErr == nil { | |
| out.DuplicateDetected = dupResult.IsDuplicate | |
| out.DuplicateOf = dupResult.DuplicateOf | |
| out.Confidence = dupResult.Confidence | |
| out.Reasoning = dupResult.Reasoning | |
| } |
…ltsNilInputs SA9003 (staticcheck): the previous if-block had no body, causing the linter to flag it as an empty branch. Replace with a concrete assertion on len(candidates) instead. Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
🧪 E2E Test✅ Bot responded: yes | Auto-closer (dry-run) | processed: 0 closed: 0 grace: 0 human: 0 | Test repo → gh-simili-bot/simili-e2e-22702559434 Auto-generated by E2E pipeline |
1. payloadInt: add "number" as primary key lookup (used by internal/steps/indexer.go) alongside issue_number / pr_number, matching the pattern in internal/steps/similarity.go. Hits with no extractable number (number==0) are now silently skipped to prevent bogus issue #0 candidates and incorrect deduplication. 2. LLM verdict: restrict DetectDuplicate input to issue-type candidates only. The LLM prompt and DuplicateOf field are issue-focused; mixing in PR candidates would produce an ambiguous/misleading verdict. 3. e2e test: rename TestPRDuplicateDetection → TestPRCollectionConfigWiring to accurately reflect that the test validates config-layer wiring, not end-to-end duplicate detection logic. New test TestMergeSearchResultsNumberKeyVariants covers the "number" key variant and the zero-number skip behaviour. Signed-off-by: Kavirubc <hapuarachchikaviru@gmail.com>
🧪 E2E Test✅ Bot responded: yes | Auto-closer (dry-run) | processed: 0 closed: 0 grace: 0 human: 0 | Test repo → gh-simili-bot/simili-e2e-22702874005 Auto-generated by E2E pipeline |
Closes #43
Summary
qdrant.pr_collection(optional) in config routes PRs into their own Qdrant collection instead of mixing them with issuesprocessPullRequestworker — fetches full PR metadata + changed files via the GitHub API, builds a structured embedding (Title / Body / Changed Files), and upserts topr_collectionpr_collectionis not set, PRs continue to be indexed in the main issues collection unchangedsimili pr-duplicateCLI — searches both collections, deduplicates results by(type, number), optionally runs an LLM duplicate verdict, and outputs JSON to stdoutQdrantConfig.PRCollectionfield added; excluded fromValidate()since it is optional; env-var expanded automatically via${QDRANT_PR_COLLECTION}.github/simili.yamlupdated with the new fieldCommits
feat(config)pr_collectiontoQdrantConfig+ merge + testsfeat(github)GetPullRequest+ListPullRequestFiles(paginated)feat(index)processPullRequest,buildPREmbeddingContent, PR routingfeat(cmd)simili pr-duplicatecommand + unit testschore(config,test)Test plan
go test ./internal/core/config/...—TestPRCollectionIsOptional,TestPRCollectionEnvExpansiongo test ./cmd/simili/commands/...—TestBuildPREmbeddingContent,TestParsePRDuplicateOutput,TestMergeSearchResults*go test ./tests/integration/...—TestPRDuplicateDetectionsimili index --repo owner/repo --include-prs --dry-run(requiresQDRANT_PR_COLLECTIONset)simili pr-duplicate --repo owner/repo --number 1 --dry-run