Problem Statement
Users currently lack a way to perform 'dry-run' analysis on multiple issues against an existing index without write access to the repository. This makes it difficult to test bot logic, verify similarity search, or generate reports for stakeholders without spamming a live repo.
Proposed Solution
Implement a new simili batch CLI command.
Goal
Implement a new CLI command batch to process multiple issues from a JSON file against an existing vector index. This enables "dry-run" analysis on a set of issues using the full pipeline (including LLMs) without performing write actions on GitHub.
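As a rough illustration of the goal above, the sketch below reads a JSON array of issues and runs each one through a pipeline function with dry-run forced on, using a bounded number of workers. The `Issue`, `Options`, and `Result` types, their fields, and the `parseIssues` and `runBatch` helpers are all hypothetical names chosen for this example; the real input schema and pipeline entry point may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Issue is a hypothetical input record; the real JSON schema may differ.
type Issue struct {
	Number int    `json:"number"`
	Title  string `json:"title"`
}

// Options and Result are hypothetical stand-ins for the project's own types.
type Options struct {
	DryRun  bool
	Workers int
}

type Result struct {
	Number int
	DryRun bool
}

// parseIssues decodes the contents of the input file, which must be a JSON array.
func parseIssues(data []byte) ([]Issue, error) {
	var issues []Issue
	if err := json.Unmarshal(data, &issues); err != nil {
		return nil, fmt.Errorf("parse issues: %w", err)
	}
	return issues, nil
}

// runBatch fans the issues out to opts.Workers goroutines and returns the
// results in input order. DryRun is overwritten so callers cannot disable it.
func runBatch(issues []Issue, opts Options, run func(Issue, Options) Result) []Result {
	opts.DryRun = true
	if opts.Workers < 1 {
		opts.Workers = 1
	}
	results := make([]Result, len(issues))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < opts.Workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so no lock is needed.
				results[i] = run(issues[i], opts)
			}
		}()
	}
	for i := range issues {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	issues, err := parseIssues([]byte(`[{"number": 1, "title": "Crash on start"}, {"number": 2, "title": "Typo in docs"}]`))
	if err != nil {
		panic(err)
	}
	// fakePipeline stands in for the real pipeline and only echoes its inputs.
	fakePipeline := func(issue Issue, opts Options) Result {
		return Result{Number: issue.Number, DryRun: opts.DryRun}
	}
	for _, r := range runBatch(issues, Options{Workers: 2}, fakePipeline) {
		fmt.Println(r.Number, r.DryRun)
	}
}
```

Writing each result into its own slice index keeps the output order stable regardless of which worker finishes first, which should make reports easier to compare between runs.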
User Review Required
Important
The batch command will run the full pipeline for each issue, including LLM calls, but will suppress any side effects (comments, label changes, transfers) by forcing DryRun=true.
Output Formats:
JSON: Full detail of the pipeline result.
CSV: Flattened summary suitable for sharing with stakeholders.
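One possible way to pick the output format and flatten results for the CSV case is sketched below. The `Result` fields, the column names, and the precedence rule (an explicit format wins over the file extension, with JSON as the fallback) are assumptions made for this example, not decisions stated in the proposal.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"path/filepath"
	"strconv"
	"strings"
)

// Result is a hypothetical flattened view of one pipeline run.
type Result struct {
	IssueNumber int
	Title       string
	TopMatch    string  // hypothetical: reference to the closest indexed issue
	Score       float64 // hypothetical: similarity score of TopMatch
}

// resolveFormat prefers an explicit format value and otherwise falls back to
// the output file extension. Defaulting to JSON is an assumption.
func resolveFormat(explicit, outFile string) string {
	if explicit != "" {
		return strings.ToLower(explicit)
	}
	if strings.EqualFold(filepath.Ext(outFile), ".csv") {
		return "csv"
	}
	return "json"
}

// renderCSV writes one header row followed by one row per result.
func renderCSV(results []Result) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write([]string{"issue", "title", "top_match", "score"}); err != nil {
		return "", err
	}
	for _, r := range results {
		row := []string{
			strconv.Itoa(r.IssueNumber),
			r.Title,
			r.TopMatch,
			strconv.FormatFloat(r.Score, 'f', 2, 64),
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return sb.String(), w.Error()
}

func main() {
	out, err := renderCSV([]Result{
		{IssueNumber: 12, Title: "Crash on start, again", TopMatch: "#7", Score: 0.913},
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("format=%s\n%s", resolveFormat("", "report.csv"), out)
}
```

Using `encoding/csv` rather than joining strings by hand means titles that contain commas or quotes are escaped correctly, which matters for issue titles typed by users.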
Proposed Changes
cmd/simili/commands
[NEW] batch.go: Create a new command batchCmd.
Flags:
--file: Path to JSON file containing an array of issues.
--out-file: Path to save the analysis results. If extension is .csv, output will be CSV.
--format: Explicit output format (json/csv).
--workers: Number of concurrent workers (default: 1).
Context Overrides: --config, --collection (crucial for targeting a specific index).
--threshold, --duplicate-threshold, --top-k.
Behavior:
Load the issues from the JSON file (as []Issue).
Process the issues concurrently (bounded by --workers).
Force DryRun = true.
internal/core/pipeline
pipeline_runner.go: Extract the core pipeline execution logic into ExecutePipeline so that both the process and batch commands can reuse it.
Verification Plan
Run the batch command on a sample batch_input.json with various flags and verify the JSON/CSV outputs.
Alternatives Considered
Scripting multiple simili process calls, which is slow and doesn't provide unified reporting.
Feature Scope
Additional Context
This feature is essential for organization-wide analysis and reporting where write permissions might be restricted.
Contribution