[0.1.5v] Batch Issue Processing CLI #34

@Kavirubc

Problem Statement

Users currently lack a way to perform 'dry-run' analysis on multiple issues against an existing index without write access to the repository. This makes it difficult to test bot logic, verify similarity search, or generate reports for stakeholders without spamming a live repo.

Proposed Solution

Implement a new simili batch CLI command.

Goal

Implement a new CLI command batch to process multiple issues from a JSON file against an existing vector index. This enables "dry-run" analysis on a set of issues using the full pipeline (including LLMs) without performing write actions on GitHub.

User Review Required

Important

The batch command will run the full pipeline for each issue, including LLM calls, but will suppress any side effects (comments, label changes, transfers) by forcing DryRun=true.

Output Formats:

  • JSON: Full detail of the pipeline result.
  • CSV: Flattened summary suitable for sharing with stakeholders.
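A rough sketch of how the two formats could differ, in Go. The `Result` and `Match` types and the CSV column names are hypothetical placeholders, since the pipeline's real result type isn't defined in this issue. The JSON output keeps every nested match, while the CSV output reduces each issue to a single row with only its best match.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"strconv"
)

// Match and Result are hypothetical, simplified stand-ins for the
// pipeline result; the real types are not specified in this issue.
type Match struct {
	Number int     `json:"number"`
	Score  float64 `json:"score"`
}

type Result struct {
	IssueNumber int     `json:"issue_number"`
	Title       string  `json:"title"`
	Similar     []Match `json:"similar"`
	IsDuplicate bool    `json:"is_duplicate"`
}

// ToJSON renders the full detail of every result, nested matches included.
func ToJSON(results []Result) (string, error) {
	b, err := json.MarshalIndent(results, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// ToCSV renders one flattened row per issue. Nested matches are reduced
// to the single highest-scoring match so the row stays spreadsheet-friendly.
func ToCSV(results []Result) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	header := []string{"issue_number", "title", "top_match", "top_score", "is_duplicate"}
	if err := w.Write(header); err != nil {
		return "", err
	}
	for _, r := range results {
		topMatch, topScore := "", ""
		if len(r.Similar) > 0 {
			best := r.Similar[0]
			for _, m := range r.Similar[1:] {
				if m.Score > best.Score {
					best = m
				}
			}
			topMatch = strconv.Itoa(best.Number)
			topScore = strconv.FormatFloat(best.Score, 'f', 2, 64)
		}
		row := []string{
			strconv.Itoa(r.IssueNumber),
			r.Title,
			topMatch,
			topScore,
			strconv.FormatBool(r.IsDuplicate),
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	w.Flush()
	return buf.String(), w.Error()
}
```

Using `encoding/csv` rather than joining strings by hand means titles containing commas or quotes are escaped correctly.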

Proposed Changes

cmd/simili/commands
  • [NEW] batch.go: Create a new command batchCmd.

  • Flags:

    • --file: Path to JSON file containing an array of issues.
    • --out-file: Path where the analysis results are saved. If the extension is .csv, the output is CSV.
    • --format: Explicit output format (json/csv).
    • --workers: Number of concurrent workers (default: 1).
    • Context Overrides: --config, --collection (crucial for targeting a specific index).
    • Pipeline Tuning: --threshold, --duplicate-threshold, --top-k.
  • Behavior:

    1. Load config and apply overrides from flags.
    2. Read and parse input JSON file (expecting []Issue).
    3. Iterate through issues (concurrently via --workers).
    4. For each issue, execute the full pipeline with DryRun = true.
    5. Format results as JSON or CSV and save to file or stdout.

internal/core/pipeline
  • [MODIFY] pipeline_runner.go: Extract core pipeline execution logic into ExecutePipeline to allow reuse by both process and batch commands.
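Steps 2 to 4 of the batch.go behavior could look roughly like the Go sketch below. Everything named here is an assumption for illustration: the `Issue` fields, the `Options` and `Outcome` types, and the `PipelineFunc` signature standing in for the extracted `ExecutePipeline`. The sketch also assumes that a failure on one issue is recorded and does not abort the rest of the batch, which this issue doesn't specify.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Issue is a hypothetical minimal input shape; the proposal only says
// the input file holds a JSON array of issues.
type Issue struct {
	Number int    `json:"number"`
	Title  string `json:"title"`
	Body   string `json:"body"`
}

// Options stands in for the loaded config plus flag overrides.
type Options struct {
	DryRun  bool
	Workers int
}

// Outcome pairs an input issue with the pipeline's result or error.
type Outcome struct {
	Issue  Issue
	Result string
	Err    error
}

// PipelineFunc is a placeholder for the shared ExecutePipeline entry point.
type PipelineFunc func(issue Issue, opts Options) (string, error)

// ParseIssues decodes the contents of the input file (step 2).
func ParseIssues(data []byte) ([]Issue, error) {
	var issues []Issue
	if err := json.Unmarshal(data, &issues); err != nil {
		return nil, fmt.Errorf("parse input: %w", err)
	}
	return issues, nil
}

// RunBatch processes issues with a bounded worker pool (steps 3 and 4).
// DryRun is forced on regardless of what the caller passed in, and
// outcomes keep the input order so reports are deterministic.
func RunBatch(issues []Issue, opts Options, run PipelineFunc) []Outcome {
	opts.DryRun = true
	if opts.Workers < 1 {
		opts.Workers = 1
	}
	out := make([]Outcome, len(issues))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < opts.Workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each index is handed to exactly one worker, so writes
			// to out[i] never overlap.
			for i := range jobs {
				res, err := run(issues[i], opts)
				out[i] = Outcome{Issue: issues[i], Result: res, Err: err}
			}
		}()
	}
	for i := range issues {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}
```

Setting `DryRun` inside `RunBatch` rather than at the call site means no flag or config value can accidentally re-enable write actions for a batch run.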

Verification Plan

  • Automated Tests: Unit tests for flag overrides, JSON parsing, and CSV generation.
  • Manual Verification: Run batch command on batch_input.json with various flags and verify JSON/CSV outputs.
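One of the flag-override unit tests could cover how the output format is chosen from --format and --out-file. The Go sketch below assumes that an explicit --format wins over the file extension and that JSON is the default; neither rule is stated in this issue, so treat both as open questions. `ResolveFormat`, `formatCases`, and `CheckFormatCases` are hypothetical names, and a real test would range over the same table inside a `testing.T` test function.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// ResolveFormat picks the output format. Assumed precedence (not stated
// in the issue): explicit --format, then a .csv extension, then JSON.
func ResolveFormat(formatFlag, outFile string) (string, error) {
	switch f := strings.ToLower(formatFlag); f {
	case "json", "csv":
		return f, nil
	case "":
		if strings.EqualFold(filepath.Ext(outFile), ".csv") {
			return "csv", nil
		}
		return "json", nil
	default:
		return "", fmt.Errorf("unsupported format %q (want json or csv)", formatFlag)
	}
}

// formatCases is the table a unit test could iterate over.
var formatCases = []struct {
	flag, outFile, want string
	wantErr             bool
}{
	{"", "", "json", false},            // nothing set: default
	{"", "report.csv", "csv", false},   // inferred from extension
	{"", "report.json", "json", false}, // non-CSV extension
	{"json", "report.csv", "json", false}, // explicit flag wins
	{"CSV", "", "csv", false},          // flag is case-insensitive
	{"xml", "out.csv", "", true},       // unknown format is rejected
}

// CheckFormatCases returns the first mismatch in the table, or nil.
func CheckFormatCases() error {
	for _, c := range formatCases {
		got, err := ResolveFormat(c.flag, c.outFile)
		if (err != nil) != c.wantErr || got != c.want {
			return fmt.Errorf("ResolveFormat(%q, %q) = %q, %v; want %q (error expected: %t)",
				c.flag, c.outFile, got, err, c.want, c.wantErr)
		}
	}
	return nil
}
```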

Alternatives Considered

Scripting multiple simili process calls was considered, but it is slow and doesn't provide unified reporting.

Feature Scope

  • CLI
  • Pipeline Steps
  • Configuration

Additional Context

This feature is essential for organization-wide analysis and reporting where write permissions might be restricted.

Contribution

  • I would be willing to help implement this feature

Labels

enhancement: New feature or request
