Roadmap: what's coming #79

@itsarbit

Tracking planned work for ArkSim. Items are roughly in priority order.

Next

  • CI/CD support
    • Per-metric score thresholds - Configure pass/fail thresholds for individual metrics (e.g. faithfulness >= 0.8, goal_completion >= 0.9) so pipelines can gate on specific quality criteria.
    • GitHub Actions - Reusable action to run arksim simulate-evaluate as a pipeline step.
  • Config validation - Catch invalid configs and missing environment variables early with clear, actionable error messages instead of cryptic failures mid-run.
  • Custom metrics improvements - Better error reporting for custom metric failures and improved visualization for qualitative metrics.
  • UI improvements - Persist scenarios, simulations, and evaluation results across sessions. Better scenario management and result browsing.
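
To make the per-metric thresholds item concrete, here is a minimal sketch of how a pipeline gate could behave. It assumes evaluation results are available as a plain metric-to-score mapping. The function name `check_thresholds` and the dict shapes are hypothetical illustrations, not ArkSim's actual API or config format.

```python
from typing import Dict, List


def check_thresholds(scores: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Return human-readable failures; an empty list means the gate passes.

    Hypothetical helper: `scores` and `thresholds` both map metric name -> value.
    """
    failures = []
    for metric, minimum in thresholds.items():
        if metric not in scores:
            # A configured metric with no score is treated as a failure,
            # so a typo in a metric name cannot silently pass the gate.
            failures.append(f"{metric}: no score reported (required >= {minimum:.2f})")
        elif scores[metric] < minimum:
            failures.append(f"{metric}: {scores[metric]:.2f} < {minimum:.2f}")
    return failures


# Example values taken from the thresholds mentioned above; the scores are made up.
thresholds = {"faithfulness": 0.8, "goal_completion": 0.9}
scores = {"faithfulness": 0.85, "goal_completion": 0.72}

failures = check_thresholds(scores, thresholds)
exit_code = 1 if failures else 0  # a CI step would pass this to sys.exit()
for line in failures:
    print("FAIL", line)
```

Returning every failure at once, rather than stopping at the first, lets a single CI run show all metrics that fell below their thresholds.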

Later

  • Tool call evaluation - Evaluate tool call accuracy by reading tool call responses and validating them against expected behavior.
  • Agentic simulation engine - Multi-knowledge simulation for richer, more realistic scenarios, with agents that pull from multiple knowledge sources across conversation turns.
  • Streaming support - Handle agents that stream responses (SSE / chunked transfer), so long-running agents don't time out during simulation.
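
As an illustration of the tool call evaluation item, the sketch below scores an agent's tool calls against an expected sequence. It uses one possible rule (strict positional match on name and arguments); the `ToolCall` type and `tool_call_accuracy` function are hypothetical and do not reflect how ArkSim will implement this.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class ToolCall:
    """Hypothetical record of one tool invocation."""
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def tool_call_accuracy(expected: List[ToolCall], actual: List[ToolCall]) -> float:
    """Fraction of positions where the actual call equals the expected call.

    Dividing by the longer sequence penalizes both missing and extra calls.
    """
    if not expected and not actual:
        return 1.0
    matched = sum(1 for exp, act in zip(expected, actual) if exp == act)
    return matched / max(len(expected), len(actual))


# Made-up scenario: the agent picks the right tools but one wrong argument.
expected = [
    ToolCall("search_kb", {"query": "refund policy"}),
    ToolCall("create_ticket", {"priority": "high"}),
]
actual = [
    ToolCall("search_kb", {"query": "refund policy"}),
    ToolCall("create_ticket", {"priority": "low"}),
]

accuracy = tool_call_accuracy(expected, actual)
print(f"tool call accuracy: {accuracy:.2f}")
```

Exact matching is the simplest rule to reason about; a real evaluator might instead allow reordered calls or partial credit for a correct tool name with wrong arguments.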

Have a feature request? Comment below or open an issue.

Labels: roadmap (Planned features on the roadmap)
