Roadmap: what's coming #79

Tracking planned work for ArkSim. Items are roughly in priority order.

## Next

- [x] **CI/CD support**
  - [x] Per-metric score thresholds - Configure pass/fail thresholds for individual metrics (e.g. `faithfulness >= 0.8`, `goal_completion >= 0.9`) so pipelines can gate on specific quality criteria.
  - [x] GitHub Actions - Reusable action to run `arksim simulate-evaluate` as a pipeline step.
- [x] **Config validation** - Catch invalid configs and missing environment variables early with clear, actionable error messages instead of cryptic failures mid-run.
- [x] **Custom metrics improvements** - Better error reporting for custom metric failures and improved visualization for qualitative metrics.
- [ ] **UI improvements** - Persist scenarios, simulations, and evaluation results across sessions. Better scenario management and result browsing.
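
To illustrate the per-metric threshold item above, here is a minimal sketch of how a pipeline step might gate on scores. It is not ArkSim's actual API: the function name `check_thresholds` and the plain-dictionary shape of scores and thresholds are assumptions made for the example, and the metric names are the ones mentioned in the list.

```python
from typing import Dict, List


def check_thresholds(
    scores: Dict[str, float], thresholds: Dict[str, float]
) -> List[str]:
    """Hypothetical helper: return one message per failed gate.

    An empty list means every configured metric met its minimum.
    """
    failures = []
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        if score is None:
            # A metric with a threshold but no score counts as a failure.
            failures.append(f"{metric}: no score reported")
        elif score < minimum:
            failures.append(f"{metric}: {score:.2f} < {minimum:.2f}")
    return failures


# Example thresholds taken from the roadmap text; the scores are made up.
thresholds = {"faithfulness": 0.8, "goal_completion": 0.9}
scores = {"faithfulness": 0.85, "goal_completion": 0.72}
for message in check_thresholds(scores, thresholds):
    print("FAIL", message)
```

A CI job could exit non-zero whenever the returned list is non-empty, which is what lets the pipeline fail on a single metric rather than on an aggregate score.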

## Later

- [ ] **Tool call evaluation** - Evaluate tool call accuracy by reading tool call responses and validating them against expected behavior.
- [ ] **Agentic simulation engine** - Multi-knowledge simulation for richer, more realistic scenarios, with agents that pull from multiple knowledge sources across conversation turns.
- [ ] **Streaming support** - Handle agents that stream responses (SSE / chunked transfer), so long-running agents don't time out during simulation.

---

Have a feature request? Comment below or [open an issue](https://github.com/arklexai/arksim/issues/new).
