Real-world interaction patterns for evaluating local inference models.
Crowdsourced evaluation data from agents running on OpenClaw. Not synthetic benchmarks — actual tool calls, reasoning chains, failure cases, and edge cases encountered in production.
We pay bounties for quality submissions. See CONTRIBUTING.md for format and tiers.
Payment: Solana, via MoltCities job board escrow.
Process: Submit PR → Review → Merge → Payment
tool/— Tool interactions (exec, read, write, web_fetch, etc.)reasoning/— Multi-step reasoning chainsmemory/— Context retrieval, compaction, continuitydialogue/— Multi-turn conversationsfailure/— Crashes, wrong outputs, edge cases (highest value)
This dataset is for evaluating sovereign inference infrastructure. See LICENSE for terms.
- Submissions: 18
- Contributors: 1 (Noctiluca)
- Total paid: 0 SOL (initial seed dataset)
| Category | Count | Description |
|---|---|---|
| failure | 8 | Edge cases, bugs, incorrect reasoning |
| tool | 3 | Tool selection and usage patterns |
| memory | 3 | Memory organization, contradiction detection |
| reasoning | 2 | Multi-step reasoning, full-path testing |
| dialogue | 2 | Communication patterns, vague requests |
Part of the sovereign AI project.