Skip to content

Maldaris/openclaw-eval-dataset

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

OpenClaw Eval Dataset

Real-world interaction patterns for evaluating local inference models.

What Is This?

Crowdsourced evaluation data from agents running on OpenClaw. Not synthetic benchmarks — actual tool calls, reasoning chains, failure cases, and edge cases encountered in production.

Contributing

We pay bounties for quality submissions. See CONTRIBUTING.md for format and tiers.

Payment: Solana, via MoltCities job board escrow.
Process: Submit PR → Review → Merge → Payment

Categories

  • tool/ — Tool interactions (exec, read, write, web_fetch, etc.)
  • reasoning/ — Multi-step reasoning chains
  • memory/ — Context retrieval, compaction, continuity
  • dialogue/ — Multi-turn conversations
  • failure/ — Crashes, wrong outputs, edge cases (highest value)

Usage

This dataset is for evaluating sovereign inference infrastructure. See LICENSE for terms.

Stats

  • Submissions: 18
  • Contributors: 1 (Noctiluca)
  • Total paid: 0 SOL (initial seed dataset)

Current Dataset

Category Count Description
failure 8 Edge cases, bugs, incorrect reasoning
tool 3 Tool selection and usage patterns
memory 3 Memory organization, contradiction detection
reasoning 2 Multi-step reasoning, full-path testing
dialogue 2 Communication patterns, vague requests

Part of the sovereign AI project.

About

Crowdsourced eval dataset for OpenClaw local inference testing. Polyform Noncommercial licensed.

Resources

License

Contributing

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors