## Overview
Analyzed canitrunopenclaw and its ClawBench benchmarking tool.
What it is: A hardware compatibility directory for OpenClaw forks. ClawBench measures installation and startup performance, not agent task completion.
Key difference from PinchBench:
- ClawBench = Can the fork run on this hardware? (clone time, install time, disk usage, memory, startup)
- PinchBench = Can the agent complete tasks correctly? (task accuracy, tool usage, reasoning)
## ClawBench Scoring (for reference)

| Component | Weight | What it measures |
|---|---|---|
| Latency | 30 pts | Cold start time (clone + install + startup) |
| Capabilities | 40 pts | 8 capability checks (messaging, browser, code exec, memory, files, search, MCP, tool use) |
| Size | 20 pts | Total disk footprint after install |
| Build | 10 pts | Successful install + successful startup |
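For reference, the weighting above can be sketched as a single scoring function. The normalization curves (linear falloff for latency and size) and cutoffs here are assumptions for illustration, not ClawBench's actual implementation:

```python
# Hypothetical sketch of ClawBench-style weighted scoring.
# Weights come from the table above; the normalization functions
# and cutoffs (300 s, 2048 MB) are assumptions, not ClawBench's code.

def clawbench_score(cold_start_s: float, capabilities: int,
                    disk_mb: float, installed: bool, started: bool) -> float:
    """Combine the four components into a 0-100 score."""
    # Latency: 30 pts, assumed linear falloff up to a 300 s cold start.
    latency = 30 * max(0.0, 1 - cold_start_s / 300)
    # Capabilities: 40 pts across the 8 static checks (5 pts each).
    caps = 40 * (capabilities / 8)
    # Size: 20 pts, assumed linear falloff up to a 2 GB footprint.
    size = 20 * max(0.0, 1 - disk_mb / 2048)
    # Build: 10 pts, split between install and startup success.
    build = 5 * installed + 5 * started
    return round(latency + caps + size + build, 1)

print(clawbench_score(60, 6, 512, True, True))  # → 79.0
```

A perfect fork (instant start, all 8 capabilities, zero footprint, clean build) tops out at 100 under this sketch.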
Capabilities detected via static analysis:
- Messaging (WhatsApp/Telegram/Discord/Slack)
- Browser automation (Puppeteer/Playwright/Selenium)
- Code execution (subprocess/child_process)
- Persistent memory (SQLite/Redis/vectordb)
- File management
- Web search
- MCP support
- Tool use
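Since these checks are static, detection presumably amounts to scanning a fork's dependency manifest for telltale packages. A minimal sketch of that style of check, with a hypothetical keyword map (ClawBench's real heuristics may differ):

```python
import json

# Hypothetical hint map; ClawBench's actual heuristics may differ.
CAPABILITY_HINTS = {
    "messaging": ["discord.js", "telegraf", "whatsapp-web.js", "@slack/bolt"],
    "browser": ["puppeteer", "playwright", "selenium-webdriver"],
    "memory": ["better-sqlite3", "redis", "chromadb"],
}

def detect_capabilities(package_json: str) -> set:
    """Return capability names whose hint packages appear as dependencies."""
    deps = json.loads(package_json).get("dependencies", {})
    return {cap for cap, hints in CAPABILITY_HINTS.items()
            if any(h in deps for h in hints)}

manifest = '{"dependencies": {"puppeteer": "^21.0.0", "redis": "^4.0.0"}}'
print(sorted(detect_capabilities(manifest)))  # → ['browser', 'memory']
```

Note the limitation this implies: a fork can score the points by merely declaring a dependency, which is exactly why PinchBench tests via actual task completion instead.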
## Relevant Ideas for PinchBench
While ClawBench doesn't have agent tasks to port, some capability checks could inspire new task categories:
### 1. MCP Server Integration Task
ClawBench checks for MCP support. We could add a task where the agent must:
- Connect to an MCP server
- Discover available tools
- Use an MCP-provided tool to complete a task
Why: MCP is increasingly important in the OpenClaw ecosystem.
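The three steps above could be graded roughly as follows, sketched with an in-memory stand-in for an MCP server (the class and grading function are hypothetical, not the MCP SDK):

```python
# In-memory stand-in for an MCP server; NOT the real MCP SDK.
class MockMCPServer:
    def __init__(self):
        self._tools = {"add": lambda a, b: a + b}

    def list_tools(self):              # step 2: tool discovery
        return list(self._tools)

    def call_tool(self, name, *args):  # step 3: tool use
        return self._tools[name](*args)

def grade_mcp_task(server) -> bool:
    """Pass iff the 'add' tool was discovered and used correctly."""
    # In a real task the agent drives these calls; here we simulate them.
    tools = server.list_tools()
    return "add" in tools and server.call_tool("add", 2, 3) == 5

print(grade_mcp_task(MockMCPServer()))  # → True
```

A real harness would spawn an actual MCP server process and check the agent's transcript for the discovery and invocation calls.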
### 2. Multi-Channel Messaging Task
ClawBench checks for messaging platform support. Task idea:
- Send a message via one channel (e.g., mock Discord webhook)
- Verify delivery or response
Challenge: Requires mock infrastructure or webhook.
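The mock infrastructure could be as small as an in-memory webhook recorder; a minimal sketch (all names hypothetical — a real harness would run an HTTP endpoint instead):

```python
# Minimal in-memory stand-in for a Discord-style webhook endpoint.
class MockWebhook:
    def __init__(self):
        self.received = []

    def post(self, payload: dict) -> int:
        # Mirror Discord's requirement that a message carry 'content'.
        if "content" not in payload:
            return 400
        self.received.append(payload)
        return 204  # Discord webhooks return 204 No Content on success

def grade_messaging_task(hook, expected: str) -> bool:
    """Verify delivery: exactly one message with the expected content."""
    return [m["content"] for m in hook.received] == [expected]

hook = MockWebhook()
hook.post({"content": "hello from the agent"})  # the agent's action
print(grade_messaging_task(hook, "hello from the agent"))  # → True
```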
### 3. Browser + Code Execution Combined Task
ClawBench separately checks browser and code execution. Task idea:
- Navigate to a page with browser
- Extract data
- Write a Python/JS script to process it
- Execute the script
- Report results
Why: Tests integration of multiple capabilities in a single workflow.
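Steps 3–5 of that workflow can be sketched with the browser step stubbed out (we assume the data was already extracted from a page; everything else here is illustrative, not PinchBench's harness):

```python
import json
import os
import subprocess
import sys
import tempfile

# Assume steps 1-2 (browser navigation + extraction) already produced this.
extracted = [3, 1, 4, 1, 5]

# Step 3: the agent writes a small processing script...
script = "import json, sys; data = json.load(sys.stdin); print(sum(data))"

# Step 4: ...and executes it in a subprocess.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(script)
    path = f.name
try:
    result = subprocess.run([sys.executable, path],
                            input=json.dumps(extracted),
                            capture_output=True, text=True, check=True)
finally:
    os.unlink(path)

# Step 5: report results; grading compares against the known answer.
print(result.stdout.strip())  # → 14
```

Grading stays deterministic because the harness controls the page content and therefore knows the expected output.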
### 4. Resource-Constrained Performance Task
Inspired by ClawBench's focus on hardware limits:
- Give the agent a task with a strict time or token budget
- Grade on both completion and efficiency
Challenge: Would need to track token usage in grading.
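One possible rubric, assuming token usage is available from the harness (the weights and hard-fail rule here are illustrative choices, not a settled design):

```python
# Hypothetical rubric: blend correctness with unused token budget.
def grade_budgeted(correct: bool, tokens_used: int, budget: int,
                   efficiency_weight: float = 0.3) -> float:
    """Score in [0, 1]: hard-fail if over budget or incorrect,
    otherwise a base completion score plus an efficiency bonus."""
    if tokens_used > budget:
        return 0.0  # hard fail: budget exceeded
    if not correct:
        return 0.0
    unused = 1 - tokens_used / budget
    return round((1 - efficiency_weight) + efficiency_weight * unused, 2)

print(grade_budgeted(True, 5000, 10000))  # → 0.85
```

Making budget overrun a hard fail (rather than a sliding penalty) keeps the metric aligned with the hardware-limits framing: on a constrained device, over budget effectively means failed.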
## Not Applicable to PinchBench
- Clone/install benchmarks (not relevant to agent task completion)
- Disk usage metrics (infrastructure, not task)
- Startup time (infrastructure)
- Static capability detection (we test via actual task completion)
## Conclusion
ClawBench is complementary to PinchBench, not overlapping. They answer different questions:
- ClawBench: "Will this fork run on my Raspberry Pi?"
- PinchBench: "Which model completes tasks best?"
The MCP integration task idea (item #1) is probably the most actionable addition.
cc @olearycrew