Task ideas from canitrunopenclaw/ClawBench analysis #122

@ScuttleBot

Overview

Analyzed the canitrunopenclaw repository and its ClawBench benchmarking tool.

What it is: A hardware compatibility directory for OpenClaw forks. ClawBench measures installation and startup performance, not agent task completion.

Key difference from PinchBench:

  • ClawBench = Can the fork run on this hardware? (clone time, install time, disk usage, memory, startup)
  • PinchBench = Can the agent complete tasks correctly? (task accuracy, tool usage, reasoning)

ClawBench Scoring (for reference)

Component     Weight   What it measures
Latency       30 pts   Cold start time (clone + install + startup)
Capabilities  40 pts   8 capability checks (messaging, browser, code exec, memory, files, search, MCP, tool use)
Size          20 pts   Total disk footprint after install
Build         10 pts   Successful install + successful startup
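
For intuition, the weights above could compose into the 100-point total roughly like this. This is a hedged sketch: only the component weights come from the table; the per-check capability weighting (40 pts / 8 checks = 5 pts each) and the clamping are assumptions about how ClawBench might aggregate them.

```python
def clawbench_score(latency_pts, capability_checks_passed, size_pts, build_pts):
    """Illustrative composition of ClawBench's four weighted components.

    Assumes each of the 8 capability checks is worth an equal 5 pts;
    the real per-check weighting is not documented in this issue.
    """
    capabilities = 5 * min(capability_checks_passed, 8)  # 40 pts max
    return (min(latency_pts, 30) + capabilities
            + min(size_pts, 20) + min(build_pts, 10))
```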

Capabilities detected via static analysis:

  • Messaging (WhatsApp/Telegram/Discord/Slack)
  • Browser automation (Puppeteer/Playwright/Selenium)
  • Code execution (subprocess/child_process)
  • Persistent memory (SQLite/Redis/vectordb)
  • File management
  • Web search
  • MCP support
  • Tool use

Relevant Ideas for PinchBench

While ClawBench doesn't have agent tasks to port, some capability checks could inspire new task categories:

1. MCP Server Integration Task

ClawBench checks for MCP support. We could add a task where the agent must:

  • Connect to an MCP server
  • Discover available tools
  • Use an MCP-provided tool to complete a task

Why: MCP is increasingly important in the OpenClaw ecosystem.
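
A minimal sketch of how the grading side could work, using an in-process mock rather than a real MCP server. The MockMCPServer class, its tool set, and the pass criteria are all illustrative assumptions, not real PinchBench or MCP SDK APIs.

```python
class MockMCPServer:
    """In-process stand-in for an MCP server exposing one tool (hypothetical)."""

    def __init__(self):
        self.tools = {"get_weather": lambda city: f"Sunny in {city}"}
        self.listed = False  # did the agent discover the tool list?
        self.calls = []      # record of (tool_name, args) the agent made

    def list_tools(self):
        self.listed = True
        return sorted(self.tools)

    def call_tool(self, name, *args):
        self.calls.append((name, args))
        return self.tools[name](*args)


def grade_mcp_task(server):
    """Pass only if the agent both discovered tools and invoked a valid one."""
    used_tool = any(name in server.tools for name, _ in server.calls)
    return server.listed and used_tool
```

The grader would hand the agent the server endpoint, let it run, then call grade_mcp_task on the recorded interactions.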

2. Multi-Channel Messaging Task

ClawBench checks for messaging platform support. Task idea:

  • Send a message via one channel (e.g., mock Discord webhook)
  • Verify delivery or response

Challenge: Requires mock infrastructure or a webhook endpoint.
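
The mock infrastructure is small enough to sketch with the standard library. Here the "agent" step is simulated with a direct POST; the /webhook path and the {"content": ...} payload shape are assumptions loosely modeled on Discord webhooks.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # messages the mock "Discord" endpoint has seen


class MockWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        received.append(json.loads(body))
        self.send_response(204)  # Discord webhooks answer 204 No Content
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the harness quiet


server = HTTPServer(("127.0.0.1", 0), MockWebhook)  # port 0 = auto-assign
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/webhook"

# Simulated agent action; in the real task the agent would produce this call.
req = urllib.request.Request(
    url,
    data=json.dumps({"content": "hello"}).encode(),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)
```

Grading then reduces to inspecting `received` for the expected message.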

3. Browser + Code Execution Combined Task

ClawBench separately checks browser and code execution. Task idea:

  • Navigate to a page with browser
  • Extract data
  • Write a Python/JS script to process it
  • Execute the script
  • Report results

Why: Tests integration of multiple capabilities in a single workflow.
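
The "write a script, then execute it" half of the workflow could be graded like this. The task framing (sum a column of scraped numbers) and the hard-coded agent_script are invented stand-ins; in the real task the browser step would supply the data and the agent would supply the script.

```python
import os
import subprocess
import sys
import tempfile

extracted = [3, 14, 15]  # stand-in for data the agent scraped via browser

# Stand-in for the script the agent is expected to write.
agent_script = "import sys; print(sum(int(x) for x in sys.argv[1:]))"


def run_agent_script(script, data):
    """Execute the agent's script in a subprocess and capture its stdout."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(script)
        path = f.name
    try:
        out = subprocess.run(
            [sys.executable, path, *map(str, data)],
            capture_output=True, text=True, timeout=10,
        )
        return out.stdout.strip()
    finally:
        os.unlink(path)


result = run_agent_script(agent_script, extracted)
```

Running the script in a subprocess with a timeout also keeps a buggy agent script from hanging the harness.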

4. Resource-Constrained Performance Task

Inspired by ClawBench's focus on hardware limits:

  • Give agent a task with a strict time/token budget
  • Grade on completion and efficiency

Challenge: Would need to track token usage in grading.
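
One possible scoring rule, sketched under stated assumptions: the 50/50 split between completion and efficiency, the linear scaling, and the hard zero for exceeding the budget are all design choices for illustration, not an agreed spec.

```python
def score_budgeted_task(completed, tokens_used, token_budget):
    """Score 0-100: half for completing at all, half scaled by tokens left over."""
    if tokens_used > token_budget:
        return 0.0  # blowing the budget fails the task outright
    if not completed:
        return 0.0
    completion = 50.0
    efficiency = 50.0 * (1 - tokens_used / token_budget)
    return completion + efficiency
```

A fork that finishes using half the budget would score 75; one that uses the whole budget still gets the 50 completion points.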

Not Applicable to PinchBench

  • Clone/install benchmarks (not relevant to agent task completion)
  • Disk usage metrics (infrastructure, not task)
  • Startup time (infrastructure)
  • Static capability detection (we test via actual task completion)

Conclusion

ClawBench is complementary to PinchBench, not overlapping. They answer different questions:

  • ClawBench: "Will this fork run on my Raspberry Pi?"
  • PinchBench: "Which model completes tasks best?"

The MCP integration task idea (item #1) is probably the most actionable addition.

cc @olearycrew
