# RFC: 20 real-world task ideas from ClawBytes use cases #124

## Overview

I analyzed [ClawBytes](https://kilo.ai/kiloclaw/bytes), the KiloClaw cookbook of real user automation recipes, to identify what people actually do with OpenClaw. Each recipe represents a use case that real users have already validated by building it.

**Philosophy:** If users are building "bytes" for these workflows, they're high-value tasks worth benchmarking.

---

## Proposed Tasks (20)

### Developer Workflows (7 tasks)

| # | Task | ClawByte | What to test | Grading |
|---|------|----------|--------------|---------|
| 1 | **Shell Command Generator** | Shell Translator | "List all files over 100MB" → correct `find` command | Automated: run command, verify output |
| 2 | **Git Rescue** | Git Rescue | "I committed to wrong branch" → recovery commands | Automated: create broken repo, verify fixed state |
| 3 | **Commit Message Writer** | Commit Poet | Given a diff, write conventional commit message | LLM judge: quality, format compliance |
| 4 | **Log Analysis** | Log Detective | Parse error logs, identify root cause | Automated: check if correct error identified |
| 5 | **CI/CD Debug** | Pipeline Paramedic | Fix failing GitHub Actions YAML | Automated: validate fixed YAML, run linter |
| 6 | **Test Generation** | Test Factory | Generate tests for a function | Automated: run tests, check coverage |
| 7 | **Dockerfile Optimization** | Dockerfile Doctor | Optimize a bloated Dockerfile | Automated: build both, compare size/layers |
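
As an illustration of the "run command, verify output" grading for task 1, here is a minimal sketch, not an existing PinchBench grader. The function name `grade_find_command` and the fixture layout are hypothetical. It builds a scratch directory with files of known sizes, runs the agent's command there, and compares the listed paths to the expected set.

```python
import os
import subprocess
import tempfile


def grade_find_command(command: str, files: dict[str, int], expected: set[str]) -> bool:
    """Run `command` in a scratch directory and compare the paths it prints.

    `files` maps relative paths to sizes in bytes; `expected` is the set of
    relative paths a correct command should list. Both are fixture inputs.
    """
    with tempfile.TemporaryDirectory() as root:
        # Materialize the fixture files with the requested sizes.
        for rel_path, size in files.items():
            full_path = os.path.join(root, rel_path)
            os.makedirs(os.path.dirname(full_path), exist_ok=True)
            with open(full_path, "wb") as handle:
                handle.write(b"\0" * size)
        try:
            # NOTE: this executes untrusted output; a real harness should
            # run it inside a sandbox or container.
            result = subprocess.run(
                command, shell=True, cwd=root,
                capture_output=True, text=True, timeout=30,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.returncode != 0:
            return False
        # Normalize "./logs/huge.log" and "logs/huge.log" to the same form.
        listed = {
            os.path.normpath(line)
            for line in result.stdout.splitlines()
            if line.strip()
        }
        return listed == expected
```

A fixture for the real prompt would need files above and below 100MB; scaling the threshold down (for example, asking for files over 1KB and grading `find . -type f -size +1k`) keeps the fixture small while exercising the same logic.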

### Productivity & Email (4 tasks)

| # | Task | ClawByte | What to test | Grading |
|---|------|----------|--------------|---------|
| 8 | **Meeting Summary** | Meeting Distiller | Transcript → action items, decisions, TL;DR | LLM judge: completeness, accuracy |
| 9 | **Email Triage** | Inbox Zero Bot | Categorize 20 emails by priority/type | Automated: compare to ground truth labels |
| 10 | **Email Drafting** | Inbox Zero Bot | Draft reply to customer complaint | LLM judge: tone, completeness |
| 11 | **Task Management** | Task Whisperer | "Add task for Friday" → correct API call | Automated: verify mock API received correct request |
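
For task 9, "compare to ground truth labels" could be as simple as an accuracy score. The sketch below is one possible shape and makes no claim about how `task_16_email_triage` is graded today. The function name, email ids, and label strings are assumptions for illustration.

```python
def score_triage(predicted: dict[str, str], truth: dict[str, str]) -> float:
    """Return the fraction of emails whose predicted label matches the truth.

    Emails missing from `predicted` count as wrong; ids that are not in
    `truth` are ignored. Comparison ignores case and surrounding whitespace.
    """
    if not truth:
        raise ValueError("ground truth must not be empty")
    correct = sum(
        1
        for email_id, label in truth.items()
        if predicted.get(email_id, "").strip().lower() == label.strip().lower()
    )
    return correct / len(truth)
```

A pass threshold (say, 18 of 20 emails) would still need to be chosen per fixture; that number is not specified in this RFC.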

### Research & Information (4 tasks)

| # | Task | ClawByte | What to test | Grading |
|---|------|----------|--------------|---------|
| 12 | **Deep Research** | Source Hunter | Research a topic, find primary sources | LLM judge: source quality, citations |
| 13 | **Tech News Digest** | Tech Radar | Summarize HN/Reddit for a topic | LLM judge: relevance, coverage |
| 14 | **Bookmark Organization** | Bookmark Rescuer | Categorize 50 bookmarks, find dead links | Automated: check categories, verify dead links |
| 15 | **Competitive Research** | (general) | Compare 3 products, structured output | LLM judge: accuracy, structure |

### Code Understanding (3 tasks)

| # | Task | ClawByte | What to test | Grading |
|---|------|----------|--------------|---------|
| 16 | **Codebase Navigation** | Codebase GPS | "Where is auth handled?" in unfamiliar repo | Automated: check if correct file(s) identified |
| 17 | **README Generation** | README Reviver | Generate README from repo contents | LLM judge: accuracy, completeness |
| 18 | **Onboarding Guide** | Onboarding Buddy | "How do I run this repo?" setup instructions | LLM judge: correctness, completeness |
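
For task 16, "check if correct file(s) identified" can be scored as precision and recall over file paths, so that an answer listing every file in the repo does not pass. This is a sketch under assumptions: the function name and the example paths are hypothetical, and the expected file set would come from a hand-labelled fixture repo.

```python
from pathlib import PurePosixPath


def _normalize(path: str) -> str:
    # Collapse "./src/a.py" and "src//a.py" to "src/a.py".
    return str(PurePosixPath(path.strip()))


def score_file_answer(answer: list[str], expected: set[str]) -> dict[str, float]:
    """Compare the files an agent named against the labelled correct files."""
    found = {_normalize(path) for path in answer if path.strip()}
    wanted = {_normalize(path) for path in expected}
    hits = len(found & wanted)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(wanted) if wanted else 0.0
    return {"precision": precision, "recall": recall}
```

Extracting the file list from a free-text answer is a separate step that this sketch does not cover.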

### Writing & Content (2 tasks)

| # | Task | ClawByte | What to test | Grading |
|---|------|----------|--------------|---------|
| 19 | **AI Writing Cleanup** | De-Botinator | Remove AI patterns from text | Automated: count AI patterns before/after |
| 20 | **Data Cleaning** | Data Janitor | Normalize messy CSV (dates, names, etc.) | Automated: compare to expected output |
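
For task 20, "compare to expected output" could report partial credit per cell rather than a strict file diff. The following is a minimal sketch; the function name is hypothetical, rows are compared position by position, and extra rows or columns in the agent's output are not penalized, which a real grader may want to change.

```python
import csv
import io


def csv_cell_accuracy(actual_csv: str, expected_csv: str) -> float:
    """Return the fraction of expected cells that the actual CSV reproduces."""
    actual = list(csv.reader(io.StringIO(actual_csv)))
    expected = list(csv.reader(io.StringIO(expected_csv)))
    total = sum(len(row) for row in expected)
    if total == 0:
        raise ValueError("expected CSV must contain at least one cell")
    matches = 0
    for row_index, expected_row in enumerate(expected):
        # A missing row contributes zero matching cells.
        actual_row = actual[row_index] if row_index < len(actual) else []
        for col_index, expected_cell in enumerate(expected_row):
            if col_index < len(actual_row):
                if actual_row[col_index].strip() == expected_cell.strip():
                    matches += 1
    return matches / total
```

Parsing with the `csv` module rather than comparing raw text means differences in quoting style alone do not count as errors.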

---

## Priority Ranking

**High priority** (core OpenClaw value props, easy to implement):
1. Shell Command Generator — immediate payoff, easy grading
2. Email Triage — common use case, automated grading possible
3. Meeting Summary — high-value, shows information synthesis
4. Data Cleaning — clear before/after, automated grading
5. Git Rescue — dev-focused, verifiable

**Medium priority** (valuable but need mock services or fixtures):
6. Task Management — needs mock Todoist API
7. Email Drafting — needs mock email service
8. CI/CD Debug — needs fixture repos
9. Test Generation — needs fixture code
10. Commit Message Writer — needs fixture diffs

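**Lower priority** (complex setup or subjective grading):
11-20. The remaining ten tasks: Log Analysis, Dockerfile Optimization, Deep Research, Tech News Digest, Bookmark Organization, Competitive Research, Codebase Navigation, README Generation, Onboarding Guide, AI Writing Cleanup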

---

## Implementation Notes

### Tasks that need mock services (see #123)
- Email Triage/Drafting → Mock Email API
- Task Management → Mock Todo API
- CI/CD Debug → Mock GitHub API (or real public repo)
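
To make "verify mock API received correct request" concrete, a mock Todo API can be a small local HTTP server that records every POST. The sketch below uses only the standard library. The `/tasks` path and the `title` and `due` fields are assumptions for illustration and are not taken from any real Todo service or from #123.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Accept any POST and remember its path and JSON body."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length).decode("utf-8")
        try:
            body = json.loads(raw) if raw else {}
        except json.JSONDecodeError:
            body = None  # recorded as-is so the grader can flag bad JSON
        self.server.received.append({"path": self.path, "body": body})
        payload = json.dumps({"ok": True}).encode("utf-8")
        self.send_response(201)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, format, *args):
        pass  # keep harness output quiet


def start_mock_todo_api() -> HTTPServer:
    """Start the mock on a free localhost port and return the server."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    server.received = []  # list of recorded requests
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def received_task(server: HTTPServer, path: str, title: str, due: str) -> bool:
    """Check whether any recorded request created the expected task."""
    return any(
        isinstance(request["body"], dict)
        and request["path"] == path
        and request["body"].get("title") == title
        and request["body"].get("due") == due
        for request in server.received
    )
```

The harness would hand the agent the URL built from `server.server_address`, let it run, then call `received_task`. For a prompt like "Add task for Friday", the fixture also needs a fixed "today" so the expected `due` date is deterministic.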

### Tasks that need fixture data
- Log Analysis → Sample error logs with known root causes
- Data Cleaning → Messy CSVs with known clean versions
- Meeting Summary → Sample transcripts with expected outputs
- Bookmark Organization → Bookmark exports with known categories
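
Each fixture above pairs an input with a known answer. As a rough sketch of what that pairing could look like for Log Analysis, the example below stores the terms a correct diagnosis must mention and checks the agent's answer against them. `LogFixture`, its fields, and the sample log are hypothetical. Keyword matching is crude, and an LLM judge or stricter structured output may turn out to be more reliable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogFixture:
    """A sample log plus the terms that identify its known root cause."""

    name: str
    log_text: str
    root_cause_terms: tuple[str, ...]  # all must appear in a correct answer


def identifies_root_cause(answer: str, fixture: LogFixture) -> bool:
    """Return True if the answer mentions every required root-cause term."""
    lowered = answer.lower()
    return all(term.lower() in lowered for term in fixture.root_cause_terms)


# Hypothetical fixture: the timeouts are a symptom, the pool is the cause.
POOL_EXHAUSTION = LogFixture(
    name="db_pool_exhaustion",
    log_text=(
        "12:00:01 WARN  request /orders took 30021 ms\n"
        "12:00:02 ERROR connection pool exhausted (max=10, in_use=10)\n"
        "12:00:02 ERROR request /orders failed: timeout\n"
    ),
    root_cause_terms=("connection pool", "exhausted"),
)
```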

### Tasks that can use real APIs
- Shell Command Generator → Just needs a shell
- Deep Research → Web search (already available)
- Codebase Navigation → Real public repos

---

## Overlap with Existing Tasks

Checking against current PinchBench tasks:

| Proposed task | Existing PinchBench task? | Notes |
|---------------|---------------------------|-------|
| Meeting Summary | ❌ | New |
| Email Triage | `task_16_email_triage` | ✅ Already exists |
| Data Cleaning | ❌ | New |
| Shell Command Generator | ❌ | New |
| Git Rescue | ❌ | New |
| Research | `task_18_market_research` | Similar, could expand |
| PDF Summary | `task_20_eli5_pdf_summary` | ✅ Already exists |

---

## Next Steps

1. Start with **Shell Command Generator** — simplest to implement, high value
2. Add **Data Cleaning** — clear automated grading
3. Add **Meeting Summary** — showcase information synthesis
4. Build mock services (per #123) to unlock email/task tasks
5. Add code-focused tasks as fixture repos are built

cc @olearycrew
