Automated daily research reports powered by Claude Code non-interactive mode and macOS launchd.
Zero Python. Just shell scripts + prompt files.
Every morning at 5:00 AM, Claude autonomously browses the web, selects trending topics, conducts multi-stage deep research, and writes reports directly into your Obsidian vault.
launchd (AM 5:00)
└─ daily-research.sh
├── MCP health check (Haiku)
│ ├── OK → Mem0 enabled
│ └── Fail → continue without Mem0
│
├── Pass 1: Opus (theme selection)
│ ├── Read config.toml # What to research
│ ├── Read past_topics.json # Avoid duplicates
│ ├── WebSearch # Latest trends
│ └── Score & select 2 topics
│
├── Pass 2: Sonnet (research & writing)
│ ├── Mem0 search-memories # Recall past insights (if available)
│ ├── WebSearch x 20-30 # Multi-stage research
│ ├── WebFetch (primary sources)
│ ├── Write 2 reports # → Obsidian vault
│ └── Mem0 add-memory # Store key findings (if available)
│
└── Eval: Opus (quality scoring, non-fatal)
├── 6 dimensions x 2 reports
└── Append to scores.jsonl
2-pass architecture: Opus handles theme selection (deep reasoning), Sonnet handles research and writing (speed + cost efficiency). If Pass 1 fails, Sonnet handles everything as a fallback.
The key insight: Claude Code's -p flag turns it into a fully autonomous research agent. No API plumbing, no Python, no orchestration framework. The intelligence lives in the prompt.
- 2-pass model routing -- Opus for theme selection, Sonnet for research (best of both worlds)
- Configurable research tracks -- Define topics, sources, and scoring criteria in
config.toml - Topic deduplication --
past_topics.jsonprevents covering the same theme twice within 30 days - Multi-stage deep research -- Not just a summary; generates research questions, searches 20-30 times, cross-validates sources
- Weighted topic scoring -- Novelty, momentum, buildability, and "whisper trend" scores
- Obsidian-native output -- Reports with YAML frontmatter, ready for your vault
- Persistent memory (Mem0) -- Optional MCP integration recalls past research insights and stores new findings across sessions
- Robust execution -- Lock files, log rotation, auth checks, macOS notifications, automatic fallback
- Automated quality evaluation -- LLM-as-Judge scores each report on 6 dimensions (30-point scale)
| Requirement | Notes |
|---|---|
| Claude Code CLI | brew install claude or via npm |
| Claude Max plan | For zero-cost non-interactive usage |
| macOS | Uses launchd for scheduling (Linux users: adapt to cron or systemd) |
| Obsidian (optional) | Any markdown-compatible tool works |
# 1. Clone
git clone https://github.com/shimo4228/daily-research.git
cd daily-research
# 2. Configure
cp config.example.toml config.toml
# Edit config.toml: set vault_path and customize tracks
# 3. Make scripts executable
chmod +x scripts/daily-research.sh scripts/eval-run.sh scripts/check-auth.sh
# 4. Verify Claude auth
./scripts/check-auth.sh
# 5. Test run (manual)
./scripts/daily-research.sh
# 6. Schedule with launchd (optional)
cp com.example.daily-research.plist com.daily-research.plist
# Edit: replace YOUR_USERNAME with your macOS username
ln -sf "$(pwd)/com.daily-research.plist" ~/Library/LaunchAgents/
launchctl load ~/Library/LaunchAgents/com.daily-research.plistdaily-research/
├── config.example.toml # Research tracks, scoring, output settings
├── prompts/
│ ├── theme-selection-prompt.md # Pass 1: Theme selection (Opus)
│ ├── task-prompt.md # Pass 2: Research instruction (Sonnet)
│ └── research-protocol.md # Pass 2: Research protocol (system prompt)
├── templates/
│ └── report-template.md # Report format with YAML frontmatter
├── scripts/
│ ├── daily-research.sh # Main entry point (launchd calls this)
│ ├── eval-run.sh # LLM-as-Judge evaluation (post-run)
│ └── check-auth.sh # OAuth token health check
├── evals/
│ ├── prompts/ # Judge rubrics (6 dimensions + system prompt)
│ ├── scores.jsonl # Score log (append-only, gitignored)
│ └── scores.example.jsonl # Schema reference
├── com.example.daily-research.plist # launchd schedule template
├── tests/
│ ├── test-daily-research.bats # Unit tests
│ ├── test-e2e-mock.bats # E2E mock tests
│ ├── test-eval.bats # Evaluation framework tests
│ └── test-log-summary.bats # log_summary parser tests
└── docs/
├── RUNBOOK.md / RUNBOOK.ja.md # Operations guide
├── CONTRIB.md / CONTRIB.ja.md # Development guide
└── plans/ # Future expansion plans
After each successful run, an automated LLM-as-Judge evaluation scores every generated report. This runs as a non-fatal hook -- evaluation failures never block the main pipeline.
Each report is scored by Opus on 6 independent dimensions (1-5 scale, 30 points total):
| Dimension | What It Measures |
|---|---|
| Factual Grounding | Source quality and claim verification |
| Depth of Analysis | Beyond surface-level summaries |
| Coherence | Logical flow and structure |
| Specificity | Concrete examples vs. abstract statements |
| Novelty | Fresh insights beyond common knowledge |
| Actionability | Practical development ideas |
Scores are appended to evals/scores.jsonl. The pipeline_version field enables before/after comparisons when the pipeline changes. See CONTRIB.md for full details.
Edit config.toml to add new tracks:
[tracks.finance]
name = "Finance & Markets"
focus = "Fintech, DeFi, and market trends"
sources = [
"Bloomberg Technology",
"TechCrunch Fintech",
]
scoring_criteria = [
{ name = "Novelty", weight = 30, desc = "Not covered recently" },
{ name = "Momentum", weight = 30, desc = "Actively evolving" },
{ name = "Actionability", weight = 40, desc = "Can act on this insight" },
]The prompt files (prompts/) and report template are written in Japanese. Reports will be generated in Japanese by default. To change the output language:
- Edit the language constraint in
prompts/research-protocol.mdline 10:- 日本語で全て出力すること → - Output everything in English - Translate
prompts/research-protocol.mdandtemplates/report-template.mdto your preferred language.
Edit prompts/research-protocol.md to adjust:
- Number of search queries per topic
- Source validation requirements
- Report structure and length
Edit the plist StartCalendarInterval:
<key>Hour</key>
<integer>7</integer> <!-- Change to 7 AM -->Then reload: launchctl unload ... && launchctl load ...
| Decision | Why |
|---|---|
| 2-pass (Opus + Sonnet) | Opus excels at theme selection (+28% in blind eval); Sonnet is faster and cheaper for research/writing |
| Sonnet fallback on Pass 1 failure | Resilience: if Opus times out or fails, Sonnet handles everything |
--append-system-prompt-file (not --system-prompt-file) |
Preserves Claude Code's default capabilities while adding research instructions |
--allowedTools (not --dangerously-skip-permissions) |
Minimum-privilege: Pass 1 is read-only, Pass 2 adds write |
--max-turns for timeout control |
Process-external timeouts (gtimeout) kill claude via signal, causing data loss |
| Shell scripts only | Zero dependencies beyond Claude Code CLI; trivial to understand and modify |
| TOML config | Human-readable, supports nested structures for tracks/criteria |
| LLM-as-Judge (non-fatal) | Automated quality feedback without blocking production; 6 independent dimensions reduce single-score bias |
< /dev/null stdin redirect |
Prevents MCP stdio communication from conflicting with terminal stdin (root cause of MCP hangs) |
| MCP health check (non-fatal) | Verifies Mem0 MCP before Pass 1; on failure, excludes Mem0 tools and continues |
stream-json for Pass 1 |
Captures tool usage counts alongside result for cost/performance monitoring |
- Claude Code plugins cause hangs -- If you have Claude Code plugins installed globally, their MCP servers initialize on every
claude -pcall, adding minutes of overhead or causing hangs. Create.claude/settings.jsonin the project root to disable them:List each of your installed plugins and set them to{ "enabledPlugins": { "plugin-name@marketplace": false } }false. Check installed plugins withclaude plugin list. There is currently no blanket "disable all" option (tracking issue). - OAuth token expires ~4 days -- Run
claudeinteractively periodically to refresh ANTHROPIC_API_KEYmust be unset -- If set, Claude uses per-token billing instead of Max plan. The script handles this withunset ANTHROPIC_API_KEY- launchd + shell profile --
launchddoes NOT source.zshrc. All PATH entries must be explicit in the script and plist --max-turns-- Pass 1 uses 15 turns (theme selection), Pass 2 uses 40 turns (research). These are guidelines, not hard limits- Do NOT run from inside Claude Code --
claude -pcannot be nested inside another Claude Code session
- RUNBOOK.md / RUNBOOK.ja.md -- Operations: monitoring, troubleshooting, common issues
- CONTRIB.md / CONTRIB.ja.md -- Development: testing, CLI flags, environment variables