Inspiration
Students use AI to get answers, not to learn how to think. We built QED — a multi-agent study coach that teaches problem-solving using the Socratic method. Instead of giving direct answers, it guides students through problems with questions and progressive hints until they solve them themselves.
What it does
QED uses five specialized AI agents, all powered by Claude, working together:
- Decomposer Agent - Breaks complex problems into reasoning steps
- Socratic Coach Agent - Guides with 4 levels of progressively revealing hints, never giving a direct answer until the student has made genuine attempts
- Critic Agent - Provides TA-style feedback on solutions, identifying logical gaps and strengths
- Planner Agent - Creates realistic study schedules with spaced repetition and checkpoint questions
- Misconception Tracker - Analyzes error patterns to recommend targeted practice
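The Socratic Coach's hint ladder can be sketched as a simple escalation policy. This is an illustrative sketch only — the level wording and function names here are ours, not QED's actual prompts:

```python
# Illustrative four-level hint ladder; wording is hypothetical.
HINT_LEVELS = [
    "Ask a question that points the student at the relevant concept.",
    "Narrow the focus to the specific step where the student is stuck.",
    "Outline the structure of the step without performing it.",
    "Walk through the step, leaving the final computation to the student.",
]

def next_hint_level(current: int, genuine_attempt: bool) -> int:
    """Escalate only after a genuine attempt; never exceed the last level."""
    if not genuine_attempt:
        return current  # stay put until the student actually tries
    return min(current + 1, len(HINT_LEVELS) - 1)
```

The key design choice is that escalation is gated on effort: a student who has not attempted the step keeps getting the same level of hint rather than a more revealing one.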
Bonus: AI-powered Manim visualizations where Claude generates Python code to create custom math animations — rendered to video on the fly.
How we built it with Claude
Claude (Anthropic's API) is the core intelligence behind QED. We implemented a multi-agent architecture where each agent is a specialized prompt profile that directs Claude's reasoning:
- The Socratic Coach prompt instructs Claude to "NEVER give direct answers" and use progressive hint levels
- The Critic prompt configures Claude to analyze solutions like a TA, identifying logical gaps
- The Planner prompt guides Claude to create study schedules following cognitive science principles
We use Claude's extended context window to maintain conversation history across multiple exchanges, enabling genuine back-and-forth tutoring. For visualizations, Claude generates Manim Python code that we validate and execute in a sandboxed environment.
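The multi-turn loop above amounts to appending each exchange to a running message list and resending it with the agent's system prompt. A minimal sketch, assuming a helper like `build_request` (the function, model name, and system-prompt wording are illustrative, not QED's code):

```python
# Sketch: build the Messages API payload for one tutoring exchange.
# With the official Anthropic SDK this payload would be passed as
# client.messages.create(**build_request(history, student_message)).
SOCRATIC_SYSTEM = (
    "You are a Socratic tutor. NEVER give direct answers; guide the "
    "student with questions and progressive hints."
)

def build_request(history, student_message, model="claude-3-5-sonnet-latest"):
    """Append the new student turn and assemble the full request payload."""
    messages = history + [{"role": "user", "content": student_message}]
    return {
        "model": model,            # illustrative model name
        "max_tokens": 1024,
        "system": SOCRATIC_SYSTEM,  # the agent's prompt profile
        "messages": messages,       # entire multi-turn history each call
    }
```

Because the Messages API is stateless, sending the whole history on every call is what makes the back-and-forth feel like one continuous tutoring session.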
Tech Stack: Next.js, React, TypeScript, Flask microservice, Manim, LaTeX. Claude handles all LLM interactions with swappable support for GPT-4.
Challenges we ran into
Balancing helpfulness vs. learning: Finding the right hint progression to avoid frustration without solving problems for students required extensive prompt engineering.
AI code safety: Executing Claude-generated Python requires multiple validation layers — client-side regex blocking dangerous imports, server-side sandboxing with restricted built-ins.
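The layered validation can be sketched roughly as follows. The pattern list and restricted-builtins set here are illustrative, not QED's actual deny-list, and a real deployment would add process- and filesystem-level isolation on top:

```python
import re

# Layer 1 (illustrative): static deny-list over Claude-generated Manim code.
BLOCKED = re.compile(
    r"\b(?:(?:from|import)\s+(?:os|sys|subprocess|socket|shutil)\b"
    r"|__import__|open\s*\(|eval\s*\(|exec\s*\()"
)

def is_safe(code: str) -> bool:
    """Reject code that references dangerous modules or calls."""
    return BLOCKED.search(code) is None

def run_restricted(code: str) -> dict:
    """Layer 2 sketch: execute with a minimal builtins surface."""
    if not is_safe(code):
        raise ValueError("blocked by static check")
    namespace = {"__builtins__": {"range": range, "len": len, "print": print}}
    exec(code, namespace)  # demo only; the real service also sandboxes the process
    return namespace
```

Neither layer is sufficient alone — regex checks can be evaded and restricted builtins can leak — which is why defense in depth matters for model-generated code.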
Prompt engineering: Writing prompts that teach (not just answer) required studying cognitive science. Small prompt changes drastically affected Claude's behavior.
Manim rendering: we wrestled with LaTeX dependencies, 30-second render times, and syncing video duration with voice-over audio.
Accomplishments & Impact
- ✅ Built a multi-agent system that genuinely teaches instead of just answering
- ✅ Safe AI code execution with defense-in-depth security
- ✅ Beautiful on-the-fly mathematical animations
- ✅ Ethical AI tutor with academic integrity built-in
Most importantly: it works. Students actually learn problem-solving skills instead of getting instant answers. QED demonstrates that with careful prompt design, Claude can be configured to teach using proven pedagogical methods like the Socratic method and spaced repetition.
What we learned
- Prompt engineering is precise craft — constraints like "NEVER give direct answers" need explicit enforcement
- Claude's extended context enables genuine multi-turn tutoring conversations
- Teaching ≠ answering — the best hints sit just beyond the student's current ability
- JSON contracts make multi-agent orchestration clean and debuggable
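The JSON-contract point can be illustrated with a hypothetical Decomposer output — the field names and validation helper below are ours for illustration, not QED's actual schema:

```python
import json

# Hypothetical contract between the Decomposer and downstream agents.
decomposer_output = json.loads("""
{
  "problem_id": "calc-017",
  "steps": [
    {"id": 1, "goal": "Rewrite the integrand with a u-substitution"},
    {"id": 2, "goal": "Integrate the simplified expression"},
    {"id": 3, "goal": "Substitute back and evaluate the bounds"}
  ]
}
""")

def validate_steps(payload: dict) -> list:
    """Fail loudly on schema drift instead of letting prompts degrade silently."""
    steps = payload["steps"]
    assert all({"id", "goal"} <= step.keys() for step in steps)
    return steps
```

Validating the structure at each hand-off turns a vague "the agents got confused" failure into a concrete schema error with a stack trace.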
What's next for QED
- Learning analytics dashboard tracking progress
- Voice interface for natural tutoring conversations
- Course-specific agents pre-trained on textbooks
- Session history with automated spaced repetition reminders
QED: Quod Erat Demonstrandum — proving AI can drive deeper learning, not just faster answers.