Inspiration
Students use AI to get answers, not to learn how to think. We built QED — a multi-agent study coach that teaches problem-solving using the Socratic method. Instead of giving direct answers, it guides students through problems with questions and progressive hints until they solve them themselves.
What it does
QED uses five specialized AI agents, all powered by Claude, working together:
- Decomposer Agent - Breaks complex problems into reasoning steps
- Socratic Coach Agent - Guides with 4 levels of progressively revealing hints, never giving a direct answer until the student has made genuine attempts
- Critic Agent - Provides TA-style feedback on solutions, identifying logical gaps and strengths
- Planner Agent - Creates realistic study schedules with spaced repetition and checkpoint questions
- Misconception Tracker - Analyzes error patterns to recommend targeted practice
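The Socratic Coach's hint ladder can be sketched as a simple escalation policy. This is an illustrative sketch only — the level wording and function names here are ours, not QED's actual prompts:

```python
# Illustrative four-level hint ladder; wording is hypothetical.
HINT_LEVELS = [
    "Ask a question that points the student at the relevant concept.",
    "Narrow the focus to the specific step where the student is stuck.",
    "Outline the structure of the step without performing it.",
    "Walk through the step, leaving the final computation to the student.",
]

def next_hint_level(current: int, genuine_attempt: bool) -> int:
    """Escalate only after a genuine attempt; never exceed the last level."""
    if not genuine_attempt:
        return current  # stay put until the student actually tries
    return min(current + 1, len(HINT_LEVELS) - 1)
```

The key design choice is that escalation is gated on effort: a student who has not attempted the step keeps getting the same level of hint rather than a more revealing one.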
Bonus: AI-powered Manim visualizations where Claude generates Python code to create custom math animations — rendered to video on the fly.
How we built it with Claude
Claude (Anthropic's API) is the core intelligence behind QED. We implemented a multi-agent architecture where each agent is a specialized prompt profile that directs Claude's reasoning:
- The Socratic Coach prompt instructs Claude to "NEVER give direct answers" and use progressive hint levels
- The Critic prompt configures Claude to analyze solutions like a TA, identifying logical gaps
- The Planner prompt guides Claude to create study schedules following cognitive science principles
We use Claude's extended context window to maintain conversation history across multiple exchanges, enabling genuine back-and-forth tutoring. For visualizations, Claude generates Manim Python code that we validate and execute in a sandboxed environment.
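The multi-turn loop above amounts to appending each exchange to a running message list and resending it with the agent's system prompt. A minimal sketch, assuming a helper like `build_request` (the function, model name, and system-prompt wording are illustrative, not QED's code):

```python
# Sketch: build the Messages API payload for one tutoring exchange.
# With the official Anthropic SDK this payload would be passed as
# client.messages.create(**build_request(history, student_message)).
SOCRATIC_SYSTEM = (
    "You are a Socratic tutor. NEVER give direct answers; guide the "
    "student with questions and progressive hints."
)

def build_request(history, student_message, model="claude-3-5-sonnet-latest"):
    """Append the new student turn and assemble the full request payload."""
    messages = history + [{"role": "user", "content": student_message}]
    return {
        "model": model,            # illustrative model name
        "max_tokens": 1024,
        "system": SOCRATIC_SYSTEM,  # the agent's prompt profile
        "messages": messages,       # entire multi-turn history each call
    }
```

Because the Messages API is stateless, sending the whole history on every call is what makes the back-and-forth feel like one continuous tutoring session.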
Tech Stack: Next.js, React, TypeScript, Flask microservice, Manim, LaTeX. Claude handles all LLM interactions with swappable support for GPT-4.
Challenges we ran into
Balancing helpfulness vs. learning: Finding the right hint progression to avoid frustration without solving problems for students required extensive prompt engineering.
AI code safety: Executing Claude-generated Python requires multiple validation layers — client-side regex blocking dangerous imports, server-side sandboxing with restricted built-ins.
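The layered validation can be sketched roughly as follows. The pattern list and restricted-builtins set here are illustrative, not QED's actual deny-list, and a real deployment would add process- and filesystem-level isolation on top:

```python
import re

# Layer 1 (illustrative): static deny-list over Claude-generated Manim code.
BLOCKED = re.compile(
    r"\b(?:(?:from|import)\s+(?:os|sys|subprocess|socket|shutil)\b"
    r"|__import__|open\s*\(|eval\s*\(|exec\s*\()"
)

def is_safe(code: str) -> bool:
    """Reject code that references dangerous modules or calls."""
    return BLOCKED.search(code) is None

def run_restricted(code: str) -> dict:
    """Layer 2 sketch: execute with a minimal builtins surface."""
    if not is_safe(code):
        raise ValueError("blocked by static check")
    namespace = {"__builtins__": {"range": range, "len": len, "print": print}}
    exec(code, namespace)  # demo only; the real service also sandboxes the process
    return namespace
```

Neither layer is sufficient alone — regex checks can be evaded and restricted builtins can leak — which is why defense in depth matters for model-generated code.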
Prompt engineering: Writing prompts that teach (not just answer) required studying cognitive science. Small prompt changes drastically affected Claude's behavior.
Manim rendering: we wrestled with LaTeX dependencies, 30-second render times, and syncing video duration with voice-over audio.
Accomplishments & Impact
- ✅ Built a multi-agent system that genuinely teaches instead of just answering
- ✅ Safe AI code execution with defense-in-depth security
- ✅ Beautiful on-the-fly mathematical animations
- ✅ Ethical AI tutor with academic integrity built-in
Most importantly: it works. Students actually learn problem-solving skills instead of getting instant answers. QED demonstrates that with careful prompt design, Claude can be configured to teach using proven pedagogical methods like the Socratic method and spaced repetition.
What we learned
- Prompt engineering is precise craft — constraints like "NEVER give direct answers" need explicit enforcement
- Claude's extended context enables genuine multi-turn tutoring conversations
- Teaching ≠ answering — the best hints sit just beyond the student's current ability
- JSON contracts make multi-agent orchestration clean and debuggable
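The JSON-contract point can be illustrated with a hypothetical Decomposer output — the field names and validation helper below are ours for illustration, not QED's actual schema:

```python
import json

# Hypothetical contract between the Decomposer and downstream agents.
decomposer_output = json.loads("""
{
  "problem_id": "calc-017",
  "steps": [
    {"id": 1, "goal": "Rewrite the integrand with a u-substitution"},
    {"id": 2, "goal": "Integrate the simplified expression"},
    {"id": 3, "goal": "Substitute back and evaluate the bounds"}
  ]
}
""")

def validate_steps(payload: dict) -> list:
    """Fail loudly on schema drift instead of letting prompts degrade silently."""
    steps = payload["steps"]
    assert all({"id", "goal"} <= step.keys() for step in steps)
    return steps
```

Validating the structure at each hand-off turns a vague "the agents got confused" failure into a concrete schema error with a stack trace.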
What's next for QED
- Learning analytics dashboard tracking progress
- Voice interface for natural tutoring conversations
- Course-specific agents pre-trained on textbooks
- Session history with automated spaced repetition reminders
QED: Quod Erat Demonstrandum — proving AI can drive deeper learning, not just faster answers.