Inspiration
Most AI software generation tools are thin wrappers around a single LLM prompt. The hard part of using LLMs for software generation isn't producing code; it's reliability, strict schema enforcement, and cross-layer consistency. If an LLM creates an API endpoint but forgets to create the underlying database table, the app crashes.
I was inspired to treat AI software generation not as a prompt engineering task, but as a rigid system design and control problem.
What it does
ForgeEngine is an AI compiler that translates natural language intent into a fully structured, validated, and executable application configuration. It outputs strict, Zod-validated JSON schemas for the UI, API, Database, and Auth layers independently, then checks them against each other to enforce cross-layer consistency.
How I built it
I built ForgeEngine using Next.js and React, powered by a 6-stage pipeline:
- Intent & Architecture: Parses natural language into structured product constraints and entities.
- Schema Synthesis: Generates independent UI, API, DB, and Auth schemas using the Groq SDK.
- Consistency Validator (The Core Engine): A strict validation layer that sweeps across schemas (e.g., ensuring a UI component doesn't request data that the API schema doesn't provide).
- Targeted Repair: If the validator finds an LLM hallucination, it doesn't do a blind retry. It feeds the localized error back to a repair LLM for a surgical patch.
- Execution Awareness: The final emitted configuration is immediately hydrated into a simulated runtime in the browser, demonstrating that the JSON is executable.
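To make the Consistency Validator stage concrete, here is a minimal sketch of a deterministic cross-layer check. The layer shapes (`DbTable`, `ApiEndpoint`, `UiComponent`), the `validateConsistency` name, and the issue format are hypothetical simplifications for illustration, not ForgeEngine's actual schemas, and the Auth layer is left out for brevity.

```typescript
// Hypothetical, simplified layer shapes; the real ForgeEngine schemas are not shown here.
interface DbTable { name: string; columns: string[] }
interface ApiEndpoint { path: string; table: string; fields: string[] }
interface UiComponent { id: string; endpoint: string; fields: string[] }
interface AppConfig { db: DbTable[]; api: ApiEndpoint[]; ui: UiComponent[] }

// A localized issue names the layer and element at fault so a repair step can target it.
interface ConsistencyIssue { layer: "api" | "ui"; target: string; message: string }

function validateConsistency(config: AppConfig): ConsistencyIssue[] {
  const issues: ConsistencyIssue[] = [];

  // Index the lower layers once so each lookup below is cheap.
  const columnsByTable = new Map<string, Set<string>>();
  for (const table of config.db) {
    columnsByTable.set(table.name, new Set(table.columns));
  }
  const fieldsByEndpoint = new Map<string, Set<string>>();
  for (const endpoint of config.api) {
    fieldsByEndpoint.set(endpoint.path, new Set(endpoint.fields));
  }

  // API -> DB: every endpoint must read from a known table and known columns.
  for (const endpoint of config.api) {
    const columns = columnsByTable.get(endpoint.table);
    if (!columns) {
      issues.push({ layer: "api", target: endpoint.path, message: `unknown table "${endpoint.table}"` });
      continue;
    }
    for (const field of endpoint.fields) {
      if (!columns.has(field)) {
        issues.push({ layer: "api", target: endpoint.path, message: `field "${field}" has no column in "${endpoint.table}"` });
      }
    }
  }

  // UI -> API: a component may only request fields its endpoint provides.
  for (const component of config.ui) {
    const fields = fieldsByEndpoint.get(component.endpoint);
    if (!fields) {
      issues.push({ layer: "ui", target: component.id, message: `unknown endpoint "${component.endpoint}"` });
      continue;
    }
    for (const field of component.fields) {
      if (!fields.has(field)) {
        issues.push({ layer: "ui", target: component.id, message: `field "${field}" is not provided by "${component.endpoint}"` });
      }
    }
  }

  return issues;
}
```

Because each issue carries a layer and a target, a later repair step can be handed only the failing element rather than the whole configuration.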
Challenges I ran into
- Strict JSON Schema Enforcement: Popular SDK wrappers often aggressively force `json_schema` modes, which break when using versatile open-source models (like Llama 3.3). I had to pivot to using the direct Groq SDK and enforce strict structural keys programmatically.
- Cross-layer Hallucinations: Initially, the LLM would frequently forget to add a database column that the UI required. Designing the deterministic cross-layer Consistency Validator and the self-healing Repair Engine was highly challenging, but entirely solved the issue.
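As an illustration of enforcing structural keys programmatically, the sketch below parses raw model text and rejects it unless the top-level keys match exactly. The `parseStructured` function, the result type, and the example keys are assumptions made for this example; it does not call the Groq SDK and is not ForgeEngine's actual code.

```typescript
// Hypothetical helper: checks raw LLM text against an exact set of top-level keys.
type ParseResult =
  | { ok: true; value: Record<string, unknown> }
  | { ok: false; errors: string[] };

function parseStructured(raw: string, requiredKeys: readonly string[]): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    // Models sometimes wrap JSON in prose; treat that as a structural failure.
    return { ok: false, errors: ["output is not valid JSON"] };
  }

  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["top-level value must be a JSON object"] };
  }

  const record = parsed as Record<string, unknown>;
  const errors: string[] = [];

  // Every required key must be present.
  for (const key of requiredKeys) {
    if (!(key in record)) errors.push(`missing required key "${key}"`);
  }
  // Strict mode: keys the schema does not know about are rejected too.
  for (const key of Object.keys(record)) {
    if (requiredKeys.indexOf(key) === -1) errors.push(`unexpected key "${key}"`);
  }

  return errors.length > 0 ? { ok: false, errors } : { ok: true, value: record };
}
```

A check like this would run before any schema-level validation, so malformed output fails early with a specific error message instead of a generic parse failure.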
Accomplishments that I'm proud of
I successfully engineered a toggleable compiler mode (Fast, Balanced, Production) that allows users to explicitly balance latency, cost, and output quality. By implementing the multi-stage repair engine, I increased the structural success rate of edge-case prompts from roughly 40% (single-prompt approach) to nearly 100%.
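One way a compiler mode could bound the repair engine is sketched below. The mode names come from this write-up, but the per-mode pass budgets, the `compileWithRepair` name, and the callback signatures are illustrative assumptions; in practice the `repair` callback would wrap a call to the repair LLM.

```typescript
// Mode names are from the write-up; the pass budgets are made-up placeholders.
type CompilerMode = "fast" | "balanced" | "production";

const MAX_REPAIR_PASSES: Record<CompilerMode, number> = {
  fast: 0,
  balanced: 1,
  production: 3,
};

interface RepairOutcome<C, I> {
  config: C;
  remaining: I[]; // issues still open when the budget ran out
  passesUsed: number;
}

// Validate, then repair only while issues remain and the mode's budget allows it.
async function compileWithRepair<C, I>(
  initial: C,
  validate: (config: C) => I[],
  repair: (config: C, issues: I[]) => Promise<C>,
  mode: CompilerMode
): Promise<RepairOutcome<C, I>> {
  let config = initial;
  let issues = validate(config);
  let passesUsed = 0;

  while (issues.length > 0 && passesUsed < MAX_REPAIR_PASSES[mode]) {
    // Only the localized issues are passed along, not a blind "try again".
    config = await repair(config, issues);
    issues = validate(config);
    passesUsed += 1;
  }

  return { config, remaining: issues, passesUsed };
}
```

Under these assumed budgets, a faster mode trades repair passes for lower latency and cost, while a stricter mode spends more LLM calls to drive the remaining issue count to zero.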
What I learned
I learned that unpredictable LLMs can absolutely be tamed for production software generation, but only if you wrap them in high-agency, deterministic system design.
Built With
- framer-motion
- groq
- next.js
- react
- tailwind.css
- typescript
- zod