Inspiration

Most AI software generation tools are thin wrappers around a single LLM prompt. The core problem with using LLMs for software generation isn't generating code; it's reliability, strict schema enforcement, and cross-layer consistency. If an LLM creates an API endpoint but forgets to create the underlying database table, the app crashes.

I was inspired to treat AI software generation not as a prompt engineering task, but as a rigid system design and control problem.

What it does

ForgeEngine is an AI compiler that translates natural language intent into a fully structured, validated, and executable application configuration. It outputs strict, Zod-validated JSON schemas for UI, API, Database, and Auth layers independently, ensuring total cross-layer consistency.

How I built it

I built ForgeEngine using Next.js and React, powered by a multi-stage pipeline:

  1. Intent & Architecture: Parses natural language into structured product constraints and entities.
  2. Schema Synthesis: Generates independent UI, API, DB, and Auth schemas using the Groq SDK.
  3. Consistency Validator (The Core Engine): A strict validation layer that sweeps across schemas (e.g., ensuring a UI component doesn't request data that the API schema doesn't provide).
  4. Targeted Repair: If the validator finds an LLM hallucination, it doesn't do a blind retry. It feeds the localized error back to a repair LLM for a surgical patch.
  5. Execution Awareness: The final emitted configuration is immediately hydrated into a simulated runtime in the browser, demonstrating that the JSON is executable.
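
To make stages 3 and 4 concrete, here is a minimal sketch of a cross-layer check followed by a bounded repair loop. The layer shapes (DbTable, ApiEndpoint, UiComponent), the function names, and the synchronous Repairer callback are all assumptions made for illustration; they are not ForgeEngine's actual schemas, and in the real pipeline the repair step would be an asynchronous LLM call rather than a plain function.

```typescript
// Hypothetical, simplified layer shapes (not ForgeEngine's real schemas).
interface DbTable { name: string; columns: string[] }
interface ApiEndpoint { path: string; table: string; fields: string[] }
interface UiComponent { name: string; endpoint: string; fields: string[] }
interface AppConfig { db: DbTable[]; api: ApiEndpoint[]; ui: UiComponent[] }

// A localized error: which layer, where in that layer, and what is wrong.
interface ConsistencyIssue { layer: "api" | "ui"; location: string; message: string }

// Deterministic sweep: every API field must be a DB column,
// and every UI field must be provided by the API endpoint it reads from.
function validateConsistency(config: AppConfig): ConsistencyIssue[] {
  const issues: ConsistencyIssue[] = [];

  const columnsByTable = new Map<string, Set<string>>();
  for (const table of config.db) columnsByTable.set(table.name, new Set(table.columns));

  const fieldsByEndpoint = new Map<string, Set<string>>();
  for (const ep of config.api) fieldsByEndpoint.set(ep.path, new Set(ep.fields));

  for (const ep of config.api) {
    const columns = columnsByTable.get(ep.table);
    if (!columns) {
      issues.push({ layer: "api", location: ep.path, message: `unknown table "${ep.table}"` });
      continue;
    }
    for (const field of ep.fields) {
      if (!columns.has(field)) {
        issues.push({ layer: "api", location: ep.path, message: `field "${field}" is not a column of "${ep.table}"` });
      }
    }
  }

  for (const component of config.ui) {
    const fields = fieldsByEndpoint.get(component.endpoint);
    if (!fields) {
      issues.push({ layer: "ui", location: component.name, message: `unknown endpoint "${component.endpoint}"` });
      continue;
    }
    for (const field of component.fields) {
      if (!fields.has(field)) {
        issues.push({ layer: "ui", location: component.name, message: `field "${field}" is not provided by "${component.endpoint}"` });
      }
    }
  }
  return issues;
}

// Stand-in for the repair LLM: receives one localized issue, returns a patched config.
type Repairer = (config: AppConfig, issue: ConsistencyIssue) => AppConfig;

// Targeted repair: patch one issue at a time and re-validate, up to maxRounds patches.
function repairUntilConsistent(
  config: AppConfig,
  repair: Repairer,
  maxRounds = 3,
): { config: AppConfig; remaining: ConsistencyIssue[] } {
  let current = config;
  for (let round = 0; round < maxRounds; round++) {
    const issues = validateConsistency(current);
    if (issues.length === 0) return { config: current, remaining: [] };
    current = repair(current, issues[0]);
  }
  return { config: current, remaining: validateConsistency(current) };
}
```

The point of the design is that the repairer only ever sees one small, specific error instead of the whole generation prompt, and the loop is bounded so a stubborn failure is reported through `remaining` rather than retried forever.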

Challenges I ran into

  • Strict JSON Schema Enforcement: Popular SDK wrappers often force json_schema modes, which break with open-source models such as Llama 3.3. I switched to calling the Groq SDK directly and enforcing the required structural keys programmatically.
  • Cross-layer Hallucinations: Initially, the LLM would frequently forget to add a database column that the UI required. Designing the deterministic cross-layer Consistency Validator and the self-healing Repair Engine was highly challenging, but together they resolved the issue.
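
As a rough illustration of enforcing structural keys programmatically, the sketch below parses a raw model response and rejects it unless the top level is a JSON object with exactly the expected layer keys. The key names (ui, api, db, auth) mirror the four layers described above, but the exact keys, the parseStrict name, and the error messages are assumptions; the sketch deliberately contains no Groq SDK calls and treats the response as a plain string.

```typescript
// Hypothetical top-level keys, one per layer named in this write-up.
const REQUIRED_KEYS = ["ui", "api", "db", "auth"] as const;
type LayerKey = (typeof REQUIRED_KEYS)[number];

type ParseResult =
  | { ok: true; value: Record<LayerKey, unknown> }
  | { ok: false; errors: string[] };

// Parse raw model text and check its top-level structure without relying
// on a provider-side json_schema mode.
function parseStrict(raw: string): ParseResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["response is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["top-level value must be a JSON object"] };
  }

  const obj = parsed as Record<string, unknown>;
  const allowed = new Set<string>(REQUIRED_KEYS);
  const errors: string[] = [];

  // Every layer must be present...
  for (const key of REQUIRED_KEYS) {
    if (!(key in obj)) errors.push(`missing key "${key}"`);
  }
  // ...and nothing outside the known layers is accepted.
  for (const key of Object.keys(obj)) {
    if (!allowed.has(key)) errors.push(`unexpected key "${key}"`);
  }

  if (errors.length > 0) return { ok: false, errors };
  return { ok: true, value: obj as Record<LayerKey, unknown> };
}
```

Returning a list of errors instead of throwing keeps the failure localized, so the same messages can be handed to a repair step rather than triggering a blind retry.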

Accomplishments that I'm proud of

I built a switchable compiler mode (Fast, Balanced, Production) that lets users explicitly trade off latency, cost, and output quality. By adding the multi-stage repair engine, I increased the structural success rate of edge-case prompts from roughly 40% (single-prompt approach) to nearly 100%.
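
One way such a mode switch could be expressed is as a small preset table, sketched below. The three mode names come from the project, but the settings fields (maxRepairRounds, hydrateInSimulatedRuntime), their values, and the fallback to Balanced are placeholders chosen for illustration; they are not ForgeEngine's actual configuration.

```typescript
type CompilerMode = "fast" | "balanced" | "production";

// Hypothetical knobs; the real per-mode settings are not described in this write-up.
interface ModeSettings {
  maxRepairRounds: number;            // more rounds: higher quality, more latency and cost
  hydrateInSimulatedRuntime: boolean; // whether to run the final execution check
}

// Placeholder values only, chosen to show the latency/quality trade-off.
const MODE_SETTINGS: Record<CompilerMode, ModeSettings> = {
  fast: { maxRepairRounds: 0, hydrateInSimulatedRuntime: false },
  balanced: { maxRepairRounds: 1, hydrateInSimulatedRuntime: true },
  production: { maxRepairRounds: 3, hydrateInSimulatedRuntime: true },
};

// Type guard so untrusted input (for example a UI toggle value) is checked before use.
function isCompilerMode(value: string): value is CompilerMode {
  return Object.prototype.hasOwnProperty.call(MODE_SETTINGS, value);
}

// Unknown modes fall back to the balanced preset instead of failing.
function settingsFor(mode: string): ModeSettings {
  return isCompilerMode(mode) ? MODE_SETTINGS[mode] : MODE_SETTINGS.balanced;
}
```

Keeping the presets in one typed table means adding a mode forces every setting to be specified, since Record<CompilerMode, ModeSettings> will not compile with a missing entry.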

What I learned

I learned that unpredictable LLMs can absolutely be tamed for production software generation, but only if you wrap them in high-agency, deterministic system design.
