TreeHackNow — About the Project

Inspiration

Robotics simulation is powerful, but creating robot models is tedious. URDF (Unified Robot Description Format) requires precise XML: links, joints, inertials, collision geometries. A single typo or misplaced origin can break everything. We wondered: what if you could describe a robot in plain English and get a working, simulated URDF?

We were inspired by the gap between natural language (how humans think about robots) and formal representations (how simulators need them). LLMs excel at structured output—why not bridge that gap for robotics?

What it does

The project turns natural language into simulated robot URDFs. You type "A 4-legged dog robot" or "A box with 4 wheels"—and get a valid, physics-tested URDF in seconds.

Core features:

  • Natural language generation — Describe any robot; the LLM produces URDF XML
  • Image-to-URDF — Upload a sketch, diagram, or photo; GPT-4o vision analyzes it and generates a matching URDF
  • RAG-augmented generation — Retrieves relevant URDF snippets from a library (quadrupeds, hexapods, wheeled bases) to improve output quality
  • Validation pipeline — Parse checks (urdfpy), link position checks (no overlapping geometry), effort limits (no floppy robots)
  • Physics simulation — PyBullet runs a 5-second sim; detects explosions, fall-over, self-collisions
  • Error feedback loop — When validation or simulation fails, the error is fed back to the LLM for automatic retry (up to 5 attempts)
  • Iterative refinement — "Make it heavier," "add another wheel," "shorter legs"—modify existing robots with follow-up prompts
  • Multi-terrain stress testing — Flat, uneven, stairs, slope; score robots across all terrains
  • Export — URDF → MJCF (MuJoCo) and SDF (Gazebo) conversion
  • Web UI — 3D preview (Three.js + urdf-loader), history, leaderboard, feedback suggestions

How we built it

Architecture: A three-stage pipeline—Generate → Validate → Simulate—with an orchestrator agent that retries on failure.

  1. Generation — OpenAI GPT-4o-mini with a system prompt that enforces URDF rules. For multi-legged robots, we inject chain-of-thought: the LLM first computes angles θ_i = i · 360°/n and mount positions (x, y) = (r cos θ_i, r sin θ_i) before writing XML, avoiding legs stacked at (0, 0, 0).
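The geometry step the prompt asks the model to do first can be mirrored in a few lines; leg_mounts is an illustrative helper, not code from the project:

```python
import math

def leg_mounts(n_legs, radius):
    """Evenly spaced leg mount positions around a circular body.

    Mirrors the chain-of-thought step: theta_i = i * 360/n degrees,
    (x, y) = (r cos theta, r sin theta). Rounding keeps URDF origins tidy.
    """
    mounts = []
    for i in range(n_legs):
        theta = 2 * math.pi * i / n_legs
        mounts.append((round(radius * math.cos(theta), 4),
                       round(radius * math.sin(theta), 4)))
    return mounts
```

For example, leg_mounts(4, 0.2) spreads four hips 90° apart at radius 0.2 m, so no two legs share an origin.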

  2. RAG — TF-IDF over a corpus of URDF snippets (quadruped, hexapod, wheeled base, etc.). Query tokens are matched; top-k snippets are injected into the prompt as examples.
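As a sketch of the retrieval step, here is a minimal hand-rolled TF-IDF ranker over snippet strings (the real index is built over URDF files; tfidf_topk and its corpus are illustrative):

```python
import math
from collections import Counter

def tfidf_topk(query, docs, k=2):
    """Rank snippet docs against a query by cosine similarity over TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency and smoothed inverse document frequency per token.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    order = sorted(range(n), key=lambda i: cosine(q, vec(tokenized[i])),
                   reverse=True)
    return [docs[i] for i in order[:k]]
```

The top-k snippets returned here are what gets pasted into the generation prompt as examples.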

  3. Validation — urdfpy for parse correctness; custom checks for link offsets (|origin| > 0.01 m) and joint effort (≥ 100 N·m).
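A minimal version of the two custom checks, using the standard library's ElementTree rather than urdfpy (check_urdf and its thresholds mirror the rules above but are illustrative, not the project's code):

```python
import xml.etree.ElementTree as ET

MIN_OFFSET = 0.01   # metres: joint origins must not stack links at one point
MIN_EFFORT = 100.0  # N*m: below this, limbs sag under gravity

def check_urdf(urdf_xml):
    """Return a list of human-readable problems (empty list = passed)."""
    problems = []
    try:
        root = ET.fromstring(urdf_xml)
    except ET.ParseError as exc:
        return [f"XML parse error: {exc}"]
    for joint in root.iter("joint"):
        name = joint.get("name", "?")
        origin = joint.find("origin")
        xyz_str = origin.get("xyz", "0 0 0") if origin is not None else "0 0 0"
        xyz = [float(v) for v in xyz_str.split()]
        if sum(v * v for v in xyz) ** 0.5 < MIN_OFFSET:
            problems.append(f"joint '{name}': origin too close to (0,0,0)")
        limit = joint.find("limit")
        if limit is not None and float(limit.get("effort", "0")) < MIN_EFFORT:
            problems.append(f"joint '{name}': effort below {MIN_EFFORT} N*m")
    return problems
```

The returned problem strings double as the error text fed back to the LLM on retry.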

  4. Simulation — PyBullet headless mode. Terrain loaders for flat, uneven (heightfield), stairs, slope. Physics sanity check (0.5 s) catches explosions and self-collisions before full 5 s run.
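The pass/fail logic behind the sanity check can be isolated from the physics engine. In the real pipeline the samples come from PyBullet's base position and orientation queries each step; sanity_verdict below is an illustrative predicate over such samples, with assumed thresholds:

```python
import math

def sanity_verdict(base_positions, up_z_components,
                   blow_up_dist=50.0, min_up=0.5):
    """Classify a short rollout as 'ok', 'explosion', or 'fell_over'.

    base_positions: (x, y, z) of the base each sampled step.
    up_z_components: z-component of the body's up axis (1.0 = upright).
    """
    for (x, y, z) in base_positions:
        d = math.sqrt(x * x + y * y + z * z)
        if math.isnan(d) or d > blow_up_dist:
            return "explosion"   # robot launched away: broken physics
    if up_z_components and up_z_components[-1] < min_up:
        return "fell_over"       # body tipped past ~60 degrees
    return "ok"
```

Anything other than "ok" short-circuits before the full 5 s run and becomes feedback for the retry loop.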

  5. Scoring — Composite score from stability (displacement), uprightness (tilt cosine), and grounding (height). Terrain multipliers: flat 1.0×, slope 1.15×, stairs 1.25×, uneven 1.30×. Final score: S = min(100, (0.4·S_stab + 0.35·S_upright + 0.25·S_ground) · m_terrain).
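The scoring formula, as a direct transcription (assuming each sub-score is already normalised to 0-100):

```python
TERRAIN_MULT = {"flat": 1.0, "slope": 1.15, "stairs": 1.25, "uneven": 1.30}

def composite_score(s_stab, s_upright, s_ground, terrain):
    """Weighted composite, scaled by terrain difficulty and capped at 100."""
    raw = 0.4 * s_stab + 0.35 * s_upright + 0.25 * s_ground
    return min(100.0, raw * TERRAIN_MULT[terrain])
```

The cap means a robot can max out on hard terrain without the multiplier pushing scores past 100.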

  6. Web stack — Flask backend, vanilla JS frontend, Three.js + urdf-loader for 3D preview. History and leaderboard persisted to JSON.
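Putting the stages together, the orchestrator's retry loop might look like the following sketch; generate_urdf, validate, and simulate are stand-ins for the real stages, not the project's actual API:

```python
def run_pipeline(prompt, generate_urdf, validate, simulate, max_attempts=5):
    """Generate -> Validate -> Simulate, feeding each failure back as context."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        urdf = generate_urdf(prompt, feedback)
        ok, error = validate(urdf)
        if ok:
            ok, error = simulate(urdf)
        if ok:
            return urdf, attempt
        # The concrete error becomes extra prompt context for the next round.
        feedback = f"Previous attempt failed: {error}. Fix this and regenerate."
    raise RuntimeError(f"No valid URDF after {max_attempts} attempts")
```

Because validation and simulation errors flow through the same feedback string, the LLM sees exactly why its last attempt was rejected.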

Challenges we ran into

  • Leg overlap — Early multi-legged robots had all legs at (0, 0, 0); PyBullet exploded. We added chain-of-thought prompting so the LLM computes angles and positions first.

  • Floppy robots — Weak joint effort caused limbs to collapse. We added effort validation (min 100 N·m) and mass validation (0.01–500 kg).

  • Self-collisions — Links touching at spawn caused instability. We added a sanity check that detects non-adjacent link contacts and feeds that back to the LLM.
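The non-adjacent-contact rule is simple set logic: links joined by a joint may touch, any other link pair in contact at spawn is flagged. In the pipeline the contact pairs come from PyBullet's contact query; here they are plain tuples (self_collisions is an illustrative helper):

```python
def self_collisions(joints, contacts):
    """Flag contacts between links that no joint connects.

    joints:   [(parent_link, child_link), ...] from the URDF
    contacts: [(link_a, link_b), ...] observed at spawn
    """
    # frozenset makes the pair order-independent: (a, b) == (b, a).
    adjacent = {frozenset(pair) for pair in joints}
    return [pair for pair in contacts if frozenset(pair) not in adjacent]
```

Each flagged pair is reported back to the LLM by name, which usually prompts it to move or shrink the offending link.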

  • URDF extraction — LLMs sometimes wrap XML in markdown or add commentary. We use regex to extract <?xml ... </robot> and strip the rest.
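A regex along these lines does the extraction (illustrative; the project's exact pattern may differ). It also accepts replies where the model omits the XML declaration:

```python
import re

def extract_urdf(llm_output):
    """Pull the URDF document out of an LLM reply, dropping fences and chatter.

    Matches from the XML declaration (or a bare <robot> tag) through the
    closing </robot>; DOTALL lets '.' span newlines, and the lazy '.*?'
    stops at the first closing tag.
    """
    match = re.search(r"(<\?xml.*?</robot>|<robot\b.*?</robot>)",
                      llm_output, re.DOTALL)
    return match.group(1) if match else None
```

Returning None (rather than raising) lets the caller treat "no URDF found" as just another error to feed back to the model.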

  • PyBullet on ARM Mac — Some users had to run brew install cmake before PyBullet would build. We made simulation optional so generation and validation still work without it.

Accomplishments that we're proud of

  • End-to-end pipeline — From "A 4-legged dog" to a simulated, scored robot in one flow
  • Self-healing — Error feedback loop means the system often fixes its own mistakes without human intervention
  • Image-to-URDF — Two-stage pipeline (analyze → generate) with RAG for sketch/diagram input
  • Multi-format export — URDF, MJCF, SDF from a single description
  • Stress testing — Robots tested on four terrains; leaderboard with terrain-filtered rankings
  • Feedback suggestions — UI suggests refinements ("Robot is unstable — widen the base") that users can one-click apply

What we learned

  • Structured prompting matters — Chain-of-thought for geometry (angles, positions) dramatically improved multi-legged robot quality
  • Validation layers compound — Parse → physics sanity → full sim catches different failure modes
  • RAG helps — Even a small snippet library (10–15 URDFs) improved generation for similar robot types
  • Vision + text — GPT-4o vision can interpret sketches and diagrams; combining that with the text pipeline opened image-to-URDF

What's next for TreeHackNow

  • Mesh support — Generate or reference STL/OBJ meshes for more realistic geometry
  • Trajectory optimization — Use simulation feedback to tune joint parameters (PD gains, limits) automatically
  • Multi-robot scenarios — Generate and simulate multiple robots interacting
  • ROS 2 integration — Export to ROS 2 packages with launch files and config
  • Community snippet library — Allow users to contribute URDF snippets to the RAG index
  • Fine-tuned model — Train a small model on URDF examples for faster, cheaper generation

Built With

Python · Flask · OpenAI GPT-4o / GPT-4o-mini · PyBullet · urdfpy · Three.js · urdf-loader · vanilla JavaScript