TreeHackNow — About the Project

Inspiration

Robotics simulation is powerful, but creating robot models is tedious. URDF (Unified Robot Description Format) requires precise XML: links, joints, inertials, collision geometries. A single typo or misplaced origin can break everything. We wondered: what if you could describe a robot in plain English and get a working, simulated URDF?

We were inspired by the gap between natural language (how humans think about robots) and formal representations (how simulators need them). LLMs excel at structured output—why not bridge that gap for robotics?

What it does

The project turns natural language into simulated robot URDFs. You type "A 4-legged dog robot" or "A box with 4 wheels"—and get a valid, physics-tested URDF in seconds.

Core features:

  • Natural language generation — Describe any robot; the LLM produces URDF XML
  • Image-to-URDF — Upload a sketch, diagram, or photo; GPT-4o vision analyzes it and generates a matching URDF
  • RAG-augmented generation — Retrieves relevant URDF snippets from a library (quadrupeds, hexapods, wheeled bases) to improve output quality
  • Validation pipeline — Parse checks (urdfpy), link position checks (no overlapping geometry), effort limits (no floppy robots)
  • Physics simulation — PyBullet runs a 5-second sim; detects explosions, fall-over, self-collisions
  • Error feedback loop — When validation or simulation fails, the error is fed back to the LLM for automatic retry (up to 5 attempts)
  • Iterative refinement — "Make it heavier," "add another wheel," "shorter legs"—modify existing robots with follow-up prompts
  • Multi-terrain stress testing — Flat, uneven, stairs, slope; score robots across all terrains
  • Export — URDF → MJCF (MuJoCo) and SDF (Gazebo) conversion
  • Web UI — 3D preview (Three.js + urdf-loader), history, leaderboard, feedback suggestions

How we built it

Architecture: A three-stage pipeline—Generate → Validate → Simulate—with an orchestrator agent that retries on failure.

  1. Generation — OpenAI GPT-4o-mini with a system prompt that enforces URDF rules. For multi-legged robots, we inject chain-of-thought: the LLM first computes angles θ_i = i · 360°/n and mount positions (x, y) = (r cos θ_i, r sin θ_i) before writing XML, avoiding legs stacked at (0, 0, 0).
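The geometry step the prompt asks the model to do first can be mirrored in a few lines; leg_mounts is an illustrative helper, not code from the project:

```python
import math

def leg_mounts(n_legs, radius):
    """Evenly spaced leg mount positions around a circular body.

    Mirrors the chain-of-thought step: theta_i = i * 360/n degrees,
    (x, y) = (r cos theta, r sin theta). Rounding keeps URDF origins tidy.
    """
    mounts = []
    for i in range(n_legs):
        theta = 2 * math.pi * i / n_legs
        mounts.append((round(radius * math.cos(theta), 4),
                       round(radius * math.sin(theta), 4)))
    return mounts
```

For example, leg_mounts(4, 0.2) spreads four hips 90° apart at radius 0.2 m, so no two legs share an origin.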

  2. RAG — TF-IDF over a corpus of URDF snippets (quadruped, hexapod, wheeled base, etc.). Query tokens are matched; top-k snippets are injected into the prompt as examples.
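As a sketch of the retrieval step, here is a minimal hand-rolled TF-IDF ranker over snippet strings (the real index is built over URDF files; tfidf_topk and its corpus are illustrative):

```python
import math
from collections import Counter

def tfidf_topk(query, docs, k=2):
    """Rank snippet docs against a query by cosine similarity over TF-IDF."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency and smoothed inverse document frequency per token.
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {t: tf[t] * idf.get(t, 0.0) for t in tf}

    def cosine(a, b):
        dot = sum(a[t] * b.get(t, 0.0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query.lower().split())
    order = sorted(range(n), key=lambda i: cosine(q, vec(tokenized[i])),
                   reverse=True)
    return [docs[i] for i in order[:k]]
```

The top-k snippets returned here are what gets pasted into the generation prompt as examples.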

  3. Validation — urdfpy for parse correctness; custom checks for link offsets (|origin| > 0.01 m) and joint effort (≥ 100 N·m).
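A minimal version of the two custom checks, using the standard library's ElementTree rather than urdfpy (check_urdf and its thresholds mirror the rules above but are illustrative, not the project's code):

```python
import xml.etree.ElementTree as ET

MIN_OFFSET = 0.01   # metres: joint origins must not stack links at one point
MIN_EFFORT = 100.0  # N*m: below this, limbs sag under gravity

def check_urdf(urdf_xml):
    """Return a list of human-readable problems (empty list = passed)."""
    problems = []
    try:
        root = ET.fromstring(urdf_xml)
    except ET.ParseError as exc:
        return [f"XML parse error: {exc}"]
    for joint in root.iter("joint"):
        name = joint.get("name", "?")
        origin = joint.find("origin")
        xyz_str = origin.get("xyz", "0 0 0") if origin is not None else "0 0 0"
        xyz = [float(v) for v in xyz_str.split()]
        if sum(v * v for v in xyz) ** 0.5 < MIN_OFFSET:
            problems.append(f"joint '{name}': origin too close to (0,0,0)")
        limit = joint.find("limit")
        if limit is not None and float(limit.get("effort", "0")) < MIN_EFFORT:
            problems.append(f"joint '{name}': effort below {MIN_EFFORT} N*m")
    return problems
```

The returned problem strings double as the error text fed back to the LLM on retry.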

  4. Simulation — PyBullet headless mode. Terrain loaders for flat, uneven (heightfield), stairs, slope. Physics sanity check (0.5 s) catches explosions and self-collisions before full 5 s run.
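The pass/fail logic behind the sanity check can be isolated from the physics engine. In the real pipeline the samples come from PyBullet's base position and orientation queries each step; sanity_verdict below is an illustrative predicate over such samples, with assumed thresholds:

```python
import math

def sanity_verdict(base_positions, up_z_components,
                   blow_up_dist=50.0, min_up=0.5):
    """Classify a short rollout as 'ok', 'explosion', or 'fell_over'.

    base_positions: (x, y, z) of the base each sampled step.
    up_z_components: z-component of the body's up axis (1.0 = upright).
    """
    for (x, y, z) in base_positions:
        d = math.sqrt(x * x + y * y + z * z)
        if math.isnan(d) or d > blow_up_dist:
            return "explosion"   # robot launched away: broken physics
    if up_z_components and up_z_components[-1] < min_up:
        return "fell_over"       # body tipped past ~60 degrees
    return "ok"
```

Anything other than "ok" short-circuits before the full 5 s run and becomes feedback for the retry loop.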

  5. Scoring — Composite score from stability (displacement), uprightness (tilt cosine), and grounding (height). Terrain multipliers: flat 1.0×, slope 1.15×, stairs 1.25×, uneven 1.30×. Final score: S = min(100, (0.4·S_stab + 0.35·S_upright + 0.25·S_ground) · m_terrain).
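The scoring formula, as a direct transcription (assuming each sub-score is already normalised to 0-100):

```python
TERRAIN_MULT = {"flat": 1.0, "slope": 1.15, "stairs": 1.25, "uneven": 1.30}

def composite_score(s_stab, s_upright, s_ground, terrain):
    """Weighted composite, scaled by terrain difficulty and capped at 100."""
    raw = 0.4 * s_stab + 0.35 * s_upright + 0.25 * s_ground
    return min(100.0, raw * TERRAIN_MULT[terrain])
```

The cap means a robot can max out on hard terrain without the multiplier pushing scores past 100.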

  6. Web stack — Flask backend, vanilla JS frontend, Three.js + urdf-loader for 3D preview. History and leaderboard persisted to JSON.
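Putting the stages together, the orchestrator's retry loop might look like the following sketch; generate_urdf, validate, and simulate are stand-ins for the real stages, not the project's actual API:

```python
def run_pipeline(prompt, generate_urdf, validate, simulate, max_attempts=5):
    """Generate -> Validate -> Simulate, feeding each failure back as context."""
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        urdf = generate_urdf(prompt, feedback)
        ok, error = validate(urdf)
        if ok:
            ok, error = simulate(urdf)
        if ok:
            return urdf, attempt
        # The concrete error becomes extra prompt context for the next round.
        feedback = f"Previous attempt failed: {error}. Fix this and regenerate."
    raise RuntimeError(f"No valid URDF after {max_attempts} attempts")
```

Because validation and simulation errors flow through the same feedback string, the LLM sees exactly why its last attempt was rejected.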

Challenges we ran into

  • Leg overlap — Early multi-legged robots had all legs at (0, 0, 0); PyBullet exploded. We added chain-of-thought prompting so the LLM computes angles and positions first.

  • Floppy robots — Weak joint effort caused limbs to collapse. We added effort validation (min 100 N·m) and mass validation (0.01–500 kg).

  • Self-collisions — Links touching at spawn caused instability. We added a sanity check that detects non-adjacent link contacts and feeds that back to the LLM.
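The non-adjacent-contact rule is simple set logic: links joined by a joint may touch, any other link pair in contact at spawn is flagged. In the pipeline the contact pairs come from PyBullet's contact query; here they are plain tuples (self_collisions is an illustrative helper):

```python
def self_collisions(joints, contacts):
    """Flag contacts between links that no joint connects.

    joints:   [(parent_link, child_link), ...] from the URDF
    contacts: [(link_a, link_b), ...] observed at spawn
    """
    # frozenset makes the pair order-independent: (a, b) == (b, a).
    adjacent = {frozenset(pair) for pair in joints}
    return [pair for pair in contacts if frozenset(pair) not in adjacent]
```

Each flagged pair is reported back to the LLM by name, which usually prompts it to move or shrink the offending link.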

  • URDF extraction — LLMs sometimes wrap XML in markdown or add commentary. We use regex to extract <?xml ... </robot> and strip the rest.
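A regex along these lines does the extraction (illustrative; the project's exact pattern may differ). It also accepts replies where the model omits the XML declaration:

```python
import re

def extract_urdf(llm_output):
    """Pull the URDF document out of an LLM reply, dropping fences and chatter.

    Matches from the XML declaration (or a bare <robot> tag) through the
    closing </robot>; DOTALL lets '.' span newlines, and the lazy '.*?'
    stops at the first closing tag.
    """
    match = re.search(r"(<\?xml.*?</robot>|<robot\b.*?</robot>)",
                      llm_output, re.DOTALL)
    return match.group(1) if match else None
```

Returning None (rather than raising) lets the caller treat "no URDF found" as just another error to feed back to the model.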

  • PyBullet on ARM Mac — Some users had to run brew install cmake before PyBullet would build. We made simulation optional so generation and validation still work without it.

Accomplishments that we're proud of

  • End-to-end pipeline — From "A 4-legged dog" to a simulated, scored robot in one flow
  • Self-healing — Error feedback loop means the system often fixes its own mistakes without human intervention
  • Image-to-URDF — Two-stage pipeline (analyze → generate) with RAG for sketch/diagram input
  • Multi-format export — URDF, MJCF, SDF from a single description
  • Stress testing — Robots tested on four terrains; leaderboard with terrain-filtered rankings
  • Feedback suggestions — UI suggests refinements ("Robot is unstable — widen the base") that users can one-click apply

What we learned

  • Structured prompting matters — Chain-of-thought for geometry (angles, positions) dramatically improved multi-legged robot quality
  • Validation layers compound — Parse → physics sanity → full sim catches different failure modes
  • RAG helps — Even a small snippet library (10–15 URDFs) improved generation for similar robot types
  • Vision + text — GPT-4o vision can interpret sketches and diagrams; combining that with the text pipeline opened image-to-URDF

What's next for TreeHackNow

  • Mesh support — Generate or reference STL/OBJ meshes for more realistic geometry
  • Trajectory optimization — Use simulation feedback to tune joint parameters (PD gains, limits) automatically
  • Multi-robot scenarios — Generate and simulate multiple robots interacting
  • ROS 2 integration — Export to ROS 2 packages with launch files and config
  • Community snippet library — Allow users to contribute URDF snippets to the RAG index
  • Fine-tuned model — Train a small model on URDF examples for faster, cheaper generation

Built With

Python · Flask · OpenAI GPT-4o / GPT-4o-mini · PyBullet · urdfpy · Three.js · urdf-loader · vanilla JavaScript