Inspiration

We HATE OpenCV and training custom models. Developing autonomous hardware devices sucks for beginners because it takes a huge amount of up-front work just to get started. It also doesn't scale well: the tools required to develop autonomous interfaces are domain-specific and demand a lot of research, compute, application-specific data, etc. On top of that, we'd been wanting to let LLMs control a physical interface.

What it does

IntelliDrive lets you control any hardware interface with natural language through an LLM. You simply describe your objective (e.g., “find me something healthy to eat”), and the LLM uses the exposed endpoints to take relevant actions toward the goal. For our RC car, we exposed 5 endpoints: /forward, /backward, /left, /right, and /photo. The LLM might first decide to take a picture to see whether the desired goal can be achieved, then call the movement endpoints to get there (e.g., turn left and take a picture again if nothing in the current frame satisfies the given objective).
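
For a flavor of what this looks like in practice, here's a minimal sketch of the call sequence the LLM ends up producing through its tools. The five endpoint names are the real ones; PI_URL, the GET method, and the "apple" objective are illustrative assumptions.

```python
# Illustrative only: the kind of call sequence the LLM produces via its tools.
# PI_URL, the GET method, and the "apple" objective are assumptions;
# the five endpoint names are the real ones.
import requests

PI_URL = "http://raspberrypi.local:5000"  # hypothetical host/port

def photo_annotation() -> str:
    """Take a photo and return the text annotation the LLM gets to 'see'."""
    return requests.get(f"{PI_URL}/photo").text

def move(direction: str) -> None:
    """direction is one of: forward, backward, left, right."""
    requests.get(f"{PI_URL}/{direction}")

# e.g. scan left until something matching the objective shows up, then approach
while "apple" not in photo_annotation():
    move("left")
move("forward")
```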

The main catch is that we never programmed any CV pipelines! The whole appeal is that given enough context (tools such as /photo), reasoning capabilities (a general-purpose LLM), and the ability to execute actions (move forward, backward, left, right), any hardware interface can be made autonomous.

How we built it

Hardware:

The custom-built RC car is powered by a Raspberry Pi 4B running QNX OS. It uses the Pi camera module, an L298 motor driver, and 12V DC brushless motors. A power bank powers the Pi, while a separate 7.5V DC supply feeds the motor driver board. Keep in mind, IntelliDrive isn't tied to this build: it's incredibly modular and can be adapted to any type of machine.

The Pi controls the motors through its GPIO pins via the rpi_gpio QNX library. Taking a picture with the camera is not supported on QNX OS, so we had to use a Python subprocess to open the QNX camera viewfinder and grab a screenshot, which is saved to local disk.
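
Here's a rough sketch of that workaround. The viewfinder and screenshot binaries (and their flags) are hypothetical stand-ins for the actual QNX utilities; the open-wait-screenshot-terminate pattern is the part that matters.

```python
# Sketch of the screenshot workaround. The binary names and flags below are
# hypothetical placeholders for the QNX viewfinder and screenshot utilities;
# the subprocess pattern itself is the actual technique.
import subprocess
import time

def take_photo(path: str = "/tmp/frame.png") -> str:
    viewer = subprocess.Popen(["camera_viewfinder"])      # hypothetical binary
    try:
        time.sleep(1.0)                                   # let the feed settle on screen
        subprocess.run(["screenshot", "-o", path], check=True)  # hypothetical flags
    finally:
        viewer.terminate()                                # close the viewfinder again
    return path
```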

Software:

The Raspberry Pi runs QNX OS, which hosts a Flask server exposing various endpoints for interacting with the GPIO pins and camera through QNX libraries. The take_photo function captures an image with the Pi’s camera and sends it, along with contextual data, through the Flask server to a target machine. We do this because the image has to go to the Gemini API for annotation, and the Gemini library caused build errors on QNX. After annotation, the response is sent back to the Pi, which in turn returns it to the MCP tool call.
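
A minimal sketch of the annotation relay on the target machine, assuming the google-generativeai Python SDK; the route name, port, model choice, and prompt are illustrative:

```python
# Sketch of the annotation relay running on the target machine (not the Pi).
# Route name, port, model choice, and prompt are illustrative.
import google.generativeai as genai
from flask import Flask, request

genai.configure(api_key="...")  # your Gemini API key
model = genai.GenerativeModel("gemini-1.5-flash")
app = Flask(__name__)

@app.post("/annotate")
def annotate():
    # The Pi POSTs the raw screenshot bytes; Gemini returns a text annotation
    # that gets relayed back to the Pi and, from there, to the MCP tool call.
    img = {"mime_type": "image/png", "data": request.get_data()}
    resp = model.generate_content(["Describe what the camera sees.", img])
    return resp.text

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```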

All of these functions are integrated as tools within MCP, which is connected to Claude Code (we didn't have the energy to set up another MCP client at 8 a.m....). The Gemini API does the data annotation. The front end is built with Next.js and features the React Speech API to collect user answers (although it is not yet connected to the MCP input).
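
And a sketch of the MCP side, using the official mcp Python SDK's FastMCP helper. PI_URL and the docstrings are illustrative; the decorator-based tool registration is how the SDK works.

```python
# Sketch of wrapping the car's endpoints as MCP tools with the official
# Python SDK (the `mcp` package). PI_URL is a hypothetical address.
import requests
from mcp.server.fastmcp import FastMCP

PI_URL = "http://raspberrypi.local:5000"  # hypothetical host/port
mcp = FastMCP("IntelliDrive")

@mcp.tool()
def move(direction: str) -> str:
    """Drive the car one step: forward, backward, left, or right."""
    return requests.get(f"{PI_URL}/{direction}").text

@mcp.tool()
def photo() -> str:
    """Take a photo and return the Gemini annotation of the scene."""
    return requests.get(f"{PI_URL}/photo").text

if __name__ == "__main__":
    mcp.run()  # Claude Code connects over stdio
```

Claude Code then sees move and photo as callable tools and strings them together toward the stated objective on its own.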

Challenges we ran into

  • Started the first night off trying to do this with a drone, but failed due to networking issues that we couldn't debug.
  • Hacked a Walmart RC car so we could use its powerful motors, only to realize that the steering motor was fried... so we had to build our car from scratch.
  • QNX: The Gemini libraries gave build errors on the OS, so we had to bootstrap a rudimentary distributed system to do the processing on another server. That in turn introduced timing issues, which we solved after enough experimentation: let each file save fully, then send it (see the sketch after this list).
  • The MCP server would sometimes receive the old screenshot when it asked for a new one, making the car do weird things. We traced this to some multiprocessing hacks we were using and had to take them out... which meant a slower pipeline.
  • Fitting all of the hardware onto a tiny frame was a struggle.
  • Camera quality was pretty bad (it needed a LOT of lighting to work), and the context in MCP tool calls wasn't optimized for speed, so the LLM took a long time to execute actions.
  • The LLM was a little dumb when choosing actions, partly because all it had to "see" with was an annotation string.
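
The save-then-send fix referenced in the QNX bullet above boils down to the classic write-then-rename pattern. This is a minimal sketch of the idea, not our exact code:

```python
# Avoid sending half-written screenshots: write to a temp file, then
# atomically rename it into place. Paths here are illustrative.
import os

def save_atomically(data: bytes, final_path: str = "/tmp/frame.png") -> None:
    tmp_path = final_path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())      # make sure the bytes actually hit the disk
    os.replace(tmp_path, final_path)  # atomic: readers never see a partial file
```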

Accomplishments that we're proud of

  • It actually worked!
  • We built our own version of a simple distributed network to overcome hurdles we faced with QNX.
  • We drastically improved our hardware and software skills.

What we learned

  • Having a backup plan is essential (especially in hardware hacks!). We had to simplify our idea a lot, but that was only possible because we had planned for it beforehand.

What's next for Intelli-Drive

  • Connect more tools to the MCP server: imagine PCB encoders, the Google Maps API, etc. The hardware interface could then execute much more complex natural-language tasks.
  • Make the MCP pipeline and server endpoints a lot faster; taking the photo was especially slow due to QNX limitations.
  • Expand hardware compatibility: adapt IntelliDrive to work beyond RC cars, to drones, home robots, industrial tools, and other IoT devices.
