Inspiration

As mechatronics students, we are building an autonomous vehicle to ease student travel around campus. Our project leverages the Gemini API for real-time decision making, eliminating the need for pre-mapped environments and enabling navigation in unfamiliar conditions. We incorporate the visual question answering (VQA) capability of the Gemini API for decision making in complex manoeuvres, and we aim to showcase an innovative use of generative AI in autonomous vehicle systems.

What it does

We leverage the Gemini API to drive the decision-making process that controls vehicle movement on the road.

How we built it

We utilised the CARLA simulator, an open-source platform renowned for its realistic urban environments and comprehensive testing capabilities. We built a Node.js application deployed on Google Cloud that interacts with CARLA. This app uses the Vertex AI API to send real-time image frames generated by CARLA as multimodal prompts, receiving decisions like "STOP" or "GO" in return. A control script subscribes to these decisions and publishes vehicle kinematics commands to the vehicle through separate Python scripts.
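The control-script side of this pipeline can be sketched as a simple mapping from the Gemini decision string to throttle/brake values. This is an illustrative sketch, not the project's actual code; the function name and the control values are assumptions.

```python
# Hypothetical sketch: translate a Gemini text decision ("STOP"/"GO")
# into CARLA-style throttle/brake values. Names and numbers here are
# illustrative assumptions, not the project's actual control script.

def decision_to_control(decision: str) -> dict:
    """Map a Gemini decision string to vehicle kinematics commands."""
    decision = decision.strip().upper()
    if decision == "GO":
        return {"throttle": 0.5, "brake": 0.0}
    # Brake on "STOP" and on any unrecognised response (fail safe).
    return {"throttle": 0.0, "brake": 1.0}
```

Treating every unrecognised response as "STOP" keeps the vehicle fail-safe when the model returns free-form text instead of a keyword.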

Challenges we ran into

  • Randomness in responses for the same image was a big challenge.
  • We were unable to fine-tune gemini-pro-vision-1.0 with images, so its responses are generic rather than specifically tuned for autonomous driving.
  • Because we consume a remote API, the CARLA integration with the Gemini API experiences response lag.
  • One major hurdle was providing context for the prompts sent to the Gemini API. Unlike static maps, real-world environments are constantly changing, which made it difficult to anticipate upcoming situations and supply the context Gemini needs to make decisions.
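One common way to address the context problem is to carry a short rolling history of recent decisions inside each prompt (pinning the generation temperature to 0 similarly helps with the randomness issue). Below is a minimal sketch of such a prompt builder; the wording, window size, and function names are assumptions, not the team's actual prompts.

```python
from collections import deque

# Hypothetical sketch: embed a rolling window of recent decisions into
# the prompt so each frame is judged with some temporal context.
HISTORY_WINDOW = 3  # illustrative window size

def build_prompt(history: deque) -> str:
    """Build a VQA prompt that includes recent decisions as context."""
    context = ", ".join(history) if history else "none"
    return (
        "You are assisting an autonomous vehicle in an urban scene. "
        f"Recent decisions (oldest first): {context}. "
        "Based on the attached camera frame, answer with exactly one word: "
        "STOP or GO."
    )

history = deque(maxlen=HISTORY_WINDOW)  # oldest entries drop automatically
```

Constraining the answer format ("exactly one word") also makes the response easier for the control script to parse.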

Accomplishments that we're proud of

Completing Stage 0: successfully integrating the Gemini API with CARLA, making decisions from images published by CARLA.

What we learned

  • Prompts/Questions for Visual Question Answering tailored to autonomous vehicles.
  • Software stack framework integrating CARLA and Gemini.
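The integration framework we learned to build can be sketched as a per-frame loop: grab a frame, query the model, apply the resulting control. The sketch below stubs out the Gemini call (the real one goes through the Vertex AI API with credentials); `drive_step` and its control values are illustrative assumptions.

```python
# Hypothetical per-frame loop for the CARLA + Gemini stack. `ask_gemini`
# is injected so the remote Vertex AI call can be stubbed out here.

def drive_step(frame: bytes, ask_gemini) -> dict:
    """One control cycle: frame in, kinematics command out."""
    decision = ask_gemini(frame)  # in production: a Vertex AI multimodal call
    if decision.strip().upper() == "GO":
        return {"throttle": 0.5, "brake": 0.0}
    return {"throttle": 0.0, "brake": 1.0}

# Stub standing in for the real Gemini call during testing.
control = drive_step(b"<jpeg bytes>", lambda frame: "GO")
```

Injecting the model call as a parameter keeps the control loop testable offline, which matters given the API response lag noted above.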

What's next for Geminus

  • Fine-tuning and running it on on-board hardware devices.
  • Simulation Stages 1-3
    • Stage 1: Integrating it with a lane-following module.
    • Stage 2: Using parameterized prompts and adding an NLP filter to extract keywords for the control script.
    • Stage 3: Integrating other sensors (GPS, 3D-LIDAR, IMU) with Gemini.
  • Hardware integration Stages 4 and 5
    • Stage 4: Using the same software stack as Stage 2, but with a real vehicle.
    • Stage 5: Adding other sensors (GPS, 3D-LIDAR, IMU) with a real vehicle.
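The Stage 2 NLP filter could be as simple as whitelisting control keywords in the model's free-text response. This is a minimal sketch under that assumption; the keyword set and function name are illustrative, not a committed design.

```python
import re

# Hypothetical Stage 2 NLP filter: pull control keywords out of a
# free-text Gemini response. The keyword set is an assumption.
KEYWORDS = ("STOP", "GO", "LEFT", "RIGHT", "SLOW")

def extract_keywords(response: str) -> list:
    """Return recognised control keywords in order of appearance."""
    words = re.findall(r"[A-Za-z]+", response.upper())
    return [w for w in words if w in KEYWORDS]
```

A whitelist filter like this lets the prompts stay natural-language while the control script still receives a small, unambiguous command vocabulary.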
