Inspiration
As mechatronics students, we are working on an autonomous vehicle project to ease student travel on campus. Our project leverages the Gemini API for real-time decision making, eliminating the need for pre-mapped environments and enabling navigation in unfamiliar conditions. We incorporate the Visual Question Answering (VQA) capability of the Gemini API for decision making in complex manoeuvres, and we aim to showcase an innovative use of generative AI in autonomous vehicle systems.
What it does
We leverage the Gemini API to assist the decision-making process that controls vehicle movement on the road.
How we built it
We utilised the CARLA simulator, an open-source platform renowned for its realistic urban environments and comprehensive testing capabilities. On top of it, we built a Node.js application, deployed on Google Cloud, that interacts with CARLA. The app uses the Vertex AI API to send real-time image frames generated by CARLA as multimodal prompts and receives decisions like "STOP" or "GO" in return. A control script subscribes to these decisions and publishes the corresponding vehicle kinematics commands to the vehicle through separate Python scripts.
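The decision loop can be sketched roughly as below. This is a hypothetical Python outline, not the project's actual Node.js code; the function names, the prompt wording, and the fail-safe default are illustrative assumptions.

```python
def parse_decision(response_text: str) -> str:
    """Normalise a free-form model reply to a STOP/GO command.

    Checks STOP first and defaults to STOP on ambiguous replies,
    so the safe action wins when the model is verbose or unclear.
    """
    text = response_text.strip().upper()
    if "STOP" in text:
        return "STOP"
    if "GO" in text:
        return "GO"
    return "STOP"  # fail safe on unrecognised replies


def decide(frame_bytes: bytes, ask_model) -> str:
    """Send one CARLA frame to the model and return a vehicle command.

    `ask_model` abstracts the multimodal Vertex AI call (prompt plus
    image) so the loop can be exercised without cloud credentials.
    """
    prompt = "You are driving. Reply with exactly one word: STOP or GO."
    reply = ask_model(prompt, frame_bytes)
    return parse_decision(reply)


if __name__ == "__main__":
    # Stubbed model call stands in for the real Vertex AI request.
    stub = lambda prompt, frame: "GO - the road ahead is clear."
    print(decide(b"<jpeg frame>", stub))  # GO
```

In the real pipeline the parsed command would then be handed to the control script that publishes the vehicle kinematics commands.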
Challenges we ran into
- Randomness in the responses: the model could return different decisions for the same image.
- We were unable to fine-tune gemini-pro-vision-1.0 with images, so the responses are generic rather than trained specifically for autonomous vehicles.
- Because we consume a remote API, the CARLA integration with the Gemini API experiences response lag.
- One major hurdle was providing context for the prompts sent to the Gemini API. Unlike static maps, real-world environments are constantly changing, which made it difficult to anticipate upcoming situations and supply the context Gemini needs to make decisions.
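One possible mitigation for the response randomness is to query the model several times per frame and take a majority vote (alongside lowering the sampling temperature in the generation config). This is a hypothetical sketch, not part of the project's current stack; `ask_model` is a placeholder for the Vertex AI call.

```python
from collections import Counter


def majority_decision(ask_model, prompt: str, frame: bytes, n: int = 3) -> str:
    """Query the model n times for the same frame and keep the most
    common answer, smoothing over run-to-run randomness.

    Trades extra API latency and cost for a more stable decision.
    """
    votes = [ask_model(prompt, frame) for _ in range(n)]
    return Counter(votes).most_common(1)[0][0]
```

The obvious cost is that the API lag noted above is multiplied by `n`, so the vote size would have to stay small in a real-time loop.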
Accomplishments that we're proud of
Completing Stage 0: successfully integrating the Gemini API with CARLA, driven by images published from CARLA.
What we learned
- Crafting Visual Question Answering prompts tailored to autonomous vehicles.
- Designing a software stack that integrates CARLA and Gemini.
What's next for Geminus
- Fine-tuning the model and running the stack on on-board hardware devices.
- Next stages 1-3
- Stage 1: Integrating the stack with a lane-following module.
- Stage 2: Using parameterized prompts and adding an NLP filter to extract keywords for the control script.
- Stage 3: Integrating other sensors (GPS, 3D-LIDAR, IMU) with Gemini.
- Hardware integration stages 4 and 5
- Stage 4: Using the same software stack as Stage 2, but with a real vehicle.
- Stage 5: Adding other sensors (GPS, 3D-LIDAR, IMU) with a real vehicle.
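The Stage 2 NLP filter could look something like the sketch below. The keyword vocabulary and function name are assumptions for illustration; the actual filter design is still open.

```python
import re

# Hypothetical control vocabulary; the real Stage 2 keyword set
# for the control script has not been fixed yet.
CONTROL_KEYWORDS = {"STOP", "GO", "SLOW", "LEFT", "RIGHT"}


def extract_keywords(reply: str) -> list[str]:
    """Reduce a verbose model reply to the keywords the control
    script understands, preserving their order of appearance."""
    tokens = re.findall(r"[A-Za-z]+", reply.upper())
    return [t for t in tokens if t in CONTROL_KEYWORDS]
```

For example, a reply like "Please slow down and turn left" would be filtered to `["SLOW", "LEFT"]` before reaching the control script.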


