Inspiration
The inspiration behind our project was to create a unique and entertaining AI that could tackle a game we both love—GeoGuessr. We wanted to challenge ourselves with a problem that combines fun with technical difficulty, going beyond traditional AI use cases to create something truly out-of-the-box. Our vision was to build an AI that could analyse images, identify their geographical location, and essentially "play" GeoGuessr like a human would, but with the power of deep learning.
What it does
GeoQuest is an AI-based tool that identifies the location of a given image using a Convolutional Neural Network (CNN) for image processing. It takes in an image and attempts to predict the corresponding geographical coordinates. The AI uses deep learning models to analyse visual cues, such as landmarks, terrain, and other features. Users interact with the AI through a web interface built using React and Flask, allowing them to upload images and receive predictions on where the location might be in the world.
How we built it
We built the core of GeoQuest using a combination of deep learning and web development technologies:
- Data Collection: Initially, we intended to use the Mapillary API to gather images and their corresponding geographical coordinates. However, the API's rate limits and data-access restrictions made it unreliable for our needs, so we eventually decided to build our own dataset by hand. This meant spending countless hours collecting images, associating them with coordinates, and organizing them into a structured dataframe.
- Image Processing with CNNs: For the AI model, we used Convolutional Neural Networks (CNNs) to process images and predict coordinates. Given our limited experience with neural networks, we experimented with various architectures and eventually incorporated TinyViT (a vision-transformer-based model) to improve the model's ability to capture image features.
- Backend: We built the server-side logic with Flask, which serves the model and handles API requests from the frontend. The Flask server processes the uploaded image, runs the AI model, and returns the predicted location.
- Frontend: We used React to create a user-friendly interface for uploading images and viewing predictions. This was our first time using React, so there was a learning curve as we navigated its component-based structure and integrated it with Flask.
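As a concrete illustration of the backend flow described above, here is a minimal Flask sketch. The route name, response fields, and the stubbed-out predict_coords function are illustrative assumptions, not our exact code; the real server runs the trained CNN/TinyViT model in place of the stub.

```python
# Minimal sketch of the Flask prediction endpoint (illustrative).
# predict_coords is a stub; the real server runs the trained
# CNN/TinyViT model on the uploaded image bytes.
from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_coords(image_bytes):
    # Stub: the real model returns predicted (latitude, longitude).
    return 48.8584, 2.2945

@app.route("/predict", methods=["POST"])
def predict():
    if "image" not in request.files:
        return jsonify({"error": "no image uploaded"}), 400
    lat, lon = predict_coords(request.files["image"].read())
    return jsonify({"latitude": lat, "longitude": lon})

# During development the server is started with app.run(debug=True).
```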
Challenges we ran into
The project presented numerous challenges, from data collection to model training:
- Data Collection: The biggest hurdle was finding and compiling a suitable dataset. When the Mapillary API proved unreliable, we resorted to creating a dataset manually, which was time-consuming and tedious. Associating images with accurate coordinates and structuring the data into a usable format took much longer than expected.
- Image Processing and Model Training: Working with CNNs and deep-learning models like TinyViT was completely new to us. Learning how to pre-process images, optimize model parameters, and troubleshoot training issues was a significant challenge, and balancing model complexity against training time on our limited hardware added to the difficulty.
- Integrating Flask and React: As newcomers to both Flask and React, we found integrating the backend and frontend a challenge. Setting up CORS (Cross-Origin Resource Sharing) to allow communication between the two, managing API requests, and deploying the entire stack tested our problem-solving skills.
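To show what the CORS piece involved, here is a stripped-down sketch of letting the React dev server talk to the Flask API. The ports and the hand-written headers are assumptions for illustration; an extension such as flask-cors achieves the same result with less code.

```python
# Sketch of the CORS fix: without these headers, the browser blocks
# the React dev server (assumed port 3000) from reading responses
# from the Flask API (assumed port 5000).
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_cors_headers(response):
    # Allow the React origin to read responses and send uploads.
    response.headers["Access-Control-Allow-Origin"] = "http://localhost:3000"
    response.headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
    response.headers["Access-Control-Allow-Headers"] = "Content-Type"
    return response

@app.route("/ping")
def ping():
    return "pong"
```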
Accomplishments that we're proud of
- Building a Functional AI: Despite being new to deep learning, we managed to build a working image-processing AI that can make predictions about geographical locations. It was rewarding to see the model gradually improve through trial and error.
- Creating a Custom Dataset: Building our dataset from scratch required a lot of patience and perseverance, and we’re proud of the effort we put into creating a solid foundation for the AI.
- Learning Flask and React: Diving into Flask and React for the first time was intimidating, but we managed to build a functional web interface that users can interact with to upload images and receive predictions. This experience gave us a better understanding of how to build full-stack applications.
What we learned
- Deep Learning Fundamentals: We learned a great deal about how CNNs work, from image pre-processing to model training and evaluation. Working with TinyViT also exposed us to the cutting-edge world of vision transformers.
- Data Handling: Creating a dataset from scratch taught us the importance of data quality and structure in training machine learning models. We learned how to manage and pre-process image data effectively.
- Full-Stack Development: We gained practical experience with Flask and React, learning how to set up a backend server, create RESTful APIs, and integrate them with a frontend framework.
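For the data-handling point, this is roughly the shape our structured dataframe took: one row per image, holding a file path and the coordinates we associated with it. The column names and paths here are made up for illustration, not our exact schema.

```python
# Sketch of the hand-built dataset structure: one row per image,
# with its file path and associated latitude/longitude.
# Column names and paths are illustrative.
import pandas as pd

records = [
    {"image_path": "images/paris_001.jpg", "lat": 48.8584, "lon": 2.2945},
    {"image_path": "images/tokyo_014.jpg", "lat": 35.6595, "lon": 139.7005},
]
df = pd.DataFrame(records)

# Basic sanity checks before training: coordinates must be in range.
assert df["lat"].between(-90, 90).all()
assert df["lon"].between(-180, 180).all()
```

Keeping these checks next to the dataframe caught mislabeled rows early, which matters when every coordinate was entered by hand.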
What's next for GeoQuest
- Model Optimization: We aim to refine the TinyViT model and experiment with other architectures to achieve better prediction accuracy. Fine-tuning hyperparameters and incorporating more data-augmentation techniques could significantly improve performance.
- Enhanced User Experience: We plan to improve the web interface, making it more user-friendly and interactive. Adding features like visualizing predictions on a world map would enhance the overall user experience.
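One way to quantify "better prediction accuracy" while tuning is the great-circle (haversine) distance between predicted and true coordinates. A standard-library sketch, with a function name of our own choosing:

```python
# Score a prediction as the great-circle (haversine) distance in km
# between predicted and true coordinates.
import math

def haversine_km(lat1, lon1, lat2, lon2):
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Example: Paris vs. Tokyo comes out at roughly 9,700 km.
paris_to_tokyo = haversine_km(48.8584, 2.2945, 35.6595, 139.7005)
```

Averaging this distance over a validation set gives a single kilometre-scale error number for comparing architectures and hyperparameter choices.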
GeoQuest has been a challenging yet rewarding project, pushing us beyond our comfort zones and helping us grow as developers and AI enthusiasts. We’re excited about the potential it has and look forward to making it even better!
Built With
- axios
- cnn
- flask
- google-streetview
- javascript
- jupyter-notebooks
- mapillary
- python
- react
- tinyvit