GeoNet is a deep learning model for geolocating images from visual cues, built on a hybrid architecture that combines Vision Transformers (ViT) and Convolutional Neural Networks (CNNs). This pairing lets GeoNet extract both local and global image features, enabling more accurate geolocation predictions. Given an input image, the model predicts the geographic location where the image was captured, making it suitable for applications such as geolocation-based games, mapping, and research.
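The source does not describe GeoNet's actual layers, so the following is only a toy NumPy sketch of the two-branch idea above: a small convolution stands in for the CNN branch (local features), a single self-attention layer over image patches stands in for the ViT branch (global features), and a linear head maps the fused features to a latitude/longitude pair. All function names, sizes, and weights here are illustrative assumptions, not the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

def cnn_branch(img, kernel):
    """Stand-in CNN branch: one valid 2D convolution + ReLU + global average pool."""
    h, w = img.shape
    k = kernel.shape[0]
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return np.maximum(out, 0.0).mean()  # single pooled local feature

def vit_branch(img, patch=8):
    """Stand-in ViT branch: patchify, one dot-product self-attention layer, mean-pool."""
    h, w = img.shape
    tokens = np.array([img[i:i + patch, j:j + patch].ravel()
                       for i in range(0, h, patch)
                       for j in range(0, w, patch)])       # (num_patches, patch*patch)
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])  # scaled attention scores
    scores -= scores.max(axis=1, keepdims=True)            # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return (attn @ tokens).mean()  # single pooled global feature

def predict_latlon(img, w_head, b_head):
    """Fuse both branches and regress to a (lat, lon) pair in valid ranges."""
    feats = np.array([cnn_branch(img, np.ones((3, 3)) / 9.0), vit_branch(img)])
    raw = w_head @ feats + b_head       # 2-output linear regression head
    lat = 90.0 * np.tanh(raw[0])        # squash into [-90, 90]
    lon = 180.0 * np.tanh(raw[1])       # squash into [-180, 180]
    return lat, lon

img = rng.random((32, 32))              # fake grayscale "street view" image
w_head = rng.standard_normal((2, 2))
lat, lon = predict_latlon(img, w_head, np.zeros(2))
```

With untrained random weights the output is meaningless; the point is only the shape of the pipeline: local and global features are computed in parallel, concatenated, and regressed to coordinates.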
Hybrid Architecture: Combines Vision Transformers (ViT), which capture global context, with Convolutional Neural Networks (CNNs), which capture local detail, for more robust geolocation predictions.
Accurate Image Geolocation: Predicts geographic coordinates (latitude and longitude) based on visual information within the input image.
Extensive Training: Trained on a diverse dataset of geotagged images sourced from the Google Street View API, improving generalization across different landscapes, environments, and urban structures.
Applications: Useful in location-based services, geolocation games (e.g., GeoGuessr), mapping applications, and research that requires spatial analysis of image data.
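Since the model outputs latitude/longitude coordinates, a natural way to score a prediction (used by GeoGuessr-style applications and common in geolocation research, though not stated in this README) is the great-circle distance between the predicted and true points. A minimal stdlib-only haversine sketch:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Example: error if the model predicted Paris but the image was taken in London
error_km = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
```

Averaging this distance over a held-out set of geotagged images gives a single interpretable error metric in kilometres.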
https://www.youtube.com/watch?v=hvCGrNylYic
Ashwin Santhosh
Alex Guo
Daniel Rolfe
Nolan Young
© 2024 Ashwin Santhosh, Alex Guo, Daniel Rolfe, Nolan Young. All rights reserved.