Inspiration

With the imposition of a worldwide lockdown, the restriction on travel was exasperating. Once, during a video-call rant session about being stuck at home, the four of us came up with a visually appealing idea to ease others' indoor suffering. Owing to travel restrictions, if we cannot capture reality in pictures, why don't we create reality through pictures?

With access to endless images of monuments uploaded by tourists from around the world, the idea of living an experience through the eyes of other people was fascinating. Therefore, we decided to create an AI-based hyperlapse that enables users to choose any monument and watch a moving time-lapse created from pictures uploaded to Google.

What it does

AI-perlapse is an end-to-end piece of software that generates a hyperlapse of a monument you dream of visiting. The software accepts a monument name from the user and then uses a Google Cloud-powered pipeline to web-scrape images that people around the world have uploaded to Google Images. Next, using Google Cloud Vision, the program filters the images based on their features and landmarks. Lastly, applying SIFT, common key points are established across all images; these define each image's relative depth and angle and are used to generate the final hyperlapse.

How we built it

  1. Web Scraping - Using Selenium and ChromeDriver, we built a bot that takes in keywords (input by the user) and scours Google Images for the top n images (n also a user input) of the specified landmark or monument. The bot stores the images in a folder for later use.
  2. Image Filtering - We used Google Cloud Vision and OpenCV to iterate through every image in that folder and check whether it resembled the actual landmark/monument. We also used Google Cloud Vision to return 10 labels for every image. These labels were then compared with the labels of the first image Google returned (the most accurate representation of the landmark/monument) to filter images by their similarity to that reference image.
  3. Key Point Matching - Using SIFT and OpenCV, we created an algorithm that boxes the landmark/monument and maps common key points across images. Using SIFT, we calculate the distance between these key points and use it to infer relative depth: as the distance between the points decreases, the image is more zoomed out.
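The scraping step above can be sketched as follows. This is a minimal illustration, not our exact code: the function names are ours, the CSS selector is a simplification of real Google Images markup, and running the browser part requires the `selenium` package plus a matching ChromeDriver on your PATH.

```python
"""Sketch of the image-scraping bot (hypothetical names; a
simplified stand-in for the Selenium + ChromeDriver bot described above)."""
from urllib.parse import quote_plus


def build_search_url(keywords: str) -> str:
    """Build a Google Images search URL for the user's keywords."""
    return "https://www.google.com/search?tbm=isch&q=" + quote_plus(keywords)


def scrape_images(keywords: str, n: int, out_dir: str = "images") -> None:
    """Drive a headless Chrome session and save the top-n inline thumbnails.

    Requires `selenium` and a matching ChromeDriver installation.
    """
    import base64
    import os
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    os.makedirs(out_dir, exist_ok=True)
    opts = webdriver.ChromeOptions()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        driver.get(build_search_url(keywords))
        thumbs = driver.find_elements(By.CSS_SELECTOR, "img")[:n]
        for i, img in enumerate(thumbs):
            src = img.get_attribute("src") or ""
            if src.startswith("data:image"):  # inline base64 thumbnail
                payload = src.split(",", 1)[1]
                with open(os.path.join(out_dir, f"{i}.jpg"), "wb") as f:
                    f.write(base64.b64decode(payload))
    finally:
        driver.quit()
```

A call like `scrape_images("Taj Mahal", 50)` would then populate the `images/` folder that the filtering step iterates over.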

In short: 36 hours of scouring the internet, watching a ton of videos on OpenCV and Google Cloud Vision, and creating and SIFTing through what seemed like a million lines of code is how we created AI-perlapse.

Challenges we ran into

It was the first hackathon for three of us. 36 hours with negligible sleep, and a ton of assignments due on either end of the hackathon - a challenge in itself.

The first technical challenge we encountered was web scraping images. We initially considered using an Instagram API to search for images by location and hashtags. However, the available Instagram APIs were outdated, and privacy changes on Instagram prevented us from implementing them. We then decided to use a Google API but found we could only download 20 images at a time. Therefore, after scouting open-source projects and repositories, we decided to drive the scraper with ChromeDriver and Selenium, which we partially coded from scratch.

The second issue we came across was that certain images depicted the monument or landmark in question incorrectly, or from an obscure angle. We therefore devised filtration techniques to keep such images out of the final hyperlapse, taking a two-pronged approach. First, we created an AI-based detection algorithm that matched the object to a landmark and returned a normalized score, between 0 and 1, indicating how close the image was to the actual monument/landmark. Second, we worked on creating an algorithm that assigned 10 labels, such as "Wonders of the world" and "temple", to every image in a folder. We then compared the labels of the most accurate image in the folder with those of the remaining images - if fewer than 6 labels were common, the image was removed.
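The label-overlap rule is simple enough to sketch directly. Below, `passes_label_filter` implements the "at least 6 of 10 labels in common" check described above, while `get_labels` is a hedged sketch of fetching labels with the Cloud Vision client library; the function names are ours, and running `get_labels` requires the `google-cloud-vision` package and valid credentials.

```python
"""Sketch of the label-based filtering step (hypothetical function names)."""


def passes_label_filter(reference_labels, candidate_labels, min_common=6):
    """Keep a candidate image only if it shares at least `min_common`
    labels with the reference (most representative) image."""
    return len(set(reference_labels) & set(candidate_labels)) >= min_common


def get_labels(image_bytes, max_labels=10):
    """Return up to `max_labels` label descriptions from Cloud Vision.

    Requires the google-cloud-vision package and application credentials.
    """
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    response = client.label_detection(
        image=vision.Image(content=image_bytes), max_results=max_labels
    )
    return [label.description for label in response.label_annotations]
```

With these pieces, filtering a folder reduces to labeling the reference image once, then dropping every image whose label set overlaps it in fewer than 6 entries.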

Lastly, we encountered an issue in ensuring the key points of every image were at similar locations for a particular landmark or monument, regardless of scale. To ensure this, we SIFTed through the images (pun intended), calculating the angles between the key points on the images and checking that they fell within a certain range. To make our final hyperlapse slowly zoom into the image, we ran the SIFT algorithm to calculate the distance between two key points, observing that as it decreased, the distance from the camera increased, i.e. the image was more zoomed out.
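The zoom heuristic above can be illustrated with a small NumPy sketch. Given matched key-point coordinates in two images (obtained, for instance, from `cv2.SIFT_create()` plus a brute-force matcher), comparing mean pairwise distances between the points tells you which image is more zoomed out. The function names are ours, not the original code's.

```python
"""Sketch of the SIFT-distance zoom heuristic described above."""
import numpy as np


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs of (x, y) key points."""
    pts = np.asarray(points, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    n = len(pts)
    # Sum over ordered pairs, excluding the zero diagonal.
    return dists.sum() / (n * (n - 1))


def zoom_ratio(matched_a, matched_b):
    """Ratio of key-point spread in image B vs image A.

    A value below 1 means the key points moved closer together, i.e.
    image B is more zoomed out than image A; above 1, more zoomed in.
    """
    return mean_pairwise_distance(matched_b) / mean_pairwise_distance(matched_a)
```

Sorting frames by this ratio against a reference frame gives the slow zoom-in ordering the hyperlapse needs.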

Accomplishments that we're proud of

As our main goal in participating in VandyHacks was to become better programmers, our biggest achievement was learning complex technical skills. Furthermore, considering the complexity of our project, we accomplished a large chunk of the work under a time crunch.

With respect to the project specifics, the image-scraping module we created worked perfectly. The AI-based image-filtering algorithm was also robust: it not only handled errors well but also produced highly accurate results. Lastly, we evaluated the scope of our project and laid out a plan for taking it forward through key point matching and SIFT.

What we learned

The most important takeaway from VandyHacks was the technical skills we gained over the last 36 hours. We are now proficient in working with tools such as Google Cloud, OpenCV, and ChromeDriver, as well as concepts like web scraping, AI-based image filtering, and SIFT. We also learned how to use GitHub efficiently by delving deeper into concepts such as version control and .gitignore files. This is an essential skill in any collaborative technical project and will serve us widely in the future. Lastly, as we were dealing with fairly complex code, the errors we faced along the way taught us the importance of perseverance and improved our problem-solving skills in team-based projects.

What's next for AI-perlapse

AI-perlapse has enormous potential. Next, we will generate the hyperlapse from these images using a SIFT-based filtering algorithm. Further, we hope to let the user determine the orientation and angle, and possibly even draw out the trajectory, of the hyperlapse. Additionally, we aim to build on our current project to create software that stitches images, or parts of images, together into 2-dimensional panoramas or even 3-dimensional views of monuments and landmarks.
