Inspiration
Super-resolution for general images, that is, increasing the resolution of images through ML, is difficult and expensive. Images range from cartoons to animal photography, landscapes to Zoom calls, and one model to rule them all is years away. Further, as images get larger, inference time grows at least quadratically with image side length, which leaves current methods too slow for large images.
This is where Goggles comes in. Because of the pandemic, people have been relying on video communication more and more for 'face-to-face' interaction. Goggles aims to improve the quality of these valuable channels, in both resolution and enjoyment.
What it does
Goggles builds on two well-studied computer vision problems: face detection and facial super-resolution.
Instead of pushing a full image into a super-resolution model, we first detect where the faces are. We then crop the individual faces and apply an efficient facial super-resolution model to the faces alone. Finally, we paste the enhanced faces back into the low-resolution background.
This gives the benefit of a higher-resolution image where it matters most, without the difficulty of building a general, full-image super-resolution model.
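The detect, crop, enhance, and paste steps can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: `enhance_faces`, `upscale_nearest`, and the `detect_faces` and `upscale_face` callables are hypothetical names, and upscaling the background with nearest-neighbour interpolation so the enhanced faces fit is an assumption about how the pasting could be done.

```python
import numpy as np


def upscale_nearest(img, scale):
    """Cheap nearest-neighbour upscaling: repeat each pixel `scale` times per axis."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)


def enhance_faces(frame, detect_faces, upscale_face, scale=4):
    """Super-resolve only the face regions of `frame` (H x W x C, uint8).

    detect_faces: hypothetical callable, frame -> iterable of (x, y, w, h) boxes.
    upscale_face: hypothetical callable, crop -> crop enlarged by `scale`.
    """
    # Assumption: the background is enlarged cheaply so the faces fit into it.
    out = upscale_nearest(frame, scale)
    for (x, y, w, h) in detect_faces(frame):
        crop = frame[y:y + h, x:x + w]
        face_hr = upscale_face(crop)
        # Paste the enhanced face at the scaled coordinates of its box.
        out[y * scale:(y + h) * scale, x * scale:(x + w) * scale] = face_hr
    return out
```

Passing the detector and the super-resolution model in as callables keeps the sketch independent of any particular library. Only the pixels inside the face boxes ever reach the expensive model.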
How we built it
Most of the work went into building the infrastructure around two models: the Haar cascade face detector built into OpenCV, and the Face-Super-Resolution model, which is based on the ESRGAN super-resolution architecture.
From there, we connected the pipeline to the webcam and ported it to Google Colab to take advantage of its GPUs.
Challenges we ran into
The main difficulty is making it fast enough for real-time use. I was able to cut a lot of the fluff from the original model's pre- and post-processing, but inference time is still a big hurdle.
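One way to quantify how far the pipeline is from real time is to measure average per-frame latency. The helper below is a small sketch using only the standard library. `measure_fps` and its `process_frame` argument are hypothetical names standing in for whatever function runs the full pipeline on one frame.

```python
import time


def measure_fps(process_frame, frames, warmup=2):
    """Return (mean seconds per frame, frames per second) for `process_frame`."""
    frames = list(frames)
    if not frames:
        raise ValueError("need at least one frame to time")
    # Warm-up runs are excluded so one-time setup costs do not skew the result.
    for frame in frames[:warmup]:
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    per_frame = elapsed / len(frames)
    fps = 1.0 / per_frame if per_frame > 0 else float("inf")
    return per_frame, fps
```

As a reference point, a 30 fps video feed leaves roughly 33 ms per frame for detection, super-resolution, and pasting combined. If the model runs on a GPU through a framework that launches work asynchronously, `process_frame` would need to wait for the GPU to finish before returning for the numbers to be meaningful.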
Accomplishments that we're proud of
It works!
What we learned
You can do a lot of cool ML work without ever needing to train a model!
What's next for Goggles
The next step would be to try to distill the ESRGAN-based model into something smaller with similar accuracy. The ultimate goal is for Goggles to be fast enough to work in real time with a webcam.