Machine Perception

Research in machine perception tackles the hard problems of understanding images, sounds, music, and video. In recent years, our computers have become much better at such tasks, enabling a variety of new applications, such as content-based search in Google Photos and Image Search, natural handwriting interfaces for Android, optical character recognition for Google Drive documents, and recommendation systems that understand music and YouTube videos. Our approach is driven by algorithms that benefit from processing very large, partially labeled datasets on parallel computing clusters. A good example is our recent work on object recognition using a novel deep convolutional neural network architecture known as Inception, which achieves state-of-the-art results on academic benchmarks and lets users easily search through their large collections of Google Photos. The ability to mine meaningful information from multimedia is applied broadly throughout Google.

Recent Publications

VISTA: A Test-Time Self-Improving Video Generation Agent

Sercan Arik

Hootan Nakhost

Tomas Pfister

Chen-Yu Lee

Xingchen Wan

Xuan Long Do

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (to appear) (2026)

On-the-Fly OVD Adaptation with FLAME: Few-shot Localization via Active Marginal-Samples Exploration

Yehonathan Refael

Amit Aides

Aviad Barzilai

George Leifman

Vered Silverman

Bolous Jaber

Tomer Shekel

Genady Beryozkin

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Workshops (2026), pp. 886-894

On the Design of the Binaural Rendering Library for Eclipsa Audio Immersive Audio Container

Tomasz Rudzki

Gavin Kearney

Jan Skoglund

AES 158th Convention of the Audio Engineering Society (2025)

Generating Dialogues from Egocentric Instructional Videos for Task Assistance: Dataset, Method and Benchmark

Lavisha Aggarwal

Vikas Bahirwani

Lin Li

Andrea Colaco

2025

Perceptual Evaluation of a Mix Presentation for Immersive Audio with IAMF

Carlos Tejeda-Ocampo

Toni Hirvonen

Ema Souza-Blanes

Mahmoud Namazi

Jan Skoglund

AES 158th Convention of the Audio Engineering Society (2025)

Global-to-Local or Local-to-Global? Enhancing Image Retrieval with Efficient Local Search and Effective Global Re-ranking

Dror Aiger

Bingyi Cao

Andre Araujo

Kaifeng Chen

2025
