Inspiration

We wanted to make learning feel more natural and connected to the real world. Instead of staring at flashcards or videos, what if people could walk through a real city and learn by talking with an AI that links what they see to what they need to remember? This idea led us to combine real-time street-level visuals with an intelligent system that adapts to each learner’s memory needs.

What it does

LookAround AI helps users learn while exploring real-world street views. As users move through a location, an AI agent talks to them about what they see—like signs, buildings, or shops—and connects those visuals to knowledge the user is learning or needs to review. For example, the AI might point out a landmark and bring up a related vocabulary word or science concept. Users can speak naturally with the system and hear it respond in real time.

How we built it

We built the system using the TEN Framework and Google Maps API. Google Maps API is the foundation of the visual input layer, letting us fetch high-resolution street-level images and metadata for real-world locations around the globe. This gave our agents a realistic environment to anchor learning in context.
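The street-level fetch can be sketched as a request to the public Street View Static API. The endpoint and parameters below are the documented ones; the coordinates and placeholder key are only for illustration.

```python
# Sketch of fetching one Street View frame for a location; assumes a valid
# Google Maps Platform API key with the Street View Static API enabled.
from urllib.parse import urlencode

STREETVIEW_ENDPOINT = "https://maps.googleapis.com/maps/api/streetview"

def streetview_url(lat: float, lng: float, api_key: str,
                   size: str = "640x640", heading: int = 0, fov: int = 90) -> str:
    """Build a Street View Static API request URL for one camera frame."""
    params = {
        "size": size,               # image dimensions in pixels
        "location": f"{lat},{lng}",
        "heading": heading,         # compass direction of the camera, 0-360
        "fov": fov,                 # horizontal field of view in degrees
        "key": api_key,
    }
    return f"{STREETVIEW_ENDPOINT}?{urlencode(params)}"

# Example: a frame facing east near the Eiffel Tower (key is a placeholder).
url = streetview_url(48.8584, 2.2945, "YOUR_API_KEY", heading=90)
```

Fetching the resulting URL (e.g. with any HTTP client) returns a JPEG of the scene, which the vision agent can then analyze.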

The TEN Framework coordinates multiple AI agents, including:

  • A vision agent that processes Street View images and identifies meaningful visual entities such as buildings, road signs, vehicles, and shops.
  • A language agent that generates conversation content related to both what is seen and what the user needs to study.
  • A speech agent that manages voice recognition and natural speech output so users can talk freely with the system.

TEN ensures these agents stay in sync, handing off data between them and reacting in real time as the user navigates through new scenes. Together with Google Maps, it enables the system to provide immersive, geographically grounded interactions.
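The vision → language → speech handoff above can be sketched as a simple pipeline. The agent functions and message shape here are illustrative stand-ins, not the TEN Framework's actual API.

```python
# Minimal sketch of the three-agent handoff; each agent enriches a shared
# Scene record before passing it on, which is the pattern TEN coordinates.
from dataclasses import dataclass, field

@dataclass
class Scene:
    location: str
    entities: list = field(default_factory=list)   # filled by the vision agent
    utterance: str = ""                            # filled by the language agent

def vision_agent(scene: Scene) -> Scene:
    # Placeholder for Street View image analysis: tag visible entities.
    scene.entities = ["bakery sign", "stop sign"]
    return scene

def language_agent(scene: Scene, study_item: str) -> Scene:
    # Tie a detected entity to the learner's current study item.
    scene.utterance = (f"See the {scene.entities[0]}? "
                       f"It's a good moment to review '{study_item}'.")
    return scene

def speech_agent(scene: Scene) -> str:
    # Stand-in for text-to-speech: return the text that would be spoken.
    return scene.utterance

def run_pipeline(location: str, study_item: str) -> str:
    scene = vision_agent(Scene(location))
    scene = language_agent(scene, study_item)
    return speech_agent(scene)

spoken = run_pipeline("Rue de Rivoli", "boulangerie")
```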

Challenges we ran into

Working with real-time street-level image input required a robust pipeline to fetch, process, and respond to changing scenes quickly. The Google Maps API imposes rate limits and response constraints, so we had to optimize how and when we requested data. Coordinating the vision, language, and speech agents under TEN's control while keeping the conversation natural added further complexity.
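One way to stay within rate limits, sketched below, is to cache repeated scene requests and enforce a minimum interval between live API calls. The interval and cache size are illustrative tuning choices, not values from our deployment.

```python
# Client-side throttling sketch: identical requests are served from an
# LRU cache, and live requests are spaced at least MIN_INTERVAL apart.
import time
from functools import lru_cache

MIN_INTERVAL = 0.5   # seconds between live requests (tune to your quota)
_last_call = 0.0

@lru_cache(maxsize=256)
def fetch_scene(lat: float, lng: float) -> str:
    """Fetch one scene, throttled; cached results skip the network entirely."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    # Real code would call the Street View Static API here.
    return f"scene@{lat},{lng}"

a = fetch_scene(48.8584, 2.2945)   # live (throttled) request
b = fetch_scene(48.8584, 2.2945)   # served from cache, no API call
```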

Accomplishments that we're proud of

We created an immersive learning interface where users can explore real environments while having a voice-based, context-aware conversation with an AI. The combination of Google Maps' visual richness and TEN’s intelligent agent coordination brought a new dimension to educational software. The AI does not speak in a vacuum — it reacts to the user’s environment and adapts the learning to fit.

What we learned

We learned how to integrate multiple AI agents to process visuals, language, and speech in a live context. We also gained experience building a smart interaction layer on top of a map API. The power of combining geographic data with personalized learning logic became clear — people remember better when ideas are tied to real-world scenes.

What's next for LookAround AI

We plan to expand to more types of street-level environments, including indoor spaces and campus tours. We also aim to open the system to user-defined learning content. The front-end experience will continue to be powered by Google Maps, while the backend will be strengthened by our Recap Memory Planning System, which decides what to review using a memory decay model and a spaced-repetition algorithm. Altogether, LookAround AI will keep evolving toward a personalized, immersive, and memory-driven learning platform.

Built With

  • TEN Framework
  • Google Maps API
