Inspiration

In high-precision manufacturing and research labs, the cost of a single human error is astronomical. Whether it is a misplaced $0.10 capacitor on a $10,000 PCB or a slight deviation in a chemical titration, these "micro-errors" often go unnoticed until it's too late. We were inspired by the "Action Era" of AI—moving beyond chatbots that just "talk" to creating Autonomous Orchestrators that "watch and protect." We wanted to build a guardian that doesn't just record the past, but understands the spatial-temporal present to secure the future.

What it does

Gemini Sentinel is a real-time, multimodal AI supervisor. Using a mounted camera feed, it:

  • Monitors Workflows: Tracks human actions during complex assembly or lab experiments.
  • Verifies Against Standards: Cross-references live actions against a 1M-token knowledge base of technical schematics and safety protocols.
  • Predicts Errors: Uses spatial reasoning to detect when a component is being placed with incorrect polarity or a tool is being used unsafely.
  • Intervenes via Voice: Delivers sub-second, natural-language audio feedback via the Gemini Live API (e.g., "Stop! The capacitor at C12 is reversed").

How we built it

The core of Gemini Sentinel is the Gemini 3 Pro engine, optimized for agentic workflows:

  • Gemini Live API: We utilized the low-latency multimodal stream to feed high-resolution video frames directly to the model.
  • Spatial-Temporal Reasoning: We implemented a "Memory Buffer" using Thought Signatures, which lets the model remember the state of the board at T0 and compare it with T-now to understand the sequence of assembly.
  • 1M-Token Context: We ingested thousands of pages of PDF schematics into the context window, allowing the model to perform zero-shot verification without a custom-trained vision model.
  • Thinking Levels: We set the thinking level to high for initial setup and low during live monitoring, balancing deep reasoning against real-time latency.
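The T0-versus-T-now comparison at the heart of the Memory Buffer can be sketched in plain Python. This is a minimal illustration, not our actual pipeline: the `BoardState` structure, the `diff_states` helper, and the polarity labels are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BoardState:
    """Snapshot of observed component placements at one point in time."""
    # Maps a reference designator (e.g. "C12") to its observed polarity.
    placements: dict[str, str] = field(default_factory=dict)


def diff_states(t0: BoardState, now: BoardState,
                schematic: dict[str, str]) -> list[str]:
    """Compare the board at T0 with the board now; flag any newly placed
    component whose observed polarity deviates from the schematic."""
    alerts = []
    for ref, polarity in now.placements.items():
        if ref in t0.placements:
            continue  # already present at T0; not a new placement
        expected = schematic.get(ref)
        if expected is not None and polarity != expected:
            alerts.append(f"Stop! The component at {ref} is reversed.")
    return alerts
```

For instance, if the schematic expects C12 oriented "positive-left" but the current frame shows it "positive-right", the diff produces an alert for C12, which would then be voiced through the live audio channel.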

Challenges we ran into

  • Occlusion: In a real lab, hands often block the camera. We engineered prompts that leverage Gemini's reasoning to "infer" what is happening under the hand based on the tools being used.
  • Latency vs. Accuracy: We optimized our media resolution settings to find the "sweet spot" where text on components was legible but the frame rate remained high.
  • Thought Continuity: Maintaining the model's "train of thought" across long sessions required strict management of Thought Signatures to prevent reasoning drift.

Accomplishments that we're proud of

  • Sub-Second Intervention: Achieving a feedback loop fast enough to stop a technician's hand in mid-air.
  • Dynamic Schematic Mapping: The AI can "read" a complex blueprint it has never seen before and start supervising immediately.
  • Zero-Training Deployment: It works out of the box using only the manufacturer's documentation.
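The "sweet spot" search between resolution and latency can be illustrated with a small heuristic. This is a hedged sketch under stated assumptions: the candidate resolutions, the linear latency-per-megapixel model, and the `pick_resolution` helper are illustrative, not measurements from our deployment.

```python
# Candidate frame sizes, largest first (illustrative values).
RESOLUTIONS = [(1920, 1080), (1280, 720), (960, 540), (640, 360)]


def pick_resolution(budget_ms: float,
                    ms_per_megapixel: float = 120.0) -> tuple[int, int]:
    """Return the largest resolution whose estimated per-frame latency
    fits the budget, assuming latency scales with pixel count."""
    for w, h in RESOLUTIONS:
        est_ms = (w * h / 1e6) * ms_per_megapixel
        if est_ms <= budget_ms:
            return (w, h)
    return RESOLUTIONS[-1]  # fall back to the smallest frame
```

Under a generous budget the full 1080p frame keeps component text legible; as the budget tightens, the picker steps down toward frames that preserve the camera's refresh rate.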

What we learned

We learned that Gemini 3 works best as an orchestrator, and that the breakthrough came from getting the model to understand the physics of the task. We also learned that keeping Temperature = 1.0 is crucial: lowering it hindered the model's ability to stay creative when solving unexpected spatial puzzles.

What's next for Gemini Sentinel

  • Digital Twins: Using agents to automatically update digital models of the hardware as it is being built.
  • Edge-to-Cloud Hybrid: Running lightweight local models for tracking while using Gemini 3 for high-level reasoning.
  • Collaborative Multi-Agent: Deploying a fleet of Sentinels that communicate to optimize factory flow.

Built With

  • ai
  • ai-agent
  • api-key
  • app.js
  • assets-folder
  • automation
  • capcut-editing
  • computer-vision
  • css
  • devpost-submission
  • engineering-safety
  • error-detection
  • flask
  • gemini-live-api
  • gemini-pro
  • github-repo
  • google-cloud
  • html
  • intelligent
  • iso-standards
  • javascript
  • machine-learning
  • main.py
  • mit-license
  • multimodal-reasoning
  • opencv
  • python
  • react
  • readme.md
  • real-time-monitoring
  • requirements.txt
  • robotics
  • spatial-analysis
  • sub-second-latency
  • system-architecture
  • vertex-ai
  • video-demo
  • vision-ai