Inspiration
We were inspired by the reality Caterpillar faces in the field: expert mechanics are retiring, junior operators are being asked to do more, and every hour of unplanned downtime is expensive. We wanted to build something that captures veteran intuition and puts it in every operator’s pocket. Beyond that, we wanted to turn inspections from a paperwork task into a driver of safety, reliability, and aftermarket growth.
What it does
CAugmenT is a multimodal iOS/Android inspection app that combines computer vision, speech, and engine audio analysis in a single workflow that works both online and offline. It helps operators:
- run guided inspections aligned to Cat Inspect-style checklists,
- get real-time AI assessments of captured frames and spoken narration,
- detect engine anomalies from WAV audio clips,
- identify damaged parts and get ranked CAT part matches with dealer links,
- generate structured, Cat Inspect-compatible reporting outputs.
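One way to keep reporting outputs structured is to validate every model response against a fixed schema before it reaches the report. The sketch below is illustrative only: the field names (`item_id`, `status`, `note`), the PASS/MONITOR/FAIL scale, and the helper names are hypothetical assumptions, not the actual Cat Inspect format or CAugmenT's code.

```python
import json
from dataclasses import dataclass

# Hypothetical severity scale (assumption); real Cat Inspect ratings may differ.
SEVERITY = {"PASS": 0, "MONITOR": 1, "FAIL": 2}
REQUIRED_FIELDS = {"item_id", "status", "note"}


@dataclass(frozen=True)
class ItemResult:
    item_id: str
    status: str
    note: str


def parse_item_result(raw: str) -> ItemResult:
    """Parse one model response (JSON text) and reject anything off-schema.

    Malformed JSON raises json.JSONDecodeError, a subclass of ValueError.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    status = str(data["status"]).upper()
    if status not in SEVERITY:
        raise ValueError(f"unknown status: {data['status']!r}")
    return ItemResult(str(data["item_id"]), status, str(data["note"]))


def overall_status(results: list[ItemResult]) -> str:
    """Roll up a checklist: the report is only as healthy as its worst item."""
    if not results:
        raise ValueError("no checklist items to aggregate")
    return max((r.status for r in results), key=SEVERITY.__getitem__)


if __name__ == "__main__":
    items = [
        parse_item_result('{"item_id": "tires", "status": "pass", "note": "OK"}'),
        parse_item_result('{"item_id": "hydraulic_hose", "status": "FAIL", "note": "Visible leak"}'),
    ]
    print(overall_status(items))  # FAIL
```

Rejecting malformed responses instead of silently repairing them keeps the final report deterministic; a rejected item can be re-queued or flagged for manual review.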
How we built it
We built a React Native/Expo mobile app connected to a FastAPI backend (gptagent.py) with dedicated endpoints for checklist generation, image adjudication, parts matching, engine analysis, and final report compilation. Core stack highlights:
- Frontend: React Native + Expo, Zustand state management, camera/audio capture, TTS + haptics for critical guidance.
- Vision + narration: Faster-Whisper for transcription, a lightweight CNN that supplies an expert hint, and GPT-4o Vision for checklist-aware adjudication.
- Parts workflow: image-derived physical signatures + vector retrieval + LLM reranking for top part candidates.
- Engine workflow: CLAP embeddings + novelty scoring against healthy baseline tensors.
- Reporting: automated final-checklist aggregation and PDF generation compatible with Cat Inspect-style workflows.
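The engine workflow's novelty scoring can be illustrated with a k-nearest-neighbour cosine distance against healthy baseline embeddings. This is a minimal sketch under stated assumptions: CLAP embeddings are assumed to be already computed and passed in as plain float lists, and the k-NN scoring, the leave-one-out threshold, and the `margin` factor are illustrative stand-ins, not necessarily the scoring the app uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def novelty_score(clip: Vector, baseline: list[Vector], k: int = 3) -> float:
    """Mean cosine distance from a clip embedding to its k nearest healthy embeddings."""
    if not baseline:
        raise ValueError("baseline needs at least one healthy embedding")
    distances = sorted(1.0 - cosine_similarity(clip, ref) for ref in baseline)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def calibrate_threshold(baseline: list[Vector], k: int = 3, margin: float = 1.5) -> float:
    """Leave-one-out: score each healthy clip against the rest, then add headroom."""
    if len(baseline) < 2:
        raise ValueError("need at least two healthy embeddings to calibrate")
    scores = [
        novelty_score(ref, baseline[:i] + baseline[i + 1:], k)
        for i, ref in enumerate(baseline)
    ]
    return max(scores) * margin


if __name__ == "__main__":
    # Toy 3-D vectors standing in for real (much larger) CLAP embeddings.
    baseline = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0]]
    threshold = calibrate_threshold(baseline)
    print(novelty_score([0.97, 0.03, 0.0], baseline) < threshold)  # True: looks healthy
    print(novelty_score([0.0, 0.0, 1.0], baseline) > threshold)    # True: anomalous
```

Scoring against the nearest healthy clips, rather than one averaged embedding, tolerates a baseline that covers several normal operating states (for example idle and under load).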
Challenges we ran into
- Synchronizing video frames and narration cleanly enough for reliable per-frame analysis.
- Designing a robust multimodal pipeline that stays responsive in field-like conditions.
- Balancing model quality with latency for real-time operator feedback.
- Building retrieval/ranking logic that returns practical, high-confidence part suggestions.
- Keeping the UX simple for gloved, outdoor, high-noise operation while still showing rich AI outputs.
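For the first challenge, one simple way to pair frames with narration is to attach every transcript segment that overlaps a short time window around each frame. The sketch assumes transcript segments carry `start`/`end` times in seconds plus `text` (Faster-Whisper's segments expose these attributes); the local `Segment` class, the helper names, and the 1-second `pad` are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Mirrors the start/end/text fields of a transcription segment (seconds)."""
    start: float
    end: float
    text: str


def narration_for_frame(t: float, segments: list[Segment], pad: float = 1.0) -> str:
    """Join the text of every segment that overlaps the window [t - pad, t + pad]."""
    lo, hi = t - pad, t + pad
    return " ".join(s.text.strip() for s in segments if s.start <= hi and s.end >= lo)


def align_frames(
    frame_times: list[float], segments: list[Segment], pad: float = 1.0
) -> dict[float, str]:
    """Map each captured frame timestamp to the narration spoken around it."""
    return {t: narration_for_frame(t, segments, pad) for t in frame_times}


if __name__ == "__main__":
    segments = [
        Segment(0.0, 2.5, " Checking the left track."),
        Segment(2.5, 6.0, " Seeing a cracked hose near the boom."),
        Segment(9.0, 11.0, " Tires look fine."),
    ]
    for t, text in align_frames([1.0, 3.0, 7.5], segments).items():
        print(t, repr(text))
```

A frame near a segment boundary (3.0 s here) picks up both neighbouring segments, and a frame with no nearby speech (7.5 s) gets an empty string, so downstream analysis could fall back to the image alone.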
Accomplishments that we're proud of
- Delivering an end-to-end, working mobile experience with three integrated AI workflows.
- Combining vision, speech, and acoustics in a single inspection session rather than siloed tools.
- Producing Cat Inspect-compatible outputs and actionable maintenance recommendations.
- Linking anomaly detection to business impact through direct parts-ordering pathways.
- Establishing a foundation for a compounding “data flywheel” via validated inspection feedback.
What we learned
- Multimodal systems are strongest when each modality has a clear, scoped role.
- Field UX matters as much as model accuracy in operational adoption.
- Structured outputs and deterministic schemas are critical for trustworthy AI in maintenance workflows.
- Retrieval quality depends heavily on strong part representations, not just model choice.
- Building for reliability and offline resilience early changes architecture decisions for the better.
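The point about part representations can be made concrete with the retrieval stage alone: rank catalog parts by cosine similarity to a query signature vector, and return nothing when no candidate clears a confidence floor. This is a hypothetical sketch: the part numbers, vectors, `k`, and `min_similarity` are invented for illustration, and the LLM reranking step that would follow is not shown.

```python
import heapq
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(
    query: Sequence[float],
    catalog: dict[str, Sequence[float]],
    k: int = 5,
    min_similarity: float = 0.8,
) -> list[tuple[str, float]]:
    """Top-k parts by cosine similarity, keeping only confident matches."""
    scored = ((cosine_similarity(query, vec), part) for part, vec in catalog.items())
    top = heapq.nlargest(k, scored)  # highest similarity first
    return [(part, sim) for sim, part in top if sim >= min_similarity]


if __name__ == "__main__":
    catalog = {  # hypothetical part numbers and signature vectors
        "HYP-HOSE-01": [1.0, 0.0, 0.2],
        "HYP-HOSE-02": [0.9, 0.1, 0.3],
        "HYP-FILTER-07": [0.0, 1.0, 0.0],
    }
    print(retrieve_candidates([1.0, 0.0, 0.25], catalog))  # the two hose parts
    print(retrieve_candidates([0.0, 0.0, 1.0], catalog))   # [] -> no confident match
```

An empty result lets the app report "no confident match" instead of surfacing a weak guess. How meaningful the similarities are depends on the signature vectors far more than on the threshold.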
What's next for CAugmenT
- Improve on-device/offline inference coverage to reduce network dependence even further.
- Expand machine-family and component coverage with more real-world inspection data.
- Strengthen the custom construction-domain model and quarterly retraining loop.
- Add deeper dealer/aftermarket integrations for faster procurement handoff.
- Validate in pilot environments with measurable KPIs (inspection time, safety interventions, conversion to parts orders, and false positive/negative rates).
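Of the pilot KPIs, false positive and false negative rates need a precise definition before they are tracked. A minimal sketch, assuming each pilot record pairs the app's defect call with a mechanic-confirmed outcome; the record format and function name are hypothetical, and the counts in the example are toy numbers, not pilot results.

```python
from collections import Counter
from typing import Iterable, Optional


def error_rates(records: Iterable[tuple[bool, bool]]) -> dict[str, Optional[float]]:
    """records: (app_flagged_defect, mechanic_confirmed_defect) pairs.

    FPR = FP / (FP + TN): share of healthy components the app wrongly flagged.
    FNR = FN / (FN + TP): share of real defects the app missed.
    A rate is None when its denominator is zero.
    """
    c = Counter(records)
    tp, fp = c[(True, True)], c[(True, False)]
    fn, tn = c[(False, True)], c[(False, False)]
    return {
        "false_positive_rate": fp / (fp + tn) if fp + tn else None,
        "false_negative_rate": fn / (fn + tp) if fn + tp else None,
    }


if __name__ == "__main__":
    toy = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 18 + [(False, True)] * 2
    print(error_rates(toy))  # FPR = 2/20 = 0.1, FNR = 2/10 = 0.2
```

Tracking both rates matters here: a missed defect carries safety risk, while false alarms erode operator trust and inflate parts suggestions.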