Inspiration
Construction professionals and civil engineers often rely on visual data to assess infrastructure, document materials, and ensure safety. We were inspired to create an AI-powered assistant that could analyze construction images and provide instant, structured feedback about visible components, materials, and equipment, reducing manual documentation and enabling smarter, faster insights.
What it does
GemStructify lets users upload a construction-related image (e.g., bridges, buildings, work sites) and uses Gemini AI to analyze the photo. It returns a structured breakdown in JSON format that categorizes:
Structural Components (e.g., beams, trusses, slabs)
Building Materials (e.g., bricks, concrete, steel)
Construction Equipment (e.g., cranes, excavators)
The app also provides an animated frontend experience, complete with JWT-based login and a clean UX that highlights the AI-generated findings in real time.
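As a rough illustration of the structured breakdown described above, the sketch below validates a response that holds one list per category. The key names (`structural_components`, `building_materials`, `construction_equipment`) and the `validate_breakdown` helper are assumptions made for this example; the actual schema GemStructify uses is not shown in this write-up.

```python
import json

# Hypothetical key names; the project's real JSON schema is not documented here.
CATEGORIES = (
    "structural_components",
    "building_materials",
    "construction_equipment",
)


def validate_breakdown(raw: str) -> dict[str, list[str]]:
    """Parse a JSON string and check that each category is a list of strings."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result: dict[str, list[str]] = {}
    for key in CATEGORIES:
        # A category the model did not mention is treated as an empty list.
        items = data.get(key, [])
        if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
            raise ValueError(f"{key!r} must be a list of strings")
        result[key] = items
    return result


sample = (
    '{"structural_components": ["beam", "truss"], '
    '"building_materials": ["steel"], '
    '"construction_equipment": ["crane"]}'
)
breakdown = validate_breakdown(sample)
```

Validating the shape before rendering means the UI can rely on every category being present, even when the model leaves one out.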
How we built it
Frontend: Built with Next.js, Tailwind CSS, and Framer Motion for the animated UI
Backend: Powered by FastAPI, which handles auth, uploads, and Gemini integration
Authentication: Secure JWT-based login system, with users stored in MongoDB
AI Analysis: Uses Gemini 1.5 Pro through LangChain to interpret image data
Deployment: The local backend is exposed through an ngrok tunnel for real-time API testing during development
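To make the JWT-based login step more concrete, here is a minimal standard-library sketch of issuing and verifying a signed token. It assumes HS256 signing and `sub`/`exp` claims, neither of which is stated in this write-up, and the `create_token` and `verify_token` names are hypothetical. A real backend would more likely rely on a maintained JWT library than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(username: str, secret: bytes, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed token (algorithm and claims are assumptions)."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Check the signature and expiry, then return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

A protected route would call something like `verify_token` on the token sent with each request and reject the request if it raises.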
Challenges we ran into
Sending image data to Gemini in a format it could parse: we resolved this by converting images to base64 and structuring messages to align with Gemini's expectations
Managing authentication and secure access to protected pages using JWT
Formatting the raw AI response into a clean, readable, animated UI
Resolving CORS issues in frontend/backend communication through ngrok
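The base64 step mentioned in the first challenge can be sketched as follows. The exact message structure the project sends is not shown here, so the content-block layout below (a text block plus an `image_url` block holding a data URL) is an assumption modeled on the multimodal format LangChain chat models commonly accept. `build_image_message` is a hypothetical helper name.

```python
import base64


def build_image_message(
    image_bytes: bytes, instruction: str, mime_type: str = "image/jpeg"
) -> list[dict]:
    """Pair an instruction with a base64 data URL for the uploaded image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": instruction},
        # Assumed block shape; check it against the LangChain version in use.
        {"type": "image_url", "image_url": f"data:{mime_type};base64,{encoded}"},
    ]


# Placeholder bytes stand in for a real uploaded photo.
content = build_image_message(
    b"\xff\xd8\xff",
    "List the structural components, materials, and equipment as JSON.",
)
# With LangChain, this list would typically become the content of a
# HumanMessage; that call is left out to keep the sketch dependency-free.
```

Embedding the image as a data URL keeps the request self-contained, so the backend does not need to host the upload anywhere the model can fetch it.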
Accomplishments that we're proud of
Successfully integrated image upload and AI interpretation in a single app
Built a complete login-authenticated flow with live user session tracking
Delivered a fast, interactive, and animated user interface
Created a fully structured, categorized, and readable output from raw AI text
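Turning raw AI text into structured output usually means coping with responses that wrap the JSON in a code fence or surround it with extra sentences. The sketch below shows one possible way to pull the JSON object out. It assumes the model returns a single top-level object, and `extract_json` is a hypothetical helper, not the project's actual parsing code.

```python
import json
import re

# Built with multiplication so this example contains no literal code fence.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(raw_text: str) -> dict:
    """Return the first JSON object found in model output."""
    # Prefer the contents of a fenced block when the model emits one.
    fenced = FENCED_BLOCK.search(raw_text)
    candidate = fenced.group(1) if fenced else raw_text
    # Trim any chatter before the first brace and after the last one.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start : end + 1])
```

This is a heuristic and can still fail on malformed output, so the caller should be ready to catch `ValueError` (which `json.JSONDecodeError` subclasses) and show a fallback message.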
What we learned
How to send and format multimodal inputs (image + instruction) to Gemini AI
How to refine prompt design to generate structured, predictable JSON responses
How the JWT authentication flow works, including decoding tokens directly on the frontend
How to work with Framer Motion to create delightful animations that enhance UX
What's next for GemStructify
Add real-time object detection for live construction monitoring
Enable automatic safety checks or risk detection from site images
Allow users to save and export analysis reports for documentation
Expand to support construction plan drawings, drone footage, and 3D scans
Deploy to production with a proper domain and host the backend on Render or Railway
Built With
Gemini 1.5 Pro (via LangChain): Multimodal AI for analyzing images
FastAPI: Backend framework for handling auth, image uploads, and Gemini integration
Next.js: React framework for frontend with server components
Tailwind CSS: Utility-first CSS for styling
Framer Motion: For smooth, animated transitions in the UI
jwt-decode: Client-side JWT decoding for session control
MongoDB Atlas: Database for storing user credentials
OAuth2 & JWT: Secure authentication and access control
ngrok: Tunnels public traffic to the local backend during development
Python: Backend language for FastAPI, JWT, and Gemini setup
TypeScript: Strongly typed language for frontend logic
FormData API: For sending uploaded images to the backend
base64 encoding: To embed images into Gemini's input schema
LangChain: To orchestrate structured prompts with Gemini
Vercel (optional): For frontend deployment (Next.js hosting)
