🏗️ Inspiration

Construction professionals and civil engineers often rely on visual data to assess infrastructure, document materials, and ensure safety. We were inspired to create an AI-powered assistant that could analyze construction images and provide instant, structured feedback about visible components, materials, and equipment, reducing manual documentation and enabling smarter, faster insights.

🧠 What it does

GemStructify lets users upload a construction-related image (e.g., bridges, buildings, work sites) and uses Gemini AI to analyze the photo. It returns a structured breakdown in JSON format that categorizes:

🏗️ Structural Components (e.g., beams, trusses, slabs)

🧱 Building Materials (e.g., bricks, concrete, steel)

🧰 Construction Equipment (e.g., cranes, excavators)

The app also provides an animated frontend experience, complete with JWT-based login and a clean UX that highlights the AI-generated findings in real time.
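
The three-category breakdown described above can be pictured as a small typed record. This is a minimal sketch only: the key names (`structural_components`, `building_materials`, `construction_equipment`) and the helper `to_site_analysis` are illustrative assumptions, not the exact schema the app returns.

```python
from dataclasses import dataclass, field


@dataclass
class SiteAnalysis:
    """Three-category breakdown; the field names are assumptions for illustration."""
    structural_components: list[str] = field(default_factory=list)
    building_materials: list[str] = field(default_factory=list)
    construction_equipment: list[str] = field(default_factory=list)


def to_site_analysis(payload: dict) -> SiteAnalysis:
    """Keep only the three known categories and coerce each to a list of clean strings."""

    def clean(key: str) -> list[str]:
        items = payload.get(key) or []
        if isinstance(items, str):
            # A model may return a single string instead of a list
            items = [items]
        return [str(item).strip() for item in items if str(item).strip()]

    return SiteAnalysis(
        structural_components=clean("structural_components"),
        building_materials=clean("building_materials"),
        construction_equipment=clean("construction_equipment"),
    )
```

Normalizing the response this way means the frontend can always rely on three lists being present, even when the model omits a category or adds unexpected keys.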

🔧 How we built it

Frontend: Built using Next.js, Tailwind CSS, and Framer Motion for animated UI

Backend: Powered by FastAPI, handles auth, uploads, and Gemini integration

Authentication: Secure JWT-based login system using MongoDB to store users

AI Analysis: Uses Gemini 1.5 Pro through LangChain to interpret image data

Deployment: Exposed the local backend through an ngrok tunnel for real-time API testing over HTTPS

🚧 Challenges we ran into

Sending image data to Gemini in a format it could parse: we resolved this by converting images to base64 and structuring messages to align with Gemini’s expectations

Managing authentication and secure access to protected pages using JWT

Handling the raw AI response and formatting it into a clean, readable, animated UI

CORS issues and backend/frontend communication via ngrok
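
The base64 step from the first challenge above can be sketched as follows. This assumes the text-plus-`image_url` content-part layout that LangChain chat models commonly accept for multimodal messages; the function name `build_image_message` and the exact message shape are assumptions, since the structure we actually sent is not reproduced here.

```python
import base64
import mimetypes


def build_image_message(image_bytes: bytes, filename: str, instruction: str) -> list[dict]:
    """Return message content parts: one text instruction plus one inline image."""
    # Guess the MIME type from the file name; fall back to JPEG (an assumption)
    mime_type = mimetypes.guess_type(filename)[0] or "image/jpeg"
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return [
        {"type": "text", "text": instruction},
        # The image travels inline as a data URL rather than as a file upload
        {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
    ]
```

In a LangChain setup, a list like this would typically be passed as the `content` of a human message; whether a given model wrapper expects the nested `{"url": ...}` form or a plain string should be checked against its documentation.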

🏅 Accomplishments that we're proud of

Successfully integrated image upload and AI interpretation in a single app

Built a complete login-authenticated flow with live user session tracking

Delivered a fast, interactive, and animated user interface

Created a fully structured, categorized, and readable output from raw AI text
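
Turning raw AI text into structured output usually means tolerating extra wrapping around the JSON. The sketch below is one hedged way to do that, assuming the model sometimes wraps its answer in a Markdown code fence or adds a sentence before it; `extract_json` is a hypothetical helper, not the project's actual parser.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built indirectly to keep this listing readable
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def extract_json(raw_text: str) -> dict:
    """Pull a JSON object out of a model reply that may include fences or extra prose."""
    fenced = FENCED_BLOCK.search(raw_text)
    candidate = fenced.group(1) if fenced else raw_text
    # Take the span from the first opening brace to the last closing brace
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start:end + 1])
```

Raising `ValueError` on malformed replies (`json.JSONDecodeError` is a subclass of it) lets the backend return a clear error instead of passing unparsed text to the UI.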

📚 What we learned

How to send and format multimodal inputs (image + instruction) to Gemini AI

Refining prompt design to generate structured, predictable JSON responses

JWT authentication flow and decoding directly on the frontend

Working with Framer Motion to create delightful animations that enhance UX
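
For the JWT decoding point above, it helps to see what "decoding" a token involves. The frontend uses jwt-decode in TypeScript; the sketch below shows the same idea in Python using only the standard library. The helper names are hypothetical, and like jwt-decode it reads the claims without verifying the signature, so it is suitable for session display only, never for authorization decisions.

```python
import base64
import json
import time
from typing import Optional


def decode_jwt_payload(token: str) -> dict:
    """Read the claims from a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload_b64 = parts[1]
    # base64url segments omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def is_expired(claims: dict, now: Optional[float] = None) -> bool:
    """True when the token carries an `exp` claim that is already in the past."""
    exp = claims.get("exp")
    current = time.time() if now is None else now
    return exp is not None and current >= exp
```

Signature verification still has to happen on the backend with the signing secret; the client-side decode only tells the UI who is logged in and when the session ends.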

🚀 What's next for GemStructify

🔍 Add real-time object detection for live construction monitoring

🏗️ Enable automatic safety checks or risk detection from site images

💾 Allow users to save and export analysis reports for documentation

🛠️ Expand to support construction plan drawings, drone footage, and 3D scans

🌐 Deploy to production with a proper domain and host backend using Render or Railway

🛠️ Built With

🧠 Gemini 1.5 Pro (via LangChain) – Multimodal AI for analyzing images

⚙️ FastAPI – Backend framework for handling auth, image uploads, and Gemini integration

🌐 Next.js – React framework for frontend with server components

💨 Tailwind CSS – Utility-first CSS for styling

🎞️ Framer Motion – For smooth, animated transitions in the UI

🧾 jwt-decode – Client-side JWT decoding for session control

🍃 MongoDB Atlas – Database for storing user credentials

🔐 OAuth2 & JWT – Secure authentication and access control

🌉 Ngrok – Tunnel public traffic to local backend during development

📦 Python – Backend language for FastAPI, JWT, and Gemini setup

🟨 TypeScript – Strongly-typed language for frontend logic

📤 FormData API – For sending uploaded images to the backend

📷 base64 Encoding – To embed images into Gemini’s input schema

🪄 LangChain – To orchestrate structured prompts with Gemini

💡 Vercel (optional) – For frontend deployment (Next.js hosting)
