Transform Dark Data into Digital Gold
Built for UGAHacks 2026 | Georgia Institute of Technology
- Gabriel Falade - Georgia Tech
- Toye Oni - Georgia Tech
Organizations drown in "dark data" - scattered documents with contradictory information that waste time, cause confusion, and lead to poor decisions. Employees spend hours searching for accurate information, often relying on outdated policies or conflicting guidance.
Transmute solves this problem by automatically:
- Analyzing document collections to build knowledge graphs
- Detecting contradictions between documents
- Identifying obsolete information
- Generating comprehensive wiki summaries
- Answering questions through an AI-powered chatbot
- Sustainability: Reduces cognitive load and wasted effort searching for information
- Community: Makes public policy documents clear and accessible, improving democracy and public safety
- Efficiency: Saves organizations time and money by surfacing contradictions before they cause problems
- Document Upload: Batch upload documents via ZIP files
- Analytics Dashboard: View processed documents, contradictions, and relationships
- Interactive Graph Visualization: Explore document relationships with clickable nodes
- AI-Generated Wiki: Comprehensive synthesis of all uploaded documents
- RAG-Powered Chatbot: Ask questions and get answers grounded in your documents
- Dark/Light Mode: Comfortable viewing in any environment
- Real-time Processing: Automatic analysis as documents are uploaded
- React (v19.2.4) - UI framework
- React Router DOM (v7.13.0) - Client-side routing
- React Markdown (v10.1.0) - Wiki content rendering
- CSS3 - Custom styling and animations
- Python 3 - Core backend language
- Flask - Web application framework
- Flask-CORS - Cross-origin resource sharing
- Google Gemini API (gemini-2.0-flash) - LLM for:
  - Contradiction detection
  - Wiki generation
  - Document Q&A chatbot
  - Obsolete document identification
- Sentence Transformers (all-MiniLM-L6-v2) - Semantic embeddings
- scikit-learn - Cosine similarity calculations
- NumPy - Numerical operations
- JSON - Data persistence (documents, graph, metrics)
- python-dotenv - Environment variable management
- Knowledge Graphs - Document relationship modeling
- RAG (Retrieval-Augmented Generation) - Semantic search + LLM
- REST API - Flask backend serving React frontend
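The retrieval half of the RAG pipeline listed above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the toy three-dimensional vectors, and the document IDs are all hypothetical. In the real backend the vectors would come from the all-MiniLM-L6-v2 model and the similarity from scikit-learn; plain Python is used here only to keep the sketch dependency-free.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question, passages):
    """Assemble an LLM prompt that grounds the answer in the retrieved passages."""
    context = "\n\n".join(passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy vectors standing in for real sentence embeddings (hypothetical data).
doc_vecs = {
    "remote-work-policy.md": [1.0, 0.0, 0.0],
    "cloud-strategy.md": [0.0, 1.0, 0.0],
    "travel-policy.md": [0.7, 0.0, 0.7],
}
query_vec = [1.0, 0.0, 0.1]

results = retrieve(query_vec, doc_vecs, k=2)
prompt = build_prompt("How many remote days are allowed?",
                      [doc_id for doc_id, _ in results])
```

The prompt string would then be sent to the LLM, so the answer is constrained to the retrieved documents rather than to the model's general knowledge.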
- Python 3.8+
- Node.js 14+
- Google Gemini API key
- Clone the repository
git clone <repository-url>
cd data-alchemist
- Set up Backend
cd backend
pip install -r requirements.txt
# Create .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env
echo "GEMINI_MODEL=gemini-2.0-flash" >> .env
- Set up Frontend
cd ../client
npm install
- Start Backend Server (Terminal 1)
cd backend
python app.py
Server runs on http://localhost:5000
- Start Frontend (Terminal 2)
cd client
npm start
Application opens at http://localhost:3000
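The .env file created during backend setup holds two variables, GEMINI_API_KEY and GEMINI_MODEL. A rough sketch of how a backend might read them is shown below. The function name load_gemini_config is hypothetical, and the sketch reads from os.environ directly; in practice python-dotenv's load_dotenv() would typically be called first to copy the .env values into the environment.

```python
import os


def load_gemini_config(env=None):
    """Read Gemini settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GEMINI_API_KEY")
    if not api_key:
        # Fail early with a clear message instead of a confusing API error later.
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    # Fall back to the model named in the setup steps if none is configured.
    model = env.get("GEMINI_MODEL", "gemini-2.0-flash")
    return {"api_key": api_key, "model": model}


# Example with an explicit mapping, so nothing real is read from the machine.
config = load_gemini_config({"GEMINI_API_KEY": "your_api_key_here"})
```

Passing the mapping as a parameter keeps the function easy to test without touching the real environment.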
Ready-to-use demo datasets are available in backend/demo-datasets/:
- corporate-chaos.zip - 19 corporate documents with cloud provider, remote work, and sustainability contradictions
- city-council.zip - 11 municipal documents with climate and transportation policy conflicts
- live-demo.zip - 6 simple documents for quick testing
Simply upload any ZIP file through the Upload page!
- Upload Documents: Navigate to the Upload page and drag and drop a ZIP file containing Markdown documents
- View Analytics: See all processed documents, contradictions detected, and statistics
- Explore Graph: Visualize document relationships, click nodes to see connections
- Read Wiki: AI-generated summary synthesizing all documents
- Ask Questions: Use the chatbot to query your document collection
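The upload step above takes a ZIP file of markdown documents. One way the extraction could work is sketched here with Python's standard zipfile module. The function name and the rule of keeping only files ending in .md are assumptions for illustration, not a description of the actual backend.

```python
import io
import zipfile


def read_markdown_from_zip(zip_bytes):
    """Return {filename: text} for every .md file inside a ZIP archive."""
    documents = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            # Skip directories and anything that is not markdown (assumed rule).
            if info.is_dir() or not info.filename.lower().endswith(".md"):
                continue
            raw = archive.read(info)
            documents[info.filename] = raw.decode("utf-8", errors="replace")
    return documents


# Build a small in-memory archive to exercise the function (hypothetical files).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("policies/remote-work.md", "# Remote Work\nThree days on site.")
    archive.writestr("notes.txt", "not markdown")

docs = read_markdown_from_zip(buffer.getvalue())
```

Reading from bytes rather than a path means the same function works on an uploaded request body without writing a temporary file.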
Issue: Frontend expected from/to fields for edges, but backend generated source/target fields.
Solution:
- Updated backend graph generation to use consistent field names
- Added normalization layer in frontend to handle both formats
- Fixed edge rendering logic in visualize.jsx
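The normalization idea can be illustrated with a short sketch. The project's actual layer lives in the React frontend; the Python below only shows the same logic, and both the function name and the choice of source/target as the canonical field names are assumptions.

```python
def normalize_edge(edge):
    """Return a copy of the edge that always uses 'source' and 'target' keys."""
    source = edge.get("source", edge.get("from"))
    target = edge.get("target", edge.get("to"))
    if source is None or target is None:
        raise ValueError(f"edge is missing endpoints: {edge!r}")
    # Keep every other attribute (for example a relationship type) untouched.
    normalized = {
        key: value
        for key, value in edge.items()
        if key not in ("from", "to", "source", "target")
    }
    normalized["source"] = source
    normalized["target"] = target
    return normalized


old_style = normalize_edge({"from": "a.md", "to": "b.md", "type": "contradicts"})
new_style = normalize_edge({"source": "a.md", "target": "b.md", "type": "contradicts"})
```

Both calls produce the same dictionary, so rendering code downstream only has to handle one shape.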
Issue: KeyError exceptions when accessing document relationships and insight structures.
Solution:
- Debugged by running python generate_wiki.py directly to see error traces
- Updated code to match the actual JSON structure (e.g., a nodes array instead of doc1/doc2 fields)
- Fixed contradiction and obsolete document field references
Issue: Analytics page couldn't match documents to insights due to field name mismatches.
Solution:
- Standardized insight structure across backend
- Updated frontend to use correct field names (nodes for contradictions, obsolete_doc for obsolete documents)
- Added defensive checks to handle missing fields gracefully
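A defensive lookup of this kind might look like the sketch below. The nodes and obsolete_doc field names come from the fix described above; the type field, its values, and the function name are assumptions made for the example, and the logic is shown in Python even though the original check sits in the frontend.

```python
def documents_for_insight(insight):
    """Return the document IDs an insight refers to, or [] if fields are missing."""
    kind = insight.get("type")  # assumed discriminator field
    if kind == "contradiction":
        nodes = insight.get("nodes")
        # Tolerate a missing or malformed 'nodes' value instead of raising KeyError.
        return list(nodes) if isinstance(nodes, (list, tuple)) else []
    if kind == "obsolete":
        doc = insight.get("obsolete_doc")
        return [doc] if doc else []
    return []


contradiction_docs = documents_for_insight(
    {"type": "contradiction", "nodes": ["policy-2023.md", "policy-2025.md"]}
)
obsolete_docs = documents_for_insight({"type": "obsolete", "obsolete_doc": "policy-2023.md"})
missing_fields = documents_for_insight({"type": "contradiction"})
```

Returning an empty list for incomplete insights lets the analytics view render the rest of the page rather than failing on one bad record.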
Issue: Text labels overlapped when nodes were too close together.
Solution:
- Increased circular layout radius from 200 to 280 pixels
- Adjusted text label positioning from y+45 to y+55
- Expanded SVG viewBox from 600 to 700 height
- Recentered graph (centerY from 300 to 340) for better balance
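The geometry behind these numbers can be sketched as follows. The radius of 280, the center Y of 340, and the label offset of 55 come from the fix above; the center X of 350, the starting angle at the top of the circle, and the function name are assumptions for illustration.

```python
import math


def circular_layout(node_ids, center_x=350.0, center_y=340.0, radius=280.0,
                    label_offset=55.0):
    """Place nodes evenly on a circle and position each label below its node."""
    positions = {}
    count = len(node_ids)
    for index, node_id in enumerate(node_ids):
        # Spread nodes evenly; subtracting pi/2 puts the first node at the top.
        angle = 2 * math.pi * index / count - math.pi / 2
        x = center_x + radius * math.cos(angle)
        y = center_y + radius * math.sin(angle)
        positions[node_id] = {"x": x, "y": y, "label_y": y + label_offset}
    return positions


layout = circular_layout(["a.md", "b.md", "c.md", "d.md"])
lowest_label = max(point["label_y"] for point in layout.values())
```

With these values the lowest label sits at about y = 675, which fits inside the 700-pixel viewBox height; under the same assumptions a 600-pixel height would clip it.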
Issue: Users couldn't easily explore relationships between connected documents.
Solution:
- Implemented getConnectedNodes() function to find all connected documents
- Added interactive "Connected Documents" section in sidebar
- Color-coded relationship types with clickable navigation
- Added smooth hover effects for better UX
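The lookup performed by getConnectedNodes() can be approximated with the sketch below. The real function is JavaScript in the frontend; this Python version is only an illustration, and it assumes edges are dictionaries with source, target, and an optional type field, and that "connected" means directly linked by one edge.

```python
def get_connected_nodes(node_id, edges):
    """Return the direct neighbors of node_id with the relationship type of each link."""
    connected = []
    seen = set()
    for edge in edges:
        # Edges are treated as undirected: match the node on either end.
        if edge["source"] == node_id:
            other = edge["target"]
        elif edge["target"] == node_id:
            other = edge["source"]
        else:
            continue
        key = (other, edge.get("type"))
        if key not in seen:  # avoid listing the same relationship twice
            seen.add(key)
            connected.append({"id": other, "type": edge.get("type")})
    return connected


edges = [
    {"source": "a.md", "target": "b.md", "type": "contradicts"},
    {"source": "c.md", "target": "a.md", "type": "supersedes"},
    {"source": "b.md", "target": "c.md", "type": "related"},
    {"source": "a.md", "target": "b.md", "type": "contradicts"},  # duplicate
]
neighbors = get_connected_nodes("a.md", edges)
```

Keeping the relationship type alongside each neighbor is what allows the sidebar to color-code entries by type.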
- Google Gemini API - Large language model for contradiction detection, wiki generation, and chatbot functionality
- Hugging Face - Sentence Transformers model hosting
  - Model: sentence-transformers/all-MiniLM-L6-v2
Frontend:
- React - Meta Platforms, Inc.
- React Router - Remix Software Inc.
- React Markdown - Titus Wormer
Backend:
- Flask - Pallets Projects
- Sentence Transformers - UKP Lab, TU Darmstadt
- scikit-learn - scikit-learn developers
- NumPy - NumPy Developers
Development:
- Create React App - Meta Platforms, Inc.
- python-dotenv - Saurabh Kumar
This project was inspired by the need to make organizational knowledge accessible, reduce information overload, and help teams stay aligned despite constantly changing documentation.
This project was created for UGAHacks 2026. All rights reserved by the team members.
Transform dark data into knowledge gold
For questions or collaboration opportunities:
- Gabriel Falade - Georgia Institute of Technology
- Toye Oni - Georgia Institute of Technology
Project Repository: GitHub
Made with Claude Code