Inspiration
The Database of AI Litigation (DAIL) is a critical public-interest resource tracking legal disputes involving artificial intelligence technologies. However, its backend relied on a low-code system that limited structured querying, scalability, and programmatic access. As AI-related litigation continues to grow in both volume and complexity, the underlying infrastructure risked becoming a bottleneck for research and long-term sustainability.
We were inspired to redesign the backend architecture to support structured legal research, enforce data integrity, and provide scalable API access for future expansion.
What it does
SchemaForge modernizes the DAIL backend by:
- Redesigning the database using PostgreSQL
- Applying normalization up to Third Normal Form (3NF)
- Enforcing primary keys, foreign keys, and referential integrity
- Extracting multi-value fields into structured reference tables
- Building a clean ETL pipeline for migration and transformation
- Deploying a REST API using FastAPI for programmatic access
The system transforms loosely structured spreadsheet data, whose relationships were only implied, into a fully relational, queryable, and scalable backend suitable for research-grade analysis.
How we built it
We began by analyzing the legacy dataset and identifying structural limitations such as missing primary keys, comma-separated multi-value fields, and implied relationships.
We redesigned the schema in PostgreSQL with:
- Surrogate primary keys (e.g., case_id)
- One-to-many relationships (Cases → Dockets → Documents)
- Many-to-many bridge tables (Cases ↔ Issues, Causes, Algorithms, Organizations)
- Unique constraints and foreign key enforcement
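The schema shape described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual DDL: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, and every table and column name other than case_id is hypothetical. Only one of the four bridge tables (Cases ↔ Issues) is shown.

```python
import sqlite3

# Hypothetical DDL: every name except case_id is an illustrative assumption.
SCHEMA = """
CREATE TABLE cases (
    case_id INTEGER PRIMARY KEY,               -- surrogate primary key
    caption TEXT NOT NULL UNIQUE
);
CREATE TABLE dockets (                          -- one case, many dockets
    docket_id INTEGER PRIMARY KEY,
    case_id   INTEGER NOT NULL REFERENCES cases(case_id),
    court     TEXT NOT NULL
);
CREATE TABLE documents (                        -- one docket, many documents
    document_id INTEGER PRIMARY KEY,
    docket_id   INTEGER NOT NULL REFERENCES dockets(docket_id),
    title       TEXT NOT NULL
);
CREATE TABLE issues (
    issue_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE case_issues (                      -- many-to-many bridge table
    case_id  INTEGER NOT NULL REFERENCES cases(case_id),
    issue_id INTEGER NOT NULL REFERENCES issues(issue_id),
    PRIMARY KEY (case_id, issue_id)
);
"""


def create_schema() -> sqlite3.Connection:
    """Create the sketch schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign key checks off by default; PostgreSQL always enforces them.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_schema()
# Placeholder rows, not real DAIL data.
conn.execute("INSERT INTO cases (case_id, caption) VALUES (1, 'Example v. Example')")
conn.execute("INSERT INTO dockets (docket_id, case_id, court) VALUES (10, 1, 'Example Court')")

# A docket that points at a non-existent case is rejected by the database itself.
try:
    conn.execute("INSERT INTO dockets (docket_id, case_id, court) VALUES (11, 999, 'Example Court')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The composite primary key on the bridge table also prevents the same case and issue from being linked twice.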
We then built a Python ETL pipeline to:
- Load Excel sheets
- Clean missing and inconsistent values
- Normalize multi-value columns
- Insert base entities first
- Insert bridge relationships
- Enforce constraints during migration
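The multi-value normalization and the "base entities first, bridge rows second" ordering might look something like the sketch below. It is written in plain Python so it runs without the Excel files; the function names, the "issues" column, and the sample rows are all hypothetical, and it assumes missing cells have already been converted to None.

```python
from typing import Optional


def split_multi_value(cell: Optional[str], sep: str = ",") -> list[str]:
    """Split a delimited cell into trimmed values, dropping blanks and repeats."""
    if cell is None:
        return []
    seen: set[str] = set()
    values: list[str] = []
    for part in str(cell).split(sep):
        value = part.strip()
        key = value.casefold()  # treat "Privacy" and "privacy" as the same value
        if value and key not in seen:
            seen.add(key)
            values.append(value)
    return values


def normalize_issues(rows: list[dict]) -> tuple[dict[str, int], list[tuple[int, int]]]:
    """Assign an id to each distinct issue, then build (case_id, issue_id) pairs."""
    issue_ids: dict[str, int] = {}
    bridge: list[tuple[int, int]] = []
    for row in rows:
        for name in split_multi_value(row.get("issues")):
            key = name.casefold()
            if key not in issue_ids:
                # The base entity gets its id before any bridge row refers to it.
                issue_ids[key] = len(issue_ids) + 1
            bridge.append((row["case_id"], issue_ids[key]))
    return issue_ids, bridge


# Made-up sample rows standing in for a legacy sheet.
legacy_rows = [
    {"case_id": 1, "issues": "Bias, Privacy"},
    {"case_id": 2, "issues": " privacy ,, Copyright"},
    {"case_id": 3, "issues": None},
]
issue_ids, bridge = normalize_issues(legacy_rows)
```

Because ids are assigned before the pairs are built, the issue rows can be inserted first and the bridge rows afterwards without violating a foreign key.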
The database was deployed on Supabase, and a REST API was built using FastAPI and SQLAlchemy. The API automatically generates OpenAPI documentation and supports structured retrieval across entities.
The API was deployed using Render to provide a live, testable backend.
Challenges we ran into
- The legacy data had no enforced primary keys.
- Multi-value fields required careful parsing and normalization.
- Missing and inconsistent values required defensive ETL logic.
- Mapping documents through dockets required hierarchical modeling.
- Cloud deployment required handling environment variables and secure credentials.
- Ensuring no duplicate records while preserving data completeness required careful constraint design.
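One way to approach the duplicate-prevention challenge is to pair a unique constraint with an insert that does nothing on conflict, so re-running the migration cannot create a second copy of a row. The sketch below again uses sqlite3 as a stand-in for PostgreSQL (it needs SQLite 3.24 or newer for the ON CONFLICT clause); the organizations table and the get_or_create_org helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE organizations ("
    " org_id INTEGER PRIMARY KEY,"
    " name   TEXT NOT NULL UNIQUE)"  # the constraint that blocks duplicates
)


def get_or_create_org(conn: sqlite3.Connection, name: str) -> int:
    """Insert the organization if it is new, then return its id either way."""
    clean = " ".join(name.split())  # collapse stray whitespace before comparing
    conn.execute(
        "INSERT INTO organizations (name) VALUES (?) ON CONFLICT (name) DO NOTHING",
        (clean,),
    )
    row = conn.execute(
        "SELECT org_id FROM organizations WHERE name = ?", (clean,)
    ).fetchone()
    return row[0]


# Placeholder names, not real DAIL data.
first = get_or_create_org(conn, "Example  Corp")
second = get_or_create_org(conn, " Example Corp ")  # same organization, messier spelling
other = get_or_create_org(conn, "Another Org")
total = conn.execute("SELECT COUNT(*) FROM organizations").fetchone()[0]
```

The ON CONFLICT ... DO NOTHING clause is also valid PostgreSQL, although a PostgreSQL driver would use its own parameter placeholder style instead of `?`.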
Accomplishments that we're proud of
- Achieving full normalization up to Third Normal Form.
- Enforcing referential integrity at the database level.
- Eliminating duplicate and inconsistent classifications.
- Successfully migrating legacy Excel data into a structured PostgreSQL schema.
- Deploying a scalable API layer with automatic documentation.
- Delivering a production-ready cloud deployment on Supabase and Render.
We transformed a low-code backend into a relational architecture designed for long-term sustainability.
What we learned
- Data modeling decisions directly impact research usability.
- Normalization significantly improves consistency and scalability.
- Enforcing integrity at the database layer is more reliable than relying on application logic.
- ETL design is as important as schema design.
- Cloud deployment introduces operational considerations beyond local development.
- Building structured APIs forces clarity in schema design.
What's next for SchemaForge
- Implement advanced filtering and join-based search endpoints.
- Add full-text search capabilities for legal research.
- Introduce authentication and role-based access control.
- Add indexing and performance optimization for large-scale querying.
- Integrate analytics dashboards for legal trend analysis.
SchemaForge is designed as a scalable foundation for the future growth of AI litigation research.
Built With
- fastapi
- postgresql
- python
- render
- sql
- supabase