Inspiration
CaseConnect was inspired by the need for an efficient way to search and identify missing persons cases within the NamUs database, helping law enforcement and the public to quickly find relevant cases.
What it does
CaseConnect is a semantic search engine for the NamUs missing persons database. It scrapes case data, downloads associated images, and embeds both text and images into a vector space. Users can search the database using text or images, retrieving similar cases.
How we built it
The project was built by first creating a scraper to extract data from the NamUs database. Text and images were then embedded using the text-embedding-ada-002 and ViT-bigG-14 models, respectively. Search is performed with a nearest-neighbor lookup over the embeddings.
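The lookup step can be illustrated with a minimal sketch. It assumes cosine similarity as the distance measure and uses tiny hand-written vectors in place of real embeddings; the names `cosine_similarity`, `nearest_cases`, and `toy_index` are hypothetical and not taken from the project. The project lists sklearn, whose `NearestNeighbors` class with `metric="cosine"` would likely fill the same role; plain Python is used here only to keep the example dependency-free.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_cases(query, index, k=3):
    """Return the k (case_id, similarity) pairs most similar to the query.

    `index` maps a case id to its embedding vector. This is a brute-force
    scan, which is fine for a sketch but slow for a large database.
    """
    scored = [(case_id, cosine_similarity(query, vec))
              for case_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy stand-ins for real embeddings (assumption: real vectors are far longer).
toy_index = {
    "case-A": [1.0, 0.0, 0.0],
    "case-B": [0.9, 0.1, 0.0],
    "case-C": [0.0, 1.0, 0.0],
}

top = nearest_cases([1.0, 0.05, 0.0], toy_index, k=2)
for case_id, score in top:
    print(case_id, round(score, 4))
```

In a real pipeline the query vector would come from the same model that produced the indexed vectors, since similarities are only meaningful within one embedding space.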
Challenges we ran into
The main challenges were efficiently scraping the database, embedding text and images, and implementing various search functionalities like text-to-text, image-to-image, and text-to-image searches.
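One way to organize the three search modes is to route each one to the embedding space that can serve it. The sketch below is an assumption, not the project's documented design: it supposes that text-to-text search uses the text-embedding-ada-002 vectors, while image-to-image and text-to-image search both use the CLIP model (ViT-bigG-14), because CLIP embeds text and images into one shared space. The names `SEARCH_ROUTES` and `pick_embedding_space` are hypothetical.

```python
# Hypothetical routing table: (query modality, target modality) -> embedding space.
SEARCH_ROUTES = {
    ("text", "text"): "ada",     # assumed: text-embedding-ada-002 vectors
    ("image", "image"): "clip",  # assumed: ViT-bigG-14 image vectors
    ("text", "image"): "clip",   # assumed: CLIP text encoder against image vectors
}


def pick_embedding_space(query_kind, target_kind):
    """Return the embedding space to use for a given search mode."""
    try:
        return SEARCH_ROUTES[(query_kind, target_kind)]
    except KeyError:
        raise ValueError(
            f"unsupported search mode: {query_kind}-to-{target_kind}"
        ) from None


print(pick_embedding_space("text", "image"))
```

Keeping the routing explicit makes it harder to compare vectors from two unrelated models by accident.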
Accomplishments that we're proud of
We successfully developed a scraper and embedded text and images, laying the foundation for an efficient semantic search engine for the NamUs database.
What we learned
We learned how to effectively extract and process data, and apply advanced embedding techniques to create a powerful search engine for missing persons cases.
What's next for CaseConnect
Future plans include building the search front end, setting up the embedding and front-end infrastructure, adding a chat feature, and implementing stretch goals such as converting sketches to images for searching and enabling live generations from sketches with semantic search.
Built With
- ada-2
- open-clip
- openai
- python
- sklearn