Inspiration
The project was inspired by the need for a smarter and more efficient way to interact with movie-related data. Traditional search engines provide generic results, and while IMDb has extensive information, filtering insights and retrieving structured data efficiently can be challenging. By leveraging AI-powered natural language processing (NLP) and graph-based databases like ArangoDB, we aimed to build a conversational AI tool that can answer complex movie-related queries intelligently.
What it does
ASK-IMDB is a conversational AI tool that allows users to ask questions about movies, actors, directors, ratings, and more. It integrates ArangoDB (a graph database) to store and retrieve structured movie data efficiently and uses LangChain with Google Generative AI to process natural language queries. The system can handle queries like "Which movies were directed by Christopher Nolan and have a rating above 8.0?" or "What are some movies similar to Inception?" and provide precise, AI-driven responses.
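To make the first example question concrete, here is a minimal sketch of the kind of AQL traversal it could translate into. Everything about the schema is an assumption: the `Person` and `Directed` collection names, the `name`, `title`, and `rating` attributes, and the `build_director_query` helper are hypothetical and are not taken from the project's actual dataset.

```python
from __future__ import annotations

from typing import Any

# Hypothetical schema: a `Person` vertex collection, a `Directed` edge
# collection pointing from a person to a movie, and movie documents with
# `title` and `rating` attributes. The real dataset may be laid out differently.
DIRECTOR_QUERY = """
FOR person IN Person
  FILTER person.name == @director
  FOR movie IN 1..1 OUTBOUND person Directed
    FILTER movie.rating > @min_rating
    SORT movie.rating DESC
    RETURN { title: movie.title, rating: movie.rating }
"""


def build_director_query(director: str, min_rating: float) -> tuple[str, dict[str, Any]]:
    """Return the AQL text and its bind variables.

    Values travel as bind variables instead of being formatted into the
    query string, so user input is never interpreted as AQL.
    """
    bind_vars = {"director": director, "min_rating": min_rating}
    return DIRECTOR_QUERY.strip(), bind_vars


if __name__ == "__main__":
    query, bind_vars = build_director_query("Christopher Nolan", 8.0)
    print(query)
    print(bind_vars)
```

With the python-arango driver, the returned pair could be passed to `db.aql.execute(query, bind_vars=bind_vars)`, assuming `db` is an open database handle.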
How we built it
- Database Setup: We used ArangoDB as our primary database to store movie-related data in a graph format. The dataset was imported using arango-datasets.
- AI & NLP Integration: We utilized LangChain and Google Generative AI to interpret and process user queries.
- Query Execution: The AI interprets the query and converts it into a structured database query using ArangoDB’s AQL (ArangoDB Query Language).
- Response Generation: The retrieved data is then processed and returned in a natural, easy-to-understand format.
- Deployment: The system was developed in a Jupyter Notebook environment and can be expanded into a full-scale application.
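The steps above can be sketched as a small pipeline. This is an illustration of the flow only, not the project's code: plain Python callables stand in for Google Generative AI and the ArangoDB connection, and the names `answer_question`, `fake_llm`, and `fake_run_aql`, along with the prompt wording, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins: in the real project these roles are played by
# Google Generative AI (via LangChain) and an ArangoDB connection.
GenerateText = Callable[[str], str]              # prompt -> model output
RunAql = Callable[[str], List[Dict[str, Any]]]   # AQL text -> result rows


@dataclass
class Answer:
    aql: str
    rows: List[Dict[str, Any]]
    text: str


def answer_question(question: str, schema: str, llm: GenerateText, run_aql: RunAql) -> Answer:
    """Question -> AQL -> rows -> natural-language answer."""
    # Step 1: ask the model to translate the question into AQL.
    aql = llm(
        "Translate the question into one AQL query.\n"
        f"Schema: {schema}\nQuestion: {question}\nAQL:"
    ).strip()
    # Step 2: run the generated query against the database.
    rows = run_aql(aql)
    # Step 3: ask the model to phrase the rows as a readable answer.
    text = llm(f"Question: {question}\nRows: {rows}\nAnswer in one sentence:").strip()
    return Answer(aql=aql, rows=rows, text=text)


# --- Canned fakes so the sketch runs without a model or a database. ---
def fake_llm(prompt: str) -> str:
    if prompt.rstrip().endswith("AQL:"):
        return "FOR m IN Movie FILTER m.rating > 8.0 RETURN { title: m.title }"
    return "Inception is rated above 8.0."


def fake_run_aql(aql: str) -> List[Dict[str, Any]]:
    return [{"title": "Inception"}]


if __name__ == "__main__":
    result = answer_question(
        "Which movies are rated above 8.0?",
        schema="Movie(title, rating)",
        llm=fake_llm,
        run_aql=fake_run_aql,
    )
    print(result.aql)
    print(result.text)
```

Keeping the model and the database behind two small callables makes each step testable in a notebook before wiring in the real services.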
Challenges we ran into
- Data Ingestion: Loading IMDb-like datasets into ArangoDB and structuring them as a graph was a challenge because of the data's complex relational structure.
- Query Optimization: Translating natural language queries into optimized AQL queries required fine-tuning.
- AI Model Fine-Tuning: Getting relevant and accurate responses from LangChain and Google Generative AI took experimentation.
- Handling Large Datasets: Efficiently managing and querying large datasets in real time posed scalability challenges.
Accomplishments that we're proud of
- Successfully integrated LangChain and ArangoDB to enable intelligent movie queries.
- Optimized graph-based queries for speed and accuracy.
- Built a system capable of answering complex multi-condition movie queries.
- Learned how to fine-tune AI models for domain-specific use cases.
What we learned
- Graph databases like ArangoDB are powerful for movie-related queries, especially when dealing with relationships (e.g., actors, directors, genres).
- AI-driven NLP can significantly enhance how users interact with databases.
- LangChain provides a flexible way to connect AI models with structured databases.
- Optimizing database queries is crucial for performance when dealing with large datasets.
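As a small illustration of why relationships matter for questions like "movies similar to Inception", the sketch below ranks movies by how many neighbors (people and genres) they share. It is an assumption-laden toy, not the project's method: Jaccard similarity is swapped in for whatever the system actually does, the data is hand-written, and the names `MOVIE_LINKS`, `jaccard`, and `similar_movies` are hypothetical. Only the standard library is used, so no graph database or graph library is needed to run it.

```python
from __future__ import annotations

# Toy graph as an adjacency map: movie -> connected people and genres.
# Hand-written illustration data, not the project's dataset.
MOVIE_LINKS: dict[str, set[str]] = {
    "Inception": {"Christopher Nolan", "Leonardo DiCaprio", "Sci-Fi"},
    "Interstellar": {"Christopher Nolan", "Matthew McConaughey", "Sci-Fi"},
    "Titanic": {"James Cameron", "Leonardo DiCaprio", "Romance"},
    "Toy Story": {"John Lasseter", "Tom Hanks", "Animation"},
}


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared neighbors divided by all neighbors of either node."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_movies(
    title: str, links: dict[str, set[str]], top_n: int = 3
) -> list[tuple[str, float]]:
    """Rank other movies by neighbor overlap with `title`, best first."""
    target = links[title]
    scored = [
        (other, jaccard(target, neighbors))
        for other, neighbors in links.items()
        if other != title
    ]
    # Drop movies with no shared neighbors, then sort by score, then title.
    scored = [(name, score) for name, score in scored if score > 0]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]


if __name__ == "__main__":
    for name, score in similar_movies("Inception", MOVIE_LINKS):
        print(f"{name}: {score:.2f}")
```

In a graph database the same idea becomes a two-hop traversal: from a movie to its people and genres, then back out to other movies.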
What's next for ASK-IMDB
- Expand Dataset: Add more metadata, including user reviews and box office collections.
- Deploy as a Web App: Build a React- or Flask-based web interface for user interaction.
- Improve Query Understanding: Enhance AI-driven query parsing for even more complex user queries.
- Add a Recommender System: Use AI to suggest movies based on user preferences.
Built With
- arangodb
- cugraph
- langchain
- networkx