AI LLM Generated Knowledge Graph

knowledge graph generated for "Edison, NJ"
knowledge graph generated for "Tesla, Inc."
knowledge graph generated for "Russia and Ukraine war"
knowledge graph generated for "How can the United States utilize nuclear technology for the betterment of society?" (AP Seminar question)

Inspiration

During my time in AP Seminar, I had to research questions and split them into multiple lenses and stakeholders. This program automates that step: it splits a large topic or question into smaller, more specific topics and names the relationships between them. Many tools, especially since the advent of large language models like GPT, can provide a lot of data about a topic. What is missing is proper organization of that knowledge into entities and their relationships. I wanted to build a tool that, given a topic, uses various tools, especially AI, to create a knowledge graph.

What it does

The application takes a topic or question from the user, uses a vector similarity search to fetch Wikipedia articles related to the input, and then calls the GPT API to organize the results into a knowledge graph.
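The retrieval step can be pictured with a toy similarity ranking. This is a minimal sketch and not the project's code: the real embeddings come from the textai model, while the three-dimensional vectors, the article titles, and the function names here are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def top_articles(query_vector, article_vectors, k=2):
    """Return the k (title, score) pairs most similar to the query."""
    scored = [
        (title, cosine_similarity(query_vector, vector))
        for title, vector in article_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings; a real model produces hundreds of dimensions.
ARTICLE_VECTORS = {
    "Nuclear power": [0.9, 0.1, 0.0],
    "Nuclear medicine": [0.7, 0.6, 0.1],
    "Electric car": [0.1, 0.2, 0.9],
}
QUERY_VECTOR = [1.0, 0.2, 0.0]

for title, score in top_articles(QUERY_VECTOR, ARTICLE_VECTORS):
    print(f"{title}: {score:.3f}")
```

The articles whose vectors point in nearly the same direction as the query rank first, which is the idea behind fetching Wikipedia articles "similar to" the input.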

How I built it

The application takes a topic or a question as input from the user.
It then uses the textai model, running on Hugging Face infrastructure, to run a vector search for Wikipedia articles that are similar to the topic or question.
It then calls the OpenAI GPT API (GPT-3.5) with the fetched Wikipedia article supplied as prompt context (prompt engineering).
GPT-3.5 recognizes the named entities and their relationships and returns the results as JSON.
Finally, it uses the networkx and matplotlib libraries to draw the JSON as a graph of nodes and edges.
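The prompt and JSON steps above might look roughly like the sketch below. It is an illustration under assumptions, not the project's code: the prompt wording, the "source" / "relation" / "target" keys, the function names, and the sample reply are all hypothetical, and no API call is made.

```python
import json

# Hypothetical prompt wording; the project's actual prompt is not shown.
PROMPT_TEMPLATE = (
    "Extract the named entities and the relationships between them from the "
    "article below. Reply with only a JSON array of objects that have the "
    'keys "source", "relation" and "target".\n\n'
    "Topic: {topic}\n\nArticle:\n{article}"
)


def build_prompt(topic, article_text):
    """Place the fetched article into the prompt as context."""
    return PROMPT_TEMPLATE.format(topic=topic, article=article_text)


def triples_to_edges(raw_json):
    """Parse the model's JSON reply into (source, target, attributes) edges.

    Items that are not objects, or that lack a non-empty string for any of
    the three keys, are skipped instead of raising an error.
    """
    edges = []
    for item in json.loads(raw_json):
        if not isinstance(item, dict):
            continue
        fields = [item.get(key) for key in ("source", "relation", "target")]
        if all(isinstance(value, str) and value.strip() for value in fields):
            source, relation, target = (value.strip() for value in fields)
            edges.append((source, target, {"label": relation}))
    return edges


# Hypothetical model reply; the last item is malformed on purpose.
SAMPLE_REPLY = """[
  {"source": "Edison, New Jersey", "relation": "named after", "target": "Thomas Edison"},
  {"source": "Edison, New Jersey", "relation": "located in", "target": "Middlesex County"},
  {"source": "Edison, New Jersey", "relation": "located in"}
]"""

for edge in triples_to_edges(SAMPLE_REPLY):
    print(edge)
```

The edges are 3-tuples of source, target, and an attribute dictionary, which is a shape that networkx accepts in `DiGraph.add_edges_from`, so the result could be handed to the drawing step directly.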

Challenges I ran into

Both textai and GPT are prone to hallucination, which affects the quality of the knowledge graph.
The application depends on the quality of the input (prompt) from the user. If the prompt is too specific, the similarity search will not return relevant articles from Wikipedia. If the prompt is too general, textai and GPT will not provide accurate results.
OpenAI GPT API calls are not free, so I needed to be very cautious about the cost.
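One common way to keep API costs down during development is to cache replies so that an identical prompt is only paid for once. The sketch below is an assumption about how that could be done, not a description of what this project did; `CachedCompletions` and `fake_complete` are hypothetical names, and `fake_complete` stands in for the real paid call.

```python
import hashlib


class CachedCompletions:
    """Wrap a prompt -> reply function and reuse replies for repeated prompts."""

    def __init__(self, complete):
        self._complete = complete  # any callable taking a prompt string
        self._cache = {}
        self.paid_calls = 0  # how many times the wrapped function ran

    def __call__(self, prompt):
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._complete(prompt)
            self.paid_calls += 1
        return self._cache[key]


def fake_complete(prompt):
    """Stand-in for the paid API call; always returns an empty JSON array."""
    return "[]"


cached = CachedCompletions(fake_complete)
cached("Edison, NJ")
cached("Edison, NJ")   # served from the cache, no second paid call
cached("Tesla, Inc.")
print(cached.paid_calls)
```

The cache here lives in memory only, so it helps within one run; persisting it to disk would be needed to save cost across runs.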

Accomplishments that I am proud of

The sheer volume of high-quality data that this tool can process is noteworthy. It draws on all of Wikipedia and a recent LLM to build the knowledge graph.
It is able to organize the nodes and edges accurately and create a large network of relationships. This is very helpful for finding indirect relationships between nodes that are far removed from each other.
With a good prompt, both the vector similarity search that identifies the Wikipedia articles and the extraction of named entities and their relationships proved very accurate.
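Finding an indirect relationship between two distant nodes amounts to finding a path through the graph. The sketch below uses a plain breadth-first search over (source, relation, target) triples; it is an illustration only, and the function name and sample triples are hypothetical. Edges are treated as undirected for the search, and each step of the result is the original triple as it was recorded.

```python
from collections import deque


def relationship_path(triples, start, goal):
    """Return the shortest chain of triples linking start to goal, or None."""
    neighbors = {}
    for triple in triples:
        source, _relation, target = triple
        # Allow traversal in both directions, remembering the original triple.
        neighbors.setdefault(source, []).append((target, triple))
        neighbors.setdefault(target, []).append((source, triple))

    came_from = {start: None}  # node -> (previous node, triple used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, triple in neighbors.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, triple)
                queue.append(nxt)

    if goal not in came_from:
        return None  # the two nodes are not connected

    path = []
    node = goal
    while came_from[node] is not None:
        node, triple = came_from[node]
        path.append(triple)
    path.reverse()
    return path


# Hypothetical triples of the kind the extraction step might return.
TRIPLES = [
    ("Edison, New Jersey", "located in", "Middlesex County"),
    ("Edison, New Jersey", "named after", "Thomas Edison"),
    ("Thomas Edison", "invented", "phonograph"),
]

for step in relationship_path(TRIPLES, "Middlesex County", "phonograph"):
    print(step)
```

Here "Middlesex County" and "phonograph" share no direct edge, yet the search links them through the township and its namesake in three steps.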

What I learned

I learned the technical details of different AI models (like GPT and textai).
I was surprised by the power of LLMs. A couple of years ago, this project could not have been completed.
I learned how to string together best-of-breed libraries and models to get the best results.

What's next for AI LLM Generated Knowledge Graph

I spent most of my time developing the back end. With more time, I would build a browser-based, Verbwire-based Web3 UI where users can enter their topic or question and visualize the graph. In addition, the graph is currently a static image (PNG); I would like to make it interactive so that the user can expand, collapse, and traverse it within the Verbwire UI. Lastly, I would transition away from Wikipedia, since anyone can edit it.

Built With

gpt
matplotlib
networkx
python
textai
wikipedia

Updates

Rishish Thomas started this project — Jun 09, 2024 06:15 PM EDT
