Skip to content

ReinhardSellmair/graph_rag

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

README

This repo compares a Graph RAG with a SQL RAG on a dataset with Formula 1 results.

Setup

  1. Clone the repository:

    git clone <repo-url>
    cd graph_rag
  2. Create and activate a virtual environment:

    python -m venv .venv
    source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Download Formula 1 results Create a res directory in this repo and download all files from here in this directory.

  5. Set up a neo4j graph database Follow these instructions to set up a neo4j graph database.

  6. Define configuration Create a config.py file in the src directory and set following constants to connect to the neo4j and the SQL database:

    # path to database
    DATABASE_PATH = "../res/f1-results.db"
    
    # Neo4j connection details
    NEO4J_URL = 
    NEO4J_USERNAME = 
    NEO4J_PASSWORD = 
    DATABASE_NAME = 'f1-results'
  7. Populate databases Run the notebooks notebooks/create_sql_db.ipynb and notebooks/build_graph_db.ipynb to write the data from the files to the SQL and graph databases.

  8. Setting Up OpenAI API Key

    To use the OpenAI API for generating questions, set up the API key as follows:

    • Sign up or log in to an OpenAI account at OpenAI’s website.

    • Open the API Keys section under account settings.

    • Generate a new API key and copy it.

    • In your terminal, set the OPENAI_API_KEY environment variable to your key:

      On Linux/macOS:

      export OPENAI_API_KEY=your-api-key

      On Windows (Command Prompt):

      set OPENAI_API_KEY=your-api-key

      On Windows (PowerShell):

      $env:OPENAI_API_KEY="your-api-key"

    Keep the key secure and never share it publicly.

Evaluate RAG Models

The file src/qa_chain contains the class GraphQAChain which transfroms a question to a SQL or cypher query, executes the query and transforms the result to an answer. This class is initialized by providing a langchain_neo4j.Neo4jGraph objected connected to the graph database or langchain_community.utilities.SQLDatabase objected connected to the SQL database. The interface to the LLM model is provided via a langchain_openai.ChatOpenAI object.

Example of queriying the SQL database:

import sys
sys.path.append('../src')

from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI

from qa_chain import GraphQAChain
from config import DATABASE_PATH

# connect to database
connection_string = f"sqlite:///{DATABASE_PATH}"
db = SQLDatabase.from_uri(connection_string)

# llm interface
llm = ChatOpenAI(temperature=0, model="gpt-5")

# initialize chain
chain = GraphQAChain(llm, db, db_type='SQL', verbose=True)

# invoke chain
chain.invoke("Give me the driver names and years when the driver on pole position won the Singapore GP.")

prints:

{'write_query': {'query': "SELECT d.forename, d.surname, ra.year\nFROM races ra\nJOIN qualifying q ON q.raceId = ra.raceId AND q.position = 1\nJOIN results re ON re.raceId = ra.raceId AND re.positionOrder = 1 AND re.driverId = q.driverId\nJOIN drivers d ON d.driverId = re.driverId\nWHERE ra.name = 'Singapore Grand Prix'\nORDER BY ra.year\nLIMIT 10;"}}
{'execute_query': {'result': "[('Lewis', 'Hamilton', 2009), ('Fernando', 'Alonso', 2010), ('Sebastian', 'Vettel', 2011), ('Sebastian', 'Vettel', 2013), ('Lewis', 'Hamilton', 2014), ('Sebastian', 'Vettel', 2015), ('Nico', 'Rosberg', 2016), ('Lewis', 'Hamilton', 2018), ('Carlos', 'Sainz', 2023), ('Lando', 'Norris', 2024)]"}}
{'generate_answer': {'answer': '- 2009 — Lewis Hamilton\n- 2010 — Fernando Alonso\n- 2011 — Sebastian Vettel\n- 2013 — Sebastian Vettel\n- 2014 — Lewis Hamilton\n- 2015 — Sebastian Vettel\n- 2016 — Nico Rosberg\n- 2018 — Lewis Hamilton\n- 2023 — Carlos Sainz\n- 2024 — Lando Norris'}}

returns:

{'query': 'Give me the driver names and years when the driver on pole position won the Singapore GP.',
 'result': '- 2009 — Lewis Hamilton\n- 2010 — Fernando Alonso\n- 2011 — Sebastian Vettel\n- 2013 — Sebastian Vettel\n- 2014 — Lewis Hamilton\n- 2015 — Sebastian Vettel\n- 2016 — Nico Rosberg\n- 2018 — Lewis Hamilton\n- 2023 — Carlos Sainz\n- 2024 — Lando Norris'}

Example of querying graph database

import sys
sys.path.append('../src')

from langchain_neo4j import Neo4jGraph
from langchain_openai import ChatOpenAI

from qa_chain import GraphQAChain
from config import NEO4J_URL, NEO4J_USERNAME, NEO4J_PASSWORD, DATABASE_NAME

# connect to graph database
graph = Neo4jGraph(url=NEO4J_URL, username=NEO4J_USERNAME, password=NEO4J_PASSWORD, database=DATABASE_NAME, 
                   enhanced_schema=True)

# llm interface
llm = ChatOpenAI(temperature=0, model="gpt-5")

# initialize qa chain
chain = GraphQAChain(llm, graph, db_type='Cypher', verbose=True)

# invoke chain
chain.invoke("Give me the driver names and years when the driver on pole position won the Singapore GP.")

prints:

{'write_query': {'query': "MATCH (r:Race)-[:AT]->(ci:Circuit)\nWHERE r.name CONTAINS 'Singapore' OR ci.country = 'Singapore'\nMATCH (car:Car)-[q:QUALIFIED]->(r)\nWHERE q.position = 1\nMATCH (car)-[res:RACED]->(r)\nWHERE res.position = 1\nMATCH (d:Driver)-[:DROVE]->(car)\nRETURN DISTINCT d.forename + ' ' + d.surname AS driverName, r.year AS year\nORDER BY year DESC\nLIMIT 10"}}
{'execute_query': {'result': [{'driverName': 'Lando Norris', 'year': 2024}, {'driverName': 'Carlos Sainz', 'year': 2023}, {'driverName': 'Lewis Hamilton', 'year': 2018}, {'driverName': 'Nico Rosberg', 'year': 2016}, {'driverName': 'Sebastian Vettel', 'year': 2015}, {'driverName': 'Lewis Hamilton', 'year': 2014}, {'driverName': 'Sebastian Vettel', 'year': 2013}, {'driverName': 'Sebastian Vettel', 'year': 2011}, {'driverName': 'Fernando Alonso', 'year': 2010}, {'driverName': 'Lewis Hamilton', 'year': 2009}]}}
{'generate_answer': {'answer': '- 2024: Lando Norris\n- 2023: Carlos Sainz\n- 2018: Lewis Hamilton\n- 2016: Nico Rosberg\n- 2015: Sebastian Vettel\n- 2014: Lewis Hamilton\n- 2013: Sebastian Vettel\n- 2011: Sebastian Vettel\n- 2010: Fernando Alonso\n- 2009: Lewis Hamilton'}}

returns:

{'query': 'Give me the driver names and years when the driver on pole position won the Singapore GP.',
 'result': '- 2024: Lando Norris\n- 2023: Carlos Sainz\n- 2018: Lewis Hamilton\n- 2016: Nico Rosberg\n- 2015: Sebastian Vettel\n- 2014: Lewis Hamilton\n- 2013: Sebastian Vettel\n- 2011: Sebastian Vettel\n- 2010: Fernando Alonso\n- 2009: Lewis Hamilton'}

Comparing Models

Following LLM models were used to invoke the SQL and graph database: gpt-3.5-turbo, gpt-4, gpt-5. Each model-database combination answered following questions in notebooks/cypher_qa.ipynb and notebooks/sql_qa.ipynb:

  • How many drivers are from Brazil?
  • What's the total number of races?
  • Which constructor competed in the most races?
  • When was the first F1 season?
  • Which circuit hosted the most F1 races?
  • What driver won the 1992 Belgian Grand Prix?
  • Which team had the fastest pit stop at the Italian Grand Prix 2022?
  • How many drivers drove more than 100 races?
  • Which driver had the most race retirements?
  • Which constructor won the most driver world championships?
  • What circuits were included in both: the first and the most recent season?
  • How many races had the same driver who became first in qualifying and race?
  • Which teams won more than 3 constructor world championships?
  • What are the top 5 driver nationalities who competed in the most races?
  • What's the driver with the highest number of wins at the same circuit?
  • Which driver won the most world championships?
  • Who won the Monaco Grand Prix in 2020?
  • Who won the Formula 2 championship in 2017?
  • Which seasons had more than 25 races?
  • In which lap did Michael Schumacher overtake Ayrton Senna at the Brazilian Grand Prix 1995?

The results saved in qa_results.csv.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors