ertugrulakben/NeuroCausal-RAG

NeuroCausal RAG

Causality-Aware Retrieval-Augmented Generation
Find what keyword search can't, by understanding why things are connected.

v6.0.0 Python 3.10+ 187 Tests MIT License

Problem • Solution • Quick Start • Search Modes • API • Benchmarks


Research Context: In June 2025, researchers at the University of Illinois Urbana-Champaign published "CC-RAG: Structured Multi-Hop Reasoning via Theme-Based Causal Graphs", a breakthrough paper that brought causal reasoning into RAG systems. The academic world was excited: RAG could finally "understand and connect," not just "find and fetch."

We had already been doing this for two months. NeuroCausal RAG v5.0 was deployed to production in April 2025. The causal engine, multi-hop retrieval, and chain injection were already running in real enterprise environments.

This is how we work: we build for real clients first, battle-test in production, then open-source. Our personal AI system JARVIS has been alive for 5 years and operating as an autonomous agent for 3 years, months before platforms like OpenClaw existed. We plan to open-source that too.

Read more: Our 2025 AI R&D: NeuroCausal RAG, DSGMv2, and 100+ SaaS Projects


The Problem

Classic RAG systems retrieve documents by keyword similarity. Search for "stress" and you get documents containing the word "stress."

But real-world knowledge doesn't work that way:

Stress → Cortisol rises → Sleep disrupted → Attention drops → Workplace accident risk increases

If your system can't see this chain, it misses critical connections.

The academic world noticed this in June 2025 when UIUC researchers published CC-RAG, introducing causal graphs into RAG.

We deployed this to production in April 2025. Two months earlier.

The Solution

NeuroCausal RAG builds a causal knowledge graph on top of your documents and retrieves information by understanding why things are connected, not just what words they share.

| | Classic RAG | NeuroCausal RAG |
|---|---|---|
| Retrieval | Keyword similarity | Cause-effect relationships |
| Search for "stress" | Documents about stress | + Cortisol, sleep, workplace accidents |
| Hops | Single (1-hop) | N-degree (multi-hop) |
| Scoring | Vector distance | Hybrid: Similarity + Causal + PageRank |
| Memory | None | Persistent feedback loop |
| Contradictions | Ignored | Detected and flagged |

Architecture

NeuroCausal RAG v6.0
├── Core Layer
│   ├── Causal Knowledge Graph (NetworkX / Neo4j)
│   ├── Multilingual Embeddings (Sentence-BERT)
│   └── Vector Index (BruteForce / FAISS / Milvus)
│
├── Search & Retrieval
│   ├── Hybrid Retriever (Similarity + Causal + Importance)
│   ├── Multi-Hop Search (N-hop path finding + bridge docs)
│   ├── Search Optimizer (6 adaptive modes)
│   └── Query Decomposer (complex → sub-queries)
│
├── Reasoning
│   ├── Contradiction Detector
│   ├── Temporal Reasoner
│   └── Entity Linker (alias resolution)
│
├── Learning
│   ├── Causal Discovery (semantic + NLI + funnel)
│   ├── Feedback Loop (RLHF)
│   └── Persistent Memory (SQLite)
│
├── Agentic RAG
│   └── LangGraph Self-Correcting Agent
│
└── API & UI
    ├── FastAPI REST API
    └── Streamlit Dashboard

Scoring Formula

Final Score = α × Similarity + β × Causal + γ × Importance

Multi-Hop Decay: hop_score = base_score × (0.7 ^ hop_distance)
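As a rough illustration of the two formulas above, the following sketch computes a hybrid score and then applies the multi-hop decay. The `Weights` class and both function names are hypothetical helpers written for this example, not part of the `neurocausal_rag` package, and the component scores are made-up inputs. The weights are the BALANCED preset from the Search Modes section.

```python
from dataclasses import dataclass

HOP_DECAY = 0.7  # decay base taken from the multi-hop formula above


@dataclass(frozen=True)
class Weights:
    """Hypothetical container for the alpha, beta, gamma coefficients."""
    alpha: float  # weight on similarity
    beta: float   # weight on causal score
    gamma: float  # weight on importance


def final_score(similarity: float, causal: float, importance: float, w: Weights) -> float:
    """Final Score = alpha * Similarity + beta * Causal + gamma * Importance."""
    return w.alpha * similarity + w.beta * causal + w.gamma * importance


def hop_score(base_score: float, hop_distance: int) -> float:
    """Attenuate a score by 0.7 for every hop away from the direct match."""
    return base_score * HOP_DECAY ** hop_distance


# BALANCED preset; the three component scores are illustrative values only.
balanced = Weights(alpha=0.5, beta=0.3, gamma=0.2)
direct = final_score(similarity=0.8, causal=0.5, importance=0.4, w=balanced)
two_hops_away = hop_score(direct, hop_distance=2)
print(round(direct, 3), round(two_hops_away, 3))
```

With these inputs a document reached through two hops keeps roughly half of its base score (0.7² = 0.49), which is why distant nodes need a strong score to outrank direct matches.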

Quick Start

Installation

git clone https://github.com/ertugrulakben/NeuroCausal-RAG.git
cd NeuroCausal-RAG
pip install -r requirements.txt

Basic Usage

from neurocausal_rag import NeuroCausalRAG

rag = NeuroCausalRAG()

# Add documents
rag.add_document("cement", "Cement production is responsible for 8% of global CO2 emissions.")
rag.add_document("co2", "CO2 is the primary greenhouse gas driving climate change.")
rag.add_document("warming", "Global warming causes sea level rise and extreme weather.")

# Add causal links
rag.add_causal_link("cement", "co2", "causes")
rag.add_causal_link("co2", "warming", "causes")

# Search: finds cement even though the query doesn't mention it
results = rag.search("What causes global warming?")
# → Returns: co2, warming, AND cement (via causal chain)

Multi-Hop Search

from neurocausal_rag.search import create_multi_hop_retriever

# `graph` and `embedding` are the causal graph and embedding engine
# instances, set up beforehand (not shown here).
retriever = create_multi_hop_retriever(graph, embedding, max_hops=3)
results = retriever.search("How does cement affect sea levels?")

# Discovered chain:
# Cement Production → CO2 Emissions → Global Warming → Sea Level Rise
explanation = retriever.explain_connection("cement", "warming")
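To make the idea of N-hop path finding concrete, here is a minimal standalone sketch that walks a toy causal graph with breadth-first search under a hop budget. It is an assumption about how such a search could work, not the library's implementation: the `CAUSES` dict and `find_causal_chain` function are hypothetical names, and the node identifiers simply mirror the cement example.

```python
from collections import deque

# Toy causal graph as an adjacency dict: cause -> list of direct effects.
# Node names are illustrative and mirror the cement example above.
CAUSES = {
    "cement": ["co2"],
    "co2": ["warming"],
    "warming": ["sea_level_rise"],
}


def find_causal_chain(graph, source, target, max_hops=3):
    """Return the shortest cause-effect path from source to target, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted on this branch
        for effect in graph.get(node, []):
            if effect not in visited:
                visited.add(effect)
                queue.append(path + [effect])
    return None


chain = find_causal_chain(CAUSES, "cement", "sea_level_rise", max_hops=3)
print(" -> ".join(chain))
```

Breadth-first search returns the chain with the fewest hops, which fits the decay formula: shorter chains are penalized less. Lowering `max_hops` to 2 makes the same query return `None`, since the target sits three hops away.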

Docker

docker-compose up -d
# API: http://localhost:8000
# UI: http://localhost:8501

Search Modes

6 preset modes for different retrieval strategies:

| Mode | α (Similarity) | β (Causal) | γ (Importance) | Best For |
|---|---|---|---|---|
| BALANCED | 0.5 | 0.3 | 0.2 | General purpose |
| ENCYCLOPEDIA | 0.7 | 0.2 | 0.1 | Factual queries |
| DETECTIVE | 0.3 | 0.5 | 0.2 | Cause-effect investigation |
| HUB | 0.3 | 0.2 | 0.5 | Finding central documents |
| EXPLORER | 0.4 | 0.3 | 0.3 | Open-ended research |
| FACT_CHECKER | 0.6 | 0.3 | 0.1 | Verification tasks |

from neurocausal_rag.search import create_optimizer

optimizer = create_optimizer(graph, embedding)
results = optimizer.search("Why did the bridge collapse?", mode="DETECTIVE")
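The effect of a mode is easiest to see by re-ranking the same candidates under different weights. The sketch below copies the preset weights from the table and applies them to three made-up documents; `MODE_WEIGHTS`, `CANDIDATES`, and `rank` are hypothetical names for this illustration and say nothing about how the optimizer is implemented internally.

```python
# Preset weights copied from the Search Modes table: (alpha, beta, gamma).
MODE_WEIGHTS = {
    "BALANCED": (0.5, 0.3, 0.2),
    "ENCYCLOPEDIA": (0.7, 0.2, 0.1),
    "DETECTIVE": (0.3, 0.5, 0.2),
    "HUB": (0.3, 0.2, 0.5),
    "EXPLORER": (0.4, 0.3, 0.3),
    "FACT_CHECKER": (0.6, 0.3, 0.1),
}

# Hypothetical per-document component scores: (similarity, causal, importance).
CANDIDATES = {
    "keyword_match": (0.9, 0.1, 0.2),
    "causal_ancestor": (0.4, 0.9, 0.3),
    "hub_document": (0.5, 0.3, 0.9),
}


def rank(mode):
    """Order candidate documents by their hybrid score under the given mode."""
    alpha, beta, gamma = MODE_WEIGHTS[mode]
    scored = {
        doc: alpha * sim + beta * causal + gamma * importance
        for doc, (sim, causal, importance) in CANDIDATES.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


for mode in MODE_WEIGHTS:
    print(f"{mode:<13}", rank(mode))
```

With these inputs ENCYCLOPEDIA puts the keyword match first, DETECTIVE promotes the causal ancestor, and HUB promotes the highly connected document, which matches the "Best For" column.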

API

Full REST API via FastAPI:

uvicorn neurocausal_rag.api.app:create_app --factory --host 0.0.0.0 --port 8000

Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/search | Search with causal reasoning |
| POST | /api/v1/documents | Add documents |
| GET | /api/v1/documents | List documents |
| POST | /api/v1/documents/links | Add causal links |
| POST | /api/v1/agent/query | Agentic RAG query |
| POST | /api/v1/feedback | Submit feedback |
| POST | /api/v1/discovery | Auto-discover causal links |
| GET | /api/v1/graph/stats | Graph statistics |
| POST | /api/v1/graph/chain | Get causal chain |
| GET | /api/v1/health | Health check |

curl -X POST http://localhost:8000/api/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "What causes global warming?", "top_k": 5, "mode": "DETECTIVE"}'
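The same search call can be prepared from Python using only the standard library. This is a sketch that mirrors the curl command above: the endpoint path and the `query`, `top_k`, and `mode` fields come from that command, while `build_search_request` is a hypothetical helper name. The response schema is not documented here, so the example only builds the request and leaves sending it as a commented-out step.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default API address from the Docker section


def build_search_request(query, top_k=5, mode="DETECTIVE"):
    """Build (but do not send) a POST request for /api/v1/search."""
    payload = {"query": query, "top_k": top_k, "mode": mode}
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/search",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("What causes global warming?")
print(request.get_method(), request.full_url)

# To send it against a running server, decode the body as JSON:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```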

Benchmarks

Case Study: The Invisible Connection

Query: "How do greenhouse gases cause global warming?"

| Metric | Classic RAG | NeuroCausal RAG |
|---|---|---|
| Search Time | 37 ms | 22 ms |
| Documents Found | Greenhouse effect, Gases | + Cement Production |
| Causal Score | 0.00 | 1.00 |
| Multi-Hop | None | 3-hop chain |

Discovered chain:

Cement Production → CO2 Emissions → Greenhouse Gas → Global Warming

The word "cement" appears nowhere in the query, but the causal chain reveals the connection.

vs. CC-RAG (UIUC, June 2025)

| Feature | CC-RAG (June 2025) | NeuroCausal RAG (April 2025) |
|---|---|---|
| Causal Graph | DAG structure | NetworkX + Neo4j |
| Multi-Hop | Theme-based chaining | N-hop + bridge documents |
| Bidirectional Search | Yes | Yes |
| Memory System | No | Persistent (SQLite) |
| Query Decomposition | No | Sub-query system |
| Contradiction Detection | No | Yes |
| Temporal Reasoning | No | Yes |
| Entity Linking | No | Yes (alias resolution) |
| Enterprise Ready | Academic | Production deployed |
| Published | June 2025 | April 2025 |

Testing

pytest tests/ -v

# With coverage
pytest tests/ --cov=neurocausal_rag --cov-report=html

Test Distribution (v6.1)
├── Core (graph, node, edge): 35 tests
├── Search (retriever, multi_hop, optimizer, decomposer): 66 tests
├── Learning (discovery, entity, temporal, contradiction): 42 tests
├── Memory: 24 tests
├── Integration: 20 tests
├── API Routes: 58 tests
├── Config Validation: 70 tests
├── LLM Client: 34 tests
└── Imports & Exports: 33 tests
─────────────────────────────
Total: 382 tests, 0 failures

Project Structure

neurocausal_rag/
├── core/           # Graph engine, nodes, edges
├── embedding/      # Sentence-BERT multilingual
├── search/         # Retriever, multi-hop, optimizer, decomposer
├── learning/       # Causal discovery, feedback, pipeline
├── entity/         # Entity linking, NER
├── reasoning/      # Contradiction detection, temporal reasoning
├── memory/         # Persistent memory store
├── agents/         # LangGraph agentic RAG
├── api/            # FastAPI REST endpoints
├── llm/            # LLM client (OpenAI)
├── visualization/  # Graph visualization (PyVis)
└── ui/             # Streamlit components

Configuration

cp .env.example .env
# Set your API keys in .env

Roadmap

  • Causal knowledge graph
  • Multi-hop retrieval
  • 6 search modes
  • Contradiction detection
  • Temporal reasoning
  • Entity linking
  • Persistent memory (RLHF)
  • REST API (FastAPI)
  • Agentic RAG (LangGraph)
  • Enterprise backends (Neo4j, Milvus)
  • 187 tests
  • Batch processing
  • Advanced UI
  • PyPI package

Built By

Ertugrul Akben, AI & Systems Strategist

License

MIT


Because knowing "what" is not enough: you need to know "why."

About

Causality-aware RAG with multi-hop retrieval, contradiction detection, and temporal reasoning. 382 tests, 11K+ LOC. Deployed 2 months before CC-RAG (UIUC).
