feat: Entity extraction uses JSON structured output instead of delimiter-based text by MrGidea · Pull Request #2684 · HKUDS/LightRAG

MrGidea · 2026-02-07T08:08:04Z

Summary

Replace delimiter-based entity extraction with JSON structured output, significantly improving extraction quality and compatibility with smaller models
Support native JSON mode for OpenAI-compatible APIs (response_format: json_object), Ollama (format: json), and Gemini (response_mime_type)
Auto-fallback: if a provider doesn't support response_format, automatically retry without it (relying on JSON prompt + json_repair)
Backward compatible: configurable via ENTITY_EXTRACTION_USE_JSON env var (default: true), cache rebuild auto-detects JSON vs delimiter format
Add EXTRACTION_MAX_TOKENS config to prevent output truncation (many APIs default to only 1024 output tokens)
Skip relationships with empty descriptions to prevent merge errors
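
The auto-fallback described above could look roughly like the following. This is a minimal sketch, not the code in this PR: `UnsupportedParameterError`, `complete_with_json_fallback`, and `fake_provider` are hypothetical names, the `entities`/`relationships` keys are assumed, and the standard `json` module stands in for `json_repair`.

```python
import json
from typing import Any, Callable


class UnsupportedParameterError(Exception):
    """Hypothetical error raised when a provider rejects response_format."""


def complete_with_json_fallback(
    call_llm: Callable[..., str], prompt: str
) -> dict[str, Any]:
    """Try native JSON mode first; retry without it if the provider refuses."""
    try:
        raw = call_llm(prompt, response_format={"type": "json_object"})
    except UnsupportedParameterError:
        # Fallback: rely only on the JSON instructions already in the prompt.
        raw = call_llm(prompt)
    # The PR relies on json_repair here; strict json.loads is a stand-in.
    return json.loads(raw)


def fake_provider(prompt: str, **kwargs: Any) -> str:
    """Stand-in provider that does not support response_format."""
    if "response_format" in kwargs:
        raise UnsupportedParameterError("response_format not supported")
    return '{"entities": [], "relationships": []}'


result = complete_with_json_fallback(fake_provider, "Extract entities as JSON.")
```

In this sketch the first call fails, the retry succeeds, and `result` holds the parsed dictionary. A real provider wrapper would need to recognise the specific error its API returns for an unsupported parameter.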

Changed Files

lightrag/types.py - Add EntityExtractionResult Pydantic model
lightrag/prompt.py - Add JSON-mode prompt templates and examples
lightrag/operate.py - Add _process_json_extraction_result() parser, modify extraction pipeline
lightrag/utils.py - Pass entity_extraction flag through use_llm_func_with_cache
lightrag/lightrag.py - Add entity_extraction_use_json and extraction_max_tokens config
lightrag/llm/openai.py - response_format: json_object with auto-fallback retry
lightrag/llm/ollama.py - format="json" for entity extraction
lightrag/llm/gemini.py - response_mime_type="application/json" for entity extraction
lightrag/llm/*.py (others) - Pop entity_extraction kwarg for compatibility
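
To illustrate how a structured result model and a JSON parser might fit together, here is a small sketch. It uses standard-library dataclasses as a stand-in for the Pydantic model, and every field name (`name`, `type`, `description`, `source`, `target`) as well as the `parse_extraction` helper is an assumption for illustration, not the schema from lightrag/types.py.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    type: str
    description: str


@dataclass
class Relationship:
    source: str
    target: str
    description: str


@dataclass
class ExtractionResult:
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)


def parse_extraction(payload: dict) -> ExtractionResult:
    """Build a typed result, skipping relationships with no description."""
    entities = [Entity(**e) for e in payload.get("entities", [])]
    relationships = [
        Relationship(**r)
        for r in payload.get("relationships", [])
        # Missing, None, or whitespace-only descriptions are dropped.
        if (r.get("description") or "").strip()
    ]
    return ExtractionResult(entities, relationships)


sample = {
    "entities": [{"name": "Alice", "type": "person", "description": "An engineer"}],
    "relationships": [
        {"source": "Alice", "target": "Acme", "description": "Works at Acme"},
        {"source": "Alice", "target": "Bob", "description": "   "},
    ],
}
parsed = parse_extraction(sample)
```

Filtering before constructing `Relationship` means a record without a usable description never reaches the merge step, which is the behaviour the summary describes.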

Test Plan

Tested with Moonshot API (OpenAI-compatible, direct connection)
Tested with Google Gemini 2.0 Flash via OpenRouter
Tested with DeepSeek V3 via OpenRouter
Tested with Qwen 2.5 72B via OpenRouter
Tested with Meta Llama 3.1 70B via OpenRouter
Verified JSON parsing, gleaning, cache rebuild, and graph construction
Verified auto-fallback when response_format is not supported
Verified backward compatibility (ENTITY_EXTRACTION_USE_JSON=false)
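
The backward-compatibility behaviour can be pictured with a short sketch. Only the ENTITY_EXTRACTION_USE_JSON variable name and its default of true come from this PR; the helper names, the accepted truthy spellings, and the sample delimiter string are assumptions, and the format check is stricter than a json_repair-based one would be.

```python
import json
import os
from typing import Mapping


def use_json_extraction(env: Mapping[str, str] = os.environ) -> bool:
    """Read the feature flag; defaults to JSON extraction when unset."""
    value = env.get("ENTITY_EXTRACTION_USE_JSON", "true")
    return value.strip().lower() in {"1", "true", "yes"}


def looks_like_json(cached: str) -> bool:
    """Guess whether a cached extraction result is JSON or delimiter text."""
    text = cached.strip()
    if not text.startswith("{"):
        return False
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        # Truncated or malformed JSON is treated as non-JSON in this sketch.
        return False


json_cached = '{"entities": [], "relationships": []}'
# Illustrative delimiter-style record; the real delimiter may differ.
delimiter_cached = '("entity"<|>"Alice"<|>"person"<|>"An engineer")'
```

A cache rebuild could call `looks_like_json` on each stored response and route it to the matching parser, so older delimiter-format entries remain usable after the switch.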

…ter-based text

- Add EntityExtractionResult Pydantic model for structured JSON output
- Add JSON-mode prompt templates for entity/relationship extraction
- Add _process_json_extraction_result() JSON parser in extraction pipeline
- Add entity_extraction_use_json config option, default True
- Add extraction_max_tokens config to prevent output truncation
- OpenAI: use response_format json_object with auto-fallback retry
- Ollama/Gemini: use native JSON mode for entity extraction
- Other providers: pop entity_extraction kwarg for compatibility
- Cache rebuild auto-detects JSON vs delimiter format
- Skip relationships with empty descriptions to prevent merge errors

MrGidea · 2026-03-08T08:06:14Z

Superseded by a new combined PR that includes the JSON structured extraction changes together with the newer multimodal and role-based pipeline updates, while excluding the entity disambiguation experiment.

MrGidea added 4 commits February 7, 2026 15:58

fix: resolve CI linting - add extraction_max_tokens definition, apply…

9416981

… ruff formatting

fix: apply ruff formatting to test file

1b52cc0

refactor: remove extraction_max_tokens (not essential, json_repair ha…

f025a06

…ndles truncation)

danielaskdd added the enhancement and server labels Feb 9, 2026

danielaskdd added the tracked label Feb 23, 2026

MrGidea closed this Mar 8, 2026

MrGidea mentioned this pull request Mar 8, 2026

feat: integrate structured extraction and multimodal role-based pipeline #2756

Closed

5 tasks

BdM-15 mentioned this pull request Apr 30, 2026

Migrate entity extraction from tuple-delimited to JSON structured output (LightRAG dev branch) BdM-15/proj-theseus#124

Closed

35 tasks

MrGidea wants to merge 4 commits into HKUDS:main from MrGidea:feat/json-structured-extraction
