feat: Entity extraction uses JSON structured output instead of delimiter-based text #2684
Closed
MrGidea wants to merge 4 commits into HKUDS:main from
…ter-based text
- Add EntityExtractionResult Pydantic model for structured JSON output
- Add JSON-mode prompt templates for entity/relationship extraction
- Add _process_json_extraction_result() JSON parser in extraction pipeline
- Add entity_extraction_use_json config option, default True
- Add extraction_max_tokens config to prevent output truncation
- OpenAI: use response_format json_object with auto-fallback retry
- Ollama/Gemini: use native JSON mode for entity extraction
- Other providers: pop entity_extraction kwarg for compatibility
- Cache rebuild auto-detects JSON vs delimiter format
- Skip relationships with empty descriptions to prevent merge errors
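The parsing and format-detection steps listed in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the PR defines EntityExtractionResult as a Pydantic model, whereas this sketch uses stdlib dataclasses so it runs without dependencies. The JSON field names (entities, relationships, name, type, description, source, target) and the helper names parse_json_extraction and detect_format are hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    type: str
    description: str


@dataclass
class Relationship:
    source: str
    target: str
    description: str


@dataclass
class ExtractionResult:
    # Stand-in for the PR's Pydantic model; the field layout is an assumption.
    entities: list[Entity] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)


def parse_json_extraction(raw: str) -> ExtractionResult:
    """Parse a JSON extraction reply into typed records."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    result = ExtractionResult()
    for item in data.get("entities", []):
        name = str(item.get("name", "")).strip()
        if not name:
            continue  # an entity without a name cannot be merged later
        result.entities.append(
            Entity(
                name=name,
                type=str(item.get("type", "UNKNOWN")),
                description=str(item.get("description", "")).strip(),
            )
        )
    for item in data.get("relationships", []):
        description = str(item.get("description", "")).strip()
        if not description:
            continue  # skip empty descriptions, as the commit message describes
        result.relationships.append(
            Relationship(
                source=str(item.get("source", "")).strip(),
                target=str(item.get("target", "")).strip(),
                description=description,
            )
        )
    return result


def detect_format(cached: str) -> str:
    """Guess whether a cached reply is JSON or delimiter-based text."""
    text = cached.strip()
    if text.startswith("{"):
        try:
            json.loads(text)
            return "json"
        except json.JSONDecodeError:
            pass  # looks like JSON but is not; treat as delimiter text
    return "delimiter"
```

Trying a real parse in detect_format, rather than only checking the first character, keeps a truncated JSON reply from being routed to the JSON path.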
…ndles truncation)
Superseded by a new combined PR that includes the JSON structured extraction changes together with the newer multimodal and role-based pipeline updates, while excluding the entity disambiguation experiment.
This was referenced Mar 20, 2026
Summary
- … response_format: json_object), Ollama (format: json), and Gemini (response_mime_type)
- … response_format, automatically retry without it (relying on JSON prompt + json_repair)
- ENTITY_EXTRACTION_USE_JSON env var (default: true), cache rebuild auto-detects JSON vs delimiter format
- EXTRACTION_MAX_TOKENS config to prevent output truncation (many APIs default to only 1024)

Changed Files
- lightrag/types.py - Add EntityExtractionResult Pydantic model
- lightrag/prompt.py - Add JSON-mode prompt templates and examples
- lightrag/operate.py - Add _process_json_extraction_result() parser, modify extraction pipeline
- lightrag/utils.py - Pass entity_extraction flag through use_llm_func_with_cache
- lightrag/lightrag.py - Add entity_extraction_use_json and extraction_max_tokens config
- lightrag/llm/openai.py - response_format: json_object with auto-fallback retry
- lightrag/llm/ollama.py - format="json" for entity extraction
- lightrag/llm/gemini.py - response_mime_type="application/json" for entity extraction
- lightrag/llm/*.py (others) - Pop entity_extraction kwarg for compatibility

Test Plan
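One way to exercise the auto-fallback retry from the summary without network access is a stubbed completion call. The sketch below is an illustration under assumptions rather than the PR's actual tests: complete_with_json_fallback, stub_without_json_mode, and recording_stub are hypothetical names, the broad except clause stands in for whatever provider error the real code catches, and plain json.loads is used where the PR relies on json_repair.

```python
import json
from typing import Any, Callable


def complete_with_json_fallback(
    complete: Callable[..., str], prompt: str, **kwargs: Any
) -> dict:
    """Request JSON mode first; if that fails, retry without response_format."""
    try:
        reply = complete(prompt, response_format={"type": "json_object"}, **kwargs)
    except Exception:
        # Assumption: the endpoint rejected response_format, so the retry
        # relies on the JSON instructions in the prompt alone.
        reply = complete(prompt, **kwargs)
    return json.loads(reply)


def stub_without_json_mode(prompt: str, **kwargs: Any) -> str:
    """Hypothetical stand-in for an endpoint that rejects response_format."""
    if "response_format" in kwargs:
        raise ValueError("response_format is not supported")
    return '{"entities": [], "relationships": []}'


calls: list[list[str]] = []


def recording_stub(prompt: str, **kwargs: Any) -> str:
    # Record which keyword arguments each attempt carried.
    calls.append(sorted(kwargs))
    return stub_without_json_mode(prompt, **kwargs)


result = complete_with_json_fallback(recording_stub, "Extract entities as JSON.")
```

After the call, calls holds two entries: the first attempt carried response_format and the retry did not. Real code would likely narrow the except clause to the provider's "unsupported parameter" error so that unrelated failures are not retried silently.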