fix: pass entity_chunks_storage and relation_chunks_storage to merge_nodes_and_edges (closes #241) by Abdeltoto · Pull Request #260 · HKUDS/RAG-Anything

Abdeltoto · 2026-04-22T03:30:57Z

Summary

Closes #241.

Three call sites of lightrag.operate.merge_nodes_and_edges in the multimodal ingestion path were not forwarding the chunk-tracking storages, and one of them was also dropping the document-level entity / relation storages:

| File | Method | Missing arguments before this PR |
| --- | --- | --- |
| `raganything/processor.py` | `_process_multimodal_content_individual` | `entity_chunks_storage`, `relation_chunks_storage` |
| `raganything/processor.py` | `_batch_merge_lightrag_style_type_aware` | `entity_chunks_storage`, `relation_chunks_storage` |
| `raganything/modalprocessors.py` | `BaseModalProcessor._process_chunk_for_extraction` | `full_entities_storage`, `full_relations_storage`, `entity_chunks_storage`, `relation_chunks_storage` |

Because all four parameters default to None, the calls succeeded silently — but entity-to-chunk and relation-to-chunk mappings for multimodal entities (images, tables, equations) were never persisted to kv_store_entity_chunks.json / kv_store_relation_chunks.json. Text-only ingestion was unaffected because LightRAG itself populates those mappings during its own pipeline.
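To illustrate why the omission went unnoticed, here is a minimal sketch of the failure mode. It is not LightRAG's code: `InMemoryKV` and `merge_sketch` are hypothetical stand-ins, and the only behaviour assumed from the description above is that a storage parameter defaulting to `None` causes the write to be skipped without an error.

```python
import asyncio
from typing import Optional


class InMemoryKV:
    """Hypothetical stand-in for a key-value storage backend."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    async def upsert(self, records: dict[str, dict]) -> None:
        self.data.update(records)


async def merge_sketch(
    entity_to_chunks: dict[str, list[str]],
    entity_chunks_storage: Optional[InMemoryKV] = None,
) -> int:
    """Toy merge step: persists chunk mappings only when a storage is supplied."""
    if entity_chunks_storage is None:
        # No exception is raised; the mapping is simply not written anywhere.
        return 0
    await entity_chunks_storage.upsert(
        {name: {"chunk_ids": ids} for name, ids in entity_to_chunks.items()}
    )
    return len(entity_to_chunks)


mappings = {"Figure 3": ["chunk-img-001"]}
store = InMemoryKV()

# Both calls succeed; only the second one persists anything.
written_without = asyncio.run(merge_sketch(mappings))
written_with = asyncio.run(merge_sketch(mappings, entity_chunks_storage=store))
```

Both calls return normally, which is why the gap would only show up when inspecting the persisted stores rather than in logs or tracebacks.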

What this PR does

Forwards self.lightrag.full_entities, self.lightrag.full_relations, self.lightrag.entity_chunks, and self.lightrag.relation_chunks to merge_nodes_and_edges, mirroring the way LightRAG invokes the function during its native text ingestion path. The diff matches the suggested fix in #241 exactly.
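The forwarding can be pictured roughly as follows. This is a sketch rather than the actual diff: `storage_kwargs` is a hypothetical helper introduced only for illustration, and the `SimpleNamespace` object stands in for a real LightRAG instance. The attribute and parameter names are the ones listed in this description.

```python
from types import SimpleNamespace


def storage_kwargs(lightrag) -> dict:
    """Hypothetical helper: map LightRAG storage attributes to the keyword
    arguments of merge_nodes_and_edges named in this PR description."""
    return {
        "full_entities_storage": lightrag.full_entities,
        "full_relations_storage": lightrag.full_relations,
        "entity_chunks_storage": lightrag.entity_chunks,
        "relation_chunks_storage": lightrag.relation_chunks,
    }


# Stand-in for a LightRAG instance; real instances hold storage objects here.
fake_lightrag = SimpleNamespace(
    full_entities="full-entities-store",
    full_relations="full-relations-store",
    entity_chunks="entity-chunks-store",
    relation_chunks="relation-chunks-store",
)

kwargs = storage_kwargs(fake_lightrag)

# At a call site the dict would be unpacked next to the existing arguments:
#     await merge_nodes_and_edges(..., **storage_kwargs(self.lightrag))
```

The PR itself passes the four arguments inline at each call site; a shared helper like this is just one way to keep the three sites from drifting apart again.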

Verification

- Confirmed against the `lightrag.operate.merge_nodes_and_edges` signature: the parameters `entity_chunks_storage`, `relation_chunks_storage`, `full_entities_storage`, and `full_relations_storage` are all optional `BaseKVStorage` slots, so passing them is strictly additive.
- Confirmed that `LightRAG` exposes the four storages as instance attributes (`self.full_entities`, `self.full_relations`, `self.entity_chunks`, `self.relation_chunks`) at construction time in `lightrag/lightrag.py`.
- `BaseModalProcessor.__init__` already retains a `self.lightrag` reference, so the new forwarding lines are safe with no constructor change.

Backward compatibility

- No public API changes.
- No new dependencies.
- Behaviour is strictly additive: chunk-tracking storages now receive the same data they would have received in pure text ingestion.

Test plan

- `ruff format` and `ruff check --ignore=E402` pass on the touched files.
- Maintainer-side: ingest a PDF with images/tables and confirm `kv_store_entity_chunks.json` / `kv_store_relation_chunks.json` now contain entries whose IDs match the multimodal chunks.
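The maintainer-side check could be scripted along these lines. This is a sketch under stated assumptions: the internal layout of the two JSON files is not described in this PR, so the helper simply collects every string found anywhere in the file and reports which expected chunk IDs are absent. `collect_strings`, `missing_chunk_ids`, and the example chunk IDs and working-directory path are all hypothetical.

```python
import json
from pathlib import Path


def collect_strings(node) -> set[str]:
    """Gather every string key and value found anywhere in a JSON document."""
    found: set[str] = set()
    if isinstance(node, str):
        found.add(node)
    elif isinstance(node, dict):
        for key, value in node.items():
            found.add(key)
            found |= collect_strings(value)
    elif isinstance(node, list):
        for item in node:
            found |= collect_strings(item)
    return found


def missing_chunk_ids(store_path: Path, expected_ids: set[str]) -> set[str]:
    """Return the expected chunk IDs that do not appear in the given store file."""
    data = json.loads(store_path.read_text(encoding="utf-8"))
    return expected_ids - collect_strings(data)


# Example usage (hypothetical working directory and chunk IDs):
#     missing = missing_chunk_ids(
#         Path("rag_storage/kv_store_entity_chunks.json"),
#         {"chunk-img-001", "chunk-table-002"},
#     )
#     print("missing:", sorted(missing))
```

An empty result for both store files would indicate that the multimodal chunk IDs were persisted; a layout-aware check would be stricter but depends on details not stated here.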

Many thanks to @ashah1992 for the precise diagnosis in #241.

…nodes_and_edges (closes HKUDS#241)

During multimodal ingestion, three call sites of `lightrag.operate.merge_nodes_and_edges` were missing the `entity_chunks_storage` and `relation_chunks_storage` arguments (and additionally `full_entities_storage` / `full_relations_storage` in `BaseModalProcessor._process_chunk_for_extraction`). Because these parameters default to `None`, calls succeeded silently but entity-to-chunk and relation-to-chunk mappings for multimodal entities were never persisted to `kv_store_entity_chunks.json` / `kv_store_relation_chunks.json`, degrading retrieval quality for image and table content.

Forward all four storage instances from the wrapped LightRAG instance, matching the way LightRAG itself invokes the function during text ingestion.

Made-with: Cursor

Abdeltoto · 2026-04-22T03:51:59Z

Closing this PR in favor of #247 (@sjhddh) and #250 (@peterCheng123321), which were both opened a few hours before mine and which I hadn't noticed when I pushed this — apologies for the duplicate noise.

For maintainers triaging the three: #250 is the most complete because it also forwards doc_id to BaseModalProcessor._process_chunk_for_extraction, which #247 and this PR both miss. #247 has the most thorough description and groups the new kwargs more uniformly. Either approach fully resolves #241; an ideal merge would be #250's coverage with #247's kwarg ordering.

Reviews left on both. Thanks for the great work, folks.

Abdeltoto marked this pull request as ready for review April 22, 2026 03:45

This was referenced Apr 22, 2026

fix: pass entity_chunks/relation_chunks storages to merge_nodes_and_edges (fixes #241) #247

Closed

fix: pass entity_chunks_storage and relation_chunks_storage to all merge_nodes_and_edges calls #250

Merged

Abdeltoto closed this Apr 22, 2026

Abdeltoto wants to merge 1 commit into HKUDS:main from Abdeltoto:fix/merge-nodes-edges-storage-params
