fix: pass entity_chunks_storage and relation_chunks_storage to merge_nodes_and_edges (closes #241) #260

Closed
Abdeltoto wants to merge 1 commit into HKUDS:main from Abdeltoto:fix/merge-nodes-edges-storage-params

@Abdeltoto

Contributor

Summary

Closes #241.

Three call sites of lightrag.operate.merge_nodes_and_edges in the multimodal ingestion path were not forwarding the chunk-tracking storages, and one of them was also dropping the document-level entity / relation storages:

| File | Method | Missing arguments before this PR |
| --- | --- | --- |
| raganything/processor.py | _process_multimodal_content_individual | entity_chunks_storage, relation_chunks_storage |
| raganything/processor.py | _batch_merge_lightrag_style_type_aware | entity_chunks_storage, relation_chunks_storage |
| raganything/modalprocessors.py | BaseModalProcessor._process_chunk_for_extraction | full_entities_storage, full_relations_storage, entity_chunks_storage, relation_chunks_storage |

Because all four parameters default to None, the calls succeeded silently — but entity-to-chunk and relation-to-chunk mappings for multimodal entities (images, tables, equations) were never persisted to kv_store_entity_chunks.json / kv_store_relation_chunks.json. Text-only ingestion was unaffected because LightRAG itself populates those mappings during its own pipeline.
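The silent failure described above can be illustrated with a toy model. This is a minimal sketch, not LightRAG's code: `DictKV` and `merge_sketch` are hypothetical stand-ins, and the `{"chunk_ids": [...]}` record shape is an assumption made for the illustration.

```python
import asyncio


class DictKV:
    """Hypothetical in-memory stand-in for a key-value storage."""

    def __init__(self):
        self.data = {}

    async def upsert(self, items):
        self.data.update(items)


async def merge_sketch(chunk_entities, entity_chunks_storage=None):
    """Toy merge step: chunk tracking is skipped, not failed, when storage is None."""
    merged = 0
    for chunk_id, entities in chunk_entities.items():
        for name in entities:
            merged += 1
            if entity_chunks_storage is not None:
                existing = entity_chunks_storage.data.get(name, {"chunk_ids": []})
                ids = existing["chunk_ids"] + [chunk_id]
                await entity_chunks_storage.upsert({name: {"chunk_ids": ids}})
    return merged


chunk_entities = {
    "chunk-img-1": ["Figure 2"],
    "chunk-tbl-1": ["Table 1", "Figure 2"],
}

# Storage omitted: the call succeeds, but no entity-to-chunk mapping is recorded.
store_before = DictKV()
merged_before = asyncio.run(merge_sketch(chunk_entities))

# Storage forwarded: the same call also records the mappings.
store_after = DictKV()
merged_after = asyncio.run(
    merge_sketch(chunk_entities, entity_chunks_storage=store_after)
)
```

Both calls report the same number of merged entities, which is why the omission is easy to miss without inspecting the storage itself.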

What this PR does

Forwards self.lightrag.full_entities, self.lightrag.full_relations, self.lightrag.entity_chunks, and self.lightrag.relation_chunks to merge_nodes_and_edges, mirroring the way LightRAG invokes the function during its native text ingestion path. The diff matches the suggested fix in #241 exactly.
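One way to picture the forwarding is as a mapping from parameter names to instance attributes. The sketch below is illustrative only: `storage_kwargs` is a hypothetical helper that is not part of this PR's diff, and `fake_lightrag` is a stand-in object rather than a real LightRAG instance. Only the four parameter and attribute names come from the description above.

```python
from types import SimpleNamespace

# Parameter name on merge_nodes_and_edges -> attribute name on the LightRAG
# instance, as listed in the PR description.
STORAGE_PARAM_TO_ATTR = {
    "full_entities_storage": "full_entities",
    "full_relations_storage": "full_relations",
    "entity_chunks_storage": "entity_chunks",
    "relation_chunks_storage": "relation_chunks",
}


def storage_kwargs(lightrag):
    """Hypothetical helper: collect the four storages as keyword arguments."""
    return {
        param: getattr(lightrag, attr)
        for param, attr in STORAGE_PARAM_TO_ATTR.items()
    }


# Stand-in for a LightRAG instance; the string values are placeholders for
# the real storage objects.
fake_lightrag = SimpleNamespace(
    full_entities="full-entities-store",
    full_relations="full-relations-store",
    entity_chunks="entity-chunks-store",
    relation_chunks="relation-chunks-store",
)

kwargs = storage_kwargs(fake_lightrag)
```

At a call site the result would be spread into the existing call as `**storage_kwargs(self.lightrag)`. Because `getattr` is used without a default, a missing attribute raises `AttributeError` instead of silently falling back to `None`.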

Verification

  • Confirmed against lightrag.operate.merge_nodes_and_edges signature (parameters entity_chunks_storage, relation_chunks_storage, full_entities_storage, full_relations_storage are all optional BaseKVStorage slots — passing them is strictly additive).
  • Confirmed LightRAG exposes the four storages as instance attributes (self.full_entities, self.full_relations, self.entity_chunks, self.relation_chunks) at construction time in lightrag/lightrag.py.
  • BaseModalProcessor.__init__ already retains a self.lightrag reference, so the new forwarding lines are safe with no constructor change.
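The signature check in the first bullet can be reproduced with the standard library's `inspect` module. In this sketch `merge_stand_in` is a hypothetical function that only mimics the four optional parameters named above; it is not the real signature of `merge_nodes_and_edges`.

```python
import inspect


def params_defaulting_to_none(func, names):
    """Return the subset of `names` that are parameters of `func` defaulting to None."""
    parameters = inspect.signature(func).parameters
    return [
        name
        for name in names
        if name in parameters and parameters[name].default is None
    ]


async def merge_stand_in(
    chunk_results,
    full_entities_storage=None,
    full_relations_storage=None,
    entity_chunks_storage=None,
    relation_chunks_storage=None,
):
    """Hypothetical stand-in with the four optional storage parameters."""
    return None


EXPECTED = [
    "full_entities_storage",
    "full_relations_storage",
    "entity_chunks_storage",
    "relation_chunks_storage",
]

found = params_defaulting_to_none(merge_stand_in, EXPECTED)
```

With LightRAG installed, the same helper could be pointed at `lightrag.operate.merge_nodes_and_edges` in place of the stand-in.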

Backward compatibility

  • No public API changes.
  • No new dependencies.
  • Behaviour is strictly additive: chunk-tracking storages now receive the same data they would have received in pure text ingestion.

Test plan

  • ruff format and ruff check --ignore=E402 pass on the touched files.
  • Maintainer-side: ingest a PDF with images/tables and confirm kv_store_entity_chunks.json / kv_store_relation_chunks.json now contain entries whose IDs match the multimodal chunks.
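The maintainer-side check could be scripted roughly as follows. This is a sketch under a stated assumption: each record in the KV file is taken to be a JSON object with a `chunk_ids` list, which is not confirmed by this PR. The helper names are hypothetical, and the expected chunk IDs would have to come from the ingestion run being verified.

```python
import json
from pathlib import Path


def tracked_chunk_ids(kv_path):
    """Collect every chunk ID referenced in a chunk-tracking KV file."""
    records = json.loads(Path(kv_path).read_text(encoding="utf-8"))
    ids = set()
    for record in records.values():
        # Assumed record shape: {"chunk_ids": [...], ...}
        if isinstance(record, dict):
            ids.update(record.get("chunk_ids", []))
    return ids


def missing_chunks(kv_path, expected_chunk_ids):
    """Return the expected chunk IDs that the KV file does not reference."""
    return set(expected_chunk_ids) - tracked_chunk_ids(kv_path)
```

An empty result from `missing_chunks` for both `kv_store_entity_chunks.json` and `kv_store_relation_chunks.json` would indicate that the multimodal chunks are now being tracked.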

Many thanks to @ashah1992 for the precise diagnosis in #241.

…nodes_and_edges (closes HKUDS#241)

During multimodal ingestion, three call sites of `lightrag.operate.merge_nodes_and_edges`
were missing the `entity_chunks_storage` and `relation_chunks_storage` arguments
(and additionally `full_entities_storage` / `full_relations_storage` in
`BaseModalProcessor._process_chunk_for_extraction`). Because these parameters
default to `None`, calls succeeded silently but entity-to-chunk and
relation-to-chunk mappings for multimodal entities were never persisted to
`kv_store_entity_chunks.json` / `kv_store_relation_chunks.json`, degrading
retrieval quality for image and table content.

Forward all four storage instances from the wrapped LightRAG instance, matching
the way LightRAG itself invokes the function during text ingestion.

Made-with: Cursor
@Abdeltoto

Contributor Author

Closing this PR in favor of #247 (@sjhddh) and #250 (@peterCheng123321), which were both opened a few hours before mine and which I hadn't noticed when I pushed this — apologies for the duplicate noise.

For maintainers triaging the three: #250 is the most complete because it also forwards doc_id to BaseModalProcessor._process_chunk_for_extraction, which #247 and this PR both miss. #247 has the most thorough description and groups the new kwargs more uniformly. Either approach fully resolves #241; an ideal merge would combine #250's coverage with #247's kwarg ordering.

Reviews left on both. Thanks for the great work, folks.

@Abdeltoto Abdeltoto closed this Apr 22, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]:merge_nodes_and_edges calls missing entity_chunks_storage and relation_chunks_storage during multimodal processing