fix: store multimodal_processed in separate KV namespace to prevent DocProcessingStatus errors #253
LightRAG's Server API deserializes doc_status records into DocProcessingStatus dataclass objects. Because RAGAnything was injecting a 'multimodal_processed' key directly into those records, any version of the dataclass that did not declare that field raised:

    DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'

causing 500 errors on /documents/paginated and similar endpoints.

Fix: introduce a dedicated 'raganything_multimodal_status' KV namespace (same storage class as parse_cache) to hold per-document multimodal processing state. LightRAG's own doc_status records are no longer modified with extra fields, so DocProcessingStatus deserialization always succeeds. All read/write paths in processor.py are updated accordingly.

Fixes HKUDS#91
Fixes HKUDS#119

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
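The failure mode described in the commit message can be reproduced in isolation. The sketch below is only an illustration: the `DocProcessingStatus` class here is a hypothetical two-field stand-in (the real LightRAG dataclass declares more fields), and `load_status` is an assumed helper name, not a LightRAG function.

```python
from dataclasses import dataclass


# Hypothetical, simplified stand-in for LightRAG's DocProcessingStatus.
# The real dataclass has more fields; two are enough to show the failure.
@dataclass
class DocProcessingStatus:
    content_summary: str
    status: str


# A doc_status record that has had an extra key injected into it.
record = {
    "content_summary": "example document",
    "status": "processed",
    "multimodal_processed": True,  # not declared on the dataclass
}


def load_status(rec: dict) -> DocProcessingStatus:
    """Assumed helper: build the dataclass by unpacking the stored record."""
    return DocProcessingStatus(**rec)


try:
    load_status(record)
    error_message = None
except TypeError as exc:
    # The generated __init__ rejects keyword arguments it does not declare.
    error_message = str(exc)

print(error_message)
```

Any deserialization path that unpacks a stored record into a dataclass this way will fail on undeclared keys, which is consistent with the 500 responses reported above.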
Thanks for working on this. I reviewed the separate I don't think this should be merged as an isolated small fix yet. Moving Before this can land, I think we need at least:
So I would hold this for a design pass rather than merging it directly.
@peterCheng123321 Thanks for raising this — closing in favor of #255 (now merged into main), which incorporates the core idea you proposed here. Your direction was right: storing multimodal completion state outside
This avoids the migration concern I raised in the earlier review: existing If you spot any case where #255's approach still misbehaves on your setup, please open a new issue with reproduction details; happy to iterate. Really appreciate the contribution.
Summary
Fixes #91, fixes #119.
RAGAnything was injecting a `multimodal_processed` field directly into LightRAG's `doc_status` KV records. The LightRAG Server API deserializes those records into `DocProcessingStatus` dataclass objects. Any LightRAG version whose `DocProcessingStatus` does not declare `multimodal_processed` raised:

    DocProcessingStatus.__init__() got an unexpected keyword argument 'multimodal_processed'

Fix: introduce a dedicated `raganything_multimodal_status` KV namespace (same storage class as `parse_cache`) to hold per-document multimodal processing state. LightRAG's own `doc_status` records are no longer modified with RAGAnything-specific fields, so `DocProcessingStatus` deserialization always succeeds regardless of LightRAG version.

Changes:
- `raganything/raganything.py`: add `multimodal_status` field, initialize in both pre-provided and newly-created LightRAG paths, finalize in `finalize_storages`
- `raganything/processor.py`: all `multimodal_processed` reads/writes now go through `self.multimodal_status`; `doc_status` upserts no longer contain this field

Test plan
- `/documents/paginated` on LightRAG Server: confirm no 500 error
- `adelete_by_doc_id` still works after processing
- `is_document_fully_processed()` and `get_document_processing_status()` return correct values
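The separation described in this PR can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `processor.py`: `InMemoryKV` is a hypothetical synchronous store (the real storages are assumed to be async), `DocProcessingStatus` is a simplified stand-in, and the standalone `is_document_fully_processed` function only mirrors the name of the method mentioned in the test plan.

```python
from dataclasses import dataclass


@dataclass
class DocProcessingStatus:
    """Simplified stand-in for LightRAG's dataclass (assumption)."""
    status: str


class InMemoryKV:
    """Hypothetical minimal KV store, used only for this illustration."""

    def __init__(self, namespace: str):
        self.namespace = namespace
        self._data = {}

    def upsert(self, data: dict) -> None:
        # Merge each record into whatever is already stored under its key.
        for key, value in data.items():
            self._data.setdefault(key, {}).update(value)

    def get_by_id(self, key: str):
        return self._data.get(key)


# Two separate namespaces: LightRAG's own records and the multimodal state.
doc_status = InMemoryKV("doc_status")
multimodal_status = InMemoryKV("raganything_multimodal_status")

# doc_status only ever receives fields the dataclass declares.
doc_status.upsert({"doc-1": {"status": "processed"}})
# The multimodal flag is written to its own namespace instead.
multimodal_status.upsert({"doc-1": {"multimodal_processed": True}})


def is_document_fully_processed(doc_id: str) -> bool:
    """Combine both namespaces without touching the doc_status record."""
    record = doc_status.get_by_id(doc_id)
    if record is None:
        return False
    status = DocProcessingStatus(**record)  # no undeclared keys to reject
    extra = multimodal_status.get_by_id(doc_id) or {}
    return status.status == "processed" and bool(extra.get("multimodal_processed"))


fully_processed = is_document_fully_processed("doc-1")
missing_doc = is_document_fully_processed("doc-2")
```

Because the flag never enters the `doc_status` record, deserialization in this sketch does not depend on which fields the dataclass declares.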