Skip to content

fix: preserve full_entities metadata when adding multimodal entities#228

Merged
LarFii merged 2 commits into
HKUDS:mainfrom
Exploreunive:fix/full-entities-preserve-multimodal-main-entities
Mar 24, 2026
Merged

fix: preserve full_entities metadata when adding multimodal entities#228
LarFii merged 2 commits into
HKUDS:mainfrom
Exploreunive:fix/full-entities-preserve-multimodal-main-entities

Conversation

@Exploreunive

Copy link
Copy Markdown
Contributor

Summary

This fixes a data-loss bug in Stage 3.5 multimodal entity storage.

When _store_multimodal_entities_to_full_entities() updates an existing
full_entities entry, it previously rebuilt the record using only:

  • entity_names
  • count
  • update_time

As a result, any existing metadata fields already stored for the same doc_id
could be silently overwritten.

Changes

  • preserve existing full_entities metadata via merge instead of rebuilding from scratch
  • append only new multimodal entity names without duplicating existing ones
  • preserve the original entity order instead of converting through a set
  • add regression tests covering:
  • new entry creation
  • preserving existing metadata fields
  • deduplicated append behavior without reordering

Why this matters

Stage 3.5 should enrich existing document entity records with multimodal main
entities, not replace previously stored metadata from the text pipeline.

Testing

pytest -q tests/test_full_entities_merge.py tests/test_close_event_loop.py

…ixes HKUDS#135)

The close() method is registered with atexit.register() and runs during
Python interpreter shutdown. The previous implementation had two issues:

1. get_running_loop() can succeed on a loop that is not running (just
   attached to the thread), causing create_task() to fail
2. asyncio.run() raises RuntimeError if an event loop is already set for
   the thread, even if that loop is closed

The fix checks loop.is_running() explicitly, and when there is no running
loop, properly cleans up any stale loop reference before calling
asyncio.run(). This eliminates the noisy warning:
  'There is no current event loop in thread MainThread'

Added standalone tests in tests/test_close_event_loop.py that verify
the fix works across all event loop states (no loop, closed loop,
running loop, finalize exception).
@LarFii

LarFii commented Mar 24, 2026

Copy link
Copy Markdown
Collaborator

Thanks for your contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants