Possible duplicate check bypass in summarization step when the same URL occurs multiple times within the same page of processing. #22

Open
opened 2025-04-06 22:46:39 +02:00 by slatian · 0 comments
Owner

If a URL occurs twice within the same page window during summarization, the duplicate isn't detected, because duplicates are only checked against the database and not against the other members of the batch.

This should be fixed partly in the summary pipeline and partly in the database code, to prevent two entity generations for the same URL from being open at the same time.
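
A minimal sketch of what the two halves of the fix could look like, not the actual unobtanium code. All names here (`dedup_batch`, `OpenGenerations`, `try_open`, `close`) are hypothetical, and the in-memory `HashSet` only stands in for whatever bookkeeping the database code would really use. The first function drops repeated URLs inside one batch before the database check runs; the struct refuses to open a second entity generation for a URL that already has one open.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the database-side bookkeeping: tracks the URLs
/// that currently have an open entity generation.
#[derive(Default)]
struct OpenGenerations {
    open: HashSet<String>,
}

impl OpenGenerations {
    /// Returns true if a generation was opened for `url`,
    /// false if one is already open for it.
    fn try_open(&mut self, url: &str) -> bool {
        // `HashSet::insert` returns false when the value was already present.
        self.open.insert(url.to_string())
    }

    /// Marks the generation for `url` as finished.
    fn close(&mut self, url: &str) {
        self.open.remove(url);
    }
}

/// Pipeline-side check (assumed shape): drops URLs that repeat within one
/// batch, keeping the first occurrence and the original order.
fn dedup_batch(urls: &[&str]) -> Vec<String> {
    let mut seen: HashSet<&str> = HashSet::new();
    urls.iter()
        .filter(|url| seen.insert(**url))
        .map(|url| (*url).to_string())
        .collect()
}
```

With something like this in place, the pipeline would run `dedup_batch` on each page window before checking the database, and the database layer would call `try_open` before starting a generation and `close` once it is done, so a duplicate that slips past the first check is still rejected by the second.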

Reference
unobtanium/unobtanium#22