Fix missing coordinates in paragraphs continuation #1076
|
I didn't see the problem in the previous PR, sorry! |
|
Neither did I when I was developing it. The structure viewer app (https://structure-vision.streamlit.app/) is quite helpful in validating the stream order of PDF extraction. |
|
It seems a few paragraphs are still un-annotated. I checked with a PDF. Any fixes? |
|
Could you please share an example? |
|
Please check this article: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10510434 (see page 9). Many more articles show the same problem. |
|
Those issues are not related to this PR. Here the issue is that part of the text is misclassified as a figure. I've referenced your comment in a separate issue. This will likely be solved, or at least mitigated, by #963 (WIP). |
|
@lfoppiano Thanks, looking forward to the fix.
When a paragraph continues after an interruption (e.g., a reference callout), the coordinates of the continuation are lost.
This PR solves this issue.
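
To make the intended behaviour concrete, here is a toy model in Python. It is not the code changed by this PR: the `Chunk` class and the `paragraph_coords` helper are hypothetical, and the `page,x,y,width,height` boxes joined with `;` only follow my understanding of how the `coords` attribute is serialized. The point is that every chunk of the paragraph contributes its boxes, including the text that follows a callout.

```python
from dataclasses import dataclass, field

# A box is (page, x, y, width, height); this layout is an assumption.
Box = tuple[int, float, float, float, float]


@dataclass
class Chunk:
    """One run of a paragraph: plain text or an inline callout (hypothetical model)."""
    text: str
    is_callout: bool = False
    boxes: list[Box] = field(default_factory=list)


def paragraph_coords(chunks: list[Chunk]) -> str:
    """Join the boxes of every chunk so text after a callout keeps its coordinates."""
    boxes = [box for chunk in chunks for box in chunk.boxes]
    return ";".join(",".join(f"{value:g}" for value in box) for box in boxes)


# A paragraph interrupted by a reference callout, then continued.
paragraph = [
    Chunk("Text before ", boxes=[(1, 50.0, 100.0, 200.0, 10.0)]),
    Chunk("[12]", is_callout=True, boxes=[(1, 250.0, 100.0, 15.0, 10.0)]),
    Chunk(" text after.", boxes=[(1, 50.0, 112.0, 180.0, 10.0)]),
]
print(paragraph_coords(paragraph))
```

The third box in the output belongs to the continuation; that is the part that went missing before the fix.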

This PR also adds a small modification in the frontend so that the paragraph coordinates are extracted if "add coordinates" is selected and "segment sentence" is not selected.
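
The frontend rule can be sketched as a small decision function. This is an illustration in Python, not the actual frontend code: `coordinate_form_fields` is a hypothetical helper, and the field names `teiCoordinates` and `segmentSentences` are assumed to match the request parameters of the full-text service, so please check them against the API documentation. The combination of both options being selected is not described above, so the sketch leaves it alone.

```python
def coordinate_form_fields(add_coordinates: bool, segment_sentences: bool) -> dict[str, str]:
    """Build the request fields for the two checkboxes (hypothetical helper)."""
    fields: dict[str, str] = {}
    if segment_sentences:
        # Assumed parameter name for sentence segmentation.
        fields["segmentSentences"] = "1"
    if add_coordinates and not segment_sentences:
        # Case described in this PR: ask for paragraph-level coordinates.
        fields["teiCoordinates"] = "p"
    # add_coordinates together with segment_sentences is not covered by the
    # description, so no coordinate field is set for that case here.
    return fields


print(coordinate_form_fields(add_coordinates=True, segment_sentences=False))
```

With "add coordinates" selected and "segment sentence" unselected, the only field set is the paragraph coordinate request.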