bug/partition_pdf removes spaces from the text

Open christinestraub opened this issue 1 year ago • 0 comments

Describe the bug

Some spaces are removed from the text when partitioning a PDF document.

To Reproduce

PDF: rok_20230930_1-1.pdf

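from unstructured.partition.pdf import partition_pdf

# Partition the PDF with the hi_res strategy and table structure inference.
elements = partition_pdf(
    filename="rok_20230930_1-1.pdf",
    strategy="hi_res",
    infer_table_structure=True,
)

# Element 20 is where the missing spaces show up.
print(str(elements[20]))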

Current behavior

Nameofeachexchangeonwhichregistered NewYorkStockExchange

Expected behavior

Name of each exchange on which registered New York Stock Exchange
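To make the difference between the two outputs concrete, here is a minimal sketch of a check that flags text whose spaces appear to have been dropped. It is not part of `unstructured`: the helper names and the 25-character threshold are assumptions chosen for illustration, and the heuristic would misfire on text with legitimately long tokens such as URLs.

```python
def longest_token(text: str) -> int:
    """Length of the longest whitespace-delimited token (0 for empty text)."""
    return max((len(tok) for tok in text.split()), default=0)


def looks_space_stripped(text: str, max_token_len: int = 25) -> bool:
    """Heuristic (hypothetical threshold): ordinary English words rarely exceed ~25 characters."""
    return longest_token(text) > max_token_len


def same_ignoring_whitespace(a: str, b: str) -> bool:
    """True when the two strings differ only in whitespace."""
    return "".join(a.split()) == "".join(b.split())


# The two strings reported in this issue.
current = "Nameofeachexchangeonwhichregistered NewYorkStockExchange"
expected = "Name of each exchange on which registered New York Stock Exchange"

print(looks_space_stripped(current))
print(looks_space_stripped(expected))
print(same_ignoring_whitespace(current, expected))
```

The same check could be run over `str(el)` for each element returned by `partition_pdf` to find other affected elements in the document.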

christinestraub, Apr 16 '24 19:04