Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
Generating a PDF with WeasyPrint using the following code produces a document for which extract_text returns nothing.
"""
PyPDF2==2.1.0
WeasyPrint==55.0
"""
from io import BytesIO

from PyPDF2 import PdfReader
from weasyprint import HTML

# Create the example PDF in memory
stream = BytesIO()
HTML(string="""
<html>
<body>
<div>Hello World</div>
</body>
</html>
""").write_pdf(stream)
stream.seek(0)
# Try to read "Hello World"
reader = PdfReader(stream)
print(reader.pages[0].extract_text())

In Kozea/WeasyPrint/issues/290, @liZe points out that other tools are able to extract the text.
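
Since other tools reportedly can extract the text, it may help to confirm that the generated file contains text-showing operators at all. The following is a rough, standard-library-only sketch of such a check. The helper name `count_text_operators` is hypothetical and not part of PyPDF2 or WeasyPrint. It assumes streams are either uncompressed or Flate-compressed, and it is a crude heuristic: binary streams such as embedded fonts could produce false positives.

```python
import re
import zlib

# Hypothetical diagnostic helper; not part of PyPDF2 or WeasyPrint.
# Matches the raw bytes between the "stream" and "endstream" keywords.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Tj and TJ are the PDF operators that show text.
TEXT_OP_RE = re.compile(rb"\b(?:Tj|TJ)\b")


def count_text_operators(pdf_bytes: bytes) -> int:
    """Count Tj/TJ operators found in the streams of a PDF file (heuristic)."""
    total = 0
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates trailing bytes after the zlib data.
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Assumption: not Flate-compressed, so scan the bytes as they are.
            data = raw
        total += len(TEXT_OP_RE.findall(data))
    return total
```

With the reproduction above, `count_text_operators(stream.getvalue())` could be called after `write_pdf`. A non-zero count would suggest the text is present in the file and that the problem lies on the extraction side, though this check alone does not prove it.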