Open
Labels
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-forms: From a users perspective, forms is the affected feature/workflow
Description
Hello,
I'm using pypdf to fill out a form and generate a printable PDF. Everything works fine, except when I use Unicode strings. The text appears corrupted in the output PDF, regardless of the PDF viewer I use. I tried Adobe Reader, SumatraPDF, and Brave.
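To narrow down which field values are likely to be affected, a small standard-library check can list the characters that a single-byte encoding cannot represent. This is only a sketch: the helper name `unencodable_chars` is hypothetical, and the choice of `latin-1` is an assumption about the encoding of the form's font, which this report does not confirm.

```python
def unencodable_chars(fields, encoding="latin-1"):
    """Return {field_name: [chars]} for characters `encoding` cannot represent."""
    report = {}
    for name, text in fields.items():
        bad = []
        for ch in text:
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                # Record each offending character once, in order of appearance.
                if ch not in bad:
                    bad.append(ch)
        if bad:
            report[name] = bad
    return report


# A few values taken from the data dictionary in this report.
sample = {
    "cnp_cui": "123456789",
    "localitatea": "Comuna Roșia-Nouă",
    "adresa_judet": "Конференция",
}
for name, chars in unencodable_chars(sample).items():
    print(name, chars)
```

Fields that do not show up in the result contain only characters the chosen encoding can represent, so they are a useful baseline when comparing against the corrupted output.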
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
# Windows-10-10.0.26100-SP0
$ python -c "import pypdf;print(pypdf._debug_versions)"
# pypdf==5.7.0, crypt_provider=('cryptography', '45.0.5'), PIL=11.3.0
Code + PDF
This is a minimal, complete example that shows the issue:
from io import BytesIO
import pypdf
from pypdf.generic import NameObject, NumberObject, BooleanObject, IndirectObject
import pypdf.generic
import pypdf.types
data = {
    "subsemnatul": "Σὲ γνωρίζω ἀπὸ τὴν κόψη",
    "cnp_cui": "123456789",
    "localitatea": "Comuna Roșia-Nouă",
    "strada": "Căpitan Nicolae Licăreț",
    "adresa_nr": "12",
    "adresa_bl": "A",
    "adresa_sc": "1",
    "adresa_et": "5",
    "adresa_ap": "123",
    "adresa_judet": "Конференция",
}
# https://stackoverflow.com/a/55302753
def fill_with_pypdf(file, data):
    """
    Used to fill PDF with PyPDF.
    To fill, PDF form must have field name values that match the dictionary keys
    :param file: The PDF being written to
    :param data: The data dictionary being written to the PDF Fields
    :return: The filled PDF as bytes
    """
    with open(file, "rb") as input_stream:
        pdf_reader = pypdf.PdfReader(input_stream)
        if "/AcroForm" in pdf_reader.trailer["/Root"]:
            pdf_reader.trailer["/Root"]["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})
        writer = pypdf.PdfWriter(pdf_reader)
        # alter NeedAppearances
        try:
            catalog = writer._root_object
            # get the AcroForm tree and add the /NeedAppearances attribute
            if "/AcroForm" not in catalog:
                writer._root_object.update({
                    NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
            need_appearances = NameObject("/NeedAppearances")
            writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
        except Exception as e:
            print('set_need_appearances_writer() catch : ', repr(e))
        if "/AcroForm" in writer._root_object:
            # Acro form is form field, set needs appearances to fix printing issues
            writer._root_object["/AcroForm"].update(
                {NameObject("/NeedAppearances"): BooleanObject(True)})
        # loop over all pages
        for page_num in range(len(pdf_reader.pages)):
            # writer.add_page(pdf_reader.pages[page_num])
            page = writer.pages[page_num]
            # loop over annotations, but ensure they are there first...
            if page.get('/Annots'):
                # update field values
                writer.update_page_form_field_values(page, data, auto_regenerate=False)
                for j in range(0, len(page['/Annots'])):
                    writer_annot = page['/Annots'][j].get_object()
                    # lock all the fields by setting /Ff to 1 (the ReadOnly flag);
                    # restrict this loop if only specific fields need to be locked.
                    writer_annot.update({
                        NameObject("/Ff"): NumberObject(1)  # bit position 1 = ReadOnly
                    })
        output_stream = BytesIO()
        # lock fields
        permissions = pypdf.constants.UserAccessPermissions(
            pypdf.constants.UserAccessPermissions.PRINT |
            pypdf.constants.UserAccessPermissions.PRINT_TO_REPRESENTATION |
            pypdf.constants.UserAccessPermissions.EXTRACT_TEXT_AND_GRAPHICS |
            pypdf.constants.UserAccessPermissions.EXTRACT
        )
        writer.encrypt(
            user_password="",
            owner_password="my-secret-password",
            algorithm="AES-256",
            use_128bit=False,
            permissions_flag=permissions,
        )
        writer.write(output_stream)
        writer.set_need_appearances_writer(True)
        return output_stream.getvalue()
out = fill_with_pypdf("forms/CERERE INMATRICULARE form.pdf", data)
with open("output_pypdf.pdf", "wb") as f:
    f.write(out)
Share here the PDF file(s) that cause the issue. The smaller they are, the better. Let us know if we may add them to our tests!
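A side note on the loop in the example that writes `/Ff`: assigning the value 1 replaces every existing field flag, so a text field that was marked multiline, for instance, would lose that flag. The sketch below shows how the ReadOnly bit could be OR-ed into the existing value instead. It uses only the standard library; `FieldFlag` and `make_read_only` are hypothetical names for illustration, not pypdf API, and the bit positions are the ones the PDF specification assigns to field flags.

```python
from enum import IntFlag


class FieldFlag(IntFlag):
    """A few /Ff bits from the PDF specification (bit position n = 1 << (n - 1))."""
    READ_ONLY = 1 << 0    # bit position 1
    REQUIRED = 1 << 1     # bit position 2
    NO_EXPORT = 1 << 2    # bit position 3
    MULTILINE = 1 << 12   # bit position 13, text fields only


def make_read_only(current_flags):
    """Return the flag value with ReadOnly set and all other bits preserved."""
    return int(current_flags) | int(FieldFlag.READ_ONLY)


existing = int(FieldFlag.MULTILINE)   # a multiline text field
print(make_read_only(existing))       # ReadOnly and Multiline are both set
print(FieldFlag(make_read_only(existing)))
```

In the example above, the current value would come from `writer_annot.get("/Ff", 0)` before the update, which is an assumption about how the annotation dictionary would be read rather than something shown in this report.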
Traceback
This is the complete traceback I see:
# TODO: Your traceback goes here (if applicable)