Closed
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-images: From a user's perspective, image handling is the affected feature/workflow
Description
When encoding/decoding 1-bit (Group 4/CCITT) TIFF images, black can be represented by either 0 or 1, depending on the /BlackIs1 entry in /DecodeParms.
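To make the polarity issue concrete, here is a minimal sketch of what honoring the flag amounts to at the byte level. The helper name `to_zero_is_black` is hypothetical and not part of pypdf; it assumes the image has already been decoded to packed 1-bit rows and that the caller knows the /BlackIs1 value (which defaults to false, meaning 0 bits are black).

```python
def to_zero_is_black(packed: bytes, black_is_1: bool) -> bytes:
    """Normalize packed 1-bit pixel data so that a 0 bit always means black.

    Hypothetical helper for illustration only. Row padding bits are flipped
    along with the pixels, which is harmless because they are never displayed.
    """
    if not black_is_1:
        # 0 already means black; nothing to do.
        return packed
    # Flip every bit so that former 1 bits (black) become 0 bits.
    return bytes(b ^ 0xFF for b in packed)


if __name__ == "__main__":
    sample = b"\x0f\x00"
    print(to_zero_is_black(sample, black_is_1=False).hex())
    print(to_zero_is_black(sample, black_is_1=True).hex())
```

Without a step like this, two files that differ only in /BlackIs1 produce identical extracted pixels, which matches the behaviour reported here.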
Code + PDF
Simply changing BlackIs1 from false to true in imagemagick-CCITTFaxDecode.pdf is enough to reproduce this behaviour.
from pypdf import PdfReader


def extract_images(path):
    reader = PdfReader(path)
    page = reader.pages[0]
    for count, image_file_object in enumerate(page.images):
        with open(path + str(count) + image_file_object.name, "wb") as fp:
            fp.write(image_file_object.data)


extract_images('imagemagick-CCITTFaxDecode.pdf')  # Standard black background with a white smiley face
extract_images('imagemagick-CCITTFaxDecode_BlackIs1-true.pdf')  # Browsers and other PDF viewers display a white background with a black smiley face, but the output here is the same as for the original PDF
I suppose the fix belongs in '_get_imagemode', but /DecodeParms might be specific to the /CCITTFaxDecode filter, in which case it would probably go into '_get_mode_and_invert_color' or even '_xobj_to_image' instead.
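Wherever the fix ends up, it needs to look the flag up first. The following is a rough sketch of that lookup, not pypdf's actual implementation: `ccitt_black_is_1` is a hypothetical name, and the function assumes it receives an already-resolved image dictionary (a plain dict here; a pypdf dictionary object should behave similarly, but that is an assumption). It accounts for /Filter and /DecodeParms being either single values or parallel arrays.

```python
from typing import Any, Mapping


def ccitt_black_is_1(xobj: Mapping[str, Any]) -> bool:
    """Return the /BlackIs1 flag of a CCITT image dictionary (default False).

    Hypothetical helper for illustration; not part of pypdf.
    """
    if "/Filter" not in xobj or "/DecodeParms" not in xobj:
        return False
    filters = xobj["/Filter"]
    parms = xobj["/DecodeParms"]
    # Both entries may be a single value or parallel arrays, one per filter.
    if not isinstance(filters, (list, tuple)):
        filters = [filters]
    if not isinstance(parms, (list, tuple)):
        parms = [parms]
    for name, parm in zip(filters, parms):
        if name != "/CCITTFaxDecode" or not isinstance(parm, Mapping):
            continue
        if "/BlackIs1" in parm:
            value = parm["/BlackIs1"]
            # Assumption: wrapper objects expose the raw bool as `.value`;
            # plain Python bools are used as-is.
            return bool(getattr(value, "value", value))
    return False


if __name__ == "__main__":
    image = {
        "/Filter": "/CCITTFaxDecode",
        "/DecodeParms": {"/K": -1, "/BlackIs1": True},
    }
    print(ccitt_black_is_1(image))
```

Because the lookup is tied to the /CCITTFaxDecode entry specifically, it fits the second option above: handling the flag where the filter is known rather than in the generic image-mode code.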