BUG: Handle IndirectObject as image filter #2355

Merged
MartinThoma merged 2 commits into py-pdf:main from stefan6419846:image-filter-indirectobject
Dec 23, 2023

@stefan6419846
Collaborator

Previously, we might pass `4bits` as the image mode to Pillow, leading to "unrecognized image mode". Example: `lfilters = IndirectObject(26, 0, 139771595681120)`, whose `get_object()` would yield `['/FlateDecode']`; until now, this went into the `else` branch of the filter handling.

While I have a corresponding document where I stumbled upon this error, I cannot disclose it due to privacy reasons.
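The failure mode described above can be illustrated without pypdf. The following is only a minimal sketch under stated assumptions: `FakeIndirectObject` and `normalize_filters` are hypothetical names invented for this illustration and are not taken from the pypdf code base or from this PR's diff. The idea is to resolve an indirect reference before any comparison against filter names takes place.

```python
from typing import Any, List


class FakeIndirectObject:
    """Hypothetical stand-in for an indirect reference (not pypdf's class)."""

    def __init__(self, target: Any) -> None:
        self._target = target

    def get_object(self) -> Any:
        # A real PDF library would look the object up in the file;
        # this stand-in simply returns the stored target.
        return self._target


def normalize_filters(lfilters: Any) -> List[str]:
    """Resolve an indirect reference, then return the filters as a list."""
    if hasattr(lfilters, "get_object"):
        lfilters = lfilters.get_object()
    if lfilters is None:
        return []
    if isinstance(lfilters, str):
        return [lfilters]
    return list(lfilters)


# A direct name and an indirect reference to a list normalize identically.
direct = normalize_filters("/FlateDecode")
indirect = normalize_filters(FakeIndirectObject(["/FlateDecode"]))
print(direct, indirect)
```

With the reference resolved up front, later checks such as `"/FlateDecode" in filters` behave the same whether the PDF stored the filter directly or behind a reference.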

@codecov

codecov bot commented Dec 21, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (908797f) 94.47% compared to head (14e09c4) 94.54%.
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2355      +/-   ##
==========================================
+ Coverage   94.47%   94.54%   +0.06%     
==========================================
  Files          43       43              
  Lines        7564     7547      -17     
  Branches     1491     1490       -1     
==========================================
- Hits         7146     7135      -11     
+ Misses        259      253       -6     
  Partials      159      159              


@MartinThoma
Member

I was trying to find a file that reproduces the issue:

I guess you have a private file with which you have tested this?

@MartinThoma MartinThoma added the workflow-images From a users perspective, image handling is the affected feature/workflow label Dec 21, 2023
@stefan6419846
Collaborator Author

> I guess you have a private file with which you have tested this?

This is correct. I might have a look at this again tomorrow to check whether I am able to generate a corresponding test file to demonstrate this and ensure appropriate coverage, so feel free to delay merging this for now. I just opened this PR with the current research state shortly before leaving the office today.

@MartinThoma
Member

I trust you. If you have tested this with the file that was failing previously, I would merge. Otherwise I would wait.

Did you test it with your private file?

@stefan6419846
Collaborator Author

I just sent you a minimal version of the file in question. I am not able to make it public, and I have no public/uncritical alternative version.

@pubpub-zz
Collaborator

@stefan6419846
I would propose a generic fix for all errors with indirect objects by adding the following to generic/_base.py, line 317:

    def __getattr__(self, name):
        """
        Attribute not found in object: look in pointed object
        """
        # Resolve the reference first; calling self.getattr() here would
        # re-enter __getattr__ and recurse endlessly.
        target = self.get_object()
        try:
            return getattr(target, name)
        except AttributeError:
            raise AttributeError(
                f"no attribute {name} in indirect nor in pointed object {type(target)}"
            )

    def __getitem__(self, key):
        """
        Item not found in object: look in pointed object
        """
        return self.get_object()[key]

Can you tell me if my fix would work for you?

@stefan6419846
Collaborator Author

@pubpub-zz If I am not mistaken, this will not work here without further changes (at least during my tests): `lfilters` is checked with either `lfilters in (value1, value2)` or `lfilters == value3`, so this will neither access an attribute nor use an index/a key of the `IndirectObject` `lfilters`.
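The point about `in` and `==` can be checked with a small, self-contained sketch. `Proxy` below is a hypothetical stand-in, not pypdf's `IndirectObject`, and it is an assumption of this sketch that the class defines no `__eq__` of its own (the real class may define one, which is not modeled here). Python looks up operators such as `==` on the type rather than through `__getattr__`, so forwarding attributes and items does not make comparisons see the referenced object.

```python
class Proxy:
    """Hypothetical reference wrapper that forwards attributes and items."""

    def __init__(self, target):
        self._target = target

    def get_object(self):
        return self._target

    def __getattr__(self, name):
        # Only called when normal attribute lookup on the instance fails.
        return getattr(self.get_object(), name)

    def __getitem__(self, key):
        return self.get_object()[key]


lfilters = Proxy(["/FlateDecode"])

# Attribute and item access are forwarded to the wrapped list.
forwarded_item = lfilters[0]
forwarded_attr = lfilters.count("/FlateDecode")

# Comparisons are not forwarded: they fall back to identity.
equals_list = lfilters == ["/FlateDecode"]
in_tuple = lfilters in ("/FlateDecode", ["/FlateDecode"])

# Resolving explicitly before comparing gives the expected result.
resolved_equals = lfilters.get_object() == ["/FlateDecode"]

print(forwarded_item, forwarded_attr, equals_list, in_tuple, resolved_equals)
```

In this sketch, only the explicit `get_object()` call makes the comparison succeed, which matches the observation that the filter checks need the resolved value.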

@stefan6419846
Collaborator Author

The CI seems to fail due to the known concurrency issue at the moment.

@MartinThoma MartinThoma self-requested a review December 23, 2023 10:39
@MartinThoma
Member

The traceback was:

>>> reader.pages[0].images[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/pypdf/_page.py", line 2726, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/pypdf/_page.py", line 557, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/pypdf/filters.py", line 822, in _xobj_to_image
    Image.frombytes(mode, size, data),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/PIL/Image.py", line 2950, in frombytes
    im = new(mode, size)
         ^^^^^^^^^^^^^^^
  File "/home/user/.pyenv/versions/3.11.1/lib/python3.11/site-packages/PIL/Image.py", line 2914, in new
    return im._new(core.fill(mode, size, color))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: unrecognized image mode

@MartinThoma MartinThoma merged commit 133ccb1 into py-pdf:main Dec 23, 2023
@MartinThoma
Member

@stefan6419846 Thank you for taking care of this!

@stefan6419846 stefan6419846 deleted the image-filter-indirectobject branch December 23, 2023 11:30
MartinThoma added a commit that referenced this pull request Dec 24, 2023
## What's new

### Bug Fixes (BUG)
-  Handle IndirectObject as image filter (#2355) by @stefan6419846

### Documentation (DOC)
-  Quote specs in generate_file_identifiers (#2363) by @exiledkingcc
-  Notes about form fields and annotations (#1945) by @dmjohnsson23
-  Notes about update_page_form_field_values(auto_regenerate) (#2359) by @dmjohnsson23
-  Fix stamping example (#2358) by @dmjohnsson23
-  Stamp images directly on a PDF (#2357) by @dmjohnsson23
-  Correct the example of adding highlight annotation (#2341) by @Tobeabellwether

### Maintenance (MAINT)
-  Update upload-artifact and download-artifact actions from v3 to v4 (#2352) by @stefan6419846

### Testing (TST)
-  Add xfail test for #2336 (#2365) by @MartinThoma
-  Increase test coverage for flate handling of image mode 1 (#2339) by @stefan6419846

### Code Style (STY)
-  File identifier generation restructuring (#2362) by @exiledkingcc
-  Add PdfWriter._ID attribute (#2361) by @exiledkingcc
-  Variable naming convention (#2360) by @MartinThoma

[Full Changelog](3.17.3...3.17.4)
@Didi3333

Hi, I still have this issue in 3.17.4.

panda.pdf

Regards


Labels

workflow-images From a users perspective, image handling is the affected feature/workflow


4 participants