Skip to content

ROB: Let _build_destination skip in case of missing /D key#2018

Merged
MartinThoma merged 3 commits intopy-pdf:mainfrom
nickryand:build_destination_fix
Dec 9, 2023
Merged

ROB: Let _build_destination skip in case of missing /D key#2018
MartinThoma merged 3 commits intopy-pdf:mainfrom
nickryand:build_destination_fix

Conversation

@nickryand
Copy link
Copy Markdown
Contributor

Calling _build_destination with a dictionary can cause it to error if the code which slices the array is executed. This skips the scenario where DictionaryObject does not have a "/D" key and is currently passed to _build_destination.

* Adds an else block to prevent calling _build_destination is not called
  with a dictionary object when it is expecting a list.
@MartinThoma
Copy link
Copy Markdown
Member

Do you have a PDF / some code that shows the issue you want to fix?

@nickryand
Copy link
Copy Markdown
Contributor Author

This is running against:

poetry show | grep pypdf
pypdf             3.14.0 A pure-python PDF library capable of splitting, me...
import sys

from pypdf import PdfReader, PdfWriter


def main():
    file = PdfReader(sys.argv[1])

    new_pdf = PdfWriter()
    new_pdf.append(fileobj=file, pages=(0, 20))

    with open("test_copy.pdf", "wb") as fp:
        new_pdf.write(fp)


if __name__ == "__main__":
    main()
$ python test.py test.pdf
Traceback (most recent call last):
  File "/tmp/pypdf/test.py", line 17, in <module>
    main()
  File "/tmp/pypdf/test.py", line 10, in main
    new_pdf.append(fileobj=file, pages=(0, 20))
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_writer.py", line 2812, in append
    self.merge(
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_utils.py", line 466, in wrapper
    return func(*args, **kwargs)
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_writer.py", line 2904, in merge
    reader.named_destinations
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 513, in named_destinations
    return self._get_named_destinations()
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 777, in _get_named_destinations
    self._get_named_destinations(kid.get_object(), retval)
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 794, in _get_named_destinations
    dest = self._build_destination(key, value)  # type: ignore
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/_reader.py", line 1001, in _build_destination
    page, typ = array[0:2]  # type: ignore
  File "/tmp/py_env/.devenv/state/venv/lib/python3.10/site-packages/pypdf/generic/_data_structures.py", line 320, in __getitem__
    return dict.__getitem__(self, key).get_object()
TypeError: unhashable type: 'slice'

@pubpub-zz
Copy link
Copy Markdown
Collaborator

@nickryand can you provide the test.pdf file you are using ?

@nickryand
Copy link
Copy Markdown
Contributor Author

I cannot provide it as it's part of a collection I'm working on for someone else. I can provide metadata information about it.

@pubpub-zz
Copy link
Copy Markdown
Collaborator

do you accept to send it by mail to @MartinThoma info@martin-thoma.de ? it will be easier for analysis

@MartinThoma
Copy link
Copy Markdown
Member

@nickryand Your PR breaks CI and cannot be merged like this. See
pytest tests/test_reader.py::test_named_destination\[stored_directly\].

You could either remove the else: continue or change the block to:

                if isinstance(val, DictionaryObject):
                    if "/D" in val:
                        val = val["/D"].get_object()
                    else:
                        continue

@MartinThoma MartinThoma changed the title Build destination fix ROB: Build destination fix Aug 13, 2023
@MartinThoma MartinThoma added the is-robustness-issue From a users perspective, this is about robustness label Aug 13, 2023
@pubpub-zz
Copy link
Copy Markdown
Collaborator

do you accept to send it by mail to @MartinThoma info@martin-thoma.de ? it will be easier for analysis

Is this request still active?

@MartinThoma
Copy link
Copy Markdown
Member

I haven't seen any e-mail, but even without that I can see that this kind of structure can be problematic. In general, we likely always have to assume that some key does not exist (even if the specs require it).

At some point I hope we can find more general solutions, e.g. defining the specs formally in code and defining the fallback (or exception) we raise if pypdf wants to access a key that does not exist.

@codecov
Copy link
Copy Markdown

codecov bot commented Aug 13, 2023

Codecov Report

Patch coverage: 42.85% and project coverage change: +0.04% 🎉

Comparison is base (11ee648) 94.07% compared to head (acab21a) 94.12%.
Report is 38 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2018      +/-   ##
==========================================
+ Coverage   94.07%   94.12%   +0.04%     
==========================================
  Files          33       41       +8     
  Lines        7077     7391     +314     
  Branches     1413     1460      +47     
==========================================
+ Hits         6658     6957     +299     
- Misses        262      269       +7     
- Partials      157      165       +8     
Files Changed Coverage Δ
pypdf/_reader.py 91.33% <42.85%> (-0.17%) ⬇️

... and 20 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@MartinThoma MartinThoma changed the title ROB: Build destination fix ROB: Let _build_destination skip in case of missing /D key Dec 9, 2023
@MartinThoma MartinThoma merged commit 5e59160 into py-pdf:main Dec 9, 2023
@MartinThoma
Copy link
Copy Markdown
Member

@nickryand Thank you for your contribution 🙏 It will be part of the pypdf release on 10.12.2023 (Sunday).

@MartinThoma
Copy link
Copy Markdown
Member

@nickryand If you want I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)

@MartinThoma
Copy link
Copy Markdown
Member

Oh, wow, I just noticed it took me way too long to merge this. I'm sorry for that 🙈

MartinThoma added a commit that referenced this pull request Dec 10, 2023
## What's new

### Bug Fixes (BUG)
-  Cope with deflated images with CMYK Black Only (#2322) by @pubpub-zz
-  Handle indirect objects as parameters for CCITTFaxDecode (#2307) by @stefan6419846
-  check words length in _cmap type1_alternative function (#2310) by @Takher

### Robustness (ROB)
-  Relax flate decoding for too many lookup values (#2331) by @stefan6419846
-  Let _build_destination skip in case of missing /D key (#2018) by @nickryand

### Documentation (DOC)
-  Note in reading form data (#2338) by @MartinThoma
-  Pull Request prefixes and size by @MartinThoma
-  Add https://github.com/zuypt for #2325 as a contributor by @MartinThoma
-  Fix docstring for RunLengthDecode.decode (#2302) by @stefan6419846

### Maintenance (MAINT)
-  Enable `disallow_any_generics` and add missing generics (#2278) by @nilehmann

### Testing (TST)
-  Centralize file downloads (#2324) by @MartinThoma

### Code Style (STY)
-  Fix typo "steam" \xe2\x86\x92 "stream" (#2327) by @stefan6419846
-  Run black by @MartinThoma
-  Make Traceback in bug report template uppercase (#2304) by @stefan6419846

[Full Changelog](3.17.1...3.17.2)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

is-robustness-issue From a users perspective, this is about robustness

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants