Codecov Report
Base: 92.03% // Head: 91.91% // Decreases project coverage by
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1519      +/-   ##
==========================================
- Coverage   92.03%   91.91%   -0.12%
==========================================
  Files          32       33       +1
  Lines        5976     6037      +61
  Branches     1163     1180      +17
==========================================
+ Hits         5500     5549      +49
- Misses        312      318       +6
- Partials      164      170       +6
|
@MasterOdin @pubpub-zz What do you think about this one? There are two TODOs because I haven't seen an example with |
MasterOdin
left a comment
I think it's fine to merge something that isn't fully done, as long as the whole thing won't have to be redone to support the additional cases. Maybe create an issue to track support for those, with an ask to the community for a PDF that has those features.
Co-authored-by: Matthew Peveler <matt.peveler@gmail.com>
Not sure if this is the correct place to post this, but I have been using the pypdf code to extract bookmarks from multiple PDFs in a folder and found one PDF which wouldn't work and gave me a message asking me to share the PDF here. The message was "/Kids or /Limits found in PageLabels." The PDF file is 87 MB, so it is too big to share here directly. |
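For context on that message: in the PDF specification, /PageLabels is a number tree. A small tree keeps all of its entries in a single /Nums array, while a larger one splits them across child nodes listed under /Kids, each of which carries a /Limits pair with its smallest and largest key. The following is only a rough sketch of how such a tree could be flattened. It uses plain Python dicts as a stand-in for PDF objects, `flatten_number_tree` is a hypothetical helper, and none of this is pypdf's actual implementation.

```python
def flatten_number_tree(node):
    """Return the (key, value) pairs of a number tree in ascending key order.

    `node` is a plain dict standing in for a PDF dictionary (an assumption
    made for illustration; real PDF objects may be indirect references).
    """
    pairs = []
    nums = node.get("/Nums", [])
    # /Nums is a flat array: key1, value1, key2, value2, ...
    for i in range(0, len(nums) - 1, 2):
        pairs.append((nums[i], nums[i + 1]))
    # Intermediate nodes list their children under /Kids; recurse into each.
    for kid in node.get("/Kids", []):
        pairs.extend(flatten_number_tree(kid))
    return sorted(pairs, key=lambda pair: pair[0])


# A made-up tree with two leaf nodes; the values are page label dictionaries.
example_tree = {
    "/Kids": [
        {"/Limits": [0, 3], "/Nums": [0, {"/S": "/r"}, 3, {"/S": "/D"}]},
        {"/Limits": [10, 10], "/Nums": [10, {"/S": "/A"}]},
    ]
}
print(flatten_number_tree(example_tree))
```

Each key is the index of the first page a label range applies to, so the flattened list has the same shape as a plain /Nums array.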
|
Thank you! If there is nothing confidential / copyright protected in there, could you maybe share it in another way? Maybe compression (zip / bzip2) helps? You could also send it to me via email: info@martin-thoma.de (I hope my mail server doesn't reject it) I'm opening this issue again so I don't forget about that part |
|
I am also coming up against the |
|
If you simply don't want to display the warning: https://pypdf.readthedocs.io/en/latest/user/suppress-warnings.html#warnings |
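A minimal sketch of what suppressing the warning can look like, assuming pypdf emits it through Python's standard `logging` module under loggers whose names start with `pypdf` (which is what the linked page describes):

```python
import logging

# Only messages at ERROR level or above from pypdf's loggers will be emitted;
# warnings such as the PageLabels one are dropped.
logging.getLogger("pypdf").setLevel(logging.ERROR)
```

This hides the message only. The page labels of an affected PDF may still be incomplete.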
|
@loganpowell Is there any PDF you can share that causes this warning? |
|
@MartinThoma I have just shared the PDF with you via Google Drive; you should be able to download a copy. |
|
Nice! I'm on a business trip until Saturday. I hope I'll get a chance to look at it on Sunday :-) |
|
@MartinThoma I could share a subset of the document (a couple of pages). Would that suffice? |
|
If it still causes the warnings: sure! |
|
@MartinThoma Hello, I'm running into the same /Kids or /Limits issue as above, which causes page_labels not to be read correctly. Can you please explain how to resolve it? |
|
I have the same issue, but I have 1.3 GB of PDFs, so I don't know which PDF it is.
Wouldn't you be able to use appropriate logging on your side to pinpoint the offending file? |
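One possible way to do that is sketched below: attach a temporary logging handler while each file is processed and record which files produced messages. This assumes pypdf logs its warnings via the standard `logging` module under the `pypdf` logger hierarchy. The helper names (`files_with_warnings`, `probe_page_labels`) and the `pdfs` folder are made up for illustration.

```python
import logging
from pathlib import Path


class _Collector(logging.Handler):
    """Logging handler that stores every record it receives."""

    def __init__(self):
        super().__init__(level=logging.WARNING)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def files_with_warnings(paths, probe, logger_name="pypdf"):
    """Call probe(path) for each path; return {path: [messages]} for offenders."""
    logger = logging.getLogger(logger_name)
    offenders = {}
    for path in paths:
        collector = _Collector()
        logger.addHandler(collector)
        try:
            probe(path)
        except Exception as exc:  # a broken file should not stop the scan
            offenders.setdefault(path, []).append(f"exception: {exc!r}")
        finally:
            logger.removeHandler(collector)
        messages = [record.getMessage() for record in collector.records]
        if messages:
            offenders.setdefault(path, []).extend(messages)
    return offenders


def probe_page_labels(path):
    """Read the page labels of one PDF (requires pypdf to be installed)."""
    from pypdf import PdfReader  # imported lazily so the helper above stays generic

    return PdfReader(path).page_labels


if __name__ == "__main__":
    pdfs = sorted(Path("pdfs").glob("*.pdf"))  # hypothetical folder name
    for path, messages in files_with_warnings(pdfs, probe_page_labels).items():
        print(path, messages)
```

If the files are loaded through another library, the same handler can be attached around that library's per-file call instead of `probe_page_labels`.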
|
I use LLAMA_Index SimpleDirectoryLoader, so I don't know how I could log that on my side. |
|
I came across a pdf with the '/kids or /limits found in PageLabels' warning. I can send it through if you are still looking for examples |
|
Yes, please! If it's ok to have it public, you can post it here. Otherwise, you can send it to me ( info@martin-thoma.de ) |
|
@MartinThoma I stumbled upon this warning message for the following PDF file: https://www.bk.admin.ch/dam/bk/de/dokumente/terminologie/publikation_25_jahre_rtd.pdf.download.pdf/Terminologie_Epochen,%20Schwerpunkte,%20Umsetzungen.pdf |
|
@MartinThoma Hi Martin, I was using SimpleDirectoryReader to load a 175 MB file and came across the same issue: /Kids or /Limits found in PageLabels. Please share this PDF with pypdf: #1519. Can you please help me understand if there is a solution yet? |
|
@kyrakangaa As long as you are using the latest pypdf version (see https://pypi.org/project/pypdf/#files) and still receive this warning without seeing a corresponding PR linked here, this most likely remains unresolved for now.
|
|
Here's a PDF you can use that reproduces the issue:
Which triggers bazillions of: |
|
I have another /Kids or /Limits case. Has someone explained this error? I had a few special-character errors that were fixable by modifying the document, but this is not the case with /Kids or /Limits. The file is G:\My Drive\PDF Library\Chinese Traditional Religion\chinese_traditional_religion_pure_land_sutras.pdf, with error: Invalid Elementary Object starting with b'\xce' @0: b'\xce\xc5'. Skipping... |
|
@khukharev I cannot reproduce:
from tests import get_data_from_url
from pypdf import PdfReader, __version__
from io import BytesIO
print(f"pypdf=={__version__}")
reader = PdfReader(BytesIO(get_data_from_url('https://github.com/py-pdf/pypdf/files/14412329/Coatue_Next_Decade_in_FinTech_Oct-22.pdf', name="Coatue_Next_Decade_in_FinTech_Oct-22.pdf")))
print(reader.page_labels)
gives: |
|
@ltorsini Without a PDF, we cannot help you. |
|
@JackTrapper http://6502.org/documents/publications/dr_dobbs_journal/dr_dobbs_journal_vol_06.pdf is 258 MB 😅 Still, seems to work: |
|
I would recommend that anyone reporting a new problem create a new issue with minimal test code and a PDF file. |
|
@pubpub-zz We should probably adjust the error message, which explicitly redirects here: open a corresponding issue and validate the documents already supplied above. Maybe we already have enough examples and do not need to ask for more anyway. |
Can you propose a PR? |
|
I'll lock the discussion to avoid new comments here. Please check #2560 instead. |

Introduce a new PdfReader property page_labels that returns a list of strings.
In most cases, the list will just be
or similar, but sometimes it will be:
Evidence for User Need