Fixes py-pdf#1295; includes a test file adjustment.
Codecov Report
@@              Coverage Diff              @@
##              main    #1297      +/-   ##
==========================================
- Coverage   95.07%   94.67%   -0.41%
==========================================
  Files          30       30
  Lines        4973     5106     +133
  Branches     1023     1052      +29
==========================================
+ Hits         4728     4834     +106
- Misses        139      157      +18
- Partials      106      115       +9
@MartinThoma, please stand by.
Fixes py-pdf#1279 (Status_v1_Reviewers-Guide.pdf)
Fixes py-pdf#1294 and possibly others:
* when the chained xref/trailer sections are invalid;
* when the object header (id gen obj) is wrong or the object is missing from the xref table, the reader now searches the file for the object.
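The second fallback can be pictured as a raw byte scan for the object header. This is a minimal sketch under stated assumptions, not pypdf's implementation: find_object_offset is a hypothetical helper, and preferring the last match assumes that incremental updates append newer object definitions after older ones.

```python
import re
from typing import Optional


# Hypothetical helper, not pypdf's API: locate "num gen obj" in raw PDF bytes.
def find_object_offset(data: bytes, num: int, gen: int) -> Optional[int]:
    """Return the byte offset of the last 'num gen obj' header, or None."""
    # The negative lookbehind stops "12 0 obj" from matching num=2.
    pattern = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b" % (num, gen))
    matches = list(pattern.finditer(data))
    # Assumption: a later definition (incremental update) supersedes earlier ones.
    return matches[-1].start() if matches else None
```

A real reader would then parse the object found at the returned offset and would likely cache the scan, so the file is not searched again for every missing object.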
Fixes py-pdf#1273
    reader = PdfReader(BytesIO(get_pdf_from_url(url, name=name)))
    reader.xmp_metadata
-   assert exc.value.args[0].startswith("XML in XmpInformation was invalid")
+   assert exc.value.args[0].startswith("Stream length not defined")
Why did this change? I guess reader.xmp_metadata isn't even reached, is it?
Before this PR, one could at least get the number of pages:
assert len(reader.pages) == 5
I guess with this PR it no longer works?
I had to modify the expected test result; I did not analyze it further.
> Before this PR, one could at least get the number of pages:
> assert len(reader.pages) == 5
> I guess with this PR it no longer works?
Under analysis.
The PDF was corrupted: the XRef object had a corrupted /Length key. I've changed the code to discard the unreadable XRef object so that the reader can recover as much information as possible: you can now get the metadata 😊
Accessing the number of pages is (still?) possible.
Discard unreadable XRef object to recover as much as possible
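The "discard and carry on" behavior described above can be sketched as a loader that skips any xref section it cannot parse instead of aborting. This is an illustration only: load_xref_sections and its parse_section callback are hypothetical, and the sketch assumes a corrupted section (for example an XRef stream with a broken /Length) surfaces as a ValueError.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger(__name__)

# (object number, generation) -> byte offset
XrefTable = Dict[Tuple[int, int], int]


def load_xref_sections(
    offsets: List[int], parse_section: Callable[[int], XrefTable]
) -> XrefTable:
    """Merge xref sections, newest first, skipping those that fail to parse."""
    merged: XrefTable = {}
    for offset in offsets:  # e.g. startxref first, then each /Prev
        try:
            section = parse_section(offset)
        except ValueError as exc:
            # Discard the unreadable section instead of aborting the whole load.
            logger.warning("Ignoring unreadable xref section at %d: %s", offset, exc)
            continue
        for key, value in section.items():
            # Sections are visited newest first, so keep the first value seen.
            merged.setdefault(key, value)
    return merged
```

A real loader discovers each /Prev offset while parsing, so skipping a broken section can also lose the older sections it points to; the sketch takes the offsets as given to keep that detail out of the picture.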
I had to merge iss_1292 to make this a single combined PR. This PR is now complete.
Co-authored-by: Martin Thoma <info@martin-thoma.de>
5 sec before me 😝
I'll look into applying black automatically in the CI as an extra commit today 😄 Also, I want to make flake8 run in parallel with the tests, and mypy after pytest, so that I can still see issues there when a run fails.
I don't think it's worth it.
It's a different test scenario.
Version 2.10.5, 2022-09-04
--------------------------

New Features (ENH):
- Process XRefStm (#1297)
- Auto-detect RTL for text extraction (#1309)

Bug Fixes (BUG):
- Avoid scaling cropbox twice (#1314)

Robustness (ROB):
- Fix offset correction in revised PDF (#1318)
- Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
- MultiLine bfrange in cmap (#1299)
- Cope with 2 digit codes in bfchar (#1310)
- Accept '/annn' charset as ASCII code (#1316)
- Log errors during Float / NumberObject initialization (#1315)
- Cope with corrupted entries in xref table (#1300)

Documentation (DOC):
- Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
- Creating a coverage report (#1319)
- Fix AnnotationBuilder.free_text example (#1311)
- Fix usage of page.scale by replacing it with page.scale_by (#1313)

Developer Experience (DEV):
- Only run coverage for PyPDF2

Maintenance (MAINT):
- PdfReaderProtocol (#1303)
- Throw PdfReadError if Trailer can't be read (#1298)
- Remove catching OverflowException (#1302)

Full Changelog: 2.10.4...2.10.5
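"Process XRefStm" refers to hybrid-reference files, in which the trailer of a classic xref table carries an /XRefStm entry holding the byte offset of a cross-reference stream whose entries supplement the table. The sketch below only extracts that offset; xrefstm_offset is a hypothetical helper, and the regex is a shortcut that a real parser would replace with proper dictionary tokenization.

```python
import re
from typing import Optional

# Hypothetical helper, not pypdf's API.
_XREFSTM = re.compile(rb"/XRefStm\s+(\d+)")


def xrefstm_offset(trailer: bytes) -> Optional[int]:
    """Return the /XRefStm byte offset from a trailer dictionary, or None."""
    match = _XREFSTM.search(trailer)
    return int(match.group(1)) if match else None
```

Once the offset is known, the reader would parse the cross-reference stream found there and use its entries for objects not listed in the classic table.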
Fixes #1273
Fixes #1279
Fixes #1292
Fixes #1294
Fixes #1295
ROB: Cope with xref starting on \r\n
ROB: Escaped octal code followed by decimal int
ROB: Cope with some corrupted entries in xref table
ROB: Extend xref autorepair cases
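Two of the robustness commits above concern the classic xref table: tolerating "\r\n" around the xref keyword and coping with corrupted entries. A minimal sketch of a forgiving parser follows, assuming the standard layout (the keyword xref, subsection headers "first count", entries "offset gen n|f") and assuming "starting on \r\n" means whitespace before the keyword. parse_xref_table is hypothetical, and letting a corrupted line consume one object number is a design assumption, not necessarily what pypdf does.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical forgiving parser for one classic xref table; not pypdf's code.
_SUBSECTION = re.compile(rb"^(\d+)\s+(\d+)$")               # "first count"
_ENTRY = re.compile(rb"^(\d{1,10})\s+(\d{1,5})\s+([nf])$")  # "offset gen n|f"


def parse_xref_table(data: bytes, offset: int) -> Dict[Tuple[int, int], int]:
    """Map (object number, generation) to byte offset for in-use entries."""
    # Tolerate whitespace such as "\r\n" between the startxref offset and "xref".
    body = data[offset:].lstrip()
    if not body.startswith(b"xref"):
        raise ValueError("xref keyword not found at offset %d" % offset)
    result: Dict[Tuple[int, int], int] = {}
    obj_num: Optional[int] = None
    # bytes.splitlines() accepts \n, \r and \r\n line endings.
    for raw in body[4:].splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(b"trailer"):
            break
        entry = _ENTRY.match(line)
        if entry is None:
            header = _SUBSECTION.match(line)
            if header is not None:
                obj_num = int(header.group(1))  # first object number in subsection
                continue
        if obj_num is None:
            raise ValueError("xref entry found before a subsection header")
        if entry is not None and entry.group(3) == b"n":
            result[(obj_num, int(entry.group(2)))] = int(entry.group(1))
        # A corrupted line is assumed to stand for one entry, so the numbering
        # of the following entries stays aligned.
        obj_num += 1
    return result
```

Combined with the object scan fallback, a skipped entry can still be recovered by searching the file for its header.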