ENH: Add page label support to PdfWriter#1558
Conversation
How did you test it? Please add the test script |
|
Thank you a lot! This week I'm pretty busy, but I will try to address everything as soon as possible. At the latest, I think, next week I will be able to work on it. |
|
I have added the tests and changed a bit the behavior to manage correctly when ranges of page labels overlap. The three |
Co-authored-by: pubpub-zz <4083478+pubpub-zz@users.noreply.github.com>
|
Thank you for the mypy suggestions! |
|
Codecov ReportBase: 91.87% // Head: 91.84% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #1558 +/- ##
==========================================
- Coverage 91.87% 91.84% -0.04%
==========================================
Files 33 33
Lines 6130 6202 +72
Branches 1210 1228 +18
==========================================
+ Hits 5632 5696 +64
- Misses 322 326 +4
- Partials 176 180 +4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
lorenzomanini
left a comment
There was a problem hiding this comment.
I think we are nearly there.
I wanted your opinion about the following: @MartinThoma @pubpub-zz
pypdf/_writer.py
Outdated
| def set_page_label( | ||
| self, | ||
| page_number_from: int, | ||
| page_number_to: int, |
There was a problem hiding this comment.
What do you think about page ranges indexes.
Now set_page_label requires page numbers starting with 1 while _set_page_label requires them starting with 0 (the parameters name change accordingly from page_number to page_index)
Also in both cases extremes are included.
I did it this way in the public interface because I think it is more natural this way for the user that probably is setting these watching a pdf (where page numbers start from 1), but I understand if you don't agree, especially considering that the private interface is 0 based.
There was a problem hiding this comment.
I'd prefer to use page_index everywhere and hence starting from 0.
Don't forget that pypdf users are Python developers. Starting a list of pages with index 0 is a lot more natural than starting with 1.
My suggestion would be to rename the parameters to page_index_from and page_index_to and letting it start with 0.
@pubpub-zz / @MasterOdin What's your opinion?
There was a problem hiding this comment.
Agree with your approach
There was a problem hiding this comment.
hehehe 1 based indexing is never the answer. The more I think about it the more I agree with you. I'll wait a bit for MasterOdin answer and then I'll change that. Unfortunately, I will have to change all the indexes in the test but I knew what I was risking when I did it:sweat_smile:
There was a problem hiding this comment.
I would agree with making it 0-based indexed for the simple reason that all other functions that refer to pages uses a 0-based index.
I agree with @MartinThoma that using page_index is the more "proper" way to refer to this as people may conflate page_number with a 1-based index scheme, whereas page_index, in my opinion, really only refers to 0-based index schemes in the context of Python.
There was a problem hiding this comment.
Thank you! I'll do that
There was a problem hiding this comment.
The last commit changes everything to 0-based indexing. Are you convinced by including both extremes or should we do lower included and upper excluded?
|
I still need to go over a lot of details, but I wanted to give you a small heads-up: Excellent work 🎉 ❤️ |
|
Overall it looks good to me. There are some minor docstring improvements I suggested (the first line should always be a summary of the function) + the page_number/page_index question needs to be clarified before we merge. I'm confident that this can be merged + released until Sunday :-) @lorenzomanini If you want, you can add yourself to the https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html ( |
Co-authored-by: Martin Thoma <info@martin-thoma.de>
Thank you!:blush: I'm having a lot of fun. And thank you for being so welcoming!:heart: |
Cool! Thank you! |
|
@lorenzomanini Looks good from my side - let me know when I can merge :-) |
Co-authored-by: Martin Thoma <info@martin-thoma.de>
|
Good, I think it is ready! Thank you @MartinThoma and @pubpub-zz for the help and for being so nice. |
|
I'm happy to hear that you feel welcome 🤗 Your contribution is now in the Really good work! |
New Features (ENH): - Add page label support to PdfWriter (#1558) - Accept inline images with space before EI (#1552) - Add circle annotation support (#1556) - Add polygon annotation support (#1557) - Make merging pages produce a deterministic PDF (#1542, #1543) Bug Fixes (BUG): - Fix error in cmap extraction (#1544) - Remove erroneous assertion check (#1564) - Fix dictionary access of optional page label keys (#1562) Robustness (ROB): - Set ignore_eof=True for read_until_regex (#1521) Documentation (DOC): - Paper size (#1550) Developer Experience (DEV): - Fix broken combination of dependencies of docs.txt - Annotate tests appropriately (#1551) [Full Changelog](3.2.1...3.3.0)
Introduce a new PdfWriter interface
set_page_labelsthat sets page labels to a page interval (as mentioned in #1554).It should support setting multiple page lables for different page intervals.
I've done some local testing and it seems to be working.
Pending: tests to be added
It's my first time contributing to an open-source project and I have little experience with python so my code will certainly need corrections, so I would like to know what you think about it.
Also, I've no idea how to set up correctly tests for my code but I'll look into it as soon as I will have some spare time.