ENH: Add Highlight text markup annotation#1740

Merged: MartinThoma merged 3 commits into main from highlight-annotation on Mar 26, 2023


@MartinThoma
Member

See #107

@MartinThoma
Member Author

MartinThoma commented Mar 23, 2023

I've got the quadpoints by using pymupdf and inspecting the resulting document:

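import fitz  # PyMuPDF

doc = fitz.open("crazyones.pdf")
page = doc[0]

# Find every occurrence of the search term on the first page
text_instances = page.search_for("crazy")

# Add one highlight annotation per match and set its colors
for inst in text_instances:
    highlight = page.add_highlight_annot(inst)
    highlight.set_colors({"stroke":(0, 0, 1), "fill":(0.75, 0.8, 0.95)})
    highlight.update()

# Save the annotated copy so the written /QuadPoints can be inspected
doc.save("annotation.pdf")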

@pubpub-zz How difficult would a PageObject.search_for(text) method be to implement that returns something containing the QuadPoints? Some people would be curious: https://stackoverflow.com/q/47497309/562769 :-)
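
For readers wondering what "something containing the QuadPoints" might look like: a highlight's quad points are eight numbers giving the four corners of the highlighted region. The sketch below derives them from an axis-aligned rectangle using only the standard library. The helper names `rect_to_quad_points` and `bounding_rect` are hypothetical and not part of pypdf or PyMuPDF, and the corner order used here (lower-left, lower-right, upper-left, upper-right) is an assumption that should be checked against the viewer you target.

```python
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1): two opposite corners in PDF user space
Rect = Tuple[float, float, float, float]


def rect_to_quad_points(rect: Rect) -> List[float]:
    """Return eight numbers for the four corners of an axis-aligned rect.

    Assumed corner order: lower-left, lower-right, upper-left, upper-right.
    """
    x0, y0, x1, y1 = rect
    # Normalise so the result does not depend on which corners were given
    left, right = min(x0, x1), max(x0, x1)
    bottom, top = min(y0, y1), max(y0, y1)
    return [left, bottom, right, bottom, left, top, right, top]


def bounding_rect(quad_points: Sequence[float]) -> Rect:
    """Return the smallest rect enclosing one or more quads (8 numbers each)."""
    if len(quad_points) == 0 or len(quad_points) % 8 != 0:
        raise ValueError("quad_points must hold a positive multiple of 8 numbers")
    xs = quad_points[0::2]  # even positions are x coordinates
    ys = quad_points[1::2]  # odd positions are y coordinates
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    quad = rect_to_quad_points((50, 550, 200, 650))
    print(quad)
    print(bounding_rect(quad))
```

Values like these could then be handed to the highlight builder this PR adds, together with the enclosing rectangle. The builder's exact parameters are not shown in this thread, so consult the pypdf documentation before relying on them.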

@codecov

codecov bot commented Mar 23, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (4fc0040) 92.38% compared to head (b7bb307) 92.38%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1740   +/-   ##
=======================================
  Coverage   92.38%   92.38%           
=======================================
  Files          34       34           
  Lines        6553     6557    +4     
  Branches     1300     1301    +1     
=======================================
+ Hits         6054     6058    +4     
  Misses        326      326           
  Partials      173      173           
| Impacted Files | Coverage Δ |
|---|---|
| pypdf/generic/_annotations.py | 93.91% <100.00%> (+0.21%) ⬆️ |


@pubpub-zz
Collaborator

> @pubpub-zz How difficult would a PageObject.search_for(text) method be to implement that returns something containing the QuadPoints? Some people would be curious: https://stackoverflow.com/q/47497309/562769 :-)

It will be part of the extension of extract_text... I'll move it up in my stack.

@pubpub-zz
Collaborator

The in-progress PR #1723 about set_color should also allow changing the color. set_color() is definitely the right name.

@MartinThoma MartinThoma merged commit 3da3b25 into main Mar 26, 2023
@MartinThoma MartinThoma deleted the highlight-annotation branch March 26, 2023 10:19
MartinThoma added a commit that referenced this pull request Mar 26, 2023
Security (SEC):
-  Use Python's secrets module instead of random module (#1748)

New Features (ENH):
-  Add AnnotationBuilder.highlight text markup annotation (#1740)
-  Add AnnotationBuilder.popup (#1665)
-  Add AnnotationBuilder.polyline annotation support (#1726)
-  Add clone_from parameter in PdfWriter constructor (#1703)

Bug Fixes (BUG):
-  'DictionaryObject' object has no attribute 'indirect_reference' (#1729)

Robustness (ROB):
-  Handle params NullObject in decode_stream_data (#1738)

Documentation (DOC):
-  Project scope (#1743)

Maintenance (MAINT):
-  Add AnnotationFlag (#1746)
-  Add LazyDict.__str__ (#1727)

[Full Changelog](3.6.0...3.7.0)

Labels

is-feature A feature request
