Copy-paste for hyphenated text copies hyphens #1267

@ghost

Description

Problem
Currently, when hyphenation is applied to a word, the word copies from the PDF as "prefix- suffix". Some PDF readers will instead copy it as "prefixsuffix" because they are confident enough that the minus character (the hyphen-minus, U+002D) was a hyphenation character.

With the current output, a reader cannot really know whether a hyphen at a line break was inserted by the hyphenation algorithm or written by the author, because the same minus sign is used in both cases.
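To illustrate the ambiguity, here is a minimal sketch of the kind of heuristic a PDF reader might apply when copying text. The function name `naive_dehyphenate` and the heuristic itself are assumptions for illustration, not the behavior of any particular reader.

```python
HYPHEN_MINUS = "\u002D"  # the ordinary "-" character


def naive_dehyphenate(lines):
    """Join lines, dropping any hyphen-minus found at the end of a line.

    Hypothetical reader heuristic: it assumes every trailing "-" came
    from hyphenation, which is exactly the guess the issue describes.
    """
    out = ""
    for line in lines:
        if out.endswith(HYPHEN_MINUS):
            # Treat the trailing "-" as a hyphenation artifact and drop it.
            out = out[:-1] + line
        elif out:
            out += " " + line
        else:
            out = line
    return out


# Algorithmic hyphenation: the guess happens to be right.
print(naive_dehyphenate(["hyphen-", "ation"]))
# Author-written hyphen: the guess destroys the original text.
print(naive_dehyphenate(["a well-", "known fact"]))
```

The second call loses the hyphen the author typed, because nothing in the copied text distinguishes the two cases.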

Solution
Use the hyphen character “‐” (U+2010) if and only if the hyphen was inserted by the hyphenation algorithm.
If a line break falls within a word at a hyphenation character that was already present in the input (usually the minus sign), the hyphen character should not be used. Instead, keep the character that was already in the input text.
If the line break occurs at a soft hyphen in the input text, the hyphen character (U+2010) should also be used.

This solution may only be applicable when the font provides a glyph for U+2010. If the glyph is available in the font, it should be used; otherwise, fall back to the minus sign.
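The rules above could be sketched roughly as follows. This is only an illustration under stated assumptions: `BreakKind`, `break_char`, `recover_text`, and the representation of the font as a set of available characters are all hypothetical and not taken from the actual codebase.

```python
from enum import Enum, auto

HYPHEN_MINUS = "\u002D"  # "-", what the author usually types
HYPHEN = "\u2010"        # "‐", proposed marker for inserted hyphens


class BreakKind(Enum):
    ALGORITHM = auto()    # break point found by the hyphenation algorithm
    SOFT_HYPHEN = auto()  # break at a soft hyphen (U+00AD) in the input
    EXPLICIT = auto()     # break after a hyphen character already in the input


def break_char(kind, font_chars, existing=None):
    """Pick the character to show at a line break inside a word.

    `font_chars` is assumed to be the set of characters the font has glyphs for.
    """
    if kind is BreakKind.EXPLICIT:
        if existing is None:
            raise ValueError("an explicit break needs the input character")
        return existing  # keep what the author wrote
    # Inserted hyphen: prefer U+2010, fall back if the font lacks the glyph.
    return HYPHEN if HYPHEN in font_chars else HYPHEN_MINUS


def recover_text(lines):
    """Rejoin copied lines: drop trailing U+2010, keep a trailing hyphen-minus."""
    out = ""
    for line in lines:
        if out.endswith(HYPHEN):
            out = out[:-1] + line
        elif out.endswith(HYPHEN_MINUS):
            out += line
        elif out:
            out += " " + line
        else:
            out = line
    return out
```

With this distinction, `recover_text(["hyphen\u2010", "ation"])` yields `hyphenation`, while `recover_text(["well-", "known"])` keeps the author's `well-known`. When the font lacks the U+2010 glyph, the fallback reintroduces the original ambiguity for that text.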

Use Case

This will make it possible to copy and reconstruct the original input text much more accurately, and it gives PDF readers better information about the text being copied. It can also improve text search within the PDF.

Labels: bug (something isn't working), pdf (related to PDF export or PDF embedding), text (related to the text category: text handling, shaping, etc.)
