Different characters that use the same glyph result in the same character when copied from PDFs #526

@ghost

Description

Input:

birth\u{ad}day
birth­day // also contains a soft hyphen

In the resulting PDF, both soft hyphens are searchable as spaces (birth day).
The expected result is that the word is searchable either as birthday (soft hyphen dropped) or as birth\u{ad}day (soft hyphen kept).
Including a space instead changes the semantics of the text.
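
To make the expectation concrete, here is a minimal Python sketch of the two acceptable outcomes for the searchable text. It is not how Typst or LuaTeX extract text; the helper name `searchable_text` and its `keep_soft_hyphens` parameter are hypothetical and exist only for illustration.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def searchable_text(text: str, keep_soft_hyphens: bool = False) -> str:
    """Return the text a PDF search would ideally match (hypothetical helper).

    Either the soft hyphens are dropped, or they are kept unchanged.
    In neither case are they turned into spaces.
    """
    if keep_soft_hyphens:
        return text
    return text.replace(SOFT_HYPHEN, "")


source = "birth\u00adday"

dropped = searchable_text(source)
kept = searchable_text(source, keep_soft_hyphens=True)

# Acceptable outcome 1: the soft hyphen is dropped entirely.
assert dropped == "birthday"
# Acceptable outcome 2: the soft hyphen is preserved as U+00AD.
assert kept == "birth\u00adday"
# The reported behaviour ("birth day") matches neither outcome.
assert "birth day" not in (dropped, kept)
```

Either variant keeps the word intact for search; replacing U+00AD with U+0020 splits it into two words.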

With LuaTeX, soft hyphens are not included in the searchable text.
Maybe related: #479

Labels: bug, pdf, text