Links to identifiers with unicode characters in them are not created properly #831

@iFreilicht

Problem Description

When the documentation references an identifier in backticks and that identifier contains a non-ASCII (Unicode) character, the resulting link covers only the leading dot-separated parts that contain no such character. For example:

See also: `api_contract.output.rückkauf`

Will be rendered as

See also: <code><a href="api_contract/output.html">api_contract.output</a>.rückkauf</code>

instead of the expected

See also: <code><a href="api_contract/output/rückkauf.html">api_contract.output.rückkauf</a></code>

Steps to reproduce the behavior:

  1. Include the above snippet in your documentation
  2. Observe the partial link

System Information

pdoc: 15.0.1
Python: 3.12.8
Platform: Windows-2019Server-10.0.17763-SP0

Additional context

I already found the cause: the regex in linkify uses [a-zA-Z0-9_] instead of \w. In Python, \w also matches Unicode word characters by default.
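A minimal sketch of the difference follows. The two patterns are simplified stand-ins written for illustration, not pdoc's actual linkify regex, and the names `ASCII_IDENT` and `UNICODE_IDENT` are hypothetical. Both assume the match has to end on a word boundary, which is what makes the ASCII-only variant fall back to the shorter prefix.

```python
import re

# Hypothetical, simplified patterns for dotted identifiers.
# These are NOT pdoc's real regex; they only illustrate the character-class issue.
ASCII_IDENT = re.compile(r"\b(?!\d)[a-zA-Z0-9_]+(?:\.(?!\d)[a-zA-Z0-9_]+)*\b")
UNICODE_IDENT = re.compile(r"\b(?!\d)\w+(?:\.(?!\d)\w+)*\b")

# "\u00fc" is the precomposed character "ü".
identifier = "api_contract.output.r\u00fcckkauf"

# The ASCII class stops at "ü"; since "r" followed by "ü" is not a word
# boundary, the regex backtracks to the last complete dotted part.
ascii_match = ASCII_IDENT.match(identifier).group(0)

# \w matches Unicode word characters in str patterns, so the whole
# identifier is matched.
unicode_match = UNICODE_IDENT.match(identifier).group(0)

print(ascii_match)
print(unicode_match)
```

With these stand-in patterns, the ASCII-only variant matches just `api_contract.output`, which mirrors the partial link shown above, while the `\w` variant matches the full identifier.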

I'll be submitting a PR for this soon.
