KotlinLexer get_tokendefs returns non-text #2964

@flofriday

I haven't quite figured out what's happening here, so please excuse the vague title, but the output of KotlinLexer.get_tokendefs() certainly seems to be broken.

Reproducible script.

Run the following code (for example with uv run main.py, or however you manage your dependencies):

# /// script
# dependencies = [
#   "pygments==2.19.2",
# ]
# ///

from pygments.lexers.jvm import KotlinLexer

print(KotlinLexer.get_tokendefs())

It prints something that looks like binary data (screenshot here, because GitHub won't accept invalid UTF-8 bytes):

[Image: screenshot of the printed output]

As you can see, everything looks fine up to the highlighted Token.Literal.Number, but below it are random-looking bytes.
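To check whether the output really contains invalid bytes or only non-ASCII characters that the terminal renders poorly, the patterns could be printed through Python's built-in ascii(), which escapes every non-ASCII character. This is a minimal sketch, assuming get_tokendefs() returns a dict that maps state names to lists of rules. The helper name dump_tokendefs and the sample data are hypothetical and only there for illustration; they are not the real Kotlin rules.

```python
def dump_tokendefs(tokendefs):
    """Return one ASCII-only line per rule found in a tokendefs-like dict."""
    lines = []
    for state, rules in tokendefs.items():
        for rule in rules:
            if isinstance(rule, tuple) and len(rule) >= 2:
                pattern, token = rule[0], rule[1]
                # ascii() escapes non-ASCII characters as \xNN, \uNNNN or \UNNNNNNNN
                lines.append(f"{state}: {ascii(pattern)} -> {token}")
            else:
                # Anything that is not a (pattern, token, ...) tuple is shown as-is, escaped
                lines.append(f"{state}: {ascii(rule)}")
    return lines


# Hypothetical stand-in shaped like the assumed return value of get_tokendefs()
sample = {"root": [("[a-z\u00e9\u4e2d]+", "Name")]}
for line in dump_tokendefs(sample):
    print(line)

# With Pygments installed, the real call would be:
# from pygments.lexers.jvm import KotlinLexer
# print("\n".join(dump_tokendefs(KotlinLexer.get_tokendefs())))
```

If the escaped output shows ordinary \u escapes inside character classes, the "bytes" are valid Unicode characters in the regex rather than corrupted data.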

The corresponding line in the KotlinLexer would be the parsing of identifiers:

# Identifiers
(r'' + kt_id + r'((\?[^.])?)', Name) # additionally handle nullable types
