Revise TOML lexer by jeanas · Pull Request #2576 · pygments/pygments

jeanas · 2023-11-08T21:08:09Z

The new lexer matches the TOML spec much more closely.

User-visible differences should be these:

Add MIME type
Highlight string escapes
Recognize \uXXXX and \UXXXX escapes
Also recognize booleans if they are followed by a comment
Fix single quotes inside multiline literal strings (closes #2488, "TOML: Multi-line string lexing issue when using same quote character")
Prevent multiline literal strings from eating comments
Add multiline basic strings (""")
Improve datetime recognition: recognize times without dates, dates without times and datetimes without time zone; allow sub-millisecond precision
Recognize floats with exponents (previously they were not recognized when they had a decimal point)
Recognize binary, octal and hex literals
Recognize strings inside table headers
Recognize table headers followed by comments
Don't parse sequences of digits as integers when they are actually keys

Includes several new tests, most of which were not working before.
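To make the number-related items in the list above concrete, here is a minimal sketch of how prefixed integers and floats with exponents can be told apart with regular expressions. The patterns are written from the TOML grammar for this illustration; the names `PATTERNS` and `classify` are hypothetical and this is not the code added to `pygments/lexers/configs.py`. Special floats (`inf`, `nan`) are left out.

```python
import re

# Hypothetical patterns written from the TOML grammar; not the lexer's own rules.
DIGITS = r'\d(?:_?\d)*'
DEC_INT = r'[+-]?(?:0|[1-9](?:_?\d)*)'

PATTERNS = [
    ('hex', re.compile(r'0x[0-9A-Fa-f](?:_?[0-9A-Fa-f])*')),
    ('octal', re.compile(r'0o[0-7](?:_?[0-7])*')),
    ('binary', re.compile(r'0b[01](?:_?[01])*')),
    # A float needs a fractional part, an exponent, or both.
    ('float', re.compile(
        rf'{DEC_INT}(?:\.{DIGITS}(?:[eE][+-]?{DIGITS})?|[eE][+-]?{DIGITS})')),
    ('integer', re.compile(DEC_INT)),
]


def classify(value):
    """Return the name of the first pattern that matches the whole value."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(value):
            return name
    return None


for sample in ['0xDEAD_beef', '0o755', '0b1101', '6.626e-34', '1e6', '1_000']:
    print(sample, classify(sample))
```

The float pattern is tried before the plain integer pattern so that a value such as `6.626e-34` is classified as a whole instead of being cut off after its integer part.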


birkenfeld

Looks good!

birkenfeld · 2023-11-09T16:26:58Z

pygments/lexers/configs.py

+            (r'[A-Za-z0-9_-]+', Keyword),
+            (r'"', String.Double, 'basic-string'),
+            (r"'", String.Single, 'literal-string'),
+            (r'\.', Keyword),


Any reason not to put the dot in the above char class?

I was trying to keep this state similar to 'key'. I'd be fine with changing it though.

jeanas · 2023-11-09T20:18:34Z

Oops — I realized that a [ \t]+ was missing in table-key because TOML allows table headers like [ foo . bar ].

Fixed, with a test.
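The effect of the missing `[ \t]+` rule can be sketched with a small stand-alone scanner. The rule list below is modelled on the diff quoted above, but the rule names, the `scan_table_header` helper and the scanning loop are assumptions made for this illustration, not Pygments code; quoted keys and `[[array]]` headers are left out for brevity.

```python
import re

# Illustrative token rules modelled on the diff above; the names and the
# scanning loop are assumptions, not Pygments code.
TABLE_KEY_RULES = [
    ('key', re.compile(r'[A-Za-z0-9_-]+')),
    ('dot', re.compile(r'\.')),
    ('whitespace', re.compile(r'[ \t]+')),
    ('close', re.compile(r'\]')),
]


def scan_table_header(line, rules=TABLE_KEY_RULES):
    """Tokenize a header such as '[ foo . bar ]' into (kind, text) pairs."""
    if not line.startswith('['):
        raise ValueError('not a table header')
    tokens = [('open', '[')]
    pos = 1
    while pos < len(line):
        for kind, pattern in rules:
            match = pattern.match(line, pos)
            if match:
                tokens.append((kind, match.group()))
                pos = match.end()
                break
        else:
            # No rule matched: emit one error character and move on.
            tokens.append(('error', line[pos]))
            pos += 1
        if tokens[-1][0] == 'close':
            break
    return tokens


print(scan_table_header('[ foo . bar ]'))

# Dropping the whitespace rule reproduces the kind of gap described above.
without_ws = [rule for rule in TABLE_KEY_RULES if rule[0] != 'whitespace']
print(scan_table_header('[ foo . bar ]', without_ws))
```

With the whitespace rule in place every character of `[ foo . bar ]` gets a regular token; without it, each space falls through to an error token.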

Anteru · 2023-11-10T19:19:39Z

Looks good. Thanks! I'll try to wrap up a new release this or next weekend.

ilevkivskyi · 2023-11-18T22:05:20Z

pygments/lexers/configs.py

+            (r'[^"\\]+', String.Double),
+        ],
+        'literal-string': [
+            (r".*'", String.Single, '#pop'),


I think this is a bug. This will capture too much if there is a literal string followed by a comment containing '. This broke mypy docs build on this line:

'two\.pyi$', # but TOML's single-quoted strings do not

see https://github.com/python/mypy/actions/runs/6916132945/job/18815845352

Gosh. There should of course have been a ? here.

And the worst part is that I wrote a test for exactly this, but I missed that the output was wrong. I probably made some slight change after I checked all the golden outputs.

Sorry about that, will fix.

Thanks, NP!

Fixed in 220a2a9.
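The over-capture reported above can be reproduced with Python's `re` module alone. This is a minimal sketch: `GREEDY` is the pattern quoted in the review, `LAZY` is the presumed shape of the fix (the missing `?`), and the `literal_string` helper is a hypothetical stand-in for the lexer state, which is entered after the opening quote has been consumed.

```python
import re

GREEDY = re.compile(r".*'")   # pattern quoted in the review comment
LAZY = re.compile(r".*?'")    # assumed shape of the fix (the missing "?")


def literal_string(pattern, line):
    """Return the text that would be marked as a literal string.

    Assumes `line` starts with the opening quote, which has already been
    consumed before the pattern is tried.
    """
    match = pattern.match(line, 1)
    return line[:match.end()] if match else None


line = r"'two\.pyi$', # but TOML's single-quoted strings do not"
print(literal_string(GREEDY, line))
print(literal_string(LAZY, line))
```

The greedy pattern runs to the last apostrophe on the line, the one in `TOML's`, so part of the comment is highlighted as a string. The lazy pattern stops at the first apostrophe after the opening quote, which is the real end of the literal string.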

@Anteru In case you have some spare time to do a bugfix release... Thanks!

Oh dear. I'll try to get it done today, somehow. I'm always afraid this happens, and our release process is still fairly manual :( Goals for 2024 I guess.

Let me know if you want me to step in. Also, what would you like to automate?

Things I'd like to automate:

From tag to PyPI -- ideally, to test-pypi on every tagged commit (https://github.com/marketplace/actions/pypi-publish) -- and the actual release would be a special action I just click on. It's not that it takes a lot of time, but I'm always nervous that I'll mess something up with the command line, forget to delete a file, git clean, etc. -- I very diligently work through the release-checklist to avoid that. Literally signing things off.

Auto-formatting -- I tend to clean up the formatting of the lexers every time close to release, at least the worst offenders. I use autopep8 at the moment, would rather apply flake8 or black on the entire codebase.

Auto-check that the new arguments like URL etc. are present on new lexers

Auto-check .. versionadded:: is there -- costs me a lot of time to open up every Lexer close to release and make sure it's present and in the right format (i.e. 2.17.0 vs. 2.17)

Actually get all checks working/passing (i.e. the additional checkers I wrote and possibly PyLint). check_whitespace_tokens and check_repeated_tokens need an expected-fail list so we can whitelist currently existing lexers until we fix those, but new lexers should always pass those tests.

Verify all PR numbers closed/merged since last release are mentioned in the CHANGES file. I'm pretty good at assigning tasks to milestones now, but I still miss things in the CHANGES file, and it's super time consuming to open 100 tabs, go through each item one by one, check the PR number/issue number is present, etc. If there was a way to auto-generate the changelog that would be even better, but my experience is that those look pretty ugly and some manual checkup is fine.
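Two of the checks in the list above are simple enough to sketch with the standard library. Everything here is hypothetical: the function names, the idea of passing docstrings and changelog text as plain strings, and in particular the assumption that the two-component form (`2.17`) is the wanted version format, which the comment above does not settle.

```python
import re

# Assumed convention: two-component versions such as "2.17".
VERSIONADDED = re.compile(r'^\s*\.\. versionadded:: (\S+)\s*$', re.MULTILINE)
TWO_PART_VERSION = re.compile(r'\d+\.\d+')


def check_versionadded(docstring):
    """Return a problem description, or None if the docstring passes."""
    match = VERSIONADDED.search(docstring or '')
    if match is None:
        return 'missing .. versionadded::'
    version = match.group(1)
    if not TWO_PART_VERSION.fullmatch(version):
        return f'unexpected version format: {version}'
    return None


def missing_from_changes(pr_numbers, changes_text):
    """Return the PR numbers that never appear as '#NNNN' in the changelog."""
    mentioned = {int(n) for n in re.findall(r'#(\d+)', changes_text)}
    return sorted(set(pr_numbers) - mentioned)


print(check_versionadded('TOML lexer.\n\n    .. versionadded:: 2.17.0\n'))
print(missing_from_changes([2576, 2585], '* TOML: revise lexer (#2576)'))
```

In a real setup the docstrings would come from the lexer classes and the PR numbers from the milestone; both sources are left out here because they depend on tooling the thread does not describe.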

I'll get to the release in a moment, thanks for the offer though!

Thank you for the quick fix! 👍

With Pygments 2.17+, the TOML parser was rewritten [^1][^2]. It now fails to parse and highlight the `full-config.ini` file. The key-only `agent-testbox` within `[instances]` makes the standard TOML parsing barf, too. Flip to ini.

[^1] https://pygments.org/docs/changelog/#version-2-17-0
[^2] pygments/pygments#2576


jeanas added 2 commits November 8, 2023 22:06

Fixup: regen mapfiles

b9a2d22

birkenfeld approved these changes Nov 9, 2023


jeanas added 2 commits November 9, 2023 21:13

Fixup: add one more test

bc73f9c

Fixup: whitespace is allowed in table headers

e22339d

Anteru added the A-lexing label (area: changes to individual lexers) Nov 10, 2023

Anteru merged commit 6bc0332 into pygments:master Nov 10, 2023

Anteru added this to the 2.17 milestone Nov 10, 2023

Anteru added the changelog-update label (Items which need to get mentioned in the changelog) Nov 10, 2023

jeanas deleted the toml branch November 10, 2023 19:37

jeanas removed the changelog-update label (Items which need to get mentioned in the changelog) Nov 11, 2023

ilevkivskyi mentioned this pull request Nov 18, 2023

Docs build is broken on master python/mypy#16518

Closed

ilevkivskyi reviewed Nov 18, 2023


jeanas mentioned this pull request Nov 19, 2023

Release automation #2585

Open

awelzel mentioned this pull request Nov 20, 2023

management/full-config: no toml highlighting zeek/zeek-docs#228

Merged
