Commit 42c2aa1

Add helper script to generate supported langs markdown table (#1170)
**TL;DR: added a Python script that generates a "supported languages" Markdown table, making it easier to keep the table in the README up to date.**

---

This PR adds a Python script, `_tools/format_supported_langs.py`, which accepts the output of `chroma --list` over stdin and outputs a Markdown table that can be added to the Supported Languages section of the README.

Here's how I called the script:

```sh
env -C cmd/chroma go run . --list | uv run _tools/format_supported_langs.py
```

I chose not to automate edits to the README, leaving it up to the invoker to replace the table in the README with this script's output.

I also updated the table in the README because it was missing a few languages. Hopefully this can help with the maintenance burden.
1 parent e799618 commit 42c2aa1
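
For a sense of the table shape described in the commit message, here is a minimal sketch of the group-and-render step applied to a handful of lexer names. `sketch_table` is a hypothetical name used only for illustration. The sketch condenses the logic rather than reproducing the script, and it skips parsing `chroma --list` output entirely.

```python
from collections import defaultdict


def sketch_table(names: list[str]) -> str:
    """Group names by upper-cased first character and render a markdown table."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[name[0].upper()].append(name)

    # One comma-separated, case-insensitively sorted list per prefix.
    joined = {
        prefix: ", ".join(sorted(members, key=str.lower))
        for prefix, members in sorted(groups.items())
    }

    # The separator row is as wide as the longest language list.
    width = max((len(value) for value in joined.values()), default=0)
    rows = [f"| {prefix} | {value}" for prefix, value in joined.items()]
    return "\n".join(["| Prefix | Language", f"| :----: | {'-' * width}", *rows])


print(sketch_table(["Kotlin", "C3", "Kakoune", "C", "markdown"]))
```

Rows come out ordered by prefix, and the names within each row are sorted without regard to case, which is why `markdown` can sit next to capitalised names in the README table.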

2 files changed, +87 -7 lines changed


README.md

Lines changed: 7 additions & 7 deletions
@@ -35,30 +35,30 @@ translators for Pygments lexers and styles.
 ## Supported languages
 
 | Prefix | Language
-| :----: | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+| :----: | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 | A | ABAP, ABNF, ActionScript, ActionScript 3, Ada, Agda, AL, Alloy, Angular2, ANTLR, ApacheConf, APL, AppleScript, ArangoDB AQL, Arduino, ArmAsm, ATL, AutoHotkey, AutoIt, Awk
 | B | Ballerina, Bash, Bash Session, Batchfile, Beef, BibTeX, Bicep, BlitzBasic, BNF, BQN, Brainfuck
-| C | C, C#, C++, Caddyfile, Caddyfile Directives, Cap'n Proto, Cassandra CQL, Ceylon, CFEngine3, cfstatement, ChaiScript, Chapel, Cheetah, Clojure, CMake, COBOL, CoffeeScript, Common Lisp, Coq, Core, Crystal, CSS, CSV, CUE, Cython
+| C | C, C#, C++, C3, Caddyfile, Caddyfile Directives, Cap'n Proto, Cassandra CQL, Ceylon, CFEngine3, cfstatement, ChaiScript, Chapel, Cheetah, Clojure, CMake, COBOL, CoffeeScript, Common Lisp, Coq, Core, Crystal, CSS, CSV, CUE, Cython
 | D | D, Dart, Dax, Desktop file, Diff, Django/Jinja, dns, Docker, DTD, Dylan
 | E | EBNF, Elixir, Elm, EmacsLisp, Erlang
 | F | Factor, Fennel, Fish, Forth, Fortran, FortranFixed, FSharp
 | G | GAS, GDScript, GDScript3, Gemtext, Genshi, Genshi HTML, Genshi Text, Gherkin, Gleam, GLSL, Gnuplot, Go, Go HTML Template, Go Template, Go Text Template, GraphQL, Groff, Groovy
 | H | Handlebars, Hare, Haskell, Haxe, HCL, Hexdump, HLB, HLSL, HolyC, HTML, HTTP, Hy
 | I | Idris, Igor, INI, Io, ISCdhcpd
 | J | J, Janet, Java, JavaScript, JSON, JSONata, Jsonnet, Julia, Jungle
-| K | Kotlin
+| K | Kakoune, Kotlin
 | L | Lean4, Lighttpd configuration file, LLVM, lox, Lua
-| M | Makefile, Mako, markdown, Mason, Materialize SQL dialect, Mathematica, Matlab, MCFunction, Meson, Metal, MiniZinc, MLIR, Modula-2, Mojo, MonkeyC, MoonScript, MorrowindScript, Myghty, MySQL
+| M | Makefile, Mako, markdown, Mason, Materialize SQL dialect, Mathematica, Matlab, MCFunction, Meson, Metal, MiniZinc, MLIR, Modelica, Modula-2, Mojo, MonkeyC, MoonScript, MorrowindScript, Myghty, MySQL
 | N | NASM, Natural, NDISASM, Newspeak, Nginx configuration file, Nim, Nix, NSIS, Nu
 | O | Objective-C, ObjectPascal, OCaml, Octave, Odin, OnesEnterprise, OpenEdge ABL, OpenSCAD, Org Mode
 | P | PacmanConf, Perl, PHP, PHTML, Pig, PkgConfig, PL/pgSQL, plaintext, Plutus Core, Pony, PostgreSQL SQL dialect, PostScript, POVRay, PowerQuery, PowerShell, Prolog, Promela, PromQL, properties, Protocol Buffer, Protocol Buffer Text Format, PRQL, PSL, Puppet, Python, Python 2
 | Q | QBasic, QML
-| R | R, Racket, Ragel, Raku, react, ReasonML, reg, Rego, reStructuredText, Rexx, RPGLE, RPMSpec, Ruby, Rust
+| R | R, Racket, Ragel, Raku, react, ReasonML, reg, Rego, reStructuredText, Rexx, RGBDS Assembly, Ring, RPGLE, RPMSpec, Ruby, Rust
 | S | SAS, Sass, Scala, Scheme, Scilab, SCSS, Sed, Sieve, Smali, Smalltalk, Smarty, SNBT, Snobol, Solidity, SourcePawn, SPARQL, SQL, SquidConf, Standard ML, stas, Stylus, Svelte, Swift, SYSTEMD, systemverilog
 | T | TableGen, Tal, TASM, Tcl, Tcsh, Termcap, Terminfo, Terraform, TeX, Thrift, TOML, TradingView, Transact-SQL, Turing, Turtle, Twig, TypeScript, TypoScript, TypoScriptCssData, TypoScriptHtmlData, Typst
 | U | ucode
 | V | V, V shell, Vala, VB.net, verilog, VHDL, VHS, VimL, vue
-| W | WDTE, WebGPU Shading Language, WebVTT, Whiley
+| W | WDTE, WebAssembly Text Format, WebGPU Shading Language, WebVTT, Whiley
 | X | XML, Xorg
 | Y | YAML, YANG
 | Z | Z80 Assembly, Zed, Zig
@@ -211,7 +211,7 @@ the following:
 ```sh
 uv run --script _tools/pygments2chroma_xml.py \
 pygments.lexers.jvm.KotlinLexer \
-> lexers/embedded/kotlin.xml
+> lexers/embedded/kotlin.xml
 ```
 
 A list of all lexers available in Pygments can be found in [pygments-lexers.txt](https://github.com/alecthomas/chroma/blob/master/pygments-lexers.txt).

_tools/format_supported_langs.py

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
+#!/usr/bin/env python3
+
+import re
+import sys
+from collections import defaultdict
+
+
+def parse_lexers(lines: list[str]) -> list[str]:
+    """Parse the output of chroma --list and return a list of lexer names"""
+
+    # lexer names: after the leading space, the first character is neither ":" nor whitespace
+    lexer_name_re: re.Pattern[str] = re.compile(r"^ ([^:\s].*?)\s*$")
+    lexers: list[str] = []
+    in_lexers = False
+
+    for line in lines:
+        line = line.rstrip()
+        if line.startswith("lexers:"):
+            in_lexers = True
+            continue
+        if not in_lexers:
+            continue
+
+        # stop when we hit styles/formatters/etc
+        if line.startswith("styles:") or line.startswith("formatters:"):
+            break
+
+        match: re.Match[str] | None = lexer_name_re.match(line)
+        if match:
+            name: str | None = match.group(1)
+            if name:
+                lexers.append(name)
+    return lexers
+
+
+def group_by_prefix(lexers: list[str]) -> dict[str, list[str]]:
+    """Given a list of lexer names, return a dictionary mapping prefixes
+    to lists of lexers that begin with that prefix"""
+    groups: defaultdict[str, list[str]] = defaultdict(list)
+    for name in lexers:
+        prefix: str = name[0].upper()
+        groups[prefix].append(name)
+    # sort each group case-insensitively, then order the groups by prefix
+    for k in groups:
+        groups[k] = sorted(groups[k], key=lambda s: s.lower())
+    return dict(sorted(groups.items()))
+
+
+def emit_markdown(groups: dict[str, list[str]]) -> str:
+    """Render the groups as a markdown table, one row per prefix"""
+    lines: list[str] = []
+    longest = 0
+    for prefix, lexers in groups.items():
+        joined: str = ", ".join(lexers)
+        longest = max(longest, len(joined))
+        lines.append(f"| {prefix} | {joined}")
+    splitter = f"| :----: | {longest * '-'}"
+    markdown: list[str] = ["| Prefix | Language", splitter]
+    markdown.extend(lines)
+    return "\n".join(markdown)
+
+
+if __name__ == "__main__":
+    if sys.stdin.isatty():
+        print(
+            "This script parses chroma --list piped from stdin and emits a markdown table for the README"
+        )
+        print("Recommended usage (from repo root):")
+        print(
+            "env -C cmd/chroma go run . --list | uv run _tools/format_supported_langs.py"
+        )
+        sys.exit(1)
+
+    lines: list[str] = sys.stdin.readlines()
+    if lines:
+        lexers: list[str] = parse_lexers(lines)
+        groups: dict[str, list[str]] = group_by_prefix(lexers)
+        print(emit_markdown(groups))
+    else:
+        sys.exit(1)
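
The commit deliberately leaves replacing the README table to whoever runs the script. For anyone who does want to automate that step, the following is a rough sketch only and is not part of this commit. `splice_table` is a hypothetical helper, and it assumes the table is the first run of lines starting with `|` after the `## Supported languages` heading.

```python
def splice_table(
    readme: str, table: str, heading: str = "## Supported languages"
) -> str:
    """Return `readme` with the first table after `heading` replaced by `table`."""
    lines = readme.splitlines()
    try:
        start = lines.index(heading)
    except ValueError:
        raise ValueError(f"heading not found: {heading!r}") from None

    # Skip forward to the first table row after the heading.
    first = start + 1
    while first < len(lines) and not lines[first].startswith("|"):
        first += 1
    if first == len(lines):
        raise ValueError("no table found after heading")

    # The table ends at the first line that is not a table row.
    last = first
    while last < len(lines) and lines[last].startswith("|"):
        last += 1

    # Note: the result always ends with exactly one trailing newline.
    return "\n".join(lines[:first] + table.splitlines() + lines[last:]) + "\n"
```

The function works on strings and returns the new text, so a caller can diff the result before writing anything to disk. The scan does not stop at the next heading, so a README with no table under the target section would need a stricter check.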
