Commit a53c924

Create Lexer for Nu (#1110)
Closes #1003

This adds a lexer for [nushell](https://www.nushell.sh/).

I began by converting the lexer provided by the [`pygments-nushell` package](https://pypi.org/project/pygments-nushell/) using `_tools/pygments2chroma_xml.py`. This got me most of the way there, but there were issues parsing [interpolated strings](https://www.nushell.sh/book/working_with_strings.html#string-interpolation). I then modified the generated `nu.xml` to handle these cases correctly.

I added `lexers/testdata/nu.actual` based on segments from the [Nushell Book](https://www.nushell.sh/book/).

[Here](https://gistpreview.github.io/?a995de44e0780bd7ec9ee2ff6280cdeb) is an example generated HTML document with highlighting (based on the test case added).

Let me know if the commits should be reformatted; I was unsure how they should be structured.
1 parent 0e031c7 commit a53c924
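
The commit message says the generated lexer needed manual changes for interpolated strings. As an illustration only, the following Python sketch mimics the `root`, `interpolated_string`, and `interpolation` states from `nu.xml` with a state stack. The `RULES` table and the `tokenize` function are hypothetical names, the `root` rules are a reduced subset, and Python's `re` stands in for Chroma's regex engine, so this approximates the approach rather than reproducing the lexer.

```python
import re

# Hypothetical, reduced rule table: (pattern, token type, action).
# Actions: None, "pop", or "push:<state>", loosely mirroring <push>/<pop> in nu.xml.
RULES = {
    "root": [
        (r'\$"', "LiteralStringDouble", "push:interpolated_string"),
        (r'\$[a-zA-Z_]\w*', "NameVariable", None),
        (r'\s+', "TextWhitespace", None),
        (r'[^\s$"()]+', "Text", None),
    ],
    "interpolated_string": [
        (r'"', "LiteralStringDouble", "pop"),
        (r'\(', "LiteralStringInterpol", "push:interpolation"),
        (r'(?s)(\\\\|\\[0-7]+|\\.|[^"\\(])+', "LiteralStringDouble", None),
    ],
    "interpolation": [
        (r'\)', "LiteralStringInterpol", "pop"),
    ],
}


def tokenize(text):
    """Return (token_type, value) pairs using a stack of lexer states."""
    stack = ["root"]
    pos = 0
    tokens = []
    while pos < len(text):
        state = stack[-1]
        rules = list(RULES[state])
        if state == "interpolation":
            # Stand-in for <include state="root" />: fall back to root rules.
            rules += RULES["root"]
        for pattern, token_type, action in rules:
            match = re.compile(pattern).match(text, pos)
            if match:
                tokens.append((token_type, match.group()))
                pos = match.end()
                if action == "pop":
                    stack.pop()
                elif action:
                    stack.append(action.split(":", 1)[1])
                break
        else:
            # No rule matched: emit one character as an error token.
            tokens.append(("Error", text[pos]))
            pos += 1
    return tokens


if __name__ == "__main__":
    for token in tokenize('$"greetings, ($name)!"'):
        print(token)
```

Because `interpolation` falls back to the `root` rules, a `$"` found inside parentheses pushes `interpolated_string` again, which is how nested interpolations would be handled in this sketch.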

3 files changed: +716 -0 lines changed

lexers/embedded/nu.xml

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
<lexer>
  <config>
    <name>Nu</name>
    <alias>nu</alias>
    <filename>*.nu</filename>
    <mime_type>text/plain</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule><include state="basic" /></rule>
      <rule><include state="data" /></rule>
    </state>
    <state name="basic">
      <rule
        pattern="\b(alias|all|ansi|ansi gradient|ansi link|ansi strip|any|append|ast|attr category|attr deprecated|attr example|attr search-terms|banner|bits|bits and|bits not|bits or|bits rol|bits ror|bits shl|bits shr|bits xor|break|bytes|bytes add|bytes at|bytes build|bytes collect|bytes ends-with|bytes index-of|bytes length|bytes remove|bytes replace|bytes reverse|bytes split|bytes starts-with|cal|cd|char|chunk-by|chunks|clear|collect|columns|commandline|commandline edit|commandline get-cursor|commandline set-cursor|compact|complete|config|config env|config flatten|config nu|config reset|config use-colors|const|continue|cp|date|date format|date from-human|date humanize|date list-timezone|date now|date to-timezone|debug|debug env|debug info|debug profile|decode|decode base32|decode base32hex|decode base64|decode hex|def|default|describe|detect columns|do|drop|drop column|drop nth|du|each|each while|echo|encode|encode base32|encode base32hex|encode base64|encode hex|enumerate|error make|every|exec|exit|explain|explore|export|export alias|export const|export def|export extern|export module|export use|export-env|extern|fill|filter|find|first|flatten|for|format|format bits|format date|format duration|format filesize|format number|format pattern|from|from csv|from json|from msgpack|from msgpackz|from nuon|from ods|from ssv|from toml|from tsv|from url|from xlsx|from xml|from yaml|from yml|generate|get|glob|grid|group-by|hash|hash md5|hash sha256|headers|help|help aliases|help commands|help escapes|help externs|help modules|help operators|help pipe-and-redirect|hide|hide-env|histogram|history|history import|history session|http|http delete|http get|http head|http options|http patch|http post|http put|if|ignore|input|input list|input listen|insert|inspect|interleave|into|into binary|into bool|into cell-path|into datetime|into duration|into filesize|into float|into glob|into int|into record|into sqlite|into string|into value|is-admin|is-empty|is-not-empty|is-terminal|items|job|job flush|job id|job kill|job list|job recv|job send|job spawn|job tag|job unfreeze|join|keybindings|keybindings default|keybindings list|keybindings listen|kill|last|length|let|let-env|lines|load-env|loop|ls|match|math|math abs|math arccos|math arccosh|math arcsin|math arcsinh|math arctan|math arctanh|math avg|math ceil|math cos|math cosh|math exp|math floor|math ln|math log|math max|math median|math min|math mode|math product|math round|math sin|math sinh|math sqrt|math stddev|math sum|math tan|math tanh|math variance|merge|merge deep|metadata|metadata access|metadata set|mkdir|mktemp|module|move|mut|mv|nu-check|nu-highlight|open|overlay|overlay hide|overlay list|overlay new|overlay use|panic|par-each|parse|path|path basename|path dirname|path exists|path expand|path join|path parse|path relative-to|path self|path split|path type|plugin|plugin add|plugin list|plugin rm|plugin stop|plugin use|port|prepend|print|ps|pwd|query db|random|random binary|random bool|random chars|random dice|random float|random int|random uuid|reduce|reject|rename|return|reverse|rm|roll|roll down|roll left|roll right|roll up|rotate|run-external|save|schema|scope|scope aliases|scope commands|scope engine-stats|scope externs|scope modules|scope variables|select|seq|seq char|seq date|shuffle|skip|skip until|skip while|sleep|slice|sort|sort-by|source|source-env|split|split cell-path|split chars|split column|split list|split row|split words|start|stor|stor create|stor delete|stor export|stor import|stor insert|stor open|stor reset|stor update|str|str camel-case|str capitalize|str contains|str distance|str downcase|str ends-with|str expand|str index-of|str join|str kebab-case|str length|str pascal-case|str replace|str reverse|str screaming-snake-case|str snake-case|str starts-with|str stats|str substring|str title-case|str trim|str upcase|sys|sys cpu|sys disks|sys host|sys mem|sys net|sys temp|sys users|table|take|take until|take while|tee|term|term query|term size|timeit|to|to csv|to html|to json|to md|to msgpack|to msgpackz|to nuon|to text|to toml|to tsv|to xml|to yaml|to yml|touch|transpose|try|tutor|ulimit|uname|uniq|uniq-by|update|update cells|upsert|url|url build-query|url decode|url encode|url join|url parse|url split-query|use|values|version|version check|view|view blocks|view files|view ir|view source|view span|watch|where|which|while|whoami|window|with-env|wrap|zip)(\s*)\b"
      ><bygroups><token type="Keyword" /><token
          type="TextWhitespace"
        /></bygroups></rule>
      <rule pattern="\A#!.+\n"><token type="CommentHashbang" /></rule>
      <rule pattern="#.*\n"><token type="CommentSingle" /></rule>
      <rule pattern="\\[\w\W]"><token type="LiteralStringEscape" /></rule>
      <rule pattern="(\b\w+)(\s*)(\+?=)"><bygroups><token
          type="NameVariable"
        /><token type="TextWhitespace" /><token
          type="Operator"
        /></bygroups></rule>
      <rule pattern="[\[\]{}()=]"><token type="Operator" /></rule>
      <rule pattern="&lt;&lt;&lt;"><token type="Operator" /></rule>
      <rule pattern="&lt;&lt;-?\s*(\&#x27;?)\\?(\w+)[\w\W]+?\2"><token
          type="LiteralString"
        /></rule>
      <rule pattern="&amp;&amp;|\|\|"><token type="Operator" /></rule>
      <rule pattern="\$[a-zA-Z_]\w*"><token type="NameVariable" /></rule>
    </state>
    <state name="data">
      <rule pattern="\$&quot;"><token type="LiteralStringDouble" /><push
          state="interpolated_string"
        /></rule>
      <rule pattern="(?s)&quot;(\\.|[^&quot;\\])*&quot;"><token
          type="LiteralStringDouble"
        /></rule>
      <rule pattern="&quot;"><token type="LiteralStringDouble" /><push
          state="string"
        /></rule>
      <rule pattern="(?s)\$&#x27;(\\\\|\\[0-7]+|\\.|[^&#x27;\\])*&#x27;"><token
          type="LiteralStringSingle"
        /></rule>
      <rule pattern="(?s)&#x27;.*?&#x27;"><token
          type="LiteralStringSingle"
        /></rule>
      <rule pattern=";"><token type="Punctuation" /></rule>
      <rule pattern="&amp;"><token type="Punctuation" /></rule>
      <rule pattern="\|"><token type="Punctuation" /></rule>
      <rule pattern="\s+"><token type="TextWhitespace" /></rule>
      <rule pattern="\d+\b"><token type="LiteralNumber" /></rule>
      <rule pattern="[^=\s\[\]{}()$&quot;\&#x27;`\\&lt;&amp;|;]+"><token
          type="Text"
        /></rule>
      <rule pattern="&lt;"><token type="Text" /></rule>
    </state>
    <state name="string">
      <rule pattern="&quot;"><token type="LiteralStringDouble" /><pop
          depth="1"
        /></rule>
      <rule pattern="(?s)(\\\\|\\[0-7]+|\\.|[^&quot;\\$])+"><token
          type="LiteralStringDouble"
        /></rule>
    </state>
    <state name="interpolated_string">
      <rule pattern="&quot;"><token type="LiteralStringDouble" /><pop
          depth="1"
        /></rule>
      <rule pattern="\("><token type="LiteralStringInterpol" /><push
          state="interpolation"
        /></rule>
      <rule pattern="(?s)(\\\\|\\[0-7]+|\\.|[^&quot;\\(])+"><token
          type="LiteralStringDouble"
        /></rule>
    </state>
    <state name="interpolation">
      <rule pattern="\)"><token type="LiteralStringInterpol" /><pop
          depth="1"
        /></rule>
      <rule><include state="root" /></rule>
    </state>
    <state name="curly">
      <rule pattern="\}"><token type="LiteralStringInterpol" /><pop
          depth="1"
        /></rule>
      <rule pattern=":-"><token type="Keyword" /></rule>
      <rule pattern="\w+"><token type="NameVariable" /></rule>
      <rule pattern="[^}:&quot;\&#x27;`$\\]+"><token
          type="Punctuation"
        /></rule>
      <rule pattern=":"><token type="Punctuation" /></rule>
      <rule><include state="root" /></rule>
    </state>
    <state name="paren">
      <rule pattern="\)"><token type="Keyword" /><pop depth="1" /></rule>
      <rule><include state="root" /></rule>
    </state>
    <state name="math">
      <rule pattern="\)\)"><token type="Keyword" /><pop depth="1" /></rule>
      <rule pattern="\*\*|\|\||&lt;&lt;|&gt;&gt;|[-+*/%^|&amp;&lt;&gt;]"><token
          type="Operator"
        /></rule>
      <rule pattern="\d+#[\da-zA-Z]+"><token type="LiteralNumber" /></rule>
      <rule pattern="\d+#(?! )"><token type="LiteralNumber" /></rule>
      <rule pattern="0[xX][\da-fA-F]+"><token type="LiteralNumber" /></rule>
      <rule pattern="\d+"><token type="LiteralNumber" /></rule>
      <rule pattern="[a-zA-Z_]\w*"><token type="NameVariable" /></rule>
      <rule><include state="root" /></rule>
    </state>
    <state name="backticks">
      <rule pattern="`"><token type="LiteralStringBacktick" /><pop
          depth="1"
        /></rule>
      <rule><include state="root" /></rule>
    </state>
  </rules>
</lexer>
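
The first rule in the `basic` state pairs a two-group pattern with `<bygroups>`, so the first capture is emitted as `Keyword` and the second as `TextWhitespace`. The rough Python sketch below shows that mapping with a hypothetical four-entry subset of the alternation. It uses Python's `re`, and whether Chroma's regex engine behaves identically is an assumption here. `KEYWORD_RULE`, `GROUP_TOKENS`, and `by_groups` are illustrative names only.

```python
import re

# Hypothetical subset of the keyword alternation in nu.xml (assumption:
# Python's `re` is used here in place of Chroma's regex engine).
KEYWORD_RULE = re.compile(r"\b(ansi|ansi gradient|str|str join)(\s*)\b")

# One token type per capture group, as in the <bygroups> element.
GROUP_TOKENS = ("Keyword", "TextWhitespace")


def by_groups(text, pos=0):
    """Map each capture group of the keyword rule to its token type."""
    match = KEYWORD_RULE.match(text, pos)
    if match is None:
        return []
    # Simplification: empty groups are dropped instead of emitted.
    return [
        (token_type, value)
        for token_type, value in zip(GROUP_TOKENS, match.groups())
        if value
    ]


if __name__ == "__main__":
    print(by_groups("str join ','"))
    print(by_groups("strange"))
```

With Python's leftmost-first alternation, the input `str join ','` yields the keyword `str` followed by whitespace, because the shorter alternative is listed first; `join` is then left to whichever rule matches next. An identifier such as `strange` does not match, since the trailing `\b` requires a word boundary.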

lexers/testdata/nu.actual

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
"12" | into int

date now | date to-timezone "Europe/London"

{'name': 'nu', 'stars': 5, 'language': 'Python'} | upsert language 'Rust'

[one two three] | to yaml

[[framework, language]; [Django, Python] [Laravel, PHP]]

[{name: 'Robert' age: 34 position: 'Designer'}
 {name: 'Margaret' age: 30 position: 'Software Developer'}
 {name: 'Natalie' age: 50 position: 'Accountant'}
] | select name position

let name = "Alice"
$"greetings, ($name)!"
# => greetings, Alice!

let string_list = "one,two,three" | split row ","
$string_list
# => ╭───┬───────╮
# => │ 0 │ one   │
# => │ 1 │ two   │
# => │ 2 │ three │
# => ╰───┴───────╯

"Hello, world!" | str contains "o, w"
# => true

let str_list = [zero one two]
$str_list | str join ','
# => zero,one,two

'Hello World!' | str substring 4..8
# => o Wor

'Nushell 0.80' | parse '{shell} {version}'

"acronym,long\nAPL,A Programming Language" | from csv

$'(ansi purple_bold)This text is a bold purple!(ansi reset)'
# => This text is a bold purple!

[foo bar baz] | insert 1 'beeze'

[1, 2, 3, 4] | update 1 10

let numbers = [1, 2, 3]
$numbers | prepend 0

let numbers = [1, 2, 3]
$numbers | append 4

[cammomile marigold rose forget-me-not] | first 2

let planets = [Mercury Venus Earth Mars Jupiter Saturn Uranus Neptune]
$planets | each { |elt| $"($elt) is a planet of the solar system" }

$planets | enumerate | each { |elt| $"($elt.index + 1) - ($elt.item)" }

let scores = [3 8 4]
$"total = ($scores | reduce { |elt, acc| $acc + $elt })"
# => total = 15

def "main run" [] {
    print "running"
}

def "main build" [] {
    print "building"
}

def main [x: string] {
    $"Hello ($x | describe) ($x)"
}
