Description
Certain SUP parsing errors are currently reported without any detail on where they're located in the input, making them difficult to find in large files. Other tools are able to include line/column info for where such problems begin, so hopefully super can be enhanced to do the same.
Details
Repro is with super commit d0f9457.
The issue was reported by a user in a community Slack thread. In their own words:
Just wanna +1 anything that can be added to super to diagnose a syntax error in a large file of sup records.
my case was a record that had unescaped double-quotes [inside a string value] … it was like line 316 out of around 500, and super binary just said something about a syntax problem, but there was just no help in identifying where in the file it was.
Here's a simplified repro they provided:
$ super -version
Version: v0.1.0-21-gd0f9457d4
$ { cat << EOF > repro.sup
{id:1,text:"valid record before the bad one"}
{id:2,text:"this has "unescaped quotes" inside it"}
{id:3,text:"valid record after the bad one"}
{id:4,text:"also valid but parser state is corrupted"}
{id:5,text:"still valid, still fails to parse"}
EOF
} && super repro.sup
repro.sup: parse error: mismatched braces while parsing record type
Compare this with jq on an equivalent problem with JSON input, which is able to report a line number. (The error message is admittedly not perfect, but it's understood these parsing problems can't always be expressed in a way that hints at the human's intent.)
$ { cat << EOF > repro.json
{"id":1,"text":"valid record before the bad one"}
{"id":2,"text":"this has "unescaped quotes" inside it"}
{"id":3,"text":"valid record after the bad one"}
{"id":4,"text":"also valid but parser state is corrupted"}
{"id":5,"text":"still valid, still fails to parse"}
EOF
} && jq -c < repro.json
{"id":1,"text":"valid record before the bad one"}
jq: parse error: Invalid numeric literal at line 2, column 36
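For another point of comparison, here is a minimal sketch showing that Python's standard `json` module also attaches a location to its parse errors via the `lineno` and `colno` attributes of `json.JSONDecodeError`. The `first_error` helper and the `REPRO` constant are hypothetical names made up for this illustration, and none of this reflects how super parses SUP.

```python
import json

# The first three lines of repro.json from above, as one newline-delimited string.
REPRO = "\n".join([
    '{"id":1,"text":"valid record before the bad one"}',
    '{"id":2,"text":"this has "unescaped quotes" inside it"}',
    '{"id":3,"text":"valid record after the bad one"}',
]) + "\n"


def first_error(text):
    """Return (line, column, message) for the first bad value, or None."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return None
        try:
            # Decode one value starting at pos; pos moves past that value.
            _, pos = decoder.raw_decode(text, pos)
        except json.JSONDecodeError as err:
            # lineno and colno are computed against the whole input text.
            return err.lineno, err.colno, err.msg


if __name__ == "__main__":
    print(first_error(REPRO))
```

The reported line is 2, the same line jq points at. The column and message may differ from jq's because each parser gives up at a different point in the bad record.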
The user confirmed the origin of the bad SUP:
a shell script that wasn’t properly vetting the contents before appending to the file
Given that SUP has a formal spec, it may be reasonable for us to argue that all bets are off with SUP that comes from anywhere other than a tool that purports to adhere to that spec, such as super itself. That said, as with JSON/CSV/etc., users are likely to do their own quick hacks to create what they believe to be SUP, so issues like this will probably keep coming up. Being able to give the same kind of location detail that jq does seems like a helpful improvement.
The user also happened to share their own super-based debug query that helped them improve on what's there currently:
this command can help find where the problem starts - then i think it hits that state corruption known issue
$ super -i line -j -c ' values {raw: this, parsed: parse_sup(this)} | where is_error(parsed) | cut raw ' repro.sup
{"raw":"{id:2,text:\"this has \"unescaped quotes\" inside it\"}"}
{"raw":"{id:3,text:\"valid record after the bad one\"}"}
{"raw":"{id:4,text:\"also valid but parser state is corrupted\"}"}
This is similar to a trick used in another existing issue #6234, so maybe it'd be worth addressing both at once.
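The line-by-line idea behind the user's query can also be sketched outside of super, with line numbers attached. The example below is only an illustration under stated assumptions: `failing_lines` is a hypothetical helper, and `json.loads` stands in for a SUP validator so the sketch can run against the lines of repro.json. Any callable that raises `ValueError` on bad input could be passed as `parse` instead.

```python
import json

# The lines of repro.json from above.
LINES = [
    '{"id":1,"text":"valid record before the bad one"}\n',
    '{"id":2,"text":"this has "unescaped quotes" inside it"}\n',
    '{"id":3,"text":"valid record after the bad one"}\n',
    '{"id":4,"text":"also valid but parser state is corrupted"}\n',
    '{"id":5,"text":"still valid, still fails to parse"}\n',
]


def failing_lines(lines, parse):
    """Return [(line_number, text, error)] for each line that parse rejects."""
    failures = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not text.strip():
            continue  # ignore blank lines
        try:
            # Each line is parsed by a fresh call, so one bad line
            # cannot affect how later lines are judged.
            parse(text)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            failures.append((number, text, str(err)))
    return failures


if __name__ == "__main__":
    for number, text, error in failing_lines(LINES, json.loads):
        print(f"line {number}: {error}")
```

With this stand-in parser only line 2 is flagged, which is the kind of output that would have pointed the user straight at the bad record.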