Reporting line numbers for SUP parsing errors #6627

@philrz

Certain SUP parsing errors are currently reported without any detail on where they're located in the input, making them difficult to find in large files. Other tools are able to include line/column info for where such problems begin, so hopefully super can be enhanced to do the same.

Details

Repro is with super commit d0f9457.

The issue was reported by a user in a community Slack thread. In their own words:

Just wanna +1 anything that can be added to super to diagnose a syntax error in a large file of sup records.
my case was a record that had unescaped double-quotes [inside a string value] … it was like line 316 out of around 500, and super binary just said something about a syntax problem, but there was just no help in identifying where in the file it was.

Here's a simplified repro they provided:

$ super -version
Version: v0.1.0-21-gd0f9457d4

$ { cat << EOF > repro.sup
{id:1,text:"valid record before the bad one"}
{id:2,text:"this has "unescaped quotes" inside it"}
{id:3,text:"valid record after the bad one"}
{id:4,text:"also valid but parser state is corrupted"}
{id:5,text:"still valid, still fails to parse"}
EOF
} && super repro.sup

repro.sup: parse error: mismatched braces while parsing record type

Compare this with jq on an equivalent problem with JSON input, which is able to report a line number. (The error message is admittedly not perfect, but it's understood these parsing problems can't always be expressed in a way that hints at the human's intent.)

$ { cat << EOF > repro.json
{"id":1,"text":"valid record before the bad one"}
{"id":2,"text":"this has "unescaped quotes" inside it"}
{"id":3,"text":"valid record after the bad one"}
{"id":4,"text":"also valid but parser state is corrupted"}
{"id":5,"text":"still valid, still fails to parse"}
EOF
} && jq -c < repro.json

{"id":1,"text":"valid record before the bad one"}
jq: parse error: Invalid numeric literal at line 2, column 36

The user confirmed the origin of the bad SUP:

a shell script that wasn’t properly vetting the contents before appending to the file

Given that SUP has a formal spec, it may be reasonable for us to argue that all bets are off with SUP that comes from anywhere other than a tool that purports to adhere to that spec, such as super itself. That said, as with JSON/CSV/etc., users are likely to do their own quick hacks to create what they believe to be SUP, so issues like this are likely to keep coming up. Being able to give the same kind of detail as jq does seems like a helpful improvement.

The user also happened to share their own super-based debug query that helped them improve on what's there currently:

this command can help find where the problem starts - then i think it hits that state corruption known issue

$ super -i line -j -c '
   values {raw: this, parsed: parse_sup(this)}
   | where is_error(parsed)
   | cut raw
' repro.sup

{"raw":"{id:2,text:\"this has \"unescaped quotes\" inside it\"}"}
{"raw":"{id:3,text:\"valid record after the bad one\"}"}
{"raw":"{id:4,text:\"also valid but parser state is corrupted\"}"}

This is similar to a trick used in another existing issue #6234, so maybe it'd be worth addressing both at once.
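
Until something like this is built in, a rough workaround is to feed the file to a validator one line at a time and print the number of the first line that fails. The sketch below is only an illustration, not a proposal for how super should implement the feature. The function name first_bad_line is hypothetical, it assumes one record per line (as in the repro above), and it assumes the validator command reads stdin and exits nonzero on a parse error.

```shell
# first_bad_line FILE CMD [ARGS...]
# Hypothetical helper: runs CMD once per line of FILE, with that line on stdin.
# Prints the 1-based number of the first line CMD rejects and returns 1.
# Returns 0 and prints nothing if every line is accepted.
first_bad_line() {
  fbl_file=$1
  shift
  fbl_n=0
  # The "|| [ -n ... ]" clause also handles a final line with no trailing newline.
  while IFS= read -r fbl_line || [ -n "$fbl_line" ]; do
    fbl_n=$((fbl_n + 1))
    # A fresh process per line means no parser state carries over between lines.
    if ! printf '%s\n' "$fbl_line" | "$@" >/dev/null 2>&1; then
      echo "$fbl_n"
      return 1
    fi
  done < "$fbl_file"
  return 0
}
```

With the JSON repro above, `first_bad_line repro.json jq .` should print 2. For the SUP repro, something like `first_bad_line repro.sup super -` might work, but that relies on the untested assumptions that super reads stdin when given `-` and exits nonzero on a parse error. Starting one process per line is slow, so this only suits files of modest size.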
