Skip to content

Add } as a delimiter. #18827

Merged
nik9000 merged 1 commit intoelastic:masterfrom
nik9000:painless_rcurly_ok
Jun 11, 2016
Merged

Add } as a delimiter. #18827
nik9000 merged 1 commit intoelastic:masterfrom
nik9000:painless_rcurly_ok

Conversation

@nik9000
Copy link
Copy Markdown
Member

@nik9000 nik9000 commented Jun 11, 2016

Add } is statement delimiter but only in places where it is
otherwise a valid part of the syntax, specificall the end of a block.
We do this by matching but not consuming it. Antlr 4 doesn't have
syntax for this so we have to kind of hack it together by actually
matching the } and then seeking backwards in the token stream to
"unmatch" it. This looks reasonably efficient. Not perfect, but way
better than the alternatives.

I tried and rejected a few options:

  1. Actually consuming the } and piping a boolean all through the
    grammar from the last statement in a block to the delimiter. This
    ended up being a rather large change and made the grammar way more
    complicated.
  2. Adding a semantic predicate to delimiter that just does the
    lookahead. This doesn't work out well because it doesn't work (I
    never figured out why) and because it generates an amazing
    adaptivePredict which makes a super huge DFA. It looks super
    inefficient.

Closes #18821

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 11, 2016

Oop, conflicts. One moment.

@nik9000 nik9000 force-pushed the painless_rcurly_ok branch from 230330f to ea67556 Compare June 11, 2016 14:32
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 11, 2016

Rebased to remove conflicts. Parser merge conflicts are impressive but you can just regenerate....

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 11, 2016

@jdconrad and/or @rmuir: this ready for review when you are ready for it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note you can do expectScriptThrows, which works just like expectThrows except it unboxes ScriptException automatically. This makes it easier to assert you really got the exception you want.

e.g.:

IllegalArgumentException e = expectScriptThrows(IllegalArgumentException.class, ....);

Behind the scenes, expectScriptThrows first looks for a ScriptException (it will throw a nice error if its not that), then unboxes it and acts like expectThrows from there.

I assume in this case, the fact we have PICKY on means you get a crazy error like NoViableAlt, but its not actually complaining about ambiguity? I can look into improving PICKY potentially here, we don't want it lenient, and we want it on as much as possible.

Sadly, in antlrv4, PICKY is the only way to detect ambiguities in the grammar: You cannot do it at compile time, and antlr just defaults to being slow (bugs like #18398). It means we have to test everything too, to detect these :( See https://groups.google.com/d/msg/antlr-discussion/z8otgVzqB7I/IAHEU9keZLgJ

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch it to expectScriptThrows - I had done that at first but it wasn't working with picky and I didn't notice picky was the problem until after I changed it and then didn't change it back. I'll do so now.

I turned off picky because I was trying to test that the user gets a sensible error message by default. It looks like we aren't picky by default?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Picky has a performance cost and we only enable it in tests (https://github.com/elastic/elasticsearch/blob/master/modules/lang-painless/src/main/java/org/elasticsearch/painless/antlr/Walker.java#L191-L207)

As far as error messages for the user, we don't really test them directly from unit tests. If you want to see what the user will actually get, you have to call .toJsonString() on the ScriptException. I think these kind of integration tests are best done from REST?

Instead today in unit tests we test the actual cause of the script exception: expectScriptThrows. Some unit tests also assert that the offset (line number) is reasonable too. If these things are good, then the ScriptException to the user will be good.

@rmuir
Copy link
Copy Markdown
Contributor

rmuir commented Jun 11, 2016

thanks for looking into this. It will make more pre-existing simple scripts work as-is (we can never do that perfectly, but it is nice when it works). It should also make lambda syntax be less stupid.

we need @jdconrad to look at the grammar change, as he understands the antlr side best.

@jdconrad
Copy link
Copy Markdown
Contributor

@nik9000 +1. Thanks for looking into this. I'm also surprised the semantic predicate didn't work, but this solution looks good too.

@jdconrad jdconrad mentioned this pull request Jun 11, 2016
18 tasks
Add `}` is statement delimiter but only in places where it is
otherwise a valid part of the syntax, specificall the end of a block.
We do this by matching but not consuming it. Antlr 4 doesn't have
syntax for this so we have to kind of hack it together by actually
matching the `}` and then seeking backwards in the token stream to
"unmatch" it. This looks reasonably efficient. Not perfect, but way
better than the alternatives.

I tried and rejected a few options:
1. Actually consuming the `}` and piping a boolean all through the
grammar from the last statement in a block to the delimiter. This
ended up being a rather large change and made the grammar way more
complicated.
2. Adding a semantic predicate to delimiter that just does the
lookahead. This doesn't work out well because it doesn't work (I
never figured out why) and because it generates an *amazing*
`adaptivePredict` which makes a super huge DFA. It looks super
inefficient.

Closes elastic#18821
@nik9000 nik9000 force-pushed the painless_rcurly_ok branch from 5d20aa4 to 4a265d0 Compare June 11, 2016 16:51
@nik9000 nik9000 merged commit 4a265d0 into elastic:master Jun 11, 2016
@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 11, 2016

Thanks for reviewing @rmuir and @jdconrad!

Since I didn't make this clear in the description: painless now allows syntax like:

if (foo) {return 1}

where it used to require:

if (foo) {return 1;}

Note the extra semicolon in the "before" example.

@jdconrad
Copy link
Copy Markdown
Contributor

@nik9000 Just as a note, I've thought about the predicate situation some more... The reason this doesn't work is because even though we know the next token must match RBRACK due to the predicate, the path available for ANTLR to make this decision doesn't necessarily require the RBRACK. All the predicate is doing is making the statements look like this --

| DO block WHILE LP expression RP delimiter --> | DO block WHILE LP expression RP delimiter?

Meaning we no longer require any specific token after delimiter for that rule. This is what's causing the ambiguities because now there's an opening for any number of paths to match since ANTLR has no knowledge of what the predicate is doing other than opening or closing a path, and it must write the DFA as such.

Your solution (the correct one) forces RBRACK to match closing off the possibility of any other paths. You'll also note this is different from the ELSE situation which actually closes off other paths in the event of an else forcing a single path. Unfortunately this is not possible for delimiter because we may optionally include a semicolon before a right bracket.

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 11, 2016

Right! I was reasonably sure that was it. I think antlr 3 had support for this kind of thing natively...

@nik9000
Copy link
Copy Markdown
Member Author

nik9000 commented Jun 11, 2016

Indeed, syntactic predicates look like they did this. I'm sure the antlr experts have a better solution than the one I just merged but I'm unsure of a way to do it that wouldn't way overcomplicate the grammar and/or introduce that crazy backtracking.

@clintongormley clintongormley changed the title Painless: Add } as a delimiter. Kindof. Add } as a delimiter. Jun 13, 2016
@clintongormley clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Core/Infra/Scripting Scripting abstractions, Painless, and Mustache >enhancement v5.0.0-alpha4

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants