Add } as a delimiter. by nik9000 · Pull Request #18827 · elastic/elasticsearch

nik9000 · 2016-06-11T14:30:38Z

Add } is statement delimiter but only in places where it is
otherwise a valid part of the syntax, specificall the end of a block.
We do this by matching but not consuming it. Antlr 4 doesn't have
syntax for this so we have to kind of hack it together by actually
matching the } and then seeking backwards in the token stream to
"unmatch" it. This looks reasonably efficient. Not perfect, but way
better than the alternatives.

I tried and rejected a few options:

Actually consuming the } and piping a boolean all through the
grammar from the last statement in a block to the delimiter. This
ended up being a rather large change and made the grammar way more
complicated.
Adding a semantic predicate to delimiter that just does the
lookahead. This doesn't work out well because it doesn't work (I
never figured out why) and because it generates an amazing
adaptivePredict which makes a super huge DFA. It looks super
inefficient.

Closes #18821

nik9000 · 2016-06-11T14:30:47Z

Oop, conflicts. One moment.

nik9000 · 2016-06-11T14:33:04Z

Rebased to remove conflicts. Parser merge conflicts are impressive but you can just regenerate....

nik9000 · 2016-06-11T14:34:03Z

@jdconrad and/or @rmuir: this ready for review when you are ready for it.

rmuir · 2016-06-11T14:47:39Z

modules/lang-painless/src/test/java/org/elasticsearch/painless/WhenThingsGoWrongTests.java

note you can do expectScriptThrows, which works just like expectThrows except it unboxes ScriptException automatically. This makes it easier to assert you really got the exception you want.

e.g.:

IllegalArgumentException e = expectScriptThrows(IllegalArgumentException.class, ....);

Behind the scenes, expectScriptThrows first looks for a ScriptException (it will throw a nice error if its not that), then unboxes it and acts like expectThrows from there.

I assume in this case, the fact we have PICKY on means you get a crazy error like NoViableAlt, but its not actually complaining about ambiguity? I can look into improving PICKY potentially here, we don't want it lenient, and we want it on as much as possible.

Sadly, in antlrv4, PICKY is the only way to detect ambiguities in the grammar: You cannot do it at compile time, and antlr just defaults to being slow (bugs like #18398). It means we have to test everything too, to detect these :( See https://groups.google.com/d/msg/antlr-discussion/z8otgVzqB7I/IAHEU9keZLgJ

I'll switch it to expectScriptThrows - I had done that at first but it wasn't working with picky and I didn't notice picky was the problem until after I changed it and then didn't change it back. I'll do so now.

I turned off picky because I was trying to test that the user gets a sensible error message by default. It looks like we aren't picky by default?

Picky has a performance cost and we only enable it in tests (https://github.com/elastic/elasticsearch/blob/master/modules/lang-painless/src/main/java/org/elasticsearch/painless/antlr/Walker.java#L191-L207)

As far as error messages for the user, we don't really test them directly from unit tests. If you want to see what the user will actually get, you have to call .toJsonString() on the ScriptException. I think these kind of integration tests are best done from REST?

Instead today in unit tests we test the actual cause of the script exception: expectScriptThrows. Some unit tests also assert that the offset (line number) is reasonable too. If these things are good, then the ScriptException to the user will be good.

rmuir · 2016-06-11T14:50:23Z

thanks for looking into this. It will make more pre-existing simple scripts work as-is (we can never do that perfectly, but it is nice when it works). It should also make lambda syntax be less stupid.

we need @jdconrad to look at the grammar change, as he understands the antlr side best.

jdconrad · 2016-06-11T15:48:47Z

@nik9000 +1. Thanks for looking into this. I'm also surprised the semantic predicate didn't work, but this solution looks good too.

Add `}` is statement delimiter but only in places where it is otherwise a valid part of the syntax, specificall the end of a block. We do this by matching but not consuming it. Antlr 4 doesn't have syntax for this so we have to kind of hack it together by actually matching the `}` and then seeking backwards in the token stream to "unmatch" it. This looks reasonably efficient. Not perfect, but way better than the alternatives. I tried and rejected a few options: 1. Actually consuming the `}` and piping a boolean all through the grammar from the last statement in a block to the delimiter. This ended up being a rather large change and made the grammar way more complicated. 2. Adding a semantic predicate to delimiter that just does the lookahead. This doesn't work out well because it doesn't work (I never figured out why) and because it generates an *amazing* `adaptivePredict` which makes a super huge DFA. It looks super inefficient. Closes elastic#18821

nik9000 · 2016-06-11T16:54:06Z

Thanks for reviewing @rmuir and @jdconrad!

Since I didn't make this clear in the description: painless now allows syntax like:

if (foo) {return 1}

where it used to require:

if (foo) {return 1;}

Note the extra semicolon in the "before" example.

jdconrad · 2016-06-11T18:28:29Z

@nik9000 Just as a note, I've thought about the predicate situation some more... The reason this doesn't work is because even though we know the next token must match RBRACK due to the predicate, the path available for ANTLR to make this decision doesn't necessarily require the RBRACK. All the predicate is doing is making the statements look like this --

| DO block WHILE LP expression RP delimiter --> | DO block WHILE LP expression RP delimiter?

Meaning we no longer require any specific token after delimiter for that rule. This is what's causing the ambiguities because now there's an opening for any number of paths to match since ANTLR has no knowledge of what the predicate is doing other than opening or closing a path, and it must write the DFA as such.

Your solution (the correct one) forces RBRACK to match closing off the possibility of any other paths. You'll also note this is different from the ELSE situation which actually closes off other paths in the event of an else forcing a single path. Unfortunately this is not possible for delimiter because we may optionally include a semicolon before a right bracket.

nik9000 · 2016-06-11T18:34:17Z

Right! I was reasonably sure that was it. I think antlr 3 had support for this kind of thing natively...

nik9000 · 2016-06-11T18:39:23Z

Indeed, syntactic predicates look like they did this. I'm sure the antlr experts have a better solution than the one I just merged but I'm unsure of a way to do it that wouldn't way overcomplicate the grammar and/or introduce that crazy backtracking.

nik9000 added review v5.0.0-alpha4 labels Jun 11, 2016

nik9000 force-pushed the painless_rcurly_ok branch from 230330f to ea67556 Compare June 11, 2016 14:32

rmuir reviewed Jun 11, 2016
View reviewed changes

jdconrad mentioned this pull request Jun 11, 2016

Ongoing Painless Improvements #17992

Closed

18 tasks

nik9000 force-pushed the painless_rcurly_ok branch from 5d20aa4 to 4a265d0 Compare June 11, 2016 16:51

nik9000 added the >enhancement label Jun 11, 2016

nik9000 merged commit 4a265d0 into elastic:master Jun 11, 2016

clintongormley changed the title ~~Painless: Add } as a delimiter. Kindof.~~ Add } as a delimiter. Jun 13, 2016

clintongormley added :Core/Infra/Scripting Scripting abstractions, Painless, and Mustache and removed :Plugin Lang Painless labels Feb 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add } as a delimiter. #18827

Add } as a delimiter. #18827
nik9000 merged 1 commit intoelastic:masterfrom
nik9000:painless_rcurly_ok

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

rmuir Jun 11, 2016

Uh oh!

nik9000 Jun 11, 2016

Uh oh!

rmuir Jun 11, 2016

Uh oh!

rmuir commented Jun 11, 2016

Uh oh!

jdconrad commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

jdconrad commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

rmuir Jun 11, 2016

Choose a reason for hiding this comment

Uh oh!

nik9000 Jun 11, 2016

Choose a reason for hiding this comment

Uh oh!

rmuir Jun 11, 2016

Choose a reason for hiding this comment

Uh oh!

rmuir commented Jun 11, 2016

Uh oh!

jdconrad commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

jdconrad commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

nik9000 commented Jun 11, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants