Conversation
|
Oop, conflicts. One moment. |
230330f to
ea67556
Compare
|
Rebased to remove conflicts. Parser merge conflicts are impressive but you can just regenerate.... |
There was a problem hiding this comment.
note you can do expectScriptThrows, which works just like expectThrows except it unboxes ScriptException automatically. This makes it easier to assert you really got the exception you want.
e.g.:
IllegalArgumentException e = expectScriptThrows(IllegalArgumentException.class, ....);
Behind the scenes, expectScriptThrows first looks for a ScriptException (it will throw a nice error if its not that), then unboxes it and acts like expectThrows from there.
I assume in this case, the fact we have PICKY on means you get a crazy error like NoViableAlt, but its not actually complaining about ambiguity? I can look into improving PICKY potentially here, we don't want it lenient, and we want it on as much as possible.
Sadly, in antlrv4, PICKY is the only way to detect ambiguities in the grammar: You cannot do it at compile time, and antlr just defaults to being slow (bugs like #18398). It means we have to test everything too, to detect these :( See https://groups.google.com/d/msg/antlr-discussion/z8otgVzqB7I/IAHEU9keZLgJ
There was a problem hiding this comment.
I'll switch it to expectScriptThrows - I had done that at first but it wasn't working with picky and I didn't notice picky was the problem until after I changed it and then didn't change it back. I'll do so now.
I turned off picky because I was trying to test that the user gets a sensible error message by default. It looks like we aren't picky by default?
There was a problem hiding this comment.
Picky has a performance cost and we only enable it in tests (https://github.com/elastic/elasticsearch/blob/master/modules/lang-painless/src/main/java/org/elasticsearch/painless/antlr/Walker.java#L191-L207)
As far as error messages for the user, we don't really test them directly from unit tests. If you want to see what the user will actually get, you have to call .toJsonString() on the ScriptException. I think these kind of integration tests are best done from REST?
Instead today in unit tests we test the actual cause of the script exception: expectScriptThrows. Some unit tests also assert that the offset (line number) is reasonable too. If these things are good, then the ScriptException to the user will be good.
|
thanks for looking into this. It will make more pre-existing simple scripts work as-is (we can never do that perfectly, but it is nice when it works). It should also make lambda syntax be less stupid. we need @jdconrad to look at the grammar change, as he understands the antlr side best. |
|
@nik9000 +1. Thanks for looking into this. I'm also surprised the semantic predicate didn't work, but this solution looks good too. |
Add `}` is statement delimiter but only in places where it is otherwise a valid part of the syntax, specificall the end of a block. We do this by matching but not consuming it. Antlr 4 doesn't have syntax for this so we have to kind of hack it together by actually matching the `}` and then seeking backwards in the token stream to "unmatch" it. This looks reasonably efficient. Not perfect, but way better than the alternatives. I tried and rejected a few options: 1. Actually consuming the `}` and piping a boolean all through the grammar from the last statement in a block to the delimiter. This ended up being a rather large change and made the grammar way more complicated. 2. Adding a semantic predicate to delimiter that just does the lookahead. This doesn't work out well because it doesn't work (I never figured out why) and because it generates an *amazing* `adaptivePredict` which makes a super huge DFA. It looks super inefficient. Closes elastic#18821
5d20aa4 to
4a265d0
Compare
|
@nik9000 Just as a note, I've thought about the predicate situation some more... The reason this doesn't work is because even though we know the next token must match RBRACK due to the predicate, the path available for ANTLR to make this decision doesn't necessarily require the RBRACK. All the predicate is doing is making the statements look like this --
Meaning we no longer require any specific token after delimiter for that rule. This is what's causing the ambiguities because now there's an opening for any number of paths to match since ANTLR has no knowledge of what the predicate is doing other than opening or closing a path, and it must write the DFA as such. Your solution (the correct one) forces RBRACK to match closing off the possibility of any other paths. You'll also note this is different from the ELSE situation which actually closes off other paths in the event of an else forcing a single path. Unfortunately this is not possible for delimiter because we may optionally include a semicolon before a right bracket. |
|
Right! I was reasonably sure that was it. I think antlr 3 had support for this kind of thing natively... |
|
Indeed, syntactic predicates look like they did this. I'm sure the antlr experts have a better solution than the one I just merged but I'm unsure of a way to do it that wouldn't way overcomplicate the grammar and/or introduce that crazy backtracking. |
Add
}is statement delimiter but only in places where it isotherwise a valid part of the syntax, specificall the end of a block.
We do this by matching but not consuming it. Antlr 4 doesn't have
syntax for this so we have to kind of hack it together by actually
matching the
}and then seeking backwards in the token stream to"unmatch" it. This looks reasonably efficient. Not perfect, but way
better than the alternatives.
I tried and rejected a few options:
}and piping a boolean all through thegrammar from the last statement in a block to the delimiter. This
ended up being a rather large change and made the grammar way more
complicated.
lookahead. This doesn't work out well because it doesn't work (I
never figured out why) and because it generates an amazing
adaptivePredictwhich makes a super huge DFA. It looks superinefficient.
Closes #18821