Skip to content

Implicit braces in ocamltest trees#13259

Closed
lukemaurer wants to merge 3 commits intoocaml:trunkfrom
lukemaurer:flatter-ocamltests
Closed

Implicit braces in ocamltest trees#13259
lukemaurer wants to merge 3 commits intoocaml:trunkfrom
lukemaurer:flatter-ocamltests

Conversation

@lukemaurer
Copy link
Copy Markdown
Contributor

The new tree syntax for ocamltest greatly improves the common case where several test cases are sequentially composed, each depending on the last. We can now write such tests as

test1;
test2;
test3;
test4;
...
test27;

without increasing the indentation level. This makes sense because these tests are not “nested” in any meaningful sense, any more than consecutive statements in imperative code are nested. Practically speaking, the great advantage is that we can now add test1a between test1 and test2 without increasing indentation (or other markers of nested-ness).

Unfortunately, this only works if test1a belongs in the sequence with the other tests. But it might be completely independent: for example, it might check an error case that is only meaningful after test1 but has no bearing on the other tests. In that case, we have three unpalatable options:

  1. Write it in the sequence anyway. This obscures the structure of the test cases and causes the entire rest of the sequence to be skipped if test1a fails.
  2. Indent every test in the sequence after test1a and risk emotional damage the next time you try to rebase.
  3. Add braces around every test in the sequence after test1a but don't indent, then try to invent a convention that will help you keep the formatting straight.

The annoying thing here is that test4 is still not meaningfully “more nested” than test1. It just happens that there's an extra branch jutting out of the sequence somewhere between test1 and test4, and the syntax makes that everyone's problem.

My proposed syntax is this:

test1;
{
 test1a;
}
test2;
test3;
test4;
...
test27;

This is exactly as if we'd put braces around everything from test2 to test27, but it once again expresses that this is simply a sequence of tests in a chain. It's just that now there are some tests that branch off from the sequence.

The new tree syntax for ocamltest greatly improves the common case where several
test cases are sequentially composed, each depending on the last. We can now
write such tests as

```
test1;
test2;
test3;
test4;
...
test27;
```

without increasing the indentation level. This makes sense because these tests
are not “nested” in any meaningful sense, any more than consecutive statements
in imperative code are nested. Practically speaking, the great advantage is that
we can now add `test1a` between `test1` and `test2` without increasing
indentation (or other markers of nested-ness).

Unfortunately, this only works if `test1a` belongs in the sequence with the
other tests. But it might be completely independent: for example, it might check
an error case that is only meaningful after `test1` but has no bearing on the
other tests. In that case, we have three unpalatable options:

1. Write it in the sequence anyway. This obscures the structure of the test
   cases and causes the entire rest of the sequence to be skipped if `test1a`
   fails.
2. Indent _every test in the sequence after `test1a`_ and risk emotional damage
   the next time you try to rebase.
3. Add braces around every test in the sequence after `test1a` but don't indent,
   then try to invent a convention that will help you keep the formatting
   straight.

The annoying thing here is that `test4` is _still_ not meaningfully “more nested”
than `test1`. It just happens that there's an extra branch jutting out of the
sequence somewhere between `test1` and `test4`, and the syntax makes that
everyone's problem.

My proposed syntax is this:

```
test1;
{
 test1a;
}
test2;
test3;
test4;
...
test27;
```

This is exactly as if we'd put braces around everything from `test2` to
`test27`, but it once again expresses that this is simply a sequence of tests in
a chain. It's just that now there are some tests that branch off from the
sequence.
@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 20, 2024

Technical remarks:

  • Naively I expect an ambiguity in your new parser.mly grammar, where statement_list nonempty_statement_list tree_list could be parsed as either (statement_list nonempty_statement_list) tree_list or statement_list (nonempty_statement_list tree_list). I think that the semantics are equivalent but that the first choice gives a simpler AST. Which one did conflict resolution choose? Can we make the parser unambiguous?

  • tree_list used to but just a list of tree but now it is something weirder, I think that a better name is in order -- parallel-tail?

Design remark: we scratched our head to understand what would be a good semantics for a; (b | c); d, and the current syntax was designed precisely to rule this case out -- the parallel branching can occur only at the very end of sequences. But it does not seem impossible that we would want to choose a semantics for this case in the future, because in most cases (b | c) are in fact mutually exclusive and there is a natural, well-defined semantics. If we wanted to add this to the current syntax we would naturally propose something like: a; { b } { c }; d. I am uneasy about the fact that your proposal is eating the same syntactic space.

I think that deep down your complaint is that the parallel construct has a nesting syntax that encourages indentation (otherwise the closing delimiter is dangling weirdly). Could the problem be solved by adding an infix form to introduce parallelism? For example, I wonder if we could define reasonable precedence rules for | such that

test1;
{ test1a; } |
test2;
test3;
test4;
...
test27;

is an unambiguous way to write what you have in mind. (Note: there is a weird relation to the use of foo & background jobs in shell syntax.)

@lukemaurer
Copy link
Copy Markdown
Contributor Author

Technical remarks:

  • Naively I expect an ambiguity in your new parser.mly grammar, where statement_list nonempty_statement_list tree_list could be parsed as either (statement_list nonempty_statement_list) tree_list or statement_list (nonempty_statement_list tree_list). I think that the semantics are equivalent but that the first choice gives a simpler AST. Which one did conflict resolution choose? Can we make the parser unambiguous?

It was indeed ambiguous (this was ported from flambda-backend, which apparently isn't running Menhir with --strict). Pushed a fix so that the new clause is only allowed after at least one tree.

  • tree_list used to but just a list of tree but now it is something weirder, I think that a better name is in order -- parallel-tail?

Sure, pushed a commit renaming tree_list to parallel_tail.

I think that deep down your complaint is that the parallel construct has a nesting syntax that encourages indentation (otherwise the closing delimiter is dangling weirdly).

That's about right, yeah. In particular, you do want to indent sometimes, but maintaining a file in which there are both indenting and non-indenting braces sounds pretty miserable.

Could the problem be solved by adding an infix form to introduce parallelism? For example, I wonder if we could define reasonable precedence rules for | such that

test1;
{ test1a; } |
test2;
test3;
test4;
...
test27;

is an unambiguous way to write what you have in mind. (Note: there is a weird relation to the use of foo & background jobs in shell syntax.)

This could work. One awkward thing is that I do still want test2 to be a descendant of test1, so I would want A {B} {C} | D to be A ({B} {C} | D) rather than (A {B} {C}) | D.

If we use the PR as is, we could later express “do this after all the branches” as

test1;
{ test1a; }
{ test1b; }
finally { test1z; }

Then test1z depends on both test1a and test1b.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 20, 2024

finally could be reasonably interpreted as "do this even if the parallel branches fail", which is not the right semantics. But your suggestion looks reasonable and certainly we could think of another keyword, for example then, to restart sequencing after the parallel part.

I agree that my proposal with | looks a bit ugly -- I haven't found a formulation that I find really nice.

On the other hand, I find that your proposal has counter-intuitive semantics: if we show the test script to people who haven't read the ocamltest documentation (nobody does), are they going to guess the real semantics? (And how would they realize that their assumption is wrong?)

@lukemaurer
Copy link
Copy Markdown
Contributor Author

Hmm. Maybe we could flip it around a bit and add a keyword that embeds a tree into a statement list? Maybe branch? So my example would look like

test1;
branch { test1a; }
test2;
test3;
...

(I was going to suggest fork at first but, like finally, it has baggage.) That avoids stealing syntax and hopefully suggests to the reader that things that come after test1a don't depend on it.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jun 21, 2024

So for example

s1;
s2;
branch { s3; }
s4;
branch { s5; }
s6;
s7;

would be equivalent to

s1;
s2;
{ s3; }
{ s4;
  { s5; }
  { s6;
    s7; }
}

This makes sense to me : it reads better than my proposal, and there is less potential for confusion than with your initial proposal.

Before you do the work of adapting the implementation, I think that it would be nice to have the opinion of other people who formulated opinions on ocamltest syntax in the past, for example @shindere, @damiendoligez or @Octachron.

Note: we do have the property that s1; branch { s2; } branch { s3; } is equivalent to s1; { s2; } { s3; } which is what I would expect.

@gasche
Copy link
Copy Markdown
Member

gasche commented Jul 4, 2024

My impression is that other people are not planning to give a different opinion on this topic, so I propose to assume that my dislike of your current syntax forms a (singleton) consensus. Should we close the current PR and let you propose a different syntax when available? (Of course closing is not a final decision, people can always discuss further to revisit.)

@Octachron
Copy link
Copy Markdown
Member

I agree that the singleton syntax feels too ambiguous. The branch seems okish to me, but I am not sure how common is the situation that you describe. Do you have a rough idea of how many tests are you hoping to simplify with the new syntax?

@lukemaurer
Copy link
Copy Markdown
Contributor Author

I agree that the singleton syntax feels too ambiguous. The branch seems okish to me, but I am not sure how common is the situation that you describe. Do you have a rough idea of how many tests are you hoping to simplify with the new syntax?

I had a look through the existing tests and I don't see any that are obviously in a state to rewrite this way right this second. It would make it easier to add to existing tests going forward, though. For example, link-test/test.ml builds up mylib.cma and then uses it and a few other files for further tests. One could at that point run a few tests that only cover .cma functionality, but currently you'd have to indent the whole rest of the block to do so (or add the new tests into the sequence, which obscures the structure and may mean carefully resetting the environment after the new tests).

@gasche gasche closed this Aug 1, 2024
@gasche gasche mentioned this pull request Feb 4, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants