Skip to content

Optimize desugaring of with expressions#1050

Closed
Gabriella439 wants to merge 3 commits intomasterfrom
gabriel/scale_with_desugar
Closed

Optimize desugaring of with expressions#1050
Gabriella439 wants to merge 3 commits intomasterfrom
gabriel/scale_with_desugar

Conversation

@Gabriella439
Copy link
Copy Markdown
Contributor

The motivation for this change is:

dhall-lang/dhall-haskell#1960

… where a deeply nested and chained with expression caused a blow-up
in the size of the desugared syntax tree.

This alternative desugaring was actually an idea we contemplated in the
original change that standardized with:

#923

… but at the time we elected to go with the simpler desugaring, reasoning
that users could work around performance issues with explicit let
bindings:

#923 (comment)

However, for chained with expressions that work-around is not
ergonomic, as illustrated in
dhall-lang/dhall-haskell#1960. Instead,
this adds language support for the let optimization.

The change to the WithMultiple test demonstrates that this leads to
a more compact desugared expression. Before this change the
expression { a.b = 1, c.d = 2 } with a.b = 3 with c.e = 4 would
desugar to:

  { a.b = 1, c.d = 2 }
 { a = { a.b = 1, c.d = 2 }.a  { b = 3 } }
 { c =
        ({ a.b = 1, c.d = 2 }  { a = { a.b = 1, c.d = 2 }.a  { b = 3 } }).c
       { e = 4 }
  }

… and now the same expression desugars to:

let _ = let _ = { a.b = 1, c.d = 2 } in _  { a = _.a  { b = 3 } }

in  _  { c = _.c  { e = 4 } }

The motivation for this change is:

dhall-lang/dhall-haskell#1960

… where a deeply nested and chained `with` expression caused a blow-up
in the size of the desugared syntax tree.

This alternative desugaring was actually an idea we contemplated in the
original change that standardized `with`:

#923

… but at the time we elected to go with the simpler desugaring, reasoning
that users could work around performance issues with explicit `let`
binding:

#923 (comment)

However, for chained `with` expressions that work-around is not
ergonomic, as illustrated in
dhall-lang/dhall-haskell#1960.  Instead,
this adds language support for the `let` work-around.

The change to the `WithMultiple` test demonstrates that this leads to
a more compact desugared expression.  Before this change the
expression `{ a.b = 1, c.d = 2 } with a.b = 3 with c.e = 4` would
desugar to:

```dhall
  { a.b = 1, c.d = 2 }
⫽ { a = { a.b = 1, c.d = 2 }.a ⫽ { b = 3 } }
⫽ { c =
        ({ a.b = 1, c.d = 2 } ⫽ { a = { a.b = 1, c.d = 2 }.a ⫽ { b = 3 } }).c
      ⫽ { e = 4 }
  }
```

… and now the same expression desugars to:

```dhall
let _ = let _ = { a.b = 1, c.d = 2 } in _ ⫽ { a = _.a ⫽ { b = 3 } }

in  _ ⫽ { c = _.c ⫽ { e = 4 } }
```
@Gabriella439
Copy link
Copy Markdown
Contributor Author

Here is the matching change to the Haskell implementation: dhall-lang/dhall-haskell#1964

@Nadrieril
Copy link
Copy Markdown
Member

From what I remember, the decisions we made #923 should have allowed an implementation to desugar using let while remaining standard-compliant. Did we overlook something?
I'm not a fan of this change because it forces the manipulation of variables at parse time. I would want this kind of thing to happen later on, when I represent variables with something more clever than text, and where I can avoid traversing the whole subtree when I introduce a fresh variable.
I do however agree that the let desugaring is the better one.
I guess I would prefer to have a node for with in the binary representation, and do the desugaring just before tck instead of during parsing.

@Gabriella439
Copy link
Copy Markdown
Contributor Author

@Nadrieril: The main difference is the change to one test suite file, which specifies how to encode multiple withs.

I would be fine changing things to make with a node in the syntax tree, though, if nobody else objects. I only took this route because I thought it would be simpler for implementations.

@philandstuff
Copy link
Copy Markdown
Collaborator

I’m going to try implementing in dhall-golang and will report back.

@sjakobi
Copy link
Copy Markdown
Collaborator

sjakobi commented Aug 3, 2020

@Gabriel439 Could you give an example (and a test?) of the desugaring with a field-path that is longer than two fields?

E.g.

r with a.b.c = x

Currently I'm unsure whether a with will result in exactly one let-binding or in n-1 let-bindings, where n is the length of the field-path.

@Gabriella439
Copy link
Copy Markdown
Contributor Author

Gabriella439 commented Aug 3, 2020

@sjakobi: r with a.b.c = x desugars to:

let _ = r in _  { a = let _ = _.a in _  { b = _.b  { c = x } } }

I'll hold off on the test until we sort out whether with expressions should be syntactic sugar or not.

@sjakobi
Copy link
Copy Markdown
Collaborator

sjakobi commented Aug 4, 2020

@sjakobi: r with a.b.c = x desugars to:

let _ = r in _  { a = let _ = _.a in _  { b = _.b  { c = x } } }

What would be the performance implications if we'd produce at most one let per with? In this case

let _ = r in _  { a = _.a  { b = _.a.b  { c = x } } }

@Gabriella439
Copy link
Copy Markdown
Contributor Author

@sjakobi: The difference would probably be negligible in practice, but I'd expect the one with more lets to be more efficient in the extreme case of a large number of fields because the number of field accesses is linear in the depth of the with expression instead of quadratic.

@Gabriella439
Copy link
Copy Markdown
Contributor Author

Gabriella439 commented Aug 4, 2020

@Nadrieril: So I took a stab at your suggestion and the main issue I ran into is that if we have a separate phase for desugaring (e.g. in between parsing and type-checking), then that would entail other changes to the AST, too. For example, we'd have to be able to represent the dotted field syntax (e.g. { x.y = 1 }) or duplicate record fields (e.g. { x = 1, x = 1 }) in the AST and standardize how to serialize that. Same thing for multi-line string literals.

Another possible option is to keep the current desugaring treatment, but instead specify the behavior of with at β-normalization time (which implies we'd also need a type-checking rule for with expressions), but I'd be fine with that if people didn't mind the implementation cost. If anything, it would likely improve error messages.

@Nadrieril
Copy link
Copy Markdown
Member

Nadrieril commented Aug 4, 2020

Ok, thanks for trying, I do see the issue. with in normalization and typechecking seems like a good option, I'd be ok with that

@Gabriella439
Copy link
Copy Markdown
Contributor Author

@Nadrieril: Alright, I'll put up a parallel pull request soon with the non-desugaring route. I don't expect it will be too complicated: it will probably just be something like "to type-check/normalize a with expression, type-check/normalize it as if it were desugared, but feel free to customize error messages and the evaluation strategy"

@Gabriella439
Copy link
Copy Markdown
Contributor Author

@Nadrieril: So I implemented what you suggested in the Haskell implementation and realized that there's one downside of doing things the way you suggest, which is that we don't have a good way to verify within our standard test suite that the desugaring was done correctly if it's always done "just in time" as part of type-checking and/or normalization. The parsing tests would only show that the with expression is faithfully encoded (without desugaring) and the normalization tests would miss the intermediate desugaring step in their expected output.

Also, I don't think it's that bad to add the dependency on shift at desugaring time, given that shift is the one function in the standard that has no dependencies on other functions within the standard, so it's cheap to implement just for this purpose.

@Gabriella439
Copy link
Copy Markdown
Contributor Author

Also, for reference, here is the Haskell code for the alternative approach:

https://github.com/dhall-lang/dhall-haskell/compare/gabriel/with_encoding

It's not much different because the Haskell implementation was already using a separate constructor internally for With, even if it didn't show up in the binary encoding

@Gabriella439
Copy link
Copy Markdown
Contributor Author

@Nadrieril: Actually, I take that back. The fact that the exact encoding doesn't show up in the test output is actually a feature, because the idea was that implementations would now be free to use different encodings internally so long as they produce the same final result.

@Gabriella439
Copy link
Copy Markdown
Contributor Author

Alright, the alternative approach is ready for review here: #1052

If that is approved then it will supersede this proposal

@Nadrieril
Copy link
Copy Markdown
Member

For me, shift is an artifact of the way the standard is specified. I expect non-toy implementations to never use such an operation, and instead use environments, de Bruijn indices, and this reverse-deBruijn indices that your NbE evaluator uses. Since that machinery is probably not yet in place just after parsing, that's my reason for wanting to avoid pre-tck shifts

@Nadrieril
Copy link
Copy Markdown
Member

we don't have a good way to verify within our standard test suite that the desugaring was done correctly if it's always done "just in time" as part of type-checking and/or normalization

I think we do: a normalization test should do the trick

\(x: {}) -> x with a.b.c = 42

As you mention later, we indeed can't check desugaring to happen before nztion/tck, but that's the point. Hopefully the test above and a few tck tests should be enough

Copy link
Copy Markdown
Collaborator

@philandstuff philandstuff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this will likely be superseded, but I spotted a couple of issues

Comment thread standard/record.md Outdated
Comment thread standard/record.md Outdated
Gabriella439 and others added 2 commits August 15, 2020 17:56
… as caught by @philandstuff

Co-authored-by: Philip Potter <philip.g.potter@gmail.com>
… as caught by @philandstuff

Co-authored-by: Philip Potter <philip.g.potter@gmail.com>
@Gabriella439
Copy link
Copy Markdown
Contributor Author

This is superseded by #1052

@Gabriella439 Gabriella439 deleted the gabriel/scale_with_desugar branch August 19, 2020 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants