Internal refactor to improve performance#225

Merged
sparkprime merged 3 commits into google:master from sparkprime:perf-improvements
Jun 1, 2018

Conversation

@sparkprime
Contributor

It's quite big...

@coveralls

coveralls commented May 24, 2018

Coverage Status

Coverage increased (+3.1%) to 76.479% when pulling e0b7882 on sparkprime:perf-improvements into 530cf7b on google:master.

@sparkprime
Contributor Author

https://github.com/google/jsonnet/blob/master/perf_tests/realistic2.jsonnet

cpp: 34s
go head: 15s
go PR: 8.8s

@sbarzowski
Contributor

The performance improvement is pretty awesome.

I'm slightly worried about the flexibility - how it's going to affect adding builtins, handling tailstrict and some more advanced optimizations. Also memory usage may increase.

I want to look into it more deeply. I have some ideas for getting the best of both worlds. I'll try to send a detailed review in 24h.

@sparkprime
Contributor Author

The general theme is reducing GC pressure by avoiding heap allocations. The hot spots were found by running with JSONNET_MEM_PROFILE as normal and then analyzing the profile with go tool pprof -alloc_objects, which shows which lines performed the most allocations.

@sparkprime
Contributor Author

If you have ideas for major changes, I suggest we do them incrementally on top of this, as it's already big enough and seems to work at this checkpoint.

I believe we now somewhat urgently need to address the stack blow-up problem, and that might require more refactoring and some performance loss (if I understand correctly how to accomplish it).

Contributor

@sbarzowski sbarzowski left a comment

Ok, I've written down my thoughts. One proposition for a non-trivial change, a few nits, some general notes and questions. Generally LGTM (trade-off is worth it).

limitations under the License.
*/

package jsonnet
Contributor

Hmmm... this helped avoid some annoying boilerplate (passing TraceElement everywhere). Was there a difference in performance between passing it as value and inlining it?

Contributor Author

I believe it ended up not being passed as a value (it was allocated on the heap instead) because some functions weren't inlined (they went via interfaces), but it has been a month now and I can't quite remember the details. Maybe it was possible to keep the abstraction without the GC overhead, but for only 2 params its value was quite low versus the cognitive overhead (and additional lines of code) of adding another abstraction.

@@ -1 +1 @@
std.makeArray(1e100, error "shouldn't happen")
std.makeArray(1e100, function(i) i)
Contributor

why? (I don't see how it makes any difference either way)

Contributor Author

The args are forced now, so the fact that 1e100 is too large is not discovered before the error is evaluated. The cpp version always had that behavior. The semantics are ambiguous (because it's an error either way and the semantics does not distinguish between different kinds of errors).

@@ -29,20 +29,12 @@ import (
"github.com/google/go-jsonnet/ast"
Contributor

I think it's very important that:

  1. Jsonnet functions can be directly rewritten as builtins.
  2. Adding a builtin is a local change - doesn't require any changes to other files - generally it should be enough to just write one function and register it somewhere.

Rationale:

  • Builtins are really important for performance and the easiest way to improve it. I think we should have a lot more of them. And it should be very obvious how to write one.
  • Builtins are a good entry point for new contributors - it would be nice to keep the code here local.
  • It signifies that the runtime is decoupled from the source code representation - which helps further optimization.

This change breaks these assumptions - there is no longer an easy and local way of creating some types of thunks (func calls, + on object, creating functions...). I think we should fix it or at least add appropriate TODOs. A proposition is in the comments for thunks.go.

Contributor Author

I can't find it, can you point me to it / call me in

Contributor

@sbarzowski sbarzowski May 28, 2018

GitHub marked it as outdated after something got changed. I've added it again, with slight modifications. https://github.com/google/go-jsonnet/pull/225/files/1e40f390b04972292a0bc960d78469bad9f88eb8#diff-71edf958342e55666bf90639cda1172eR58

(Eh, GitHub takes 2s to jump to the right line; I almost thought it didn't work.)

// it can be embedded in cachedThunk without needing to execute it ahead of
// time. It is equivalent to `local i = 42; func(i)`. It therefore has no
// free variables and needs only an empty environment to execute.
type astMakeArrayElement struct {
Contributor

I don't like creating artificial AST nodes like that.

  1. Requires handling outside of builtins. We'll need a lot of that when we add more builtins.
  2. It may be really confusing - I would expect all ast nodes to live in ast.
  3. It stores a bunch of data that is completely unnecessary - for every single element of every array.
  4. Dispatching an AST is an indirection similar to a virtual call in performance (https://stackoverflow.com/questions/28024884/does-a-type-assertion-type-switch-have-bad-performance-is-slow-in-go). And we create an object on heap. So I don't think we gain much in this case... I see of course how it was done to handle the change in representation, which helps in other situations.
  5. I generally dislike mixing abstraction layers.

A proposal in thunks.go would make it unnecessary.

Contributor Author

I didn't like this either but couldn't think of a better way, let's discuss it.

// If nil, use err.
content value
// If also nil, content is not cached yet.
err error
Contributor

Having four pointers for every thunk seems like a lot... I don't think it's a big problem now (well, I'm actually proposing adding a fifth one). But at some point it should be made more compact - perhaps by simulating a C union using the unsafe package.

Contributor Author

If we can do that for values too (rather than forcing a heap allocation for a value) that would be nice.

Contributor

Well, thinking about it now, it could be very problematic - the GC wouldn't like it. Unless we embedded it all the way, but that would get messy I think. The "right" way of doing discriminated unions in Go is via interfaces and type switches - but then we're back at square one.

builtins.go Outdated
elems := make([]*cachedThunk, to-from+1)
for i := from; i <= to; i++ {
elems[i-from] = &readyValue{intToValue(i)}
elems[i-from] = &cachedThunk{content: intToValue(i)}
Contributor

Perhaps helper functions for creating &cachedThunk{content: foo} etc. We are quite likely to change the representation sooner or later. I think top-level functions shouldn't change anything performance-wise.

Contributor Author

Added readyThunk(v).

interpreter.go Outdated
type callFrame struct {
// True if it switches to a clean environment (function call or array element)
// False otherwise, e.g. for local
// False otherwise, i.g. for local
Contributor

Seems like a find-and-replace artifact.

Contributor Author

Fixed


func (i *interpreter) newLocal(vars bindingFrame) {
s := &i.stack
func (s *callStack) newLocal(vars bindingFrame) {
Contributor

What difference does it make?

Contributor Author

I don't think it makes any difference for performance. I just felt it belonged on the stack, because it conceptually only affects the stack.

if err != nil {
return nil, err
}
builtin := bopBuiltins[node.Op]
Contributor

Micro-optimization (maybe worth adding a TODO): it could be made even more direct by having an array of the actual functions, not whole builtins.

Contributor Author

You're talking about saving a dereference, right? I wonder if we could just make bopBuiltins map to actual structs (rather than pointers). In fact, bopBuiltins is not even an array, it's a map from enum. I'm not sure if that's optimized to an array (probably not). Maybe a switch is better. I'll add a TODO.

Contributor Author

Scratch that, it is an array, not a map. The type says so. It's just initialized using map-like syntax.

positional: make([]potentialValue, len(node.Arguments.Positional)),
positional: make([]*cachedThunk, len(node.Arguments.Positional)),
named: make([]namedCallArgument, len(node.Arguments.Named)),
tailstrict: node.TailStrict,
Contributor

[tailstrict]

Out of scope for this change, but I think it is affected. To handle tailstrict, evaluating needs to return the last thunk - rather than evaluating it inside. When an actual value is needed, a loop is necessary which evaluates until it gets a value.

So the current interpreter.evaluate will be transformed into interpreter.partiallyEvaluate, which returns a *cachedThunk. The existing API should stay the same, of course, so interpreter.evaluate would become a helper that fully evaluates to an actual value (the loop I mentioned before).

In particular if my mental model is correct, it would need to:
a) return a thunked function call, instead of actually calling it inside.
b) pass the partially evaluated thunk from the body of the local (part after ;). Same with asserts and probably a few other constructs.
c) pass the partially evaluated thunk from if branches.

And I think that would be pretty much it, all the changes would be local here. It could be applied more broadly maybe, but this is the part that actually matters.

I think if handled properly it would make the special handling of stack traces obsolete - a function will just return a thunk earlier, and then it will be removed from the stack normally.

Contributor Author

More generally, evaluating needs to return some continuation instead of recursively invoking itself. That allows you to come out before you go back in again, but at the expense of a heap allocation. I don't think you have to build long chains, though. It should be possible to keep going in and out without building up a long chain, right? The caller just has to force the continuation as soon as it returns.

Contributor

Ok, I think I know exactly how I would implement it, and the diff should be pretty small. So I can just do it, once this change is merged.

// The environment is a pointer because it may be a cyclic structure. A thunk
// may refer to itself, so inside `env` there will be a variable bound back to us.
env *environment
body ast.Node
Contributor

@sbarzowski sbarzowski May 28, 2018

(Adding it again, slightly modified, because github marked it as outdated, after the comment above was changed. I also removed the old one to avoid confusion.)

Proposition:
Add another field evaluable or even func (i *interpreter, t *TraceElement) (value, error). Each thunk is represented as one of the following things then:

  • body and env
  • evaluable
  • value
  • err

Evaluable would be used for cases when we don't have AST.

In case of AST evaluation it would be just one more condition to check - that the evaluable is nil - and then we use env and body normally.
In case of thunks constructed in builtins, there is exactly one level of indirection and the same number of objects created on the heap. The same as in the case of AST.

Contributor Author

I don't think adding another field would be a big deal for performance. I did feel dirty adding that backend-specific AST.

// Note: All potentialValues are required to provide the same value every time,
// so it's only there for efficiency.
// TODO(sbarzowski) investigate efficiency of various representations
type cachedThunk struct {
Contributor

Perhaps renaming it to just thunk would make sense.

Or even better: rename it to thunkImpl and create a strongly typed alias type thunk *thunkImpl. The point is to:
a) prevent accidental usage as a non-pointer. Thunk always needs to be on the heap for caching to work and this way we can enforce it on the type level.
b) Make thunk opaque in other places.
I just hope it doesn't introduce any overhead.

Contributor Author

That all SGTM

@sbarzowski
Contributor

Ok, so I think we should just merge it as-is (with maybe some cleanup if there is anything left). Also I think it's a good point for releasing 0.11.

Once it is merged we can think about cleaning up various parts, one-by-one. And other things would get unblocked as well.

@sparkprime
Contributor Author

I was going to try some of the things you suggested but didn't get around to it. So I'll commit this now and we can iterate on it.

@sparkprime sparkprime merged commit 2973e24 into google:master Jun 1, 2018
@sparkprime sparkprime deleted the perf-improvements branch June 10, 2018 16:23