Internal refactor to improve performance #225
Conversation
https://github.com/google/jsonnet/blob/master/perf_tests/realistic2.jsonnet cpp: 34s
The performance improvement is pretty awesome. I'm slightly worried about the flexibility - how it's going to affect adding builtins, handling tailstrict and some more advanced optimizations. Also memory usage may increase. I want to look into it more deeply. I have some ideas for getting the best of both worlds. I'll try to send a detailed review in 24h.

The general theme is reducing GC pressure by avoiding heap allocations. This is achieved by using JSONNET_MEM_PROFILE as normal and then analyzing the profile with

If you have ideas for major changes I suggest we do it incrementally on top of this, as it's already big enough and seems to work at this checkpoint. I believe we somewhat urgently need to avoid the stack blow-up problem now, and that might require more refactoring and performance loss (if I understand correctly how to accomplish that).
sbarzowski
left a comment
Ok, I've written down my thoughts. One proposition for a non-trivial change, a few nits, some general notes and questions. Generally LGTM (trade-off is worth it).
limitations under the License.
*/

package jsonnet
Hmmm... this helped avoid some annoying boilerplate (passing TraceElement everywhere). Was there a difference in performance between passing it by value and inlining it?
I believe it ended up not being passed by value (allocated on the heap instead) because some functions weren't inlined (they went through interfaces), but it has been a month now and I can't quite remember the details. It might have been possible to modify it to keep the abstraction without the GC overhead, but for only 2 params its value was quite low compared to the cognitive overhead (and additional lines of code) of adding another abstraction.
@@ -1 +1 @@
-std.makeArray(1e100, error "shouldn't happen")
+std.makeArray(1e100, function(i) i)
why? (I don't see how it makes any difference either way)
The args are forced now, so the fact that 1e100 is too large is not discovered before the error argument is evaluated. The C++ version always had that behavior. The semantics are ambiguous (because it's an error either way and the semantics do not distinguish between different kinds of errors).
@@ -29,20 +29,12 @@ import (
	"github.com/google/go-jsonnet/ast"
I think it's very important that:
- Jsonnet functions can be directly rewritten as builtins.
- Adding a builtin is a local change - doesn't require any changes to other files - generally it should be enough to just write one function and register it somewhere.
Rationale:
- Builtins are really important for performance and the easiest way to improve it. I think we should have a lot more of them. And it should be very obvious how to write one.
- Builtins are a good entry point for new contributors - it would be nice to keep the code here local.
- It signifies that the runtime is decoupled from the source code representation - which helps further optimization.
This change breaks these assumptions - there is no longer an easy and local way of creating some types of thunks (func calls, + on object, creating functions...). I think we should fix it or at least add appropriate TODOs. A proposition is in the comments for thunks.go.
I can't find it - can you point me to it?
Github marked it as outdated after something got changed. I've added it again, with slight modifications. https://github.com/google/go-jsonnet/pull/225/files/1e40f390b04972292a0bc960d78469bad9f88eb8#diff-71edf958342e55666bf90639cda1172eR58
(Eh, github takes 2s to jump to the right line, I almost thought it didn't work.)
// it can be embedded in cachedThunk without needing to execute it ahead of
// time. It is equivalent to `local i = 42; func(i)`. It therefore has no
// free variables and needs only an empty environment to execute.
type astMakeArrayElement struct {
I don't like creating artificial AST nodes like that.
- Requires handling outside of builtins. We'll need a lot of that when we add more builtins.
- It may be really confusing - I would expect all ast nodes to live in ast.
- It stores a bunch of data that is completely unnecessary - for every single element of every array.
- Dispatching an AST is an indirection similar to a virtual call in performance (https://stackoverflow.com/questions/28024884/does-a-type-assertion-type-switch-have-bad-performance-is-slow-in-go). And we create an object on heap. So I don't think we gain much in this case... I see of course how it was done to handle the change in representation, which helps in other situations.
- I generally dislike mixing abstraction layers.
A proposal in thunks.go would make it unnecessary.
I didn't like this either but couldn't think of a better way, let's discuss it.
// If nil, use err.
content value
// If also nil, content is not cached yet.
err error
Having four pointers for every thunk seems like a lot... I don't think it's a big problem now (well, I'm actually proposing adding a fifth one). But at some point it should be made more compact - perhaps simulating a C union using the unsafe package.
If we can do that for values too (rather than forcing a heap allocation for a value) that would be nice.
Well, thinking about it now, it could be very problematic - the GC wouldn't like it. Unless we embedded it all the way, but that would get messy I think. The "right" way of doing discriminated unions in Go is via interfaces and type switches - but then we're back at square one.
builtins.go (outdated)
elems := make([]*cachedThunk, to-from+1)
for i := from; i <= to; i++ {
-	elems[i-from] = &readyValue{intToValue(i)}
+	elems[i-from] = &cachedThunk{content: intToValue(i)}
Perhaps helper functions for creating &cachedThunk{content: foo} etc. We are quite likely to change the representation sooner or later. I think top-level functions shouldn't change anything performance-wise.
Added readyThunk(v).
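A minimal sketch of what such a helper might look like, with a stand-in `value` type (assumption: the real `cachedThunk` has more fields and lives alongside the interpreter):

```go
package main

import "fmt"

// value stands in for the interpreter's value interface.
type value interface{}

// cachedThunk mirrors the PR's representation: here only the cached
// content field, with the deferred-computation fields omitted.
type cachedThunk struct {
	content value
}

// readyThunk wraps an already-computed value, hiding the concrete
// representation so it can change later without touching call sites.
func readyThunk(v value) *cachedThunk {
	return &cachedThunk{content: v}
}

func main() {
	elems := make([]*cachedThunk, 3)
	for i := 0; i < 3; i++ {
		elems[i] = readyThunk(i)
	}
	fmt.Println(elems[2].content) // prints 2
}
```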
interpreter.go (outdated)
type callFrame struct {
	// True if it switches to a clean environment (function call or array element)
-	// False otherwise, e.g. for local
+	// False otherwise, i.g. for local
seems like a find-and-replace artifact
-func (i *interpreter) newLocal(vars bindingFrame) {
-	s := &i.stack
+func (s *callStack) newLocal(vars bindingFrame) {
What difference does it make?
I don't think it makes any difference for performance, I just felt it should be actually in the stack because it was conceptually only affecting the stack.
if err != nil {
	return nil, err
}
builtin := bopBuiltins[node.Op]
Micro-optimization (maybe worth adding a TODO): it could be made even more direct by having an array of actual functions, not whole builtins.
You're talking about saving a dereference, right? I wonder if we could just make bopBuiltins map to actual structs (rather than pointers). In fact, bopBuiltins is not even an array, it's a map from enum. I'm not sure if that's optimized to an array (probably not). Maybe a switch is better. I'll add a TODO.
Scratch that, it is an array not a map. The type says so. It's just initialized using the map syntax.
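For illustration, a self-contained sketch of the pattern being discussed (hypothetical names): a Go array typed by the op enum but initialized with keyed-literal syntax, which looks like a map at a glance yet compiles to a fixed array with direct indexing and no hashing.

```go
package main

import "fmt"

type binaryOp int

const (
	bopPlus binaryOp = iota
	bopMinus
	numOps
)

// An array (not a map) indexed by the op enum. The keyed composite
// literal resembles a map literal, but the [numOps] type makes it a
// fixed-size array, so bopFuncs[op] is a plain index operation.
var bopFuncs = [numOps]func(l, r int) int{
	bopPlus:  func(l, r int) int { return l + r },
	bopMinus: func(l, r int) int { return l - r },
}

func main() {
	fmt.Println(bopFuncs[bopPlus](2, 3))  // 5
	fmt.Println(bopFuncs[bopMinus](2, 3)) // -1
}
```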
-positional: make([]potentialValue, len(node.Arguments.Positional)),
+positional: make([]*cachedThunk, len(node.Arguments.Positional)),
named: make([]namedCallArgument, len(node.Arguments.Named)),
tailstrict: node.TailStrict,
[tailstrict]
Out of scope for this change, but I think it is affected. To handle tailstrict, evaluating needs to return the last thunk - rather than evaluating it inside. When an actual value is needed, a loop is necessary which evaluates until it gets a value.
So the current interpreter.evaluate will be transformed into interpreter.partiallyEvaluate, which returns a *cachedThunk. The existing API should stay the same of course, so interpreter.evaluate would be a helper that fully evaluates to an actual value (the loop I mentioned before).
In particular, if my mental model is correct, it would need to:
a) return a thunked function call, instead of actually calling it inside.
b) pass the partially evaluated thunk from the body of the local (the part after ;). Same with asserts and probably a few other constructs.
c) pass the partially evaluated thunk from if branches.
And I think that would be pretty much it, all the changes would be local here. It could be applied more broadly maybe, but this is the part that actually matters.
I think if handled properly it would make the special handling of stack traces obsolete - a function will just return a thunk earlier, and then it will get removed from the stack normally.
More generally, evaluating needs to return some continuation instead of actually recursively invoking itself. That allows you to come out before you go back in again, but at the expense of heap allocation. I don't think you have to build long chains though. It should be possible to keep going in and out without building up a long chain, right? The caller just has to force the continuation as soon as it returns.
Ok, I think I know exactly how I would implement it, and the diff should be pretty small. So I can just do it, once this change is merged.
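A minimal trampoline sketch of the partiallyEvaluate idea, under assumed simplified types (the real version would return *cachedThunk and thread error and trace state): the caller forces each returned continuation in a loop, so deep tail calls never grow the Go stack and no long chain builds up.

```go
package main

import "fmt"

// step is either a finished value or another thunk to run
// (hypothetical type; stands in for *cachedThunk).
type step struct {
	done  bool
	value int
	next  func() step
}

// evaluate forces steps in a loop instead of recursing, so each
// continuation is consumed as soon as it is returned.
func evaluate(s step) int {
	for !s.done {
		s = s.next()
	}
	return s.value
}

// countdown models a tail-recursive computation as chained steps.
func countdown(n int) step {
	if n == 0 {
		return step{done: true, value: 0}
	}
	return step{next: func() step { return countdown(n - 1) }}
}

func main() {
	// A million tail calls in constant stack space.
	fmt.Println(evaluate(countdown(1000000))) // 0
}
```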
// The environment is a pointer because it may be a cyclic structure. A thunk
// may refer to itself, so inside `env` there will be a variable bound back to us.
env *environment
body ast.Node
(Adding it again, slightly modified, because github marked it as outdated, after the comment above was changed. I also removed the old one to avoid confusion.)
Proposition:
Add another field evaluable, perhaps even of type func (i *interpreter, t *TraceElement) (value, error). Each thunk is then represented as one of the following things:
- body and env
- evaluable
- value
- err
Evaluable would be used for cases where we don't have an AST.
In the case of AST evaluation it would be just one more condition to check - that evaluable is nil - and then we use env and body normally.
In the case of thunks constructed in builtins, there is exactly one level of indirection and the same number of objects created on the heap. The same as in the case of AST.
I don't think adding another field would be a big deal for performance. I did feel dirty adding that backend-specific AST.
// Note: All potentialValues are required to provide the same value every time,
// so it's only there for efficiency.
// TODO(sbarzowski) investigate efficiency of various representations
type cachedThunk struct {
Perhaps renaming it to just thunk would make sense.
Or even better: rename it to thunkImpl and create a strongly typed alias type thunk *thunkImpl. The point is to:
a) prevent accidental usage as a non-pointer. Thunk always needs to be on the heap for caching to work and this way we can enforce it on the type level.
b) Make thunk opaque in other places.
I just hope it doesn't introduce any overhead.
Ok, so I think we should just merge it as-is (with maybe some cleanup if there is anything left). Also I think it's a good point for releasing 0.11. Once it is merged we can think about cleaning up various parts, one-by-one. And other things would get unblocked as well.

I was going to try some of the things you suggested but didn't get around to it. So I'll commit this now and we can iterate on it.
It's quite big...