Internal refactor to improve performance#225

Merged
sparkprime merged 3 commits into google:master from sparkprime:perf-improvements
Jun 1, 2018

Conversation

@sparkprime
Contributor

It's quite big...

@coveralls

coveralls commented May 24, 2018

Coverage Status

Coverage increased (+3.1%) to 76.479% when pulling e0b7882 on sparkprime:perf-improvements into 530cf7b on google:master.

@sparkprime
Contributor Author

https://github.com/google/jsonnet/blob/master/perf_tests/realistic2.jsonnet

cpp: 34s
go head: 15s
go PR: 8.8s

@sbarzowski
Contributor

The performance improvement is pretty awesome.

I'm slightly worried about the flexibility - how it's going to affect adding builtins, handling tailstrict and some more advanced optimizations. Also memory usage may increase.

I want to look into it more deeply. I have some ideas for getting the best of both worlds. I'll try to send a detailed review in 24h.

@sparkprime
Contributor Author

The general theme is reducing GC pressure by avoiding heap allocations. The hot spots were found by running with JSONNET_MEM_PROFILE as normal and then analyzing the profile with go tool pprof -alloc_objects, which shows which lines performed the most allocations.

@sparkprime
Contributor Author

If you have ideas for major changes, I suggest we do them incrementally on top of this, as it's already big enough and seems to work at this checkpoint.

I believe we now somewhat urgently need to address the stack blow-up problem, and that might require more refactoring and some performance loss (if I understand correctly how to accomplish it).

Contributor

@sbarzowski sbarzowski left a comment

Ok, I've written down my thoughts. One proposition for a non-trivial change, a few nits, some general notes and questions. Generally LGTM (trade-off is worth it).

limitations under the License.
*/

package jsonnet
Contributor

Hmmm... this helped avoid some annoying boilerplate (passing TraceElement everywhere). Was there a difference in performance between passing it as value and inlining it?

Contributor Author

I believe it ended up not being passed as a value (it was allocated on the heap instead) because some functions weren't inlined (they went via interfaces), but it has been a month now and I can't quite remember the details. Maybe it was possible to keep the abstraction without the GC overhead, but for only 2 params its value was quite low versus the cognitive overhead (and additional lines of code) of adding another abstraction.

@@ -1 +1 @@
std.makeArray(1e100, error "shouldn't happen")
std.makeArray(1e100, function(i) i)
Contributor

why? (I don't see how it makes any difference either way)

Contributor Author

The args are forced now, so the fact that 1e100 is too large is not discovered before the error is evaluated. The cpp version always had that behavior. The semantics are ambiguous (because it's an error either way and the semantics does not distinguish between different kinds of errors).

@@ -29,20 +29,12 @@ import (
"github.com/google/go-jsonnet/ast"
Contributor

I think it's very important that:

  1. Jsonnet functions can be directly rewritten as builtins.
  2. Adding a builtin is a local change - doesn't require any changes to other files - generally it should be enough to just write one function and register it somewhere.

Rationale:

  • Builtins are really important for performance and the easiest way to improve it. I think we should have a lot more of them. And it should be very obvious how to write one.
  • Builtins are a good entry point for new contributors - it would be nice to keep the code here local.
  • It signifies that the runtime is decoupled from the source code representation - which helps further optimization.

This change breaks these assumptions - there is no longer an easy and local way of creating some types of thunks (func calls, + on object, creating functions...). I think we should fix it or at least add appropriate TODOs. A proposition is in the comments for thunks.go.

Contributor Author

I can't find it, can you point me to it / call me in

Contributor

@sbarzowski sbarzowski May 28, 2018

GitHub marked it as outdated after something got changed. I've added it again, with slight modifications. https://github.com/google/go-jsonnet/pull/225/files/1e40f390b04972292a0bc960d78469bad9f88eb8#diff-71edf958342e55666bf90639cda1172eR58

(Eh, GitHub takes 2s to jump to the right line; I almost thought it didn't work.)

// it can be embedded in cachedThunk without needing to execute it ahead of
// time. It is equivalent to `local i = 42; func(i)`. It therefore has no
// free variables and needs only an empty environment to execute.
type astMakeArrayElement struct {
Contributor

I don't like creating artificial AST nodes like that.

  1. Requires handling outside of builtins. We'll need a lot of that when we add more builtins.
  2. It may be really confusing - I would expect all ast nodes to live in ast.
  3. It stores a bunch of data that is completely unnecessary - for every single element of every array.
  4. Dispatching an AST is an indirection similar to a virtual call in performance (https://stackoverflow.com/questions/28024884/does-a-type-assertion-type-switch-have-bad-performance-is-slow-in-go). And we create an object on heap. So I don't think we gain much in this case... I see of course how it was done to handle the change in representation, which helps in other situations.
  5. I generally dislike mixing abstraction layers.

A proposal in thunks.go would make it unnecessary.

Contributor Author

I didn't like this either but couldn't think of a better way, let's discuss it.

// If nil, use err.
content value
// If also nil, content is not cached yet.
err error
Contributor

Having four pointers for every thunk seems like a lot... I don't think it's a big problem now (well, I'm actually proposing adding a fifth one). But at some point it should be made more compact - perhaps by simulating a C union using the unsafe package.

Contributor Author

If we can do that for values too (rather than forcing a heap allocation for a value) that would be nice.

Contributor

Well, thinking about it now, it could be very problematic - the GC wouldn't like it. Unless we embedded it all the way, but that would get messy I think. The "right" way of doing discriminated unions in Go is via interfaces and type switches - but then we're back at square one.

builtins.go Outdated
elems := make([]*cachedThunk, to-from+1)
for i := from; i <= to; i++ {
elems[i-from] = &readyValue{intToValue(i)}
elems[i-from] = &cachedThunk{content: intToValue(i)}
Contributor

Perhaps helper functions for creating &cachedThunk{content: foo} etc. We are quite likely to change the representation sooner or later. I think top-level functions shouldn't change anything performance-wise.

Contributor Author

Added readyThunk(v).

interpreter.go Outdated
type callFrame struct {
// True if it switches to a clean environment (function call or array element)
// False otherwise, e.g. for local
// False otherwise, i.g. for local
Contributor

Seems like a find-and-replace artifact.

Contributor Author

Fixed


func (i *interpreter) newLocal(vars bindingFrame) {
s := &i.stack
func (s *callStack) newLocal(vars bindingFrame) {
Contributor

What difference does it make?

Contributor Author

I don't think it makes any difference for performance. I just felt it belonged on the stack, because it conceptually only affects the stack.

if err != nil {
return nil, err
}
builtin := bopBuiltins[node.Op]
Contributor

Micro-optimization (maybe worth adding a TODO): it could be made even more direct by having an array of the actual functions, not whole builtins.

Contributor Author

You're talking about saving a dereference, right? I wonder if we could just make bopBuiltins map to actual structs (rather than pointers). In fact, bopBuiltins is not even an array, it's a map from enum. I'm not sure if that's optimized to an array (probably not). Maybe a switch is better. I'll add a TODO.

Contributor Author

Scratch that, it is an array, not a map. The type says so. It's just initialized using map-like syntax.

positional: make([]potentialValue, len(node.Arguments.Positional)),
positional: make([]*cachedThunk, len(node.Arguments.Positional)),
named: make([]namedCallArgument, len(node.Arguments.Named)),
tailstrict: node.TailStrict,
Contributor

[tailstrict]

Out of scope for this change, but I think it is affected. To handle tailstrict, evaluating needs to return the last thunk - rather than evaluating it inside. When an actual value is needed, a loop is necessary which evaluates until it gets a value.

So the current interpreter.evaluate will be transformed into interpreter.partiallyEvaluate, which returns a *cachedThunk. The existing API should stay the same, of course, so interpreter.evaluate would become a helper that fully evaluates to an actual value (the loop I mentioned before).

In particular if my mental model is correct, it would need to:
a) return a thunked function call, instead of actually calling it inside.
b) pass the partially evaluated thunk from the body of the local (part after ;). Same with asserts and probably a few other constructs.
c) pass the partially evaluated thunk from if branches.

And I think that would be pretty much it, all the changes would be local here. It could be applied more broadly maybe, but this is the part that actually matters.

I think if handled properly it would make the special handling of stack traces obsolete - a function will just return a thunk earlier, and then it will be removed from the stack normally.

Contributor Author

More generally, evaluating needs to return some continuation instead of recursively invoking itself. That allows you to come out before you go back in again, but at the expense of a heap allocation. I don't think you have to build long chains, though. It should be possible to keep going in and out without building up a long chain, right? The caller just has to force the continuation as soon as it returns.

Contributor

Ok, I think I know exactly how I would implement it, and the diff should be pretty small. So I can just do it, once this change is merged.

// The environment is a pointer because it may be a cyclic structure. A thunk
// may refer to itself, so inside `env` there will be a variable bound back to us.
env *environment
body ast.Node
Contributor

@sbarzowski sbarzowski May 28, 2018

(Adding it again, slightly modified, because github marked it as outdated, after the comment above was changed. I also removed the old one to avoid confusion.)

Proposition:
Add another field evaluable or even func (i *interpreter, t *TraceElement) (value, error). Each thunk is represented as one of the following things then:

  • body and env
  • evaluable
  • value
  • err

Evaluable would be used for cases when we don't have AST.

In case of AST evaluation it would be just one more condition to check - that the evaluable is nil - and then we use env and body normally.
In case of thunks constructed in builtins, there is exactly one level of indirection and the same number of objects created on the heap. The same as in the case of AST.

Contributor Author

I don't think adding another field would be a big deal for performance. I did feel dirty adding that backend-specific AST.

// Note: All potentialValues are required to provide the same value every time,
// so it's only there for efficiency.
// TODO(sbarzowski) investigate efficiency of various representations
type cachedThunk struct {
Contributor

Perhaps renaming it to just thunk would make sense.

Or even better: rename it to thunkImpl and create a strongly typed alias type thunk *thunkImpl. The point is to:
a) prevent accidental usage as a non-pointer. Thunk always needs to be on the heap for caching to work and this way we can enforce it on the type level.
b) Make thunk opaque in other places.
I just hope it doesn't introduce any overhead.

Contributor Author

That all SGTM

@sbarzowski
Contributor

Ok, so I think we should just merge it as-is (with maybe some cleanup if there is anything left). Also I think it's a good point for releasing 0.11.

Once it is merged we can think about cleaning up various parts, one-by-one. And other things would get unblocked as well.

@sparkprime
Contributor Author

I was going to try some of the things you suggested but didn't get around to it. So I'll commit this now and we can iterate on it.

@sparkprime sparkprime merged commit 2973e24 into google:master Jun 1, 2018
@sparkprime sparkprime deleted the perf-improvements branch June 10, 2018 16:23