add benchmarks to compare protocompile perf and memory usage against other compilers#64
```go
if !t.e.c.RetainASTs {
	file.RemoveAST()
}
```
This is one significant improvement for memory usage in protocompile. It allows the AST to be dropped early in the process, so a large compilation doesn't accumulate a large amount of ASTs pinned on the heap.
```diff
@@ -0,0 +1,21 @@
+module github.com/jhump/protocompile/internal/benchmarks
```
I made this a separate module so that `go test ./...` in the repo root would exclude these.
Also, I didn't want to pull in `github.com/jhump/protoreflect` as a dependency in the root go.mod.
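For illustration, a nested module's go.mod might look like the sketch below (the `go` directive version is a guess, and the real file also carries `require` lines for the benchmark-only dependencies). Because it declares its own module, `go test ./...` from the repo root never descends into it, and its dependency graph stays out of the root go.mod.

```go.mod
// internal/benchmarks/go.mod -- a nested module is invisible to
// `go test ./...` run from the repo root.
module github.com/jhump/protocompile/internal/benchmarks

go 1.18
```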
```diff
 file  *result
 locs  []protoreflect.SourceLocation
-index map[interface{}]protoreflect.SourceLocation
+index map[interface{}]int
```
This was another big win for reducing memory usage in protocompile.
```diff
 files   map[protoreflect.FileDescriptor]struct{}
 symbols map[protoreflect.FullName]symbolEntry
-exts    map[protoreflect.FullName]map[protoreflect.FieldNumber]ast.SourcePos
+exts    map[extNumber]ast.SourcePos
```
This looked like low-hanging fruit -- nested maps are often less dense/more wasteful than a single map. But files typically don't have enough extensions for this to be a material source of memory usage. So this wasn't really a noticeable improvement. I left the change in anyway since it made the insertion logic cleaner and the tests a little less verbose.
…gs); run benchmarks in CI, too
This includes some tweaks to reduce memory usage in protocompile, too.

Perf Summary:

- `protocompile` is about 20% faster than `protoparse`.
- `protocompile` allocates about 14% less memory than `protoparse` (* caveats below).
- When allowed to use concurrency, `protocompile`'s advantage in speed and memory is more pronounced: 50% faster and about 35% less allocation.
- Both compare well against `protoc` when run concurrently (`protocompile` natively; and `protoparse` using "chunks" that are run concurrently, like the Buf CLI currently does). Though `protoc` is the clear winner in single-threaded mode (not unexpected since it's optimized C++).
- (No memory usage is reported for the `protoc` benchmark. Since it just shells out to an external executable, no memory is really used inside the Go process.)

Caveats:

- The memory used by `protocompile` is about 26-28% larger than `protoparse`. (Part of the difference is that `protoparse` does linking in batch, with large tables; `protocompile` does it file-at-a-time with tables per file.) Another part is source code info (which `protocompile` implements; `protoparse` does not). But even when not generating source code info, the size difference persists.
- The AST for `protocompile`, on the other hand, is less than half as big as `protoparse`'s. And `protocompile` can release the AST as it goes (as soon as it's no longer needed, after a file is complete), so the total memory pressure for a large compilation is just the total for descriptor sizes. But in `protoparse`, the total memory pressure for a large compilation is the descriptor sizes plus the AST sizes. With that in mind, the total memory pressure for `protocompile` is a little less than half that of `protoparse`.

This includes my own implementation of code to measure the memory usage of an object. It tries to avoid double-counting memory by using a range tree to track regions of counted memory. I couldn't find anything that does this with some Google-searching... but maybe I missed it? Is there something that exists already like this? The tests also run `runtime.GC()` and print the total heap usage, as a sanity check that the measured size is reasonable (since it is close to the reported total heap usage).

I'm using `googleapis` since I thought a large batch of files was a more interesting performance test than trying to do something like a micro-benchmark (which wouldn't be able to exercise the same variety and mix of call paths). This downloads it (much like similar tests in the buf repo).

Here's the output of running tests and benchmarks in the new package:
Fixes TCN-545