I was evaluating a nifty-looking lightweight JavaScript interpreter called Duktape. It has a set of performance metrics, the first of them being Octane. I googled Octane, never having heard of it before. My first hit was this: https://v8.dev/blog/retiring-octane
The short summary of that page is pretty simple. Basically, Octane was a benchmark for a programming language (JavaScript), and just like every other language benchmark ever made, it turned out to be minimally useful (detrimental, even) because it tested for things no real programmer would ever do, and optimizing for those things would mean generating worse code for real software. Google stopped using Octane early, and then released this blog a couple of years later when they realized too many other people were treating it like something useful.
First, the blog reads like the same old sob story I’ve read about so many other short-lived compiler benchmarks — they result in over-optimization for the benchmark and hurt real software in the long run.
Second, why people feel compelled to turn such simple conclusions into extremely long-winded blog posts, I’ll never know. There’s a certain amount of verbiage dedicated to them defending the benchmark, pointing out the couple of narrow-scope items it apparently did help with. Though I have a hard time buying it based on comments made at the end of the blog. Let’s review quickly:
> […] the Octane benchmark suite was first released in 2012.
>
> By 2015, however, most JavaScript implementations had implemented the compiler optimizations needed to achieve high scores on Octane.
So it had an effective lifespan of three years. That’s not very good for a benchmark, even in the long history of short-lived programming language benchmarks. Also interesting is that this blog was written in 2017, more than two years after Google had already discontinued using it internally. So Google released a benchmark, and then while everyone else was busily trying to “be like Google!”, Google internally was going in a completely different direction. How very Microsoft-like of them. Just another interesting example of how being a large and successful corporate entity can result in competitive advantage even when you mean to be helpful.
> Octane helped engine developers deliver optimizations that allowed computationally-heavy applications to reach speeds that made JavaScript a viable alternative to C++ or Java.
Wut? First, Java isn’t even a viable alternative to C++ for most “computationally heavy” applications, so I have no idea how those two could be mentioned together in the same object of the preposition of that sentence. Any quick browse through StackOverflow for the millions of questions about horribly slow Java performance on simple data algorithms is proof of that. Second, JavaScript on a good day can’t hold a candle to Java or C++. Ok, sure, if you have a simple loop iterating over a fixed array, all three are kinda going to give you a similar benchmark result, because that’s so easy to peephole-optimize for. And if you have a lot of complex string operations, all three are going to give a similar perf profile, because most of the bottleneck is in memory allocation and in executing internal natively-optimized libraries. And sure, JavaScript and Java do better than C++ on memory allocation, but only because they defer the overhead to the garbage collector later on, which brings us to:
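For the record, the “simple loop iterating over a fixed array” case I mean is something like this (a hypothetical micro-benchmark I made up, not an Octane test). It’s a monomorphic loop over packed numbers, exactly the kind of thing any decent JIT or C++ compiler turns into tight machine code, which is why all three languages clock in close together on it:

```javascript
// Hypothetical micro-benchmark: a monomorphic loop over a packed numeric array.
// Code this simple flatters every JIT and says little about real applications.
const data = new Array(1_000_000);
for (let i = 0; i < data.length; i++) data[i] = i % 7;

function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

const start = Date.now();
const total = sum(data);
console.log(`sum=${total} in ${Date.now() - start} ms`);
```

Port that loop to Java or C++ and the timings converge, which is precisely why it’s a useless cross-language measuring stick.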
> In addition, Octane drove improvements in garbage collection which helped web browsers avoid long or unpredictable pauses.
Fair enough. Garbage collectors are notoriously hard to debug and improve in real-world applications. It definitely pays to have a set of stress tests for various memory-pounding patterns. But I’m kind of suspicious: if this test were so useful, why wasn’t it split out and offered separately from the Octane suite?
Please also take note: this “unpredictable pauses” thing wouldn’t even be a problem if they weren’t using a GC memory model in the first place. This highly volatile random-pause problem is a side effect of trying to get the perf up on the lazy execution of object-boxing and mutable string operations. It’s that simple. These sorts of problems only happen when you have a GC-based heap.
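For what it’s worth, here’s a minimal sketch (my own, not from Octane) of what a memory-pounding GC stress pattern looks like: churn through lots of short-lived boxed objects and strings, and treat the longest gap between loop iterations as a rough proxy for the worst collection pause:

```javascript
// Minimal GC-pause probe (illustrative sketch): allocate short-lived objects
// and strings as fast as possible; on a GC'd heap, unusually long gaps
// between iterations are usually collection pauses.
function churn(iterations) {
  let maxGapMs = 0;
  let last = Date.now();
  let keep = null; // hold the latest object so the allocation isn't optimized away
  for (let i = 0; i < iterations; i++) {
    keep = { id: i, data: new Array(100).fill(i), label: `item-${i}` };
    const now = Date.now();
    if (now - last > maxGapMs) maxGapMs = now - last;
    last = now;
  }
  return { maxGapMs, keep };
}

const { maxGapMs } = churn(200_000);
console.log(`worst gap between iterations: ${maxGapMs} ms`);
```

Run the equivalent loop in a non-GC language and the per-iteration cost is boringly flat; the spikes are the collector’s, not the code’s.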
> The next frontier, however, is improving the performance of real web pages, modern libraries, frameworks
No kidding? That should be the first thing you do, not the “next frontier”. With the exception of a garbage collector, the best way to optimize any compiler or interpreter is by benchmarking real-world applications and optimizing for their hotspots. Maybe 20 years ago, when JavaScript was still new, benchmarking real-world code might not have been practical. But that ship sailed more than a decade ago. Octane as a “benchmark” was obsolete before it even came out, and the V8 blog making a point of defending it only serves as more proof of that.
By way of example, one of the best ways to benchmark a garbage collector is to take real-world JavaScript applications and:
- Change the GC parameters so that the GC either has very little heap or a whoooole lot of heap.
- Feed the application an excessively large data set, which is something that can usually be done pretty easily even with web apps you didn’t write.
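As a sketch of those two knobs (the script name and record shape here are made up for illustration): save something like this as `gc-stress.js` and run it under different heap limits using V8’s real `--max-old-space-size` flag, e.g. `node --max-old-space-size=32 gc-stress.js` versus `node --max-old-space-size=4096 gc-stress.js`:

```javascript
// gc-stress.js (illustrative sketch): build an oversized data set and do
// app-style work over it, so GC behavior under the chosen heap limit
// dominates the run.
function makeRecords(n) {
  const records = [];
  for (let i = 0; i < n; i++) {
    records.push({ id: i, tags: ['a', 'b', 'c'], payload: 'x'.repeat(64) });
  }
  return records;
}

function summarize(records) {
  // Typical application work: transient strings and objects on every iteration.
  return records
    .map(r => ({ key: `${r.id}:${r.tags.join(',')}`, size: r.payload.length }))
    .reduce((sum, r) => sum + r.size, 0);
}

const records = makeRecords(100_000);
const total = summarize(records);
console.log(`summarized ${records.length} records, ${total} payload bytes`);
console.log(`heap used: ~${(process.memoryUsage().heapUsed / 1024 / 1024).toFixed(1)} MiB`);
```

Crank the record count up until it barely fits in the small heap, or down until it rattles around in the big one; the GC’s pause behavior under each setting is the thing you’re actually measuring.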
Front-end JS apps are readily available to be manipulated simply by way of being downloaded in source-code form and run on the client. Server-side JS apps are only slightly trickier, but the vast majority are built on top of the 100% open-source NPM ecosystem, so there are still plenty of real-world applications to optimize against. Every compiler team should exhaust these options before turning to synthetic benchmarks.
