I hate benchmarking code as much as anyone (and at this point, most readers have probably clicked away). Pretending that your value cache improves performance by 1000% is a lot more fun than testing what it actually does. Alas, benchmarking in JavaScript is still necessary, especially as JavaScript is used (when isn't it?) in more performance-sensitive applications. Unfortunately, many of JavaScript's core architectural decisions make benchmarking it anything but easy.
What’s wrong with JavaScript?
The JIT compiler reduces accuracy(?)
For those unfamiliar with the magic of modern scripting languages like JavaScript, their architecture can be quite complex. Rather than running code through an interpreter that emits instructions immediately, most JavaScript engines use an architecture closer to that of compiled languages like C: they integrate multiple tiers of "compilers".
Each of these tiers offers a different trade-off between compilation time and runtime performance, so the engine doesn't waste effort optimizing code that rarely runs, while still applying the more advanced compilers to the code that runs most often (your "hot paths"). There are other complications that arise with an optimizing compiler, involving things like function monomorphism, but I'll spare you and avoid discussing them here.
So… why does this matter for benchmarking? Well, as you might have guessed, since a benchmark measures code performance, the JIT compiler can have a considerable impact. Small snippets of code will often run 10x or more faster once fully optimized, and that optimization introduces a lot of error into your results. For example, in the most basic benchmark setup (don't use anything like the following, for a number of reasons):
for (let i = 0; i < 1000; i++) {
  console.time();
  // <code to test>
  console.timeEnd();
}
(Don't worry, we'll get to console.time() as well.)
After a few iterations, much of the code under test will be cached and optimized, significantly reducing the time each operation takes. Benchmarking tools often try to defeat this caching/optimization, because it can also make code tested later in a benchmark run appear relatively faster. However, you ultimately have to ask whether unoptimized benchmarks reflect real-world performance. Of course, in some cases, such as infrequently visited web pages, optimization is unlikely; but in environments where performance matters most, like servers, the code will be optimized. If you have a piece of middleware handling thousands of requests per second, you'd better hope V8 is optimizing it.
So, even within a single engine, there are two to four different tiers your code can run in, each with different performance. Oh, and in some cases it's very difficult to ensure a particular optimization tier is actually active. Have fun :)
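One common way to account for the JIT is to run a warmup phase before measuring, so the hot path has (probably) been tiered up by the time you start timing. Here's a minimal sketch; the harness name, parameters, and iteration counts are my own illustrations, not from any real library:

```javascript
// Minimal benchmark harness with a warmup phase (illustrative, not a
// real library). The warmup loop gives the JIT a chance to optimize
// fn before we start timing it.
function benchmark(fn, { warmup = 10_000, runs = 100_000 } = {}) {
  let sink = 0; // consume results so the JIT can't dead-code-eliminate fn
  for (let i = 0; i < warmup; i++) sink += fn();
  const start = performance.now();
  for (let i = 0; i < runs; i++) sink += fn();
  const elapsed = performance.now() - start;
  if (sink === Infinity) console.log(sink); // keep sink observably "used"
  return (elapsed / runs) * 1e6; // mean nanoseconds per call
}

const nsPerCall = benchmark(() => Math.sqrt(2));
console.log(`~${nsPerCall.toFixed(1)} ns per call`);
```

Even this is naive: it reports a mean, ignores GC pauses, and can't tell you which compilation tier the code actually ran in.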
The engine will do its best to prevent you from timing it accurately
You know fingerprinting, the technique that lets sites track you even with "Do Not Track" enabled? JavaScript engines have long been doing their best to mitigate it. That effort, along with defenses against timing attacks, has led engines to deliberately make their timers less accurate, so attackers can't precisely measure how fast the current computer is or how expensive a particular operation is. Unfortunately, this means that without adjustments, benchmarks suffer from the same inaccuracy.
The example in the previous section is inaccurate partly because console.time only measures in milliseconds. Now switch it to performance.now(). Great, now we have timestamps in microseconds!
console.time(); // millisecond precision
console.timeEnd();

const t = performance.now(); // microsecond precision (in theory)
console.log(performance.now() - t);
Except… the timestamps all come in 100μs increments. Now let's add the HTTP headers that make the page cross-origin isolated, reducing the risk of timing attacks. Oops, we still only get 5μs increments. 5μs may be enough precision for many use cases, but for anything requiring higher granularity you'll have to look elsewhere. As far as I know, no browser allows more granular timers. Node.js does, but of course it has its own problems.
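You can observe this granularity yourself by sampling the timer in a tight loop and recording the smallest nonzero difference between consecutive readings. This is a rough probe I'm improvising here, not a standard API:

```javascript
// Rough probe of timer resolution: the smallest nonzero delta between
// consecutive performance.now() readings approximates the timer's
// granularity (around 0.1 ms on a non-isolated browser page, far
// finer in Node.js).
function timerResolution(samples = 100_000) {
  let min = Infinity;
  let last = performance.now();
  for (let i = 0; i < samples; i++) {
    const now = performance.now();
    const delta = now - last;
    if (delta > 0 && delta < min) min = delta;
    last = now;
  }
  return min; // milliseconds
}

const res = timerResolution();
console.log(`timer granularity ≈ ${res} ms`);
```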
Even if you decide to just run the code in a browser and let the compiler do its thing, you'll clearly have more trouble if you want accurate timing. Oh, and not all browsers are the same.
Every environment is different
I love Bun for what it has done to push server-side JavaScript forward, but boy, does it make benchmarking server-side JavaScript much harder. A few years ago, the only server-side JavaScript environments people cared about were Node.js and Deno, both of which use the V8 JavaScript engine (the same engine as Chrome). Bun instead uses JavaScriptCore, the engine in Safari, which has completely different performance characteristics.
The problem of multiple JavaScript environments with their own performance characteristics is relatively new on the server side, but it has long plagued the client. The three commonly used JavaScript engines (V8, JavaScriptCore, and SpiderMonkey, powering Chrome, Safari, and Firefox respectively) can all execute the same snippet of code significantly faster or slower than one another.
One example of these differences is tail-call optimization (TCO). TCO optimizes functions that recurse as the last step of their body (a "tail call"), like this:
function factorial(i, num = 1) {
  if (i == 1) return num;
  num *= i;
  i--;
  return factorial(i, num);
}
Try benchmarking factorial(100000) in Bun. Now try the same thing in Node.js or Deno. You should get an error similar to this:
function factorial(i, num = 1) {
^
RangeError: Maximum call stack size exceeded
In V8 (and by extension Node.js and Deno), every time factorial() calls itself at the end of its body, the engine creates a whole new function context for the nested call, and eventually the call stack overflows. But why doesn't this happen in Bun? JavaScriptCore, which Bun uses, optimizes these kinds of functions by effectively converting them into a loop, something like this:
function factorial(i, num = 1) {
  while (i != 1) {
    num *= i;
    i--;
  }
  return num;
}
Not only does the loop form avoid the call-stack limit, it's also much faster because it doesn't require any new function contexts, which means a function like this will benchmark very differently across engines.
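If you need deep recursion to behave the same everywhere, rather than depending on one engine's TCO, a portable (if less elegant) workaround is a trampoline: the function returns a thunk instead of calling itself, and a driver loop runs the thunks, so the stack never grows. This is a general technique I'm adding for illustration, not something from either engine:

```javascript
// Trampolined factorial: portable across engines with or without TCO.
// BigInt is used because factorials overflow Number almost immediately.
function factorialStep(i, num = 1n) {
  if (i <= 1n) return num;                     // done: final value
  return () => factorialStep(i - 1n, num * i); // otherwise, a thunk
}

function trampoline(result) {
  // Keep calling thunks at constant stack depth until we get a value
  while (typeof result === "function") result = result();
  return result;
}

const fact5 = trampoline(factorialStep(5n));       // 120n
const factBig = trampoline(factorialStep(20000n)); // no stack overflow
```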
Essentially, these differences mean you should benchmark on every engine you expect your code to run on, to make sure code that's fast in one isn't slow in another. Also, if you're developing a library intended for multiple platforms, be sure to include the more esoteric engines, like Hermes; they have very different performance characteristics.
Honorable mentions
- The garbage collector and its tendency to randomly pause everything
- The JIT compiler deciding your code is "unnecessary" and eliminating it entirely
- The flame graphs in most JavaScript dev tools being fairly limited
- I think you get the idea
So…what’s the solution?
I wish I could point you to an npm package that solves all of these problems, but there isn't one.
On servers, your life is slightly easier. You can use d8 to manually control optimization tiers, control the garbage collector, and get precise timing. Of course, you'll need some Bash-fu to set up a well-designed benchmarking pipeline around it, and unfortunately d8 doesn't integrate well (or at all) with Node.js. You can enable certain flags in Node.js to get similar results, but you'll miss some capabilities, such as enabling specific optimization tiers.
v8 --sparkplug --always-sparkplug --no-opt [file]
An example of running d8 with a specific compilation tier (Sparkplug) enabled. By default, d8 also offers more GC control and more general debugging information.
Can you get similar functionality in JavaScriptCore? To be honest, I haven't used JavaScriptCore's CLI much, and it is heavily underdocumented. You can enable specific tiers using its command-line flags, but I'm not sure how much debugging information you can retrieve. Bun also includes some useful benchmarking utilities, but like Node.js, they are limited.
Unfortunately, all of this requires a build of the bare engine, which can be difficult to obtain. I've found the easiest way to manage engines is esvu paired with eshost-cli, which together make it easy to install engines and run code across them. Of course, there's still a lot of manual work required, since these tools only manage running code across engines; you still have to write the benchmarking code yourself.
If you just want to benchmark as accurately as possible on your server's default engine, you can use off-the-shelf Node.js tools like Mitata, which help improve timing accuracy and reduce GC-related error. Many of these tools (Mitata included) also work across many engines; of course, you'll still need to set up a pipeline like the one above.
In the browser, everything is much harder. There's no solution I'm aware of for more precise timing, and control over the engine is far more limited. The most information you'll get about JavaScript runtime performance in the browser comes from Chrome DevTools, which provides basic flame-graph and CPU-throttling utilities.
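One thing you can do in the browser is label your measurements so they show up in those DevTools traces: performance.mark() and performance.measure() are standard User Timing APIs, and named measures appear in the Performance panel's Timings track. The snippet below also runs in Node.js, which implements the same API; the names and workload are my own:

```javascript
// Named spans via the User Timing API. In Chrome DevTools, the
// "bench" measure appears in the Performance panel's Timings track.
performance.mark("bench-start");
let total = 0;
for (let i = 0; i < 1_000_000; i++) total += Math.sqrt(i); // code under test
performance.mark("bench-end");
performance.measure("bench", "bench-start", "bench-end");

const [entry] = performance.getEntriesByName("bench");
console.log(`${entry.name}: ${entry.duration.toFixed(2)} ms`);
```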
In conclusion
Many of the same design decisions that make JavaScript (relatively) performant and portable also make it harder to benchmark than other languages. There are many more targets to benchmark, and you have far less control over each one.
Hopefully, one day, tooling will emerge that simplifies many of these problems. I may eventually build a tool to streamline cross-engine and cross-tier benchmarking, but for now, creating a pipeline that solves all of these problems would take quite a bit of work. Of course, remember that these issues don't apply to everyone: if your code only ever runs in one environment, don't waste time benchmarking others.
However you choose to benchmark, I hope this article has shown you some of the pitfalls of JavaScript benchmarking. Let me know if a tutorial on implementing some of the techniques described above would be helpful.