
Pimp my JS: +200% performance


Hi there! Some time ago we released our online RAW converter, Raw.pics.io. It's written in pure JavaScript. Some users said it was a bit slow, and, you know, they were right. Today I'll tell you how we made raw.pics.io almost 3x faster.

You probably wonder how it now compares to native applications. Here are some numbers:

[Image: Performance comparison]

I don’t want to include tons of code with lots of descriptions, so I will focus on the general principles of optimization we used. I will also skip the usual topics of DOM access, reducing the number of HTTP requests, file minification, server-side compression options, etc. There’s enough information about those here and there.

Let’s start

The first thing we did was identify the bottlenecks. I checked the whole flow and dug for the blocks of code that took most of the processing time. Almost any software has these shitty lines of code. In most cases the engineer knows about those parts, or at least has some assumptions; it’s just a question of the will to go and find that code. Besides our own knowledge of the code, we used a profiler and measured execution time with console.time(). It’s also worth noting the really handy performance object that appeared in browsers fairly recently.
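
For illustration, this is roughly the kind of measurement we mean (a minimal sketch; decodeTile and buffer are made-up names standing in for whatever block you suspect):

// Coarse timing with console.time() / console.timeEnd()
console.time('decode');
decodeTile(buffer);                    // decodeTile and buffer are hypothetical
console.timeEnd('decode');             // prints something like "decode: 123.45ms"

// Finer-grained timing with the performance object
var t0 = performance.now();
decodeTile(buffer);
var t1 = performance.now();
console.log('decode took', (t1 - t0).toFixed(2), 'ms');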

Technology level optimization

Boosting performance at the technology level is probably the most far-reaching approach to optimization. At this level you can change the programming language, the compiler, the whole technology stack: whatever it takes to reach the desired result. In our case we bet on parallel execution and won.

SIMD (Single Instruction Multiple Data)

Using MMX and SSE can boost performance significantly, and it is especially good at operations on vectors (which is exactly our case). Such extensions exist in some programming languages, and recently Intel announced a JavaScript language extension for SIMD. We were really excited when that happened. We checked the Firefox nightly builds and found an implementation of this technology. It’s a pity, but after a number of tests we concluded that it’s too early for SIMD: it is too slow in today’s implementation.

WebCL

We’re dreaming of using this technology in Pics.io. A few months ago the Khronos Group published the first public spec describing how WebCL should work. It sounds great, but today it’s only available in the form of tests, barely working add-ons and some custom browser builds. So it’s not an option for today, but we are impatiently waiting for the first implementations.

WebWorkers

After a while, the debayering task was optimized using WebWorkers. It’s a pretty stable browser feature that works fairly well. We split our picture into a number of chunks and process each chunk in a separate thread (a WebWorker). At first we used our own wrapper around WebWorkers, but after a while we switched to the Parallel.js library. Note that it takes time to initialize each WebWorker, and some types of data can’t be passed into a WebWorker and pulled back out. But the biggest problem is that WebWorkers are not easy to debug: you can debug them in Google Chrome only. Another tricky question is figuring out the right number of WebWorkers. We have found a correlation with the number of CPU cores, but browsers do not provide this information to the JS environment.

So, WebWorkers are the only one of these technologies that actually works in raw.pics.io right now.
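
To give an idea of the chunking approach, here is a minimal sketch (the file names, NUM_WORKERS, rawBuffer and the demosaicing step are assumptions, not our production code):

// main.js: split the sensor data into chunks, one WebWorker per chunk
var NUM_WORKERS = 4;                        // guessed; ideally the number of CPU cores
var pixels = new Uint16Array(rawBuffer);    // rawBuffer is assumed to hold the RAW data
var chunkSize = Math.ceil(pixels.length / NUM_WORKERS);
var done = 0;

for (var i = 0; i < NUM_WORKERS; i++) {
    var worker = new Worker('debayer-worker.js');
    var chunk = pixels.slice(i * chunkSize, (i + 1) * chunkSize); // copy of this chunk
    worker.onmessage = function (e) {
        // e.data is the processed chunk; copy it back into the output image here
        if (++done === NUM_WORKERS) { console.log('all chunks done'); }
    };
    worker.postMessage(chunk.buffer, [chunk.buffer]); // transfer, don't copy
}

// debayer-worker.js: process one chunk and send it back
self.onmessage = function (e) {
    var chunk = new Uint16Array(e.data);
    // ...debayer the chunk here...
    self.postMessage(chunk.buffer, [chunk.buffer]);
};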

Algorithms optimization

It is essential to have the right architecture and fast algorithms to speed up your software. We’re working with arrays of 20,000,000+ elements; when we speed up the processing of each element even a tiny bit, the whole process gets a significant boost. So eliminating unnecessary operations is a good idea. That’s why we did deep research on interpolation algorithms, re-coded them a number of times, replaced some math operators with bitwise shifts and eliminated a number of “if”s. Those milliseconds, in bulk, gave us an essential speed-up. We also removed the parts of the picture that aren’t currently visible from the processing queue.
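
For a flavor of those micro-changes, here is a hedged before/after sketch (the numbers and array names are illustrative, not our actual interpolation math):

// Before: floating-point division inside the per-pixel loop
for (var i = 0; i < pixels.length; i++) {
    out[i] = Math.floor(pixels[i] / 4);
}

// After: for non-negative integers a bitwise shift does the same job cheaper
for (var i = 0; i < pixels.length; i++) {
    out[i] = pixels[i] >> 2;
}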

Optimization of structures and data types

Typed arrays

Modern browsers have typed arrays. If you use them instead of classic arrays, your code will work much faster. In cases like ours (we’re working with binary data) the ability to use typed arrays is like sunshine on a rainy day. We wouldn’t have such speed without them.
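
A minimal illustration (the sizes here are made up):

// A classic array: elements can hold anything, so the engine can't specialize
var plain = new Array(20000000);

// A typed array: one contiguous block of 16-bit unsigned integers,
// a natural fit for RAW sensor values
var pixels = new Uint16Array(20000000);
pixels[0] = 1023;

// Typed arrays are views over an ArrayBuffer, so the same bytes can be
// reinterpreted without copying
var bytes = new Uint8Array(pixels.buffer);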

Simple structures

The very first version of our decoder was built on top of a beautiful class hierarchy with a number of modules. That was really good from the OOP point of view, but it was also responsible for poor performance during initialization and object access. After some analysis we flattened the hierarchy and kept just two modules. Such a denormalization reduced the number of modules and the links between them. The flip side is that the code became a bit more complex.
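
Very roughly, the change looked like this (an illustrative sketch, not our real classes):

// Before: a nice hierarchy, but every access walks through wrapper objects
function Pixel(value) { this.value = value; }
function Row(pixels)  { this.pixels = pixels; }
function Image(rows)  { this.rows = rows; }
Image.prototype.pixelAt = function (x, y) {
    return this.rows[y].pixels[x].value;
};

// After: one flat structure over a typed array, direct indexing, no wrappers
function FlatImage(width, height) {
    this.width = width;
    this.data = new Uint16Array(width * height);
}
FlatImage.prototype.pixelAt = function (x, y) {
    return this.data[y * this.width + x];
};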

Language level optimization

Nicholas Zakas has a number of outstanding posts on JavaScript performance. I won’t repeat all the details here, so I’ll just mention the main idea. Slow code = cost (time) of one operation x number of operations. If we can’t reduce the number of operations, then we have to reduce the time of each operation. On each step we performed a function call, sometimes even two. As you probably know, a function call is pretty expensive, so there’s a reason to avoid them in loops. There’s no inline mechanism in JavaScript (like in C++) that tells the compiler to put the function body right in place of the call, so we had to denormalize our code and eliminate those calls. The code became less readable but faster. This trick gave us a noticeable performance boost in large loops.
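
A sketch of the manual inlining we mean (clamp, data, out, size and shift are invented names):

// Before: a function call for every one of 20 000 000+ elements
function clamp(v) { return v > 255 ? 255 : v; }
for (var i = 0; i < size; i++) {
    out[i] = clamp(data[i] >> shift);
}

// After: the body is inlined by hand; less readable, but no call overhead
for (var i = 0; i < size; i++) {
    var v = data[i] >> shift;
    out[i] = v > 255 ? 255 : v;
}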

It’s also worth remembering what is invariant, i.e. what doesn’t change inside your loops. Move your invariants out of the loop scope. Check this:

// Slow loop: items.length is evaluated on every iteration
for (var i = 0; i < items.length; i++) {
    // code is here...
}

// This one is faster: the invariant length is cached outside the loop
var size = items.length;
for (var i = 0; i < size; i++) {
    // code is here...
}

There are tons of optimizations like the one above. The main idea is to reduce operation cost.

One more example is lookup tables (LUTs). We cache some values to avoid recalculating them on every iteration. Some values (pixel brightness in our case) repeat, so there’s no reason to compute them again and again. But you can’t use a LUT blindly: sometimes pre-calculating the LUT takes more resources than it saves.
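
A minimal LUT sketch, assuming an 8-bit brightness range (the gamma formula here just stands in for any expensive per-value calculation):

// Build the table once: 256 possible inputs -> precomputed outputs
var lut = new Uint8Array(256);
for (var v = 0; v < 256; v++) {
    lut[v] = Math.round(255 * Math.pow(v / 255, 1 / 2.2));
}

// In the hot loop a cheap lookup replaces the expensive Math.pow call
for (var i = 0; i < pixels.length; i++) {
    pixels[i] = lut[pixels[i]];
}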

Platform based optimizations

JavaScript engines are quite different, so the same code won’t be equally effective in Firefox and Chrome. We didn’t perform this kind of optimization, and right now our code is the same for all browsers. I’m pretty sure there’s room for additional optimization here. If you have thoughts or experience on that, please share them in the comments.

Optimize perception

The trick that works best is to show a result as fast as possible. This kind of optimization can buy you a number of ‘free’ seconds. While the user is waiting in front of a progress indicator, we can show some intermediate result: a small map or a low-quality pre-rendered image. The main idea is to provide information early, even if its quality isn’t great yet. For example, Pinterest shows rectangles filled with the dominant color of each image while it loads, which changes the user’s perception of the service’s performance. In our case we can almost immediately show the embedded JPEG and then replace it with the actual decoding result. You will see it in our next release.
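
A sketch of the idea (the element id, embeddedJpegBytes, rawBuffer and decodeRawAsync are assumptions, not our actual code):

// 1. Show the embedded JPEG preview straight away
var img = document.getElementById('preview');
img.src = URL.createObjectURL(new Blob([embeddedJpegBytes], { type: 'image/jpeg' }));

// 2. Start the real decoding and swap the picture in when it is ready
decodeRawAsync(rawBuffer, function (canvas) {
    img.replaceWith(canvas);   // the user has been looking at the preview meanwhile
});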

When it’s time to stop

If each next optimization step brings only a tiny boost, you’d better stop. It’s time to stop when you put in too much effort (say, rewriting a huge piece of your code) in exchange for a really small performance boost (less than 5%). Either the code is already optimized enough or you’re on the wrong track. On the other hand, 5% might be essential if your operations are really loo-o-ong.

Afterword

Useful tools

In most cases you have to decide what to bet on: execution speed or memory consumption. Even though memory is fairly cheap, there are cases when there just isn’t enough RAM. You can check RAM consumption with a profiler, but sometimes even a profiler won’t show you what’s going on. Google Chrome ships excellent developer tools, and you can start Chrome with a few tricky flags: one gives you access to the memory object and the gc() method (those are hidden in normal runs), another routes all errors directly to the terminal window. At the URL chrome://about you can find a whole bunch of embedded utilities that will make you a happy developer.
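
For instance, in Chrome you can peek at heap usage right from the console (performance.memory is non-standard and Chrome-only; gc() shows up only when the browser was started with the corresponding flag):

// Non-standard, Chrome-only heap statistics
if (performance.memory) {
    console.log('used heap:', performance.memory.usedJSHeapSize,
                'limit:', performance.memory.jsHeapSizeLimit);
}

// gc() is hidden in normal runs; call it only if the flag exposed it
if (typeof window.gc === 'function') {
    window.gc();   // force a collection before taking the next measurement
}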

How to check your results

There are a number of ways to optimize code, and you often have no idea which one is better. Keep things simple and start with synthetic tests instead of rewriting all your code at once. Our favorite is http://jsperf.com/. Such benchmarks let us understand which optimization approach is better. Sometimes you can Google and find tons of ready-to-use synthetic tests that you can tune up for your own case.
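
When jsPerf is not at hand, even a crude local benchmark shows which variant wins (a sketch; variantA and variantB stand for the two approaches being compared):

function bench(name, fn, iterations) {
    var t0 = performance.now();
    for (var i = 0; i < iterations; i++) { fn(); }
    var t1 = performance.now();
    console.log(name + ': ' + ((t1 - t0) / iterations).toFixed(4) + ' ms per run');
}

bench('variant A', variantA, 1000);
bench('variant B', variantB, 1000);
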
To conclude, the question “to optimize or not?” is a tricky one. The answer depends on a number of things that are hard to predict. Using everything above, we made our RAW conversion process around 3x faster. Frankly, at the beginning that seemed totally impossible. I hope this post helps you reach optimal performance in your JavaScript code.

[Image: Raw.pics.io screen]

 

  • Nikola

    “Another tricky question is figuring out the right number of WebWorkers. We have found a correlation with the number of CPU cores, but browsers do not provide this information to the JS environment.” – which is not entirely correct:

    https://github.com/oftn/core-estimator

    Natively supported in Chrome, and estimated in other browsers, which, according to my own limited testing, proved to be quite accurate.

    • Vlad Tsepelev

      Thanks! We already know about this library and will probably consider using it at app startup!