
In-browser RAW Processing: How We Did It


Well, we finally did it. The “cracking” of the proprietary RAW formats from Canon and Nikon cameras for our service Pics.io is finished.

For those who do not know: the idea of Pics.io is to give people the opportunity to work with RAW images directly in the browser, without installing any software, plug-ins or extensions. A true zero-footprint approach.

When we started working on this about a year ago, we had a vision that in a few years the entire workflow of photography enthusiasts would move to the cloud. We knew that the trend toward mobility would get stronger, the need for collaboration keener, and cloud storage cheaper. For us it was obvious that the Cloud/Web lacked just one piece of the puzzle.

See, you can do everything with an image on the Web except adequate editing. There are plenty of online editors (mostly based on Flash), but they can’t satisfy photographers because of certain limitations: they support only 8-bit images (JPG, PNG) and can’t edit large images. We decided to make a RAW-capable editor.

Back then we had a bunch of prototypes working with DNG (an open and documented RAW format from Adobe) that proved this can actually be done inside modern browsers using JavaScript and WebGL. Unfortunately, we can’t force everyone to convert their RAW images to DNG. Even Adobe failed at it. It was clear that we must support native RAW formats, and a few months ago we started working on the most widespread ones, from Canon and Nikon.

DNG and proprietary RAW formats processing

DNG has many advantages over CR2 and NEF. It is open and documented; XMP can be embedded inside it; and data and metadata are stored more efficiently inside the DNG container. We have already written about the differences and peculiarities of these formats, and you can find quite a lot of information on the web. Here we call attention to the technical aspects hidden from the average user.

Most RAW formats (CR2, NEF and DNG among them) are based on TIFF, a tagged format. And since TIFF allows its structure to be extended with private tags, Canon and Nikon actively take advantage of this by writing a bunch of information they need into their own tags in their own format. The reasons why camera manufacturers do so remain a mystery to me, and if anyone has an idea about the reasoning behind this, please post it in the comments.

Processing (or development) of any RAW consists of two major steps: decompression of the JPG-compressed data, which yields the “raw” image captured by the camera sensor, and demosaicing (or debayering), which reconstructs the color information (each sensor pixel captures only brightness behind a single color filter, not full color).
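To make the "tagged format" idea concrete, here is a minimal sketch of walking one TIFF image file directory (IFD). It follows the baseline TIFF layout (byte-order mark, magic number 42, 12-byte entries); the function name `readIfd` and the returned field names are made up for illustration, and this is not the Pics.io parser.

```javascript
// Read one TIFF IFD from an ArrayBuffer (illustrative sketch).
function readIfd(buffer, ifdOffset = null) {
  const view = new DataView(buffer);
  // Bytes 0-1: "II" (0x4949) = little-endian, "MM" (0x4D4D) = big-endian.
  const order = view.getUint16(0, false);
  if (order !== 0x4949 && order !== 0x4d4d) throw new Error("Not a TIFF file");
  const little = order === 0x4949;
  // Bytes 2-3: the TIFF magic number 42.
  if (view.getUint16(2, little) !== 42) throw new Error("Bad TIFF magic");
  // Bytes 4-7: offset of the first IFD, unless the caller supplies one.
  const start = ifdOffset ?? view.getUint32(4, little);
  const count = view.getUint16(start, little);
  const entries = [];
  for (let i = 0; i < count; i++) {
    const p = start + 2 + i * 12; // each entry is 12 bytes
    entries.push({
      tag: view.getUint16(p, little),      // e.g. 0x0100 = ImageWidth
      type: view.getUint16(p + 2, little), // 3 = SHORT, 4 = LONG, ...
      count: view.getUint32(p + 4, little),
      // Raw 4-byte field: either the value itself or an offset to it,
      // depending on type and count. Interpreting it is left to the caller.
      rawValue: view.getUint32(p + 8, little),
    });
  }
  // After the entries: offset of the next IFD (0 means none).
  const next = view.getUint32(start + 2 + count * 12, little);
  return { entries, next };
}
```

Private tags show up in this list like any other entry; a reader that does not know a tag number can only skip it, which is why undocumented tags are a problem for third-party software.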

This is how a camera sensor “sees” the world:

[Image: Bayer array]
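In code, the mosaic boils down to a rule for which color each sensor pixel records. A minimal sketch, assuming the common RGGB layout (actual cameras may order the filter differently, and the function name `bayerColorAt` is made up for illustration):

```javascript
// Which color filter sits over the sensor pixel at (x, y)?
// The pattern string lists the 2x2 repeating block row by row.
function bayerColorAt(x, y, pattern = "RGGB") {
  return pattern[(y % 2) * 2 + (x % 2)];
}

// First two rows of a 4-pixel-wide sensor:
const rows = [0, 1].map((y) =>
  [0, 1, 2, 3].map((x) => bayerColorAt(x, y)).join("")
);
console.log(rows.join("\n")); // prints "RGRG" then "GBGB"
```

Every pixel holds one sample only; the other two color values have to be estimated from neighbours, which is the job of demosaicing.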

JPG decompression

The first thing you need to do is parse the metadata required by the decompression algorithm. The file stores size information, offsets, information about the method of data storage, etc. With DNG everything is simple: all you need is clearly written in the specification and it’s always in one place (not scattered throughout the entire file). JPG decompression becomes a pleasing experience.

The CR2 format is a bit more complicated, since the variables are scattered across different groups of tags and the decompression algorithm varies slightly from camera to camera. Nikon always uses the same algorithm in its format; the Huffman trees used for decompression are the only thing that varies. These trees (unlike in the case of Canon) can be extracted from the metadata and don’t require rebuilding every time. The metadata is stored deep in the Makernote section, which has its own format.

Actually, one of the main problems of the proprietary formats is that within CR2 and NEF the data is stored in one piece (actually Canon keeps several pieces, which then must be glued into one > _ <). DNG stores many small pieces (tiles), so the processing tasks can be easily parallelized. Compared to the original RAW files, decompression of DNG is 3-4 times faster in our Raw.pics.io.

Some cameras that support DNG can write uncompressed data. The file size is bigger, but you can skip the decompression step.
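For readers who have not met Huffman tables before, here is a rough sketch of how a decoder can be built from a JPEG-style table description (a count of codes per bit length plus a list of symbols). This is the generic canonical-code construction, not the vendor-specific details of CR2 or NEF, and the function names are hypothetical. A string-keyed Map is used for readability, not speed.

```javascript
// Build a lookup table from a JPEG-style Huffman description.
// counts[i] = number of codes that are (i + 1) bits long.
// symbols   = decoded values, listed in order of increasing code.
function buildHuffmanTable(counts, symbols) {
  const table = new Map(); // key "length:code" -> symbol
  let code = 0;
  let k = 0;
  for (let len = 1; len <= counts.length; len++) {
    for (let i = 0; i < counts[len - 1]; i++) {
      table.set(`${len}:${code}`, symbols[k++]);
      code++;
    }
    code <<= 1; // codes of the next length start one bit longer
  }
  return table;
}

// Decode `howMany` symbols from a byte array, most significant bit first.
function decodeSymbols(table, bytes, howMany) {
  const out = [];
  let bitPos = 0;
  for (let n = 0; n < howMany; n++) {
    let code = 0;
    let len = 0;
    for (;;) {
      if (bitPos >= bytes.length * 8) throw new Error("Ran out of bits");
      const bit = (bytes[bitPos >> 3] >> (7 - (bitPos & 7))) & 1;
      bitPos++;
      code = (code << 1) | bit;
      len++;
      const key = `${len}:${code}`;
      if (table.has(key)) {
        out.push(table.get(key));
        break;
      }
      if (len >= 16) throw new Error("Invalid Huffman code");
    }
  }
  return out;
}

// Two 2-bit codes (00, 01) and one 3-bit code (100).
const demoTable = buildHuffmanTable([0, 2, 1], [5, 6, 7]);
const demoSymbols = decodeSymbols(demoTable, Uint8Array.of(0b10000010), 3);
console.log(demoSymbols); // [7, 5, 6]
```

In lossless JPEG-style streams each decoded symbol typically says how many extra bits follow for the pixel difference; that part is left out here to keep the sketch short.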
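The reason tiles parallelize so easily is that each one can be decoded without looking at the others. A minimal sketch of the idea: `tileRects` splits an image into tile rectangles, and `decodeAllTiles` hands each rectangle to a `decodeTile` callback. Both names and the callback are assumptions for illustration; in a browser the callback would typically post the tile to a Web Worker and resolve when the worker replies.

```javascript
// Split a width x height image into tiles; edge tiles may be smaller.
function tileRects(width, height, tileWidth, tileHeight) {
  const rects = [];
  for (let y = 0; y < height; y += tileHeight) {
    for (let x = 0; x < width; x += tileWidth) {
      rects.push({
        x,
        y,
        w: Math.min(tileWidth, width - x),
        h: Math.min(tileHeight, height - y),
      });
    }
  }
  return rects;
}

// Decode every tile concurrently. decodeTile(rect) is a hypothetical
// function returning a Promise for that tile's pixels.
function decodeAllTiles(rects, decodeTile) {
  return Promise.all(rects.map((rect) => decodeTile(rect)));
}
```

A single-strip CR2 or NEF offers no such natural split points, because a Huffman-coded stream generally has to be read from the start to know where any given pixel begins.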

Demosaicing of “raw” data

The second big step is demosaicing. The metadata required for this step is recorded by camera manufacturers in custom TIFF tags, and the metadata structure changes with the release of new models. When manufacturers add new features to a camera they also add new tags to their closed formats. That’s what complicates the support of these features by third-party software. When it comes to restoring the correct white balance or gamma correction, we have to take into account the manufacturer and the camera model.

Of course we have an optimization (metadata caching), as we already know the features of cameras and their “hardware”, but for parameters that depend on the shooting conditions it is necessary to maintain the entire zoo of formats.

Generally, the process of demosaicing is quite resource-intensive. We need to perform several operations on each pixel (or its surrounding pixels), and 20-megapixel images don’t process quickly. = ( Here we use Web Workers and parallelize everything we can. But still, we need and want it faster, so now we are looking at SIMD, WebCL and other fresh’n’hot browser technologies that will help speed up the process.
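To show why those per-camera numbers matter, here is a simplified sketch of white balance followed by gamma correction for one pixel. The black level, white level and channel gains are exactly the kind of values that come from the manufacturer's tags; the ones used below are invented. A plain power-law gamma stands in for a real tone curve, and color-matrix conversion is skipped entirely.

```javascript
// Turn one linear sensor RGB triple into 8-bit display values.
// wbGains, blackLevel and whiteLevel would come from camera metadata;
// the function name and parameter names are illustrative.
function developPixel(rgb, wbGains, blackLevel, whiteLevel, gamma = 2.2) {
  return rgb.map((value, c) => {
    // Normalize the sensor reading to the 0..1 range.
    const linear = (value - blackLevel) / (whiteLevel - blackLevel);
    // Apply the per-channel white balance gain, then clamp.
    const balanced = Math.min(1, Math.max(0, linear * wbGains[c]));
    // Gamma-encode and scale to 8 bits.
    return Math.round(255 * Math.pow(balanced, 1 / gamma));
  });
}

// A neutral grey patch as a 12-bit sensor might record it under warm light,
// with made-up gains that pull the channels back together.
console.log(developPixel([1400, 1000, 700], [1.0, 1.4, 2.0], 0, 4095));
```

With the wrong gains or levels the same pixel comes out tinted or washed out, which is why a decoder needs the metadata for each specific camera model.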
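To give a feel for the per-pixel work, here is a sketch of the simplest demosaicing method, bilinear interpolation, for an RGGB mosaic. It is not the algorithm used in Pics.io (better methods are edge-aware); the function name is made up, and the image is assumed to be at least 2x2 pixels.

```javascript
// Bilinear demosaic of an RGGB Bayer mosaic.
// raw: one sample per pixel, row-major. Returns interleaved RGB floats.
function demosaicBilinear(raw, width, height) {
  const CHANNEL = [0, 1, 1, 2]; // 2x2 block R, G, G, B -> channel index
  const channelAt = (x, y) => CHANNEL[(y % 2) * 2 + (x % 2)];
  const rgb = new Float32Array(width * height * 3);
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Sum the samples of each channel in the 3x3 neighbourhood.
      const sum = [0, 0, 0];
      const n = [0, 0, 0];
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const nx = x + dx;
          const ny = y + dy;
          if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
          const c = channelAt(nx, ny);
          sum[c] += raw[ny * width + nx];
          n[c]++;
        }
      }
      const own = channelAt(x, y);
      const o = (y * width + x) * 3;
      for (let c = 0; c < 3; c++) {
        // Keep the measured sample; average neighbours for the other two.
        rgb[o + c] = c === own ? raw[y * width + x] : sum[c] / n[c];
      }
    }
  }
  return rgb;
}
```

Even this naive version touches nine neighbours for every pixel, so a 20-megapixel frame means on the order of 180 million sample reads, which explains the appetite for workers and SIMD.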

Afterword

During development we found out many interesting things about the inner structure of RAW files. If anyone is interested in the topic, feel free to ask.

You can try converting your CR2 and NEF files on the page of our RAW converter. It doesn’t work fast, you’ll need to wait 15-20 seconds, but the last barrier on photographers’ way to the cloud has been removed. And with the recent Google Drive price cut of almost 5 times… You can imagine. Soon you’ll see “Lightroom in a browser”. We’re working on it.


UPD: Two weeks after this post we boosted the performance of Raw.pics.io 3 times. Here’s how we did it.

  • http://programmer-art.org/ Daniel G. Taylor

    Very cool. I look forward to more serious editing tools in the cloud.

  • http://en.dutras.org/ Leandro G Faria Corcete DUTRA

    What about Olympus and Panasonic raw files?

    • Vlad Tsepelev

      Sure, we will support these cameras, but the first priority is to deal with performance.

  • Nick Whiu

    love your work!

    • Vlad Tsepelev

      Thanx!

  • GetBulb

    Pentax DSLRs let you use DNG as the native raw format instead of Pentax’s own; do Canon & Nikon not allow you to do the same?

    • http://toptechphoto.com/ Konstantin Shtondenko

      Unfortunately, they don’t. And most likely won’t do it.

  • Greg Popovitch

    Konstantin, that’s pretty impressive, but I’m a bit surprised that you are writing this in JavaScript, it’s such a crappy language. If you’re serious about doing a web Lightroom, I think it would make more sense to write it in C++ and convert to asm.js with Emscripten.

    • Vlad Tsepelev

      JavaScript is very powerful and much better than 10 years ago.
      Actually, converting C++ code is not a very good solution in this case: as a result you get a black box with no ability to manage or optimise the decoding process.
      Algorithms written in pure JavaScript work faster and are more controllable than any library we tried to port to asm.js. There are a lot of great libraries written in C or C++ and optimised for best performance. But that translated code will not use the features or take into account the restrictions of JS and the browser: this code was written to work mainly on desktops. That’s why our code works faster than any of the ported libraries: we know where JS is fast/slow and how to deal with it.

  • SLR

    I’m a photog and not a programmer but my big question is why I would want to do image work in the cloud. Zenfolio already lets me store raw in the cloud with my site. From the little I know, manipulating large files is processor- and GPU-intensive. Why would I want to add the delay of a net connection to my rendering time? I have many terabytes of photos which would need to travel up and down for printing and the like, and I have a feeling I will end up paying for that bandwidth.

    As to your question around RAW formats my guess is that it has to do with how updated hardware deals with RAW in each generation. Unlike batteries and chargers, Canon et al don’t profit from format tweaks. If anything it hurts sales until Adobe updates Camera Raw, which isn’t every day. You guys might want to talk to someone like Ming Thein or Jeffrey Friedl. They are much more software/tech savvy than most of us. Jeffrey writes widely used Lightroom plug-ins and Ming is just a brainiac.

    • http://toptechphoto.com/ Konstantin Shtondenko

      >I would want to do image work in the cloud
      Because you won’t need to carry an external hard drive with you and will be able to set up a teamwork environment.

      >Zenfolio already lets me store raw in the cloud with my site.
      Yeah, but they’re disconnected from your catalog, right? Zenfolio is just a backup in that case.

      >Why would I want to add the delay of a net connection to my rendering time?
      That’s not the case. We are independent of the bandwidth if files are located locally. Also, you can use Pics.io with a NAS.

      >You guys might want to talk to someone like Ming Thein or Jeffrey Friedl.
      Thanks for the intro.

      Wish you the best!

      • SLR

        Thanks for your thoughtful reply. I always think of photography as a solitary pursuit but in editorial being able to collaborate on retouching could be very powerful.

  • jonaswagner

    This is really cool. Do you plan to release the base of this as open source? I planned to do some experiments with RAW files and JavaScript and it would be a shame to reimplement it. Did you try running LibRaw through Emscripten? That would solve supporting all the weird formats I guess. In any case this is really cool work you are doing and I’m looking forward to seeing more of it! :)

  • Thomas L.

    Any plan to release this as an open source JavaScript library? I’d just be interested in rendering raw photos in a browser.

    • Vlad Tsepelev

      Actually, we thought about it, but for the moment the answer is NO.

  • Daniel Blok

    Another +1 for open sourcing this. I’d love to build a node-webkit app using this. I know this is built with the cloud in mind, but with terabytes of photos (and some other concerns) I’d rather keep things local.