Well, we finally did it: we have finished “cracking” the proprietary RAW formats of Canon and Nikon cameras for our service, Pics.io.
For those who don’t know: the idea behind Pics.io is to let people work with RAW images directly in the browser, without installing any software, plug-ins or extensions. A true zero-footprint approach.
When we started working on this about a year ago, we had a vision that within a few years the entire workflow of photography enthusiasts would move to the cloud. We knew that the mobility trend would grow stronger, the need for collaboration greater, and cloud storage cheaper. To us it was obvious that the Cloud/Web lacked just one piece of the puzzle.
See, you can do everything with an image on the Web except edit it properly. There are plenty of online editors (mostly Flash-based), but they can’t satisfy photographers because of certain limitations: they support only 8-bit images (JPG, PNG) and can’t handle large images. So we decided to build a RAW-capable editor.
Back then we had a bunch of prototypes working with DNG (the open, documented RAW format from Adobe) that proved this could actually be done inside modern browsers using JavaScript and WebGL. Unfortunately, we can’t force everyone to convert their RAW images to DNG; even Adobe failed at that. It was clear that we had to support native RAW formats, so a few months ago we started working on the most widespread ones, from Canon and Nikon.
Processing DNG and proprietary RAW formats
DNG has many advantages over CR2 and NEF: it is open and documented, it can embed XMP metadata, and it stores data and metadata inside its container more efficiently. We have already written about the differences and peculiarities of these formats, and you can find plenty of information on the web. Here we focus on the technical aspects hidden from the average user.
Most RAW formats (CR2, NEF and DNG among them) are based on TIFF, a tagged format. Since TIFF lets you extend its structure with private tags, Canon and Nikon take full advantage of this, writing much of the information they need into their own tags, in their own formats. Why camera manufacturers do this remains a mystery to me; if anyone has an idea about the reasoning behind it, please post it in the comments.
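Since all three formats share the TIFF container, a natural first step is a generic IFD walker. The following is a minimal sketch, not our production code: it reads the byte-order mark, follows the chain of top-level IFDs and returns each tag's type, count and raw value bytes, whether the value is stored inline or at an offset. The names `parse_tiff`, `read_ifd` and `tag_values` are hypothetical.

```python
import struct

# Size in bytes of one value of each TIFF field type (TIFF 6.0 types 1-12).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}

Entry = tuple[int, int, bytes]  # (field type, value count, raw value bytes)


def read_ifd(data: bytes, offset: int, endian: str) -> tuple[dict[int, Entry], int]:
    """Read one IFD; return its entries by tag number and the next IFD's offset."""
    (count,) = struct.unpack_from(endian + "H", data, offset)
    entries: dict[int, Entry] = {}
    for i in range(count):
        pos = offset + 2 + 12 * i  # every IFD entry is exactly 12 bytes
        tag, typ, n = struct.unpack_from(endian + "HHI", data, pos)
        size = TYPE_SIZES.get(typ, 1) * n
        if size <= 4:
            raw = data[pos + 8 : pos + 8 + size]  # small values are stored inline
        else:
            (value_offset,) = struct.unpack_from(endian + "I", data, pos + 8)
            raw = data[value_offset : value_offset + size]  # larger ones live elsewhere
        entries[tag] = (typ, n, raw)
    (next_ifd,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return entries, next_ifd


def parse_tiff(data: bytes) -> tuple[str, list[dict[int, Entry]]]:
    """Return the struct byte-order prefix and the chain of top-level IFDs."""
    if data[:2] == b"II":
        endian = "<"  # little-endian ("Intel")
    elif data[:2] == b"MM":
        endian = ">"  # big-endian ("Motorola")
    else:
        raise ValueError("not a TIFF-based file")
    (offset,) = struct.unpack_from(endian + "I", data, 4)
    ifds: list[dict[int, Entry]] = []
    seen: set[int] = set()
    while offset and offset not in seen:  # guard against looping IFD chains
        seen.add(offset)
        entries, offset = read_ifd(data, offset, endian)
        ifds.append(entries)
    return endian, ifds


def tag_values(entry: Entry, endian: str) -> list[int]:
    """Decode the SHORT (3) or LONG (4) values of an entry into a list of ints."""
    typ, n, raw = entry
    fmt = {3: "H", 4: "I"}[typ]
    return list(struct.unpack(f"{endian}{n}{fmt}", raw))
```

This deliberately ignores SubIFDs, the EXIF IFD and the MakerNote. Those are exactly where Canon and Nikon keep much of their private data, and each of them needs its own parser on top of this one.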
Processing (or developing) any RAW file consists of two major steps: decompressing the image data (in CR2 and DNG, lossless JPEG), which yields the “raw” image captured by the camera sensor, and demosaicing (or debayering), which reconstructs the color information. The camera sensor captures brightness, not color: each photosite sits behind a single red, green or blue filter.
This is roughly how a camera sensor “sees” the world: as a mosaic in which every pixel records only one of red, green or blue.
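To make the idea concrete, here is a small simulation of what the sensor does. It takes a full-color image and keeps only the one channel that a Bayer color filter array would let through at each position. The RGGB layout is an assumption, since the actual pattern depends on the camera (DNG records it in the CFAPattern tag). `bayer_mosaic` is a hypothetical helper.

```python
def bayer_mosaic(rgb: list[list[tuple[int, int, int]]]) -> list[list[int]]:
    """Sample an RGB image through an RGGB color filter array.

    Each output pixel keeps a single channel, just as each photosite on the
    sensor records light through only one red, green or blue filter.
    """
    mosaic = []
    for y, row in enumerate(rgb):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            if y % 2 == 0:
                out_row.append(r if x % 2 == 0 else g)  # even rows: R G R G ...
            else:
                out_row.append(g if x % 2 == 0 else b)  # odd rows:  G B G B ...
        mosaic.append(out_row)
    return mosaic
```

Half of the photosites are green, and two thirds of the color information at every pixel has to be reconstructed later by demosaicing.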
JPG decompression
The first thing you need to do is parse the metadata required by the decompression algorithm: the file stores the image dimensions, data offsets, the storage method and so on. With DNG everything is simple: everything you need is clearly described in the specification and is always in one place (rather than scattered throughout the file). JPG decompression becomes a pleasant experience.
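For a tiled DNG, the decompression parameters boil down to a handful of standard TIFF tags. The sketch assumes that the tags of the raw image's IFD have already been parsed into a dict mapping tag number to a list of integers (a hypothetical input format). `TileLayout` and `tile_layout` are illustrative names, and strip-based files (which use StripOffsets/StripByteCounts instead of tiles) are not handled.

```python
from dataclasses import dataclass

# Standard TIFF/DNG tag numbers needed to locate tiled image data.
IMAGE_WIDTH, IMAGE_LENGTH, BITS_PER_SAMPLE, COMPRESSION = 256, 257, 258, 259
TILE_WIDTH, TILE_LENGTH, TILE_OFFSETS, TILE_BYTE_COUNTS = 322, 323, 324, 325


@dataclass
class TileLayout:
    width: int
    height: int
    bits_per_sample: int
    compression: int  # 1 = uncompressed, 7 = JPEG (lossless JPEG for DNG raw data)
    tile_width: int
    tile_height: int
    tiles: list[tuple[int, int]]  # (byte offset, byte count) for each tile

    @property
    def tiles_across(self) -> int:
        return -(-self.width // self.tile_width)  # ceiling division

    @property
    def tiles_down(self) -> int:
        return -(-self.height // self.tile_height)


def tile_layout(tags: dict[int, list[int]]) -> TileLayout:
    """Collect everything the decompressor needs from a parsed raw IFD."""
    offsets, counts = tags[TILE_OFFSETS], tags[TILE_BYTE_COUNTS]
    if len(offsets) != len(counts):
        raise ValueError("TileOffsets and TileByteCounts have different lengths")
    layout = TileLayout(
        width=tags[IMAGE_WIDTH][0],
        height=tags[IMAGE_LENGTH][0],
        bits_per_sample=tags[BITS_PER_SAMPLE][0],
        compression=tags[COMPRESSION][0],
        tile_width=tags[TILE_WIDTH][0],
        tile_height=tags[TILE_LENGTH][0],
        tiles=list(zip(offsets, counts)),
    )
    # Tiles cover the image row by row; edge tiles are padded to full size.
    if len(layout.tiles) != layout.tiles_across * layout.tiles_down:
        raise ValueError("tile count does not match the image size")
    return layout
```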
The CR2 format is a bit more complicated, since the parameters are scattered across different groups of tags and the decompression algorithm varies slightly from camera to camera. Nikon always uses the same algorithm in its format; the only thing that varies is the Huffman trees used for decompression. Unlike Canon’s, these trees can be read straight from the metadata and don’t need to be rebuilt every time. This metadata is stored deep inside the MakerNote section, which has its own format.
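Whether the Huffman tables are read from metadata (as with Nikon) or rebuilt for each file (as with Canon), the decoding idea is the same. A canonical Huffman table, described JPEG-style as the number of codes of each length plus the list of symbols, maps variable-length codes to a “category” that says how many extra bits encode the difference from a predicted value. The following is a simplified lossless-JPEG-style decoder for a single component using predictor 1 (left neighbor). It is a sketch under stated assumptions: byte unstuffing (0xFF00) and markers are ignored, CR2's interleaved components and Nikon's own prediction scheme are not modeled, and all names are hypothetical.

```python
HuffTable = dict[tuple[int, int], int]  # (code length, code) -> symbol


class BitReader:
    """Reads bits MSB-first from a byte string (JPEG 0xFF00 unstuffing omitted)."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def bit(self) -> int:
        byte = self.data[self.pos >> 3]
        value = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return value

    def bits(self, n: int) -> int:
        value = 0
        for _ in range(n):
            value = (value << 1) | self.bit()
        return value


def build_huffman(counts: list[int], symbols: list[int]) -> HuffTable:
    """Canonical Huffman table from JPEG-style counts of codes per length 1..16."""
    table: HuffTable = {}
    code, k = 0, 0
    for length, n in enumerate(counts, start=1):
        for _ in range(n):
            table[(length, code)] = symbols[k]
            code += 1
            k += 1
        code <<= 1  # codes of the next length start one bit longer
    return table


def decode_diff(reader: BitReader, table: HuffTable) -> int:
    """Decode one category (SSSS) plus its extra bits into a signed difference."""
    code, length = 0, 0
    while (length, code) not in table:
        code = (code << 1) | reader.bit()
        length += 1
        if length > 16:
            raise ValueError("invalid Huffman code")
    ssss = table[(length, code)]
    if ssss == 0:
        return 0
    if ssss == 16:
        return 32768  # lossless JPEG special case: no extra bits follow
    value = reader.bits(ssss)
    if value < (1 << (ssss - 1)):  # a leading 0 bit means a negative difference
        value -= (1 << ssss) - 1
    return value


def decode_lossless(
    data: bytes, table: HuffTable, width: int, height: int, precision: int
) -> list[list[int]]:
    """Single-component lossless JPEG decoding with predictor 1 (left neighbor)."""
    reader = BitReader(data)
    rows: list[list[int]] = []
    for y in range(height):
        row: list[int] = []
        for x in range(width):
            if x > 0:
                pred = row[x - 1]  # predictor 1: the sample to the left
            elif y > 0:
                pred = rows[y - 1][0]  # start of a line: the sample above
            else:
                pred = 1 << (precision - 1)  # very first sample of the image
            row.append((pred + decode_diff(reader, table)) & 0xFFFF)  # modulo 2^16
        rows.append(row)
    return rows
```

Every sample depends on the one before it in the bit stream, which is why a single compressed stream has to be decoded sequentially.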
In fact, one of the main problems with the proprietary formats is that CR2 and NEF store the image data in one piece (strictly speaking, Canon keeps several pieces, which then have to be glued back into one >_<). DNG stores many small pieces (tiles), so decompression can easily be parallelized. In our Raw.pics.io, decompressing a DNG is 3–4 times faster than decompressing the original RAW file.
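Both situations can be sketched in a few lines. `unslice` reassembles a Canon-style sliced image. It assumes the slice layout that open-source decoders such as dcraw use for CR2: a count of full-width slices, their width and the width of the last slice, with the decoded stream holding each slice top to bottom before the next one begins. `assemble_tiles` decodes independent DNG-style tiles concurrently with a caller-supplied (hypothetical) `decode_tile` function that returns a full, padded tile, then pastes the tiles into place.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def unslice(stream: Sequence[int], height: int, slices: tuple[int, int, int]) -> list[list[int]]:
    """Glue Canon-style vertical slices back into one image.

    slices = (number of full slices, slice width, width of the last slice);
    the decoded stream holds slice 0 row by row, then slice 1, and so on.
    """
    n_full, slice_width, last_width = slices
    widths = [slice_width] * n_full + [last_width]
    image = [[0] * sum(widths) for _ in range(height)]
    pos, x0 = 0, 0
    for w in widths:
        for y in range(height):
            image[y][x0 : x0 + w] = stream[pos : pos + w]
            pos += w
        x0 += w
    return image


def assemble_tiles(
    decode_tile: Callable[[bytes], list[list[int]]],
    tiles: Sequence[bytes],
    width: int,
    height: int,
    tile_w: int,
    tile_h: int,
    workers: int = 4,
) -> list[list[int]]:
    """Decode independent tiles concurrently, then paste them into one image."""
    across = -(-width // tile_w)  # tiles per row (ceiling division)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        decoded = list(pool.map(decode_tile, tiles))  # map preserves tile order
    image = [[0] * width for _ in range(height)]
    for i, tile in enumerate(decoded):
        ty, tx = divmod(i, across)
        x0 = tx * tile_w
        n = min(tile_w, width - x0)  # crop the padding of right-edge tiles
        for dy, row in enumerate(tile):
            y = ty * tile_h + dy
            if y >= height:  # crop the padding of bottom-edge tiles
                break
            image[y][x0 : x0 + n] = row[:n]
    return image
```

Python threads won't speed up pure-Python decoding because of the GIL. `ProcessPoolExecutor`, or Web Workers in the browser, is where real parallelism comes from. What matters here is the structure: tiles don't depend on each other, while the slices of a CR2 come out of one sequential stream.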
Some cameras that support DNG can write uncompressed data: the files are bigger, but you can skip the decompression step entirely.
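With uncompressed data, “decoding” is just reinterpreting bytes. This minimal sketch assumes 16 bits per sample, one sample per pixel and a single contiguous block of data; other bit depths would need bit unpacking. The only subtlety is byte order, which comes from the TIFF header (II or MM). `read_uncompressed16` is a hypothetical name.

```python
import sys
from array import array


def read_uncompressed16(data: bytes, width: int, height: int, big_endian: bool) -> list[list[int]]:
    """Interpret uncompressed 16-bit samples as image rows; no decoding step needed."""
    samples = array("H")
    samples.frombytes(data[: width * height * 2])
    # array uses the machine's native byte order; swap if the file's differs.
    if big_endian != (sys.byteorder == "big"):
        samples.byteswap()
    return [samples[y * width : (y + 1) * width].tolist() for y in range(height)]
```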
Demosaicing of “raw” data
The second big step is demosaicing. The metadata required for this step is recorded by camera manufacturers in custom TIFF tags, and its structure changes with the release of new models: when manufacturers add new features to a camera, they also add new tags to their closed formats. That is what makes supporting these features hard for third-party software. When it comes to restoring the correct white balance or gamma correction, we have to take both the manufacturer and the camera model into account.
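In its simplest form, the model-dependent part looks like this. The sketch assumes an RGGB mosaic plus black level, white level and per-channel white-balance multipliers that have already been extracted from the file. Where exactly those live is the maker- and model-specific part; in DNG, for example, the AsShotNeutral tag describes the white balance. The standard sRGB curve serves as an example of gamma correction. `white_balance` and `srgb_gamma` are hypothetical helpers, and in a real pipeline the gamma curve is applied after demosaicing and color conversion.

```python
def white_balance(
    mosaic: list[list[int]],
    multipliers: tuple[float, float, float],
    black: int,
    white: int,
) -> list[list[float]]:
    """Normalize raw values to 0..1 and scale each photosite by its channel's
    white-balance multiplier, assuming an RGGB layout; results are clipped."""
    r_mul, g_mul, b_mul = multipliers
    out = []
    for y, row in enumerate(mosaic):
        out_row = []
        for x, value in enumerate(row):
            if y % 2 == 0:
                mul = r_mul if x % 2 == 0 else g_mul  # R G R G ...
            else:
                mul = g_mul if x % 2 == 0 else b_mul  # G B G B ...
            norm = (value - black) / (white - black)
            out_row.append(min(max(norm * mul, 0.0), 1.0))
        out.append(out_row)
    return out


def srgb_gamma(linear: float) -> float:
    """Standard sRGB transfer curve: linear light (0..1) to encoded value (0..1)."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055
```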
Of course, we have an optimization (metadata caching), since we already know the features of each camera and its “hardware”; but for parameters that depend on the shooting conditions we still have to support the entire zoo of formats.
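The caching idea can be sketched with Python’s `functools.lru_cache`. Constants that depend only on the camera model are computed once and reused, while shot-dependent values such as white balance are read from every file. The model table and its numbers are invented placeholders, not real camera data, and `model_profile` and `develop_settings` are hypothetical names.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ModelProfile:
    black: int
    white: int
    cfa: str


# Hypothetical per-model data (illustrative values, not real camera specs).
_MODEL_DB = {
    ("ExampleMaker", "Model A"): (512, 16383, "RGGB"),
    ("ExampleMaker", "Model B"): (2048, 15000, "GRBG"),
}


@lru_cache(maxsize=None)
def model_profile(make: str, model: str) -> ModelProfile:
    """Per-model constants: built once per model, then served from the cache."""
    return ModelProfile(*_MODEL_DB[(make, model)])


def develop_settings(make: str, model: str, shot_tags: dict[str, float]) -> dict[str, object]:
    """Combine cached per-model constants with values parsed from this file."""
    profile = model_profile(make, model)
    return {
        "black": profile.black,
        "white": profile.white,
        "cfa": profile.cfa,
        # Shot-dependent: must come from the current file's metadata every time.
        "wb": (shot_tags["wb_red"], 1.0, shot_tags["wb_blue"]),
    }
```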
Demosaicing is generally quite resource-intensive: we need to perform several operations on each pixel (and its neighbors), so 20-megapixel images are not processed quickly. =( Here we use Web Workers and parallelize everything we can. Still, we need and want more speed, so we are now looking at SIMD, WebCL and other fresh’n’hot browser technologies that could speed up the process.
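To give a feel for the per-pixel cost, here is bilinear demosaicing, the simplest common algorithm (not necessarily the one we use). Every missing channel is the average of the neighboring photosites of that color in a 3×3 window. The sketch assumes an RGGB layout and a mosaic already normalized to floats; the names are hypothetical. Even this naive version touches nine samples per pixel, roughly 180 million reads for a 20-megapixel frame.

```python
def cfa_color(x: int, y: int) -> int:
    """Channel index (0=R, 1=G, 2=B) of photosite (x, y) in an RGGB pattern."""
    if y % 2 == 0:
        return 0 if x % 2 == 0 else 1
    return 1 if x % 2 == 0 else 2


def demosaic_bilinear(mosaic: list[list[float]]) -> list[list[tuple[float, float, float]]]:
    """Fill each missing channel with the mean of same-color photosites in the
    surrounding 3x3 window (clipped at the borders); keep measured values as-is."""
    h, w = len(mosaic), len(mosaic[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sums, counts = [0.0, 0.0, 0.0], [0, 0, 0]
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        c = cfa_color(xx, yy)
                        sums[c] += mosaic[yy][xx]
                        counts[c] += 1
            pixel = [sums[c] / counts[c] if counts[c] else 0.0 for c in range(3)]
            pixel[cfa_color(x, y)] = mosaic[y][x]  # the measured channel is exact
            row.append((pixel[0], pixel[1], pixel[2]))
        out.append(row)
    return out
```

Each output pixel depends only on the input mosaic, so bands of rows can be handed to separate workers (Web Workers in the browser) without any coordination. Higher-quality algorithms such as AHD cost considerably more per pixel.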
Afterword
During development we learned a lot of interesting things about the inner structure of RAW files. If you’re interested in the topic, feel free to ask.
You can try converting your CR2 and NEF files on our RAW converter page. It isn’t fast yet (you’ll need to wait 15–20 seconds), but the last barrier on photographers’ way to the cloud has been removed. And with Google Drive’s recent price cut of almost 5 times... you can imagine. Soon you’ll see “Lightroom in a browser”. We’re working on it.
We’re always improving the Pics.io platform, and our users are our main source of insights! If you have something in mind, don’t hesitate to drop us a line in the comments section below or by email. We’ll definitely check it out and do our best to meet your needs.