It’s October 2016, and here at Mystery Box we’ve been working in HDR video for a little over a year.

While it’s easier today to find out information about the new standard than it was when I first started reading the research last year, it’s still not always clear what it is and how it works. So, to kick off our new weekly blog here on mysterybox.us, we’ve decided to publish five posts back-to-back on the subject of HDR video. This is Part 1: What is HDR Video?

HDR video is as much of a revolution and leap forward as the jump from analog standard definition, to digital 4K.

Or, to put it far less clinically, it’s mind-blowingly, awesomesauce, revolutionarily, incredible! If it doesn’t get you excited, I’m not sure why you’re reading this…

So what is it about HDR video that makes it so special, so much better than what we’ve been doing? That’s what we’re going to dive into here.

HDR Video vs. HDR Photography

If you’re a camera guy or a even an image guy, you’re probably familiar with HDR photography. And if you’re thinking “okay, what’s the big deal, we’ve had HDR for years”, think again. HDR video is completely unrelated to HDR photography, except for the ‘higher dynamic range’ part.

In general, any high dynamic range technique seeks to capture or display more levels of brightness within a scene, that is, increase the overall dynamic range. It’s kind of a ‘duh’ statement, but let’s go with it.

In photography, this usually means using multiple exposures at different exposure values (EVs), and blending the results into a single final image. The catch, of course, has always been that regardless of how many stops of light you capture with your camera or HDR technique, you’re still limited by the same 256 levels of brightness offered by 8 bit JPEG compression and computer/television displays, or the slightly bigger, but still limited set of tonality offered by inks for print.

So, most HDR photography relies on creating regions of local contrast throughout the image, blending in the different exposure levels to preserve the details in the darks and the lights:

Photograph with standard contrast vs. the same with local contrast

While the results are often beautiful, they are, at their core, unnatural or surreal.

HDR Video is Completely Different

Instead of trying to compress the natural dynamic range of a scene into a very limited dynamic range for display, HDR video expands the dynamic range of the display itself by increasing the average and peak display brightnesses (measured in nits), and by increasing the overall image bit depth from 8 bit to at least 10 bits / channel, or from 255 brightness levels & 16 million colors, to at least 1024 brightness levels & 1.02 billion colors.

Standard Video / Photography Range vs. HDR Photography vs. HDR Video Ranges

The change of the display light level allows for extended ranges of tonalities through the darks and the lights, so that the final displayed image itself is a more natural rendering of a scene, one that’s able to match the overall dynamic range of today’s digital cinema and film-sourced cameras. And perhaps more importantly, when fully implemented, HDR video will almost completely match the dynamic range of the human eye itself.

How big of a deal is it? I can’t describe it better than my younger brother did the first time I showed him HDR video:

“I want to say that it’s like you’re looking through a window into another world, except that when you look through a window, it’s not as crisp, or as clean, or as clear as this”.

How did we get here?

So if HDR video is so much better than what we’ve been using so far, why haven’t we been using it all along?

And now, for a history lesson (it’s interesting; but it’s not essential to know, so skip down if you don’t care).

Cathode Ray Tubes as scientific apparatus and ‘display’ devices have been around in some form or another since the late 1880s, but the first CRT camera wasn’t invented until the late 1920s. Early cameras were big with low resolutions; televisions were grainy, noisy, and low fidelity.

Things changed quickly in the early years of television. As more companies jumped on board the CRT television bandwagon, each created slightly different, and incompatible, television systems in an effort to avoid patent rights infringement. These different systems, with different signal types, meant that home television sets had to match the cameras used by the broadcaster, i.e., they had to be the made by the same company. As a result, the first broadcaster in an area created a local monopoly for the equipment manufacturer they sourced their first cameras from, and consumers had no choice.

Foreseeing a large problem when more people started buying televisions sets, and more broadcasters wanted to enter an area, the United States government stepped in and said that the diversity of systems wouldn’t fly - all television broadcasts and television sets had to be compatible. To that end they created a new governing body, the National Television System Committee, or NTSC, which went on to define the first national television standard in 1941.

We’ve had to deal with the outcomes of standardization, good and bad, ever since.

The good, obviously, has been that we don’t have to buy a different television for every channel we want to watch, or every part of the country we want to live in (though transnationals are often still out of luck). The bad is that every evolution of the standard since 1941 has required backwards compatibility: today’s digital broadcast standards, and computer display standards too, are still limited in part by what CRTs could do in the 1940s and 50s.

Don’t believe me? Even ignoring the NTSC 1/1.001 frame rate modifier, there’s still a heavy influence: let’s look at the list:

Color Space: The YIQ color space for NTSC and the YUV color space used in both PAL and SECAM are both based on the colors that can be produced by the short glow phosphors, which coat the inside of CRT screens and form the light and color producing element of the CRT. In the transition to digital, YIQ and YUV formed the basis for Rec. 601 color space (SD Digital), which in turn is the basis for Rec. 709 (HD Digital) color space (Rec. 709 uses almost the same primaries as Rec. 601).

And just in case your computer feels left out, the same color primaries are used in the sRGB display standard too, because all of these color spaces were display referenced, and they were all built on the same CRT technology. Because up until the early 2000s, CRTs were THE way of displaying images electronically - LCDs were low contrast, plasma displays were expensive, and neither LEDs nor DLPs had come into their own.
Transfer Function: The transfer function (also called the gamma curve) used in SD and HD is also based on the CRT’s natural light-to-electrical and electrical-to-light response. The CRT camera captured images with a light-to-voltage response curve of approximately gamma 1/2.2, while the CRT display recreated images with a voltage-to-light response curve of approximately gamma 2.4. Together, these values formed the standard approximate system gamma of 1.2, and form the basis for the current reference display gamma standard of 2.4, found in ITU-T Recommendation BT.1886.
Brightness Limits: Lastly, and probably most frustratingly, color accurate CRT displays require limited brightness to maintain their color accuracy. Depending on the actual phosphors used for primaries, that max-brightness value typically lands in the 80-120 nits range. And consumer CRT displays, while bigger, brighter, and less color accurate, still only land in the 200 nit max brightness levels. For comparison, the brightness levels found on different outdoor surfaces during a sunny day land in the 5000-14,000 range (or more!).

This large brightness disparity between reference and consumer display levels has been accentuated in recent years with the replacement of CRTs with LCD, Plasma and OLED displays, which can easily push 300-500 nits peak brightness. Those brightness levels skew the overall look of images graded at reference, while being very intolerant of changes in ambient light conditions. In short this means that with the current standards, consumers rarely have the opportunity to see content in their homes as filmmakers intended.

So, because of the legacy cathode ray tube, (a dead technology), we’re stuck with a set of legacy standards that limit how we can deliver images to consumers. But because CRTs are a dead technology, we now have an opportunity where we can choose to either be shackled by the 1950s for the rest of time, or, to say “enough is enough,” and use something better. Something forward thinking. Something our current technology can’t even match 100% yet. Something like, HDR video.

The HDR Way

At the moment, there two different categories and multiple standards covering HDR video, including CTA’s HDR 10 Media Profile, Dolby’s Dolby Vision, and the BBC’s Hybrid Log Gamma. And naturally, they all do things just a little differently. I’ll cover their differences in depth in Part 3: HDR Video Terms Explained, but for now I’m going to lump them all together and just focus on the common aspects of all HDR video, and what makes it different than video from the past.

There are four main things that are required to call something HDR video: ITU-T Recommendation BT.2020 or DCI-P3 color space, a high dynamic range transfer function, 10 bits per channel transmission and display values, and transmitted metadata.

BT.709, DCI-P3, and BT.2020 on CIE XYZ 1931

1. Color Space: For the most part, HDR video is seen by many as an extension of the existing BT.2020 UHD/FUHD and DCI specifications, and as such uses either the wider BT.2020 color gamut (BT.2020 is the 4K/8K replacement for BT.709/Rec.709 HD broadcast standards), or the more limited, but still wide, DCI-P3 gamut.

BT.2020 uses pure wavelength primaries, instead primary values based on the light emissions of CRT phosphors or any material. The catch is, of course, we can’t fully show these in a desktop display (yet), and only the most recent laser projectors can cover the whole color range. But ultimately, the breadth of the color space covers as many of the visible colors as is possible with three real primaries*, and includes all color values already available in Rec.709/sRGB and DCI-P3, as well as 100% of Adobe RGB and most printer spaces available with today’s pigments and dyes.

2. Transfer Function: Where HDR video diverges from standard BT.2020 and DCI specs is in its light-level-to-digital-value and digital-value-to-light-level relationship, called the OETF and EOTF respectively. I’m going to go into more depth on OETFs and EOTFs at another time, but for now what we need to know is that the current relationship between light levels and digital values is a legacy of the cathode ray tube days, and approximates gamma 2.4. Under this system, full white digital value of 235 translates to a light output of between 80-120nits.

Extending this same curve into a higher dynamic range output proves problematic because of the non-linear response of the human eye: it would either cause severe stepping in the darks and lights, or it would require 14-16 bits per channel while wasting digital values in increments that can’t actually be seen. And it still wouldn’t be backwards compatible, in which case, what’s the point?

So instead, HDR video uses one of two new transfer curves: the BBC’s Hybrid Log Gamma (HLG), standardized in ARIB STD-B67, which allows for output brightness levels from 0.01 nit up to around 5000 nits, and Dolby’s Perceptual Quantization (PQ) curve, standardized in SMPTE ST.2084, which allows for output brightness levels from 0.0001 nit up to 10,000 nits.

Gamma 2.4 vs Hybrid Log Gamma vs Perceptual Quantization Approximate EOTFs

PQ is the result of direct research done by Dolby to measure the response of the human eye, and to create a curve where no value is wasted with no visible stepping between values. The advantage of PQ is pretty clear, in terms of maximizing future output brightness (the best experimental single displays currently max out at 4000 nits; Dolby’s test apparatus ranged from 0.004 to 20,000 nits) and increasing the amount of detail captured in the darks.

HLG, on the other hand, provides a degree of backwards compatibility, matching the output levels of gamma 2.4 for the first 50% of the curve, and reserving the top 50% of the values to the higher light level output. Generally, HLG content with a system gamma of 1.2 looks pretty close to standard dynamic range content, though it’s whites sometimes end up compressed and greyer than content mastered in SDR to begin with.

Footage graded in Rec. 709 and the same graded in HLG.

(Side note: I prefer grading in SMPTE ST.2084 because of the extended dynamic range through the blacks, and smoother roll-into the whites.)

3. Bit Depth: The new transfer curves accentuate a problem that’s been with video since the switch from analog to digital values: stepping. As displays have gotten brighter, the difference between two code values (say, digital value of 25 and 26) is sometimes enough that we can see a clear distinguishing line between the two greys. This is especially true when using a display whose maximum brightness is greater than reference standard, and is more common in the blacks than in the whites.

Both the BT.2020 and DCI standards already have requirements to decrease stepping by switching signal encoding and transmission from 8 bits per channel to 10 bits minimum (12 bits for DCI), allowing for at least a 4 times smoother gradient. However, BT.2020 still permits 8 bit rendering at the display, which is what you’ll find on the vast majority of televisions and reference displays on the market today.

On the other hand, HDR video goes one step further and requires 10 bit rendering at the display panel itself; that is, each color sub pixel must be capable of between 876 and 1024 distinguishable light levels, in all operational brightness and contrast modes.

The reason that HDR requires a 10 bit panel while BT.2020 doesn’t, is that our eyes are more susceptible to stepping in the value of a color or gradient than to stepping in its hue or saturation: the eye can easily make up for lower color fidelity (8 bits per channel in BT.2020 space) by filling in the gaps, but with an HDR curve the jump in light levels between two codes in 8 bits per channel is big enough that it’s clearly noticeable.

Comparison between gradients step sizes at 8 bit, 10 bit, and 12 bit precisions (contrast emphasized)

4. Metadata: The last thing that HDR video requires that standard BT.2020 doesn’t, is metadata. All forms of HDR video should include information about both the content and the mastering environment. This includes which EOTF was used in the grade, the maximum and frame average brightnesses of the content and display, and which RGB primaries were used. Dolby Vision even includes metadata to define, shot by shot, how to translate the HDR values into the SDR range!

Consumer display manufacturers use this information to adapt content for their screens in real time, knowing when to clip or compress the highlights and darks (based on the capability of the screen it’s being shown on), and for the automatic selection of operational mode (switching from Rec. 709 to BT.2020, and in and out of HDR mode, without the end user ever having to change a setting).

So, in summary, what does HDR video do differently? Wider color gamuts, new transfer function curves to allow for a much larger range of brightnesses, 10 bits per channel minimum requirement at the display to minimize stepping, and the transmission of metadata to communicate information about the content and its mastering environment to the end user.

All of which are essential, none of which are completely backwards compatible.

Yes, but what does it look like?

Unfortunately, the only way to really show you what HDR looks like is to tell you to go to a trade show or post house with footage to show, or buy a TV with HDR capabilities and stream some actual HDR content. Because when you show HDR content on a normal display, it does not look right:

Images in SMPTE ST.2084 HDR Video formats do not appear normal when directly brought into Rec. 709 or sRGB Gamma 2.4 systems

You can get a little bit of a feel for it if I cut the brightness levels of a standard dynamic range image by half, and put it side-by-side with one that more closely follows the HDR range of brightnesses:

Normalized & Scaled SMPTE ST.2084 HDR Video vs Rec. 709 with Brightness Scaled

But that doesn’t capture what HDR video actually does. I don’t quite know how to describe it - it’s powerful, beautiful, clear, real, present and multidimensional. There’s an actual physiological and psychological response to the image that you don’t get with standard dynamic range footage - not simply an emotional response to the quality of the image, but the higher brightness levels actually trigger things in your eyes and brain that let you literally see it differently than anything you’ve seen before.

And once you start using it on a regular basis, nothing else seems quite as satisfactory, no other image quite as beautiful. You end up with a feeling that everything else is just a little bit inadequate. That’s why HDR will very rapidly become the new normal of future video.

So that's it for Part 1: What is HDR Video? In Part 2 of our series on HDR video, we’re going to cover what you need to grade in HDR, and how can you cheat a bit to get a feel for the format by emulating its response curve on your existing reference hardware.

Written by Samuel Bilodeau, Head of Technology and Post Production

Endnotes:

* While ACES does cover the entire visible color spectrum, it’s primary RGB values are imaginary, which means that while it can code for all possible colors, there’s no way of building a piece of technology that actually uses the ACES RGB values as its primary display colors. Or in other words, if you were to try and display ACES full value RED, you couldn’t, because that color doesn’t exist.

Blog

HDR Video Part 1: What is HDR Video?

HDR Video vs. HDR Photography

How did we get here?

The HDR Way

Yes, but what does it look like?