Help me understand bit depth and color space better

roxics

Veteran
I grew up with analog video as a kid and teenager. By the time I became an adult, DV was a thing. I moved into my professional career shooting DV and soon HDV, then onto DSLRs and mirrorless cameras. I currently shoot on the C100 and GH4. I’ve lived in an 8bit 4:1:1/4:2:0 world my entire amateur and professional career. And since I don’t really know any better (outside of 12/14bit raw photos) I’ve been a fairly happy camper.

It’s not that I’ve never used higher bitrates. I’ve just never shot them. Unless you count movie film, which I doubt. I’ve created higher bitrate files as intermediates when compositing. For example, going from an HDV green screen source to a DNxHD file with an alpha channel. Maybe accepted some higher color space or bit depth footage from an outside shooter from time to time over the years. But I don’t remember.

I understand the basic concept. More bit depth adds more gradation to the image, which is good for skies and anything with subtle variation. Although I’m not entirely sure how that works with bit depth.

Color space is about color on a per pixel level. Which is good for color correction/grading and keying.

What I don’t completely understand are why certain choices are made and how that changes the file sizes.

1. Which is better and for what: 8bit 4:2:2 or 10bit 4:2:0?

2. Why did Sony choose 8bit 4:2:2 for XDCAM Disc instead of 10bit 4:2:2?
I’m guessing file size, with a 50Mbps codec, so why not 10bit 4:2:0 instead of 8bit 4:2:2?
There is a whole series of questions I could get into with this, like why 50Mbps? And why do some people believe that’s too little? But I’ve also heard that’s about standard for broadcast. So if it’s good for broadcast why is it too little? I’m not really sure.

3. How much more file space does 10bit 4:2:0 take over 8bit 4:2:0?

4. How much more file space does adding 4:2:2 take up over 4:2:0 for a given bitrate?

5. How much more file space does 12bit take up?

6. Does 12bit 4:2:0 exist anywhere, and why or why not?

7. Why doesn’t anyone use 4:1:1 anymore like my old DV camera?

8. My understanding is that most TV panels/computer displays are 8bit or even 6bit that fake 8bit with dithering. So how do 10bit files display better gradation if the display can’t display it natively? I’m guessing it’s like downsampling higher resolutions: start with a good source and smoosh all that goodness down. That in this case the 10bit is more for editing than displaying.

9. How does (compressed) raw play into all of this?
For example, why are some raw files 12bit, some 14bit, some 16bit? Are there any 8bit or 10bit raw files?

10 (bonus). How does any of this play into analog video space?
I know that these are digital concepts, but I know in the days of analog video there was some degree of this going on as well. Just as a history lesson I would love to know more about how all of that worked with the different signals. We had composite, versus s-video, versus component cables and so on. I also know even analog video can be compressed using MUSE. Although I really don’t understand that.

11 (bonus random question). Does anyone still use the Sony XDCAM Disc system? I just discovered this even existed about a month ago and it fascinates me. At first I thought it was blu-ray in a disc caddy, but apparently not. Though it must be a variation of blu-ray technology with such similar disc capacities. I'm curious how widespread this system was and why it took this long for me to even hear about it.
 
Bit depth is 'levels' between black and white.

256 for 8 bit
1024 for 10 bit
4096 for 12 bit

As you can see, 10 and 12 bit have much more tonality, which becomes particularly important when grading an image or recording a wide dynamic range.

This I found
https://datavideo.com/us/article/What are 8-bit, 10-bit, 12-bit, 4:4:4, 4:2:2 and 4:2:0

Has a nice pic of 420 colour - getting 8 pixels and squashing them into 2!

=======

2 - see answer to 8
With a target display of 8 bit, why have a 10 bit file?
50Mbps was considered to be the best combo of file size and image quality for HD.
Broadcasters store huge amounts of data, so minimising it is good.
It's too little (for acquisition) because today we have 4K, we crop, we grade, we stabilise.

3) 4 *

4) bitrate defines file size so any file of the same bitrate is the same size

5) 16* 8bit and 4 * 10 bit

6) because it is a poor method of compression. Compression must be targeted to degrade the image in ways that our brains don't notice, e.g. off colour and compressed shadows, which we don't notice too much.

(An example of a poor method of compression would be removing the green channel: 2/3 the file size but instantly odd colour.)

7) because we don't need such small codecs in the age of cheap cards and drives

8) This is one to think on. A 10 bit file does NOT look better on an 8 bit monitor. The monitor just crushes all the gradients (colour levels) together. A 10 bit file only has value as a place to grade from, because it can store colours that an 8 bit file cannot. The transformations performed in grading may 'need' those colours (when a colour is 'missing' we get banding).

9) One wonders what compressed raw really is, as raw is just a bunch of numbers off the sensor. The C200 at 50p has '10bit' raw - I think it is just some form of Canon codec though.


11) afaik solid state (camera cards) is cheaper, smaller and more reliable, so one would be nuts to use moving disc recording apart from getting some old school vibe.
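
To make answer 8 a bit more concrete, here is a minimal Python sketch of why the extra levels matter once you grade. It assumes a made-up "grade" that simply stretches the darkest quarter of the signal to full range; the function name and the gain value are illustrative only and do not come from any real grading tool.

```python
def distinct_levels_after_stretch(source_bits, gain=4, output_bits=8):
    """Count distinct output codes after stretching the darkest 1/gain of the range.

    The "grade" is a hypothetical linear gain; real grading is far more complex.
    """
    src_max = (1 << source_bits) - 1
    out_max = (1 << output_bits) - 1
    outputs = set()
    # Walk every source code in the darkest 1/gain of the signal range.
    for code in range((src_max + 1) // gain):
        normalized = code / src_max            # 0.0 up to roughly 1/gain
        graded = min(normalized * gain, 1.0)   # stretch towards full range
        outputs.add(round(graded * out_max))   # quantize for an 8-bit display
    return len(outputs)

print("8-bit source: ", distinct_levels_after_stretch(8), "distinct display levels")
print("10-bit source:", distinct_levels_after_stretch(10), "distinct display levels")
```

The 8-bit source only has 64 codes in that dark quarter, so after the stretch the display receives 64 steps spread across its whole range, which is where banding comes from. The 10-bit source has enough codes there to fill nearly every display level.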
 
8) Most panels in the pro world are usually 8-bit today, and most panels that are called 10-bit are really 8-bit that are dithered (8+2). True 10-bit panels are usually $$$.

11) Surprisingly yes. This year, I’ve shot several times on one of the popular cable real estate “reality” type shows and the crew that I have worked with still uses XDCAM HD discs and cameras. And as a side note, the codec itself is still alive & kicking and being used daily, mostly in ENG on SxS card based cameras. Side note 2: The F55/5 can record in the XDCAM HD codec, too (along with many other codecs including HDCAM SR).
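
On the "8+2" dithering mentioned in answer 8 above: the rough idea is that the panel flickers between two neighbouring 8-bit codes so that the average over a few frames lands on the 10-bit value. The sketch below assumes a simple 4-frame cycle; real panels use vendor-specific patterns, and the function name is hypothetical.

```python
def frc_frames(value_10bit, cycle=4):
    """Return `cycle` 8-bit codes whose average approximates a 10-bit value.

    Hypothetical 4-frame cycle, for illustration only.
    """
    if not 0 <= value_10bit <= 1023:
        raise ValueError("expected a 10-bit code (0..1023)")
    base, remainder = divmod(value_10bit, cycle)
    # Show the next-higher code on `remainder` frames out of every `cycle`.
    frames = [base + 1] * remainder + [base] * (cycle - remainder)
    # An 8-bit panel cannot go above 255, so the top few codes clip.
    return [min(code, 255) for code in frames]

frames = frc_frames(514)
print(frames, "average:", sum(frames) / len(frames))
```

For the 10-bit code 514 the panel would alternate between 129 and 128, which averages to 128.5, i.e. 514 divided by 4.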
 
Many big-name reality shows, especially Survivor, have been using XDCAM optical for more than 10 years. It was a great medium for its time, but can't compete with modern solid state memory cards and drives.
 
FWIW, high end/pro (medium format) cameras use 16-bit color depth. Many consider 14-bit sufficient and 16-bit excessive but, when you're doing mag covers, 16-bit is what is expected.

As to codecs, they're a constantly moving target. When the first "consumer" 4K (JVC HMQ-10U) camcorder came out in 2012, it had a (four stream) 8-bit 144 Mbps MPEG-4 and it was considered the state of the art for $4,000. A year after GH-4 was released with 8-bit 4:2:0 in February, 2014, Panasonic added an external 10-bit 4:2:2 via Atomos Shogun and, for the money, that was the new state of the art. And the bit rates and the recording speeds keep climbing higher and higher. R5 8K DCI Raw is 2.6Gbps. And there's more to come (such as Blackmagic Design Ursa 12K that shoots 120 fps 12K raw at 5:1 compression at ~ 5 Gbps).
 
Bit depth is 'levels' between black and white.


3) 4 *

4) bitrate defines file size so any file of the same bitrate is the same size


Thank you :)
However I'm a little confused about your answer to number three (4 *). Are you saying 4 times (as in four times larger file size) or see answer number 4?
I think I read somewhere else that the file size increases by 125% when going 10bit over 8bit. But I wasn't sure about that.

As for number 4, I apologize. I meant to say bit depth instead of bitrate. So the question should be:

4. How much more file space does adding 4:2:2 take up over 4:2:0 for a given bit depth (10bit, 12bit) over 8bit?

So in other words, how much bigger is 8bit 4:2:2 over 8bit 4:2:0 or 10bit 4:2:2 over 10bit 4:2:0?
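
For what it's worth, the uncompressed version of this question is plain arithmetic, sketched below in Python. It only counts stored samples, so it says nothing about what a particular codec does afterwards; the names are made up for the example.

```python
# Total chroma samples per luma sample, summed over both chroma channels.
CHROMA_FRACTION = {"4:4:4": 2.0, "4:2:2": 1.0, "4:2:0": 0.5, "4:1:1": 0.5}

def bits_per_pixel(bit_depth, subsampling):
    """Average uncompressed bits per pixel: one luma sample plus its share of chroma."""
    return bit_depth * (1 + CHROMA_FRACTION[subsampling])

def size_ratio(new, old):
    """How many times larger `new` is than `old`; each is (bit_depth, subsampling)."""
    return bits_per_pixel(*new) / bits_per_pixel(*old)

print(size_ratio((10, "4:2:0"), (8, "4:2:0")))   # bit depth only
print(size_ratio((8, "4:2:2"), (8, "4:2:0")))    # subsampling only
print(size_ratio((10, "4:2:2"), (8, "4:2:0")))   # both at once
```

So before compression, 10bit is 1.25 times the size of 8bit (the "125%" figure, i.e. 25% larger), 4:2:2 is about 1.33 times 4:2:0 at the same bit depth, and 10bit 4:2:2 is about 1.67 times 8bit 4:2:0.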
 
The answer to 3 is 'four times'

I don't know the answer to 4 but it may be mentioned in that link.

I think the answer to four is 4+4+4 =12 4+2+0=6

so 4:2:0 footage is 1/2 the size of 4:4:4 footage

but that is before other compression elements are brought into play.

My own experiments say the other compression elements actually make more difference than these numbers.
 
My math isn't great but I think that would mean that 4:2:2 would then be something like 33% larger. 4+2+2=8. Which seems like it would make sense.

10bit being four times larger doesn't seem like it would make sense though. On paper it does, but in practice it doesn't seem to me like a 10bit 4:2:0 file is four times larger in file size than an 8bit 4:2:0 file.
Just looking at files from a 4K Blu-ray for example, those are 10bit 4:2:0 and despite H.265 better compression (something like 45-50%) for similar quality, you're also dealing with four times as many pixels and yet they still fit on 66GB and 100GB Blu-ray discs. So either they are way more compressed than regular 1080p 8bit H.264 Blu-rays, or something else is going on maybe? I don't know.
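
One way to sanity-check this is to turn disc capacity into an average bitrate. The sketch below assumes a 120 minute running time and decimal gigabytes, neither of which comes from this thread, and it ignores audio and disc overhead.

```python
def average_mbps(capacity_gb, minutes):
    """Average total bitrate in Mbps for a file that fills `capacity_gb` (decimal GB)."""
    bits = capacity_gb * 1e9 * 8
    return bits / (minutes * 60) / 1e6

# 120 minutes is an assumed running time, not a figure from the thread.
for capacity in (25, 66, 100):
    print(f"{capacity} GB over 120 min is about {average_mbps(capacity, 120):.1f} Mbps")
```

Under those assumptions the discs work out to tens of Mbps up to roughly 110 Mbps, so the file size is set by the bitrate budget of the disc, not by the bit depth of the picture.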
 
So then according to all of this. a 25GB Blu-ray file that stores (on average) 135 minutes at 8bit 4:2:0 1080p, would be 100GBs if converted to 10bit 4:2:0. But only 50GBs if converted to 8bit 4:4:4. Correct?
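
If you scale purely by the number of stored bits (which a real encoder would not do, since the compressed size follows the bitrate you choose), the arithmetic looks like this. A minimal sketch with made-up variable names:

```python
base_gb = 25.0   # the 8-bit 4:2:0 1080p file from the question

# 8-bit -> 10-bit at the same subsampling: every sample grows from 8 to 10 bits.
bit_depth_factor = 10 / 8

# 4:2:0 -> 4:4:4 at the same bit depth: each chroma plane goes from 1/4 to full size.
chroma_factor = (1 + 1 + 1) / (1 + 0.25 + 0.25)

ten_bit_420_gb = base_gb * bit_depth_factor
eight_bit_444_gb = base_gb * chroma_factor

print("10-bit 4:2:0:", ten_bit_420_gb, "GB")
print("8-bit 4:4:4: ", eight_bit_444_gb, "GB")
```

By that measure the 8bit 4:4:4 guess of 50GB holds, but 10bit 4:2:0 would be about 31GB, not 100GB.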
 
Also how is it that the GH5 only jumps up to 150Mbps when recording 10bit 4:2:2 versus 100Mbps when recording 8bit 4:2:0?

That's only a 50% increase in file size, not a 400% increase for just the bit depth alone. That's not even counting the extra file space you'd need for the increased color by going 4:2:2 over 4:2:0. So I'm assuming you're better off shooting 8bit in that case because there is less compression?
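
A rough way to compare the two modes is to work out how hard each one has to be compressed. The sketch below assumes UHD 3840x2160 at 30 fps for both modes, which is my assumption and not something stated in this thread.

```python
def uncompressed_mbps(width, height, fps, bit_depth, chroma_fraction):
    """Uncompressed video data rate in Mbps.

    chroma_fraction is total chroma samples per luma sample:
    0.5 for 4:2:0, 1.0 for 4:2:2, 2.0 for 4:4:4.
    """
    bits_per_frame = width * height * bit_depth * (1 + chroma_fraction)
    return bits_per_frame * fps / 1e6

# (bit depth, chroma fraction, codec bitrate in Mbps)
modes = {
    "8-bit 4:2:0 @ 100 Mbps": (8, 0.5, 100),
    "10-bit 4:2:2 @ 150 Mbps": (10, 1.0, 150),
}
for name, (depth, chroma, codec_mbps) in modes.items():
    raw = uncompressed_mbps(3840, 2160, 30, depth, chroma)  # assumed UHD 30p
    print(f"{name}: raw {raw:.0f} Mbps, about {raw / codec_mbps:.1f}:1 compression")
```

Under those assumptions the 8bit mode is compressed roughly 30:1 and the 10bit 4:2:2 mode roughly 33:1, so the 10bit mode is squeezed a little harder, but nowhere near four times harder.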
 
GH5 has a 400Mbps option as well.

All of the companies control their technology inside of their hardware differently. Before a few years ago, most companies were providing very low bitrates for 99% of their 4K cameras because they were considered consumer and prosumer products.

Changes in the industry forced them to provide better specifications, but they are still controlling the processing pipeline in other ways. There are hundreds of tests out there that compare 8-bit and 10-bit and no one ever sees a difference. In practice, you may/will in heavy post work, but still questionable.

To me, 10-bit is mostly a gimmick and old technology that's slowly being replaced by RAW recording which is how all cameras should have operated 10 years ago but couldn't because of patents.

If you're just curious in all of this as a history lesson, that's cool - but I wouldn't worry too much about it professionally as most of it will be as relevant as black and white film in 5 years or so.

[P.S. Different algorithms in post applications affect compression sizes as well.]
 
Thanks. Yeah it's more academic for me. I just want to understand it all better.

8bit 4:2:0 has served me fine for the last fifteen years of my professional career. I haven't really needed anything more for what I do. Although I admit raw would be nice from time to time. But I'm still shooting 1080p on a couple C100s because it works. My clients don't need anything more than that and I don't want to store anything more than that, if I don't have to. Why give people free Big Macs when all they're ordering and paying for are cheeseburgers? So I just try to give them great cheeseburgers.
 
Your train of thought actually makes perfect sense in 2021 because the industry is ruined.

Everyone who's not in the 1% has to scratch and claw because everyone has a camera and can do a good enough job with it.

And I think most appreciate their good clients more than their clients appreciate them. As a freelancer, you hold onto them for dear life.

___

10-20 years ago, even 5 years ago, we invested in the better technology, and audio, and lighting packages to get better rates, get hired by people who have better positions, opportunities than us (like maybe real producers).

So if you were making $500 a day, you could suddenly start to maybe find a few more clients who paid $1000, or $5000, whatever.

It's very different today and turbulent, although most people here still probably couldn't do anything with the C100 anymore so consider yourself fortunate!
 
Success in this business has very, very little to do with what equipment you have at your disposal. Knowing what to do with it is of far more value. The best clients don't really care what gear you have chosen to get the job done -- that they hired you to do. The more a client cares about the technical specifications of the equipment -- the less I am interested in having them for a client.

Investing in superior gear is for our own benefit as it allows us to work faster, better, more efficiently, and bring production value not possible with lesser equipment. When I invest in new equipment, or make choices between competing gear, it is for my own benefit.
 
The answer to 3 is 'four times'

I don't know the answer to 4 but it may be mentioned in that link.

I think the answer to four is 4+4+4 =12 4+2+0=6

so 4:2:0 footage is 1/2 the size of 4:4:4 footage

but that is before other compression elements are brought into play.

My own experiments say the other compression elements actually make more difference than these numbers.

Wait a tick. 4:2:0 records 2 values in the same image area that 4:4:4 records 8 values. So I would think it would take 1/4 the space, all else being equal?

But regarding bitrates and compression ratios on Panasonic cameras and the like, I wouldn't assume that a jump from a 100Mbps to a 150Mbps codec is a straightforward increase in quality and compression based on the change in your codec. In other words, they might be rounding it to the nearest 50Mbps and/or giving you higher compression without mentioning it. Whereas on RED and BM cameras, they will scale your bitrate exactly with your resolution and compression settings. Afaik

EDIT: actually, 8 luminance values + 2 chroma values for 4:2:0 = 10 values total. 8 luma + 8 chroma = 16 for 4:4:4. So 1.6x the size?
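
Counting it out in code may help here. The sketch below uses the usual J:a:b reading of the notation (a block J pixels wide and 2 rows tall, with a chroma samples in the first row and b additional ones in the second) and remembers that there are two chroma channels, Cb and Cr. The function name is made up for the example.

```python
def samples_per_block(j, a, b):
    """Total Y + Cb + Cr samples in a J-wide, 2-row reference block (J:a:b notation)."""
    luma = j * 2                  # luma is never subsampled
    chroma_one_channel = a + b    # a samples in row 1, b new samples in row 2
    return luma + 2 * chroma_one_channel   # two chroma channels: Cb and Cr

for scheme in ("4:4:4", "4:2:2", "4:2:0", "4:1:1"):
    j, a, b = (int(n) for n in scheme.split(":"))
    print(scheme, samples_per_block(j, a, b))
```

That gives 24 samples for 4:4:4, 16 for 4:2:2 and 12 for both 4:2:0 and 4:1:1, so uncompressed 4:4:4 is 2x the size of 4:2:0, in line with the 4+4+4 versus 4+2+0 shortcut.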
 
Success in this business has very, very little to do with what equipment you have at your disposal. Knowing what to do with it is of far more value. The best clients don't really care what gear you have chosen to get the job done -- that they hired you to do. The more a client cares about the technical specifications of the equipment -- the less I am interested in having them for a client.

Investing in superior gear is for our own benefit as it allows us to work faster, better, more efficiently, and bring production value not possible with lesser equipment. When I invest in new equipment, or make choices between competing gear, it is for my own benefit.

Along these lines, i often find that the most impactful pieces of additional gear for me are little accessories. It could be the perfect size/shape bag for carrying exactly what i need in an unobtrusive way while gimbaling. Or it could be a 3rd party tripod collar that lets me pull my camera on and off gimbal while leaving the whole setup balanced. Either way, they're logistical improvements that help me switch filters/lenses/mics faster and therefore actually deploy them in a time-sensitive situation. Which has a massive impact on the footage i deliver but i wouldn't have predicted before repeating these scenarios and discovering the shortcomings of my arrangement. The refinements continue...
 
Success in this business has very, very little to do with what equipment you have at your disposal. Knowing what to do with it is of far more value. The best clients don't really care what gear you have chosen to get the job done -- that they hired you to do. The more a client cares about the technical specifications of the equipment -- the less I am interested in having them for a client.

Investing in superior gear is for our own benefit as it allows us to work faster, better, more efficiently, and bring production value not possible with lesser equipment. When I invest in new equipment, or make choices between competing gear, it is for my own benefit.

True. Although for the last couple of years I have been mostly an editor. I hire a couple guys to do the shooting for me with my gear. They bring me back the gear and the footage and I do the editing. I've only been on a couple of shoots in the last couple of years.
So part of what I have to do is make sure the kit is usable but not overcomplicated. These guys aren't necessarily as technical or experienced as I am. So I try to keep things simple so they can pay attention to the things that matter most, like sound, focus, and exposure. So long as they can nail those three I can usually make something of what they get.

For example, I've thought about adding an external monitor to the kit. Which would be a happy addition for someone like me. But that complicates the setup for others. Suddenly you have this external box you have to mount on the shoe, hook up with an HDMI cable, carry a different set of batteries than the camera uses (typically Sony versus Canon), and futz with all the settings the display has.

Part of the reason I bought the C100s a couple years ago was because they have the built-in NDs, the XLRs, and the long battery run times. I went from shooting a GH4 (in 4K) with things hanging off the camera and more external stuff to have to carry around, to having more things built right in with the C100s. Despite it being 1080p. But that even proved to be a file saving benefit. Nobody had been asking for 4K or 10bit. Not even 1080p files necessarily. Just something that would work for Youtube/Vimeo 98% of the time. The other two percent being an event or local TV.
Today it's even less demanding because it's been a lot of Zoom broadcasts. Where the quality is... well I'm not sure that's a word anyone would want to use.

One of my shooters just bought himself a BMPCC4K. But he won't be using it for my shoots, because even he knows it's overkill for what we do. And I don't want hundreds of gigabytes of ProRes files. He bought it to do more filmmaking type stuff.
 
I will chime in with some general descriptions that partially answer some of your questions, and probably partially repeat what you and others have said.

Bit depth
(8 bit, 10 bit, 12 bit, etc.)

This is the number of bits per channel. There are three channels per pixel. So multiply it by 3 to know the number of bits per pixel. For example, 8-bit would be 24 bits per pixel (or 3 bytes per pixel).

A photodiode is at first analog. It has a noise floor, which represents the blackest shade it could reproduce, and on the other end its full-well capacity, which represents the whitest shade it could reproduce. Suppose its noise floor is 5 electrons, and its full-well capacity is 40,000 electrons. That means its brightest signal is 8,000 times as much as its darkest signal. This could also be expressed as about 13 f-stops, because 2^13 = 8,192.
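
As a quick check of that ratio-to-stops conversion, here is a small Python sketch. The electron counts are the hypothetical ones from the paragraph above, and the function name is made up.

```python
import math

def dynamic_range_stops(full_well_electrons, noise_floor_electrons):
    """Dynamic range in f-stops: each stop is a doubling of the signal."""
    return math.log2(full_well_electrons / noise_floor_electrons)

stops = dynamic_range_stops(40_000, 5)
print(f"ratio {40_000 // 5}:1 is about {stops:.2f} stops")
```

The 8,000:1 ratio comes out just under 13 stops, which is why 13 is a fair round figure.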

To record a photodiode's experience over the course of time, you could translate it to similar waves on magnetic tape (analog) or number its shades from noise floor to full well (digital). Suppose you thought 100 steps between darkest and lightest were enough. So darkest is 1, and lightest is 100. To save this to disk, you would need 7 bits (because 2^7=128). A bit is a 1 or a 0. Therefore it has 2 possible values. So two bits in a row could have 4 values (00, 01, 10, 11), three bits in a row could have 8 values (000, 001, 010, 100, 011, 110, 101, 111), and so on. And now you see why the number of bits becomes the exponent. 2^x = how high you can count with that many bits.
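
The same counting argument in Python, as a small sketch (the function name is just for illustration):

```python
def bits_needed(levels):
    """Smallest number of bits that can label `levels` distinct shades."""
    if levels < 1:
        raise ValueError("need at least one level")
    # (levels - 1).bit_length() is the bit count of the largest label, 0-based.
    return max(1, (levels - 1).bit_length())

for levels in (2, 4, 8, 100, 128, 129, 256, 1024):
    print(levels, "levels need", bits_needed(levels), "bits")
```

100 levels need 7 bits because 6 bits only reach 64, and 129 levels already spill over into an 8th bit.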

If you say 7 bits (100 levels), that actually gives you 1 million colors, because again it is 7 bits per channel, and 100 x 100 x 100 = 1 million. Seems like enough, doesn't it? Actually it is for a lot of cases, but not us picky photographers.

If you allow yourself 8 bits, you literally double the number of shades per channel, because 2^8 = 256. But that's not all! Your total color palette is now 256 x 256 x 256, or 16.8 million colors (a number which I bet you have seen many times in advertising, especially in the 2000s).

And so, now you can see why upping the ante to 10 or 12 or 16 bits will give you a ludicrous number of colors. Let's see:
2^10 is 1,024. Then 1,024^3 = 1 billion colors.
2^12 is 4,096. Then 4,096^3 = 68 billion colors.
2^16 is 65,536. Then 65,536^3 = 281 trillion colors.

(I'm not saying you won't need something like 10 bit, especially if you plan to push and pull the image in post. I think 10-bit is enough if distributed logarithmically rather than linearly. Norman Koren does a nice explanation of why, if you can follow: http://www.normankoren.com/digital_t...l#Human_vision)
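
If you want the exact figures rather than the rounded ones, they are easy to generate. A minimal sketch, with a made-up helper name:

```python
def palette_size(bits_per_channel):
    """Return (levels per channel, total colors) for a three-channel image."""
    levels = 2 ** bits_per_channel
    return levels, levels ** 3

for bits in (8, 10, 12, 16):
    levels, colors = palette_size(bits)
    print(f"{bits}-bit: {levels:,} levels per channel, {colors:,} colors")
```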


Chroma subsampling
(4:1:1, 4:2:0, 4:2:2, etc.)

The three digits actually refer to YCrCb, not RGB.
Y means luminance (I don't know why)
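Cr is the red-difference signal (the red minus the luminance).
Cb is the blue-difference signal (the blue minus the luminance).
Engineers find this easier to work with than RGB because it keeps brightness separate from color, so the color can be stored at lower resolution.

(and the three digits don't map one-to-one to Y, Cr, and Cb; they describe how many chroma samples are kept in a reference block 4 pixels wide and 2 rows tall: https://en.wikipedia.org/wiki/Chroma_subsampling)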

So if you convert RGB to YCrCb, and just leave it alone after that, then it is said to be 4:4:4. All three channels are at full-resolution.
So if your resolution was HD, then the Y layer would be 1920x1080, the Cr layer would be 1920x1080, and the Cb channel 1920x1080.

But now they can exploit the human eye's partiality toward luminance and just throw away parts of Cr and Cb.
So the mildest cut would be 4:2:2.
The Y layer would still be 1920x1080, but the Cr and Cb layers would each be 960x1080 (line-doubled horizontally, to stretch it back to full width).

Then 4:2:0 would be 1920x1080 for the luminance, but only 960x540 for Cr and Cb.

4:1:1 just cuts it horizontally. Vertical is full resolution. So Y would be 1920x1080, and Cr and Cb would each be 480x1080 --- which is kind of weird. It leads to more color artifacts than 4:2:0, which is probably why we left it in the dust. (It was easier, I think, to encode to magnetic tape, because you just quarter the carrier frequency, which is why it was used in DV).
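
To show what "throwing away parts of Cr and Cb" can look like in practice, here is a toy 4:2:0 downsample in Python. It assumes each 2x2 block of a chroma plane is replaced by its average, which is only one simple choice; real encoders use various filters and sample positions. The plane values are invented.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 block.

    Assumes the plane has an even number of rows and columns.
    """
    rows, cols = len(plane), len(plane[0])
    return [
        [
            (plane[r][c] + plane[r][c + 1] + plane[r + 1][c] + plane[r + 1][c + 1]) // 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]

# A made-up 4x4 Cb plane; the matching Y plane would stay 4x4.
cb = [
    [100, 102, 200, 202],
    [104, 106, 204, 206],
    [50, 50, 60, 60],
    [50, 50, 60, 60],
]
print(subsample_420(cb))
```

Sixteen chroma values become four, while the luminance plane keeps all sixteen.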

So now for some storage comparisons.

1920x1080 12-bit 4:4:4 would be:
1920 pixels per line
x 1080 lines per frame
x 12 bits per channel
x 3 channels per pixel
= 74,649,600 bits per frame
or about 9 megabytes.

1920x1080 12-bit 4:2:2 would be:
(1920 x 1080 x 12) + (960 x 1080 x 12) + (960 x 1080 x 12)
= 49,766,400 bits per frame (exactly 2/3 of the 4:4:4 version)
or about 6 megabytes per frame

1920x1080 12-bit 4:2:0 would be:
(1920 x 1080 x 12) + (960 x 540 x 12) + (960 x 540 x 12)
= 37,324,800 bits per frame (exactly 1/2 of the 4:4:4 version)
or about 4.5 megabytes per frame

1920x1080 8-bit 4:2:0 would be:
(1920 x 1080 x 8) + (960 x 540 x 8) + (960 x 540 x 8)
= 24,883,200 bits per frame
or about 3 megabytes per frame
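
The same arithmetic as a small reusable Python function, in case you want to try other combinations. The table and function names are made up for the example, and megabytes are decimal.

```python
# Horizontal and vertical divisors applied to each chroma plane.
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2), "4:1:1": (4, 1)}

def frame_bits(width, height, bit_depth, scheme):
    """Uncompressed bits in one frame: one luma plane plus two chroma planes."""
    h_div, v_div = CHROMA_DIVISORS[scheme]
    luma_samples = width * height
    chroma_samples = 2 * (width // h_div) * (height // v_div)
    return (luma_samples + chroma_samples) * bit_depth

for depth, scheme in ((12, "4:4:4"), (12, "4:2:2"), (12, "4:2:0"), (10, "4:2:0"), (8, "4:2:0")):
    bits = frame_bits(1920, 1080, depth, scheme)
    print(f"{depth}-bit {scheme}: {bits:,} bits, about {bits / 8 / 1e6:.1f} MB per frame")
```

The extra 10-bit 4:2:0 row is not in the list above; it lands between the 8-bit and 12-bit 4:2:0 figures.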

These are then usually further compressed with something like the Discrete Cosine Transform, maybe 10:1. So then 8-bit 4:2:0 could be, say, just 300 KB per frame. (Best explanation ever of DCT: https://www.youtube.com/watch?v=Q2aEzeMDHMA)
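
To put that 10:1 figure into the units codecs are usually quoted in, here is a short sketch. The 24 fps frame rate is my assumption, and the 10:1 ratio is only the ballpark from the paragraph above.

```python
frame_bits = 24_883_200    # uncompressed 8-bit 4:2:0 1080p frame
compression_ratio = 10     # the "maybe 10:1" ballpark
fps = 24                   # assumed frame rate, not from the thread

compressed_bytes = frame_bits / compression_ratio / 8
mbps = frame_bits / compression_ratio * fps / 1e6

print(f"about {compressed_bytes / 1000:.0f} KB per frame, about {mbps:.1f} Mbps at {fps} fps")
```

That works out to roughly 310 KB per frame and about 60 Mbps under those assumptions.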
 
Thanks for the run-down, Combat. Very helpful. Makes me wonder why we're not all currently shooting 4:4:4 if it would only cost us an extra 50% of storage whilst yielding such greater color fidelity... Lame.

Regarding compression, I found it interesting to breeze through the vast array of mathematical trickery and motion prediction employed by Fraunhofer to squeeze H.266/VVC down to size: https://www.hhi.fraunhofer.de/en/dep...-overview.html

The picture partitioning structure, which is further described in section 3.2, divides the input video into blocks called coding tree units (CTUs). A CTU is split using a quadtree with nested multi-type tree structure into coding units (CUs), with a leaf coding unit (CU) defining a region sharing the same prediction mode (e.g. intra or inter). In this document, the term ‘unit’ defines a region of an image covering all colour components; the term ‘block’ is used to define a region covering a particular colour component (e.g. luma), and may differ in spatial location when considering the chroma sampling format such as 4:2:0.

The other features of VVC, including intra prediction processes, inter picture prediction processes, transform and quantization processes, entropy coding processes and in-loop filter processes, are covered in sections 3.3 to 3.9. The following features have been included in the VVC test model 11 on top of the block tree structure.

Intra prediction

67 intra mode with wide angles mode extension

Block size and mode dependent 4 tap interpolation filter

Position dependent intra prediction combination (PDPC)

Cross component linear model intra prediction

Multi-reference line intra prediction

Intra sub-partitions

Weighted intra prediction with matrix multiplication

Inter-picture prediction

Block motion copy with spatial, temporal, history-based, and pairwise average merging candidates

Page: 16 Date Saved: 2021-01-04


Affine motion inter prediction

subblock based temporal motion vector prediction

Adaptive motion vector resolution

8x8 block based motion compression for temporal motion prediction

High precision (1/16 pel) motion vector storage and motion compensation with 8-tap interpolation filter for luma component and 4-tap interpolation filter for chroma component

Geometric partitioning mode

Combined intra and inter prediction

Merge with MVD (MMVD)

Symmetrical MVD coding

Bi-directional optical flow

Decoder side motion vector refinement

Bi-prediction with CU-level weight

Transform and quantization

Multiple primary transform selection with DCT2, DST7 and DCT8

Secondary transform for low frequency zone

Subblock transform for inter predicted residual

Dependent quantization with max QP increased from 51 to 63

Entropy Coding

Arithmetic coding engine with adaptive double windows probability update

Transform coefficient coding with sign data hiding

In loop filter

In-loop reshaping

Deblocking filter with strong longer filter

Sample adaptive offset

Adaptive Loop Filter

Screen content coding:

Intra block copy with reference region restriction

Palette coding mode

Adaptive color transform

Block differential pulse coded modulation

Transform skip residual coding

360-degree video coding

Horizontal wrap-around motion compensation


High-level syntax and parallel processing

Reference picture management with direct reference picture list signaling

Subpictures, slices and tiles
 