Help me understand bit depth and color space better

I will chime in with some general descriptions that partially answer some of your questions, and probably partially repeat what you and others have said.

Bit depth
(8 bit, 10 bit, 12 bit, etc.)

This is the number of bits per channel. There are three channels per pixel. So multiply it by 3 to know the number of bits per pixel. For example, 8-bit would be 24 bits per pixel (or 3 bytes per pixel).

A photodiode is at first analog. It has a noise floor, which represents the blackest shade it could reproduce, and on the other end its full-well capacity, which represents the whitest shade it could reproduce. Suppose its noise floor is 5 electrons, and its full-well capacity is 40,000 electrons. That means its brightest signal is 8,000 times as much as its darkest signal. This could also be expressed as about 13 f-stops, because 2^13 = 8,192, which is close to 8,000.
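If it helps, here is that dynamic-range arithmetic as a tiny Python sketch. The electron counts are just the made-up example figures from the paragraph above, not measurements from any real sensor.

```python
import math

# Example figures from the text above (illustrative, not a real sensor spec)
noise_floor_e = 5       # electrons
full_well_e = 40_000    # electrons

# Ratio between the brightest and darkest recordable signal
contrast_ratio = full_well_e / noise_floor_e

# Each f-stop is a doubling, so the number of stops is log base 2 of the ratio
stops = math.log2(contrast_ratio)

print(f"contrast ratio: {contrast_ratio:.0f}:1")
print(f"dynamic range: {stops:.2f} stops")
```

The result lands just under 13 stops, which is why "13 stops" is a fair round number for this example.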

To record a photodiode's experience over the course of time, you could translate it to similar waves on magnetic tape (analog) or number its shades from noise floor to full well (digital). Suppose you thought 100 steps between darkest and lightest were enough. So darkest is 1, and lightest is 100. To save this to disk, you would need 7 bits (because 2^7 = 128, while 6 bits would only give you 64). A bit is a 1 or a 0. Therefore it has 2 possible values. So two bits in a row could have 4 values (00, 01, 10, 11), three bits in a row could have 8 values (000, 001, 010, 100, 011, 110, 101, 111), and so on. And now you see why the number of bits becomes the exponent. 2^x = how many different values you can count with x bits.
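A quick sketch of the same idea in Python, in case seeing it run helps. The function name `bits_needed` is just something I made up for illustration.

```python
from itertools import product

def bits_needed(levels: int) -> int:
    """Smallest number of bits that can give `levels` values a unique label."""
    # (levels - 1) is the largest label when counting from 0;
    # bit_length() says how many binary digits that label needs.
    return max(1, (levels - 1).bit_length())

# Every pattern you can make with three bits in a row
patterns = ["".join(p) for p in product("01", repeat=3)]

print(bits_needed(100))   # bits for the 100-step example
print(patterns)           # the eight 3-bit patterns, in counting order
```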

If you settle on 7 bits and use only those 100 levels, that actually gives you 1 million colors, because it is 100 levels in each of the three channels, and 100 x 100 x 100 = 1 million. Seems like enough, doesn't it? Actually it is for a lot of cases, but not for us picky photographers.

If you allow yourself 8 bits, you literally double the number of shades per channel, because 2^8 = 256. But that's not all! Your total color palette is now 256 x 256 x 256, or 16.8 million colors (a number which I bet you have seen many times in advertising, especially in the 2000s).

And so, now you can see why upping the ante to 10 or 12 or 16 bits will give you a ludicrous number of colors. Let's see:
2^10 is 1,024. Then 1,024^3 = about 1.07 billion colors.
2^12 is 4,096. Then 4,096^3 = about 68.7 billion colors.
2^16 is 65,536. Then 65,536^3 = about 281 trillion colors.
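
For anyone who wants the exact figures, this small Python sketch reproduces the numbers above. The helper name `total_colors` is mine, and it assumes three channels per pixel as described earlier.

```python
def total_colors(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors when every channel has 2**bits levels."""
    levels = 2 ** bits_per_channel
    return levels ** channels

for bits in (8, 10, 12, 16):
    print(f"{bits:>2}-bit: {2 ** bits:>6,} levels per channel, "
          f"{total_colors(bits):,} colors")
```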

(I'm not saying you won't need something like 10 bit, especially if you plan to push and pull the image in post. I think 10-bit is enough if distributed logarithmically rather than linearly. Norman Koren does a nice explanation of why, if you can follow: http://www.normankoren.com/digital_t...l#Human_vision)


Chroma subsampling
(4:1:1, 4:2:0, 4:2:2, etc.)

The three digits actually refer to YCrCb, not RGB.
Y means luminance (I don't know why)
Cr means the red minus the luminance.
Cb means the blue minus the luminance.
Somehow engineers find this easier to work with than RGB.

(And the three digits don't map one-to-one to Y, Cr, and Cb. They describe a sampling pattern over a block 4 pixels wide and 2 rows tall: https://en.wikipedia.org/wiki/Chroma_subsampling)
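
To make the Y, Cr, and Cb descriptions above a bit more concrete, here is a rough Python sketch of the conversion. It assumes the BT.709 (HD) luma weights and values normalized to 0.0-1.0; other standards use different weights, and real pipelines also deal with gamma and integer ranges, which this ignores. The function name is made up for illustration.

```python
# BT.709 luma weights (an assumption: BT.601 and BT.2020 use other values)
KR, KG, KB = 0.2126, 0.7152, 0.0722

def rgb_to_ycbcr(r: float, g: float, b: float) -> tuple[float, float, float]:
    """Convert normalized RGB (0.0-1.0) to Y, Cb, Cr."""
    y = KR * r + KG * g + KB * b      # weighted brightness
    cb = (b - y) / (2 * (1 - KB))     # blue minus brightness, scaled to +/-0.5
    cr = (r - y) / (2 * (1 - KR))     # red minus brightness, scaled to +/-0.5
    return y, cb, cr

print(rgb_to_ycbcr(1.0, 1.0, 1.0))    # white: full brightness, no color difference
print(rgb_to_ycbcr(1.0, 0.0, 0.0))    # pure red: Cr at its maximum
```

Notice that any gray (equal R, G, and B) ends up with Cb and Cr at zero, so all the detail lives in Y. That is what makes it possible to thin out the other two channels.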

So if you convert RGB to YCrCb, and just leave it alone after that, then it is said to be 4:4:4. All three channels are at full-resolution.
So if your resolution was HD, then the Y layer would be 1920x1080, the Cr layer would be 1920x1080, and the Cb channel 1920x1080.

But now they can exploit the human eye's partiality toward luminance and just throw away parts of Cr and Cb.
So the mildest cut would be 4:2:2.
The Y layer would still be 1920x1080, but the Cr and Cb layers would each be 960x1080 (each color sample gets stretched across two pixels horizontally, to bring it back to full width).

Then 4:2:0 would be 1920x1080 for the luminance, but only 960x540 for Cr and Cb.

4:1:1 just cuts it horizontally. Vertical is full resolution. So Y would be 1920x1080, and Cr and Cb would each be 480x1080 --- which is kind of weird. It leads to more color artifacts than 4:2:0, which is probably why we left it in the dust. (It was easier, I think, to encode to magnetic tape, because you just quarter the carrier frequency, which is why it was used in DV).
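
The layer sizes above can be sketched in Python like this. The table and function names are my own, and the divisors simply restate the schemes as described in this post.

```python
# (horizontal divisor, vertical divisor) applied to each color layer
SUBSAMPLING = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
    "4:1:1": (4, 1),
}

def plane_sizes(width: int, height: int, scheme: str) -> dict[str, tuple[int, int]]:
    """Return the (width, height) of the Y, Cr, and Cb layers."""
    h_div, v_div = SUBSAMPLING[scheme]
    chroma = (width // h_div, height // v_div)
    return {"Y": (width, height), "Cr": chroma, "Cb": chroma}

for scheme in SUBSAMPLING:
    print(scheme, plane_sizes(1920, 1080, scheme))
```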

So now for some storage comparisons.

1920x1080 12-bit 4:4:4 would be:
1920 pixels per line
x 1080 lines per frame
x 12 bits per channel
x 3 channels per pixel
= 74,649,600 bits per frame
or about 9 megabytes.

1920x1080 12-bit 4:2:2 would be:
(1920 x 1080 x 12) + (960 x 1080 x 12) + (960 x 1080 x 12)
= 49,766,400 bits per frame (exactly 2/3 of the 4:4:4 version)
or about 6 megabytes per frame

1920x1080 12-bit 4:2:0 would be:
(1920 x 1080 x 12) + (960 x 540 x 12) + (960 x 540 x 12)
= 37,324,800 bits per frame (exactly 1/2 of the 4:4:4 version)
or about 4.5 megabytes per frame

1920x1080 8-bit 4:2:0 would be:
(1920 x 1080 x 8) + (960 x 540 x 8) + (960 x 540 x 8)
= 24,883,200 bits per frame
or about 3 megabytes per frame
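
If you want to try other combinations, here is the same storage math as a Python sketch. The names are made up for illustration, and it only covers uncompressed frames with no headers or padding.

```python
# How many luma samples share one sample in each color layer
CHROMA_DIVISOR = {"4:4:4": 1, "4:2:2": 2, "4:2:0": 4, "4:1:1": 4}

def frame_bits(width: int, height: int, bit_depth: int, scheme: str) -> int:
    """Uncompressed size of one frame in bits."""
    luma_samples = width * height
    chroma_samples = luma_samples // CHROMA_DIVISOR[scheme]  # per color layer
    return (luma_samples + 2 * chroma_samples) * bit_depth

for depth, scheme in [(12, "4:4:4"), (12, "4:2:2"), (12, "4:2:0"), (8, "4:2:0")]:
    bits = frame_bits(1920, 1080, depth, scheme)
    print(f"{depth}-bit {scheme}: {bits:,} bits, {bits // 8:,} bytes")
```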

These are then usually further compressed with something like the Discrete Cosine Transform, maybe 10:1. So then 8-bit 4:2:0 could be, say, just 300 KB per frame. (Best explanation ever of DCT: https://www.youtube.com/watch?v=Q2aEzeMDHMA)

Wow! Great explanation. Simple to follow through. I really appreciate it. :)

Makes more sense now.
 
Y comes from the various human vision studies of the 1930s (I assume driven by the desire to develop a quality color film), which produced an artificial, mathematical replacement for the natural RGB colors called XYZ. In that system Y was more or less arbitrarily chosen to mean "luminance". Because RGB was already taken. By G-d.
 
There's also 444 XQ (ProRes). Some of the only cameras in the world that have that come from Blackmagic.

This is a cool video showing the difference between the 12-bit RAW in the C200 and 8-bit and 10-bit.

If you'd like to skip, go to around 1:30 where it shows the difference between CRL and 8-bit side-by-side.

 
So based on this above math, 10bit is not coming out to four times the file size of 8bit as previously mentioned it would. It's really only a 1.25x increase. This also checks out with the calculator link I posted earlier. Not sure how that is, since logically it would seem like 4x going from 256 to 1024 per channel, but...
 
It's true that with 10 bits, you can count 1,024 different values. Meanwhile, with 8 bits, you can count only 256. So 10 bits lets you count 4 times as high, and have 4 times as many levels of gradation.

But each step doesn't take up 1 bit.

The confusion may be because of the strangeness of the base-2 number system. We're used to a base-10 number system. It is called base 10 because there are 10 characters in our repertoire: 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9. Meanwhile, computers use a base-2 number system, which means they have only 2 characters, 0 and 1.

If you think about how our counting system works, you just start with the first character (0) and use them in order until you run out of characters. This normally would mean we cannot count past 9. But we do a trick where we start over, and set a new column beside the first. The number after "9" is "10". The number after "99" is "100".

Suppose we had 12 fingers, or for whatever reason we decided a long time ago to use a base-12 number system. We would have to make up a couple of new characters. Since I'm stuck on a keyboard, I will use letters. So our available digits would be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b. So we would write "ten" as "a". And "eleven" would be "b". And then finally it's time to wrap around, so "twelve" is "10". Weird, I know!

It's possible to have a number system of any base. If your numbering system was base 7, then you would have never heard of these characters: 7, 8, 9. You would only have 0, 1, 2, 3, 4, 5, 6. So "seven" would be written as "10".

This is the case of the extremely limited computer counting system, which is only base 2. So "one" is "1", "two" is "10", and "three" is "11". And so on, so that a seemingly small number like "99" in base 10 has to be written with 7 characters: "1100011".
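
Here is a small Python sketch for writing a number in any base, if you want to play with this. The function `to_base` and the choice of lowercase letters for digits above 9 are my own conventions, not any standard.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"

def to_base(n: int, base: int) -> str:
    """Write the non-negative integer n using the given base (2 to 36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, remainder = divmod(n, base)   # peel off the rightmost column
        out.append(DIGITS[remainder])
    return "".join(reversed(out))        # columns were collected right to left

print(to_base(99, 2))    # ninety-nine in binary
print(to_base(7, 7))     # seven in base 7
print(to_base(11, 12))   # eleven in base 12
print(to_base(12, 12))   # twelve in base 12
```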

Now to answer your question. It is the number of columns, not the number that the characters represent, that takes up space. You can see this in our own numbering system: "99" takes up only 2 spaces. It is 3 times as big as "33", but both values are stored with just 2 spaces. It is the number of "spaces" that take up space. This is true in base 2 as well as in base 10, or in any numbering system.
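
Put as a quick Python sketch (function names are just for illustration): the storage grows with the number of columns, while the number of levels grows with the power of two.

```python
def storage_ratio(bits_a: int, bits_b: int) -> float:
    """How much more space bits_b takes than bits_a per sample."""
    return bits_b / bits_a

def levels_ratio(bits_a: int, bits_b: int) -> float:
    """How many times more levels bits_b can represent than bits_a."""
    return 2 ** bits_b / 2 ** bits_a

print(storage_ratio(8, 10), levels_ratio(8, 10))   # 10-bit vs 8-bit
print(storage_ratio(8, 12), levels_ratio(8, 12))   # 12-bit vs 8-bit
```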

Computers chose the wildly inefficient base-2 numbering system (I mean, come on, 7 spaces to represent 99?) because they had other concerns. For one thing, it's very hard to corrupt data. If a 1 is written to disk, and it gets smudged, it's still something, so it still counts as 1. It would have to be completely erased (or erased below a certain threshold) to become a 0. Contrast that with if computers used a base-10 numbering system. Then on a magnetic disk, let's say, 0 would be no magnetism, 1 would be a little bit of magnetism, 2 would be a little bit more magnetism, etc. And so if it gets smudged or squelched or muted as it travels across the wire, a 2 might turn into a 1, or a 5 might get spiked up to a 7, fairly easily. It would be like analog, only with more serious repercussions. This is why you don't get generation loss copying a digital file. On the downside, if there is a major error, then you don't get the smooth degradation of analog, you get like totally missing frames, or big blocks of alienspeak.
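
A toy Python sketch of that robustness argument, under heavy assumptions: the signal is a single number between 0.0 and 1.0, the levels are evenly spaced, and the receiver just picks the nearest level. Real storage and transmission are far more involved, so treat this only as an illustration.

```python
def decode(signal: float, n_levels: int) -> int:
    """Return the index of the nominal level closest to the received signal."""
    nominal = [i / (n_levels - 1) for i in range(n_levels)]
    return min(range(n_levels), key=lambda i: abs(nominal[i] - signal))

noise = 0.2  # the same disturbance applied to both systems (arbitrary example value)

# Base 2: levels sit at 0.0 and 1.0, so a "0" nudged up by 0.2 still reads as 0
print(decode(0 / 1 + noise, n_levels=2))

# Base 10: levels sit 1/9 apart, so a "2" nudged up by 0.2 reads as a different digit
print(decode(2 / 9 + noise, n_levels=10))
```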

BONUS QUESTION: How many unique license plates can a state have, if each one is 6 spaces long, with a repertoire of all numbers and uppercase letters?
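
(Spoiler, for checking your answer. This Python sketch assumes every combination is allowed and that the repertoire is exactly the 10 digits plus 26 uppercase letters.)

```python
import string

repertoire = string.digits + string.ascii_uppercase   # 36 characters
spaces = 6

# Same rule as with bits: characters-in-repertoire raised to the number of spaces
plates = len(repertoire) ** spaces
print(f"{plates:,}")
```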
 
aha so 12 bit would take up 1.5 times as much space as 8-bit because that is the ratio of 12/8

Personally I feel like I would prefer to have the jiggery-pokery magic compression applied to a 4:4:4 12-bit file. Why do they need to give you a compressed file that is also 8-bit 4:2:0? I guess the approach of Blackmagic and RED is to give you a high degree of compression but high bit depth and full color sampling.
 
Suppose we had 12 fingers, or for whatever reason we decided a long time ago to use a base-12 number system. We would have to make up a couple of new characters. Since I'm stuck on a keyboard, I will use letters. So our available digits would be: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b. So we would write "ten" as "a". And "eleven" would be "b". And then finally it's time to wrap around, so "twelve" is "10". Weird, I know!

As a small, random aside in this very fascinating and informative thread: the ancient Babylonians and Sumerians used a base-60 number system (sexagesimal). I've also seen it suggested that prior to this there would have been civilizations with base-12 numbering systems, in which, e.g., the joints of the four fingers were used in counting. I don't know how likely the latter is, but plenty of base-12 elements persist today, for example in measurements of time and length.
 