In June 2015 I worked on an Android project that involved processing images decoded by the MediaCodec video decoder. The hardest part was finding information about the video formats produced by the codecs on different phones. I've collected information on the most common formats from a sample of about a hundred thousand phones and present it here. If you find errors or want to contribute to this page, send me email at the link at the bottom of this page.

I will assume you are using the Android MediaCodec and MediaExtractor classes to read frames from an h.264 encoded video. You can read about these classes here, here and here. The following explains how to decode the contents of a frame to get RGB values in an OpenGL-ES shader. Assume the variables decoderStatus, outputBuffer and format are defined as in the MediaCodec example. Be sure to copy outputBuffer as follows (this should be in the example):



MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
int decoderStatus = codec.dequeueOutputBuffer(info, timeoutUs);
if (decoderStatus >= 0) {
    ...
    // decoderOutputBuffers comes from codec.getOutputBuffers()
    ByteBuffer outputBuffer = decoderOutputBuffers[decoderStatus];
    outputBuffer.position(info.offset);
    outputBuffer.limit(info.offset + info.size);
    // copy the frame before handing the buffer back to the codec
    ByteBuffer imageBuffer = ByteBuffer.allocate(info.size);
    imageBuffer.put(outputBuffer);
    codec.releaseOutputBuffer(decoderStatus, true);
}

It is important to copy the contents of outputBuffer before releasing it back to the codec. Any attempt to access outputBuffer after release can crash with a segmentation violation.

The video format information is contained in the format variable, which has type MediaFormat. To get the integer value of the color-format field and the image dimensions do this:

int colorFormat = format.getInteger(MediaFormat.KEY_COLOR_FORMAT);
int width = format.getInteger(MediaFormat.KEY_WIDTH);
int height = format.getInteger(MediaFormat.KEY_HEIGHT);
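
As a quick orientation, here is a minimal sketch of how you might classify the raw integer. The constant names come from MediaCodecInfo.CodecCapabilities; the vendor-specific Qualcomm values have no public SDK constant here, so they are matched against the literal values from the table below. The structure of the switch is just for illustration.

import android.media.MediaCodecInfo.CodecCapabilities;

// Sketch: classify the reported color format. Vendor-specific values
// (e.g. 0x7FA30C03 / 0x7FA30C04 from the table below) are matched by literal.
switch (colorFormat) {
    case CodecCapabilities.COLOR_FormatYUV420Planar:        // 19 (hex 13)
    case CodecCapabilities.COLOR_FormatYUV420SemiPlanar:    // 21 (hex 15)
        // generic YUV420 layouts
        break;
    case 0x7FA30C03: // OMX_QCOM_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka
    case 0x7FA30C04: // COLOR_QCOM_FormatYUV420SemiPlanar32m
        // Qualcomm-specific layouts, described later in this page
        break;
    default:
        // some other vendor format
        break;
}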

The following chart shows the distribution of Android color formats among a worldwide sample of roughly 100,000 phones as of June 2015. Four formats account for 95% of the phones. These are the generic YUV420 formats (hex 13 and 15) and the two Qualcomm formats. The Qualcomm formats correspond to the Snapdragon 400 and 800 chipsets. These two chipsets account for over 85% of the phones in the sample. Note that this sample was taken in mid-2015 and the picture could be different in a couple of years.

percent   color-format-hex   color-format-decimal   name
 0.00%    3                  3                      COLOR_Format12bitRGB444
 0.79%    13                 19                     COLOR_FormatYUV420Planar
 7.42%    15                 21                     COLOR_FormatYUV420SemiPlanar
 0.67%    19                 25                     COLOR_FormatYCbYCr
 0.00%    21                 33                     COLOR_FormatL2
 0.01%    22                 34                     COLOR_FormatL4
 0.33%    100                256
 0.02%    102                258
 0.77%    105                261
 0.03%    106                262
 2.53%    107                263
 0.00%    108                264
 0.03%    7F000001           2130706433
 0.01%    7FA00000           2141192192
 0.19%    7FA00E00           2141195776
 0.06%    7FA00F00           2141196032
24.16%    7FA30C03           2141391875             OMX_QCOM_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka
61.79%    7FA30C04           2141391876             COLOR_QCOM_FormatYUV420SemiPlanar32m
 1.17%    7FC00002           2143289346
 0.00%    7FF00001           2146435073


Colorspace


A color format is a combination of a colorspace (YUV, YCrCb or RGB) and a layout (planar, semi-planar or tiled). A colorspace is simply a system for parameterizing colors. An example is RGB, where a color is described by three parameters (R,G,B) that correspond roughly to the sensitivities of the three types of cone in the human retina. Another example is YUV, in which Y corresponds roughly to luminance and U,V parameterize chrominance. A third colorspace, YCrCb, is very similar to YUV, with parameters Cr, Cb based on the original chromaticities of color television cameras.

These three colorspaces are all related by linear transforms. Because of this it is possible to convert among them with a simple matrix-vector multiply. Given vec3 vectors yuv, rgb and ycrcb, there exist real 3x3 matrices M1 and M2 such that M1 yuv^T = rgb^T and M2 rgb^T = ycrcb^T. Definitions of M1 and M2^-1, the transforms that convert YUV and YCrCb to RGB, are given below after the descriptions of the color formats. In general, if c1, c2 and c3 are vectors in three-dimensional colorspaces S1, S2 and S3 and M c1^T = c2^T, then M^-1 c2^T = c1^T. In addition, if M1 c1^T = c2^T and M2 c2^T = c3^T, then M12 c1^T = c3^T where M12 = M2 M1 (apply M1 first). Use these relationships to derive matrices for any missing conversions.
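
For example, the BT.601 video-range YUV-to-RGB conversion used later in the fragment shader can be written as one such 3x3 matrix applied to (Y - 0.0625, U - 0.5, V - 0.5). Here is a minimal sketch in Java; the coefficients are the same ones that appear in the shader code below, and the method name is just for illustration.

// Sketch: convert one video-range YUV sample (components in [0,1]) to RGB with a
// 3x3 matrix multiply. Coefficients match the fragment shader later in this page.
static float[] yuvToRgb(float y, float u, float v) {
    float[][] m = {
        { 1.1643835616f,  0.0f,      1.5958f },   // R row
        { 1.1643835616f, -0.39173f, -0.8129f },   // G row
        { 1.1643835616f,  2.017f,    0.0f    },   // B row
    };
    float[] in  = { y - 0.0625f, u - 0.5f, v - 0.5f };
    float[] out = new float[3];
    for (int row = 0; row < 3; row++) {
        for (int col = 0; col < 3; col++) {
            out[row] += m[row][col] * in[col];
        }
    }
    return out; // {R, G, B}, not clamped to [0,1]
}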

There are many colorspaces beyond these three. One frequently useful example is Hue Saturation Value (HSV). Not all colorspaces can be converted to with a linear transform. An example is CIELAB, which uses a nonlinear (roughly cube-root) transfer function. The purpose of CIELAB is to provide a colorspace that is perceptually linear, in which a doubling of value is perceived as a doubling of intensity. Since the retinal response function is nonlinear, it follows that a nonlinear colorspace like CIELAB is required to provide the illusion of perceptual linearity.

Layout

We will consider only YUV420 formats since those account for the overwhelming majority of phones. In YUV420 there is one chroma sample pair (U,V) for every block of four luminance (Y) pixels, so the Y plane is four times as large as the U or V plane. Different versions of YUV420 lay out the U and V planes differently or have special alignment requirements.

In YUV420Planar the three planes are packed contiguously with no intervening padding. The total amount of space in memory is 1.5 bytes for every pixel.

In YUV420SemiPlanar the U and V planes are interleaved to make a single UV plane that is half the size of the Y plane. The total space is 1.5 bytes per pixel.
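
As a concrete illustration, here is how the sizes and offsets work out for the two generic layouts. This is a sketch using a made-up 176x144 frame, not tied to any particular device, and it assumes the U plane precedes the V plane in the planar layout.

// Sketch: plane sizes and byte offsets for a hypothetical 176x144 YUV420 frame.
int width = 176, height = 144;
int ySize  = width * height;            // 25344 bytes in both layouts
int uvSize = width * height / 2;        // 12672 bytes of chroma in total

// YUV420Planar: Y plane, then U plane, then V plane
int uOffsetPlanar = ySize;              // 25344
int vOffsetPlanar = ySize + uvSize / 2; // 31680

// YUV420SemiPlanar: Y plane, then one interleaved UV plane (U0 V0 U1 V1 ...)
int uvOffsetSemiPlanar = ySize;         // 25344

int totalBytes = ySize + uvSize;        // 38016 = 1.5 bytes per pixel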

In OMX_QCOM_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka the data is arranged in tiles of size 64x32 pixels. The UV plane is interleaved. The tiles are distributed according to a zigzag pattern that is described below. Data is aligned to 64 pixels horizontally and 32 pixels vertically. In addition each plane is aligned on an 8 KB boundary.

In COLOR_QCOM_FormatYUV420SemiPlanar32m the data is aligned to 64 pixels horizontally and 32 pixels vertically.

The Qualcomm formats introduce padding and as a result usually require more than 1.5 bytes per pixel.

All formats that are decoded from h.264 are aligned to at least 16 pixels horizontally and vertically. This is the macroblock size for baseline h.264 on Android. For example, a 1920x1080 stream typically decodes into a buffer that is 1920x1088.


Chipsets


Most phones are based on a small number of chipsets or chipset families. Each chipset supports only a few color formats, usually two.
To find the chipset of your phone, execute the following command in a shell: 'adb shell "cat /proc/cpuinfo" | grep Hardware'. For example, on a Sony Xperia Z3:

Alans-MacBook-Pro $ adb shell "cat /proc/cpuinfo"| grep Hardware
Hardware : Qualcomm MSM8974PRO-AC

This shows that this phone uses a Qualcomm chipset. A Google search reveals this is a variant of the Snapdragon 800, one of the most popular chipsets for current generation phones. The vast majority of phones in our sample were based on either Qualcomm or Samsung chipsets. Other vendors include MediaTek, Kirin, Universal and Sony.

OpenGL-ES shader code


The following examples are for OpenGL-ES 2.0. Rendering a YUV image as RGBA in a visible GLSurfaceView requires these steps:
  1. on the host: load the Y and UV planes as textures
  2. in the shader: sample Y,U,V for a pixel
  3. in the shader: convert Y,U,V to RGB

Load Y and UV planes


Before you can load the Y and UV data into the GPU you have to calculate some sizes and take alignment into account. Here is some Java code to do those calculations:

private int roundUp(int x, int factor) {
    return (((x - 1) / factor) + 1) * factor;
}

private void semiPlanarTiled(YUVUploadDescriptor in, int horizontalAlignment, int verticalAlignment) {
    int blockSize = horizontalAlignment * verticalAlignment;
    int blockGroupSize = blockSize * 4;

    in.yBufferOffset = 0;

    int absoluteTilesX = (in.width - 1) / horizontalAlignment + 1;
    in.numTilesX = roundUp(absoluteTilesX, 2);
    in.numTilesY_Y = (in.height - 1) / verticalAlignment + 1;
    in.numTilesY_UV = (in.height / 2 - 1) / verticalAlignment + 1;

    in.yBufferSize = in.numTilesY_Y * in.numTilesX * blockSize;
    if (in.yBufferSize % blockGroupSize != 0) {
        in.yBufferSize = ((in.yBufferSize - 1) / blockGroupSize + 1) * blockGroupSize;
    }
    in.yHorizontalStride = in.numTilesX * horizontalAlignment;
    in.yVerticalSpan = in.yBufferSize / in.yHorizontalStride;
    // the upload code below reads uvHorizontalStride for this layout too
    in.uvHorizontalStride = in.yHorizontalStride;
    in.uvVerticalSpan = roundUp(in.height / 2, verticalAlignment);

    in.uvBufferSize = in.numTilesX * in.numTilesY_UV * blockSize;
    in.uvBufferOffset = in.yBufferSize;

    in.expectedFileLength = in.yBufferSize + in.uvBufferSize;
    in.orderVU = 0;
    in.layout = layoutSemiplanarTiled;
    in.colorspace = colorspaceYUV;
}

private void semiPlanarAligned(YUVUploadDescriptor in, int horizontalAlignment, int verticalAlignment) {
    in.yHorizontalStride = roundUp(in.width, horizontalAlignment);
    in.yVerticalSpan = roundUp(in.height, verticalAlignment);
    in.uvHorizontalStride = in.yHorizontalStride;
    in.uvVerticalSpan = in.yVerticalSpan / 2;
    in.yBufferSize = in.yHorizontalStride * in.yVerticalSpan;
    in.uvBufferSize = in.uvHorizontalStride * in.uvVerticalSpan;
    in.yBufferOffset = 0;
    in.uvBufferOffset = in.yBufferSize;
    in.expectedFileLength = in.yBufferSize + in.uvBufferSize;
    in.orderVU = 0;
    in.layout = layoutSemiplanar;
    in.colorspace = colorspaceYUV;
}

private void planar(YUVUploadDescriptor in) {
    in.yHorizontalStride = in.width;
    in.yVerticalSpan = in.height;
    in.yBufferSize = in.yHorizontalStride * in.yVerticalSpan;
    // the U and V planes are stacked vertically in a single (width/2 x height) texture
    in.uvHorizontalStride = in.width / 2;
    in.uvVerticalSpan = in.height;
    in.yBufferOffset = 0;
    in.uvBufferSize = in.uvHorizontalStride * in.uvVerticalSpan;
    in.uvBufferOffset = in.yBufferSize;
    in.expectedFileLength = in.yBufferSize + in.uvBufferSize;
    in.orderVU = 0;
    in.layout = layoutPlanar;
    in.colorspace = colorspaceYUV;
}

YUVUploadDescriptor(MediaFormat mediaFormat) {

    int width = mediaFormat.getInteger(MediaFormat.KEY_WIDTH);
    int height = mediaFormat.getInteger(MediaFormat.KEY_HEIGHT);
    HardwareAbstractionLayer.ColorFormatEnum format = HardwareAbstractionLayer.colorFormat(mediaFormat);

    // the layout helpers read these fields
    this.width = width;
    this.height = height;

    colorFormat = -1;

    if (format != null) {
        colorFormat = format.getValue();

        switch (format) {

            case COLOR_FormatYUV420Planar:
                planar(this);
                break;

            case COLOR_FormatYUV420SemiPlanar:
                semiPlanarAligned(this, 1, 1);
                break;

            case COLOR_FormatYUV420PlanarVU:
                planar(this);
                orderVU = 1; // V plane precedes U plane
                break;

            case OMX_QCOM_COLOR_FormatYUV420PackedSemiPlanar64x32Tile2m8ka:
                semiPlanarTiled(this, 64, 32);
                break;

            case OMX_QCOM_COLOR_FormatYUV420PackedSemiPlanar32m:
                semiPlanarAligned(this, 64, 32);
                break;

            default:
                planar(this);
                break;
        }
    } else {
        planar(this);
    }
}
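
A typical use is to build the descriptor from the decoder's output format and sanity-check the copied frame before uploading. This is a sketch; it assumes the frame bytes were copied into imageBuffer as shown earlier, and TAG is whatever logging tag your class uses.

// Sketch: verify that the copied frame is at least as large as the layout says it should be.
YUVUploadDescriptor uploadDescriptor = new YUVUploadDescriptor(format);
if (imageBuffer.capacity() < uploadDescriptor.expectedFileLength) {
    Log.w(TAG, "decoded frame smaller than expected: "
            + imageBuffer.capacity() + " < " + uploadDescriptor.expectedFileLength);
}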


With these calculations in hand it is easy to upload the textures. Note that planar and semi-planar UV textures are uploaded differently.

GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, plane0_texture);
image.getBytes().rewind();

GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_LUMINANCE,
        uploadDescriptor.yHorizontalStride, uploadDescriptor.yVerticalSpan, 0,
        GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE, image.getBytes());

GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, plane1_texture);

image.getBytes().position(uploadDescriptor.uvBufferOffset);

if (uploadDescriptor.layout == YUVUploadDescriptor.layoutSemiplanar
        || uploadDescriptor.layout == YUVUploadDescriptor.layoutSemiplanarTiled) {
    // interleaved UV: each texel holds a (U,V) pair, so the texture is half as wide
    GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_LUMINANCE_ALPHA,
            uploadDescriptor.uvHorizontalStride / 2, uploadDescriptor.uvVerticalSpan, 0,
            GLES20.GL_LUMINANCE_ALPHA, GLES20.GL_UNSIGNED_BYTE, image.getBytes());
} else { // planar
    GLES20.glTexImage2D(GLES20.GL_TEXTURE_2D, 0, GLES20.GL_LUMINANCE,
            uploadDescriptor.uvHorizontalStride, uploadDescriptor.uvVerticalSpan, 0,
            GLES20.GL_LUMINANCE, GLES20.GL_UNSIGNED_BYTE, image.getBytes());
}
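
One detail worth mentioning, not part of the original example: glTexImage2D assumes a 4-byte row alignment by default, so if a computed stride is not a multiple of 4 the upload will be skewed. The strides above are usually well aligned, but as a precaution you can relax the unpack alignment before uploading.

// Sketch: relax GL's default 4-byte row alignment so odd strides upload correctly.
GLES20.glPixelStorei(GLES20.GL_UNPACK_ALIGNMENT, 1);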

After the image data are uploaded to the GPU, render an image to the unit square. You can use any vertex shader you like, including a pass-through. The following code occurs in the fragment shader. The uniforms are loaded from values in the uploadDescriptor.

Sample Y,U,V for each pixel

"#version 100\n" +
"uniform int orderVU;" + // 0==UV, 1==VU
"uniform int layout_;" + // 0==planar, 1=semiplanar, 2=tiledSemiplanar
"uniform int colorspace;" + // 0==rgba, 1==yuv, 2==yCrCb
"uniform int numTilesX;" +
"uniform int numTilesY_Y;" +
"uniform int numTilesY_UV;" +
"uniform int frameWidth;" +
"uniform int frameHeight;" +

"vec4 YUV(sampler2D plane0, sampler2D plane1, vec2 coordinate){" +
" vec4 yuv;" +
" if(layout_ == 0) yuv = planarYUV(plane0, plane1, coordinate);" +
" if(layout_ == 1) yuv = semiplanarYUV(plane0, plane1, coordinate);" +
" if(layout_ == 2) yuv = tiledSemiplanarYUV(plane0, plane1, coordinate);" +
" if(orderVU == 0) return yuv;" +
" return vec4(yuv.x, yuv.z, yuv.y, yuv.w);" +
"}" +

"vec4 planarYUV(sampler2D plane0, sampler2D plane1, vec2 coordinate){" +
" float Y = texture2D(plane0, coordinate).r;" +
" float U = texture2D(plane1, coordinate * vec2(1.0, 0.5)).r;" +
" float V = texture2D(plane1, coordinate * vec2(1.0, 0.5) + vec2(0.0, 0.5)).r;" +
" return vec4(Y, U, V, 1.0);" +
"}" +

"vec4 semiplanarYUV(sampler2D plane0, sampler2D plane1, highp vec2 coordinate){" +
" float Y = texture2D(plane0, coordinate).r;" +
" vec4 UV = texture2D(plane1, coordinate);" +
" return vec4(Y, UV.r, UV.a, 1.0);" +
"}" +

The shader for the tiled semi-planar format is one of the most difficult I have written. This format is used in Qualcomm 400 and 600 series chipsets. The shader starts with a sample point in the target image, looks up a tile index based on that sample point, locates the base of that tile in the source image, and finally indexes into the tile to find the source sample point. The following algorithm works correctly running on a laptop with 32-bit floating point and 16-bit (int16_t) integers. It fails on the Qualcomm chipsets I have tried it with due to their limited 24-bit floating point precision. More specifically, when I read back the value of target in tiledSemiplanarCoordinate there is a discrepancy between the Qualcomm and Intel (laptop) implementations, apparently due to limited precision in the Qualcomm vertex shader. The result is that some tiles on the screen are indexed incorrectly, producing horrible artifacts. If you are able to make this shader work on the phone please let me know about it.

// tiled semiplanar YUV420 format
// based on /media/libstagefright/colorconversion/ColorConverter.cpp

// Because of the zigzag pattern, blocks are numbered like this:
//
//            | Column (bx)
//            |  0   1   2   3   4   5   6   7
//     -------|---------------------------------------
//          0 |  0   1   6   7   8   9  14  15
//      R   1 |  2   3   4   5  10  11  12  13
//      o   2 | 16  17  22  23  24  25  30  31
//      w   3 | 18  19  20  21  26  27  28  29
//          4 | 32  33  38  39  40  41  46  47
//     (by) 5 | 34  35  36  37  42  43  44  45
//          6 | 48  49  50  51  52  53  54  55

// From this we can see that:

// For even rows:
// - The first block in a row is always mapped to memory block by*nbx.
// - For all even rows, except for the last one when nby is odd, from the first
// block number an offset is then added to obtain the block number for
// the other blocks in the row. The offset is bx plus the corresponding
// number in the series [0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, ...], which can be
// expressed as ((bx+2) & ~(3)).
// - For the last row when nby is odd the offset is simply bx.
//
// For odd rows:
// - The first block in the row is always mapped to memory block
// (by & (~1))*nbx + 2.
// - From the first block number an offset is then added to obtain the block
// number for the other blocks in the row. The offset is bx plus the
// corresponding number in the series [0, 0, 0, 0, 4, 4, 4, 4, 8, 8, 8, 8, 12, ...],
// which can be expressed as (bx & ~(3)).

// The above describes the layout of the decoded h.264 frames for a Fyuse, which are
// in landscape orientation. The app always renders images in portrait orientation.
// This means the x coordinate in rendering is mapped to the y coordinate in the frame
// and vice versa.

"precision highp int;" +
"precision highp float;" +
"precision highp sampler2D;" +

"int truncate(int x, int powerOfTwo){ return ((x / powerOfTwo) * powerOfTwo); }\n" +

"bool isEven(int x){ return x == truncate(x, 2); }\n" +

"int getTiledMemoryBlockIndex(int tileX, int tileY, int numTilesX, int numTilesY){\n" +
" int base;\n" +
" int offset;\n" +
" if(isEven(tileY)) {\n" +
" base = tileY * numTilesX;\n" +
" if(!isEven(numTilesY) && (tileY == (numTilesY - 1))) {\n" +
" offset = tileX;\n" +
" } else {\n" +
" offset = tileX + truncate(tileX + 2, 4);\n" +
" }\n" +
" } else {\n" +
" base = truncate(tileY, 2) * numTilesX + 2;\n" +
" offset = tileX + truncate(tileX, 4);\n" +
" }\n" +
" return base + offset;\n" +
"}\n" +


"vec2 tiledSemiplanarCoordinate(vec2 coordinate, int numTilesX, int numTilesY){\n" +

// note this shader is written for a portrait orientation render.

" // target is the pixel location in the rendered image\n" +
" vec2 target = floor(coordinate * vec2(float(frameWidth), float(frameHeight)));\n" +

" // index of target tile in the rendered image\n" +
" vec2 tileId = floor(target / vec2(64.0, 32.0));\n" +

" // index of target tile in the source image\n" +
" int sourceTileIndex = getTiledMemoryBlockIndex(int(tileId.x), int(tileId.y), numTilesX, numTilesY);\n" +

" // row of tile base in source image\n" +
" // (don't rearrange operations, floating point errors would occur)\n" +
" float rowBase = (float(sourceTileIndex) / float(frameWidth)) * 2048.0;\n" +
" \n" +
" // base of target tile in the rendered image\n" +
" vec2 tileBase = tileId * vec2(64.0, 32.0);\n" +
" \n" +
" // additional rows due to offset from tile base\n" +
" vec2 offset = target - tileBase;\n" +
" float totalOffset = floor(offset.y * 64.0 + offset.x);\n" +
" float rowOffset = totalOffset / float(frameWidth);\n" +
" \n" +
" // location of source pixel in source image\n" +
" vec2 source = vec2(floor(fract(rowBase + rowOffset) * float(frameWidth)) / float(frameWidth)," +
" floor(rowBase + rowOffset) / float(frameHeight));\n" +
" return source;\n" +

"}\n" +

"vec4 tiledSemiplanarYUV(sampler2D plane0, sampler2D plane1, vec2 coordinate){\n" +

" float Y = texture2D(plane0, tiledSemiplanarCoordinate(coordinate, numTilesX, numTilesY_Y)).x;\n"
" vec4 UV = texture2D(plane0, tiledSemiplanarCoordinate(coordinate, numTilesX, numTilesY_UV));\n" +
" return vec4(Y, UV.r, UV.a, 1.0);" +
"}" +

Convert Y,U,V to R,G,B

Now the fragment shader returns the color as an RGBA vector. This example shows how to convert both from YUV and from YCrCb.

"vec4 RGBA(sampler2D plane0, sampler2D plane1, vec2 coordinate){" +
" if(colorspace == 0) return vec4(texture2D(plane0, coordinate).rgb, 1.0);" + // RGB colorspace
" vec4 yuv = YUV(plane0, plane1, coordinate);" +
" if(colorspace == 1){" + // YUV
" float R = (1.1643835616 * (yuv.x - 0.0625) + 1.5958 * (yuv.z - 0.5));" +
" float G = (1.1643835616 * (yuv.x - 0.0625) - 0.8129 * (yuv.z - 0.5) - 0.39173 * (yuv.y - 0.5));" +
" float B = (1.1643835616 * (yuv.x - 0.0625) + 2.017 * (yuv.y - 0.5));" +
" return vec4(R, G, B, 1.0);" +
" }" +
" if(colorspace == 2){" + // YCrCb
" float R = (1.1643835616 * (yuv.x - 0.0625) + 2.2767857143 * 0.701 * (yuv.y - 0.5));" +
" float G = (1.1643835616 * (yuv.x - 0.0625) - 2.2767857143 * 0.886 * (0.114 / 0.587) * " +
" (yuv.z - 0.5) - 2.2767857143 * 0.701 * (0.299 / 0.587) * (yuv.y - 0.5));" +
" float B = (1.1643835616 * (yuv.x - 0.0625) + 2.2767857143 * 0.886 * (yuv.z - 0.5));" +
" return vec4(R, G, B, 1.0);" +
" }" +
" return vec4(0.333, 0.333, 0.333, 1.0);" + // default
"}" +

Finally, the main function.

"uniform sampler2D image_plane0_texture;" +
"uniform sampler2D image_plane1_texture;" +

"varying vec2 texture_coordinate;" +

"void main ()" +
"{" +
" gl_FragColor = RGBA(image_plane0_texture, image_plane1_texture, texture_coordinate);" +
"}" +

I hope this helps.