Wednesday, January 08, 2014

Accessing Depth data with Kinect v2

This is an early preview of the new Kinect for Windows, so the device, software and documentation are all preliminary and subject to change.

Accessing Depth Data for the new Kinect for Windows API v2 is easy and very similar to the previous version of the API. We’ll do some more with this in a bit, but here’s a code snippet that’s been paraphrased from the v2 samples:

// Get the Sensor
this.sensor = KinectSensor.Default;

if (this.sensor != null)
{
    // If we have a sensor, open it
    this.sensor.Open();

    // this example will use a BGR format image...
    // (despite the name, this value is the number of bytes per pixel: 4 for Bgr32)
    this.bitsPerPixel = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;

    // initialize some arrays based on the depth frame size
    FrameDescription desc = this.sensor.DepthFrameSource.FrameDescription;
    int size = desc.Width * desc.Height;
    this.frameData = new ushort[size];
    this.pixels = new byte[size * this.bitsPerPixel];

    // the bitmap uses the same pixel format as the byte array above
    this.bitmap = new WriteableBitmap(desc.Width, desc.Height, 96.0, 96.0, PixelFormats.Bgr32, null);

    // obtain a reader and subscribe to events
    this.reader = this.sensor.DepthFrameSource.OpenReader();
    this.reader.FrameArrived += FrameArrived;
}

void FrameArrived(object sender, DepthFrameArrivedEventArgs e)
{
    DepthFrameReference frameReference = e.FrameReference;

    try
    {
        DepthFrame frame = frameReference.AcquireFrame();

        if (frame != null)
        {
            // frame is disposable
            using (frame)
            {
                var desc = frame.FrameDescription;

                // copy the raw depth values (in millimeters) into our array
                frame.CopyFrameDataToArray(this.frameData);

                int colorPixelIndex = 0;

                // loop through the frame data
                for (int i = 0; i < this.frameData.Length; ++i)
                {
                    ushort depth = this.frameData[i];

                    // do something interesting with depth data here
                    // see examples below...
                    // Color color = this.colorMap.GetColorForDepth(depth);
                    Color color = new Color();

                    // assign values to individual pixels
                    // BGRA byte order
                    this.pixels[colorPixelIndex++] = color.B;
                    this.pixels[colorPixelIndex++] = color.G;
                    this.pixels[colorPixelIndex++] = color.R;
                    ++colorPixelIndex; // skip the fourth (alpha) byte
                }

                // push the converted pixels into the bitmap
                this.bitmap.WritePixels(
                    new Int32Rect(0, 0, desc.Width, desc.Height),
                    this.pixels,
                    desc.Width * this.bitsPerPixel,
                    0);
            }
        }
    }
    catch (Exception)
    {
        // frame might no longer be available
    }
}

Fairly straightforward, right? The biggest difference from v1 is that we must access the data from an event on the reader rather than an event on the sensor.


The example in the SDK truncates the depth data down to the lowest byte, which produces a grayscale effect that “wraps” every 256 millimeters. It doesn’t look very nice, but the key thing to observe is just how detailed the depth data is. If you look closely, you can clearly make out facial features and other fine details such as the lines in my hands.


// paraphrased from the SDK examples..
// truncates the depth data down to the lowest byte, which
// produces an "intensity" value between 0-255

// the reliable range is reported by the depth frame itself
ushort minDepth = frame.DepthMinReliableDistance;
ushort maxDepth = frame.DepthMaxReliableDistance;

// To convert to a byte, we're discarding the most-significant
// rather than the least-significant bits.
// We're preserving detail, although the intensity will "wrap."
// Values outside the reliable depth range are mapped to 0 (black).
byte intensity = (byte)(depth >= minDepth && depth <= maxDepth ? depth : 0);

// convert intensity to a color (to fit into our example above)
var color = new Color()
{
    R = intensity,
    G = intensity,
    B = intensity
};
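
To make the wrap described above concrete, here is a minimal sketch that runs without a sensor. The `DepthIntensity` class and `ToIntensity` method are hypothetical names, not part of the Kinect SDK, and the 500 to 4500 mm range in the usage note is an assumed stand-in for the values the frame would report.

```csharp
using System;

// Hypothetical helper, not part of the Kinect SDK.
public static class DepthIntensity
{
    // Keeps only the low byte of an in-range depth (in millimeters);
    // depths outside [minDepth, maxDepth] become 0 (black).
    public static byte ToIntensity(ushort depth, ushort minDepth, ushort maxDepth)
    {
        bool reliable = depth >= minDepth && depth <= maxDepth;
        return reliable ? (byte)(depth & 0xFF) : (byte)0;
    }
}
```

With a range of 500 to 4500, a depth of 1023 mm maps to 255 (white) while 1024 mm maps to 0 (black), so two points one millimeter apart land at opposite ends of the grayscale. That jump, repeated every 256 mm, is the banding visible in the image.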

Prettier examples

Here are a few examples I’ve cooked up. From past experience with the Kinect v1, we observed that you can improve performance by reducing the number of calculations per frame. One approach I’ve used is to pre-calculate the results into a hash table for depths between 500 and 4500 millimeters. There are likely other optimizations that can be made, but this one is easy.
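
The pre-calculation idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the code used for the images in this post: `DepthLookup` and its nested `Rgb` struct are hypothetical names (the struct stands in for `System.Windows.Media.Color` so the sketch runs without WPF), and it uses a plain array indexed by depth instead of a hash table, since the depth range is small and contiguous.

```csharp
using System;

// Hypothetical class: caches one color per millimeter between minDepth and maxDepth.
public sealed class DepthLookup
{
    // Stand-in for System.Windows.Media.Color (an assumption for this sketch).
    public struct Rgb
    {
        public byte R, G, B;
    }

    private readonly ushort minDepth;
    private readonly ushort maxDepth;
    private readonly Rgb[] table;

    public DepthLookup(ushort minDepth, ushort maxDepth, Func<ushort, Rgb> compute)
    {
        if (maxDepth < minDepth)
        {
            throw new ArgumentException("maxDepth must be >= minDepth");
        }

        this.minDepth = minDepth;
        this.maxDepth = maxDepth;
        this.table = new Rgb[maxDepth - minDepth + 1];

        // Pay for the color calculation once per depth value, up front.
        for (int d = minDepth; d <= maxDepth; ++d)
        {
            this.table[d - minDepth] = compute((ushort)d);
        }
    }

    // Per-pixel work is now a range check and an array read.
    public Rgb GetColorForDepth(ushort depth)
    {
        if (depth < this.minDepth || depth > this.maxDepth)
        {
            return default(Rgb); // black for depths outside the cached range
        }

        return this.table[depth - this.minDepth];
    }
}
```

For a 500 to 4500 mm range the table holds 4001 entries, so the expensive function runs 4001 times at startup instead of once per pixel on every frame.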

For fun, I’ve created an abstraction called a DepthMap:

abstract class DepthMap
{
    protected ushort minDepth;
    protected ushort maxDepth;

    protected DepthMap(ushort minDepth, ushort maxDepth)
    {
        this.minDepth = minDepth;
        this.maxDepth = maxDepth;
    }

    public abstract Color GetColorForDepth(ushort depth);
}

HSV Color Wheel

Rather than a grayscale effect, I found a few great online samples on how to implement an HSV color wheel, which produces this fun rainbow effect. I’m gradually transitioning between colors by looking at the most significant bits.


double h, s, v;
int r, g, b;

// cast to double first; integer division would truncate the ratio to 0
h = ((double)(depth - minDepth) / (maxDepth - minDepth)) * 360;
v = 1;
s = 1;

Hsv.HsvToRgb(h, v, s, out r, out g, out b);

return new Color {
        R = (byte)r,
        G = (byte)g,
        B = (byte)b
};
Color Gradients

The formula to produce a gradient is rather simple: you take a percentage of one color and the remaining percentage of another color. This color map allows me to pick the colors I want to use, which produces some interesting effects. Using an array of colors (Red, Yellow, Green, Yellow, Red), I can color code depth to reflect an ideal position from the camera.
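
The blending formula can be sketched like this. It is an illustration only, not the `GradientColorMap` implementation: `GradientSketch` and its nested `Rgb` struct are hypothetical names (the struct stands in for `System.Windows.Media.Color` so the sketch runs without WPF), and it assumes at least two color stops spaced evenly across the depth range, with `maxDepth` greater than `minDepth`.

```csharp
using System;

// Hypothetical sketch of a multi-stop gradient; not the post's GradientColorMap.
public static class GradientSketch
{
    // Stand-in for System.Windows.Media.Color (an assumption for this sketch).
    public struct Rgb
    {
        public byte R, G, B;

        public Rgb(byte r, byte g, byte b)
        {
            R = r;
            G = g;
            B = b;
        }
    }

    // Blends two colors: t = 0 gives 'from', t = 1 gives 'to'.
    public static Rgb Lerp(Rgb from, Rgb to, double t)
    {
        return new Rgb(
            (byte)Math.Round(from.R + (to.R - from.R) * t),
            (byte)Math.Round(from.G + (to.G - from.G) * t),
            (byte)Math.Round(from.B + (to.B - from.B) * t));
    }

    // Finds the pair of stops that brackets the depth, then blends between them.
    public static Rgb GetColorForDepth(ushort depth, ushort minDepth, ushort maxDepth, Rgb[] stops)
    {
        // Clamp so depths outside the range take the first or last stop.
        double clamped = Math.Max(minDepth, Math.Min(maxDepth, depth));

        // Position along the gradient, measured in stops (0 .. stops.Length - 1).
        double position = (clamped - minDepth) / (maxDepth - minDepth) * (stops.Length - 1);

        // Lower stop of the bracketing pair; capped so index + 1 stays in bounds.
        int index = Math.Min((int)position, stops.Length - 2);

        return Lerp(stops[index], stops[index + 1], position - index);
    }
}
```

With the five stops Red, Yellow, Green, Yellow, Red over 500 to 4500 mm, a depth of 2500 mm sits exactly on the middle stop and comes out pure green, while depths toward either end drift through yellow to red.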


var colorMap = new GradientColorMap(
    new[] { 

I can also create a boring grayscale effect by just using Black and White.

var colorMap = new GradientColorMap(
    new[] {

You can see some of them in action here:

Happy coding!


Bernard Harris said...

Thanks for the interesting examples. We currently use the V1 sensor from primesense, and were interested in the minimum usable distance on the V2. I saw that you mapped distances from 500mm - is it possible to get a decent image at less than 500mm? Also, what is the depth resolution at minimum depth?


bryan said...


There are really two different flavours of depth camera: short range and long range. The Kinect, both v1 and v2, are long range cameras suited to 0.5m to 4m ranges, though the upper range isn't capped and can be higher given the right environmental conditions. Though the technology between v1 and v2 has changed (the concept is similar: bounce IR off the subject and measure) it simply cannot see or triangulate distances that are too close to the sensor. In the API, objects that are less than 500mm from the camera are given a depth value of zero.

Other cameras, like the LeapMotion and SoftKinetic DepthSense, are near range cameras that are better suited to distances of up to 3 feet. Those APIs are obviously targeting hands or face whereas the Kinect is more body and environment.