Wednesday, March 26, 2014

Head Tracking with Kinect v2

This is yet another post in my series about the new Kinect using the November 2013 developer preview SDK. Today we’re going to have some fun by combining the color, depth and body data streams (mentioned in my last few posts, here, here and here) and some interesting math to create an image that magically tracks the user’s head.

This is an early preview of the new Kinect for Windows, so the device, software and documentation are all preliminary and subject to change.


If you recall from my last post, I used the CoordinateMapper to translate the coordinates of the user’s joint information on top of the HD color image. The magic ingredient converts the Joint’s Position to a ColorSpacePoint.

Joint headJoint = body.Joints[JointType.Head];

ColorSpacePoint colorSpacePoint = 

If we take the X & Y coordinates from this ColorSpacePoint and the wonderful extension methods of the WriteableBitmapEx project, we can quickly create a cropped image of that joint.

int x = (int)Math.Floor(colorSpacePoint.X + 0.5);
int y = (int)Math.Floor(colorSpacePoint.Y + 0.5);

int size = 200;

WriteableBitmap faceImage = _bmp.Crop(new Rect(x,y,size,size));


Wow, that was easy! Although this produces an image that accurately tracks my head, the approach is somewhat flawed as it doesn’t scale based on the user’s position from the camera: if you stand too close to the camera you’ll only see a portion of your face; Stand too far and you’ll see my face and torso. We can fix this by calculating the desired size of the image based on the depth of the joint. To do this, we’ll need to obtain a DepthSpacePoint for the Joint and a simple trigonometric formula…

The DepthSpacePoint by itself doesn’t contain the depth data. Instead, it contains the X & Y coordinates from the depth image which we can use to calculate the index in the array of depth data. I’ve outlined this in a previous post, but for convenience sake here’s that formula again:

// get the depth image coordinates for the head
DepthSpacePoint depthPoint =

// use the x & y coordinates to locate the depth data
FrameDescription depthDesc = _sensor.DepthFrameSource.FrameDescription;
int depthX = (int)Math.Floor(depthPoint.X + 0.5);
int depthY = (int)Math.Floor(depthPoint.Y + 0.5);
int depthIndex = (depthY * depthDesc.Width) + depthX;

ushort depth = _depthData[depthIndex];

To calculate the desired size of the image, we need to determine the width of the joint's pixel in millimeters. We do this using a blast from the past, our best friend from high-school trigonometry, Soh-Cah-Toa.


Given that the Kinect’s Horizontal Field of View is 70.6°, we bisect this in half to form a right-angle triangle. We then take the depth value as the length of the adjacent side in millimeters. Our goal is to calculate the opposite side in millimeters, which we can accomplish using the TOA portion of the mnemonic:

tan(0) = opposite / adjacent
opposite = tan(0) * adjacent

Once we have the length of the opposite, we divide it by the number of pixels in the frame which gives us the length in millimeters for each pixel. The algorithm for calculating pixel width is shown here:

private double CalculatePixelWidth(FrameDescription description, ushort depth)
    // measure the size of the pixel
    float hFov = description.HorizontalFieldOfView / 2;
    float numPixels = description.Width / 2;

    /* soh-cah-TOA
     * TOA = tan(0) = O / A
     *   T = tan( (horizontal FOV / 2) in radians )
     *   O = (frame width / 2) in mm
     *   A = depth in mm
     *   O = A * T
    double T = Math.Tan((Math.PI * 180) / hFov);
    double pixelWidth = T * depth;

    return pixelWidth / numPixels;

Now that we know the length of each pixel, we can adjust the size of our head-tracking image to be a consistent “length”. The dimensions of the image will change as I move but the amount of space around my head remains consistent. The following calculates a 50 cm (~19”) image around the tracked position of my head:

double imageSize = 500 / CalculatePixelWidth(depthDesc, depth);

int x = (int)(Math.Floor(colorPoint.X + 0.5) - (imageSize / 2));
int y = (int)(Math.Floor(colorPoint.Y + 0.5) - (imageSize / 2));

WriteableBitmap faceImage = _bmp.Crop(new Rect(x,y, imageSize, imageSize));

Happy Coding.

Monday, March 24, 2014

Drawing Kinect V2 Body Joints

So far the posts in my Kinect for Windows v2 series have concentrated on the Depth, Color and BodyIndex data sources. Today I want to highlight how to access the Body data stream.


This is an early preview of the new Kinect for Windows, so the device, software and documentation are all preliminary and subject to change.

The new Kinect sensor and SDK has improved skeleton tracking so significantly that the Microsoft team has changed the name of their Skeleton class to Body. In addition to being able to track six skeletons (instead of 2), the API is shaping up to include features for Activities (facial features such as left eye open), Expressions (happy or neutral), Appearance (wearing glasses), Leaning (left or right) and the ability to track if the user is actively looking at the sensor. I’ll dive into those in up coming posts, but for now I want to focus on Joint information.

Generally speaking, the body tracking capability and joint positions are improved over previous version of the SDK. Specifically, the positions for hips and shoulders are more accurate. Plus version 2 has introduced 5 new joints (Neck, Hand-Tip-Left, Thumb-Left, Hand-Tip-Right, Thumb-Right) bringing the total number of joints to 25.


Getting the Body data from the sensor is slightly different than the other data streams. First you must initialize an array of Body with a specific size and then pass it into the GetAndRefreshBodyData method on the BodyFrame. The SDK uses a memory saving optimization that updates the items in this array rather than creating a new set each time. So if you want to hang onto an instance of Body between frames, you need to copy it to another variable and replace the item in the Array with a null value.

The following shows how setup and populate an array of Body objects per frame:

private void SetupCamera()
    _sensor = Sensor.Default;

    _bodies = new Body[_sensor.BodySouceFrame.BodyCount];

    _reader = _sensor.BodySourceFrame.OpenReader();
    _reader.FrameArrived += FrameArrived;

private void FrameArrived(object sender, BodyFrameArrivedEventArgs e)
    using (BodyFrame frame = e.Frame.AcquireFrame())
        if (frame == null)


Once you have a Body to work with, the joints and other features are provided to us as dictionaries which allows us to access the joints by name. For example:

Joint head = body.Joints[JointType.Head];

Each Body and Joint is also equipped with tracking confidence. Given this information, we can loop through the joints and only display the items that we know are actively being tracked. The following example iterates over the Body and Joint collections, and uses the CoordinateMapper to translate the joint position to dimensions on our color image. For simplicity sake, I’m simply coloring the surrounding pixels to illustrate their position.

private void DrawBodies(BodyFrame bodyFrame, int colorWidth, int colorHeight)


    foreach(Body body in _bodies)
        if (!body.IsTracked)

        IReadOnlyDictionary<JointType, Joint> joints = body.Joints;

        var jointPoints = new Dictionary<JointType, Point>();

        foreach(JointType jointType in joints.Keys)
            Joint joint = joints[jointType];
            if (joint.TrackingState == TrackingState.Tracked)
                ColorSpacePoint csp = _coordinateMapper.MapCameraPointToColorSpace(joint.Position);
                jointPoints[jointType] = new Point(csp.X, csp.Y);                                                                        

        foreach(Point point in jointPoints.Values)
            DrawJoint(ref _colorData, point, colorWidth, colorHeight, 10);

private void DrawJoint(ref byte[] colorData, Point point, int colorWidth, int colorHeight, int size = 10)
    int colorX = (int)Math.Floor(point.X + 0.5);
    int colorY = (int)Math.Floor(point.Y + 0.5);

    if (!IsWithinColorFrame(colorX, colorY, colorWidth, colorHeight))

    int halfSize = size/2;

    // loop through pixels around the point and make them red
    for (int x = colorX - halfSize; x < colorX + halfSize; x++)
        for(int y = colorY - halfSize; y < colorY + halfSize; y++)
            if (IsWithinColorFrame(x,y, colorWidth, colorHeight))
                int index = ((colorWidth * y) + x) * bytesPerPixel;
                colorData[index + 0] = 0;
                colorData[index + 1] = 0;
                colorData[index + 2] = 255;

private bool IsWithinColorFrame(int x, int y, int width, int height)
    return (x >=0 && x < width && y >=0 && y < height);

Happy Coding.

Wednesday, March 19, 2014

Mapping between Kinect Color and Depth

Continuing my series of blog posts about the new Kinect v2, I’d like to build upon my last post about the HD color stream with some of the depth frame concepts that I've used in the last few posts. Unlike the previous version of the Kinect, the Kinect v2 depth stream is not the same dimensions as the new color stream. This post will illustrate how the two are related and how you can map depth data to the color stream using the CoordinateMapper.

This is an early preview of the new Kinect for Windows, so the device, software and documentation are all preliminary and subject to change.

While the Kinect’s color and depth streams are represented as arrays of information, you simply cannot compare the x & y coordinates between the sets of data equally: the bytes of the color frame represent color pixels from the color camera; the bits of the depth frame represent Cartesian distance from the depth camera. Fortunately, the Kinect SDK ships with a mapping utility that can convert data between the different “spaces”.

For this post I’m going to use the MultiFrameSourceReader to access both the depth and color frames at the same time and we’ll use the CoordinateMapper.MapDepthFrameToColorSpace method to project the depth data into an array of ColorSpacePoint. Here’s the skeleton for processing frames as they arrive:

private void FrameArrived(object sender, MultiSourceFrameArrivedEventArgs e) 
    var reference = e.FrameReference; 

    MultiSourceFrame multiSourceFrame = null;
    ColorFrame colorFrame = null; 
    DepthFrame depthFrame = null; 

        using (_frameCounter.Increment()) 
            multiSourceFrame = reference.AcquireFrame(); 
            if (multiSourceFrame == null) 

            using (multiSourceFrame) 
                colorFrame = multiSourceFrame.ColorFrameReference.AcquireFrame();
                depthFrame = multiSourceFrame.DepthFrameReference.AcquireFrame(); 

                if (colorFrame == null | depthFrame == null) 

                // initialize color frame data 
                var colorDesc = colorFrame.FrameDescription; 
                int colorWidth = colorDesc.Width; 
                int colorHeight = colorDesc.Height; 

                if (_colorFrameData == null) 
                    int size = colorDesc.Width * colorDesc.Height; 
                    _colorFrameData = new byte[size * bytesPerPixel]; 

                // initialize depth frame data 
                var depthDesc = depthFrame.FrameDescription; 

                if (_depthData == null) 
                    uint depthSize = depthDesc.LengthInPixels; 
                    _depthData = new ushort[depthSize]; 
                    _colorSpacePoints = new ColorSpacePoint[depthSize]; 

                // load color frame into byte[] 
                colorFrame.CopyConvertedFrameDataToArray(_colorFrameData, ColorImageFormat.Bgra); 

                // load depth frame into ushort[] 

                // map ushort[] to ColorSpacePoint[] 
                _sensor.CoordinateMapper.MapDepthFrameToColorSpace(_depthData, _colorSpacePoints); 

                // TODO: do something interesting with depth frame 

                // render color frame 
                    new Int32Rect(0, 0, colorDesc.Width, colorDesc.Height), 
                    colorDesc.Width * bytesPerPixel, 
    catch { } 
        if (colorFrame != null) 

        if (depthFrame != null) 


The MapDepthFrameToColorSpace method copies the depth data into an array of ColorSpacePoint where each item in the array corresponds to the items in the depth data. We can use the X & Y coordinates of the ColorSpacePoint to find the color data as demonstrated below. There’s one caveat: not all points in the depth array contain data that can be mapped to color pixels. Some points might be too close or too far, or there’s no depth data because it’s a shadow or reflective material.

The following snippet shows us how to locate the color bytes from the ColorSpacePoint:

// we need a starting point, let's pick 0 for now
int index = 0;

ushort depth = _depthData[index];
ColorSpacePoint point = _colorSpacePoints[index];

// round down to the nearest pixel
int colorX = (int)Math.Floor(point.X + 0.5);
int colorY = (int)Math.Floor(point.Y + 0.5);

// make sure the pixel is part of the image
if ((colorX >= 0 && (colorX < colorWidth) && (colorY >= 0) && (colorY < colorHeight))

    int colorImageIndex = ((colorWidth * colorY) + colorX) * bytesPerPixel;

    byte b = _colorFrameData[colorImageIndex];
    byte g = _colorFrameData[colorImageIndex + 1];
    byte r = _colorFrameData[colorImageIndex + 2];
    byte a = _colorFrameData[colorImageIndex + 3];


If we loop through the depth data and use the above technique we can draw our depth data on top of our color frame. For this image, I’m drawing the depth data using an intensity technique described in an earlier post. It looks like this:


You may notice that the pixels are far apart and don’t go to the entire edge of the image. This makes sense because our depth data is a smaller resolution (424 x 512 compared to 1080 x 1920) and tighter viewing angle (70.6° compared to 84°). The mapping also isn’t perfect in this release (remember this is a developer preview!)

We can use the same technique to draw each pixel of the depth frame using values from the color frame, like so:

// clear the pixels before we color them
Array.Clear(_pixels, 0, _pixels.Length);

for (int depthIndex = 0; depthIndex < _depthData.Length; ++depthIndex)
    ColorSpacePoint point = _colorSpacePoints[depthIndex];

    int colorX = (int)Math.Floor(point.X + 0.5);
    int colorY = (int)Math.Floor(point.Y + 0.5);
    if ((colorX >= 0) && (colorX < colorWidth) && (colorY >= 0) && (colorY < colorHeight))
        int colorImageIndex = ((colorWidth * colorY) + colorX) * bytesPerPixel;
        int depthPixel = depthIndex * bytesPerPixel;

        _pixels[depthPixel] = _colorData[colorImageIndex];
        _pixels[depthPixel + 1] = _colorData[colorImageIndex + 1];
        _pixels[depthPixel + 2] = _colorData[colorImageIndex + 2];
        _pixels[depthPixel + 3] = 255;

…which results in the following image:


So as you can see, we can easily map between the two coordinate spaces. I intended to build upon this further in upcoming posts, so if you haven’t already, add me to your favorite RSS reader.

Happy coding.

Tuesday, March 18, 2014

Accessing Kinect Color Data

One of the new features of the Kinect v2 is the upgraded color camera which now provides images with 1920 x 1080 resolution. This post will illustrate how you can get access to this data stream.

This is an early preview of the new Kinect for Windows, so the device, software and documentation are all preliminary and subject to change.

Much like the other data streams, accessing the color image is done using a reader:

private void SetupCamera();

    var sensor = KinectSensor.Default;
    ColorFrameReader reader = sensor.ColorFrameSource.OpenReader();
    reader.FrameArrived += ColorFrameArrived;

In this example, I'm simply copying the frames to a WriteableBitmap which is bound to the view.

private readonly int bytesPerPixel = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
private readonly WriteableBitmap _bmp = new WriteableBitmap(1920, 1080, 96, 96, PixelFormats.Bgra32, null);
Byte[] _frameData = null;

private void ColorFrameArrived(object sender, ColorFrameArrivedEventArgs e)
    var reference = e.FrameReference;

        var frame = reference.AcquireFrame();
        if (frame == null) return;

            FrameDescription desc = frame.FrameDescription;
            var size = desc.Width * desc.Height;
            if (_frameData == null)
                _bmp = new WriteableBitmap(desc.Width, desc.Height, 96, 96, PixelFormats.Bgr32, null);
                _frameData = new byte[size * bytesPerPixel];

            frame.CopyConvertedFrameDataToArray(_frameData, ColorImageFormat.Bgra);

                new Int32Rect(0, 0, desc.Width, desc.Height),
                desc.Width * bytesPerPixel,


As you’d expect, this updates the view every 30 frames per second with a HD image. The wide-angle lens of the camera (84°) captures a fair amount of my office.


Happy Coding.