Friday, January 10, 2014

Finding Pixels for Bodies in Kinect for Windows v2

In this post, I’m using Depth and BodyIndex data to create a depth visualization that highlights the active users.

This is an early preview of the new Kinect for Windows, so the device, software and documentation are all preliminary and subject to change.

[Screenshot: depth visualization with the active users highlighted by the color map (KinectScreenshot-ColorMap-09-29-48)]

This post illustrates a few breaking changes from v1 of the Kinect for Windows API.

First off, the event signature to receive all data from the sensor has changed. Rather than getting data from the AllFramesReady event of the sensor, version 2 now uses a MultiSourceFrameReader that has a MultiSourceFrameArrived event. The MultiSourceFrameReader allows you to specify which data streams you wish to receive. This is a relatively minor change.

The other minor change this post illustrates is that the depth data no longer embeds the player index in its bits. Rather than applying a bitmask to each depth value, the body index is now exposed as its own data stream.
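For comparison, here's roughly what the v1 approach looked like, using the v1 SDK's DepthImageFrame bitmask constants (quoted from memory, so double-check against the v1 docs):

// Kinect for Windows v1: the player index was packed into the
// low bits of each 16-bit depth value
short raw = depthData[i];
int playerIndex = raw & DepthImageFrame.PlayerIndexBitmask;
int depthInMillimeters = raw >> DepthImageFrame.PlayerIndexBitmaskWidth;

In v2, you simply ask for the BodyIndex stream alongside Depth when you open the reader: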

this.sensor = KinectSensor.Default;
this.sensor.Open();
this.reader = sensor.OpenMultiSourceFrameReader(
                        FrameSourceTypes.BodyIndex |
                        FrameSourceTypes.Depth
                        );
this.reader.MultiSourceFrameArrived += MultiSourceFrameArrived;

DepthFrameDescription desc = sensor.DepthFrameSource.FrameDescription;

this.bytesPerPixel = (PixelFormats.Bgr32.BitsPerPixel + 7) / 8;
this.depthFrameData = new ushort[desc.Width * desc.Height];
this.bodyIndexFrameData = new byte[desc.Width * desc.Height];

this.depthWidth = desc.Width;   // cached for the WritePixels call later
this.depthHeight = desc.Height;

this.pixels = new byte[desc.Width * desc.Height * this.bytesPerPixel];
this.bitmap = new WriteableBitmap(desc.Width, desc.Height, 96, 96, PixelFormats.Bgr32, null);

When the frames arrive, I fill both the depth and body-index arrays with the available data. All that's left is to loop through the depth data and color the pixels. In this case, I'm using the GradientColorMap from my last post to color player pixels differently than background pixels.

void MultiSourceFrameArrived(object sender, MultiSourceFrameArrivedEventArgs e)
{
    MultiSourceFrameReference frameReference = e.FrameReference;

    MultiSourceFrame multiSourceFrame = null;
    DepthFrame depthFrame = null;
    BodyIndexFrame bodyIndexFrame = null;

    try
    {
        multiSourceFrame = frameReference.AcquireFrame();
        if (multiSourceFrame != null)
        {
            var depthFrameReference = multiSourceFrame.DepthFrameReference;
            var bodyIndexFrameReference = multiSourceFrame.BodyIndexFrameReference;

            depthFrame = depthFrameReference.AcquireFrame();
            bodyIndexFrame = bodyIndexFrameReference.AcquireFrame();

            if ((depthFrame != null) && (bodyIndexFrame != null))
            {
                depthFrame.CopyFrameDataToArray(this.depthFrameData);
                bodyIndexFrame.CopyFrameDataToArray(this.bodyIndexFrameData);

                int pixelIndex = 0;
                for (int i = 0; i < depthFrameData.Length; ++i)
                {
                    ushort depth = depthFrameData[i];
                    byte player = this.bodyIndexFrameData[i];
                    Color pixelColor;

                    if (player == 0xff) // nobody here
                    {
                        pixelColor = this.grayscaleMap.GetColorForDepth(depth);
                    }
                    else
                    {
                        pixelColor = this.playerMap.GetColorForDepth(depth);
                    }

                    pixels[pixelIndex++] = pixelColor.B;
                    pixels[pixelIndex++] = pixelColor.G;
                    pixels[pixelIndex++] = pixelColor.R;
                    pixelIndex++; // skip the unused alpha byte in Bgr32
                }

                this.bitmap.WritePixels(
                    new Int32Rect(0, 0, this.depthWidth, this.depthHeight),
                    this.pixels,
                    this.depthWidth * this.bytesPerPixel,
                    0);
            }

        }

    }
    catch
    {
        // a frame can become invalid while we're reading it;
        // skip it and wait for the next one
    }
    finally
    {
        // MultiSourceFrame, DepthFrame, and BodyIndexFrame are IDisposable
        if (depthFrame != null)
        {
           depthFrame.Dispose();
           depthFrame = null;
        }

        if (bodyIndexFrame != null)
        {
            bodyIndexFrame.Dispose();
            bodyIndexFrame = null;
        }

        if (multiSourceFrame != null)
        {
            multiSourceFrame.Dispose();
            multiSourceFrame = null;
        }
    }

}
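If you don't have the GradientColorMap from the previous post handy, here's a minimal sketch of the shape this code assumes: a linear blend between a near color and a far color. The 500-4500mm range constants below are my assumption for the sensor's useful range, not values from the SDK, and the real implementation in the previous post supports multiple gradient stops.

// Minimal sketch only -- see the previous post for the real GradientColorMap.
public class GradientColorMap
{
    private const ushort MinDepth = 500;   // assumed useful range, in mm
    private const ushort MaxDepth = 4500;

    private readonly Color nearColor;
    private readonly Color farColor;

    public GradientColorMap(Color nearColor, Color farColor)
    {
        this.nearColor = nearColor;
        this.farColor = farColor;
    }

    public Color GetColorForDepth(ushort depth)
    {
        if (depth == 0)
            return Colors.Black; // no depth reading for this pixel

        // clamp to the assumed range, then interpolate
        int clamped = Math.Min(Math.Max(depth, MinDepth), MaxDepth);
        double t = (clamped - MinDepth) / (double)(MaxDepth - MinDepth);

        return Color.FromRgb(
            (byte)(nearColor.R + ((farColor.R - nearColor.R) * t)),
            (byte)(nearColor.G + ((farColor.G - nearColor.G) * t)),
            (byte)(nearColor.B + ((farColor.B - nearColor.B) * t)));
    }
}

The grayscaleMap and playerMap fields used above would then be two instances with different color pairs, say white-to-black for the background and something vivid for players.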

Easy peasy! Happy Coding!

4 comments:

Erik said...

Hi!
I'm new to v2 programming and am trying to put a DepthSpacePoint on the physically highest point of the user's head. I'm having trouble accessing the BodyIndexFrame the way you have done it. I'm sure it's some rookie mistake I've made. Could you please push me in the right direction?

THX!
//Erik

using (var frame = reference.BodyIndexFrameReference.AcquireFrame())
{
    if (frame != null)
    {
        frame.CopyFrameDataToArray(bodyIndexFrameData);
        var dframe = reference.DepthFrameReference.AcquireFrame();
        dframe.CopyFrameDataToArray(depthFrameData);

        for (int i = 0; i < depthFrameData.Length; ++i)
        {
            ushort depth = depthFrameData[i];
            byte player = bodyIndexFrameData[i];

            if (player != 0xff) // player noticed
            {
                int temp = (1 + (depthFrameData[i] / _sensor.DepthFrameSource.FrameDescription.Width));
                int playerheadposY = _sensor.DepthFrameSource.FrameDescription.Height - temp;
                float playerheadposX = (i - (playerheadposY * _sensor.DepthFrameSource.FrameDescription.Width));

                // To DepthSpacePoint Coordinates
                DSP.X = playerheadposX;
                DSP.Y = playerheadposY;
            }
        }
    }
}

bryan said...

Hey Erik,

Are you having issues accessing the BodyIndex data or with the algorithm to find X/Y coordinates?

Erik said...

The frames seem to arrive in OK shape since they pass the

if ((dframe != null) && (bIframe != null))

check. The trouble is definitely in extracting the coordinates. Sorry about the noobish question.

After some thinking I rewrote the function as:

public CameraSpacePoint TopOfTheHead(KinectSensor _sensor, DepthFrame dframe, BodyIndexFrame bIframe, Body body)
{
    CameraSpacePoint CSP = default(CameraSpacePoint);

    ushort scanDepth = Convert.ToUInt16(Math.Round(1000 * body.Joints[JointType.Head].Position.Z, 0)); // Where is the head?

    int size = _sensor.DepthFrameSource.FrameDescription.Height * _sensor.DepthFrameSource.FrameDescription.Width;
    byte[] bodyIndexFrameData = Enumerable.Repeat(default(byte), size).ToArray();
    ushort[] depthFrameData = Enumerable.Repeat(default(ushort), size).ToArray();
    var pos = body.Joints[JointType.Head].Position;

    if ((dframe != null) && (bIframe != null))
    {
        bIframe.CopyFrameDataToArray(bodyIndexFrameData);
        dframe.CopyFrameDataToArray(depthFrameData);

        for (int i = 0; i < depthFrameData.Length; i++)
        {
            ushort depth = depthFrameData[i];
            byte user = bodyIndexFrameData[i];

            if ((depth > (scanDepth - 10)) && (depth < (scanDepth + 10))) // Are we on the right depth?
            {
                if (user != 0xff) // User noticed?
                {
                    // Where is she?
                    int temp = i / _sensor.DepthFrameSource.FrameDescription.Width;
                    int line = temp + 1;
                    int itemp = line * _sensor.DepthFrameSource.FrameDescription.Width;
                    int column = (itemp - i) / 4;
                    int playerheadposX = column;
                    int playerheadposY = line;

                    // To DepthSpacePoint Coordinates
                    DepthSpacePoint DSP;
                    DSP.X = playerheadposX;
                    DSP.Y = playerheadposY;

                    // CameraSpacePoint Coordinates
                    CSP = _sensor.CoordinateMapper.MapDepthPointToCameraSpace(DSP, depth);
                    i = depthFrameData.Length;
                }
            }
        }
    }
    return CSP;
}

Still no luck, but I guess I'm getting closer. As you might suspect, since I'm converting to CameraSpace, my plan is to display the "top-of-the-head" point as part of the skeleton.

Just lacking the skill to do it so far :D

Any suggestions?

bryan said...

Hmmm, ok. I think I see what you're doing there, though I don't think the 'Z' value of the CameraSpacePoint is equivalent to depth data at the same position -- simply converting from millimeters to meters might not be valid. You might want to check out one of my more recent posts (http://www.bryancook.net/2014/03/head-tracking-with-kinect-v2.html) where I'm displaying joints and mapping between depth/body/color. You might also want to use my head-track example to only examine the depth pixels around the user's head.

The other thing to consider is that the dimensions of the human body are somewhat predictable. Assuming that the 'head' joint is the center of the user's head, you'd simply need to know the size of the head to calculate the top position, e.g., the distance between the 'head' and 'neck' joints with the length of the neck removed (the distance between 'neck' and 'center-shoulder').
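A rough sketch of that proportions idea in v2 terms (note the v2 API renames 'center-shoulder' to SpineShoulder, and the upright-posture assumption here is exactly the limitation discussed below):

// Body-proportions heuristic: treat the head joint as the center of the
// head and push straight up by an estimated head radius.
// Assumes the user is upright (no tilt); +Y is up in camera space.
CameraSpacePoint head = body.Joints[JointType.Head].Position;
CameraSpacePoint neck = body.Joints[JointType.Neck].Position;
CameraSpacePoint spineShoulder = body.Joints[JointType.SpineShoulder].Position;

double headToNeck = Math.Sqrt(
    Math.Pow(head.X - neck.X, 2) +
    Math.Pow(head.Y - neck.Y, 2) +
    Math.Pow(head.Z - neck.Z, 2));

double neckLength = Math.Sqrt(
    Math.Pow(neck.X - spineShoulder.X, 2) +
    Math.Pow(neck.Y - spineShoulder.Y, 2) +
    Math.Pow(neck.Z - spineShoulder.Z, 2));

// estimated head radius, per the heuristic described above
float headRadius = (float)(headToNeck - neckLength);

CameraSpacePoint topOfHead = head;
topOfHead.Y += headRadius;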

Of course, the major issue with both of these approaches is that it doesn't take the rotation of the user's head into consideration. Simply inspecting pixels won't work if the user tilts the head to either side; detecting a forward tilt certainly won't work this way. Each joint has orientation data and that's my current area of study.
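Reading that orientation data looks something like this. A sketch only: leaf joints such as Head appear to report no orientation, so the Neck joint is the better one to inspect, and the quaternion-to-pitch formula below is the standard conversion, whose sign conventions you may need to flip for your setup:

// v2 exposes a quaternion per joint via Body.JointOrientations
JointOrientation neckOrientation = body.JointOrientations[JointType.Neck];
Vector4 q = neckOrientation.Orientation;

// standard quaternion-to-pitch conversion (radians)
double pitch = Math.Atan2(
    2.0 * ((q.W * q.X) + (q.Y * q.Z)),
    1.0 - (2.0 * ((q.X * q.X) + (q.Y * q.Y))));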