You have certainly not missed (as a regular reader of this blog) that the Kinect for Windows SDK is out!


For now, however, no gesture recognition services are available. So throughout this article we will build our own library that automatically detects simple gestures such as swipes, but also more complex movements such as drawing a circle with your hand.


Detecting such gestures enables controlling PowerPoint the Jedi way! (similar to the Kinect Keyboard Simulator demo).

If you are not familiar with the Kinect for Windows SDK, you should read a previous post that addressed the topic: http://blogs.msdn.com/b/eternalcoding/archive/2011/06/13/unleash-the-power-of-kinect-for-windows-sdk.aspx

How to detect gestures?

There are many possible solutions for detecting gestures; in this article we will cover two of them:

  • Algorithmic search
  • Template-based search

Note that these two techniques have many variants and refinements.

You can find the code used in this article just here: http://kinecttoolbox.codeplex.com


GestureDetector class

To standardize the use of our gesture system, we introduce an abstract class, GestureDetector, inherited by all gesture classes:


This class provides the Add method used to record the different positions of the skeleton’s joints. 

It also provides the abstract method LookForGesture implemented by the children. 

It stores a list of Entry objects in the Entries property; each entry saves the position and the timestamp of a recorded point.

Drawing the stored positions

The Entry class also stores a WPF Ellipse that will be used to draw the stored position.

Via the TraceTo method of the GestureDetector class, we will indicate which canvas will be used to draw the stored positions. 
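In the original post, the class itself is only shown as a diagram. As a rough C# sketch (member names beyond those mentioned in the text are assumptions), the skeleton might look like this:

```csharp
// Rough skeleton of the Entry / GestureDetector pair described above.
// Fields and names not mentioned in the text are assumptions.
public class Entry
{
    public Vector3 Position { get; set; }
    public DateTime Time { get; set; }
    public Ellipse DisplayEllipse { get; set; }
}

public abstract class GestureDetector
{
    readonly List<Entry> entries = new List<Entry>();
    Canvas displayCanvas;
    Color displayColor;

    // Size of the sliding window of recorded positions
    public int WindowSize { get; set; }

    public List<Entry> Entries
    {
        get { return entries; }
    }

    public event Action<SupportedGesture> OnGestureDetected;

    // Tell the detector which canvas (and color) to use to draw positions
    public void TraceTo(Canvas canvas, Color color)
    {
        displayCanvas = canvas;
        displayColor = color;
    }

    // Implemented by each concrete gesture detector
    protected abstract void LookForGesture();

    public virtual void Add(Vector position, SkeletonEngine engine)
    {
        // See the full implementation below
    }
}
```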

In the end, all the work is done in the Add method:

```csharp
public virtual void Add(Vector position, SkeletonEngine engine)
{
    Entry newEntry = new Entry { Position = position.ToVector3(), Time = DateTime.Now };
    Entries.Add(newEntry);

    if (displayCanvas != null)
    {
        newEntry.DisplayEllipse = new Ellipse
        {
            Width = 4,
            Height = 4,
            HorizontalAlignment = HorizontalAlignment.Left,
            VerticalAlignment = VerticalAlignment.Top,
            StrokeThickness = 2.0,
            Stroke = new SolidColorBrush(displayColor),
            StrokeLineJoin = PenLineJoin.Round
        };

        float x, y;

        engine.SkeletonToDepthImage(position, out x, out y);

        x = (float)(x * displayCanvas.ActualWidth);
        y = (float)(y * displayCanvas.ActualHeight);

        Canvas.SetLeft(newEntry.DisplayEllipse, x - newEntry.DisplayEllipse.Width / 2);
        Canvas.SetTop(newEntry.DisplayEllipse, y - newEntry.DisplayEllipse.Height / 2);

        displayCanvas.Children.Add(newEntry.DisplayEllipse);
    }

    if (Entries.Count > WindowSize)
    {
        Entry entryToRemove = Entries[0];

        if (displayCanvas != null)
        {
            displayCanvas.Children.Remove(entryToRemove.DisplayEllipse);
        }

        Entries.Remove(entryToRemove);
    }

    LookForGesture();
}
```

Note that the SkeletonToDepthImage method converts 3D coordinates into 2D coordinates whose values on each axis range between 0 and 1.

Besides saving joint positions, the GestureDetector class can also draw them on screen, which makes development and debugging simpler and more visual:


As we can see above, the positions being analyzed are drawn in red on top of the Kinect image. To activate this service, the developer just needs to place a canvas over the image that shows the Kinect camera stream and pass this canvas to the GestureDetector.TraceTo method:

```xml
<Viewbox Margin="5" Grid.RowSpan="5">
    <Grid Width="640" Height="480" ClipToBounds="True">
        <Image x:Name="kinectDisplay"></Image>
        <Canvas x:Name="kinectCanvas"></Canvas>
        <Canvas x:Name="gesturesCanvas"></Canvas>
        <Rectangle Stroke="Black" StrokeThickness="1"/>
    </Grid>
</Viewbox>
```

The Viewbox is used to keep the image and the canvases at the same size. The second canvas (kinectCanvas) is used to display the green skeleton (using a class available in the sample: SkeletonDisplayManager).

Event-based approach

The GestureDetector class provides one last service to its children: the RaiseGestureDetected method, which reports the detection of a new gesture via an OnGestureDetected event. The SupportedGesture argument of this event can contain the following values:

  • SwipeToLeft
  • SwipeToRight
  • Circle

Obviously the solution is extensible and I encourage you to add new gestures to the system. 

The RaiseGestureDetected method (together with the MinimalPeriodBetweenGestures property) also guarantees that a certain time elapses between two gestures (in order to filter out badly executed gestures).
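A minimal sketch of what RaiseGestureDetected might look like; the lastGestureDate field and the clean-up details are assumptions, not the exact implementation:

```csharp
// Sketch only: lastGestureDate and the window flush are assumptions.
protected void RaiseGestureDetected(SupportedGesture gesture)
{
    // Enforce a minimal delay between two reported gestures
    if (DateTime.Now.Subtract(lastGestureDate).TotalMilliseconds > MinimalPeriodBetweenGestures)
    {
        lastGestureDate = DateTime.Now;

        if (OnGestureDetected != null)
            OnGestureDetected(gesture);
    }

    // In both cases, flush the window so the same movement is not detected twice
    foreach (Entry entry in Entries)
    {
        if (displayCanvas != null)
            displayCanvas.Children.Remove(entry.DisplayEllipse);
    }
    Entries.Clear();
}
```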

Now that our foundations are laid, we can develop our algorithms. 

Algorithmic search

The algorithmic search walks through the list of positions and checks that predefined constraints remain valid.

The SwipeGestureDetector class is responsible for this search:


Taking the SwipeToRight gesture as an example, we use the following constraints:

  • Each new position must be to the right of the previous one
  • No position may deviate in height from the first position by more than a given distance (20 cm)
  • The time elapsed between the first and last position must be between 250 ms and 1500 ms
  • The gesture must be at least 40 cm long

The SwipeToLeft gesture is based on the same constraints except for the direction of the movement of course.

To effectively manage these two gestures, we use a generic algorithm that checks the four constraints mentioned above:

```csharp
bool ScanPositions(Func<Vector3, Vector3, bool> heightFunction, Func<Vector3, Vector3, bool> directionFunction, Func<Vector3, Vector3, bool> lengthFunction, int minTime, int maxTime)
{
    int start = 0;

    for (int index = 1; index < Entries.Count - 1; index++)
    {
        if (!heightFunction(Entries[0].Position, Entries[index].Position) || !directionFunction(Entries[index].Position, Entries[index + 1].Position))
        {
            start = index;
        }

        if (lengthFunction(Entries[index].Position, Entries[start].Position))
        {
            double totalMilliseconds = (Entries[index].Time - Entries[start].Time).TotalMilliseconds;
            if (totalMilliseconds >= minTime && totalMilliseconds <= maxTime)
            {
                return true;
            }
        }
    }

    return false;
}
```

To use this method, we must provide the three constraint functions as well as the minimal and maximal durations.

To detect the two swipe gestures, we then simply call:

```csharp
protected override void LookForGesture()
{
    // Swipe to right
    if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
        (p1, p2) => p2.X - p1.X > -0.01f, // Progression to right
        (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
        SwipeMinimalDuration, SwipeMaximalDuration)) // Duration
    {
        RaiseGestureDetected(SupportedGesture.SwipeToRight);
        return;
    }

    // Swipe to left
    if (ScanPositions((p1, p2) => Math.Abs(p2.Y - p1.Y) < SwipeMaximalHeight, // Height
        (p1, p2) => p2.X - p1.X < 0.01f, // Progression to left
        (p1, p2) => Math.Abs(p2.X - p1.X) > SwipeMinimalLength, // Length
        SwipeMinimalDuration, SwipeMaximalDuration)) // Duration
    {
        RaiseGestureDetected(SupportedGesture.SwipeToLeft);
        return;
    }
}
```

Skeleton stability

To ensure correct detection, we must check that the skeleton is stationary, so that we do not generate false gestures (for instance when the whole body is moving).

To do so, we use the BarycenterHelper class:

```csharp
public class BarycenterHelper
{
    readonly Dictionary<int, List<Vector3>> positions = new Dictionary<int, List<Vector3>>();
    readonly int windowSize;

    public float Threshold { get; set; }

    public BarycenterHelper(int windowSize = 20, float threshold = 0.05f)
    {
        this.windowSize = windowSize;
        Threshold = threshold;
    }

    public bool IsStable(int trackingID)
    {
        List<Vector3> currentPositions;
        if (!positions.TryGetValue(trackingID, out currentPositions))
            return false;

        if (currentPositions.Count != windowSize)
            return false;

        Vector3 current = currentPositions[currentPositions.Count - 1];

        for (int index = 0; index < currentPositions.Count - 2; index++)
        {
            if ((currentPositions[index] - current).Length() > Threshold)
                return false;
        }

        return true;
    }

    public void Add(Vector3 position, int trackingID)
    {
        if (!positions.ContainsKey(trackingID))
            positions.Add(trackingID, new List<Vector3>());

        positions[trackingID].Add(position);

        if (positions[trackingID].Count > windowSize)
            positions[trackingID].RemoveAt(0);
    }
}
```

The IsStable method tells us whether the skeleton is moving or standing still.

We use this information to feed joint positions to the detection system only when the skeleton is not moving:

```csharp
void ProcessFrame(ReplaySkeletonFrame frame)
{
    Dictionary<int, string> stabilities = new Dictionary<int, string>();
    foreach (var skeleton in frame.Skeletons)
    {
        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            continue;

        barycenterHelper.Add(skeleton.Position.ToVector3(), skeleton.TrackingID);

        stabilities.Add(skeleton.TrackingID, barycenterHelper.IsStable(skeleton.TrackingID) ? "Stable" : "Unstable");
        if (!barycenterHelper.IsStable(skeleton.TrackingID))
            continue;

        foreach (Joint joint in skeleton.Joints)
        {
            if (joint.Position.W < 0.8f || joint.TrackingState != JointTrackingState.Tracked)
                continue;

            if (joint.ID == JointID.HandRight)
            {
                swipeGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
                circleGestureRecognizer.Add(joint.Position, kinectRuntime.SkeletonEngine);
            }
        }

        postureRecognizer.TrackPostures(skeleton);
    }

    skeletonDisplayManager.Draw(frame);

    stabilitiesList.ItemsSource = stabilities;

    currentPosture.Text = "Current posture: " + postureRecognizer.CurrentPosture;
}
```

Replay & record tools

Another important point when developing with the Kinect SDK is being able to test efficiently. To test, you have to stand up, walk in front of the sensor and perform the required gesture. Unless you have an assistant, this quickly becomes painful.

This is why we provide a record and replay service for Kinect data.

Record

The recording part is really simple, since we only have to take the SkeletonFrame, iterate over each skeleton and serialize its content:

```csharp
public void Record(SkeletonFrame frame)
{
    if (writer == null)
        throw new Exception("You must call Start before calling Record");

    TimeSpan timeSpan = DateTime.Now.Subtract(referenceTime);
    referenceTime = DateTime.Now;
    writer.Write((long)timeSpan.TotalMilliseconds);
    writer.Write(frame.FloorClipPlane);
    writer.Write((int)frame.Quality);
    writer.Write(frame.NormalToGravity);

    writer.Write(frame.Skeletons.Length);

    foreach (SkeletonData skeleton in frame.Skeletons)
    {
        writer.Write((int)skeleton.TrackingState);
        writer.Write(skeleton.Position);
        writer.Write(skeleton.TrackingID);
        writer.Write(skeleton.EnrollmentIndex);
        writer.Write(skeleton.UserIndex);
        writer.Write((int)skeleton.Quality);

        writer.Write(skeleton.Joints.Count);
        foreach (Joint joint in skeleton.Joints)
        {
            writer.Write((int)joint.ID);
            writer.Write((int)joint.TrackingState);
            writer.Write(joint.Position);
        }
    }
}
```


Replay

The main problem with the replay mechanism lies in the data structures. Indeed, the Kinect classes are sealed and do not expose public constructors. To work around this, we duplicated the Kinect class hierarchy and added implicit cast operators:

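A simplified sketch of this duplication pattern (member list abridged, names partly assumed beyond those used elsewhere in the article):

```csharp
// Simplified sketch: a replay-friendly twin of the sealed SkeletonFrame.
// The implicit operator lets code written against ReplaySkeletonFrame
// consume live frames coming from the sensor as well as recorded ones.
public class ReplaySkeletonFrame
{
    public long TimeStamp { get; set; }
    public List<ReplaySkeletonData> Skeletons { get; set; }

    public static implicit operator ReplaySkeletonFrame(SkeletonFrame frame)
    {
        return new ReplaySkeletonFrame
        {
            TimeStamp = frame.TimeStamp,
            Skeletons = frame.Skeletons
                             .Select(s => (ReplaySkeletonData)s)
                             .ToList()
        };
    }
}
```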

Replay itself is handled by the Start method of the SkeletonReplay class:

```csharp
public void Start()
{
    context = SynchronizationContext.Current;

    CancellationToken token = cancellationTokenSource.Token;

    Task.Factory.StartNew(() =>
    {
        foreach (ReplaySkeletonFrame frame in frames)
        {
            Thread.Sleep(TimeSpan.FromMilliseconds(frame.TimeStamp));

            if (token.IsCancellationRequested)
                return;

            ReplaySkeletonFrame closure = frame;
            context.Send(state =>
            {
                if (SkeletonFrameReady != null)
                    SkeletonFrameReady(this, new ReplaySkeletonFrameReadyEventArgs { SkeletonFrame = closure });
            }, null);
        }
    }, token);
}
```

Finally, we can record and replay gestures to debug our applications:


You can download a sample replay file here: http://www.catuhe.com/msdn/davca.replay.zip

Knowing when to start

The last remaining problem to solve is deciding when to start analyzing a gesture. We already know the body must be stationary, but that is not enough.

Even when I stand still, I use my hands a lot while talking, which could unintentionally trigger a gesture.

To prevent this, we add a posture condition.

This is why we use the PostureDetector class:

```csharp
public class PostureDetector
{
    const float Epsilon = 0.1f;
    const float MaxRange = 0.25f;
    const int AccumulatorTarget = 10;

    Posture previousPosture = Posture.None;
    public event Action<Posture> PostureDetected;
    int accumulator;
    Posture accumulatedPosture = Posture.None;

    public Posture CurrentPosture
    {
        get { return previousPosture; }
    }

    public void TrackPostures(ReplaySkeletonData skeleton)
    {
        if (skeleton.TrackingState != SkeletonTrackingState.Tracked)
            return;

        Vector3? headPosition = null;
        Vector3? leftHandPosition = null;
        Vector3? rightHandPosition = null;

        foreach (Joint joint in skeleton.Joints)
        {
            if (joint.Position.W < 0.8f || joint.TrackingState != JointTrackingState.Tracked)
                continue;

            switch (joint.ID)
            {
                case JointID.Head:
                    headPosition = joint.Position.ToVector3();
                    break;
                case JointID.HandLeft:
                    leftHandPosition = joint.Position.ToVector3();
                    break;
                case JointID.HandRight:
                    rightHandPosition = joint.Position.ToVector3();
                    break;
            }
        }

        // HandsJoined
        if (CheckHandsJoined(rightHandPosition, leftHandPosition))
            return;

        // LeftHandOverHead
        if (CheckHandOverHead(headPosition, leftHandPosition))
        {
            RaisePostureDetected(Posture.LeftHandOverHead);
            return;
        }

        // RightHandOverHead
        if (CheckHandOverHead(headPosition, rightHandPosition))
        {
            RaisePostureDetected(Posture.RightHandOverHead);
            return;
        }

        // LeftHello
        if (CheckHello(headPosition, leftHandPosition))
        {
            RaisePostureDetected(Posture.LeftHello);
            return;
        }

        // RightHello
        if (CheckHello(headPosition, rightHandPosition))
        {
            RaisePostureDetected(Posture.RightHello);
            return;
        }

        previousPosture = Posture.None;
        accumulator = 0;
    }

    bool CheckHandOverHead(Vector3? headPosition, Vector3? handPosition)
    {
        if (!handPosition.HasValue || !headPosition.HasValue)
            return false;

        if (handPosition.Value.Y < headPosition.Value.Y)
            return false;

        if (Math.Abs(handPosition.Value.X - headPosition.Value.X) > MaxRange)
            return false;

        if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
            return false;

        return true;
    }

    bool CheckHello(Vector3? headPosition, Vector3? handPosition)
    {
        if (!handPosition.HasValue || !headPosition.HasValue)
            return false;

        if (Math.Abs(handPosition.Value.X - headPosition.Value.X) < MaxRange)
            return false;

        if (Math.Abs(handPosition.Value.Y - headPosition.Value.Y) > MaxRange)
            return false;

        if (Math.Abs(handPosition.Value.Z - headPosition.Value.Z) > MaxRange)
            return false;

        return true;
    }

    bool CheckHandsJoined(Vector3? leftHandPosition, Vector3? rightHandPosition)
    {
        if (!leftHandPosition.HasValue || !rightHandPosition.HasValue)
            return false;

        float distance = (leftHandPosition.Value - rightHandPosition.Value).Length();

        if (distance > Epsilon)
            return false;

        RaisePostureDetected(Posture.HandsJoined);
        return true;
    }

    void RaisePostureDetected(Posture posture)
    {
        if (accumulator < AccumulatorTarget)
        {
            if (accumulatedPosture != posture)
            {
                accumulator = 0;
                accumulatedPosture = posture;
            }
            accumulator++;
            return;
        }

        if (previousPosture == posture)
            return;

        previousPosture = posture;
        if (PostureDetected != null)
            PostureDetected(posture);

        accumulator = 0;
    }
}
```

The PostureDetector class is based on comparing joint positions. For example, to trigger the Hello posture, the hand must be at the same height as the head and at least 25 cm to one side of it.

In addition, the system uses an accumulator to make sure the posture is held for a given number of frames.

Again, this class is highly extensible.

Template-based search

The main drawback of the algorithmic search is that not every gesture is easy to describe with constraints. We therefore need a more generic approach.

The idea is that a gesture can be recorded, and the system will later decide whether the current movement matches a known gesture.

Ultimately, our goal is to compare two gestures efficiently.

Compare the comparable

Before writing a comparison algorithm, we must first normalize our data.

Indeed, a gesture is a list of points (in this article we will simply compare 2D gestures such as drawing a circle). Since the coordinates of these points depend on the distance to the sensor, we must bring them into a common reference frame. To do so, we:

  1. Produce a new gesture with a fixed number of points
  2. Rotate the gesture so that its first point is at 0°
  3. Scale the gesture so that its coordinates range between 0 and 1
  4. Center the gesture on (0, 0)


After these transformations, we are able to compare the point arrays.

To pack a point array using these techniques, we use the following code:

```csharp
public static List<Vector2> Pack(List<Vector2> positions, int samplesCount)
{
    List<Vector2> locals = ProjectListToDefinedCount(positions, samplesCount);

    float angle = GetAngleBetween(locals.Center(), positions[0]);
    locals = locals.Rotate(-angle);

    locals.ScaleToReferenceWorld();
    locals.CenterToOrigin();

    return locals;
}
```
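ProjectListToDefinedCount and GetAngleBetween are not listed in this article; here is a possible sketch of what they do, assuming the Length() and Center() extensions shown below. The resampling walks the path and inserts evenly spaced samples along it:

```csharp
// Hypothetical sketch: resample the recorded path into 'count' points
// evenly spaced along its total length.
static List<Vector2> ProjectListToDefinedCount(List<Vector2> positions, int count)
{
    var source = new List<Vector2>(positions);
    var result = new List<Vector2> { source[0] };

    float step = source.Length() / (count - 1); // length of each segment
    float distance = 0;

    for (int i = 1; i < source.Count; i++)
    {
        Vector2 previous = source[i - 1];
        float segment = (previous - source[i]).Length();

        if (distance + segment >= step && segment > 0)
        {
            // Interpolate a new sample on the current segment
            float ratio = (step - distance) / segment;
            Vector2 sample = previous + (source[i] - previous) * ratio;
            result.Add(sample);
            source.Insert(i, sample); // continue walking from the new sample
            distance = 0;
        }
        else
        {
            distance += segment;
        }
    }

    // Rounding may leave the last point out; add it back if needed
    if (result.Count < count)
        result.Add(source[source.Count - 1]);

    return result;
}

// Hypothetical sketch: angle of the vector going from 'start' to 'end'
static float GetAngleBetween(Vector2 start, Vector2 end)
{
    return (float)Math.Atan2(end.Y - start.Y, end.X - start.X);
}
```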

The GoldenSectionExtensions static class provides the helper methods used here:

```csharp
public static class GoldenSectionExtensions
{
    // Get length of path
    public static float Length(this List<Vector2> points)
    {
        float length = 0;

        for (int i = 1; i < points.Count; i++)
        {
            length += (points[i - 1] - points[i]).Length();
        }

        return length;
    }

    // Get center of path
    public static Vector2 Center(this List<Vector2> points)
    {
        Vector2 result = points.Aggregate(Vector2.Zero, (current, point) => current + point);

        result /= points.Count;

        return result;
    }

    // Rotate path by given angle
    public static List<Vector2> Rotate(this List<Vector2> positions, float angle)
    {
        List<Vector2> result = new List<Vector2>(positions.Count);
        Vector2 c = positions.Center();

        float cos = (float)Math.Cos(angle);
        float sin = (float)Math.Sin(angle);

        foreach (Vector2 p in positions)
        {
            float dx = p.X - c.X;
            float dy = p.Y - c.Y;

            Vector2 rotatePoint = Vector2.Zero;
            rotatePoint.X = dx * cos - dy * sin + c.X;
            rotatePoint.Y = dx * sin + dy * cos + c.Y;

            result.Add(rotatePoint);
        }
        return result;
    }

    // Average distance between paths
    public static float DistanceTo(this List<Vector2> path1, List<Vector2> path2)
    {
        return path1.Select((t, i) => (t - path2[i]).Length()).Average();
    }

    // Compute bounding rectangle
    public static Rectangle BoundingRectangle(this List<Vector2> points)
    {
        float minX = points.Min(p => p.X);
        float maxX = points.Max(p => p.X);
        float minY = points.Min(p => p.Y);
        float maxY = points.Max(p => p.Y);

        return new Rectangle(minX, minY, maxX - minX, maxY - minY);
    }

    // Check bounding rectangle size
    public static bool IsLargeEnough(this List<Vector2> positions, float minSize)
    {
        Rectangle boundingRectangle = positions.BoundingRectangle();

        return boundingRectangle.Width > minSize && boundingRectangle.Height > minSize;
    }

    // Scale path to 1x1
    public static void ScaleToReferenceWorld(this List<Vector2> positions)
    {
        Rectangle boundingRectangle = positions.BoundingRectangle();
        for (int i = 0; i < positions.Count; i++)
        {
            Vector2 position = positions[i];

            position.X *= (1.0f / boundingRectangle.Width);
            position.Y *= (1.0f / boundingRectangle.Height);

            positions[i] = position;
        }
    }

    // Translate path to origin (0, 0)
    public static void CenterToOrigin(this List<Vector2> positions)
    {
        Vector2 center = positions.Center();
        for (int i = 0; i < positions.Count; i++)
        {
            positions[i] -= center;
        }
    }
}
```


Golden section search

We could compare our data simply by computing the average distance between corresponding points, but this solution is not precise enough.

So I used a better algorithm: the golden section search: http://www.math.uic.edu/~jan/mcs471/Lec9/gss.pdf

A JavaScript implementation can be found here: http://depts.washington.edu/aimgroup/proj/dollar/

Here is the C# version:

```csharp
public static float Search(List<Vector2> current, List<Vector2> target, float a, float b, float epsilon)
{
    float x1 = ReductionFactor * a + (1 - ReductionFactor) * b;
    List<Vector2> rotatedList = current.Rotate(x1);
    float fx1 = rotatedList.DistanceTo(target);

    float x2 = (1 - ReductionFactor) * a + ReductionFactor * b;
    rotatedList = current.Rotate(x2);
    float fx2 = rotatedList.DistanceTo(target);

    do
    {
        if (fx1 < fx2)
        {
            b = x2;
            x2 = x1;
            fx2 = fx1;
            x1 = ReductionFactor * a + (1 - ReductionFactor) * b;
            rotatedList = current.Rotate(x1);
            fx1 = rotatedList.DistanceTo(target);
        }
        else
        {
            a = x1;
            x1 = x2;
            fx1 = fx2;
            x2 = (1 - ReductionFactor) * a + ReductionFactor * b;
            rotatedList = current.Rotate(x2);
            fx2 = rotatedList.DistanceTo(target);
        }
    }
    while (Math.Abs(b - a) > epsilon);

    float min = Math.Min(fx1, fx2);

    return 1.0f - 2.0f * min / Diagonal;
}
```

Using this algorithm, we can simply compare a template with the current gesture and obtain a matching score between 0 and 1.

Learning machine

To improve our success rate, we need several templates. To that end, we developed the LearningMachine class, whose job is to store our templates (i.e. to learn new models) and to compare them with the current gesture:

```csharp
public class LearningMachine
{
    readonly List<RecordedPath> paths;

    public LearningMachine(Stream kbStream)
    {
        if (kbStream == null || kbStream.Length == 0)
        {
            paths = new List<RecordedPath>();
            return;
        }

        BinaryFormatter formatter = new BinaryFormatter();

        paths = (List<RecordedPath>)formatter.Deserialize(kbStream);
    }

    public List<RecordedPath> Paths
    {
        get { return paths; }
    }

    public bool Match(List<Vector2> entries, float threshold, float minimalScore, float minSize)
    {
        return Paths.Any(path => path.Match(entries, threshold, minimalScore, minSize));
    }

    public void Persist(Stream kbStream)
    {
        BinaryFormatter formatter = new BinaryFormatter();

        formatter.Serialize(kbStream, Paths);
    }

    public void AddPath(RecordedPath path)
    {
        path.CloseAndPrepare();
        Paths.Add(path);
    }
}
```

Each RecordedPath exposes a Match method which calls the golden section search:

```csharp
public bool Match(List<Vector2> positions, float threshold, float minimalScore, float minSize)
{
    if (positions.Count < samplesCount)
        return false;

    if (!positions.IsLargeEnough(minSize))
        return false;

    List<Vector2> locals = GoldenSection.Pack(positions, samplesCount);

    float score = GoldenSection.Search(locals, points, -MathHelper.PiOver4, MathHelper.PiOver4, threshold);

    return score > minimalScore;
}
```

Thanks to our detection algorithms and our learning machine, we end up with a system that is reliable and places very few demands on the data provided by Kinect.

You can find a sample knowledge base for circle recognition built on this approach here: http://www.catuhe.com/msdn/circleKB.zip

Conclusion

So we have at our disposal a set of tools for working with Kinect. In addition we have two systems to detect a large number of gestures. 

It's now your turn to use these services in your Kinect applications!

To go further