How Microsoft Kinect Works


Microsoft's upcoming Kinect motion-control sensor for Xbox 360 isn't magic. The accessory, coming Nov. 4 for $150, uses a complex system of cameras, sensors, microphones and -- just as importantly -- software to watch and listen to a gamer moving in front of it.

A Microsoft patent, granted by the U.S. Patent and Trademark Office on Thursday, gives some insight into how Kinect works. In fact, the patent application sums up nicely, in one paragraph, what Kinect's sensors see and how Kinect sends that data to an Xbox.
To determine whether a target or object in the scene corresponds to a human target, each of the targets may be flood filled and compared to a pattern of a human body model. Each target or object that matches the human body model may then be scanned to generate a skeletal model associated therewith. The skeletal model may then be provided to the computing environment such that the computing environment may track the skeletal model, render an avatar associated with the skeletal model, and may determine which controls to perform in an application executing on the computing environment based on, for example, gestures of the user that have been recognized from the skeletal model. A gesture recognizer engine, the architecture of which is described more fully below, is used to determine when a particular gesture has been made by the user.
This figure, included in Microsoft's patent, shows how a gamer's punching motion controls a boxing Xbox game through Kinect.


The patent application describes a gadget with an infrared light emitter, a depth sensor and a digital camera that combine to capture people and objects in three-dimensional space. The infrared light bounces off of the Xbox player -- and whatever furniture is in the room -- enabling the sensor to determine movement. The camera brings in video, which in part can be analyzed by face-recognition software.
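
To get a feel for what that depth data looks like, here is a rough sketch, not Microsoft's actual code, of how a frame of distance readings could be split into "player" and "background" pixels; the threshold and the toy numbers are invented for illustration:

import numpy as np

# Hypothetical example: segment a player out of a depth frame.
# 'depth_frame' is assumed to be a 2-D array of distances in millimeters,
# roughly the kind of reading an infrared depth sensor reports.
def segment_foreground(depth_frame, background_mm=3000):
    """Return a boolean mask of pixels closer than the assumed background."""
    valid = depth_frame > 0                      # zero often means "no reading"
    return valid & (depth_frame < background_mm)

# Toy 4x4 frame: a "player" about 1.5 m away in front of a 3.5 m wall.
frame = np.array([
    [3500, 3500, 3500, 3500],
    [3500, 1500, 1500, 3500],
    [3500, 1500, 1500, 3500],
    [3500, 3500, 3500, 3500],
])
print(segment_foreground(frame).astype(int))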
Microphones pick up ambient noise in addition to a player's voice. Gizmodo reports that Kinect features a line of four microphones, strategically placed at set intervals so the system can determine where a voice comes from. Software helps by canceling out echoes and noise bleed-over from whatever game is being played.
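
One standard way to work out where a voice comes from with spaced microphones, sketched here purely as an illustration (the spacing and signals are made up, and this is not Kinect's actual algorithm), is to measure how much later the same sound reaches one microphone than another:

import numpy as np

# The same sound arrives at each microphone at slightly different times, and
# the delay between a pair of microphones constrains the direction it came from.
def estimate_delay(sig_a, sig_b, sample_rate):
    """Cross-correlate two equal-length signals; return b's lag behind a, in seconds."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = np.argmax(corr) - (len(sig_a) - 1)
    return lag / sample_rate

def direction_from_delay(delay_s, mic_spacing_m, speed_of_sound=343.0):
    """Convert an inter-microphone delay into an angle of arrival, in radians."""
    ratio = np.clip(delay_s * speed_of_sound / mic_spacing_m, -1.0, 1.0)
    return np.arcsin(ratio)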
Kinect takes the data from the sensors and, as noted above, "flood fills" the image of the player, creating a sort of psychedelic silhouette. The software then compares what it sees to a preset model of the human body and determines a rough skeletal structure.
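
Here is a generic version of that flood-fill step, again an illustration rather than the code Kinect actually runs: starting from a seed pixel in the foreground mask, collect every connected foreground pixel so each candidate target becomes one blob that can be compared against the body model.

from collections import deque

# Generic flood fill over a boolean foreground mask (a list of lists, or an array).
def flood_fill(mask, seed):
    """Return the set of (row, col) pixels connected to 'seed' in the mask."""
    rows, cols = len(mask), len(mask[0])
    blob, queue = set(), deque([seed])
    while queue:
        r, c = queue.popleft()
        if (r, c) in blob or not (0 <= r < rows and 0 <= c < cols) or not mask[r][c]:
            continue
        blob.add((r, c))
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return blob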

Kinect's software builds a virtual skeleton out of a number of reference points on the human body.
"In this embodiment, a variety of joints and bones are identified: each hand, each forearm, each elbow, each bicep, each shoulder, each hip, each thigh, each knee, each foreleg, each foot, the head, the torso, the top and bottom of the spine, and the waist," Microsoft's patent states. "Where more points are tracked, additional features may be identified, such as the bones and joints of the fingers or toes, or individual features of the face, such as the nose and eyes."
If it weren't for the software, however, Kinect would feed the Xbox a mess of unneeded data. Microsoft built in motion thresholds, so that an unintentional small motion might not trigger the game but a deliberate small motion might.
It's a balancing act.
"Properly identifying gestures and the intent of a user greatly helps in creating a positive user experience," the patent says. "Where a gesture recognizer system is too sensitive, and even a slight forward motion of the hand is interrupted as a throw, the user may become frustrated because gestures are being recognized where he has no intent to make a gesture, and thus, he lacks control over the system. Where a gesture recognizer system is not sensitive enough, the system may not recognize conscious attempts by the user to make a throwing gesture, frustrating him in a similar manner.
"At either end of the sensitivity spectrum, the user becomes frustrated because he cannot properly provide input to the system."
The patent describes a number of other scenarios Microsoft took into account for its built-in sensitivity thresholds. These include a user's height (including footwear, hairstyle and posture), the relative angles of bones (signifying, for instance, whether the user is jumping), the location of a gesture (for example, throwing a football occurs above the shoulder), and others.
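
As a deliberately simplified, hypothetical example of that kind of threshold check, take the boxing punch from the figure above; the distances and speeds below are invented, not values from the patent:

# A punch only registers when the hand travels far enough and fast enough,
# so a slight unintentional drift of the hand is ignored.
def is_punch(hand_z_start, hand_z_end, elapsed_s,
             min_distance=0.30, min_speed=1.5):
    """Return True when forward hand travel (meters) clears both thresholds."""
    travel = hand_z_start - hand_z_end      # moving toward the sensor reduces depth
    speed = travel / elapsed_s if elapsed_s > 0 else 0.0
    return travel >= min_distance and speed >= min_speed

# A deliberate jab: 0.4 m of forward travel in 0.2 s is recognized.
print(is_punch(2.0, 1.6, 0.2))   # True
# A small unintentional drift: 0.05 m over half a second is ignored.
print(is_punch(2.0, 1.95, 0.5))  # False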

Kinect could recognize sign-language gestures.
You can read the patent application, now approved, here.
Interestingly, a good portion of the patent is dedicated to the possible application of Kinect to communication through sign language. Perhaps not now, but someday, Kinect could recognize American Sign Language and treat signs as just another input method.
"The user is making a gesture with his left hand to signal the character 'a' in American Sign Language," the patent states. "This gesture may be interpreted as if the user were pressing the 'a' key on a keyboard."
In optional operation, each gestured character is converted to its spoken equivalent or text equivalent. This may then be output locally, or sent across a communications network for remote output to a second user, or both.
For instance, where the user is playing an online multiplayer video game, such as a first person shooter, the game may also support voice chat. Where the user is unable to speak, he may be prevented from joining in the voice chat. Even though he would be able to type input, this may be a laborious and slow process to someone fluent in ASL.
Under the present system, he could make ASL gestures to convey his thoughts, which would then be transmitted to the other users for auditory display. The user's input could be converted to voice locally, or by each remote computer. In this situation, for example, when the user kills another user's character, that victorious, though speechless, user would be able to tell the other user that he had been 'PWNED.'
In another embodiment, a user may be able to speak or make the facial motions corresponding to speaking words. The system may then parse those facial motions to determine the user's intended words and process them according to the context under which they were inputted to the system.
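
As a toy illustration of the sign-language-as-keyboard idea, here is how recognized letters could be buffered like keystrokes and then flushed out as chat text or handed to a text-to-speech engine; the gesture recognizer itself is assumed and not shown, and nothing here is a real Kinect or Xbox API:

class FingerspellingChat:
    def __init__(self):
        self.buffer = []

    def on_letter(self, letter: str) -> None:
        """Handle one recognized fingerspelled letter as if it were a key press."""
        self.buffer.append(letter)

    def flush(self) -> str:
        """Return the accumulated message and clear the buffer."""
        message = "".join(self.buffer)
        self.buffer.clear()
        return message

chat = FingerspellingChat()
for letter in "pwned":           # pretend the recognizer fired once per letter
    chat.on_letter(letter)
print(chat.flush())              # "pwned", ready for voice chat via text-to-speech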
This figure shows the elements of a computer environment in which Kinect could be included.