Case Study – Midterm

Over the course of these past two months I have become enamored with 3D animation. Not only that – I am determined to get good at it. Some may find this interesting because I am not a gamer, and I had never really thought about 3D outside the context of movies. Now the class I took only because Dr. Baker told me I should is my favorite, my head is filled with ideas about its multitude of applications, and I keep asking myself how I can make it one of the cornerstones of my new career. (The caveat being that I am a total beginner and have my work cut out for me!) But, God willing and if the creek don’t rise, I will come out of this with highly marketable skills. But wait! There’s more.

Enter our journal reports. When I first started focusing on the Kinect it was for a couple of simple reasons: I have a Wii, I really like it, and the Kinect is a more advanced version of the same idea. Which brings us to motion capture. (Another technology that I had never given much thought to but am now fascinated by.) That is why my case study is going to expand upon the material I have already covered, with an eye to the future. Where is all of this taking us? What are researchers and developers working on right now? What are they hoping for? How will it impact social media and marketing? How will it impact the military? If I can peer a few moments into the future and get an inkling of what is to come, then I will be happy.

Way back in February, during the second week of the semester, I asked around for guidance on which sensor to write about while wondering aloud how best to find something that would enrich my 3D Animation class experience. At that time the fabulous Gabby suggested that I look into the XBOX Kinect because people were tinkering with it in conjunction with Blender. Needless to say, I was blown away. So I took her advice, did a little research about the Kinect sensor, and this is what I learned:

The Kinect sensors are housed in a flat black box that sits atop a small platform. There are four components. The first is a full-color VGA camera. This sensor is used for facial recognition and body “skeleton” detection, which maps 48 points on the human body. (The software, which I will discuss later, can use these mapped points to fill in body images when a player is blocked by other players or furniture during a game!)

The second sensor is used to determine the dimensions of the room that the game is being played in. It is composed of an infrared projector paired with a monochrome CMOS (complementary metal-oxide semiconductor) sensor. Because the depth reading comes from projected infrared light rather than visible light, most lighting conditions do not affect the way the game “sees” the room.

The third sensor is an array of four microphones with a 24-bit analog-to-digital converter (ADC) and Kinect-resident signal processing, including acoustic echo cancellation and noise suppression, so it can pick voices out of ambient noise. Sound can also be recorded.

And lastly there is the tilt motor, which repositions the sensor head, along with a 3-axis accelerometer configured for a 2G range, where G is the acceleration due to gravity. It is possible to use the accelerometer to determine the current orientation of the Kinect.
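That orientation trick comes down to trigonometry on the gravity vector. A minimal sketch (the axis convention and function name are my own, not the official SDK’s):

```python
import math

def tilt_from_accel(ax, ay, az):
    """Estimate pitch and roll in degrees from a resting 3-axis
    accelerometer reading (in units of g), assuming gravity is the
    only acceleration acting on the sensor."""
    pitch = math.degrees(math.atan2(ax, math.sqrt(ay * ay + az * az)))
    roll = math.degrees(math.atan2(ay, az))
    return pitch, roll

# A sensor lying flat reads roughly (0, 0, 1) in units of g:
print(tilt_from_accel(0.0, 0.0, 1.0))  # → (0.0, 0.0)
```

The 1° accuracy limit in the spec list below is why this gives only a coarse orientation, not a precise pose.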

Kinect Array Specifications

Viewing angle: 43° vertical by 57° horizontal field of view
Vertical tilt range: ±27°
Frame rate (depth and color streams): 30 frames per second (FPS)
Audio format: 16-kHz, 24-bit mono pulse code modulation (PCM)
Audio input characteristics: a four-microphone array with 24-bit analog-to-digital converter (ADC) and Kinect-resident signal processing, including acoustic echo cancellation and noise suppression
Accelerometer characteristics: a 2G/4G/8G accelerometer configured for the 2G range, with a 1° accuracy upper limit

The Kinect Software

There are many elements to the software that is part of the Kinect game program, but without a doubt, the software that frees the user from hand-held input devices is what gives it true meaning. When the developers were looking for ways to maximize the user’s experience, they made a conscious decision NOT to pre-program canned actions and reactions into the software. Instead, they did something amazing: they decided to “teach” the program to learn by categorizing and classifying real people in the real world. As a result:

“Every single motion of the body is an input,”
– Alex Kipman, Microsoft’s director of Project Natal

Kinect can also distinguish players and their movements even if they’re partially hidden. Kinect extrapolates what the rest of your body is doing as long as it can detect some parts of it. This allows players to jump in front of each other during a game or to stand behind pieces of furniture in the room.

The developers used countless people of all different ages and sizes, even wearing different kinds of clothing, to create the massive database the software would “learn” from. The algorithms for this program were written by Jamie Shotton, a member of Project Natal, the Cambridge-based Microsoft research team that developed the Kinect software. The process by which it does that is outlined below.

 

Step 1: As you stand in front of the camera, it judges the distance to different points on your body. In the image on the far left, the dots show what it sees: a so-called “point cloud” representing a 3-D surface. A skeleton drawn there is simply a rudimentary guess. (The image on top shows the view perceived by the color camera, which can be used like a webcam.)
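The jump from a depth image to that point cloud is plain pinhole-camera geometry. A sketch, where the intrinsics fx, fy, cx, cy are placeholder values rather than real Kinect calibration numbers:

```python
def depth_to_point(x, y, z_mm, fx=571.0, fy=571.0, cx=320.0, cy=240.0):
    """Back-project one depth pixel (column x, row y, depth in mm)
    to a 3-D point with a pinhole camera model. The default
    intrinsics are placeholders, not official Kinect calibration."""
    X = (x - cx) * z_mm / fx
    Y = (y - cy) * z_mm / fy
    return (X, Y, z_mm)

# The principal point projects straight down the optical axis:
print(depth_to_point(320.0, 240.0, 1000.0))  # → (0.0, 0.0, 1000.0)
```

Run over every pixel in a 30 FPS depth frame, this is where the dots in the left-hand image come from.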

Step 2: Then the brain guesses which parts of your body are which. It does this based on all of its experience with body poses—the experience described above. Depending on how similar your pose is to things it’s seen before, Natal can be more or less confident of its guesses. In the color-coded person above [bottom center], the darkness, lightness, and size of different squares represent how certain Natal is that it knows which body part that area belongs to. (For example, the three large red squares indicate that it’s highly probable that those parts are “left shoulder,” “left elbow” and “left knee”; as the pixels become smaller and muddier in color, such as the grayish pixels around the hands, that’s an indication that Natal is hedging its bets and isn’t very sure of their identity.)

Step 3: Then, based on the probabilities assigned to different areas, Natal comes up with all possible skeletons that could fit with those body parts. (This step isn’t shown in the image above, but it looks similar to the stick figure drawn on the left, except there are dozens of possible skeletons overlaid on each other.) It ultimately settles on the most probable one. Its reasoning here is partly based on its experience, and partly on more formal kinematics models that programmers added in.
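Step 3’s final selection can be caricatured as an argmax over candidate skeletons, scored here by the product of their joint confidences. The data structure and scoring below are my simplification; the real system also weighs kinematic plausibility:

```python
def most_probable_skeleton(candidates):
    """Pick the skeleton whose joints the classifier is collectively
    most confident about. Each candidate maps a joint name to a
    (position, confidence) pair; the score is the product of all the
    joint confidences."""
    def score(skeleton):
        total = 1.0
        for _position, confidence in skeleton.values():
            total *= confidence
        return total
    return max(candidates, key=score)
```

A kinematics model would be folded in by penalizing the score of any candidate with impossible joint arrangements before the argmax.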

Step 4: Once Natal has determined it has enough certainty about enough body parts to pick the most probable skeletal structure, it outputs that shape to a simplified 3D avatar [image at right]. That’s the final skeleton that will be skinned with clothes, hair, and other features and shown in the game.

Step 5: Then it does this all over again—30 times a second! As you move, the brain generates all possible skeletal structures at each frame, eventually deciding on, and outputting, the one that is most probable. This thought process takes just a few milliseconds, so there’s plenty of time for the Xbox to take the info and use it to control the game.
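Putting the steps together, the per-frame loop might be sketched like this, where both callbacks are placeholders for the real pipeline stages:

```python
import time

def track_frames(get_depth_frame, fit_skeleton, frames, fps=30):
    """Sketch of the per-frame loop: grab a depth frame, fit the
    most probable skeleton, and repeat at the target frame rate."""
    budget = 1.0 / fps
    skeletons = []
    for _ in range(frames):
        start = time.perf_counter()
        skeletons.append(fit_skeleton(get_depth_frame()))
        elapsed = time.perf_counter() - start
        if elapsed < budget:
            time.sleep(budget - elapsed)  # wait out the rest of the frame
    return skeletons
```

Because the fitting itself takes only a few milliseconds, most of each 33 ms budget is left over for the Xbox to run the game logic.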

The Kinect Actuators (sort of)

When discussing the Kinect actuators, or the mechanism by which a control system acts upon an environment, it is easiest to discuss the applications they have in other areas. The first commercially used control systems and actuators were simple, like those in fixed mechanical or electronic systems, or software-based like printer drivers. But now, with humans as actuators, things are getting interesting. After all, this stuff was not originally intended for use in gaming. The software behind it comes from research originally done at MIT in the mid-1990s. The scientists there decided that when it comes to actuators, “stiffness isn’t everything”: when performing certain tasks, it was not only possible to get good force resolution while filtering out high-frequency disturbances from the environment, it was desirable as well. That is why they invested so much time and energy into series elastic actuators.
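The idea behind a series elastic actuator fits in one line: put a deliberately soft spring between the motor and the load, and the output force becomes spring stiffness times deflection, so a cheap position sensor gives fine force resolution while the spring itself filters high-frequency shocks. A sketch with illustrative names and numbers:

```python
def sea_force(k_spring, motor_pos, load_pos):
    """Series elastic actuator output: the spring between motor and
    load turns a position measurement into a force measurement,
    F = k * (motor position - load position)."""
    return k_spring * (motor_pos - load_pos)

# A 200 N/m spring compressed 5 cm delivers about 10 N:
print(sea_force(200.0, 0.05, 0.0))
```

Stiff actuators would need an expensive force sensor to get the same measurement, which is the “stiffness isn’t everything” point.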

Which brings us to consumer robotics, clearly the most mainstream and exciting application so far. People have tons of videos up on YouTube demonstrating all kinds of cool stuff they have done with the Kinect system. Things like a humanoid machine chopping a banana while being controlled remotely…

to using it to tell a Roomba vacuum to go back and re-do a spot it missed. Let’s face it: who cares that the Kinect software is not good enough for highly complex robotic systems in industries like defense? Its low cost and flexibility are changing the face of consumer robotics right now. Through some amazing software, its motion-sensing input provides full-body 3D motion capture, facial recognition, and voice recognition, all awesome abilities to incorporate into a robot in order to get it to do what you want. The video below shows a super low-cost homemade elastic actuator.

https://www.youtube.com/watch?v=J3m4vksAWtM&playnext=1&list=PL1637A5CDA8D13E36&feature=results_main

Then there is the use of the software in animation. This application is an example of data-manipulation software: the Kinect sensors generate a tremendous amount of data, and the trick is turning it into something an animator can use. While tooling around on the Internet I found more than a few companies offering some variation on this idea.

Peter Dinklage of Game of Thrones fame as Mister Sinister in X-Men: Days of Future Past? Maybe, if you believe the buzz on websites such as Badass Digest and Indiewire.com. Vanity Fair magazine has confirmed that the red-hot actor has been cast as the primary antagonist, and Bryan Singer, the director, added fuel to the rumor fire when he made the following comment about using motion capture during a recent MTV interview:

“I definitely want to use this technology again, and I might even be using some of it in a different way in ‘X-Men.’ I don’t wanna say how, yet, but I’m definitely using some of this technology on ‘X-Men’ which I never used in any of the other ‘X-Men’ films.”

There is also a growing demand outside of the entertainment industry.

For example, the NFL is embracing motion capture. A recent article by Luis Bien for the online magazine SB Nation explained how. Forty-two hopeful young players were evaluated after performing a series of drills and tasks while wearing capture suits. The data that was generated was then analyzed at Motus, a New York City biomechanics lab, which identified tiny errors in each athlete’s form that were hurting their performance so that adjustments could be made.

New Balance is using motion capture in combination with 3D printing to transform the science of racing shoes. In fact, this past January at the New Balance Games, athlete Jack Bolas competed while wearing shoes that were custom-designed by this process. In a recent article for the Boston Globe, a representative of the company said they hoped the shoes would eventually be affordable for the general public.

In an attempt to get some first-hand insight, over the break I attended a workshop given by a woman named Victoria Nece. She and her partner have just created a data-manipulation program called KinectToPin. Not surprisingly, the day was a bit crazy. Just try to picture it: eight strangers, all with different types of computers, different levels of expertise AND a teacher who had never taught the class before. Add all that to the typical technical glitches and you really wind up with quite a day.

Below is a video of me recording my marker-less raw data:

 

 

The capture program is run through the terminal and generates data that looks like this:

Effects 3D Point Control #1 3D Point #2
Frame X pixels Y pixels Z pixels
0 338.87537 84.99136 122.55664
1 338.78912 85.23187 122.879326
2 338.60226 85.58409 122.29409
3 338.6587 86.5849 121.91251
4 338.89645 86.347626 121.745895
5 338.87836 84.98755 122.0451
6 339.12137 84.27066 121.693756
7 339.0376 84.594894 121.669044
8 339.09326 84.198685 121.60439
9 339.32986 84.419785 121.539856
10 340.24704 85.04266 121.67168
11 340.7793 85.77255 121.69226
12 340.48853 85.38559 121.61558
13 341.00864 85.050186 121.65762
14 341.2102 84.58797 122.00688
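That dump is easy to sanity-check with a few lines of Python before importing it (the column meanings are my reading of the header; KinectToPin’s own format may differ):

```python
def parse_capture(text):
    """Parse lines of 'frame x y z' capture output into a list of
    (frame, (x, y, z)) tuples, skipping header lines that do not
    start with a frame number."""
    points = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4 and parts[0].isdigit():
            x, y, z = (float(value) for value in parts[1:])
            points.append((int(parts[0]), (x, y, z)))
    return points

sample = """Frame X pixels Y pixels Z pixels
0 338.87537 84.99136 122.55664
1 338.78912 85.23187 122.879326"""
print(len(parse_capture(sample)))  # → 2
```

Each tuple becomes one keyframe of a 3D Point Control once the data lands in After Effects.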

It is then imported into After Effects and behold:

 

Please note the KinectToPin area of the After Effects workstation.

 

The following pictures are of Olympic athletes from about a hundred years ago. I imported one into Photoshop and chopped him up into pieces that matched the key points on my skeleton.

The end result being this: (PSD)

 

Then began the tedious task of creating “pins” and attaching them to the generated skeleton (which requires a very specific naming process). They also had to be scaled.

and so on…

Then! MOVIE!

 
