Thursday, December 20, 2012

Physics-based skeleton fit, inverse kinematics, inverse dynamics, and retargeting

Introduction

I recently had a paper accepted and published that discussed using the Open Dynamics Engine (ODE) as a simple tool for inverse kinematics and inverse dynamics.  While I was presenting, I was fortunate to have an opportunity to discuss the paper with Victor Zordan who had a similar paper published at SIGGRAPH in 2003.  After a few minutes of confusion, he pointed out to me that I was abusing the terminology.  Part of what I was calling "Inverse Kinematics" is known as skeleton fitting and has its own body of literature.

Anyway, I've been feeling like the paper was not as clear as it should be for a technique that is pretty simple; so I decided to try again here.

My end goal is to produce coordinated dynamic movements in simulated humanoid characters that accomplish some task.  To get there, we are attempting to draw inspiration from human movements, since it seems that humans manage to control themselves pretty well.  This leads to a sub-goal of capturing the important features of a human movement.  We begin with motion capture data from a real human and a physically-simulated humanoid model.

Method

We obtain the motion capture data using a 16-camera Phasespace system. This system tracks optical markers (LEDs) that pulse in known patterns.  These active markers reduce the risk of markers getting mixed up when they pass near each other--a problem that optical systems with passive (IR reflective) markers have to deal with.  Downsides to active markers are that they're more expensive and require electrical wiring and a power source (so they can't be used to track a flying pigeon, for example).

Model

We implement the humanoid model as a collection of rigid bodies connected by joint constraints.  Automatically finding the kinematic skeleton parameters belongs partially to the domain of skeleton fitting, which has established procedures for figuring out where joints should go and what degrees of freedom or angular limits they should have based on the relative locations of markers observed over time.  We bypass that step by simply constructing a "good-enough" model in ODE.
Figure 1.  Humanoid model with 48 internal degrees of freedom, designed by hand and simulated with ODE.
The model is based roughly on my own dimensions and flexibility.  We use universal joints for the elbows, wrists, knees, and ankles.  We use hinge joints to connect the toes to the heels.  All other joints are ball-and-socket joints with three angular degrees of freedom: hips, shoulders, collar-bones, upper-neck, lower-neck, upper spine, and lower spine.  This gives a total of 48 internal degrees of freedom.

ODE's joints provide built-in joint limit constraints, although for ball-and-socket joints, you need to use a second joint known as an "angular motor".  This is because it is difficult to keep three independent angular degrees of freedom well-defined.  We use the joint limit constraints to keep the body from adopting poses that are not usually possible, such as bending the elbows backwards. 

The engine also provides joint motors.  These allow you to set a target velocity for the joint as well as limits on the amount of force or torque that can be used to achieve that velocity.  For example, one might set the elbow joint to bend at a fixed rate of two radians per second, using no more than 200 N m of torque.  If the permitted amount of torque is insufficient, the system will not be able to reach the target velocity, but will get as close as it can.  The joint motors are incorporated into the solver along with all other constraints and the system is solved simultaneously.

We use the joint motors to provide a resting pose for the model.  For each degree of freedom, we define a target angle and a maximum amount of torque for achieving that angle.  We then set the target joint velocity as necessary for the joint to achieve its target orientation in one simulation frame.  This, along with the joint limits, can be thought of as a "prior" over the space of all relative body poses that biases the body toward particular solutions and also decreases rapid oscillation between different body orientations.
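To make the motor setup concrete, here is a standalone scalar sketch of one joint degree of freedom being driven toward its rest angle.  This is not ODE API; the function name, the inertia value, and the torque budget are illustrative, and it assumes an isolated, initially-resting degree of freedom, whereas ODE solves the full coupled system.

```python
def resting_pose_step(angle, rest_angle, inertia, tau_max, dt):
    """Drive one joint DOF toward its rest angle in a single frame.

    The target velocity is chosen so the joint would reach rest_angle
    in one frame of length dt, but the torque spent to achieve it is
    clamped to +/- tau_max.
    """
    v_target = (rest_angle - angle) / dt    # reach the target in one frame
    tau = inertia * v_target / dt           # torque an isolated body would need
    tau = max(-tau_max, min(tau_max, tau))  # enforce the torque budget
    v = tau * dt / inertia                  # velocity actually achieved
    return angle + v * dt                   # integrate one frame
```

With a generous torque budget the joint snaps to the rest angle in a single step; with a small budget it only creeps toward it, which is the "soft prior" behavior described above.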

If the resting pose is a prior, we need some evidence in order to create a likelihood and ultimately find an a posteriori probable body pose.  For this, we use the motion capture data.  The basic approach is very simple: we attach the markers to the model and they drag the body along.  The actual implementation of this is only a little bit more involved, but for the sake of formality, we will first define a set of symbols.

Definitions

We have a set of $n=21$ rigid bodies $B=[b_1, ...,b_n]$.  Each body has shape, dimensions, and an inertia tensor computed from its shape and dimensions, assuming a constant density of $800\frac{\text{kg}}{\text{m}^3}$, which gives the model approximately the same total mass as me.  We have $h=20$ joints $J=[j_1,...,j_h]$ connecting the $n$ bodies.  Each joint has 1 to 3 angular degrees of freedom.  We then have position data from $k=32$ motion capture markers: $M=[m_1,...,m_k]$.  The data are sampled at discrete intervals of time and the simulation engine also works in discrete time, so we will refer to time discretely as well, with $T$ frames of time and each frame representing $\Delta$ seconds. We will refer to the column vector position of a marker or body at frame $t$ as $x(m_i,t)$.  Its linear velocity is $v(m_i,t)$.  For bodies, we will also refer to the orientation $q(b_i,t)$ (a quaternion) and angular velocity $\omega(b_i,t)$.  In our implementation, each angular degree of freedom of each joint has its own scalar value, and we treat each joint as if it had three angular degrees of freedom $a(j_i,t)$, with the extra values set to zero when there are fewer than three actual degrees of freedom.

We create a rigid body in ODE for each marker.  We instruct ODE to treat these markers as "kinematic" bodies.  To the simulator, this means that the marker bodies are treated as though they had infinite mass and their trajectories cannot be changed by external forces.  When we create the marker bodies, we place each according to its initial position: $x(m_i,0)$.  For each frame of simulation, we set the velocity of each marker body by finite differencing so that it will get where it needs to be on the next frame.  By convention, we are actually setting the velocity for the next rather than the current frame: $v(m_i,t+1) = \frac{x(m_i,t+1)-x(m_i,t)}{\Delta}$. 
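The finite differencing is trivial, but for concreteness, here is a plain-Python sketch (the function name is mine, not an ODE call) that converts a recorded marker trajectory into the per-frame velocities we feed to the kinematic marker bodies:

```python
def marker_velocities(positions, dt):
    """Per-frame velocities for a kinematic marker body.

    positions[t] is the recorded 3D position x(m, t); the returned
    vels[t] is the velocity to set during frame t so that integrating
    over dt carries the marker from x(m, t) to x(m, t+1).
    """
    vels = []
    for t in range(len(positions) - 1):
        vels.append(tuple((positions[t + 1][k] - positions[t][k]) / dt
                          for k in range(3)))
    return vels
```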

Skeleton pose fitting

We then attach each marker $m_i$ with a ball-and-socket joint to some point on a corresponding body $b_c$.  Skeleton fitting literature provides automatic ways to figure out what that point should be.  We just assigned the points by hand.  Since $m_i$ is kinematic, to the physics engine this essentially just defines a constraint on $b_c$, stating that the point where the marker is attached must also move with velocity $v(m_i,t+1)$, plus a little extra if the attachment point is not equal to $x(m_i,t)$.  The constraint solver then tries to make that happen.  We can then record the resulting body pose, $x(b_i,t)$ and $q(b_i,t)$, for each frame of time.

The orientation of some bodies may be under-constrained by the markers (there are no markers on the neck).  Fortunately the "resting-position" prior will take care of that.  Other bodies may be over-constrained because the markers will move around a little bit and human joints cannot be modeled perfectly accurately in the simulator.  This is not a problem either, because, as I mentioned before, constraints in ODE can actually be thought of as very stiff springs with implicit integration.  If two markers attached to a single body part go different directions and other constraints prevent the part from following them, the constraints "stretch" to accommodate the move and then snap back into place when possible.  Because we want to be sure that the marker constraints stretch before the internal body joints do, we tell ODE to use "looser springs" for the markers than the internal joints $J$ by setting the "constraint force mixing" parameter (CFM) to be $1\times 10^{-4}$ while the other joints use a value ten times smaller.  With these values, the body still follows the markers with high accuracy, but keeps the body parts together as it should.  However, these values were chosen somewhat arbitrarily.  In future work, we will make more principled decisions, keeping in mind that we are trying to balance a number of competing springs.

Inverse kinematics

To me, the distinction between finding the skeleton pose and inverse kinematics is very small.  If you want to constrain a point on the skeleton to follow a path, it makes little difference if the path comes from a motion capture marker or is synthesized.  The physics engine only sees constraints and solves them all simultaneously, as far as is possible.  This means that it is very easy to create additional constraints to prevent ground penetration or foot-skate. We use the standard collision model and contact joints with friction to accomplish this. 

The simulation finds a body pose that does a good job satisfying the marker constraints.  Once we have that body pose, we can use it as a new resting pose and modify the movement using the same technique.  If we want the hand to reach a little farther, we can constrain it to do so.  It is important, however, to balance our springs.  If we try to move the hand but the arm joints are too stiff, we might end up dragging the entire body instead of just extending our reach.  The nice thing is that this method provides an intuitive way to control the result that you will get.  You can weaken the springs or modify the setpoints controlling the arms to bias the movement to the joints of your choice, or you can add additional constraints to keep the waist in place or the feet planted. 


The method works well, in real-time, even when the simulated model dimensions and mass are changed drastically, allowing you to quickly retarget the markers to another body model.
Of course, simply changing the model without any other adjustments will not produce great retargeting.  Unfortunately, I do not believe that there is a perfect way to automatically accomplish high-quality, general-purpose retargeting without a firm understanding of the purpose of the movement.  If the movement is clapping hands, you need to constrain the hands to meet at the right time.  If the movement is locomotion, you need to constrain the limbs to produce the appropriate ground forces at the right times, and you need to decide whether a larger body should walk faster because of its longer legs or use a short stride to match the original movement.  These decisions cannot really be made automatically because there are situations where an animator might want either.  Once you decide what you want, however, you might be able to implement it as a constraint in the physics engine.

Inverse Dynamics

For some purposes, it is useful to know how much effort is required to accomplish a particular movement.  Perhaps we wish to control a real robot and need to know the torques to apply at each joint.  Perhaps we want a measure of the effort that a person exerts when performing a particular action.  Once we have a kinematic sequence of body poses, we can use the physics engine to extract dynamics. 

To accomplish this, we simply need to begin with the character model in the appropriate starting state and then constrain the internal joints to reproduce the relative movement that the markers induced at each frame.  The constraint solver functions by first dividing the desired velocities by the size of the timestep to produce accelerations.  It then does a lot of matrix math to solve, essentially, $F=ma$.  Once the solver has found the appropriate forces and torques, the physics engine integrates these to get a change in position and orientation for each body.  The torques and forces found along the way are available if they are needed.
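For a single degree of freedom, the idea reduces to finite-differencing the pose twice and applying $\tau = I\alpha$; the constraint solver does the coupled, constrained version of the following sketch (the function and values are illustrative, not ODE API):

```python
def joint_torques(angles, inertia, dt):
    """Recover 1-DOF joint torques from a kinematic angle sequence.

    Finite-difference the angles twice to get angular accelerations,
    then apply tau = I * alpha -- the scalar analogue of what the
    constraint solver does for the whole articulated system.
    """
    torques = []
    for t in range(1, len(angles) - 1):
        alpha = (angles[t + 1] - 2 * angles[t] + angles[t - 1]) / (dt * dt)
        torques.append(inertia * alpha)
    return torques
```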
Figure 2.  Workflow for accomplishing inverse dynamics using ODE.  The finite difference in marker data over time provides a velocity constraint on the state of an articulated character model.  The finite difference over model poses between different frames of time provides an internal constraint on a character model.

Results

The good news is that these methods are straightforward to implement and robust.  Unfortunately, discrepancies between the model and reality mean that the dynamic model falls over unless action is taken to stabilize it.  In this work, we simply used "Hand of God" forces.  We attached a joint to the model's waist that would constrain it to reproduce the orientations recorded during the pose-fitting pass.  To minimize the effect of these external forces, we limited the amount of stabilizing torque available.
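The stabilizer amounts to a clamped restoring torque.  Here is a scalar sketch along one axis of the waist (the function name and gain are illustrative assumptions, not the actual ODE joint):

```python
def stabilizing_torque(angle, recorded_angle, gain, tau_limit):
    """Clamped 'Hand of God' torque pulling one axis of the waist
    toward the orientation recorded during the pose-fitting pass."""
    tau = gain * (recorded_angle - angle)
    return max(-tau_limit, min(tau_limit, tau))
```

With the 30 N m limit used here, large pose errors saturate the stabilizer rather than yanking the model around, so its influence on the recovered dynamics stays bounded.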
Figure 3. A simulated character imitates a human movement, shifting balance from one foot to the other.

Figure 4. Measured ground forces for the right and left feet (red and green) are very close to the ground forces computed using the physics engine (magenta and blue for right and left feet respectively).  Yellow and cyan lines show the external stabilization forces, limited to 30 N m along the pitch and roll axes. 
Some validation of this method is available in a simple experiment shown in Figures 3 and 4.  Using a pair of Nintendo Wii Balance Boards (recording data with the wiiuse library), I measured vertical ground forces for both feet while transitioning from bipedal stance to balancing on my left and then right leg (Fig. 3).  We then recovered the movements and the forces using the method described above (Fig. 4).  The computed forces closely match the measured forces, showing that it may be possible to compute inverse dynamics even with multiple contact constraints.  There are some large perturbations during the transition interval, but even these are not terribly serious and can be blamed on discrepancies between the contact surfaces and collision computations (my feet are not, in fact, cylinders) and on the fact that the force plates smooth their output, whereas we report the computed values without any filtering.

Conclusion

Although ODE has been described as a tool for creating games, it can be used for many other purposes.  The constraints solver provides a simple mechanism for controlling an animated character or even extracting joint torques for quantifying human behavior or controlling other devices.  This is pretty cool and easy to do.

Benefits

This approach has some advantages over related work.  In general, people solve the inverse kinematics problem by minimizing squared marker error.  Noisy markers are a big problem if you do this.  A blip in the motion capture that causes a measured marker location to jump causes a big squared error.  This kinks the skeleton.  This approach also requires a lot of markers to fully specify the skeleton.  The ability to bias the skeleton solution with some prior is nice.  Although it is probably possible to include additional terms in the error minimization function, multi-objective optimization can be touchy work.  On the other hand, setting the spring stiffnesses may be more-or-less the same problem.

For inverse dynamics, the big advantage of this approach is that the entire system (including contact forces) is computed simultaneously. 
