Friday, May 27, 2011

Sampling from a multivariate normal in C++

Update 3 May 2013:  There was a Stackoverflow link to this, so I posted an answer with some updated code.


For some reason, it's hard to find code that lets you sample from a multivariate normal.  That's annoying.  So I followed Wikipedia's instructions for creating the sample.  I use Boost to generate univariate normal samples and Eigen to handle the matrix math.  Here's my current solution:
#ifndef __EIGENMULTIVARIATENORMAL_HPP
#define __EIGENMULTIVARIATENORMAL_HPP

#include <Eigen/Dense>

#include <math.h>

#include <boost/random/mersenne_twister.hpp>
#include <boost/random/normal_distribution.hpp>
#include <boost/random/variate_generator.hpp>

/**
    We find the eigen-decomposition of the covariance matrix.
    We create a vector of normal samples scaled by the eigenvalues.
    We rotate the vector by the eigenvectors.
    We add the mean.
*/
template<typename _Scalar, int _size>
class EigenMultivariateNormal
{
    boost::mt19937 rng;    // The uniform pseudo-random algorithm
    boost::normal_distribution<_Scalar> norm;  // The gaussian combinator
    boost::variate_generator<boost::mt19937&,boost::normal_distribution<_Scalar> >
       randN; // The 0-mean unit-variance normal generator

    Eigen::Matrix<_Scalar,_size,_size> rot;
    Eigen::Matrix<_Scalar,_size,1> scl;

    Eigen::Matrix<_Scalar,_size,1> mean;

public:
    EigenMultivariateNormal(const Eigen::Matrix<_Scalar,_size,1>& meanVec,
        const Eigen::Matrix<_Scalar,_size,_size>& covarMat)
        : randN(rng,norm)
    {
        setCovar(covarMat);
        setMean(meanVec);
    }

    void setCovar(const Eigen::Matrix<_Scalar,_size,_size>& covarMat)
    {
        Eigen::SelfAdjointEigenSolver<Eigen::Matrix<_Scalar,_size,_size> >
           eigenSolver(covarMat);
        rot = eigenSolver.eigenvectors();
        scl = eigenSolver.eigenvalues();
        for (int ii=0;ii<_size;++ii) {
            scl(ii,0) = sqrt(scl(ii,0));
        }
    }

    void setMean(const Eigen::Matrix<_Scalar,_size,1>& meanVec)
    {
        mean = meanVec;
    }

    void nextSample(Eigen::Matrix<_Scalar,_size,1>& sampleVec)
    {
        for (int ii=0;ii<_size;++ii) {
            sampleVec(ii,0) = randN()*scl(ii,0);
        }
        sampleVec = rot*sampleVec + mean;
    }
    
};

#endif

Tuesday, May 10, 2011

*DRAFT* Spherical Bezier Splines

In Euclidean space, splines are a convenient way to generate polynomial curve that satisfies a certain set of requirements: passes through certain points, has certain derivatives, has good continuity, etc.  In 3d simulation, for example, you can smoothly move a body from its current position and velocity to another position and velocity at a specific time by creating 3rd order spline that meets your needs.  If the body needs to avoid certain locations along its route or faces velocity constraints, you can insert additional control points, making the body's trajectory a piecewise third-order polynomial.  This is very convenient.

The body's orientation, however, is a little bit different.  Rigid-body orientation can be easily represented using unit-length quaternions. Interpolating from one quaternion to another is simple enough using slerp as addressed in the previous post.  However, if you want to smoothly change angular velocity, then you need something else.  Slerp is effectively linear interpolation abstracted to the surface of a sphere.  It might be appropriate to call this geodesic interpolation.  In Euclidean space, a first order spline is just linear interpolation.  A second order spline is linear interpolation of two linear interpolations and the pattern continues into higher orders.  That's why the De Casteljau's algorithm works.  It seems reasonable to do the same thing on the surface of a sphere.

So I went looking for this sort of thing.  Shoemake, in his '85 quaternion paper, talked about the possibility to do this, but didn't explore it much.  Mostly, it seems like computation was too expensive at the time to do a lot of spherical splines.  Apparently, two years later, he wrote about spherical quadrangle interpolation (squad):  Squad(A,B,c,d,t) = Slerp( Slerp(A,B,t), Slerp(c,d,t), 2t(1-t) ).

I can't readily find a copy of his original paper, but there are a number of sources that talk about it.  I'm not sure what the advantage of using quadrangle interpolation is.  In quadrangle interpolation, you're essentially interpolating between two lines (or arcs on the sphere) instead of interpolating between points.  Moreover, you use a quadratic function to interpolate the two lines so that you start off moving along one line, go towards the other line, and then come back to your original line.  It's pretty easy to illustrate this in Euclidean space.
Quadratic spline interpolation: The red line is a quadratic that initially follows the green line but then ends up along the blue line.  This is not squad.

Quadrangle interpolation: the red line is a cubic polynomial that starts at the green line, bends toward the blue line, but then ends up back on the green line.  This is what squad uses.
It is kind of nice that you can get a cubic result with only 3 interpolations instead of 6.  This isn't a big deal at all though in Euclidean space (since you don't actually do the interpolations).  However, in Shoemake's time, the trigonometric functions necessary for slerp would probably have been quite expensive if you really wanted that C2 continuity..
 A correspondence between the cubic quadrangle case: $$f(t) = (1-2t(1-t))((1-t)A+tB)+2t(1-t)((1-t)C+tD)$$ and the cubic spline: $$g(t)=(1-t)^3A + 3t(1-t)^2B + 3t^2(1-t)C + t^3D$$ can be found by looking at what it takes to accomplish my goal---interpolating from A to D with pre-defined first-derivatives at A and D.  In the cubic spline, you set $$B=A+\frac{dg(0)/dt}{3}$$ and $$C=D-\frac{dg(1)/dt}{3}$$.  In the quadrangle, you set $$B=\frac{A+2D+df(1)/dt}{3}$$ and $$C=\frac{3A-B+df(0)/dt}{2}$$. 

A number of papers and articles talk a bit about spherical splines, but they often mean different things by the term.  Popiel and Noakes produced a very rigorous and mathematically gratuitous discussion of cubic splines on the surface of a sphere.  By all appearances, they are using a ton of symbols to precisely specify and simultaneously obfuscate the rather intuitive statements made by Shoemake twenty years earlier.

Buss and Fillmore look at spherical splines from the common perspective that each point along a spline is a weighted average of multiple control points (instead of a hierarchy of linear interpolations).  This gives rise to some grief inasmuch as averaging multiple points on the surface of a sphere is challenging to pin down, but they do it.

Anyway, my literature search didn't really give me what I wanted, although it did say that it's possible.  The algorithm didn't seem too complex, so I decided to give it a shot.

A common thing in most of the literature is to insist that the distance from one point on the unit sphere to another should be defined as the shortest angle between them: $\theta = acos(p_1\cdot p_2)$.  I decided that one doesn't really need to enforce that requirement.  You can go around the great circle of a sphere an arbitrary number of times and still stop at a given point.  That's still a valid 'line'.

You can describe a point in 3D Euclidean space relative to some frame of reference with 3 scalars, but you can also describe the point with a unit-length direction vector and a magnitude scalar giving the distance from the origin.  In Euclidean space, that's not especially helpful.  On the sphere, however, it allows us to describe a point that is several circumferences away from the original point.  It also allows us to place control points some multiple of $\pi$ radians away from the reference point, which is not possible using only points because there are multiple "shortest paths" from one pole of a sphere to the other.  On the sphere, a direction vector is one that is orthogonal to the reference vector and distance is an unconstrained angle such that if we have control point $P_1$ and direction $D_1$ and angle $\theta_1$, then, on the sphere, control point $P_2 = \cos(\theta_1)P_1 + \sin(\theta_1)D_1$.  If we add in our interpolation parameter to this, we get an alternative form of the slerp equation: $P(t) = \cos(t\theta)P(0) + \sin(t\theta)D$.

With this as a basis, I can now talk about two different types of spherical splines that I played with.  For simplicity, the discussion is in terms of quadratic splines, but the methods extend to higher orders.  The first method is the same as Shoemake suggested.  To create a quadratic bezier spline from control points $P_1$, $P_2$, and $P_3$, we're essentially just calling $$P(t)=slerp(slerp(P_1,P_2,t),slerp(P_2,P_3,t),t)$$
The other type of spherical spline is somewhat related to the idea of quadrangle interpolation, and if I have done anything original in this work, this is probably it.  Rather than find a point that interpolates two interpolated points, we interpolate the first plane of interpolation (the full frame of reference) onto the second plane and then do the interpolation within the interpolated frame of reference.
Given control points $P1$, $P2$, $P3$, we would interpolate the orange plane defined by $P1,P2$ towards the purple plane defined by $P2,P3$ and then do interpolation within that interpolated plane.
Both methods produce some nice results, generally quite different from each other.

Some math:

First method-- $slerp(slerp(P_1,P_2,t),slerp(P_2,P_3,t),t)$.  Given the direction-magnitude representation, we have convenience variables $D_1$, $D_2$, $\theta_1$, and $\theta_2$ such that $P_2 = cos(\theta_1)P_1 + sin(\theta_1)D_1$ and likewise, $P_3 = cos(\theta_2)P_2 + sin(\theta_2)D_2$.  We define intermediate variables $$K_{t,1} = cos(t\theta_1)P_1 + sin(t\theta_1)D_1$$ and $$K_{t,2} = cos(t\theta_2)P_2 + sin(t\theta_2)D_2$$.  To find the interpolated point, we then compute $\phi_t = \cos^{-1}(K_{t,1}\cdot K_{t,2}) + 2c\pi$ where $c$ is a constant describing how many time to go around the unit sphere when interpolating between the two original curves.  We cannot use a single direction vector for this rotation because the interpolation is generally happening through multiple planes, but we can still loop out around the sphere.  If $c=-1$, then the interpolation takes the long way around the sphere.  The final value uses the classic form of slerp: $$P(t) = \frac{\sin((1-t)\phi)}{\sin(\phi)}K_{t,1} + \frac{\sin(t\phi)}{\sin(\phi)}K_{t,2}$$
Computing the derivative of this function is a bit tedious, but not otherwise particularly complicated, although I had not previously been aware of the inverse trig derivatives: $\frac{d}{dt}\cos^{-1}(u) = -\left(1-u^2\right)^{-\frac{1}{2}}\frac{du}{dt}$.   
So we end up with:

$$\begin{align}\frac{dP(t)}{dt} =
&\frac{\cos((1-t)\phi_t)\left(-\phi_t +(1-t)\frac{d\phi_t}{dt}\right)}{\sin(\phi_t)}K_{1,t}-\\
&\frac{\sin((1-t)\phi_t)\cos(\phi_t)\frac{d\phi_t}{dt}}{\sin^2(\phi_t)}K_{1,t} +\\

&\frac{\cos(t\phi_t)\left(\phi_t+t\frac{d\phi_t}{dt}\right)}{\sin(\phi_t )}K_{2,t} - \frac{sin(t\phi_t)\cos(\phi_t)\frac{d\phi_t}{dt}}{\sin^2(\phi_t)}K_{2,t} +\\
&\frac{\sin((1-t)\phi_t)}{\sin(\phi_t)}\frac{dK_{1,t}}{dt}+ \frac{\sin(t\phi_t)}{\sin(\phi_t)}\frac{dK_{2,t}}{dt}
\end{align}$$
Where $\frac{dK_{i,t}}{dt}$ is computed recursively and $$\frac{d\phi_t}{dt}= -\frac{K_{1,t}\cdot \frac{dK_{2,t}}{dt}+ \frac{dK_{1,t}}{dt}\cdot K_{2,t}}{\sqrt{1-K_{1,t}\cdot K_{2,t}}}$$  Although lengthy and full of some relatively processor intensive functions, this can be solved on modern hardware without too much trouble up to a few orders deep (and a more patient person than I could probably also simplify this down a bit).  With a cubic-degree recursive slerp spline, you can chain a sequence of knots together and interpolate between groups of four.  However, the generated curve is not quite so aesthetically appealing on the surface of a sphere.  It tends to kink up in places.

The next method approximates interpolated planes.  We begin by defining some convenience variables and showing the computation of the spline itself.

The idea of this computation is to rotate the $P_1,D_1$ plane towards $P_2$ by $t\theta_1$ to get $S_P,S_D$.  We then rotate $S_D$ toward $D_2$ by $t\phi$.  Finally, we interpolate by $t\theta_2$.
$$\begin{align}
S_P &= \cos(t\theta_1)P_1 + \sin(t\theta_1)D_1\\
S_D &= -\sin(t\theta_1)P_1 + \cos(t\theta_1)D_1\\
\\
R_U &=(\cos(t\phi)-1)U + \sin(t\phi)V\\
R_V &=-\sin(t\phi)U + (\cos(t\phi)-1)V\\
\\
T_P &= S_P + (U\cdot S_P)R_U + (V\cdot S_P)R_V\\
T_D &= S_D + (U\cdot S_D)R_U + (V\cdot S_D)R_V\\
\\
P(t) &= \cos(t\theta_2)T_P + \sin(t\theta_2)T_D
\end{align}$$

Where $U$ and $V$ are the axis that rotates $S_D$ onto $D_2$ and $\phi$ is the amount of rotation to accomplish this.  Although this is uglier than the recursive slerp, the derivative of this function is a bit friendlier:
$$\begin{align}
\frac{dS_P}{dt}& = -\theta_1\sin(t\theta_1)P_1 + \theta_1\cos(t\theta_1)D_1\\
\frac{dS_D}{dt}& = -\theta_1\cos(t\theta_1)P_1 + -\theta_1\sin(t\theta_1)D_1\\
\frac{dR_U}{dt}& = -\phi\sin(t\phi)U + \phi\cos(t\phi)V\\
\frac{dR_V}{dt}& = -\phi\cos(t\phi)U -\phi\sin(t\phi)V\\

\frac{dT_P}{dt} &= \frac{dS_P}{dt}+\left(U\cdot \frac{dS_P}{dt}\right)R_U + (U\cdot S_P)\frac{dR_U}{dt}  +\left(V\cdot \frac{dS_P}{dt}\right)R_V + (V\cdot S_P)\frac{dR_V}{dt}\\
\frac{dT_D}{dt} &= \frac{dS_D}{dt} + \left(U\cdot \frac{dS_D}{dt}\right)R_U + (U\cdot S_D)\frac{dR_U}{dt} + \left(V\cdot \frac{dS_D}{dt}\right)R_V + (V\cdot S_D)\frac{dR_V}{dt}\\
\frac{dP(t)}{dt} &= -\theta_2\sin(t\theta_2)T_P+\theta_2\cos(t\theta_2)T_D+\cos(t\theta_2)\frac{dT_P}{dt} + \sin(t\theta_2)\frac{dT_D}{dt}\end{align}
$$
This function turns out to not actually be quadratic in $t$, but still accomplishes a similar task.  In many cases, this function might be better because the it does not ever have any of the tight kinks that a spherical quadratic may have.  A disadvantage of this function is that it is not symmetric.  It is also difficult to extend to higher order functions.  To extend this to make a smooth curve interpolating multiple points, however, you can overlap middle sections in way similar to that done with cubic b-splines.  Also, because the function is not quadratic in time, you can use the control points to get desired velocity direction and then explicitly control time to get desired velocity magnitude.


The original plan, however, was to actually rotate one 2D frame of reference directly onto another using a single parameter.  In 3D it's fairly simple to find the rotation matrix that rotates one plane onto another.  Given the two orthogonal vectors that define the plane, we use the cross-product to find the third orthogonal vector.  After finding this matrix for both planes, we invert the first and pre-multiply by the second.  The resulting matrix will transform points in the first frame of reference into points in the second frame of reference.  From this matrix, we can extract axis and angle of rotation.  Interpolation is then a matter of 1) rotating the plane of rotation by the appropriate fraction, 2) interpolating the magnitude of the angle of interpolation within the plane, and 3) interpolating within the rotated plane by the interpolated angle. 

In more than 3d, this doesn't work because two orthogonal vectors is not sufficient to define a full frame of reference.  However, because the first vector of the second frame must be co-planar with the first frame, the transformation will always be occurring in a 3d subspace, regardless of the actual dimensionality of the space.  So what you do is find the plane of rotation that rotates $P_1$ onto $P_2$ and also rotates $D_1$ onto $D_2$.  We can find two vectors in that plane by vector subtraction: $V_P = P_2-P_1$ and $V_D = D_2-D_1$. To use them, we must orthonormalize the vectors.  If they do not span a 2D space, then we can arbitrarily choose a dimension to rotate through.

This method is a bit more challenging to apply to higher order functions and this one is polynomial in time.  To generate a cubic, it seems that you would need to rotate the first plane toward the second, rotate the second plane toward the third, and then rotate the two intermediate planes toward each other.  In higher than 3 dimensions, it seems possible for the intermediate planes to span a 4d space which makes rotate one onto the other a bit trickier.
All three splines interpolate from +y toward -x and then to +z.  Orange is quadratic slerp.  Green is rotating planes of rotation.  Black is the rotating planes approximation.
We rotate the intermediate control point down to -y.  Then we move the final endpoint to +y.  When the control points are co-linear, there is a free parameter controlling the angle between the interpolation from $P_1$ to $P_2$ and the interpolation from $P_2$ to $P_3$.