Monday, February 9, 2015

Loop Closure Calculation (And Yet Another Application of Quaternion)

An inherent problem of registration across multiple captures is error accumulation. Assuming that we take n 3D captures, we can find the 3D transform between every pair of consecutive captures. That should in turn let us compute the 3D transform between any two of these captures, such as between view 1 and view n. However, the transform between view 1 and view n obtained this way can be quite noisy: the transform of each consecutive pair carries a small error, and chaining them together to get the transform from view 1 to view n adds those errors up. This is called error accumulation.

One effective way to combat error accumulation is loop closure. When you take multiple captures, a loop may be formed. For instance, when you use a camera to take a panorama, view 1 and view n overlap. There are then two ways to derive the relative pose between view 1 and view n. The first is to chain 1->2, 2->3, ..., n-1->n. The second is to estimate n->1 directly, since view 1 and view n end at roughly the same place and close a loop. Due to error accumulation, the result obtained the first way is often less accurate than the result obtained the second way, so we can use the second result to calibrate the first. The outcome of this loop closure technique is that each estimate from 1->2 to n-1->n becomes more accurate.

A seminal paper about loop closure is "Multiview Registration of 3D Scenes by Minimizing Error between Coordinate Frames" by G. C. Sharp et al. Our code mainly follows this paper, which we will refer to as "the paper" below. Assume that from view i to view j the 3D rotation matrix is R(i,j) and the offset is t(i,j), where R is a 3-by-3 matrix and t is a 3-by-1 vector; we know R(1,2), R(2,3), ..., R(n-1,n), R(n,1) as well as t(1,2), t(2,3), ..., t(n-1,n), t(n,1). Based on Eq. (11) in the paper, we have:
R(1,2)...R(k,k+1)E(k,k+1)R(k+1,k+2)...R(n,1)=I
where I is the 3-by-3 identity matrix and E(k,k+1) is the 3-by-3 correction matrix to be found. We will present Matlab code to calculate E(k,k+1). In the Matlab code, R is a matrix holding one 3-by-4 block per pairwise transform, so it has 3*n rows and 4 columns, laid out as:
[R(1,2) t(1,2)]
[R(2,3) t(2,3)]
...
[R(n-1,n) t(n-1,n)]
[R(n,1) t(n,1)]
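
For concreteness, here is a minimal sketch of assembling R in this layout for a four-view loop; the per-pair variables R12, t12, etc. are hypothetical placeholders for the estimated pairwise transforms, and mx, used in the code below, is simply the number of rows of R:

% Sketch: stack the pairwise transforms of a 4-view loop into the layout above.
% R12, R23, R34, R41 are 3x3 rotation matrices; t12, t23, t34, t41 are 3x1 offsets.
R = [R12 t12;
     R23 t23;
     R34 t34;
     R41 t41];
mx = size(R, 1);   % mx = 3*n, so mx/3 is the number of views in the loop
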
The code to calculate E(k,k+1) is:
Ralpha = zeros(mx, 3); Rbeta = zeros(mx, 3);   % mx = number of rows of R, so mx/3 = n
% Ralpha block i holds the forward product R(1,2)*R(2,3)*...*R(i,i+1)
for i = 1:mx/3
    if (i == 1)
        Ralpha(3*(i-1)+1:3*(i-1)+3,1:3) = R(3*(i-1)+1:3*(i-1)+3,1:3);
    else
        Ralpha(3*(i-1)+1:3*(i-1)+3,1:3) = Ralpha(3*(i-2)+1:3*(i-2)+3,1:3)*R(3*(i-1)+1:3*(i-1)+3,1:3);
    end
end

% Rbeta block i holds the backward product R(i,i+1)*...*R(n,1)
for i=mx/3:-1:1
    if (i == mx/3)
        Rbeta(3*(i-1)+1:3*(i-1)+3,1:3) = R(3*(i-1)+1:3*(i-1)+3,1:3);
    else
        Rbeta(3*(i-1)+1:3*(i-1)+3,1:3) = R(3*(i-1)+1:3*(i-1)+3,1:3)*Rbeta(3*(i)+1:3*(i)+3,1:3);
    end
end 

% Per Eq. (11): E(k,k+1) = (R(1,2)*...*R(k,k+1))' * (R(k+1,k+2)*...*R(n,1))'
Emat = zeros(mx, 3); Emat_divN = zeros(mx, 3);
for i = 1:mx/3
    if (i ~= mx/3)
        Emat(3*(i-1)+1:3*(i-1)+3,1:3) = Ralpha(3*(i-1)+1:3*(i-1)+3,1:3)'*Rbeta(3*i+1:3*i+3,1:3)';
    else
        % for the last block the trailing product is empty
        Emat(3*(i-1)+1:3*(i-1)+3,1:3) = Ralpha(3*(i-1)+1:3*(i-1)+3,1:3)';
    end
end
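
As a quick sanity check (a sketch reusing Ralpha computed above), the last block of Ralpha is the product R(1,2)*R(2,3)*...*R(n,1), which would be exactly I if there were no error accumulation; its deviation from I measures the drift around the loop:

% Sketch: measure the accumulated rotation drift around the loop before correction.
drift = Ralpha(mx-2:mx, 1:3);                     % = R(1,2)*R(2,3)*...*R(n,1), ideally I
driftAngle = acos(min(1, (trace(drift)-1)/2));    % residual rotation angle in radians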

After calculating E(k,k+1), according to Eq. (22) of the paper we need to find the 1/n power of E(k,k+1). Without quaternions, it is not obvious how to take the 1/n power of a rotation matrix, but with quaternions it becomes straightforward. We first convert E(k,k+1) to a quaternion, divide the rotation angle by n, and then convert it back, with the following Matlab code:

for i = 1:mx/3
    % convert the i-th 3x3 block of Emat to a quaternion (w, x, y, z)
    [w, x, y, z] = R2quaternion(Emat(3*(i-1)+1:3*(i-1)+3,1:3));
    % divide the rotation angle by n = mx/3 while keeping the rotation axis:
    % w = cos(theta/2) and (x, y, z) = sin(theta/2)*axis
    w_new=cos(acos(w)/(mx/3));
    x_new=x*sin(acos(w)/(mx/3))/sin(acos(w));
    y_new=y*sin(acos(w)/(mx/3))/sin(acos(w));
    z_new=z*sin(acos(w)/(mx/3))/sin(acos(w));
    % convert the scaled quaternion back to a rotation matrix
    Emat_divN(3*(i-1)+1:3*(i-1)+3,1:3) = quaternion2R(w_new,x_new,y_new,z_new);
end
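
The helper functions R2quaternion and quaternion2R that the loop above relies on convert between a rotation matrix and a scalar-first quaternion (w, x, y, z). Here is a minimal sketch of each, consistent with the quaternion-to-rotation-matrix formula given in the Wii Motion Monitor post below (each function would go in its own .m file; a production-quality R2quaternion would also branch on the largest diagonal element for numerical robustness):

% Sketch: convert a 3x3 rotation matrix to a unit quaternion (w, x, y, z) with w >= 0.
% Assumes trace(R) > -1.
function [w, x, y, z] = R2quaternion(R)
    w = sqrt(max(0, 1 + R(1,1) + R(2,2) + R(3,3))) / 2;
    x = (R(3,2) - R(2,3)) / (4*w);
    y = (R(1,3) - R(3,1)) / (4*w);
    z = (R(2,1) - R(1,2)) / (4*w);
end

% Sketch: convert a unit quaternion (w, x, y, z) back to a 3x3 rotation matrix.
function R = quaternion2R(w, x, y, z)
    R = [w*w+x*x-y*y-z*z, 2*x*y-2*w*z,     2*x*z+2*w*y;
         2*x*y+2*w*z,     w*w+y*y-x*x-z*z, 2*y*z-2*w*x;
         2*x*z-2*w*y,     2*y*z+2*w*x,     w*w+z*z-x*x-y*y];
end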
The final step is to use Emat_divN, i.e. E(k,k+1)^(1/n), to calibrate R(k,k+1):

Rtilda = zeros(mx, 4);
for i = 1:mx/3
    Rtilda(3*(i-1)+1:3*(i-1)+3,1:3) = R(3*(i-1)+1:3*(i-1)+3,1:3)*Emat_divN(3*(i-1)+1:3*(i-1)+3,1:3); 
end
For the vector t, we use a simplified approach: calculate the average of all t vectors and subtract this average from each t:

tsum = zeros(3,1);
for i = 1:mx/3
    tsum = tsum+R(3*(i-1)+1:3*(i-1)+3,4);
end
tavg = tsum/(mx/3);
followed by
for i = 1:mx/3
    Rtilda(3*(i-1)+1:3*(i-1)+3,4) = R(3*(i-1)+1:3*(i-1)+3,4)-tavg; 
end
The whole Matlab code can be downloaded from here: https://drive.google.com/folderview?id=0B05MVyfGr8lmQWZVV25qQWpmbkE&usp=sharing

Wednesday, January 7, 2015

Wii Motion Monitor: Quaternion in Android App

Wii Motion Monitor is an Android app that I wrote. The app connects the Wiimote, a motion sensing device made by Nintendo, to an Android smartphone. In the case of this app, when the user changes the orientation of the Wiimote, the 3D object on the Android phone screen rotates together with the Wiimote. We should mention that not every phone vendor allows their phones to communicate with the Wiimote; at the time we wrote this app, only Google's and Motorola's Android phones could do so. An introduction to this app can be found on Youtube at https://www.youtube.com/watch?v=2HHfNzsYHIg. This app is also an open source project which can be downloaded at http://code.google.com/p/wii-motion-monitor/ (after code.google.com was deprecated, you can download it from https://drive.google.com/drive/folders/0B05MVyfGr8lmOUw3SlRHVFBqNW8).

This app mainly consists of two parts. The first part is a customized library which establishes the wireless link between the Wiimote and the smartphone. Most of the code in this part is written in C and then incorporated into the Android project through the Java Native Interface; it lives in the ./jni folder and benefits greatly from an open source library named fWIIne (http://sourceforge.net/projects/fwiine/). The second part is the signal processing code which enables the virtual object on the phone screen to track the 3D rotation of the real Wiimote, and quaternions are the backbone of that algorithm.

The inputs to 3D rotation tracking come from the inertial sensors in the Wiimote: an accelerometer and a gyroscope. Each accelerometer reading includes three values ax, ay and az, which represent the acceleration along the x-axis, y-axis and z-axis, respectively. When the object is not moving, gravity makes the accelerometer reveal the Wiimote's orientation; for example, [0, 0, 9.8] means the object is lying on the desk facing upwards. The other inertial sensor, the gyroscope, measures the rotation speed of the object. It also outputs three values gx, gy and gz, which represent the rotation speed around the x-axis, y-axis and z-axis, respectively. When the object is stationary, each value is zero, but once the object starts to move, the gyroscope starts to produce meaningful outputs. In fact, the accelerometer and the gyroscope complement each other well. When the Wiimote is moving, it is difficult to derive its orientation from the accelerometer alone, since gravity is mixed with motion acceleration. This is not a problem for the gyroscope, which records the rotation at each moment and thus provides useful information about orientation. The gyroscope, however, has its own drawback: error accumulation. Each gyroscope reading contains a small error, and when many readings are added together the total error grows without bound. The accelerometer does not suffer from error accumulation. Therefore, while the gyroscope tracks short-term movement accurately, its accumulated error can be corrected with the help of the accelerometer. Because of these complementary properties, many 3D control algorithms fuse the inputs from both the gyroscope and the accelerometer.


In general, there are two ways to represent rotation in 3D space: Euler angles and quaternions. Euler angles are the more intuitive way to interpret 3D rotation. To rotate an object from one orientation to another, one can first rotate it around the x-axis by a certain angle, then around the y-axis, followed by the z-axis; these angles are called Euler angles. In fact, by going through the axes in different orders, there are multiple ways to perform the rotation. Despite their intuitiveness, Euler angles have some serious issues. The most famous one is gimbal lock, which means your object won't rotate as it is supposed to. I still remember my own encounter with gimbal lock: when I initially coded the rotation in my app with Euler angles, pointing the Wiimote in a certain direction (I think it was up) made the virtual 3D object on my phone screen rotate violently, showing that the rotation tracking algorithm had collapsed. Another issue with Euler angles is that they are not signal processing friendly. For example, if someone asks you to perform 1/n of the rotation from vector u to vector v, how to do that with Euler angles is far from clear. By comparison, the quaternion is a concept which seems intriguing at the beginning but becomes easy to use once you get familiar with it. The idea is simple: given two 3D vectors u and v, the rotation between them is determined by their cross product, u x v, and an angle. In total, 4 elements are sufficient to define the rotation, three for the cross product (the rotation axis) and one for the angle. Unlike Euler angles, this representation does not depend on a choice of axis order, and it does not suffer from gimbal lock. For signal processing, quaternions are also easier than Euler angles: posed the same question of performing 1/n of the rotation from u to v, you can keep the same axis and simply make the angle 1/n of the original value. An important property of a unit quaternion is that the squares of its four elements add up to 1: q0^2+q1^2+q2^2+q3^2=1.
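
To make the 1/n example concrete, here is a small sketch (in Matlab notation, with hypothetical example inputs) of building the quaternion that rotates a unit vector u onto a unit vector v, and the quaternion for 1/n of that rotation; only the angle changes, the axis stays the same:

% Sketch: quaternion (w, x, y, z) for the rotation taking unit vector u to unit vector v,
% and for 1/n of that rotation (same axis, angle divided by n).
u = [1 0 0]; v = [0 1 0]; n = 4;                 % hypothetical example inputs
ax = cross(u, v); ax = ax / norm(ax);            % rotation axis
theta = acos(dot(u, v));                         % rotation angle
q    = [cos(theta/2),     sin(theta/2)*ax];      % full rotation
qDiv = [cos(theta/(2*n)), sin(theta/(2*n))*ax];  % 1/n of the rotation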


Now we can start to explain how quaternion-based signal processing is used in the Wii Motion Monitor app. The bulk of it is in the IMUupdate() function of CubeRenderer.java under .\src\com\goldsequence\motionmonitorpro. Note that for this quaternion-based motion tracking algorithm we follow Madgwick's IMU code, but with some changes to the filtering method (his code can be found at https://code.google.com/p/imumargalgorithm30042010sohm/). Our function has six inputs: ax/ay/az from the accelerometer and gx/gy/gz from the gyroscope. The way the gyroscope and accelerometer data are combined resembles a complementary filter.

First, the accelerometer input is normalized:

        norm = (float) Math.sqrt(ax*ax + ay*ay + az*az);      
        ax = ax / norm;
        ay = ay / norm;
        az = az / norm;


From the quaternion q = (q0, q1, q2, q3) we can derive the corresponding 3D rotation matrix
M =
[q0*q0+q1*q1-q2*q2-q3*q3   2*q1*q2-2*q0*q3           2*q1*q3+2*q0*q2        ]
[2*q1*q2+2*q0*q3           q0*q0+q2*q2-q1*q1-q3*q3   2*q2*q3-2*q0*q1        ]
[2*q1*q3-2*q0*q2           2*q2*q3+2*q0*q1           q0*q0+q3*q3-q1*q1-q2*q2]


The last row of M is parallel to the z-axis, which is the direction of gravity: since MM'=I, M*[vx vy vz]'=[0 0 1]'. Taking the cross product of the normalized accelerometer reading with this row gives the discrepancy between the accelerometer measurement and the quaternion:
        // estimated direction of gravity
        vx = 2*(q1*q3 - q0*q2);
        vy = 2*(q0*q1 + q2*q3);
        vz = q0*q0 - q1*q1 - q2*q2 + q3*q3;
        
        // error is sum of cross product between reference direction of field and direction measured by sensor
        ex = (ay*vz - az*vy);
        ey = (az*vx - ax*vz);
        ez = (ax*vy - ay*vx);


These discrepancies ex/ey/ez are then used to correct the gyroscope measurements:
         // low-pass filtered error (acts as the integral term)
        exInt = filtCoef*exInt + (1-filtCoef)*ex;
        eyInt = filtCoef*eyInt + (1-filtCoef)*ey;
        ezInt = filtCoef*ezInt + (1-filtCoef)*ez;
      
        // adjusted gyroscope measurements
        gx = filtCoef2*gx + (1-filtCoef2)*(Kp*ex + Ki*exInt)/T;
        gy = filtCoef2*gy + (1-filtCoef2)*(Kp*ey + Ki*eyInt)/T;
        gz = filtCoef2*gz + (1-filtCoef2)*(Kp*ez + Ki*ezInt)/T;


Finally, the corrected gyroscope measurement is incorporated into the quaternion. Multiplying a quaternion q by another quaternion p can be written as a matrix-vector product:
[q0 -q1 -q2 -q3]
[q1  q0 -q3  q2] * [p0 p1 p2 p3]'
[q2  q3  q0 -q1]
[q3 -q2  q1  q0]
where [p0 p1 p2 p3]' is a 4 x 1 vector. Using the quaternion properties i^2=j^2=k^2=ijk=-1, (q0+q1*i+q2*j+q3*k)*(p0+p1*i+p2*j+p3*k) = (q0p0-q1p1-q2p2-q3p3) + (q0p1+q1p0+q2p3-q3p2)*i + (q0p2+q2p0+q3p1-q1p3)*j + (q0p3+q1p2-q2p1+q3p0)*k, which explains the multiplication matrix above.

Let p = [0 gx gy gz]. Multiplying the product of q and p by T/2, the factor that converts angular speed into an angle increment over one sample period T (in the code below this factor appears as halfT*T), gives the delta quaternion used to update q:

        // integrate quaternion rate and normalise
        // (use the pre-update quaternion values on the right-hand side)
        float q0last = q0, q1last = q1, q2last = q2, q3last = q3;
        q0 = q0last + (-q1last*gx - q2last*gy - q3last*gz)*halfT*T;
        q1 = q1last + (q0last*gx + q2last*gz - q3last*gy)*halfT*T;
        q2 = q2last + (q0last*gy - q1last*gz + q3last*gx)*halfT*T;
        q3 = q3last + (q0last*gz + q1last*gy - q2last*gx)*halfT*T; 


This concludes the explanation of our quaternion-based motion tracking algorithm. We hope this example gives you a better understanding of quaternion-based signal processing.

Here is a related paper on fusing inertial sensor data: http://users.cecs.anu.edu.au/~Jonghyuk.Kim/pdf/2008_Euston_iros_v1.04.pdf

Sunday, January 4, 2015

Matlab Script for Rodrigues' rotation formula

A rotation matrix is a 3x3 orthogonal matrix which rotates one 3D vector onto another. Given a unit rotation axis k, a vector v and a rotation angle theta, the rotated vector is given by Rodrigues' rotation formula (refer to the wiki for more details: http://en.wikipedia.org/wiki/Rodrigues%27_rotation_formula):
v_rot = v*cos(theta) + (k x v)*sin(theta) + k*(k . v)*(1 - cos(theta))

Here is an example of Matlab script of how to calculate the matrix:
% This script is used to calculate rotation matrix based on Rodrigues formula
clear all; close all;
% in0: accelerometer measurement; in1: reference direction (+z, i.e. gravity when lying flat)
in0 = [-1.213062233 0.472455803 9.666190333];
in1 = [0 0 1];
% normalize both vectors and find the rotation axis and angle between them
in0Norm = in0 ./ sqrt(in0*in0.');
in1Norm = in1 ./ sqrt(in1*in1.');
axisRaw = cross(in0Norm, in1Norm);
axis = axisRaw./sqrt(axisRaw*axisRaw.');   % unit rotation axis (note: shadows the built-in axis function)
angle = acos(dot(in0Norm, in1Norm));       % rotation angle (note: shadows the built-in angle function)
% rotation matrix from axis and angle (Rodrigues formula)
M = [axis(1)^2+(1-axis(1)^2)*cos(angle) axis(1)*axis(2)*(1-cos(angle))-axis(3)*sin(angle) axis(1)*axis(3)*(1-cos(angle))+axis(2)*sin(angle);
axis(1)*axis(2)*(1-cos(angle))+axis(3)*sin(angle) axis(2)^2+(1-axis(2)^2)*cos(angle) axis(2)*axis(3)*(1-cos(angle))-axis(1)*sin(angle);
axis(1)*axis(3)*(1-cos(angle))-axis(2)*sin(angle) axis(2)*axis(3)*(1-cos(angle))+axis(1)*sin(angle) axis(3)^2+(1-axis(3)^2)*cos(angle)];

In this script, in0 and in1 are two 3D vectors; for example, they can be readings of a smartphone's accelerometer at two different orientations. The output, matrix M, is the rotation matrix that rotates the normalized in0 onto the normalized in1.

[Added on 07/30/2016] Rodrigues' rotation formula can also be understood in this way. The rotation matrix R is:
R = I + sin(angle)*K + (1-cos(angle))*K*K
where K is the cross-product matrix K = [0 -axis(3) axis(2); axis(3) 0 -axis(1); -axis(2) axis(1) 0]. How should we interpret this formula? For the input vector in0, which is perpendicular to the rotation axis (the axis is the cross product of in0 and in1), K*K*in0 equals -in0, and K*in0 is a vector perpendicular to in0. Thus R*in0 = in0 + sin(angle)*K*in0 - (1-cos(angle))*in0 = cos(angle)*in0 + sin(angle)*(K*in0), i.e. in0 is rotated by angle in the plane spanned by in0 and K*in0.
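
As a quick check, here is a short sketch, reusing the variables axis, angle, M, in0Norm and in1Norm from the script above, that builds R from K and verifies both that it matches M and that it rotates in0Norm onto in1Norm:

% Sketch: build the rotation from the cross-product matrix K and verify it.
K = [0 -axis(3) axis(2); axis(3) 0 -axis(1); -axis(2) axis(1) 0];
R = eye(3) + sin(angle)*K + (1-cos(angle))*K*K;
disp(norm(R - M));                       % ~0: the two forms agree
disp(norm(M*in0Norm.' - in1Norm.'));     % ~0: M rotates in0 onto in1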






Saturday, January 3, 2015

Saving 3D file 10x faster

The PCL library tutorial provides a framework to grab .pcd files: http://pointclouds.org/documentation/tutorials/dinast_grabber.php#dinast-grabber. This functionality is really useful since users can save camera captures for later processing. However, one drawback of the existing scheme is that capturing takes too long: it usually takes a few seconds to save one .pcd file at 640x480 image resolution. For scenarios where multiple captures are needed, the long delay makes capturing difficult. Since a .pcd file is essentially a text file, a natural way to reduce the capture time is to save the data in binary format instead of text format. After switching from .pcd to a raw binary format, we observed roughly a 10x speedup in capture time: instead of one capture every few seconds, we get several captures per second. We believe this technique may be useful to other people, so we want to share the trick here. The main function for binary file generation looks like this:
  void saveCloudToBin (const std::string &file_name, const pcl::PointCloud<pcl::PointXYZRGBA> &cloud, const int cloud_size)
  {
   std::ofstream myfile(file_name, std::ios_base::binary);
   if (myfile.is_open())
   {
    // write 16 bytes per point: x, y, z as floats followed by the packed color value
    for (int i = 0; i < cloud_size; i++)
    {
     myfile.write((char *)&cloud.points[i].x, sizeof(float));
     myfile.write((char *)&cloud.points[i].y, sizeof(float));
     myfile.write((char *)&cloud.points[i].z, sizeof(float));
     myfile.write((char *)&cloud.points[i].rgb , sizeof(int));
    }
    myfile.close();
   }
   else
   {
    std::cout << "Unable to open " << file_name << std::endl;
   }
  }

The whole program, openni_grabber_fast.cpp, is available here: https://drive.google.com/file/d/0B05MVyfGr8lmZ2xfUjRIV1F0dWc/view?usp=sharing
An associated make file can be found at: https://drive.google.com/file/d/0B05MVyfGr8lmSFUwdnJKX3U3dW8/view?usp=sharing


After running openni_grabber_fast.exe, binary files will be generated with the names test_pcd0.bin, test_pcd1.bin, etc. To convert the binary files to .pcd files, we wrote another program named binary2pcd.cpp, available at https://drive.google.com/file/d/0B05MVyfGr8lmanhIRWduem83eW8/view?usp=sharing. Putting binary2pcd.exe in the same folder as the binary files and running it should generate the corresponding .pcd files as well as 2D image files.
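
For quick inspection without the converter, here is a minimal Matlab sketch of reading one of these .bin files back, assuming the 16-byte-per-point layout written by saveCloudToBin above and that the file is read on the same (little-endian) machine:

% Sketch: read back a binary point cloud written by saveCloudToBin.
% Each point is 16 bytes: x, y, z as single floats followed by 4 packed color bytes.
fid = fopen('test_pcd0.bin', 'rb');
raw = fread(fid, [4, Inf], '*single');       % read every 4-byte field as a single
fclose(fid);
xyz  = raw(1:3, :).';                        % N-by-3 matrix of x, y, z coordinates
rgba = typecast(raw(4, :), 'uint32');        % reinterpret the 4th field as packed color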


The procedure discussed here is based on the Windows platform, but with the source code and make file available, extending it to other platforms should be straightforward.







Getting x,y,z from 2D image+depth

Kinect-like devices sometimes give us 2D images plus the depth of each pixel in the image, and it is our task to generate x, y, z values from them. Generating z is straightforward since it is simply the depth. However, generating x and y needs some extra work. A file in the PCL library (https://github.com/PointCloudLibrary/pcl/blob/master/tools/png2pcd.cpp) gives hints on how to obtain x and y. According to the library code, the calculation formulas are as follows:
width = depth * std::tan (h_viewing_angle * PI / 180 / 2) * 2;
height = depth * std::tan (v_viewing_angle * PI / 180 / 2) * 2;
x = (image_x - image_width / 2.0) / image_width * width;
y = (image_y - image_height / 2.0) / image_height * height;
z = depth;



In these formulas, h_viewing_angle and v_viewing_angle are intrinsic parameters of the camera; they become larger when the camera has wider viewing angles in the horizontal and vertical directions, respectively, and they are constant for all images taken by the same camera. If we know these two angles, then x and y can be derived solely from the pixel location and the depth. For a 640x480 image with a pixel at location (10,20),
x=(10-320)/640*width
y=(20-240)/480*height



We can simplify the equations further by letting:
widthCoef = tan (h_viewing_angle * PI / 180 / 2) * 2
heightCoef = tan (v_viewing_angle * PI / 180 / 2) * 2;


Then the equations become:
x = (image_x - image_width / 2.0) / image_width * depth*widthCoef;
y = (image_y - image_height / 2.0) / image_height * depth*heightCoef;
z = depth;


For the Asus Xtion Pro Live camera that I own, widthCoef = 1.21905 (a 62.7 degree horizontal viewing angle) and heightCoef = 0.914286 (a 49.1 degree vertical viewing angle). One benefit of knowing widthCoef and heightCoef is that even when image_x and image_y become fractional due to interpolation, we can still calculate the x and y values reliably.
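
As a concrete illustration, here is a small Matlab sketch (depthImg is a hypothetical 480x640 matrix of depth values; the choice of 0- or 1-based pixel indexing shifts x and y by at most one pixel) that converts a whole depth image to x, y, z using these coefficients:

% Sketch: convert a 480x640 depth image to x, y, z using widthCoef/heightCoef.
widthCoef  = 1.21905;                        % horizontal coefficient (62.7 degrees)
heightCoef = 0.914286;                       % vertical coefficient (49.1 degrees)
[imgHeight, imgWidth] = size(depthImg);      % e.g. 480 x 640
[imgX, imgY] = meshgrid(1:imgWidth, 1:imgHeight);   % per-pixel image coordinates
z = depthImg;
x = (imgX - imgWidth/2)  ./ imgWidth  .* z .* widthCoef;
y = (imgY - imgHeight/2) ./ imgHeight .* z .* heightCoef;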

One easy way to calculate these coefficients: if we have a .pcd file generated by the PCL library as well as its 2D image, then extracting x, y, z for a single pixel with a known location in the image gives us these coefficients. After that, the coefficients can be used repeatedly.
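
Following the same relations, here is a short sketch of recovering the coefficients from a single known point, where image_x, image_y is its pixel location and x, y, z are the values read from the .pcd file:

% Sketch: recover the coefficients from one point with known pixel location and x, y, z.
widthCoef  = x / ((image_x - image_width/2)  / image_width  * z);
heightCoef = y / ((image_y - image_height/2) / image_height * z);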