Saturday, January 3, 2015

Getting x,y,z from 2D image+depth

Kinect-like devices give us an image together with a depth value for each pixel of that image. Our task is to generate x, y, z values from them. Generating z is straightforward since it equals the depth. Generating x and y, however, needs some extra work. A file in the PCL library (https://github.com/PointCloudLibrary/pcl/blob/master/tools/png2pcd.cpp) gives hints on how to obtain x and y. According to the library code, the formulas are as follows:
width = depth * std::tan (h_viewing_angle * PI / 180 / 2) * 2;
height = depth * std::tan (v_viewing_angle * PI / 180 / 2) * 2;
x = (image_x - image_width / 2.0) / image_width * width;
y = (image_y - image_height / 2.0) / image_height * height;
z = depth;



In these formulas, h_viewing_angle and v_viewing_angle are the horizontal and vertical viewing angles of the camera, in degrees. They are inherent parameters of the camera, so they are constant for all images taken by the same camera. width and height are the physical extents of the area the camera sees at the given depth. If we know the two angles, then x and y can be derived solely from pixel location and depth. For example, for a pixel at location (10, 20) in a 640x480 image:
x=(10-320)/640*width
y=(20-240)/480*height
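The formulas above can be wrapped in a small function. This is only a minimal sketch: the `Point3` struct and the `pixelToPoint` name are made up for illustration and are not part of PCL, and the result is in whatever unit the depth uses.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not a PCL type.
struct Point3 {
    double x;
    double y;
    double z;
};

// Converts one pixel plus its depth into x, y, z using the formulas above.
// Angles are in degrees; x, y, z come out in the same unit as depth.
Point3 pixelToPoint(double image_x, double image_y, double depth,
                    int image_width, int image_height,
                    double h_viewing_angle, double v_viewing_angle) {
    assert(image_width > 0 && image_height > 0);
    const double kPi = std::acos(-1.0);
    // Physical size of the area seen by the camera at this depth.
    const double width = depth * std::tan(h_viewing_angle * kPi / 180 / 2) * 2;
    const double height = depth * std::tan(v_viewing_angle * kPi / 180 / 2) * 2;
    Point3 p;
    p.x = (image_x - image_width / 2.0) / image_width * width;
    p.y = (image_y - image_height / 2.0) / image_height * height;
    p.z = depth;
    return p;
}
```

As a check, `pixelToPoint(10, 20, 1.0, 640, 480, 90.0, 90.0)` uses an assumed depth of 1.0 and assumed 90 degree angles, for which width = height = 2, so x = -0.484375 * 2 = -0.96875 and z = 1.0. A pixel at the image center (320, 240) gives x = y = 0.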



We can simplify the equations further by letting:
widthCoef = tan (h_viewing_angle * PI / 180 / 2) * 2;
heightCoef = tan (v_viewing_angle * PI / 180 / 2) * 2;


Then the equations become:
x = (image_x - image_width / 2.0) / image_width * depth * widthCoef;
y = (image_y - image_height / 2.0) / image_height * depth * heightCoef;
z = depth;


For my Asus Xtion PRO LIVE camera, widthCoef = 1.21905 (62.7 degrees) and heightCoef = 0.914286 (49.1 degrees). One benefit of knowing widthCoef and heightCoef is that even if image_x and image_y become fractional due to interpolation, we can still calculate x and y reliably.
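The link between a coefficient and its viewing angle can be checked in both directions. The sketch below simply inverts coef = tan(angle / 2) * 2; the function names are my own for illustration and do not come from PCL.

```cpp
#include <cassert>
#include <cmath>

// Viewing angle in degrees -> coefficient, i.e. tan(angle / 2) * 2.
double angleToCoef(double angle_deg) {
    assert(angle_deg > 0.0 && angle_deg < 180.0);
    const double kPi = std::acos(-1.0);
    return std::tan(angle_deg * kPi / 180 / 2) * 2;
}

// Coefficient -> viewing angle in degrees (inverse of angleToCoef).
double coefToAngle(double coef) {
    assert(coef > 0.0);
    const double kPi = std::acos(-1.0);
    return 2 * std::atan(coef / 2) * 180 / kPi;
}
```

With the values above, `coefToAngle(1.21905)` and `coefToAngle(0.914286)` come out at roughly 62.7 and 49.1 degrees.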

One easy way to obtain these coefficients: if we have a .pcd file generated by the PCL library together with its 2D image, then reading x, y, z for a single pixel whose location in the image is known lets us solve for them. After that, the coefficients can be reused for every image from the same camera.
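Solving the simplified equations for the coefficients might look like the sketch below. It assumes the .pcd was produced with the same formulas, that the chosen point has a valid (finite, positive) z, and that the pixel does not lie on the center column or row, where x or y would be 0 and tell us nothing. `Coefs` and `coefsFromKnownPoint` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the two coefficients.
struct Coefs {
    double widthCoef;
    double heightCoef;
};

// Recovers widthCoef and heightCoef from one pixel location and the
// x, y, z stored for that pixel, by inverting
//   x = (image_x - image_width / 2.0) / image_width * z * widthCoef
//   y = (image_y - image_height / 2.0) / image_height * z * heightCoef
Coefs coefsFromKnownPoint(double image_x, double image_y,
                          int image_width, int image_height,
                          double x, double y, double z) {
    // Signed pixel offset from the image center, as a fraction of the image size.
    const double fx = (image_x - image_width / 2.0) / image_width;
    const double fy = (image_y - image_height / 2.0) / image_height;
    assert(fx != 0.0 && fy != 0.0);       // center column/row carries no information
    assert(std::isfinite(z) && z > 0.0);  // skip invalid depth readings
    Coefs c;
    c.widthCoef = x / (fx * z);
    c.heightCoef = y / (fy * z);
    return c;
}
```

Picking a pixel far from the image center should make the result less sensitive to noise in x and y, since the divisor is larger there.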
