Tuesday, July 12, 2011

Good Features of iOS 5 for Augmented Reality App. Programming

iOS 5, introduced at WWDC 2011, brings two new features that can help augmented reality app development.

1. A Direct Path for Video Data to OpenGL 

Most augmented reality applications render a live video background. One common method is to upload each camera frame into an OpenGL texture with glTexSubImage2D, and many AR apps have used this approach. However, glTexSubImage2D copies the frame data on every call, which adds latency and degrades overall OpenGL performance.

Apple's new CVOpenGLESTextureCache lets developers bind video frames in memory directly to OpenGL ES textures without the extra copy, and it clearly improves performance, as shown in the demo (see the WWDC video of session 414, Advances in OpenGL ES for iOS 5).

2. GLKit 

OpenGL ES programming on iOS has been cumbersome: managing the pieces required for OpenGL rendering, such as timers, display links, and drawing callbacks, was inconvenient, and EAGLView was never really integrated with Cocoa Touch.

GLKit in iOS 5 manages these details for developers and provides more besides: math functions for basic linear algebra, OpenGL texture loading and management, and shader-based special effects. Through GLKit, developers can make their apps look better and run more efficiently.

Monday, June 13, 2011

ARM NEON Basic Tutorials

NEON is a SIMD instruction set on ARM CPUs. It helps improve the performance of apps that do computationally heavy work, such as image processing and 3D graphics.

I looked for tutorials for beginners, but there is not much out there. I found the following four posts, which explain the basics of NEON step by step.

Coding for NEON - Part 1: Load and Stores
Coding for NEON - Part 2: Dealing With Leftovers
Coding for NEON - Part 3: Matrix Multiplication
Coding for NEON - Part 4: Shifting Left and Right

Sunday, March 13, 2011

Kinect Color - Depth Camera Calibration

Kinect has two cameras: one captures a color image and the other captures an IR image. The real-time depth map is produced from the IR camera, so it tells how far each IR pixel is from that camera; because the two cameras have different viewpoints and characteristics, we do not directly know the depth at the color image's pixels. As we can see in the image below, the pixels in the two images do not match: the locations of the hand and arm are completely different.

If we use the Kinect device for HCI, this does not matter much, because the depth information alone is enough in most cases. However, if we'd like to use it for 3D scene capture, or to relate the RGB and depth images, we need to match the color image's pixels to the depth image's. Thus, we need to perform calibration.

Kinect camera calibration is no different from general camera calibration: we just need to capture several images of a chessboard pattern with both the IR and RGB cameras. When capturing from the IR camera, we need to block the IR emitter with something so that the corners can be detected well; otherwise the captured images look like the one below and corner detection fails.

If the lighting in your environment does not contain enough IR, you need a light source that emits IR rays (a halogen lamp, perhaps). It is also convenient to capture the same scenes with both cameras. The images below were captured from the IR and RGB cameras, respectively.

Once the images are taken, we can calibrate each camera using the OpenCV API, the MATLAB calibration toolbox, or the GML calibration toolbox. After calibration, we obtain the intrinsic camera matrices, K_ir and K_rgb, and the distortion parameters of the two cameras.

To achieve our goal, we need one more piece of information: the geometric relationship between the two cameras, expressed as a rotation matrix R and a translation vector t. To compute them, capture the same scene containing the chessboard pattern with both cameras and compute the extrinsic parameters of each; the relative transformation can then be computed easily from the two sets of extrinsics.

Now we can compute the depth of the color image from the depth map provided by the IR camera. Consider a pixel p_ir in the IR image with known depth. The corresponding 3D point P_ir is obtained by back-projecting p_ir in the IR camera's coordinate system.

P_ir = depth(p_ir) * inv(K_ir) * p_ir

P_ir can be transformed to the RGB camera's coordinate system through relative transformation R and t.

P_rgb = R * P_ir + t

Then, we project P_rgb onto the RGB camera image and we obtain a 2D point p_rgb.

p_rgb = K_rgb * P_rgb

Finally, the depth value at the location p_rgb in the RGB image is the Z coordinate of P_rgb.

depth of p_rgb = Z coordinate of P_rgb

p_ir : A pixel in the IR image
P_ir : 3D point in the IR camera's coordinate system
R, t : Relative transformation between the two cameras
P_rgb : 3D point in the RGB camera's coordinate system
p_rgb : The projection of P_rgb onto the RGB image
K_ir, K_rgb : Intrinsic matrices of the IR and RGB cameras

In the above, conversions to and from homogeneous coordinates are omitted. When two or more 3D points project to the same 2D location in the RGB image, the closest one is kept. We can also compute the color values of the depth-map pixels in the same way: p_ir takes the color found at p_rgb.

Here is the resulting depth image of the RGB camera. Since the RGB camera sees a wider region than the IR camera, depth information is not available for all pixels.

If we overlay the RGB image and the computed depth image, we can see that the two match well, whereas they did not before calibration, as shown at the beginning of this post.

Here is a demo video showing the depth map of the RGB image and the color map of the depth image.

Wednesday, February 23, 2011

Kinect Demo App by Nicolas Burrus

Nicolas Burrus has written a Kinect demo package containing several applications (Kinect RGB Demo). Binary builds are available for Mac OS and Windows.

I downloaded the demo and ran it on my MacBook Pro; it runs well apart from some buggy behavior. I think it is useful for quickly examining a Kinect without writing code to do the same job.

Using Kinect on Mac OS X

Recently, the Kinect sensor has drawn attention from computer vision researchers because of its real-time depth retrieval capability. Although MS has not released a PC driver for Kinect, many developers have hacked the device, and there are some useful libraries that allow us to use Kinect on a PC.

To use a Kinect device on a Mac, all we need to do is install two libraries: libFreenect and libusb-1.0.
libFreenect provides APIs to access the Kinect's functionality; libusb-1.0-devel is a modified build of the USB library. If you have installed OpenCV via MacPorts, you may already have libusb-1.0. However, what the Kinect camera needs is the 'libusb-1.0-devel' variant, which conflicts with the plain libusb-1.0, so trying to install libusb-1.0-devel via MacPorts will fail because of the conflict. You can build a proper one from the source code instead.

Once both libraries are built and installed, you can run a sample program from the libFreenect library. I made an Xcode project, replacing the freeglut library with Apple's GLUT.framework, and tested the example on my MacBook Pro 15" (Core 2 Duo 2.3 GHz, GeForce 9600M GT).

It runs very well in 30 fps mode!

Monday, February 14, 2011

OpenCL Driver for ATI Radeon is very bad on Mac OS X

I tested a simple image-copy kernel on both my iMac (ATI Radeon HD 5750) and my MacBook Pro (GeForce 9600M GT). The MacBook Pro produces correct results, while the iMac produces weird results containing garbage data. So this may be a bug in the OpenCL driver for ATI Radeon on Mac OS X.

__kernel void image_copy_simple( __global unsigned char *src,
                                 __global unsigned char *dst,
                                 const int pitch,
                                 const int chans )
{
  const uint gid_x = get_global_id(0) ;
  const uint gid_y = get_global_id(1) ;
  uint index = gid_x * chans + gid_y * pitch ;

  if( chans == 1 ) {
    dst[index] = src[index] ;
  }
  else if( chans == 2 ) {
    dst[index] = src[index] ;
    index++ ;
    dst[index] = src[index] ;
  }
  else if( chans == 3 ) {
    dst[index] = src[index] ;
    index++ ;
    dst[index] = src[index] ;
    index++ ;
    dst[index] = src[index] ;
  }
  else if( chans == 4 ) {
    dst[index] = src[index] ;
    index++ ;
    dst[index] = src[index] ;
    index++ ;
    dst[index] = src[index] ;
    index++ ;
    dst[index] = src[index] ;
  }
}
Since Apple does not update its graphics drivers frequently, this bug may persist until Lion (OS X 10.7) arrives (?). A workaround is to change the type of 'dst' from 'unsigned char' to 'float'.

Another problem with the OpenCL driver on Mac OS X is that it does not support image objects on ATI hardware. What's funny is that Apple's OpenCL documentation explicitly describes the usage of image objects, but when I tried to use one on the machine with ATI hardware, it failed: querying CL_DEVICE_IMAGE_SUPPORT via the clGetDeviceInfo function returns CL_FALSE. The NVIDIA driver, by the way, supports image objects well.

Tuesday, February 8, 2011

Running Dominant Orientation Templates on MAC OS X

Update: The ESM library has recently been released for the Mac OS X platform, but I have not applied it to my Xcode project yet. You can find it on the ESM tracking library web page.

"Dominant Orientation Templates" (DOT), proposed by Stefan Hinterstoisser et al., is a very robust approach to detecting objects in an image. An interesting aspect of DOT is that it uses no feature point detection, which is commonly used for object detection and tracking in computer vision. The authors released the source code here.

The source code comes with an MS VS2008 solution and works well once its dependencies (OpenCV, Intel IPP, and the ESM tracking library) are satisfied. Running DOT on Mac OS X is not difficult: all we have to do is install the dependencies and create an Xcode project.
  • OpenCV: available as source code, or installable via MacPorts.
  • Intel IPP: on Mac OS X, IPP is included in Intel C++ Composer XE 2011 for Mac. You can download an evaluation version from Intel's website. Installation is straightforward; just follow the instructions.
  • ESM tracking library: the ESM library is not strictly necessary for DOT, but the authors use it for template verification and pose estimation when the target is a 2D plane. Unfortunately, ESM is not currently released for Mac OS X, so we will remove the ESM-related code from the DOT source later.
Once the dependencies are installed, we have to make an Xcode project. Open Xcode and create a command-line project, then add the DOT source files and the dependencies' library files as shown below.

I installed OpenCV 2.2 from MacPorts and added its .dylib files. For IPP, I added the static libraries (.a) instead of the dynamic linking libraries (.dylib); both are in /opt/intel/composerxe-2011.x.xxx/ipp/lib. Make sure the search paths for the IPP headers and libraries are set in the Xcode project settings if Xcode cannot find them.

Refer to the screenshot to see the files required for DOT. The cv_camera class in the DOT source works well with the built-in iSight, but images are captured at 1280x1024 by default. To make the size 640x480, I modified cv_camera.cc as follows.

bool cv_camera_usb::set_capture_data( int a_width, int a_height, double a_fps )
{
#ifdef WIN32
  /* Windows-specific capture setup left unchanged */
#else
  mp_cam = cvCaptureFromCAM(0);
  cvSetCaptureProperty( mp_cam, CV_CAP_PROP_FRAME_WIDTH, 640 );
  cvSetCaptureProperty( mp_cam, CV_CAP_PROP_FRAME_HEIGHT, 480 );
#endif

  if( mp_cam != NULL )
    m_run = true;
  else
    m_run = false;

  return m_run;
}


The other modification I made was removing everything related to the ESM library, which is not that difficult. After that, DOT runs well on my Mac.