Saturday, April 4, 2020

Low Light Image Combining Using Python (2) -- Running Faster

In a previous blog, a Python script to combine several low light raw images for improved SNR has been introduced. But that script runs rather slow. To combine four images together, it takes 155.8s in my PC which uses Intel i5-8400 @ 2.8GHz x 6 core with 16GB memory. How to run it faster? After some attempts, we are able to cut the time to roughly 1/8 of the baseline mainly with two tricks: 1) using decimated image for alignment; 2) processing in multi-core. The new processing times are:

Baseline: 155.8s
After using decimated image for alignment: 40.6s
After using decimated image for alignment + processing in multi-core: 21.1s

Now let me show you how to do it:

Using decimated image for alignment

Image alignment is the most computationally intensive part of the image combining pipeline. In fact, the time spent on image alignment is more than 90% of the total. The operation of image alignment is shown below. During image alignment, the candidate block is swiped though the target block in both horizontal and vertical directions. Assume block size is B, search range in both horizontal and vertical are S, the complexity of image alignment is proportional to B*S*S. For a 4k x 3k image and search range equal to 40 ([-40,40] with each even position), the complexity for matching each pair is 19.2G operations, which is enormous.

























One observation is that due to the structure of Bayer pattern, we only calculate candidate of even number of pixel shift such as 0/2/4 etc. Then a natural thought is to decimate both candidate and target by 2. This will bring ~4x acceleration. Instead of B*S*S complexity, it is now B*S*S/4. This explains why processing time is reduced from 155.8s to 40.6s after decimation. Performance wise, this decimation means that instead of using pixels of all colors, only one out of four pixels with green color will be used for alignment. However, the quality of combining seems to hold, which indicate that there are enough pixels left to guarantee the quality of image alignment. The Python code change is below, ":2" is the delta:

Old code:


candidate = rgb_raw_image_candi[f,row_start+row_offset+boundary_adjust[row, col, 0]:row_end+row_offset+boundary_adjust[row, col, 1],
col_start+col_offset+boundary_adjust[row, col, 2]:col_end+col_offset+boundary_adjust[row, col, 3]]
target = rgb_raw_image[row_start+boundary_adjust[row, col, 0]:row_end+boundary_adjust[row, col, 1],
col_start+boundary_adjust[row, col, 2]:col_end+boundary_adjust[row, col, 3]]

New code:

candidate = rgb_raw_image_candi[row_start+row_offset+boundary_adjust[row, col, 0]:row_end+row_offset+boundary_adjust[row, col, 1]:2,
col_start+col_offset+boundary_adjust[row, col, 2]:col_end+col_offset+boundary_adjust[row, col, 3]:2]
target = rgb_raw_image[row_start+boundary_adjust[row, col, 0]:row_end+boundary_adjust[row, col, 1]:2,
col_start+boundary_adjust[row, col, 2]:col_end+boundary_adjust[row, col, 3]:2]

Multi-core processing

Our baseline script use single thread to process. By distributing tasks to multiple threads, we expect the running time to be shorter. Introduction of Python-based parallel processing can be found here. Our task is to align three images with the base image. Therefore, it can be divided into three sub-tasks which are to align each image with the base. In this way, these three sub-tasks can be run independently which brings the best parallel processing gain. Since the sub-tasks are independent, we use the most basic parallel processing module of pooling. What pooling module does is to assign each sub-task to a thread. image_align is the function for sub-task execution and image_input is a list with each element to be input images for a sub-task.

Pooling code:

with Pool(6) as p:
    image_output = p.map(image_align, image_input)

Due to the overhead of parallel processing, the processing time is not cut to 1/3 of single thread. However, parallel processing does reduce the running time by ~20s (40.6s -> 21.1s). In general, when each sub-task is more computationally heavier, there is more time saving by parallel processing.

With both alignment with decimated image and multi-core processing, the final processing time is reduced to 21.1s from 155.8s. My code can be found here