Image augmentation is an important technique in image-related deep learning for reducing overfitting. Neural networks often have millions of parameters and are good at memorizing training data, so a converged model can perform well on the training data but poorly on other data such as the validation or test set. This phenomenon is called "overfitting" because the model fits the training data too closely. There are multiple ways to alleviate overfitting, and image augmentation is one of them: instead of feeding only the original training images, we also feed randomly tilted, shifted, flipped, or scaled versions of them to the model. These randomized images make it harder for the network to memorize the training data, which can narrow the gap between training accuracy and validation accuracy and ultimately improve validation accuracy.
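The idea can be sketched in a few lines of NumPy. This is a minimal, hypothetical example (the function name `random_augment` and the specific transforms are illustrative, not from the project code): each call randomly flips and shifts one image.

```python
import numpy as np

def random_augment(img, rng=None):
    """Apply a random horizontal flip and a small random shift to one image.

    img: array of shape (H, W, C). Minimal sketch only; real pipelines
    also rotate, scale, and adjust colors.
    """
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < 0.5:
        img = img[:, ::-1, :]          # horizontal flip
    shift = int(rng.integers(-4, 5))   # shift by up to 4 pixels
    img = np.roll(img, shift, axis=1)  # wrap-around shift along width
    return img
```

Because the transforms are drawn fresh on every call, the network rarely sees the exact same pixels twice.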
Some deep learning libraries, such as Keras, provide a good built-in tool for image augmentation called ImageDataGenerator. It is easy to use and fits well into the Keras framework. However, if another deep learning library such as PyTorch is used, ImageDataGenerator is not available, and users need to develop their own image augmentation code. In this article, we demonstrate how to do that. In particular, we show how to augment images for the task of image segmentation, which involves a pair of inputs: an image and its mask. A similar method can also be used to augment a single image.
The baseline augmentation code comes from EKami's work for the Kaggle Carvana challenge (https://github.com/EKami/carvana-challenge/blob/original_unet/src/img/augmentation.py). That code is an excellent example of how image augmentation should be done. On top of it, we made the following improvements:
1. The original function augment_img(img, mask) does not always change the image and the mask at the same time. Digging deeper, the random_shift_scale_rotate() function is called twice, once for the image and once for the mask. However, the np.random.random() calls inside that function can return different values for the image and for the mask, which explains why the two are not always modified simultaneously. To address this issue, we created a new function named augment_img_reshape(), in which a common random number is shared between the image and mask transformations. This guarantees that both are modified simultaneously.
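The shared-random-number idea can be sketched as follows. This is a hypothetical simplification (the function name `augment_pair` and the flip/shift transforms are illustrative; the real augment_img_reshape() uses the baseline's shift/scale/rotate logic): every random value is drawn exactly once and reused for both arrays, so image and mask can never go out of sync.

```python
import numpy as np

def augment_pair(img, mask, rng=None):
    """Apply the SAME random flip and shift to an image and its mask.

    img: (H, W, C); mask: (H, W). Each random draw happens once and is
    applied to both arrays, unlike calling an augmenter twice.
    """
    if rng is None:
        rng = np.random.default_rng()
    do_flip = rng.random() < 0.5        # drawn once, shared by both
    shift = int(rng.integers(-4, 5))    # drawn once, shared by both
    if do_flip:
        img = img[:, ::-1]
        mask = mask[:, ::-1]
    img = np.roll(img, shift, axis=1)
    mask = np.roll(mask, shift, axis=1)
    return img, mask
```

Had we called an augmenter separately on img and mask, each call would draw its own random numbers and the pair could be transformed differently, which is exactly the bug described above.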
2. For deep learning, input images are often stored in channel-first format: (im_chan, im_width, im_height) for the image and (1, im_width, im_height) for the mask. This does not match the channel-last format expected by the baseline code, so we added format-conversion support to augment_img_reshape().
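The format conversion amounts to transposing to channel-last before augmenting and transposing back afterwards. The sketch below illustrates the idea under that assumption; the wrapper name `augment_channel_first` and the `augment_fn` callback are hypothetical, not the project's actual API.

```python
import numpy as np

def augment_channel_first(img, mask, augment_fn):
    """Run a channel-last augmentation on channel-first inputs.

    img: (C, H, W) and mask: (1, H, W), as fed to the network.
    augment_fn expects and returns channel-last (H, W, C) arrays.
    """
    img_hwc = np.transpose(img, (1, 2, 0))     # (C,H,W) -> (H,W,C)
    mask_hwc = np.transpose(mask, (1, 2, 0))   # (1,H,W) -> (H,W,1)
    img_hwc, mask_hwc = augment_fn(img_hwc, mask_hwc)
    return (np.transpose(img_hwc, (2, 0, 1)),  # back to channel-first
            np.transpose(mask_hwc, (2, 0, 1)))
```

Keeping the conversion inside the augmentation function means the rest of the training loop can stay entirely in channel-first format.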
As an example, running image_augmentation.py stored here generates both the original image/mask pair and the pair after augmentation. One example is shown below. Note that since augmentation is a random event, the images before and after augmentation are sometimes identical.