Sunday, July 22, 2018

My First Kaggle Competition -- Plant Seedlings Classification

Kaggle.com is a website famous for hosting machine learning competitions. Recently I participated in a competition called Plant Seedlings Classification (https://www.kaggle.com/c/plant-seedlings-classification/) and would like to share what I learned. The competition is an image classification task: the host provides a training set of over four thousand images covering 12 plant species, and the goal is to assign each test image to one of these 12 species. This is a typical computer vision challenge that can be solved with deep learning, specifically a convolutional neural network (CNN). Once the CNN has been trained on the training images, it is applied to the test images for classification.

In order to achieve good performance, I tried various approaches and found the following two tips to be particularly useful:
1. Start from a pre-trained network and retrain it on the new data.
2. Ensemble: combine the results of multiple models.

Before diving into more details, let me list the results of some of the approaches I tried.




Tip 1 was not my first choice. In fact, I started by designing a 10-layer network (conv/conv/pool/conv/conv/pool/conv/conv/pool/dense) myself (the 'My own CNN network' row in the table, with 0.95717 accuracy). I then found that better accuracy can be achieved by building the classifier on top of well-designed deep networks such as VGG, Xception, and DenseNet. Explaining what VGG or DenseNet is lies beyond the scope of this article, but it is fair to say that these networks have more layers than the one I designed, and some of them, such as DenseNet or InceptionV3, use more exotic topologies. Training these networks from scratch can be challenging, but starting from pre-trained weights makes training much easier. A Keras example for building a model on top of the pre-trained Xception network is shown below. The base model is initialized from an Xception network with weights pre-trained on ImageNet, excluding the top classification layers. A pooling layer and dense layers are then added on top of the base model, and the final dense layer predicts the likelihood of each of the 12 plant species. This example can easily be extended to other networks such as VGG. The weights corresponding to the best result are saved and used for testing. Using a pre-trained network boosts accuracy by at least two percentage points compared with the self-designed network in the first row. The best single-network accuracy, 0.98236, was achieved by DenseNet201 and DenseNet169.

from keras.applications.xception import Xception
from keras.layers import GlobalAveragePooling2D, Dropout, Dense
from keras.models import Model

# Xception backbone pre-trained on ImageNet, without the top classification layers
base_model = Xception(weights='imagenet', input_shape=(img_size, img_size, 3), include_top=False)

# Classification head: global pooling, dense layers, and a 12-way softmax
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dropout(0.5)(x)
x = Dense(1024, activation='relu')(x)
x = Dropout(0.5)(x)
predictions = Dense(12, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

model.compile(optimizer='Adadelta',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# datagen (augmentation generator), train_img, train_label, batch_size and img_size
# come from the data-preparation step
model.fit_generator(datagen.flow(train_img, train_label, batch_size=batch_size), steps_per_epoch=len(train_img)//batch_size, epochs=100, verbose=1)
model.save_weights('Xception.h5')
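
For the testing step, here is a minimal sketch of how the saved weights might be reloaded and applied to the test images. Note that test_img (the preprocessed test images) and species (the list of 12 class names) are placeholder names of my own, not variables from the training code above.

import numpy as np

# Reload the best weights saved during training (same architecture as above)
model.load_weights('Xception.h5')

# test_img: preprocessed test images of shape (num_test, img_size, img_size, 3) -- assumed
probs = model.predict(test_img, batch_size=batch_size)  # (num_test, 12) softmax outputs

# Pick the most likely species for each test image
# species: list of the 12 class names, e.g. taken from the training folder names -- assumed
pred_idx = np.argmax(probs, axis=1)
pred_species = [species[i] for i in pred_idx]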

To further improve performance, we need tip 2. If you ask machine learning practitioners for the best way to improve performance, many of them will probably tell you "ensemble". An ensemble combines the prediction results of multiple models. Since the predictions generated by these models are more or less independent, combining them reduces the noise and improves the result. The non-trivial part of ensembling is that combining more models does not necessarily give better results. For example, in our experiments DenseNet201 + DenseNet169 + DenseNet121 + InceptionV3 + Xception + InceptionResNetV2 + VGG16 + VGG19 combines the most models but is not the best ensemble. Without a straightforward rule, finding the best ensemble takes a lot of trials.

Another question in the ensemble approach is how much weight to assign to each base model. Not all base models are equal; some perform better than others, and a better-performing base model should most likely receive a larger weight. In communications, there are schemes such as Maximum-Ratio Combining (MRC) that combine multiple branches and determine the combining weights from the signal and noise levels. In deep learning, however, such metrics are not easy to obtain, so in my work I assign equal weight to each base model; keep in mind that other weighting schemes do exist. After many trials, we eventually found that DenseNet201 + DenseNet169 + InceptionV3 + Xception + InceptionResNetV2 gives the best performance (0.98992). To be precise, the ensemble alone gives an accuracy of 0.9874; by analyzing multiple submissions and correlating them with their scores, we were able to correct two entries, which boosts the final accuracy to 0.98992. So we gamed the system a little bit, but not too much. This score of 0.98992 ranks in the top 3% of the leaderboard. The top entry on the leaderboard has an accuracy of 1.0!
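
As a rough illustration of the equal-weight ensemble described above, the sketch below averages the softmax outputs of several trained models on the test set. The model list, weight file names, and the build_model() helper are assumptions for illustration, not the exact setup used in the competition.

import numpy as np

# Hypothetical list of (architecture name, saved weight file) pairs
ensemble = [('DenseNet201', 'DenseNet201.h5'),
            ('Xception', 'Xception.h5'),
            ('InceptionV3', 'InceptionV3.h5')]

probs_sum = np.zeros((len(test_img), 12))
for name, weight_file in ensemble:
    model = build_model(name)            # assumed helper that rebuilds the architecture
    model.load_weights(weight_file)
    probs_sum += model.predict(test_img, batch_size=batch_size)

# Equal weights: average the softmax outputs, then take the most likely class
ensemble_pred = np.argmax(probs_sum / len(ensemble), axis=1)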






In summary, participating in the competition was a lot of fun. Although we joined after the competition had closed and our score does not count, working on this problem still taught us quite a few things. We also learned a lot from the discussions and kernels on the Kaggle website. Here we want to give back by sharing our experience, and hopefully it will benefit our readers.

[Update on 11/16/2018] Code examples have been added to https://github.com/legendzhangn/blog/tree/master/plant_seedlings