Recap of the “ImageNet Classification with Deep Convolutional Neural Networks” article

Introduction

Today the power of machine learning applied to pattern recognition is well known, but this was not the case just a decade ago. The processing capacity of neural networks has evolved so quickly that noticeable changes appear within periods as short as one or two years.

This is demonstrated in the article named in the title, where the authors describe the training of a convolutional neural network and show that their techniques achieve better results than previous approaches. All of this takes place within a competition, the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), that seeks out the fastest and most accurate machine learning methods.

Procedures

Behind the competition is ImageNet, a dataset of over 15 million high-resolution images; the challenge uses a subset with roughly 1000 images in each of 1000 categories. Since these images come at varying resolutions and the model requires a fixed input size, the team downsampled them to a constant resolution before training.
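
As a reference point, the paper describes rescaling each image so that its shorter side is 256 pixels and then cropping the central 256×256 patch. A minimal sketch of that preprocessing step, using Pillow purely as my own library choice (the authors do not specify their tooling):

```python
from PIL import Image

def rescale_and_center_crop(path, size=256):
    """Resize the shorter side to `size`, then crop the central size x size patch,
    following the preprocessing described in the paper."""
    img = Image.open(path).convert("RGB")
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)))
    w, h = img.size
    left = (w - size) // 2
    top = (h - size) // 2
    return img.crop((left, top, left + size, top + size))
```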

They created a convolutional neural network with the following architecture:

This architecture is central to the work since it is inspired by the way the visual cortex processes images. Essentially, it scans an image with learned filters to extract the relevant features, largely regardless of where in the image they appear.
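
For readers who prefer code, here is a rough single-tower sketch of that architecture in PyTorch. The original network splits most layers across two GPUs; this compressed version uses the paper's filter counts but is my own approximation, not the authors' implementation:

```python
import torch.nn as nn

# Five convolutional layers followed by three fully connected layers,
# roughly following the paper's layer sizes (single-GPU approximation).
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),    # conv1
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2),              # conv2
    nn.ReLU(inplace=True),
    nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1),             # conv3
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 384, kernel_size=3, padding=1),             # conv4
    nn.ReLU(inplace=True),
    nn.Conv2d(384, 256, kernel_size=3, padding=1),             # conv5
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096),                              # fc6
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),                                     # fc7
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),                                     # fc8: 1000 ImageNet classes
)
```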

They also build on previous research into activation functions, choosing the ReLU because it reaches a given training error rate several times faster than saturating alternatives such as tanh.
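
As a quick illustration of the difference, ReLU simply clips negative values while tanh saturates for large inputs; the numbers below are only to show the shape, not values from the paper:

```python
import numpy as np

def relu(x):
    # ReLU: f(x) = max(0, x); non-saturating, so gradients stay large for positive inputs.
    return np.maximum(0.0, x)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(relu(x))     # [0. 0. 0. 1. 3.]
print(np.tanh(x))  # squashes into (-1, 1) and flattens out for large |x|, slowing learning
```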

As with any training algorithm, more hardware processing power means faster results, so accuracy can be improved by adding more layers without having to wait weeks for an outcome.

In this case, the researchers spread the network across two GPUs due to the size of the dataset and the model, in addition to relying on other techniques to improve performance. One of these is Local Response Normalization, a form of lateral inhibition that, in the authors' words, creates “competition for big activities amongst neuron outputs computed using different kernels.”
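
In the paper, each activation is divided by a term summed over the squared activations of neighbouring kernel maps at the same spatial position. A minimal NumPy sketch of that normalization, using the hyperparameters reported in the paper (k = 2, n = 5, alpha = 1e-4, beta = 0.75):

```python
import numpy as np

def local_response_norm(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """a has shape (channels, height, width); each channel is normalized by a sum
    over the n neighbouring channels, creating competition between kernel maps."""
    c = a.shape[0]
    b = np.empty_like(a)
    for i in range(c):
        lo = max(0, i - n // 2)
        hi = min(c, i + n // 2 + 1)
        denom = (k + alpha * np.sum(a[lo:hi] ** 2, axis=0)) ** beta
        b[i] = a[i] / denom
    return b
```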

They also face one of the biggest problems in neural networks, overfitting, which is usually mitigated by adding more training data. But how do you do this if you don’t have the time or the capacity to collect it? Simple: each image is transformed with label-preserving operations, extracting smaller patches at different positions, reflecting them horizontally, and altering the intensity of the RGB channels. In this way, they were able to multiply the effective size of their dataset by a factor of 2048.

Data augmentation
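
A minimal sketch of the crop-and-reflect scheme, assuming 256×256 inputs and 224×224 patches as in the paper (the random sampling here is my simplification of how the patches are drawn):

```python
import numpy as np

def random_crop_and_flip(img, crop=224):
    """img: a (256, 256, 3) array. Extract a random 224x224 patch and maybe mirror it.
    32 * 32 crop positions times 2 reflections gives the factor of 2048 from the paper."""
    max_off = img.shape[0] - crop            # 32 possible offsets per axis
    top = np.random.randint(0, max_off)
    left = np.random.randint(0, max_off)
    patch = img[top:top + crop, left:left + crop]
    if np.random.rand() < 0.5:
        patch = patch[:, ::-1]               # horizontal reflection
    return patch
```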

Additionally, they used the dropout strategy, where the output of each hidden neuron is set to zero with probability 0.5 during training. This is loosely similar to the way a transistor defines a 0: if the voltage is below roughly 0.7 V, the transistor is treated as “off” even though a very small current still flows.

Transistor real signal vs digital signal
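
To make the analogy concrete, here is a minimal dropout sketch. During training each neuron's output is zeroed with probability 0.5; at test time all neurons are kept but their outputs are scaled down, as described in the paper:

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True):
    """Training: each neuron's output is zeroed with probability p_drop.
    Test: all neurons are kept, but outputs are multiplied by (1 - p_drop)."""
    if training:
        mask = np.random.rand(*activations.shape) >= p_drop
        return activations * mask
    return activations * (1.0 - p_drop)
```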

They then trained the model with stochastic gradient descent, where the network compares its output with the expected result and adjusts the weights of each neuron to reduce the error.

The update rule for weight w
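
As given in the paper, with learning rate $\epsilon$, momentum variable $v$, iteration index $i$, momentum 0.9, weight decay 0.0005, and the gradient of the objective averaged over the $i$-th batch $D_i$:

$$v_{i+1} := 0.9\, v_i \;-\; 0.0005\,\epsilon\, w_i \;-\; \epsilon \left\langle \frac{\partial L}{\partial w}\Big|_{w_i} \right\rangle_{D_i}, \qquad w_{i+1} := w_i + v_{i+1}$$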

Results

They report two types of results, top-1 and top-5 error rates, and compare the performance of the CNN with other techniques. The comparison shows the CNN's clear advantage in accuracy, with a top-5 error rate of about 17%, the best figure in the table.

Comparison results table

Conclusion

They are impressed that a large, deep convolutional neural network is capable of achieving record results on a highly challenging dataset using purely supervised learning. They also note that depth matters: the network’s performance degrades if even a single convolutional layer is removed.

Personal Notes

It is interesting to observe the behaviour of CNNs: these networks show great results, and much of the gain comes from the architecture rather than from raw hardware processing. In this case two GPUs were necessary because the authors were competing at full scale, but in routine training this type of network delivers very good results without the need for several GPUs.

Thanks for reading!

Reference

[1] Krizhevsky, A., Sutskever, I., Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks. Advances in Neural Information Processing Systems 25 (NIPS 2012). University of Toronto.
