I just finished the fourth course of the deeplearning.ai series, and it was immensely enjoyable. I have to admit that with the advent of Hinton’s capsule networks, it was a little harder to find the motivation to start this course on Convolutional Neural Networks than it was for the previous three. Hinton and other bloggers have already outlined shortcomings of CNNs, and the thought at the back of my mind was whether it was worth spending the time learning something that may become obsolete.
Nevertheless it was worth it, and as I learned from attending NIPS 2017 last week and through a Deep Learning Study Group, there is still much work to be done with capsule networks. More on that later.
As per the previous three deeplearning.ai courses, this course had the following characteristics:
- Clear and concise
- Andrew Ng went over the concepts meticulously
- The exercises were clearly documented and easy to follow
- Grading went without a hitch except for one instance (see Caveats below)
- Touched on a wide range of concepts and use cases
- CNNs, Residual and Inception networks
- Object detection, neural style transfer, and face recognition
- Afforded lots of opportunities for further study
- Many ways to practice Keras and Tensorflow
- You could easily complete the exercises and leave it at that, but then you would be shortchanging yourself. There are so many other avenues to explore based on the code written that you owe it to yourself to put in some additional effort and discover something new.
- There are some caveats
- As in previous courses, the exercises were structured well and documented extensively. They were, however, much more challenging than those in the previous three courses.
- Grading for the last exercise of the fourth week dealing with triplet loss is still flawed as of this writing. There is a workaround if you look in the discussion forums.
- There is a flaw when trying to use the model in the Week 2 programming assignment to make your own predictions.
The course encompasses four weeks, each culminating in a quiz and one or two programming assignments. An understanding of Python, Keras, and Tensorflow would be helpful, but is not necessary as there are tutorials in the previous courses. Following is a synopsis of each week’s content.
Andrew Ng first starts with the motivation for CNNs and what makes them powerful. The initial lectures on edge detection take a step-by-step approach, showing how different filters detect different basic patterns. In later lectures, you’ll see how filters in deeper layers detect more complex shapes (Week 4, “What are deep ConvNets learning?”).
The middle set of lectures deals with the technical guts of CNNs, namely filter size, stride, padding, and pooling, culminating in an example of a single-layer CNN. He ends with a lecture on the advantages of CNNs over fully connected networks, namely:
- Fewer parameters to train
- Parameter sharing – same filters can be used in different areas of the image
- Sparsity of connections – cells in resultant layers depend only on a small subset of the previous layer, which makes the network less prone to overfitting
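The arithmetic behind these lectures is easy to check for yourself. Here is a minimal sketch (the layer shapes are hypothetical examples, not taken from the course): the output size of a convolution follows ⌊(n + 2p − f)/s⌋ + 1, and simply counting weights shows why parameter sharing matters.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output height/width of a convolution for input size n,
    filter size f, padding p, and stride s."""
    return (n + 2 * p - f) // s + 1

# A 3x3 filter over a 6x6 input with no padding and stride 1
# shrinks the output to 4x4; "same" padding (p=1) keeps it 6x6.
print(conv_output_size(6, 3))        # 4
print(conv_output_size(6, 3, p=1))   # 6

# Parameter sharing: mapping a 32x32x3 image to a 28x28x6 volume.
# Fully connected: one weight per input-output pair.
fc_params = (32 * 32 * 3) * (28 * 28 * 6)
# Convolutional: six 5x5x3 filters, each with one bias.
conv_params = 6 * (5 * 5 * 3 + 1)
print(fc_params, conv_params)        # 14450688 456
```

The gap between roughly 14.5 million weights and 456 is the whole argument for convolution in one calculation.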
This week’s lectures deal with case studies of different architectures. They delve into the history of CNNs and their evolution, which helps you see how different architectures influence results and ultimately gives you better intuition for building better CNNs yourself.
Residual networks help train deeper networks and mitigate the vanishing or exploding gradient problem by having the output of a layer skip the next layer (a skip connection) and feed into the one after that. This helps with performance and stability of the parameters during training. One advantage of the skip connection is that the block can easily learn the identity function, so adding extra layers doesn’t hurt performance the way it would in a network without skip connections.
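A minimal numpy sketch of the idea (the two-layer block and the shapes are illustrative, not the course’s exact architecture): the input is added back to the main path before the final activation.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(x, W1, W2):
    """Minimal residual block: two linear+ReLU transforms whose
    output is added back to the input before the final activation."""
    out = relu(W1 @ x)    # main path, first layer
    out = W2 @ out        # main path, second layer (pre-activation)
    return relu(out + x)  # skip connection: add the input back

# With zero weights the main path contributes nothing, so the block
# reduces to relu(x) -- the identity for non-negative inputs.
x = np.array([1.0, 2.0, 3.0])
Z = np.zeros((3, 3))
print(residual_block(x, Z, Z))  # [1. 2. 3.]
```

This is why stacking residual blocks doesn’t hurt: even a block that learns nothing useful can still pass its input through unchanged.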
The pooling layer is useful for reducing the height and width dimensions of CNNs, whereas the 1×1 convolution (aka Network in Network) is useful for reducing the channel dimension. This becomes important when talking about the Inception network.
The Inception network combines the results of several different filter sizes. This produces a layer with a very large channel dimension, and hence the 1×1 convolution comes into play to reduce it.
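To see why a 1×1 convolution shrinks channels, here is a numpy sketch (the shapes are illustrative): a 1×1 convolution is just a per-pixel linear map across channels, which leaves the spatial dimensions untouched.

```python
import numpy as np

# Hypothetical activation volume: 28x28 spatial grid, 192 channels.
a = np.random.rand(28, 28, 192)
# A bank of 32 filters of shape 1x1x192 is just a 192x32 matrix.
filters = np.random.rand(192, 32)

# At each pixel, the 192 channel values are linearly combined
# into 32 new channels; height and width stay the same.
out = np.einsum('hwc,cf->hwf', a, filters)
print(out.shape)  # (28, 28, 32)
```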
This was probably the most interesting section because it dealt with object detection. The main concepts here are:
- Sliding windows of various sizes to determine if an object is detected or not.
- Bounding boxes determined by the network to outline an object.
- Intersection over Union (IoU) calculations and non-max suppression – ways to pick the best box for an object and de-duplicate boxes that detect the same object
- Anchor boxes – boxes of various dimensions that are related to the objects you want to detect.
- YOLO algorithm – the “You Only Look Once” algorithm that performs object detection quickly. I confirmed with a Waymo engineer at the NIPS 2017 conference that the company has more advanced algorithms than YOLO, but all the other concepts are still relevant.
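The IoU calculation at the heart of non-max suppression is compact enough to sketch, assuming boxes are given as corner coordinates (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes don't overlap at all.
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping in a 1x1 patch: intersection 1, union 7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857...
```

Non-max suppression repeatedly keeps the highest-confidence box and discards any remaining box whose IoU with it exceeds a threshold.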
Face verification vs face recognition
Face recognition problems typically don’t come with enough data to train a traditional CNN. In addition, you have what is called a one-shot learning problem: performing recognition based on a single image. So how is this problem solved?
For face recognition, you train a siamese network (two CNNs with the same parameters) to encode faces into an n-dimensional vector. Then you use a difference function d(img1, img2) that says yes or no based on the similarity of the two encodings. How is this network trained, and what is the objective function?
The objective function used to train a siamese network is called a triplet loss, which utilizes an anchor image, a positive (similar) image, and a negative (dissimilar) image.
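In formula form the loss is max(‖f(A)−f(P)‖² − ‖f(A)−f(N)‖² + α, 0). A minimal numpy sketch over precomputed encodings (the 2-D encodings and the margin value are illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Triplet loss over face encodings: push the anchor-positive
    distance below the anchor-negative distance by margin alpha."""
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + alpha, 0.0)

# Encodings of the same person should sit closer than a stranger's.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # same identity, nearby encoding
n = np.array([1.0, 1.0])  # different identity, far away
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```

Note the loss only vanishes once the negative is at least α farther away than the positive; swapping p and n above yields a large positive loss.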
Neural Style Transfer
- Calculate similarity scores between activations within a given layer as a way to capture style. You can also extend this to activations across layers.
- Apply those similarity scores to the generated image to transfer the style.
- Instead of updating weights, the algorithm updates the pixels of the generated image.
- Each iteration produces a better rendering of the style over the new content than the one before.
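The similarity scores in the first bullet are the entries of a Gram matrix: channel-by-channel correlations of a layer’s activations, pooled over all spatial positions. A minimal numpy sketch (random activations stand in for a real network’s):

```python
import numpy as np

def gram_matrix(activations):
    """Style matrix of one layer's (height, width, channels) activations:
    correlations between every pair of channels, summed over positions."""
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat                  # (channels, channels)

# The style cost compares the Gram matrices of the style image and
# the generated image at a chosen layer.
style = np.random.rand(8, 8, 4)
generated = np.random.rand(8, 8, 4)
cost = np.sum((gram_matrix(style) - gram_matrix(generated)) ** 2)
print(gram_matrix(style).shape)  # (4, 4)
```

Gradient descent then nudges the generated image’s pixels to shrink this cost (plus a content cost), which is the weight-free update the third bullet describes.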
Deeplearning.ai’s course on CNNs is a good overview of the concepts and use cases surrounding the Convolutional Neural Network. The explanations were clear and concise, and except for a grading hiccup in one of the programming exercises, the quizzes and assignments definitely helped reinforce the ideas in the lessons. I’m definitely looking forward to taking the fifth installment of the series – Sequence Models – which starts Dec 18th.