Flaw in deeplearning.ai assignment for Course 4, Week 2 – Residual Networks

[Screenshot]

What you’re seeing on top is the last section of the assignment on Residual Networks from deeplearning.ai’s Convolutional Neural Networks course, Week 2.  The assignment goes through the coding, training, and testing of a residual network that recognizes images of hand signs.


[Screenshot]

The problem is that even after completing the exercise, the residual network has a hard time making predictions about what it sees.  For example, even given the image of a hand indicating “two”, the model will predict “zero”.

The problem lies in some preprocessing that’s done before the data point is used for prediction.  As you can see from the code above, there is a call to preprocess_input(x), which is a Keras function.  Its behavior doesn’t make sense in the context of this problem.  You can check out the code for that function here.

The solution is to divide by 255 ( x = x / 255 ) instead, which scales pixel values to the [0, 1] range the model was trained on.  The model then outputs a sensible range of probabilities, with the highest being the most likely candidate.  In the example above, the model is 99% sure the image denotes a “five”.
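As a minimal sketch of the fix (the random array here stands in for the image the notebook loads with Keras’ image utilities; the shape is the assignment’s 64×64 RGB input):

```python
import numpy as np

# Hypothetical stand-in for a loaded 64x64 RGB hand-sign image
# (in the notebook this comes from image.load_img / img_to_array).
x = np.random.randint(0, 256, size=(1, 64, 64, 3)).astype("float32")

# Instead of calling preprocess_input(x), rescale to [0, 1] to match
# how the training data was preprocessed:
x = x / 255.0

print(x.min() >= 0.0, x.max() <= 1.0)  # True True
```

The rescaled batch can then be passed to model.predict() as in the original notebook cell.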




Overview of the Convolution Neural Networks course from deeplearning.ai

[Screenshot]

I just finished the fourth course of the deeplearning.ai series, and it was immensely enjoyable.  I have to admit that with the advent of Hinton’s capsule networks, mustering the motivation to start this set on Convolutional Neural Networks was a little harder than for the previous three.  Hinton and other bloggers have already outlined shortcomings of CNNs, and the thought at the back of my mind was whether it was worth spending the time learning something that may become obsolete.

Nevertheless it was worth it, and as I learned from attending NIPS 2017 last week and through a Deep Learning Study Group, there is still much work to be done with capsule networks. More on that later.

As per the previous three deeplearning.ai courses, this course had the following characteristics:

  • Clear and concise
    • Andrew Ng went over the concepts meticulously
    • The exercises were clearly documented and easy to follow
    • Grading went without a hitch except for one instance (see Caveats below)
  • Touched on a wide range of concepts and use cases
    • CNNs, Residual and Inception networks
    • Object detection, neural style transfer, and face recognition
  • Afforded lots of opportunities for further study
    • Many ways to practice Keras and Tensorflow
You could easily complete the exercises and leave it at that, but then you would be shortchanging yourself.  There are so many other avenues to explore based on the code written that you owe it to yourself to put in some additional effort and discover something new.
  • There are some caveats
As in previous courses, the exercises were structured well and documented extensively.  They were, however, much more challenging than those in the previous three courses.
    • Grading for the last exercise of the fourth week dealing with triplet loss is still flawed as of this writing.  There is a workaround if you look in the discussion forums.
There is a flaw when trying to use the model in the Week 2 programming assignment to perform your own predictions.

The course encompasses four weeks, each culminating in a quiz and one or two programming assignments.  An understanding of Python, Keras, and Tensorflow would be helpful, but is not necessary as there are tutorials in the previous courses.  Following is a synopsis of each week’s content.

Week 1

Andrew Ng first starts with the motivation for CNNs and what makes them powerful. The initial lectures on edge detection take a step-by-step approach, showing how different filters perform different functions by detecting different basic patterns.  In later lectures, you’ll see how filters in deeper layers detect more complex shapes (Week 4, What are deep ConvNets learning).

The middle set of lectures deals with the technical guts of CNNs, namely filter size, stride, padding, and pooling, culminating in an example of a single-layer CNN.  He finally ends with a lecture on the advantages of CNNs over fully connected networks, namely:

    • Fewer parameters to train
    • Parameter sharing – same filters can be used in different areas of the image
    • Sparsity of connections – cells in resultant layers are dependent only on a small subset of the previous layer, hence less prone to overfitting
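To make the first point concrete, here is a rough parameter count; the layer sizes are my own illustrative choices, not the course’s: a 5×5 convolution with 8 filters over a 32×32×3 image, versus a fully connected layer producing the same 28×28×8 output volume.

```python
# Conv layer parameters: (filter_h * filter_w * input_channels + 1 bias) per filter
conv_params = (5 * 5 * 3 + 1) * 8              # 608

# Fully connected layer mapping all 32*32*3 inputs to a 28*28*8 output volume
n_in, n_out = 32 * 32 * 3, 28 * 28 * 8
fc_params = n_in * n_out + n_out               # ~19.3 million

print(conv_params, fc_params)  # 608 19273856
```

Five orders of magnitude fewer parameters, which is the point of parameter sharing.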

[Screenshot]

Week 2

This week’s lectures deal with case studies and different architectures. They delve into the history of CNNs and their evolution, which helps with seeing how different architectures influence results and ultimately gives you better intuition for building better CNNs yourself.

Residual networks help train deeper networks and mitigate the vanishing/exploding gradient problem by having the output of a layer skip the next layer (a skip connection) and feed into the one after that. This helps with performance and stability of the parameters during training.  One advantage of the skip connection is that the block learns the identity function more easily than a network without one, and hence doesn’t hurt performance as much.
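A toy numpy sketch of a residual block (my own simplification, not the assignment’s Keras implementation) makes the identity property easy to see:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    # Two-layer transform F(x) = W2 @ relu(W1 @ x), with the input added
    # back (the skip connection) before the final nonlinearity.
    a = relu(W1 @ x)
    return relu(W2 @ a + x)

# With zero weights F(x) = 0, so the block reduces to the identity for
# non-negative inputs -- the "learns the identity easily" property above.
x = np.array([1.0, 2.0, 3.0])
out = residual_block(x, np.zeros((3, 3)), np.zeros((3, 3)))
print(out)  # [1. 2. 3.]
```

A plain two-layer network with zero weights would instead output all zeros, which is why deep plain networks can degrade as layers are added.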

The pooling layer is useful for reducing the height and width dimensions of CNNs, whereas the 1×1 convolution (aka Network in Network) is useful for reducing the channel dimension.  This becomes important when talking about the Inception network.

The Inception network combines the results from all the different filter sizes.  This results in a layer with a very large channel dimension, and hence the 1×1 convolution comes into play here to reduce it.
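The savings can be sketched with the dimensions from the lectures’ well-known example: a 28×28×192 volume convolved down to 32 channels with 5×5 filters, directly versus through a 1×1 bottleneck of 16 channels.

```python
# Direct 5x5 convolution: 28x28x192 input -> 28x28x32 output ("same" padding)
direct = 28 * 28 * 32 * (5 * 5 * 192)          # ~120 million multiplications

# With a 1x1 "bottleneck" down to 16 channels first:
reduce_step = 28 * 28 * 16 * 192               # the 1x1 convolution
conv_step = 28 * 28 * 32 * (5 * 5 * 16)        # 5x5 conv on the thin volume
bottleneck = reduce_step + conv_step           # ~12.4 million

print(direct, bottleneck)  # 120422400 12443648
```

Roughly a 10x reduction in multiplications, which is what makes stacking many Inception modules feasible.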

[Screenshot]

Week 3

This was probably the most interesting section because it dealt with object detection.  The main concepts here are:

    • Sliding windows of various sizes to determine whether an object is detected.
    • Bounding boxes determined by the network to outline an object.
    • Intersection over Union (IoU) calculations and non-max suppression to determine the best anchor box and de-duplicate boxes that detect the same object.
    • Anchor boxes – boxes of various dimensions that are related to the objects you want to detect.
    • YOLO Algorithm – You only look once algorithm that quickly performs object detection. I confirmed with a Waymo engineer at the NIPS 2017 conference that the company has more advanced algorithms than YOLO, but all other concepts are still relevant.
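As a quick sketch of the IoU calculation from the list above (boxes as corner coordinates; a toy implementation, not the assignment’s code):

```python
def iou(box_a, box_b):
    # Boxes given as (x1, y1, x2, y2) corner coordinates.
    xi1 = max(box_a[0], box_b[0])
    yi1 = max(box_a[1], box_b[1])
    xi2 = min(box_a[2], box_b[2])
    yi2 = min(box_a[3], box_b[3])
    inter = max(0, xi2 - xi1) * max(0, yi2 - yi1)  # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

# Two 2x2 boxes overlapping in a unit square: intersection 1, union 7
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 0.142857...
```

Non-max suppression then discards any box whose IoU with a higher-confidence box exceeds a threshold.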

[Screenshot]

Week 4

Face verification vs face recognition

Face recognition problems have the issue of not having enough data to train a traditional CNN.  In addition, you have what is called a one-shot learning problem: performing recognition based on one single image.  So how is this problem solved?

For face recognition, you train a siamese network (two CNNs with the same parameters) to encode faces into an n-vector.  Then you use a difference function d(img1, img2) that says yes or no based on the similarity of the two encodings.  How is this network trained, and what is the objective function?

The objective function used to train a siamese network is called the triplet loss, which utilizes an anchor image, a positive (similar) image, and a negative (dissimilar) image.
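A minimal sketch of the triplet loss on precomputed encodings (the vectors here are toy stand-ins for the network’s n-vector outputs):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    # max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0)
    # alpha is the margin pushing negatives farther away than positives.
    pos_dist = np.sum((anchor - positive) ** 2)
    neg_dist = np.sum((anchor - negative) ** 2)
    return max(pos_dist - neg_dist + alpha, 0.0)

a = np.array([1.0, 0.0])
p = np.array([1.0, 0.0])   # identical to the anchor
n = np.array([0.0, 1.0])   # far from the anchor
print(triplet_loss(a, p, n))  # 0.0 -- the margin is already satisfied
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training focuses on "hard" triplets that violate it.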

[Screenshot]


Neural Style Transfer

  • Calculate similarity scores between activations in a certain layer as a way to keep track of style.  You can also extend this to activations between layers.
  • Apply the similarity scores to the generated image to transfer style.
  • Instead of updating weights, the algorithm updates the pixels.
  • Each iteration generates a better representation of the style over the new content than previous generations.
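The “similarity scores” are typically computed as a Gram matrix of a layer’s activations; a minimal sketch (the array shape is illustrative):

```python
import numpy as np

def gram_matrix(activations):
    # activations: (height, width, channels) from one layer.
    # G[i, j] measures how strongly channels i and j activate together --
    # the style "similarity score" described in the list above.
    h, w, c = activations.shape
    flat = activations.reshape(h * w, c)
    return flat.T @ flat

g = gram_matrix(np.ones((2, 2, 3)))
print(g.shape)  # (3, 3) -- one entry per pair of channels
```

The style cost then penalizes the difference between the Gram matrices of the style image and the generated image, and gradient descent updates the generated pixels to shrink it.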

Deeplearning.ai’s course on CNNs is a good overview of the concepts and use cases around the Convolutional Neural Network.  The explanations were clear, concise, and except for a grading hiccup in one of the programming exercises, the quizzes and assignments definitely helped with reinforcing the ideas in the lessons.  I’m definitely looking forward to taking the fifth installment of the series – Sequence Models – which is starting Dec 18th.

Deep Learning – Playing with Neural Style using Torch and Tensorflow

Deep Learning is the hot topic in artificial intelligence circles right now and with the advent of the Go competition and other deep learning advancements, a lot of attention has focused on platforms that make deep learning accessible. Two of those platforms are Torch and Tensorflow. I spent a weekend trying them out and here are some preliminary thoughts.

My point of comparison was the Neural Style project implemented in both platforms by Justin Johnson and Anish Athalye. Neural Style is a deep learning implementation that tries to derive artistic styles from pictures and applies them to a candidate image. The result is a mashup of the original picture in the style (or styles) set at input time.

The first thing to notice is the complexity of the setup. Tensorflow is easier as there aren’t as many components and steps involved. With Torch, there are quite a few moving parts, and even though there are scripts that allow you to do one step installation, I can see how this can become problematic as libraries and models get updated.

For instance, with Tensorflow I could get the basic Neural Style command running after the installation, but with Torch, I encountered errors related to missing libraries or incompatibilities. I eventually opted for a more complex command line that allowed me to bypass those issues.  More on this in a later post.

However, one thing Torch shines at is execution time. The Torch implementation of Neural Style ran orders of magnitude faster than Tensorflow’s. In less than an hour (on a MacBook Pro), it went through a thousand iterations, whereas Tensorflow took more than two days.

Case in point are the following examples.  The first is a 100-iteration run on Tensorflow that took about half a day.  As you can see, you can discern the outlines of the candidate image in the result, but it’s far from done.

Original image:


Tensorflow after 100 iterations, which took about six hours.


With Torch on the other hand, I ran the image through several styles in less time than it took Tensorflow to render the previous example.

Same image using Torch after 1000 iterations, using the style of Munch’s The Scream.



And of course, Van Gogh’s Starry Night.

Running Spark 1.3.1 examples on a CDH4 cluster

Spark 1.x for CDH4

Recently, I came across versions of Spark 1.x and higher on the Apache Spark site that have a distribution built for CDH4.  The downloaded file looks like spark-1.3.1-bin-cdh4.tgz.

After verifying that the SparkPi example program worked locally (Mac running Yosemite) and on the cluster, I proceeded to do a build with some Spark code that emulates some of the Pig-based processes currently in production using the following directions.


The code was compiled as-is with some modifications to the pom, using Apache Maven 3.1.1.  The pom changes dealt with pointing the repo to Cloudera and specifying the correct Hadoop version.  In my case, they were:


  <!-- If using CDH, also add Cloudera repo -->
  <repository>
      <id>Cloudera repository</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>

To build, you can do the traditional

mvn clean package -DskipTests

or if you make subsequent changes and just want to compile a specific project (examples, in this case):

mvn package --projects examples --also-make -DskipTests=true


Just copy the resulting spark-examples***.jar file to your distribution and make sure that *** matches the Spark and Hadoop versions in the filename; otherwise run-example will complain it can’t find the relevant libraries:

Error Msg: Failed to find Spark examples assembly

The cluster was also running Java 1.6, and since I had compiled on my Mac with 1.7, I had to install 1.7 on the cluster and point the environment variables (JAVA_HOME, PATH) to it before execution.


Run time was 8 minutes compared to Pig’s 12.  However, it should be noted that Spark was running standalone.  Looking forward to running this in cluster mode.

Using the Hadoop cluster to improve site health

The ECG Hadoop cluster is used mainly for SEO and search related features on the various global sites.  In the past few weeks, specifically since the launch of South Africa in late November, it has performed additional duty as a platform to identify site and workflow issues.

As issues (real or perceived) were identified, code was deployed and ad hoc reports generated to provide visibility.  This involved logging where needed, data cleanup, processing, and report generation.  From initial issue identification to implementation and report generation in production, the turnaround ranged from one hour to around three days.

The rapid turnaround enabled the team to quickly respond to site and usability issues.

Identifying bad or missing redirect rules

The first issue occurred right after the launch and involved redirect rules.  These rules allowed users to navigate from the old site to the new site seamlessly.  Obviously, not all URLs were mapped correctly, and the gaps showed up as nonsense URLs that resulted in either 404s or ZRPs (zero result pages).

Using the current logging and Hadoop processing infrastructure, especially around search queries, erroneously mapped search urls were identified and fixed.  Whereas users had previously seen either 404s or ZRPs, they now were redirected properly.

Tracking zero result pages

The report required only a quick Pig script to aggregate data already collected from the search logs.  From discovery to coding and report generation took about one hour.

Identifying hurdles to successful posts

The second issue revolved around our users’ ability to successfully post ads.  There was no visibility into what hurdles users were encountering, so logging was initiated during the posting phase to identify errors that might hinder or discourage them from finishing an ad.  Some errors involved rules that were easily removed or relaxed; others involved clarifying posting requirements.

[Graph: hurdles to successful posts]

The above graph helped analyze how successfully our users have been able to post ads and what hurdles stood in their way.  A more detailed graph not shown identified specific errors users encountered that may have discouraged them from completing the post.

Data skew

The third issue surfaced after Christmas, when a rogue process started hitting the ECG South Africa site with requests that amounted to about 10% of all traffic.  This introduced data skew, which caused the current Hadoop processes to complete much later than usual.  After identification, the traffic was filtered out from processing and the system returned to normal.  Logging from the rogue process was still collected, and the traffic eventually stopped.  It’s not clear at this time whether the traffic was IP-blocked upstream or stopped independently.

Tracking spam

The above illustrates a spike in traffic from a single source starting around 12/28 and lasting till 1/1.  The traffic did not pose an additional load on the overall system, but caused MR processes to complete later because of data skew.

Git Gripe

Git and renaming files

There’s an entry here about how intelligent Git is in detecting renamed files or folders.  When I hear that, I think, “Oh, so I can rename something on the filesystem and Git will automatically detect that when I do a status or a commit (albeit the latter is functionality I may not want).”

Turns out it’s not that simple.  Git can detect renames without you explicitly performing a ‘git mv’.  However, after the rename described above, a ‘git status’ will only show a deleted file and an untracked file, not the desired rename.  So where’s the magic?  Turns out you have to add the untracked file, and even then a ‘git status’ won’t show you the intelligence at work.  You have to do a ‘git commit -a --dry-run’, in which case Git will show that it found the connection between the old and new file.
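The sequence looks like this (the file names are illustrative; this assumes a repo where oldname.txt is already tracked and committed):

```shell
# Plain filesystem rename, no 'git mv':
mv oldname.txt newname.txt
git status                    # shows a deletion plus an untracked file
git add newname.txt           # stage the new name
git commit -a --dry-run       # reports: renamed: oldname.txt -> newname.txt
```

The rename detection happens by content comparison once both sides are staged, which is why the plain status output looks so unintelligent at first.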

Here’s a screencast that illustrates what I’m talking about.

This is on git version

Here’s the link that started me on this thread.  It took a bit too much discussion for the guy to get his answer.


Language inconsistencies Part 1

First in a series of posts about inconsistencies in computing languages.

First up is Python and the way strings are indexed.  The following are two ways to access part of a string.  One produces an error while the other doesn’t.

First, the one that produces an error

print ''[0]  # accessing the first element of an empty string produces an error

You would think this second example would produce an error too, but on the contrary.

print ''[0:]  # accessing the string using a slice (note the colon) doesn’t produce an error
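The difference can be demonstrated in a version-agnostic way: indexing past the end raises, while slicing silently clamps out-of-range bounds.

```python
s = ""
try:
    s[0]                      # indexing past the end raises IndexError
    raised = False
except IndexError:
    raised = True

# Slicing, by contrast, clamps out-of-range bounds instead of raising:
empty_slice = s[0:]           # '' rather than an error
tail = "abc"[10:]             # also '' -- a start index beyond the end is fine
```

This is by design in Python: slices are defined to return whatever portion of the sequence falls inside the given bounds, which may be nothing at all.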