GANs Specialization from DeepLearning.ai

Back in November 2020, I finished the 3 courses that make up the Generative Adversarial Networks (GANs) Specialization created by DeepLearning.ai and taught on Coursera. It was a very informative and enjoyable specialization, running the gamut of GANs from their history and fundamentals up to the state of the art. Following is a quick synopsis of the three courses.

Course 1 – Build Basic GANs

Week 1 gives you an introduction to GANs – what the generator and the discriminator are, their history, and what you can do with them.  It also introduces the BCE cost function.
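As a rough illustration (my own NumPy sketch, not the course's PyTorch code), the discriminator and generator BCE losses can be written as:

```python
import numpy as np

def bce_discriminator_loss(d_real, d_fake):
    """BCE loss for the discriminator.

    d_real: discriminator outputs (probabilities) on real images.
    d_fake: discriminator outputs on generated images.
    The discriminator wants d_real -> 1 and d_fake -> 0.
    """
    return -np.mean(np.log(d_real) + np.log(1.0 - d_fake))

def bce_generator_loss(d_fake):
    """Non-saturating BCE loss for the generator.

    The generator wants the discriminator to output 1 on fakes.
    """
    return -np.mean(np.log(d_fake))

# A near-perfect discriminator drives its loss toward 0:
d_real = np.array([0.99, 0.98])
d_fake = np.array([0.01, 0.02])
print(bce_discriminator_loss(d_real, d_fake))  # ~0.03
```

In practice you'd compute these from logits with a numerically stable version (e.g., PyTorch's `BCEWithLogitsLoss`), but the math is the same.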

Week 2 introduces Deep Convolutional GANs and goes through some basics – activation functions, batch normalization, the basics of convolutions, stride and padding, pooling and upsampling, and transposed convolutions.

Week 3 starts introducing improvements to the basic GAN methodology.  It talks about problems with using BCE loss, like mode collapse and vanishing gradients, and how they can be overcome by using Wasserstein loss and 1-Lipschitz (1-L) continuity.
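Sketching the idea in NumPy (my own simplification; the course implements this with a PyTorch critic and autograd to get the gradients), the Wasserstein critic loss with a gradient penalty looks like:

```python
import numpy as np

def critic_loss(c_real, c_fake, grad_norms, gp_weight=10.0):
    """Wasserstein critic loss with gradient penalty (WGAN-GP).

    c_real, c_fake: raw critic scores (not probabilities).
    grad_norms: norms of the critic's gradients at points
    interpolated between real and fake images; the penalty
    pushes these norms toward 1 to enforce 1-L continuity.
    """
    wasserstein = np.mean(c_fake) - np.mean(c_real)
    gradient_penalty = gp_weight * np.mean((grad_norms - 1.0) ** 2)
    return wasserstein + gradient_penalty

def generator_loss(c_fake):
    """The generator tries to raise the critic's score on fakes."""
    return -np.mean(c_fake)
```

Because the critic outputs unbounded scores rather than probabilities, its gradients don't saturate the way BCE's do, which is what helps with vanishing gradients.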

Week 4 is about conditional and controllable generation.  Conditional generation is about choosing what class gets generated, while controllable generation is about making tweaks to the features of what is generated.
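The conditional part boils down to feeding the generator a class label alongside the noise. A minimal sketch (my own illustration, with made-up dimensions):

```python
import numpy as np

def conditional_input(noise, class_index, n_classes):
    """Concatenate a one-hot class label onto the noise vector,
    so the generator is conditioned on the desired class."""
    one_hot = np.zeros(n_classes)
    one_hot[class_index] = 1.0
    return np.concatenate([noise, one_hot])

z = np.random.randn(64)                              # noise vector
x = conditional_input(z, class_index=3, n_classes=10)
print(x.shape)  # (74,)
```

The discriminator gets the label too (for images, typically as extra one-hot channels), so it can punish samples of the wrong class. Controllable generation instead moves the noise vector along learned directions to tweak features.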

Course 2 – Build Better GANs

Week 1 talks about evaluating GANs – why it’s difficult, and why features rather than raw pixels are used for comparison.  It talks about the feature extractor and introduces the Fréchet Inception Distance (FID), along with an earlier but flawed metric called the Inception Score.  Later it introduces sampling and the truncation trick to tune your GAN for fidelity or diversity, and also talks about precision and recall as they relate to GANs.
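FID is the Fréchet distance between two Gaussians fitted to the real and generated features. A simplified NumPy sketch, assuming diagonal covariances so the matrix square root reduces to an element-wise one (real FID uses full covariance matrices of Inception-v3 features):

```python
import numpy as np

def fid_diagonal(mu1, var1, mu2, var2):
    """Frechet distance between N(mu1, diag(var1)) and N(mu2, diag(var2)).

    Full FID is ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2));
    with diagonal covariances each term simplifies per dimension.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return mean_term + cov_term

# Identical feature distributions give FID = 0:
mu = np.array([0.0, 1.0]); var = np.array([1.0, 2.0])
print(fid_diagonal(mu, var, mu, var))  # 0.0
```

Lower is better: the closer the generated feature distribution is to the real one, the smaller the distance.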


Week 2 deals with GANs’ disadvantages and machine bias.  One disadvantage is evaluation – there isn’t a metric that’s grounded in theory; evaluation is done mainly via comparisons between generated and real images.  GANs are also not invertible – you can’t derive the noise vector from the image.  Finally, there’s no density estimation, which would be useful for anomaly detection.

Alternatives to GANs are presented, including variational autoencoders (VAEs), flow models, and hybrid architectures.

Machine bias is also introduced this week, illustrated by COMPAS, a commercial algorithm used by courts in pre-trial and sentencing decisions.  Bias also entails discussions of fairness and what constitutes a good fairness metric; examples include demographic parity and equalized odds.


Finally, the week wraps up with a discussion of how biases can be introduced either through the training set or through the model architecture and code.

Week 3 delves into improvements in GANs and introduces us to StyleGAN – the state of the art in GAN capabilities as of this writing.  It features better fidelity, diversity, and feature control (think mixing features from two images to produce a new one, or adding glasses to faces).  The main components of StyleGAN are:

  1. Progressive growing – introduced by ProGAN, a predecessor of StyleGAN, it starts by generating a low-resolution image and progressively grows it to higher resolutions, with the discriminator evaluating it at each stage.
  2. The noise mapping network, which takes the initial noise vector and produces an intermediate noise vector to be fed into StyleGAN, and
  3. Adaptive instance normalization (AdaIN), which takes the intermediate noise vector and uses it to inject style into various layers of StyleGAN.  Earlier layers change general features and later layers modify finer features.
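AdaIN itself can be sketched as follows (my own NumPy simplification for a single feature channel; in StyleGAN the style scale and bias are learned from the intermediate noise vector):

```python
import numpy as np

def adain(features, style_scale, style_bias, eps=1e-5):
    """Adaptive instance normalization for one channel.

    Normalize the feature map to zero mean / unit variance,
    then re-scale and re-shift with the style statistics.
    """
    normalized = (features - features.mean()) / (features.std() + eps)
    return style_scale * normalized + style_bias

fmap = np.random.randn(8, 8) * 3.0 + 5.0   # arbitrary feature map
out = adain(fmap, style_scale=2.0, style_bias=0.5)
print(round(out.mean(), 3), round(out.std(), 3))  # ~0.5 ~2.0
```

The normalization wipes out the incoming statistics, so whatever style the intermediate noise vector encodes fully controls the output statistics at that layer.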


Course 3 – Apply GANs

Week 1 covers data augmentation and data privacy.  GANs can be used to augment datasets when real-world data cannot be obtained.  Although there are other techniques for data augmentation (like taking a photo and flipping/rotating/zooming it), GANs give you more variation and also add realism.

GANs can also be used for privacy and anonymity.  For privacy, they can be used to anonymize MRI or CT scans that are used for training a model.  One case for anonymity is hiding the true identity of someone who would otherwise be unwilling to testify.  One obvious con is the rise of deepfakes and their ability to manipulate video to make it look like someone was somewhere they weren’t.

Week 2 deals with image-to-image (I2I) translation with Pix2Pix.  I2I is used to transfer styles, transform an image into an abstraction (e.g. a segmentation map), or colorize video.


Pix2Pix is a kind of paired conditional generation model.  It’s a fairly complicated model, and the course delves into its various components – PatchGAN, U-Net, and the pixel distance loss term.
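The pixel distance term is what the pairing buys you: the generator is penalized not just for fooling the discriminator but for drifting from the ground-truth image. A NumPy sketch (my simplification; the Pix2Pix paper weights the L1 term heavily, with lambda around 100):

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, l1_weight=100.0):
    """Pix2Pix generator loss: adversarial BCE + weighted L1 pixel distance.

    d_fake: discriminator (PatchGAN) probabilities on the generated image.
    fake, target: the generated image and its paired ground-truth image.
    """
    adversarial = -np.mean(np.log(d_fake))           # fool the discriminator
    pixel_distance = np.mean(np.abs(fake - target))  # stay close to the pair
    return adversarial + l1_weight * pixel_distance
```

PatchGAN's `d_fake` is a grid of per-patch realism scores rather than one number, which pushes the generator toward sharp local texture while the L1 term keeps the global structure right.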

Week 3 introduces unpaired image translation and CycleGAN.  Whereas paired I2I has pairs of images that correspond to each other, there is no such pairing for unpaired I2I – instead you have two piles of images, and the GAN tries to learn the differences between one pile and the other.  This is the famous zebras-to-horses translation.
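The key idea that makes unpaired training work is cycle consistency: translating an image to the other pile and back should recover the original. A NumPy sketch with toy "generators" (my own illustration, not the course code):

```python
import numpy as np

def cycle_consistency_loss(real_a, real_b, gen_ab, gen_ba):
    """L1 cycle-consistency loss: A -> B -> A and B -> A -> B
    should each reconstruct the image they started from."""
    forward = np.mean(np.abs(gen_ba(gen_ab(real_a)) - real_a))
    backward = np.mean(np.abs(gen_ab(gen_ba(real_b)) - real_b))
    return forward + backward

# Toy generators that happen to invert each other give zero loss:
gen_ab = lambda x: x + 1.0
gen_ba = lambda x: x - 1.0
print(cycle_consistency_loss(np.zeros(3), np.ones(3), gen_ab, gen_ba))  # 0.0
```

This term is added to the usual adversarial losses for both generators, so the mapping can't collapse to producing arbitrary images from the target pile.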


Downloading code

The programming labs were in Jupyter notebooks provided by the course, but you can download the code and run it on your own system, which I did and will show you how to do below.

In the Coursera notebook, do File->Open

This will show you the home directory.  If there are files to download, you can do it here.  In some cases, a file may be too big, or a directory may contain too many files to download one by one.  In that case, you’ll have to tar and split the files.

Archiving and splitting files

We archive directories to put everything into one file and split them if the resultant archive is too large to download ( because of browser limitations ). 

You can run command-line commands in the Coursera notebook via the bang (!) prefix:

!tar cvfz <your_file>.tar.gz <directory or file>

To split the file, use -b to set the split size.  The following splits the archive into 50 MB chunks:

!split -b 50m <your tar file> <split file format>

!split -b 50m example.tar.gz example.tar.gz.part

This will give you split files example.tar.gz.partaa, example.tar.gz.partab, and so on.

At this point, you can go to the directory in your Coursera notebook, select each file, and download them individually.

To put it back together, just cat the files:

cat example.tar.gz.part* > example.tar.gz

CUDA out of memory

I encountered this error while running the Course 3 Week 2 assignments A and B.  It may not be related to the code itself.

Several fixes can be found in https://stackoverflow.com/questions/54374935/how-to-fix-this-strange-error-runtimeerror-cuda-error-out-of-memory

The one that seemed to work was removing the NVIDIA cache:

rm -fr ~/.nv

Summary

Overall, taking the specialization was a great experience.  The fact that you can download most of the code means you can continue to learn and explore long after the course is over.  Like most DeepLearning.ai courses, the lectures cover a wide range of subjects, and the labs give you plenty of leeway to explore the implementations.  Until recently, the Slack channels for the various courses were available to look up questions you hadn’t thought to ask; unfortunately, those channels have since been archived and the community has moved to a different server.