AMD deep learning desktop build

Work notes from building an AMD-based desktop for deep learning projects, running Ubuntu 18.04 with CUDA, TensorFlow, and PyTorch.

Monday, Feb 17th (Presidents' Day), started at 10:30am

  • Added the CPU
  • Added memory
    • The Ripjaws memory wouldn't fit under the Noctua fan, so I had to pull the heat-spreader covers off
  • Added the M.2 drive
  • Screwed the motherboard to the case
  • Added the graphics card
  • Hooked up power
  • Hooked up connections
  • Power-on test

    • The power supply went straight to standby. After about 20 minutes of debugging, it turned out the power-switch connections weren't secure.

  • Hooked up the HD
  • Powered on

Finished at 3:17pm (including lunch)

Saturday Feb 22nd (9:09pm)

Finished around midnight

Final notes

  • 500GB partition of the 1TB Samsung NVMe drive, formatted ext4 – /dev/nvme0n1p5
  • Dual boot, booting into Ubuntu first
  • Formatted an 8GB microSD card to hold the bootable Ubuntu installer
  • Installs
    • Nvidia driver – 430
    • CUDA 10.1
    • Anaconda
    • Python 3.7
    • TF 2.1.0 – conda activate tf-gpu
    • Pytorch – conda activate pytorch
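A quick way to confirm the PyTorch environment actually sees the GPU (a sketch; run inside the pytorch conda env from the notes, assuming a CUDA build of PyTorch was installed):

```python
# Sanity check that PyTorch's CUDA build can see the RTX 2060.
import torch

print(torch.__version__)
print(torch.cuda.is_available())          # True when a CUDA device is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "GeForce RTX 2060"
```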

References

  • Ubuntu install
  • Conda install tensorflow-gpu – this will also install CUDA (the cudatoolkit and cudnn packages).
    • The following NEW packages will be INSTALLED:

      _libgcc_mutex: 0.1-main
      _tflow_select: 2.1.0-gpu
      absl-py: 0.9.0-py37_0
      asn1crypto: 1.3.0-py37_0
      astor: 0.8.0-py37_0
      blas: 1.0-mkl
      blinker: 1.4-py37_0
      c-ares: 1.15.0-h7b6447c_1001
      ca-certificates: 2020.1.1-0
      cachetools: 3.1.1-py_0
      certifi: 2019.11.28-py37_0
      cffi: 1.14.0-py37h2e261b9_0
      chardet: 3.0.4-py37_1003
      click: 7.0-py_0
      cryptography: 2.8-py37h1ba5d50_0
      cudatoolkit: 10.1.243-h6bb024c_0
      cudnn: 7.6.5-cuda10.1_0
      cupti: 10.1.168-0
      gast: 0.2.2-py37_0
      google-auth: 1.11.2-py_0
      google-auth-oauthlib: 0.4.1-py_2
      google-pasta: 0.1.8-py_0
      grpcio: 1.27.2-py37hf8bcb03_0
      h5py: 2.10.0-py37h7918eee_0
      hdf5: 1.10.4-hb1b8bf9_0
      idna: 2.8-py37_0
      intel-openmp: 2020.0-166
      keras-applications: 1.0.8-py_0
      keras-preprocessing: 1.1.0-py_1
      ld_impl_linux-64: 2.33.1-h53a641e_7
      libedit: 3.1.20181209-hc058e9b_0
      libffi: 3.2.1-hd88cf55_4
      libgcc-ng: 9.1.0-hdf63c60_0
      libgfortran-ng: 7.3.0-hdf63c60_0
      libprotobuf: 3.11.4-hd408876_0
      libstdcxx-ng: 9.1.0-hdf63c60_0
      markdown: 3.1.1-py37_0
      mkl: 2020.0-166
      mkl-service: 2.3.0-py37he904b0f_0
      mkl_fft: 1.0.15-py37ha843d7b_0
      mkl_random: 1.1.0-py37hd6b4f25_0
      ncurses: 6.1-he6710b0_1
      numpy: 1.18.1-py37h4f9e942_0
      numpy-base: 1.18.1-py37hde5b4d6_1
      oauthlib: 3.1.0-py_0
      openssl: 1.1.1d-h7b6447c_4
      opt_einsum: 3.1.0-py_0
      pip: 20.0.2-py37_1
      protobuf: 3.11.4-py37he6710b0_0
      pyasn1: 0.4.8-py_0
      pyasn1-modules: 0.2.7-py_0
      pycparser: 2.19-py_0
      pyjwt: 1.7.1-py37_0
      pyopenssl: 19.1.0-py37_0
      pysocks: 1.7.1-py37_0
      python: 3.7.6-h0371630_2
      readline: 7.0-h7b6447c_5
      requests: 2.22.0-py37_1
      requests-oauthlib: 1.3.0-py_0
      rsa: 4.0-py_0
      scipy: 1.4.1-py37h0b6359f_0
      setuptools: 45.2.0-py37_0
      six: 1.14.0-py37_0
      sqlite: 3.31.1-h7b6447c_0
      tensorboard: 2.1.0-py3_0
      tensorflow: 2.1.0-gpu_py37h7a4bb67_0
      tensorflow-base: 2.1.0-gpu_py37h6c5654b_0
      tensorflow-estimator: 2.1.0-pyhd54b08b_0
      tensorflow-gpu: 2.1.0-h0d30ee6_0
      termcolor: 1.1.0-py37_1
      tk: 8.6.8-hbc83047_0
      urllib3: 1.25.8-py37_0
      werkzeug: 1.0.0-py_0
      wheel: 0.34.2-py37_0
      wrapt: 1.11.2-py37h7b6447c_0
      xz: 5.2.4-h14c3975_4
      zlib: 1.2.11-h7b6447c_3
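For reference, the environment above can be recreated with something like the following (the env name matches the (tf-gpu) prompt in the transcripts below; the exact versions conda resolves will drift over time):

```shell
# Create a fresh conda env; conda pulls in cudatoolkit/cudnn alongside TF,
# so no separate CUDA install is needed for the conda-packaged TensorFlow.
conda create -n tf-gpu python=3.7
conda activate tf-gpu
conda install tensorflow-gpu   # resolved to TF 2.1.0 + cudatoolkit 10.1 in Feb 2020
```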

  • Test the TensorFlow and Python 3 install
    • (tf-gpu) dan@dan-X399-AORUS-PRO:~/dev/tensorflow-cnn-tutorial$ python3
      Python 3.7.6 (default, Jan 8 2020, 19:59:22)
      [GCC 7.3.0] :: Anaconda, Inc. on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import tensorflow as tf
      >>> tf.__version__
      '2.1.0'
      >>> tf.test.is_gpu_available()
      WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
      Instructions for updating:
      Use `tf.config.list_physical_devices('GPU')` instead.
      2020-02-23 01:48:20.149384: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
      2020-02-23 01:48:20.175975: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3493005000 Hz
      2020-02-23 01:48:20.176961: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cc220e6780 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:48:20.176985: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
      2020-02-23 01:48:20.178028: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
      2020-02-23 01:48:20.351783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
      pciBusID: 0000:41:00.0 name: GeForce RTX 2060 computeCapability: 7.5
      coreClock: 1.71GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
      2020-02-23 01:48:20.352047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:48:20.353750: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      2020-02-23 01:48:20.355686: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
      2020-02-23 01:48:20.355959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
      2020-02-23 01:48:20.357749: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
      2020-02-23 01:48:20.358979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
      2020-02-23 01:48:20.363192: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
      2020-02-23 01:48:20.364707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
      2020-02-23 01:48:20.364766: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:48:20.665490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
      2020-02-23 01:48:20.665531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
      2020-02-23 01:48:20.665540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
      2020-02-23 01:48:20.667506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:41:00.0, compute capability: 7.5)
      2020-02-23 01:48:20.669697: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cc24b5ece0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:48:20.669713: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
      True
      >>>
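As the deprecation warning in the transcript suggests, the non-deprecated way to check for a GPU in TF 2.x is `tf.config.list_physical_devices` — a minimal sketch:

```python
# Preferred GPU check in TF 2.x (tf.test.is_gpu_available is deprecated).
import tensorflow as tf

gpus = tf.config.list_physical_devices('GPU')
print(len(gpus) > 0)     # True when at least one GPU is visible
for gpu in gpus:
    print(gpu.name)      # e.g. "/physical_device:GPU:0"
```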

  • Ran the MNIST test from this tutorial: https://github.com/dragen1860/TensorFlow-2.x-Tutorials
    • The Cole Murray tutorial used previously (https://github.com/ColeMurray/tensorflow-cnn-tutorial) wasn't compatible with the TF 2.1 used here.
    • (tf-gpu) dan@dan-X399-AORUS-PRO:~/dev/TensorFlow-2.x-Tutorials/03-Play-with-MNIST$ python3 main.py
      Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
      11493376/11490434 [==============================] - 1s 0us/step
      datasets: (60000, 28, 28) (60000,) 0 255
      2020-02-23 01:52:33.707955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
      2020-02-23 01:52:33.720688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
      pciBusID: 0000:41:00.0 name: GeForce RTX 2060 computeCapability: 7.5
      coreClock: 1.71GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
      2020-02-23 01:52:33.720807: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:52:33.721751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      2020-02-23 01:52:33.722790: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
      2020-02-23 01:52:33.722949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
      2020-02-23 01:52:33.723951: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
      2020-02-23 01:52:33.724572: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
      2020-02-23 01:52:33.726953: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
      2020-02-23 01:52:33.727906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
      2020-02-23 01:52:33.728194: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
      2020-02-23 01:52:33.751744: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3493005000 Hz
      2020-02-23 01:52:33.752694: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620d9fcbb40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:52:33.752717: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
      2020-02-23 01:52:33.753560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
      pciBusID: 0000:41:00.0 name: GeForce RTX 2060 computeCapability: 7.5
      coreClock: 1.71GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
      2020-02-23 01:52:33.753608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:52:33.753625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      2020-02-23 01:52:33.753639: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
      2020-02-23 01:52:33.753654: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
      2020-02-23 01:52:33.753668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
      2020-02-23 01:52:33.753682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
      2020-02-23 01:52:33.753697: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
      2020-02-23 01:52:33.755060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
      2020-02-23 01:52:33.755099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:52:33.834700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
      2020-02-23 01:52:33.834733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
      2020-02-23 01:52:33.834740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
      2020-02-23 01:52:33.836021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5113 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:41:00.0, compute capability: 7.5)
      2020-02-23 01:52:33.837739: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620dd752280 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:52:33.837752: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
      Model: "sequential"
      _________________________________________________________________
      Layer (type) Output Shape Param #
      =================================================================
      dense (Dense) multiple 200960
      _________________________________________________________________
      dense_1 (Dense) multiple 65792
      _________________________________________________________________
      dense_2 (Dense) multiple 65792
      _________________________________________________________________
      dense_3 (Dense) multiple 2570
      =================================================================
      Total params: 335,114
      Trainable params: 335,114
      Non-trainable params: 0
      _________________________________________________________________
      2020-02-23 01:52:34.579333: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      0 loss: 1.2610700130462646 acc: 0.15625
      200 loss: 0.43644407391548157 acc: 0.6821875
      400 loss: 0.35265296697616577 acc: 0.8464062
      600 loss: 0.30810198187828064 acc: 0.870625
      800 loss: 0.2214876413345337 acc: 0.90234375
      1000 loss: 0.29607510566711426 acc: 0.89453125
      1200 loss: 0.2684360444545746 acc: 0.9134375
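A stripped-down version of the same experiment, using Keras's built-in fit loop instead of the tutorial's manual training loop (a sketch, not the tutorial's code; the layer sizes are chosen to reproduce the 335,114 parameters in the model summary above):

```python
# Minimal TF 2.x MNIST classifier: four Dense layers, matching the
# parameter counts in the summary above (200960 + 65792 + 65792 + 2570).
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation='relu', input_shape=(784,)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(10),                    # logits for 10 digit classes
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train briefly on a subset just to confirm the GPU path works end to end.
model.fit(x_train[:512], y_train[:512], batch_size=128, epochs=1, verbose=0)
```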

Meetup – Computer vision on mobile device & Multi-stage ML for document understanding

https://www.meetup.com/SF-Big-Analytics/events/258514786/

Went to a very informative meetup at GoPro headquarters in San Mateo. The takeaways came mainly from the first speaker, from Facebook, on image recognition on mobile, and from the various participants regarding the positions they were hiring for.

Computer Vision

Sam walked through various techniques and papers detailing the work needed to train and run inference from models deployed on mobile devices. Following are some of the papers outlined in the talk.

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

https://arxiv.org/abs/1607.04381


Value-aware Quantization for Training and Inference of Neural Networks

ECCV 2018: Eunhyeok_Park_Value-aware_Quantization_for_ECCV_2018_paper.pdf


FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search

https://arxiv.org/abs/1812.03443


ChamNet: Towards Efficient Network Design through Platform-Aware Model Adaptation

https://arxiv.org/abs/1812.08934


Document Understanding

The basis of the talk from Henry and Vivek was: where is the text, what is the text, and what does the text mean? They didn't offer any papers or insights, but demonstrated the feature and how it's used within the Workday process.

They compared MXNet (with consulting help from Amazon) against Tensorflow and felt that the former was faster to train and more accurate.

Hiring

Salesforce was hiring. The representative was from the infrastructure group, and they were hiring software and data engineers with some work experience. They are a Java/Scala shop.

Facebook was hiring all positions in all locations with various amounts of experience, pretty much anywhere and anything.

Workday is another Java/Scala shop. They do hire Python people for data science. Previously they would translate Python models into Java/Scala, but that isn't scalable from a product point of view, so now they just containerize the Python model.

GoPro is also hiring, but as with Salesforce, the representative was from the infrastructure/platform team, so they were looking primarily for software and data engineers.

Overview of the Convolution Neural Networks course from deeplearning.ai


I just finished the fourth course of the deeplearning.ai series, and it was immensely enjoyable. I have to admit that, with the advent of Hinton's capsule networks, the motivation to start this set on Convolutional Neural Networks was a little harder to find than for the previous three. Hinton and other bloggers have already outlined shortcomings of CNNs, and the thought at the back of my mind was whether it was worth spending the time learning something that may become obsolete.

Nevertheless it was worth it, and as I learned from attending NIPS 2017 last week and through a Deep Learning Study Group, there is still much work to be done with capsule networks. More on that later.

As per the previous three deeplearning.ai courses, this course had the following characteristics:

  • Clear and concise
    • Andrew Ng went over the concepts meticulously
    • The exercises were clearly documented and easy to follow
    • Grading went without a hitch except for one instance (see Caveats below)
  • Touched on a wide range of concepts and use cases
    • CNNs, Residual and Inception networks
    • Object detection, neural style transfer, and face recognition
  • Afforded lots of opportunities for further study
    • Many ways to practice Keras and Tensorflow
    • You could easily complete the exercises and leave it at that, but then you would be shortchanging yourself. There are so many other avenues to explore based on the code written that you owe it to yourself to put in some additional effort and discover something new.
  • There are some caveats
    • As in previous courses, the exercises were structured well and documented extensively. They were, however, much more challenging than those in the previous three courses.
    • Grading for the last exercise of the fourth week, dealing with triplet loss, is still flawed as of this writing. There is a workaround if you look in the discussion forums.
    • There is a flaw when trying to use the model in the Week 2 programming assignment to perform your own predictions.

The course encompasses four weeks, each culminating in a quiz and one or two programming assignments. An understanding of Python, Keras, and Tensorflow would be helpful but is not necessary, as there are tutorials in the previous courses. Following is a synopsis of each week's content.

Week 1

Andrew Ng first starts with the motivation for CNNs and what makes them powerful. The initial lectures on edge detection take a step-by-step approach to showing how different filters perform different functions by detecting different basic patterns. In later lectures, you'll see how filters in deeper layers detect more complex shapes (Week 4, What are deep ConvNets learning?).

The middle set of lectures deals with the technical guts of CNNs, namely filter size, stride, padding, and pooling, culminating in an example of a single-layer CNN. He finally ends with a lecture on the advantages of CNNs over fully connected networks, namely:

    • Fewer parameters to train
    • Parameter sharing – the same filters can be used in different areas of the image
    • Sparsity of connections – cells in resultant layers depend only on a small subset of the previous layer, hence less prone to overfitting
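The parameter savings are easy to make concrete. For a 28×28 grayscale input, one dense layer to a same-sized output needs millions of weights, while a conv layer's count depends only on filter size and channel counts (illustrative numbers, not from the course):

```python
# Parameter counts: fully connected vs convolutional layer on a 28x28 image.
h, w, c_in, c_out, k = 28, 28, 1, 8, 3

# Dense: every input pixel connects to every output cell, plus biases.
dense_params = (h * w * c_in) * (h * w * c_out) + (h * w * c_out)
# Conv: one k x k filter per output channel, shared across all positions.
conv_params = (k * k * c_in) * c_out + c_out

print(dense_params)  # 4923520
print(conv_params)   # 80
```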


Week 2

This week's lectures deal with case studies of different architectures. They delve into the history of CNNs and their evolution, which helps show how different architectures influence results and ultimately gives you better intuition for building better CNNs yourself.

Residual networks help train deeper networks and mitigate the vanishing/exploding gradient problem by having the output of a layer skip the next layer (a skip connection) and feed into the one after that. This helps with performance and stability of the parameters during training. One advantage of the skip connection is that the network learns the identity function more easily than one without a skip connection, and hence the extra layers don't hurt performance as much.
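A skip connection is just an addition of the block's input to its output before the final activation. A minimal Keras sketch of a residual block with an identity shortcut (sizes are illustrative, not from the course exercises):

```python
# Minimal residual block: output = relu(F(x) + x), where F is two conv layers.
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                          # identity skip connection
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([y, shortcut])       # add the input back in
    return layers.ReLU()(y)

inputs = tf.keras.Input(shape=(32, 32, 16))
outputs = residual_block(inputs, 16)
model = tf.keras.Model(inputs, outputs)   # same spatial/channel shape out
```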

The pooling layer is useful for reducing the height and width dimensions of a CNN, whereas the 1×1 convolution (aka Network in Network) is useful for reducing the channel dimension. This becomes important when talking about the Inception network.

The Inception network combines results from all the different filter sizes. This produces a layer with a very large channel dimension, and hence the 1×1 convolution comes into play to reduce it.
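The channel-reduction role of the 1×1 convolution is easy to see in code: here a 1×1 conv shrinks a 28×28×192 volume to 28×28×16 before an expensive 5×5 conv, the standard Inception bottleneck pattern (sizes are illustrative):

```python
# 1x1 convolution as a channel-dimension bottleneck (Inception-style).
import tensorflow as tf
from tensorflow.keras import layers

x = tf.keras.Input(shape=(28, 28, 192))
bottleneck = layers.Conv2D(16, 1, activation='relu')(x)  # 192 -> 16 channels
y = layers.Conv2D(32, 5, padding='same', activation='relu')(bottleneck)
model = tf.keras.Model(x, y)
print(model.output_shape)  # (None, 28, 28, 32)
```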


Week 3

This was probably the most interesting section because it dealt with object detection.  The main concepts here are:

    • Sliding windows of various sizes to determine whether an object is detected
    • Bounding boxes, determined by the network, to outline an object
    • Intersection over Union (IoU) calculations and non-max suppression to determine the best anchor box and de-duplicate boxes that detect the same object
    • Anchor boxes – boxes of various dimensions related to the objects you want to detect
    • YOLO (You Only Look Once) – an algorithm that performs object detection quickly. I confirmed with a Waymo engineer at the NIPS 2017 conference that the company has more advanced algorithms than YOLO, but all the other concepts are still relevant.
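Intersection over Union from the lectures is just a few lines of arithmetic — a sketch with boxes given as (x1, y1, x2, y2) corners:

```python
# Intersection over Union for two axis-aligned boxes (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Corners of the overlap rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)   # zero if boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```

Non-max suppression then keeps the highest-confidence box and discards any remaining box whose IoU with it exceeds a threshold.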


Week 4

Face verification vs face recognition

Face recognition problems have the issue of not having enough data to train a traditional CNN. In addition, you have what is called a one-shot learning problem: performing recognition based on a single image. So how is this problem solved?

For face recognition, you train a siamese network (two CNNs with the same parameters) to encode faces into an n-vector. Then you use a difference function d(img1, img2) that says yes or no based on the similarity of the two encodings. How is this network trained, and what is the objective function?

The objective function used to train a siamese network is called a triplet loss, which utilizes an anchor image, a positive (similar) image, and a negative (dissimilar) image.
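The triplet loss penalizes the anchor-positive distance for not being at least a margin alpha smaller than the anchor-negative distance — a NumPy sketch over precomputed embeddings (toy 2-d vectors, not real face encodings):

```python
# Triplet loss: max(||a - p||^2 - ||a - n||^2 + alpha, 0)
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    pos_dist = np.sum((anchor - positive) ** 2)   # want this small
    neg_dist = np.sum((anchor - negative) ** 2)   # want this large
    return max(pos_dist - neg_dist + alpha, 0.0)

a = np.array([0.0, 1.0])
p = np.array([0.1, 0.9])   # similar to the anchor
n = np.array([1.0, 0.0])   # dissimilar to the anchor
print(triplet_loss(a, p, n))  # 0.0: the negative is already far enough away
```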


Neural Style Transfer

  • Calculate similarity scores between activations within a certain layer as a way to capture style. You can also extend this to activations between layers.
  • Apply the similarity scores to the generated image to transfer the style.
  • Instead of updating weights, the algorithm updates the pixels of the generated image.
  • Each iteration produces a better rendering of the style over the new content than the previous one.
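Those "similarity scores between activations" are the Gram matrix of a layer's feature maps — a NumPy sketch of the style representation (shapes are illustrative):

```python
# Gram matrix: channel-by-channel correlations of one layer's activations,
# used as the "style" representation in neural style transfer.
import numpy as np

def gram_matrix(features):
    # features: (height, width, channels) activations from one layer
    h, w, c = features.shape
    flat = features.reshape(h * w, c)   # each column is one channel
    return flat.T @ flat                # (channels, channels) similarities

acts = np.random.rand(4, 4, 3)
g = gram_matrix(acts)
print(g.shape)  # (3, 3): entry (i, j) correlates channel i with channel j
```

The style loss is then the difference between the Gram matrices of the style image and the generated image, summed over the chosen layers.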

Deeplearning.ai's course on CNNs is a good overview of the concepts and use cases around Convolutional Neural Networks. The explanations were clear and concise, and except for a grading hiccup in one of the programming exercises, the quizzes and assignments definitely helped reinforce the ideas in the lessons. I'm definitely looking forward to taking the fifth installment of the series – Sequence Models – which starts Dec 18th.

Deep Learning – Playing with Neural Style using Torch and Tensorflow

Deep Learning is the hot topic in artificial intelligence circles right now, and with AlphaGo's Go victories and other deep learning advancements, a lot of attention has focused on platforms that make deep learning accessible. Two of those platforms are Torch and Tensorflow. I spent a weekend trying them out; here are some preliminary thoughts.

My point of comparison was the Neural Style project implemented in both platforms by Justin Johnson and Anish Athalye. Neural Style is a deep learning implementation that tries to derive artistic styles from pictures and applies them to a candidate image. The result is a mashup of the original picture in the style (or styles) set at input time.

The first thing to notice is the complexity of the setup. Tensorflow is easier, as there aren't as many components and steps involved. With Torch there are quite a few moving parts, and even though there are scripts that allow one-step installation, I can see how this can become problematic as libraries and models get updated.

For instance, with Tensorflow I could get the basic Neural Style command running right after installation, but with Torch I encountered errors related to missing libraries or incompatibilities. I eventually opted for a more complex command line that allowed me to bypass those issues. More on this in a later post.

However, one thing Torch shines at is execution time. The Torch implementation of Neural Style ran orders of magnitude faster than Tensorflow. In less than an hour (on a MacBook Pro) it went through a thousand iterations, whereas Tensorflow took more than two days.

Case in point: the following examples. The first is a 100-iteration run on Tensorflow that took about half a day. As you can see, you can discern the outlines of the candidate image in the result, but it's far from done.

Original image:

[image: img_9619]

Tensorflow after 100 iterations, which took about six hours.

[image: mix_100]

With Torch on the other hand, I ran the image through several styles in less time than it took Tensorflow to render the previous example.

Same image using Torch after 1000 iterations, using the style of Munch's The Scream:

[images: scream_img_9619, the_scream]

Picasso

[image: face_img_96192-style1]

And of course, Van Gogh:

[images: vg_img_9619, starry_night]