AMD deep learning desktop build

Work notes from building an AMD-based desktop for deep learning projects, running Ubuntu 18.04 and CUDA with TensorFlow and PyTorch

Monday Feb 17th (President’s Day), started at 10:30am

  • Add the CPU
  • Add memory
    • Ripjaws memory won’t fit under the Noctua fan, so had to rip open the covers
  • Add M.2
  • Screw motherboard to case
  • Add graphics card
  • Hook up power
  • Hook up connections
  • Power on test

    • The power supply just went into standby. After about 20 minutes of debugging, it turned out the on/off switch connections weren’t secure.

  • Hooked up HD
  • Power on

Finished at 3:17pm (including lunch)

Saturday Feb 22nd (9:09pm)

Finished around midnight

Final notes

  • 500GB of the 1TB Samsung drive formatted as ext4 – /dev/nvme0n1p5
  • Dual boot into Ubuntu first
  • Formatted an 8GB microSD card as the bootable Ubuntu installation disk
  • Installs
    • Nvidia driver – 430
    • CUDA 10.1
    • Anaconda
    • Python 3.7
    • TF 2.1.0 – conda activate tf-gpu
    • PyTorch – conda activate pytorch
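
A quick check that the PyTorch environment can see the GPU, as a minimal sketch. It assumes it is run inside the pytorch conda environment; the helper name cuda_status is made up for illustration, and it returns None when PyTorch is not importable.

```python
def cuda_status():
    """Report whether PyTorch can see a CUDA GPU; None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    # Only ask for a device name when a CUDA device is actually visible.
    name = torch.cuda.get_device_name(0) if available else None
    return {"torch": torch.__version__, "cuda_available": available, "device": name}


status = cuda_status()
print(status)
```

If cuda_available comes back False, the Nvidia driver install is the first thing to recheck.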

References

  • Ubuntu install
  • conda install tensorflow-gpu – this also installs the CUDA toolkit and cuDNN into the conda environment.
    • The following NEW packages will be INSTALLED:

      _libgcc_mutex: 0.1-main
      _tflow_select: 2.1.0-gpu
      absl-py: 0.9.0-py37_0
      asn1crypto: 1.3.0-py37_0
      astor: 0.8.0-py37_0
      blas: 1.0-mkl
      blinker: 1.4-py37_0
      c-ares: 1.15.0-h7b6447c_1001
      ca-certificates: 2020.1.1-0
      cachetools: 3.1.1-py_0
      certifi: 2019.11.28-py37_0
      cffi: 1.14.0-py37h2e261b9_0
      chardet: 3.0.4-py37_1003
      click: 7.0-py_0
      cryptography: 2.8-py37h1ba5d50_0
      cudatoolkit: 10.1.243-h6bb024c_0
      cudnn: 7.6.5-cuda10.1_0
      cupti: 10.1.168-0
      gast: 0.2.2-py37_0
      google-auth: 1.11.2-py_0
      google-auth-oauthlib: 0.4.1-py_2
      google-pasta: 0.1.8-py_0
      grpcio: 1.27.2-py37hf8bcb03_0
      h5py: 2.10.0-py37h7918eee_0
      hdf5: 1.10.4-hb1b8bf9_0
      idna: 2.8-py37_0
      intel-openmp: 2020.0-166
      keras-applications: 1.0.8-py_0
      keras-preprocessing: 1.1.0-py_1
      ld_impl_linux-64: 2.33.1-h53a641e_7
      libedit: 3.1.20181209-hc058e9b_0
      libffi: 3.2.1-hd88cf55_4
      libgcc-ng: 9.1.0-hdf63c60_0
      libgfortran-ng: 7.3.0-hdf63c60_0
      libprotobuf: 3.11.4-hd408876_0
      libstdcxx-ng: 9.1.0-hdf63c60_0
      markdown: 3.1.1-py37_0
      mkl: 2020.0-166
      mkl-service: 2.3.0-py37he904b0f_0
      mkl_fft: 1.0.15-py37ha843d7b_0
      mkl_random: 1.1.0-py37hd6b4f25_0
      ncurses: 6.1-he6710b0_1
      numpy: 1.18.1-py37h4f9e942_0
      numpy-base: 1.18.1-py37hde5b4d6_1
      oauthlib: 3.1.0-py_0
      openssl: 1.1.1d-h7b6447c_4
      opt_einsum: 3.1.0-py_0
      pip: 20.0.2-py37_1
      protobuf: 3.11.4-py37he6710b0_0
      pyasn1: 0.4.8-py_0
      pyasn1-modules: 0.2.7-py_0
      pycparser: 2.19-py_0
      pyjwt: 1.7.1-py37_0
      pyopenssl: 19.1.0-py37_0
      pysocks: 1.7.1-py37_0
      python: 3.7.6-h0371630_2
      readline: 7.0-h7b6447c_5
      requests: 2.22.0-py37_1
      requests-oauthlib: 1.3.0-py_0
      rsa: 4.0-py_0
      scipy: 1.4.1-py37h0b6359f_0
      setuptools: 45.2.0-py37_0
      six: 1.14.0-py37_0
      sqlite: 3.31.1-h7b6447c_0
      tensorboard: 2.1.0-py3_0
      tensorflow: 2.1.0-gpu_py37h7a4bb67_0
      tensorflow-base: 2.1.0-gpu_py37h6c5654b_0
      tensorflow-estimator: 2.1.0-pyhd54b08b_0
      tensorflow-gpu: 2.1.0-h0d30ee6_0
      termcolor: 1.1.0-py37_1
      tk: 8.6.8-hbc83047_0
      urllib3: 1.25.8-py37_0
      werkzeug: 1.0.0-py_0
      wheel: 0.34.2-py37_0
      wrapt: 1.11.2-py37h7b6447c_0
      xz: 5.2.4-h14c3975_4
      zlib: 1.2.11-h7b6447c_3

  • Test the TensorFlow and python3 install
    • (tf-gpu) dan@dan-X399-AORUS-PRO:~/dev/tensorflow-cnn-tutorial$ python3
      Python 3.7.6 (default, Jan 8 2020, 19:59:22)
      [GCC 7.3.0] :: Anaconda, Inc. on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import tensorflow as tf
      >>> tf.__version__
      '2.1.0'
      >>> tf.test.is_gpu_available()
      WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
      Instructions for updating:
      Use `tf.config.list_physical_devices('GPU')` instead.
      2020-02-23 01:48:20.149384: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
      2020-02-23 01:48:20.175975: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3493005000 Hz
      2020-02-23 01:48:20.176961: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cc220e6780 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:48:20.176985: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
      2020-02-23 01:48:20.178028: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
      2020-02-23 01:48:20.351783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
      pciBusID: 0000:41:00.0 name: GeForce RTX 2060 computeCapability: 7.5
      coreClock: 1.71GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
      2020-02-23 01:48:20.352047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:48:20.353750: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      2020-02-23 01:48:20.355686: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
      2020-02-23 01:48:20.355959: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
      2020-02-23 01:48:20.357749: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
      2020-02-23 01:48:20.358979: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
      2020-02-23 01:48:20.363192: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
      2020-02-23 01:48:20.364707: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
      2020-02-23 01:48:20.364766: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:48:20.665490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
      2020-02-23 01:48:20.665531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
      2020-02-23 01:48:20.665540: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
      2020-02-23 01:48:20.667506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/device:GPU:0 with 5220 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:41:00.0, compute capability: 7.5)
      2020-02-23 01:48:20.669697: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cc24b5ece0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:48:20.669713: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
      True
      >>>
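
The deprecation warning in the log above points to tf.config.list_physical_devices as the replacement for is_gpu_available. A minimal sketch of the same check with that call, assuming it is run inside the tf-gpu environment; the helper name list_gpus is made up for illustration, and it returns an empty list when TensorFlow is not importable.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    # Non-deprecated replacement for tf.test.is_gpu_available() in TF 2.1+.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = list_gpus()
print(f"{len(gpus)} GPU(s) visible:", gpus)
```

Unlike is_gpu_available, this returns a list of devices rather than a boolean, so an empty list means no GPU was found.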

  • Run MNIST test using this tutorial https://github.com/dragen1860/TensorFlow-2.x-Tutorials
    • The Cole Murray tutorial used previously wasn’t compatible with TF 2.1 used here https://github.com/ColeMurray/tensorflow-cnn-tutorial
    • (tf-gpu) dan@dan-X399-AORUS-PRO:~/dev/TensorFlow-2.x-Tutorials/03-Play-with-MNIST$ python3 main.py
      Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
      11493376/11490434 [==============================] - 1s 0us/step
      datasets: (60000, 28, 28) (60000,) 0 255
      2020-02-23 01:52:33.707955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
      2020-02-23 01:52:33.720688: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
      pciBusID: 0000:41:00.0 name: GeForce RTX 2060 computeCapability: 7.5
      coreClock: 1.71GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
      2020-02-23 01:52:33.720807: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:52:33.721751: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      2020-02-23 01:52:33.722790: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
      2020-02-23 01:52:33.722949: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
      2020-02-23 01:52:33.723951: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
      2020-02-23 01:52:33.724572: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
      2020-02-23 01:52:33.726953: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
      2020-02-23 01:52:33.727906: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
      2020-02-23 01:52:33.728194: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
      2020-02-23 01:52:33.751744: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3493005000 Hz
      2020-02-23 01:52:33.752694: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620d9fcbb40 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:52:33.752717: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
      2020-02-23 01:52:33.753560: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
      pciBusID: 0000:41:00.0 name: GeForce RTX 2060 computeCapability: 7.5
      coreClock: 1.71GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
      2020-02-23 01:52:33.753608: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:52:33.753625: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      2020-02-23 01:52:33.753639: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
      2020-02-23 01:52:33.753654: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
      2020-02-23 01:52:33.753668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
      2020-02-23 01:52:33.753682: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
      2020-02-23 01:52:33.753697: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
      2020-02-23 01:52:33.755060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
      2020-02-23 01:52:33.755099: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
      2020-02-23 01:52:33.834700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
      2020-02-23 01:52:33.834733: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
      2020-02-23 01:52:33.834740: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
      2020-02-23 01:52:33.836021: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5113 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:41:00.0, compute capability: 7.5)
      2020-02-23 01:52:33.837739: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5620dd752280 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
      2020-02-23 01:52:33.837752: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2060, Compute Capability 7.5
      Model: "sequential"
      _________________________________________________________________
      Layer (type)                 Output Shape              Param #
      =================================================================
      dense (Dense)                multiple                  200960
      _________________________________________________________________
      dense_1 (Dense)              multiple                  65792
      _________________________________________________________________
      dense_2 (Dense)              multiple                  65792
      _________________________________________________________________
      dense_3 (Dense)              multiple                  2570
      =================================================================
      Total params: 335,114
      Trainable params: 335,114
      Non-trainable params: 0
      _________________________________________________________________
      2020-02-23 01:52:34.579333: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
      0 loss: 1.2610700130462646 acc: 0.15625
      200 loss: 0.43644407391548157 acc: 0.6821875
      400 loss: 0.35265296697616577 acc: 0.8464062
      600 loss: 0.30810198187828064 acc: 0.870625
      800 loss: 0.2214876413345337 acc: 0.90234375
      1000 loss: 0.29607510566711426 acc: 0.89453125
      1200 loss: 0.2684360444545746 acc: 0.9134375
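
The parameter counts in the model summary above can be checked by hand: a Dense layer has inputs × units weights plus one bias per unit. A small sketch of that arithmetic follows. The layer widths (784, 256, 256, 256, 10) are an assumption inferred from the counts and the 28 × 28 input images, since the summary only prints "multiple" for the output shapes.

```python
def dense_params(inputs, units):
    """Parameters in a fully connected layer: one weight per input/unit pair plus one bias per unit."""
    return inputs * units + units


# Assumed widths: flattened 28x28 image -> three hidden layers of 256 -> 10 digit classes.
layer_sizes = [28 * 28, 256, 256, 256, 10]

counts = [dense_params(i, o) for i, o in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [200960, 65792, 65792, 2570]
print(sum(counts))  # 335114
```

These match the per-layer and total figures that the tutorial script printed.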