Working through the examples in here
01-TF2.0-Overview
Issue with conv_train.py
If you have an RTX GPU card, running "python conv_train.py" fails. The error looks like:
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
The fix is to add the following config code near the top of the script, before TensorFlow touches the GPU:
gpu_devices = tf.config.experimental.list_physical_devices('GPU')
for device in gpu_devices:
    tf.config.experimental.set_memory_growth(device, True)
This fix recurs throughout the tutorials, so any section marked [1] below needs it applied.
Ref: https://github.com/tensorflow/tensorflow/issues/25446
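Since the same three lines get pasted into many scripts, one option is to wrap them in a small helper. This is only a sketch: the function name enable_gpu_memory_growth is my own, not something from the tutorials, and the RuntimeError handling reflects TensorFlow's documented behaviour of refusing to change memory growth once the GPU runtime has been initialized.

```python
def enable_gpu_memory_growth():
    """Turn on memory growth for every visible GPU.

    Returns the number of GPUs that were configured successfully.
    The function name is hypothetical; the tf.config calls are the
    same ones used in the fix above.
    """
    # Imported inside the function so this module can be imported
    # without loading TensorFlow first.
    import tensorflow as tf

    gpu_devices = tf.config.experimental.list_physical_devices("GPU")
    configured = 0
    for device in gpu_devices:
        try:
            tf.config.experimental.set_memory_growth(device, True)
            configured += 1
        except RuntimeError as err:
            # Raised when the GPU runtime is already initialized,
            # i.e. the helper was called too late in the script.
            print(f"Could not set memory growth on {device}: {err}")
    return configured
```

Call enable_gpu_memory_growth() right after the imports in each [1] script, before any model or tensor is created.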
06-CIFAR-VGG [1]
This one will take a while since it goes through 250 epochs. The final output is below.
248 0 loss: 1.2293422457787528e-07 acc: 1.0
248 40 loss: 1.699651193121099e-07 acc: 1.0
248 80 loss: 9.825426872112075e-08 acc: 1.0
248 120 loss: 1.9557756303356655e-08 acc: 1.0
248 160 loss: 9.592597649543677e-08 acc: 1.0
test acc: 0.8035
249 0 loss: 8.940672557855578e-08 acc: 1.0
249 40 loss: 1.9976806697741267e-07 acc: 1.0
249 80 loss: 1.3271312582219252e-07 acc: 1.0
249 120 loss: 1.5133961994706624e-07 acc: 1.0
249 160 loss: 1.7275985442211095e-07 acc: 1.0
test acc: 0.8034
07-Inception [1]
Another one that takes a while to train
99 100 loss: 2.6542637e-08
99 110 loss: 3.2596286e-09
99 120 loss: 6.007011e-08
99 130 loss: 1.7462094e-07
99 140 loss: 8.5215746e-08
99 150 loss: 5.541332e-08
99 160 loss: 0.0
99 170 loss: 1.2712442e-07
99 180 loss: 1.2107097e-07
99 190 loss: 2.328306e-09
99 200 loss: 1.8579665e-07
99 210 loss: 1.16880365e-07
99 220 loss: 6.8451996e-08
99 230 loss: 0.0
99 evaluation acc: 0.995
08-ResNet [1]
09-RNN-Sentiment-Analysis [1]
25000/25000 [==============================] - 19s 765us/sample - loss: 0.0100 - accuracy: 0.9966 - val_loss: 1.0064 - val_accuracy: 0.8208
Epoch 18/20
25000/25000 [==============================] - 19s 756us/sample - loss: 0.0082 - accuracy: 0.9976 - val_loss: 1.0868 - val_accuracy: 0.8055
Epoch 19/20
25000/25000 [==============================] - 19s 768us/sample - loss: 0.0123 - accuracy: 0.9959 - val_loss: 1.0246 - val_accuracy: 0.8152
Epoch 20/20
25000/25000 [==============================] - 19s 765us/sample - loss: 0.0056 - accuracy: 0.9983 - val_loss: 1.1529 - val_accuracy: 0.8017
25000/25000 [==============================] - 4s 175us/sample - loss: 1.1529 - accuracy: 0.8017
Final test loss and accuracy : [1.1528555624079704, 0.80172]
10-ColorBot
- pip install matplotlib
- In utils.py, change
tf.string_split
to
tf.compat.v1.string_split
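If you would rather not edit utils.py by hand, the rename can be scripted. A minimal sketch, assuming the only thing to change is the bare tf.string_split name; patch_string_split and patch_file are hypothetical helper names of my own, and the regex deliberately skips names that are already prefixed (such as tf.compat.v1.string_split), so running it twice is harmless.

```python
import re
from pathlib import Path

# Match `tf.string_split` only when it is not part of a longer dotted
# name (e.g. `mytf.string_split`) or a longer identifier
# (e.g. `tf.string_split_v2`).
_OLD_CALL = re.compile(r"(?<![\w.])tf\.string_split\b")


def patch_string_split(source: str) -> str:
    """Return source text with tf.string_split renamed to the compat.v1 version."""
    return _OLD_CALL.sub("tf.compat.v1.string_split", source)


def patch_file(path: str) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    file_path = Path(path)
    original = file_path.read_text()
    patched = patch_string_split(original)
    if patched != original:
        file_path.write_text(patched)
        return True
    return False
```

For example, patch_file("utils.py") run from inside the 10-ColorBot directory would apply the change described above.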
Output:
eval/loss: 0.060911
31 0 loss: 0.018744416534900665
32 0 loss: 0.020183812826871872
33 0 loss: 0.018964599817991257
34 0 loss: 0.016659250482916832
35 0 loss: 0.014833886176347733
36 0 loss: 0.011656707152724266
37 0 loss: 0.017404720187187195
38 0 loss: 0.018422933295369148
39 0 loss: 0.013807449489831924
Colorbot is ready to generate colors!
Give me a color name (or press enter to exit): blue
rgb: (0, 39, 197)
Give me a color name (or press enter to exit):
11-AE
Need to pip install pillow
It'll run through the epochs and give you a plot of its best estimate.
New images saved !
Epoch[9/55], Step [50/600], Reconst Loss: 66.8226
Epoch[9/55], Step [100/600], Reconst Loss: 63.5340
Epoch[9/55], Step [150/600], Reconst Loss: 63.8510
Epoch[9/55], Step [200/600], Reconst Loss: 68.7778
Epoch[9/55], Step [250/600], Reconst Loss: 62.9948
Epoch[9/55], Step [300/600], Reconst Loss: 67.2257
Epoch[9/55], Step [350/600], Reconst Loss: 65.7579
Epoch[9/55], Step [400/600], Reconst Loss: 66.6951
Epoch[9/55], Step [450/600], Reconst Loss: 69.7199
Epoch[9/55], Step [500/600], Reconst Loss: 69.1741
Epoch[9/55], Step [550/600], Reconst Loss: 63.6402
Epoch[9/55], Step [600/600], Reconst Loss: 65.3391

13-DCGAN [1]
Error: Cannot import name ‘toimage’ from ‘scipy.misc’
Solution:
Use PIL Image object instead:
from PIL import Image
…
# replace toimage call
im = Image.fromarray(final_image)
im.save(image_path)
…
https://github.com/dragen1860/TensorFlow-2.x-Tutorials/issues/40
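One thing to watch with this swap: Image.fromarray does not rescale values, while the old scipy.misc.toimage byte-scaled its input to 0..255 by default. If final_image is a float array (GAN generators often output values in roughly [-1, 1]), it may need converting to uint8 first. Below is a rough sketch of that conversion, assuming a simple min-max rescale is acceptable. to_uint8 and save_image are hypothetical helper names, and I have not checked what dtype the tutorial's final_image actually has.

```python
import numpy as np


def to_uint8(arr):
    """Linearly rescale an array to the 0..255 range and cast to uint8.

    Arrays that are already uint8 are returned unchanged.
    """
    arr = np.asarray(arr)
    if arr.dtype == np.uint8:
        return arr
    arr = arr.astype(np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # Constant image: avoid dividing by zero, return all black.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


def save_image(arr, image_path):
    """Save an array as an image file via Pillow (pip install pillow)."""
    from PIL import Image  # imported here so to_uint8 works without Pillow

    Image.fromarray(to_uint8(arr)).save(image_path)
```

With these helpers, the replacement for the toimage call would be save_image(final_image, image_path).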
The settings call for a lot of epochs (3000000), so this run will take a while
Update March 13th 2020: the run errored out with "corrupted double-linked list"
1566200 d loss: 0.34074077010154724 g loss: 4.772915840148926
1566300 d loss: 0.3889990448951721 g loss: 4.689064025878906
1566400 d loss: 0.33986732363700867 g loss: 6.0187201499938965
1566500 d loss: 0.3325975239276886 g loss: 5.30816650390625
1566600 d loss: 0.3435876667499542 g loss: 5.103038311004639
1566700 d loss: 0.329081267118454 g loss: 5.9022932052612305
1566800 d loss: 0.34372204542160034 g loss: 5.6434760093688965
1566900 d loss: 0.333001047372818 g loss: 5.227304458618164
1567000 d loss: 0.36896687746047974 g loss: 4.873156547546387
1567100 d loss: 0.3333652913570404 g loss: 5.359485626220703
corrupted double-linked list
(tfgpu) dan@dan-X399-AORUS-PRO:~/dev/TensorFlow-2.x-Tutorials/13-DCGAN$
14-Pixel2Pixel [1]
Number of epochs = 1000
Error when saving image
15-CycleGAN [1]
Number of epochs = 1000
Time taken for epoch 601 is 1.366504192352295 sec
Time taken for epoch 641 is 1.4057285785675049 sec
Time taken for epoch 681 is 1.444746971130371 sec
Time taken for epoch 721 is 1.429642677307129 sec
Time taken for epoch 761 is 1.423170804977417 sec
Time taken for epoch 801 is 1.3776824474334717 sec
Time taken for epoch 841 is 1.363966464996338 sec
Time taken for epoch 881 is 1.4198977947235107 sec
Time taken for epoch 921 is 1.3814923763275146 sec
Time taken for epoch 961 is 1.419663429260254 sec