Installing CUDA 8.0, Anaconda 4.4.0, and TensorFlow 1.2.1 on Ubuntu 16.04

2020-06-16 17:02:14

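CUDA

If your machine has an Nvidia card, consider installing CUDA so that you can use GPU acceleration later. I previously wrote about installing CUDA 7.5 on Ubuntu 14.04 (http://www.linuxidc.com/Linux/2016-11/136768.htm). TensorFlow 1.2 appears to require CUDA Toolkit 8.0, and the process is much the same as before: update the driver if necessary, then download CUDA and cuDNN from the Nvidia website and install them. I won't repeat the details here. For most Nvidia cards, CUDA 8.0 needs driver version 367 or later, so if your current driver already meets that, the safer choice is to skip the driver update when installing CUDA. If you do update the driver and run into trouble, such as a login loop or a desktop that won't start, switch to a text console (Ctrl+Alt+F1), remove the existing driver, disable the open-source Nvidia driver nouveau, and then reinstall the driver: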

sudo apt-get remove --purge nvidia*
sudo apt-get update
sudo apt-get install dkms build-essential linux-headers-generic
sudo vim /etc/modprobe.d/blacklist.conf

Add the following to blacklist.conf:

blacklist nouveau

blacklist lbm-nouveau

options nouveau modeset=0

alias nouveau off

alias lbm-nouveau off
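
Before moving on, it can be worth confirming that all five entries actually made it into the file. The following is only a sketch: the function name missing_blacklist_entries is my own, and the file path is passed as an argument so that you can try it on a copy first.

```shell
# Print each expected nouveau entry that is missing from a modprobe config file.
# missing_blacklist_entries is a hypothetical helper name, not a system tool.
missing_blacklist_entries() {
    local conf_file="$1"
    local entry
    for entry in "blacklist nouveau" "blacklist lbm-nouveau" \
                 "options nouveau modeset=0" "alias nouveau off" "alias lbm-nouveau off"; do
        # -x matches whole lines only, -F treats the entry as a literal string
        grep -qxF -- "$entry" "$conf_file" || echo "$entry"
    done
}

# Example: missing_blacklist_entries /etc/modprobe.d/blacklist.conf
```

If nothing is printed, every entry is present.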

Save the file, then stop the display manager and install the driver from the graphics-drivers PPA:

sudo service lightdm stop
sudo add-apt-repository ppa:graphics-drivers/ppa && sudo apt-get update
sudo apt-get install nvidia-375

Reboot. If you still can't get into the graphical desktop, reinstall the whole Unity stack, then start the desktop environment with sudo service lightdm start.
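
Since the driver update is the risky step, it may help to check first whether the installed driver already meets the 367 minimum mentioned above. This is a minimal sketch: driver_meets_minimum is a name I made up, and it compares only the major version number, which I am assuming is sufficient here.

```shell
# Succeeds when the major part of a driver version string (e.g. "375.66")
# is at least the given minimum (367 by default, per the text above).
# driver_meets_minimum is a hypothetical helper name.
driver_meets_minimum() {
    local version="$1"
    local minimum="${2:-367}"
    local major="${version%%.*}"   # drop everything after the first dot
    [ "$major" -ge "$minimum" ]
}

# Query the installed driver, but only if nvidia-smi is available.
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_meets_minimum "$installed"; then
        echo "Driver $installed is new enough for CUDA 8.0"
    else
        echo "Driver $installed is older than 367; an update is needed"
    fi
fi
```

If the check passes, you can leave the driver alone and install only the CUDA toolkit.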

Anaconda

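The Anaconda distribution lets you create isolated Python development and runtime environments. Each environment has its own independent Python runtime, so environments don't affect one another and you no longer need to worry about installing A and breaking B. The latest version of Anaconda at the time of writing is 4.4.0; the download link is https://www.continuum.io/downloads. Installation is easy. Taking Anaconda for Python 2.7 as an example: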

bash ~/Downloads/Anaconda2-4.4.0-Linux-x86_64.sh

You can then create environments. For example, to create two environments, one for Python 2.7 and one for Python 3.5:

conda create --name py35 python=3.5
conda create --name py27 python=2.7

Here py27 and py35 are the environment names. Afterwards, enter the corresponding environment with:

source activate <env name>

To leave the environment, use:

source deactivate

List the packages in the current environment:

conda list

To delete an environment:

conda remove --name <env name> --all

List the existing environments:

conda env list

List the packages installed in a given environment:

conda list --name=<env name>

For more usage, see https://conda.io/docs/using/envs.html
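
If you script the environment setup, you may want to skip creation when an environment already exists. The sketch below reads the output of conda env list on standard input and checks for a name. The helper env_listed is hypothetical, and I am assuming the listing puts the environment name in the first column and prefixes header lines with #.

```shell
# Reads `conda env list` output on stdin and succeeds if an environment with
# the given name appears in the first column. env_listed is a hypothetical helper.
env_listed() {
    local name="$1"
    awk -v name="$name" '$1 !~ /^#/ && $1 == name { found = 1 } END { exit !found }'
}

# Example (left commented out so that nothing is created by accident):
#   conda env list | env_listed py35 || conda create --name py35 python=3.5
```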

Once inside an environment, you can install packages with either conda install or the traditional pip install. When the network is slow, downloads may time out:

ReadTimeoutError: HTTPSConnectionPool(host='pypi.python.org', port=443): Read timed out.

If the cause really is just a slow connection, you can work around it by lengthening the timeout:

pip --default-timeout=10000 install -U <package name>
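
If a long timeout alone is not enough on a flaky connection, another option is to rerun the command a few times. This is my own addition rather than part of the original procedure. The retry function below is a generic sketch, and its name and arguments are assumptions.

```shell
# Run a command up to max_attempts times, sleeping delay seconds between tries.
# retry is a hypothetical helper, not part of pip or conda.
retry() {
    local max_attempts="$1" delay="$2"
    shift 2
    local attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example: retry 3 10 pip --default-timeout=10000 install -U <package name>
```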

Also, if you hit the following error during use:

ValueError: failed to parse CPython

the likely cause is a conflict with the local Python environment in your user directory. One fix is to open anaconda2/lib/python2.7/site.py and set ENABLE_USER_SITE = False.

TensorFlow

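The latest release at the time of writing is 1.2.1 (1.3 is still at the RC stage), so we'll use v1.2.1 as the example. The most convenient option is the prebuilt package: https://www.tensorflow.org/install/install_linux. If you have installed Anaconda, enter the environment first (assuming you have already created a Python 2.7 environment named py27):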

source activate py27

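If you haven't installed Anaconda, skip the step above. Next, install TensorFlow. Choose the binary download link from https://www.tensorflow.org/install/install_linux according to your Python version and whether you have a GPU. For example, with Python 3.5 and a GPU (in which case the active environment should be a Python 3.5 one such as py35, not py27):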

pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.2.1-cp35-cp35m-linux_x86_64.whl
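
To make the link selection less error-prone, the URL can be assembled from the Python version and the device type. Treat this strictly as a sketch: only the Python 3.5 GPU URL above comes from this article. The other Python tags (cp27-none, cp34-cp34m, cp36-cp36m) and the cpu path are my assumptions about the naming pattern and should be checked against the install page. The function name tf_wheel_url is also made up.

```shell
# Build a TensorFlow wheel URL from device type and Python version.
# ASSUMPTION: only the gpu / 3.5 combination is confirmed by the article;
# every other combination follows an assumed naming pattern.
tf_wheel_url() {
    local device="$1" py_version="$2" tf_version="${3:-1.2.1}"
    local package tag
    case "$device" in
        gpu) package="tensorflow_gpu" ;;
        cpu) package="tensorflow" ;;
        *) echo "unknown device: $device" >&2; return 1 ;;
    esac
    case "$py_version" in
        2.7) tag="cp27-none" ;;
        3.4) tag="cp34-cp34m" ;;
        3.5) tag="cp35-cp35m" ;;
        3.6) tag="cp36-cp36m" ;;
        *) echo "unsupported Python version: $py_version" >&2; return 1 ;;
    esac
    echo "https://storage.googleapis.com/tensorflow/linux/${device}/${package}-${tf_version}-${tag}-linux_x86_64.whl"
}

# Example: pip install --ignore-installed --upgrade "$(tf_wheel_url gpu 3.5)"
```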

Then do a quick check that it loads correctly:

python -c "import tensorflow as tf;print(tf.__version__);"

If it prints the version you just installed, you are more or less done.

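However, the official prebuilt package is not compiled with the x86 SIMD instruction optimizations (SSE/AVX/FMA), so during training it prints messages like the following: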

2017-08-12 20:10:39.973508: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973536: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973541: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973545: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-12 20:10:39.973549: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.

One head-in-the-sand fix is to raise the log level so that the warnings are out of sight and out of mind:

export TF_CPP_MIN_LOG_LEVEL=2

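But this also filters out some other log messages. Moreover, the x86 SIMD instructions can in some cases give a severalfold performance improvement, so it is worth considering building a version with these optimizations yourself. First download the source code, then check out the branch for the version you want (e.g. r1.2):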

git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.2

Following https://stackoverflow.com/questions/41293077/how-to-compile-tensorflow-with-sse4-2-and-avx-instructions, once the build tool bazel is installed (https://docs.bazel.build/versions/master/install-ubuntu.html), you can build with the following command:

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
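
The command above enables a fixed set of instruction sets. If you are unsure which ones your CPU supports, one approach is to derive the --copt flags from the flags line in /proc/cpuinfo. This is a sketch under the assumption that the kernel reports them as sse4_1, sse4_2, avx, avx2, and fma. The function name copts_for_cpu_flags is mine.

```shell
# Turn a space-separated list of CPU flags into bazel --copt options,
# keeping only the SIMD extensions named in the warnings above.
# copts_for_cpu_flags is a hypothetical helper name.
copts_for_cpu_flags() {
    local cpu_flags=" $1 "   # pad with spaces so whole words can be matched
    local pair flag gcc_opt result=""
    for pair in "sse4_1:-msse4.1" "sse4_2:-msse4.2" "avx:-mavx" "avx2:-mavx2" "fma:-mfma"; do
        flag="${pair%%:*}"
        gcc_opt="${pair#*:}"
        case "$cpu_flags" in
            *" $flag "*) result="$result --copt=$gcc_opt" ;;
        esac
    done
    echo "${result# }"
}

# Print the options for this machine, if /proc/cpuinfo is readable.
if [ -r /proc/cpuinfo ]; then
    copts_for_cpu_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
fi
```

The printed options could then replace the hand-written --copt list in the bazel command.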

If you hit the following error while building:

Loading:
Loading: 0 packages loaded
ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):

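this is a known issue (https://github.com/tensorflow/tensorflow/pull/11949), and the fix is in https://github.com/tensorflow/tensorflow/pull/11949/commits/c5d311eaf8cc6471643b5c43810a1feb19662d6c. It does not seem to have been picked into the release branch yet, so cherry-pick it by hand and that should resolve it. After the build finishes, use the following command to generate the whl package in a directory of your choice (e.g. ~/tmp/), then install it the same way as before.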

bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tmp/
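
The generated file name depends on the version and Python ABI, so rather than typing it out you can pick the most recent wheel in the output directory. This is a small sketch. The helper name newest_wheel is an assumption, and it relies on the file name starting with tensorflow.

```shell
# Print the most recently modified tensorflow wheel in a directory.
# newest_wheel is a hypothetical helper name.
newest_wheel() {
    local dir="$1"
    # ls -t sorts newest first; errors are silenced when nothing matches
    ls -t "$dir"/tensorflow*.whl 2>/dev/null | head -n 1
}

# Example: pip install --ignore-installed --upgrade "$(newest_wheel ~/tmp)"
```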

If the following error appears at run time:

ImportError: Traceback (most recent call last):
File "tensorflow/python/pywrap_tensorflow.py", line 41, in <module>
from tensorflow.python.pywrap_tensorflow_internal import *
ImportError: No module named pywrap_tensorflow_internal

According to https://stackoverflow.com/questions/35953210/error-running-basic-tensorflow-example, you just need to cd to a directory outside the TensorFlow source tree.
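
As I understand it, the error occurs because Python searches the current directory first, so the source tree's tensorflow/ directory gets imported instead of the installed package. A quick guard like the one below can catch this before you start Python. The function name shadows_installed_tensorflow is made up, and checking for a tensorflow/python subdirectory is only a heuristic.

```shell
# Succeeds if the given directory (default: current directory) contains a
# tensorflow/python subdirectory that would shadow the installed package.
# shadows_installed_tensorflow is a hypothetical helper name.
shadows_installed_tensorflow() {
    local dir="${1:-$PWD}"
    [ -d "$dir/tensorflow/python" ]
}

if shadows_installed_tensorflow "$PWD"; then
    echo "warning: ./tensorflow would shadow the installed package; cd elsewhere first" >&2
fi
```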

Permanent link to this article: http://www.linuxidc.com/Linux/2017-11/148444.htm
