Installing and Configuring the GPU Version of TensorFlow on Ubuntu 16.04

2020-06-16 17:27:15

Requirements

Ubuntu 16.04
Python 2.7
Flask
TensorFlow (GPU version)

Installing the NVIDIA driver

After running into one pitfall after another, I finally found a reliable method through Google. First, check the model of your NVIDIA graphics card:

sudo lshw -numeric -C display

This shows your graphics card information. Mine, for example, is product: GM107M [GeForce GTX 950M] [10DE:139A]. Then go to the NVIDIA driver search page and look up the driver version your card needs.
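If you want to pull the model and PCI IDs out of that line programmatically, for example in a setup script, a minimal Python 3 sketch could look like the following. The helper name parse_lshw_product is hypothetical. It assumes the line has the shape `product: CHIP [NAME] [VVVV:DDDD]`, which is what the author's `lshw -numeric` output shows.

```python
import re

def parse_lshw_product(line):
    """Split an lshw 'product:' line into (chip, name, vendor_id, device_id).

    Returns None if the line does not have the
    'product: CHIP [NAME] [VVVV:DDDD]' shape printed by `lshw -numeric`.
    """
    m = re.search(
        r"product:\s*(\S+)\s*\[([^\]]+)\]\s*\[([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\]",
        line,
    )
    if m is None:
        return None
    chip, name, vendor, device = m.groups()
    return chip, name, vendor.upper(), device.upper()

# The line from the author's machine, as quoted in the text.
sample = "product: GM107M [GeForce GTX 950M] [10DE:139A]"
print(parse_lshw_product(sample))
# ('GM107M', 'GeForce GTX 950M', '10DE', '139A'); 10DE is NVIDIA's PCI vendor ID
```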

Here is the driver version listed for my machine:

LINUX X64 (AMD64/EM64T) DISPLAY DRIVER

Version:    375.20
Release Date:   2016.11.18
Operating System:   Linux 64-bit
Language:   English (US)
File Size:  72.37 MB

The search results show that my driver version should be 375.20. To double-check, you can also list the drivers available to you with this command:

ubuntu-drivers devices

The result agrees with the driver version found by the search, and the recommended driver is also 375:

== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor   : NVIDIA Corporation
model    : GM107M [GeForce GTX 950M]
modalias : pci:v000010DEd0000139Asv000017AAsd0000380Bbc03sc02i00
driver   : nvidia-367 - third-party free
driver   : nvidia-375 - third-party free recommended
driver   : nvidia-364 - third-party free
driver   : nvidia-358 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin
driver   : nvidia-370 - third-party free

== cpu-microcode.py ==
driver   : intel-microcode - distro non-free
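The recommended package can also be picked out of that output automatically. The following Python 3 sketch does this with a hypothetical helper, recommended_driver. It assumes the `driver : PACKAGE - ... recommended` line format shown above and simply returns the first package marked "recommended".

```python
import re

# Output of `ubuntu-drivers devices`, abridged from the text above.
OUTPUT = """\
== /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
vendor   : NVIDIA Corporation
model    : GM107M [GeForce GTX 950M]
driver   : nvidia-367 - third-party free
driver   : nvidia-375 - third-party free recommended
driver   : nvidia-364 - third-party free
driver   : xserver-xorg-video-nouveau - distro free builtin
"""

def recommended_driver(text):
    """Return the package on the first 'driver' line marked 'recommended', or None."""
    for line in text.splitlines():
        m = re.match(r"driver\s*:\s*(\S+)\s+-\s+.*\brecommended\b", line)
        if m:
            return m.group(1)
    return None

pkg = recommended_driver(OUTPUT)
print(pkg)                            # nvidia-375
print("sudo apt-get install " + pkg)  # the install command used in the next step
```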

Now we can finally install the matching driver with the following command:

# version: 375
sudo apt-get install nvidia-375
# for your own version xxx:
# sudo apt-get install nvidia-xxx

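Is the installation slow, or can the package not be found? Switch to a different software source (Google how to do it). The simplest way is in the GUI: open System Settings -> Software & Updates and change the download server, for example to Alibaba Cloud (Aliyun) or USTC. I suddenly couldn't connect to the USTC mirror, which was a real pitfall. Then run the following commands: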

sudo apt-get install mesa-common-dev
sudo apt-get install freeglut3-dev

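When the installation finishes, reboot the computer and the driver should be in place. If searching for nvidia in the Dash turns up something like NVIDIA X Server Settings, the driver was installed successfully. The next step is installing CUDA 8.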
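Besides the Dash check, you can read the loaded kernel module's version from /proc/driver/nvidia/version, a file that only exists while the nvidia module is loaded. The Python 3 sketch below uses two hypothetical helper names. It assumes the first line of that file has the form `NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.20  <build date>`.

```python
import re

def parse_nvrm_version(first_line):
    """Pull the version number out of the first line of /proc/driver/nvidia/version."""
    m = re.search(r"Kernel Module\s+(\d+(?:\.\d+)+)", first_line)
    return m.group(1) if m else None

def loaded_driver_version(path="/proc/driver/nvidia/version"):
    """Return the loaded NVIDIA kernel module version, or None if it can't be read."""
    try:
        with open(path) as f:
            return parse_nvrm_version(f.readline())
    except (IOError, OSError):
        # The file only exists while the nvidia kernel module is loaded.
        return None

print(loaded_driver_version())  # e.g. 375.20 after a successful install, None otherwise
```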

Installing CUDA 8

First, download CUDA Toolkit 8.0 from NVIDIA; you can register an account for this.

Be sure to choose the runfile installer. When the download finishes, run:

sudo sh cuda_8.0.44_linux.run --override

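The installer then starts, beginning with the End User License Agreement. You can press Ctrl+C to skip it and then type accept. Next comes the interactive part of the installation. When asked "Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?", answer n, because you have already installed the driver.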

Using more to view the EULA.
End User License Agreement
--------------------------


Preface
-------

The following contains specific license terms and conditions
for four separate NVIDIA products. By accepting this
agreement, you agree to comply with all the terms and
conditions applicable to the specific product(s) included
herein.


NVIDIA CUDA Toolkit


Description

The NVIDIA CUDA Toolkit provides command-line and graphical
tools for building, debugging and optimizing the performance
of applications accelerated by NVIDIA GPUs, runtime and math
libraries, and documentation including programming guides,
user manuals, and API references. The NVIDIA CUDA Toolkit
License Agreement is available in Chapter 1.


Default Install Location of CUDA Toolkit

Windows platform:

Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 367.48?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
 [ default is /usr/local/cuda-8.0 ]:  

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y 

Enter CUDA Samples Location
 [ default is /home/kinny ]: 

Installing the CUDA Toolkit in /usr/local/cuda-8.0 ...
Missing recommended library: libXmu.so

Installing the CUDA Samples in /home/kinny ...
Copying samples to /home/kinny/NVIDIA_CUDA-8.0_Samples now...
Finished copying samples.

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-8.0
Samples:  Installed in /home/kinny, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-8.0/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-8.0/lib64, or, add /usr/local/cuda-8.0/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-8.0/bin

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-8.0/doc/pdf for detailed information on setting up CUDA.

***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 361.00 is required for CUDA 8.0 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run -silent -driver

Logfile is /tmp/cuda_install_17494.log
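The summary warns that CUDA 8.0 needs a driver of at least version 361.00. Since the driver was installed separately, it can be worth checking the installed version against that minimum. The Python 3 sketch below uses hypothetical helper names and compares versions as integer tuples rather than strings. The value 358.16 is only an example of an older version.

```python
def version_tuple(v):
    """'375.20' -> (375, 20), so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))

# Minimum driver version named in the CUDA 8.0 installer's warning.
CUDA8_MIN_DRIVER = "361.00"

def driver_ok_for_cuda8(driver_version):
    return version_tuple(driver_version) >= version_tuple(CUDA8_MIN_DRIVER)

print(driver_ok_for_cuda8("375.20"))  # True: the driver installed earlier
print(driver_ok_for_cuda8("358.16"))  # False: too old for CUDA 8.0
print("99.00" >= "361.00")            # True: why plain string comparison is wrong
```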

Setting the CUDA environment variables

export PATH="$PATH:/usr/local/cuda-8.0/bin"
export LD_LIBRARY_PATH="/usr/local/cuda-8.0/lib64"

nvidia-smi

If it prints a status table for your GPU, the setup succeeded.
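nvidia-smi exercises the driver rather than the two exports. To confirm that PATH and LD_LIBRARY_PATH actually contain the CUDA directories the installer asked for, a small Python 3 check could look like this. The helper name cuda_env_problems is hypothetical. Paths are resolved with os.path.realpath, so an entry such as /usr/local/cuda/lib64 also counts when /usr/local/cuda is a symlink to /usr/local/cuda-8.0.

```python
import os

CUDA_BIN = "/usr/local/cuda-8.0/bin"
CUDA_LIB = "/usr/local/cuda-8.0/lib64"

def _entries(value):
    # Resolve symlinks (e.g. /usr/local/cuda -> /usr/local/cuda-8.0) and skip empty entries.
    return {os.path.realpath(p) for p in value.split(":") if p}

def cuda_env_problems(env=None):
    """Return a list of missing CUDA entries in PATH / LD_LIBRARY_PATH (empty list = OK)."""
    env = os.environ if env is None else env
    problems = []
    if os.path.realpath(CUDA_BIN) not in _entries(env.get("PATH", "")):
        problems.append("PATH is missing " + CUDA_BIN)
    if os.path.realpath(CUDA_LIB) not in _entries(env.get("LD_LIBRARY_PATH", "")):
        problems.append("LD_LIBRARY_PATH is missing " + CUDA_LIB)
    return problems

print(cuda_env_problems() or "CUDA paths look fine")
```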

Installing the cuDNN deep learning library

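First, download cuDNN 5.1. A direct download is extremely slow, so you have to go through a proxy. I downloaded it from the terminal. Note that this only works if you have already registered as an NVIDIA developer!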

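proxychains wget https://developer.nvidia.com/compute/machine-learning/cudnn/secure/v5.1/prod/8.0/cudnn-8.0-linux-x64-v5.1-tgz
This is refused with Forbidden because the request is not authenticated: only logged-in developers can download. Start the download in Chrome first, then copy the real download address from Chrome's "Show all" downloads page (your link will carry its own autho token). Quote the URL, because an unquoted & makes the shell cut the command short and run it in the background:
proxychains wget -O cudnn-8.0-linux-x64-v5.1.tgz "http://developer.download.nvidia.com/compute/machine-learning/cudnn/secure/v5.1/prod/8.0/cudnn-8.0-linux-x64-v5.1.tgz?autho=1479703345_7fbb517b03361780b45a2c43277bb9ac&file=cudnn-8.0-linux-x64-v5.1.tgz"
This time it worked, at a decent speed! Passing -O sets the file name directly. Otherwise the downloaded file ends up with a malformed name and has to be renamed to cudnn-8.0-linux-x64-v5.1.tgz.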
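A signed link like this carries the intended file name in its file= query parameter. The name can be recovered from there, for example to pass to wget -O. The Python 3 sketch below uses a hypothetical helper, download_name, and falls back to the last path segment when there is no file= parameter.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit

URL = ("http://developer.download.nvidia.com/compute/machine-learning/cudnn/secure/"
       "v5.1/prod/8.0/cudnn-8.0-linux-x64-v5.1.tgz"
       "?autho=1479703345_7fbb517b03361780b45a2c43277bb9ac&file=cudnn-8.0-linux-x64-v5.1.tgz")

def download_name(url):
    """Prefer the 'file' query parameter; otherwise use the last path segment."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "file" in query:
        return query["file"][0]
    return posixpath.basename(parts.path)

print(download_name(URL))  # cudnn-8.0-linux-x64-v5.1.tgz
```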

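Then extract it:
tar xvzf cudnn-8.0-linux-x64-v5.1.tgz
Then copy the library and header files into the CUDA directory. This must be the directory where you installed CUDA, such as /usr/local/cuda-8.0. If you answered y to the symbolic-link question during the CUDA installation, however, the installer will have created the link /usr/local/cuda -> /usr/local/cuda-8.0/, so the following works: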
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
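To confirm which cuDNN version ended up in the CUDA directory, you can read the version macros from cudnn.h. The Python 3 sketch below assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL as plain integers, and the helper name cudnn_version is hypothetical. It only reads the header if it exists at the path used above.

```python
import os
import re

def cudnn_version(header_text):
    """Return (major, minor, patch) from the CUDNN_* #defines, or None if absent."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        m = re.search(r"#define\s+%s\s+(\d+)" % key, header_text)
        if m is None:
            return None
        parts.append(int(m.group(1)))
    return tuple(parts)

header = "/usr/local/cuda/include/cudnn.h"
if os.path.exists(header):
    with open(header) as f:
        print(cudnn_version(f.read()))  # major and minor should be 5, 1 for cuDNN 5.1
```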

Installing the GPU-enabled TensorFlow for Python 2.7 (see the official website for details)

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
sudo pip install --upgrade $TF_BINARY_URL
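The wheel's file name encodes which interpreter it targets. cp27 means CPython 2.7, so it has to match the Python that pip installs into. This Python 3 sketch splits the name following the wheel naming convention (name-version-pythontag-abitag-platform.whl). The helper name parse_wheel_name is hypothetical, and it does not handle the optional build tag.

```python
import sys

def parse_wheel_name(filename):
    """Split 'name-version-pytag-abitag-platform.whl' (no build tag) into its parts."""
    stem = filename.rsplit("/", 1)[-1]
    if not stem.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    name, version, py_tag, abi_tag, platform = stem[:-4].split("-")
    return {"name": name, "version": version, "python": py_tag,
            "abi": abi_tag, "platform": platform}

url = "https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl"
info = parse_wheel_name(url)
print(info["python"], info["platform"])  # cp27 linux_x86_64

# Compare the wheel's Python tag with the interpreter running this script.
running = "cp%d%d" % sys.version_info[:2]
print("matches this interpreter:", running == info["python"])
```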

Verification
$ python
Python 2.7.12 (default, Jul  1 2016, 15:12:24) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
>>> quit()
Done!
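Rather than eyeballing the import log, you can check it for the CUDA libraries TensorFlow reports opening. The Python 3 sketch below uses the author's log and, as its reference list, the five libraries TensorFlow 0.11 opened on the author's machine. Both that list and the helper name opened_cuda_libs are assumptions for illustration.

```python
import re

# Import-time log from the verification session above.
LOG = """\
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
"""

# The five libraries TensorFlow 0.11 opened on the author's machine.
EXPECTED = {"libcublas.so", "libcudnn.so", "libcufft.so", "libcuda.so.1", "libcurand.so"}

def opened_cuda_libs(log_text):
    return set(re.findall(r"successfully opened CUDA library (\S+) locally", log_text))

missing = sorted(EXPECTED - opened_cuda_libs(LOG))
print("missing CUDA libraries:", missing)  # missing CUDA libraries: []
```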

Errors

1. libcudart.so.8.0: cannot open shared object file: No such file or directory

kinny@kinny-Lenovo-XiaoXin:~/Study/tensorflow-0.11.0rc0/tensorflow/models/image/mnist$ python convolutional.py 
Traceback (most recent call last):
  File "convolutional.py", line 34, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/__init__.py", line 23, in <module>
    from tensorflow.python import *
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 28, in <module>
    _pywrap_tensorflow = swig_import_helper()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow', fp, pathname, description)
ImportError: libcudart.so.8.0: cannot open shared object file: No such file or directory

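The fix is to set the environment variables: replace the CUDA environment variables set earlier with the following ones, which are what the TensorFlow website requires: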

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
export CUDA_HOME=/usr/local/cuda
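One detail of the export above: if LD_LIBRARY_PATH was empty, "$LD_LIBRARY_PATH:..." produces a value that starts with ":". That empty entry is generally treated by the dynamic loader as the current directory. The Python 3 sketch below, using a hypothetical helper append_paths, shows the intended result: append directories while skipping empty entries and duplicates. In bash, the expansion ${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:} achieves the same thing for the leading colon.

```python
def append_paths(current, *dirs):
    """Append dirs to a colon-separated list, skipping empty entries and duplicates."""
    entries = [p for p in (current or "").split(":") if p]
    for d in dirs:
        if d not in entries:
            entries.append(d)
    return ":".join(entries)

print(append_paths("", "/usr/local/cuda/lib64", "/usr/local/cuda/extras/CUPTI/lib64"))
# /usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
```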

2. TypeError: run() got an unexpected keyword argument 'argv'

Traceback (most recent call last):
  File "convolutional.py", line 339, in <module>
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
TypeError: run() got an unexpected keyword argument 'argv'

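The fix is to remove the argv argument from the tf.app.run(...) call in the script's main block, i.e. call tf.app.run(main=main).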

Using a Python virtual environment

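Running the MNIST example with the GPU version was extremely slow. It basically got stuck downloading and reading the data! To compare GPU and CPU performance, I installed the CPU version of TensorFlow in a virtual environment: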

sudo apt-get install python-pip python-dev python-virtualenv

mkdir ~/py2virtualenv
virtualenv --system-site-packages ~/py2virtualenv/tensorflowcpu
source ~/py2virtualenv/tensorflowcpu/bin/activate
export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.11.0-cp27-none-linux_x86_64.whl
pip install --upgrade $TF_BINARY_URL
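With two TensorFlow builds on the machine, it helps to confirm that a script really runs inside the virtualenv. Here is a minimal sketch; the helper name in_virtualenv is hypothetical. It assumes two conventions: the classic virtualenv used with Python 2.7 sets sys.real_prefix, while venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.

```python
import sys

def in_virtualenv(sys_mod=sys):
    """True if the interpreter described by sys_mod runs inside a virtualenv/venv."""
    # Classic virtualenv (as used with Python 2.7) sets sys.real_prefix.
    if hasattr(sys_mod, "real_prefix"):
        return True
    # venv and newer virtualenv releases make sys.prefix differ from sys.base_prefix.
    return getattr(sys_mod, "base_prefix", sys_mod.prefix) != sys_mod.prefix

print("inside virtualenv:", in_virtualenv())
```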

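It turned out the CPU version downloaded and read the data quickly! The CPU is well suited to I/O, simple logic and simple arithmetic such as addition. The GPU is not suited to I/O-heavy work or simple additions, but it is extremely powerful at matrix operations. After downloading the MNIST dataset locally, I ran tensorflow/tensorflow/models/image/mnist/convolutional.py with both the CPU version and the GPU version. The results: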

// CPU version
Step 8100 (epoch 9.43), 130.6 ms
Minibatch loss: 1.630, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.8%
About 130.64 ms per step on average (each reported figure is the mean over the preceding 100 steps)

real  19m5.685s
user  67m33.720s
sys 0m12.340s

// GPU version
Step 8100 (epoch 9.43), 23.2 ms
Minibatch loss: 1.634, learning rate: 0.006302
Minibatch error: 0.0%
Validation error: 0.9%
About 23.2 ms per step on average (each reported figure is the mean over the preceding 100 steps)

real  3m28.296s
user  2m45.888s
sys 0m29.064s

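The GPU crushes the CPU at dense matrix computation: here it was about 5.5 times faster (19m5.685s vs 3m28.296s wall-clock; 130.6 ms vs 23.2 ms per step). Mine is a GTX 950M; I don't know how the current GTX 1080M would do.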
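The speedup figures can be reproduced from the benchmark numbers. The Python 3 sketch below parses bash `time` values such as 19m5.685s into seconds with a hypothetical helper, parse_time, and then divides.

```python
import re

def parse_time(s):
    """Parse a bash `time` value like '19m5.685s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", s.strip())
    if m is None:
        raise ValueError("unrecognised time: " + s)
    return int(m.group(1) or 0) * 60 + float(m.group(2))

cpu_real, gpu_real = parse_time("19m5.685s"), parse_time("3m28.296s")
print("wall-clock speedup: %.2fx" % (cpu_real / gpu_real))  # wall-clock speedup: 5.50x
print("per-step speedup:   %.2fx" % (130.6 / 23.2))         # per-step speedup:   5.63x
```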
