CUDA
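If an installation failed, or before reinstalling, clean up the leftover files first.
a) After a failed .deb installation:
sudo apt-get --purge remove 'nvidia*'
b) After a failed .run installation: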
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
sudo /usr/bin/nvidia-uninstall
c) If installation still fails after cleaning up with a) or b):
sudo apt-get autoremove --purge 'nvidia-*'
# Purge every trace of the NVIDIA driver
sudo reboot
1. PRE-INSTALLATION ACTIONS
1.1 Verify you have a CUDA-Capable GPU
lspci | grep -i nvidia
Sample output:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 1b81 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 10f0 (rev a1)
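To script this check, the filter can be narrowed to display controllers only, so the GPU's audio function is not counted. The following is a minimal sketch; the function name count_nvidia_gpus is a hypothetical helper, not part of any NVIDIA tool. It reads lspci-style text on standard input, so it can also be tried on saved output.

```shell
# Count NVIDIA display controllers in lspci-style text read from stdin.
# count_nvidia_gpus is a hypothetical helper name used for illustration.
count_nvidia_gpus() {
    # Keep the NVIDIA lines, then count only VGA / 3D controllers
    # (this skips the GPU's HDMI audio function).
    grep -i 'nvidia' | grep -c -E 'VGA compatible controller|3D controller'
}

# Typical use on a real machine:
#   lspci | count_nvidia_gpus
```

A count of 0 means lspci listed no NVIDIA display controller. Note that grep -c exits with a non-zero status in that case, which matters if the calling script uses set -e.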
1.2 Verify you have a Supported Version of Linux
uname -m && cat /etc/*release
Sample output:
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"
.....
1.3 Verify the System Has GCC Installed
gcc --version
Sample output:
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
1.4 Verify the System has the Correct Kernel Headers and Development Packages Installed
uname -r
Sample output:
4.4.0-66-generic
Install the kernel headers and development packages that match the running kernel:
sudo apt-get install linux-headers-$(uname -r)
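After the install, a quick check can confirm that the headers for the running kernel are actually on disk. This is a sketch that assumes the Ubuntu layout, where the linux-headers-* packages install under /usr/src; the function name kernel_headers_present and its optional second argument are illustrative only.

```shell
# Check whether kernel headers for a given release are present.
# Usage: kernel_headers_present RELEASE [SRC_ROOT]
# SRC_ROOT defaults to /usr/src (assumed Ubuntu location of linux-headers-*).
kernel_headers_present() {
    release=$1
    src_root=${2:-/usr/src}
    [ -d "$src_root/linux-headers-$release" ]
}

# Typical use:
#   kernel_headers_present "$(uname -r)" && echo "headers found"
```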
1.5 Download the NVIDIA CUDA Toolkit
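Download the latest .run (runfile) version.
After the download finishes, verify its MD5 checksum. If it does not match the published value, download the file again.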
md5sum cuda_8.0.27_linux.run
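Comparing the digest by eye is error-prone, so the comparison can be scripted. The sketch below wraps md5sum in a hypothetical helper named verify_md5; the expected value is whatever the download page publishes and has to be supplied by you.

```shell
# Compare a file's MD5 digest with the value published on the download page.
# Usage: verify_md5 FILE EXPECTED_MD5
# verify_md5 is a hypothetical helper name used for illustration.
verify_md5() {
    file=$1
    # Accept upper- or lower-case hex in the expected value.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 2; }
    actual=$(md5sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}
```

Call it as verify_md5 cuda_8.0.27_linux.run followed by the published checksum; a non-zero exit status means the file should be downloaded again.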
2. RUNFILE INSTALLATION
2.1 Disabling Nouveau
lsmod | grep nouveau
If this prints anything, Nouveau is loaded and needs to be disabled:
sudo gedit /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines:
blacklist nouveau
options nouveau modeset=0
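Then run
sudo update-initramfs -u
Verify that Nouveau is disabled. The module stays loaded until the next reboot, so repeat this check after rebooting; it should then print nothing.
lsmod | grep nouveau
Reboot the computer.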
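In text mode there is no gedit, so it can help to write the blacklist file without an editor. The following sketch does the same thing as the manual steps above; the two function names are hypothetical helpers, and the file path is the one used in this section.

```shell
# Print the blacklist configuration described above.
nouveau_blacklist_conf() {
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0'
}

# Succeed if lsmod-style text on stdin lists the nouveau module.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Typical use (requires root for the first two commands):
#   nouveau_blacklist_conf | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
#   sudo update-initramfs -u
#   lsmod | nouveau_loaded && echo "nouveau is still loaded; reboot first"
```

Piping into sudo tee is used because a plain shell redirection would run without root privileges.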
2.2 Reboot Into Text Mode
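After rebooting, do not log in to the desktop at the login screen (the installation may fail otherwise; if you log in by accident, reboot). Instead press
Ctrl+Alt+F1
to switch to text mode (a command-line console) and log in there.
Stop the graphical interface:
sudo service lightdm stop
Change to the directory that contains cuda_8.0.27_linux.run and run the installer: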
sudo sh cuda_8.0.27_linux.run
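!Note: The installer first shows a long text (the EULA); keep pressing the space bar until it reaches 100%.
When asked whether to install OpenGL, choose NO. For the other prompts you can accept, answer yes, or press Enter.
After a successful installation the summary shows installed; otherwise it shows failed.
Restart the graphical interface: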
sudo service lightdm start
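If you can log in to the desktop without being sent back to the login screen over and over, you are more than halfway there.
!Note: If you get stuck in a login loop, uninstall CUDA and install it again.
Cause: OpenGL conflicts with the NVIDIA driver.
To uninstall: at the login screen press Ctrl+Alt+F1 to enter the text console, then run
sudo /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl
sudo /usr/bin/nvidia-uninstall
sudo reboot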
2.3 Device Node Verification
ls /dev/nvidia*
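One of three results, a), b), or c), is possible; find the one that matches yours.
a) If the output is
/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm
or something similar with three entries (including one like /dev/nvidia-uvm), the installation succeeded.
b) If you are a bit unlucky, the result looks like this:
ls: cannot access /dev/nvidia*: No such file or directory
or only these two entries appear:
/dev/nvidia0 /dev/nvidiactl
Save the following script under any name, for example Nka.sh: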
#!/bin/bash
/sbin/modprobe nvidia
if [ "$?" -eq 0 ]; then
    # Count the number of NVIDIA controllers found.
    NVDEVS=`lspci | grep -i NVIDIA`
    N3D=`echo "$NVDEVS" | grep "3D controller" | wc -l`
    NVGA=`echo "$NVDEVS" | grep "VGA compatible controller" | wc -l`
    N=`expr $N3D + $NVGA - 1`
    for i in `seq 0 $N`; do
        mknod -m 666 /dev/nvidia$i c 195 $i
    done
    mknod -m 666 /dev/nvidiactl c 195 255
else
    exit 1
fi
/sbin/modprobe nvidia-uvm
if [ "$?" -eq 0 ]; then
    # Find out the major device number used by the nvidia-uvm driver
    D=`grep nvidia-uvm /proc/devices | awk '{print $1}'`
    mknod -m 666 /dev/nvidia-uvm c $D 0
else
    exit 1
fi
Then run
sudo chmod +x Nka.sh
sudo ./Nka.sh
ls /dev/nvidia*
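Create the device nodes automatically at boot by adding the script to rc.local:
sudo gedit /etc/rc.local
If this is the first time you open this file, it should be empty apart from lines of # comments. The first line of the file is
#!/bin/sh -e
Remove the
-e
option (this step matters: with -e the shell exits at the first command that fails, so the lines you add may never run).
Then copy the contents of Nka.sh, except for the #!/bin/bash line, into the file before exit 0.
Save and exit.
After the next reboot, the three nvidia device files should appear directly under /dev.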
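c) If you are really out of luck (this has happened to me a few times), the result looks like this:
modprobe: ERROR: could not insert 'nvidia_uvm': Operation not permitted
When this happens, the installed drivers are probably conflicting with each other.
sudo apt-get autoremove --purge 'nvidia-*'
# Purge every trace of the NVIDIA driver
sudo reboot
# Be sure to reboot here, or you will regret it!
Then run
sudo ./Nka.sh
ls /dev/nvidia*
You should now see
/dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm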
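Whichever of the three cases applied, the end state is the same: three device nodes under /dev. A small check like the one below can report exactly which node is missing. It is only a sketch; missing_nvidia_nodes is a hypothetical helper name, and it assumes a single GPU (nvidia0).

```shell
# List which of the expected NVIDIA device nodes are missing.
# Usage: missing_nvidia_nodes [DEV_DIR]   (DEV_DIR defaults to /dev)
# Assumes a single GPU, so only nvidia0 is checked.
missing_nvidia_nodes() {
    dev_dir=${1:-/dev}
    for node in nvidia0 nvidiactl nvidia-uvm; do
        [ -e "$dev_dir/$node" ] || echo "$dev_dir/$node"
    done
}

# Typical use:
#   missing=$(missing_nvidia_nodes)
#   [ -z "$missing" ] && echo "all nodes present" || echo "missing: $missing"
```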
3. POST-INSTALLATION ACTIONS
3.1 Environment Setup
sudo gedit /etc/profile
Append the following to the end of the file:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
If TensorFlow fails at startup with
Couldn't open CUDA library libcuda.so.
# locate the library and add its directory to LD_LIBRARY_PATH manually
ldconfig -p | grep libcuda
Finally, reload the profile:
source /etc/profile
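Sourcing /etc/profile more than once prepends the same directories again, and when LD_LIBRARY_PATH starts out empty the export above leaves a trailing colon (an empty entry, which the dynamic loader treats as the current directory). A hedged alternative is to prepend only when the directory is not already listed. The helper name prepend_once is an assumption made for this sketch.

```shell
# Print CURRENT with DIR prepended, unless DIR is already one of its entries.
# Usage: prepend_once DIR CURRENT
# prepend_once is a hypothetical helper name used for illustration.
prepend_once() {
    dir=$1
    current=$2
    case ":$current:" in
        *":$dir:"*)
            # Already present: leave the value unchanged.
            printf '%s\n' "$current"
            ;;
        *)
            if [ -n "$current" ]; then
                printf '%s\n' "$dir:$current"
            else
                # Avoid a trailing colon when the variable was empty.
                printf '%s\n' "$dir"
            fi
            ;;
    esac
}

# Typical use in /etc/profile:
#   export PATH=$(prepend_once /usr/local/cuda-8.0/bin "$PATH")
#   export LD_LIBRARY_PATH=$(prepend_once /usr/local/cuda-8.0/lib64 "$LD_LIBRARY_PATH")
```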
3.2 Verify the Installation
3.2.1 Verify the Driver Version
nvidia-smi
cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 361.77 Sun Jul 17 21:18:18 PDT 2016
GCC version: gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)
3.2.2 Verify CUDA Toolkit
nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
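!Note: If you see this instead: The program 'nvcc' is currently not installed. You can install it by typing: sudo apt-get install nvidia-cuda-toolkit
Don't panic. Check that the environment settings in
/etc/profile
are correct. Even if nothing has changed, you may have skipped this step, or you ran it a while ago and have not rebooted since. Running
source /etc/profile
only affects the current shell session; the settings apply to every session after a reboot.
source /etc/profile
Then run this again (it should now print the version):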
nvcc -V
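If a script needs the toolkit version, for example to confirm that the nvcc found on PATH is the 8.0 one, the release number can be pulled out of this output. This is a sketch that assumes the "release X.Y," wording shown above; nvcc_release is a hypothetical helper name.

```shell
# Extract the toolkit release (for example 8.0) from `nvcc -V` output on stdin.
# Assumes the line format "Cuda compilation tools, release X.Y, V...".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Typical use:
#   nvcc -V | nvcc_release
#   command -v nvcc    # shows which nvcc binary PATH resolves to
```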
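3.2.3 Compile the Samples
cd into the NVIDIA_CUDA-8.0_Samples directory and run
make
When the build finishes, the binaries are placed in the bin directory under NVIDIA_CUDA-8.0_Samples.
3.2.4 Running the Binaries
cd down through the bin directory until you see a set of executables (shown with diamond-shaped icons in the file manager), roughly ~/NVIDIA_CUDA-8.0_Samples/bin/x86_64/linux/release
Run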
./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 720M"
CUDA Driver Version / Runtime Version
CUDA Capability Major/Minor version number: 2.1
Total amount of global memory: 1985 MBytes (2081226752 bytes)
( 2) Multiprocessors, ( 48) CUDA Cores/MP: 96 CUDA Cores
GPU Max Clock rate: 1250 MHz (1.25 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 131072 bytes
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: No
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = GeForce GT 720M
Result = PASS
or similar output that ends with Result = PASS.
On failure the last line is Result = FAIL.
Next, run
./bandwidthTest
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: GeForce GT 720M
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3220.9
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 3271.9
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 9772.8
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
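Both samples report their verdict on a Result line, so the check can be automated instead of read off the screen. The sketch below only looks for the Result = PASS text shown above; sample_passed is a hypothetical helper name.

```shell
# Succeed if sample output on stdin contains "Result = PASS".
# sample_passed is a hypothetical helper name used for illustration.
sample_passed() {
    grep -q 'Result = PASS'
}

# Typical use from the release directory:
#   for t in deviceQuery bandwidthTest; do
#       if ./"$t" | sample_passed; then echo "$t: PASS"; else echo "$t: FAIL"; fi
#   done
```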
CUDA is now installed successfully.
4. cuDNN for CUDA 8.0
tar -zxvf cudnn-8.0-linux-x64-v*.tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/cudnn.h /usr/local/cuda/include/
(Usually not needed) Update the symbolic links:
cd /usr/local/cuda/lib64
sudo rm -rf libcudnn.so libcudnn.so.5 # remove the existing links
sudo ln -s libcudnn.so.5.1.5 libcudnn.so.5
sudo ln -s libcudnn.so.5 libcudnn.so
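If a cuDNN version is already installed and you want to replace it with a newer one,
remove the old version and then install the new cuDNN.
Remove the old files:
cd /usr/local/cuda/lib64
sudo rm libcudnn*
Then install the new version by following the steps above.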
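To confirm which cuDNN version ended up installed, the version macros in the copied header can be read back. This sketch assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, as the cuDNN 5.x cudnn.h does; the helper name cudnn_version is hypothetical.

```shell
# Print MAJOR.MINOR.PATCHLEVEL from a cuDNN header.
# Usage: cudnn_version [HEADER]   (defaults to /usr/local/cuda/include/cudnn.h)
# Assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL.
cudnn_version() {
    header=${1:-/usr/local/cuda/include/cudnn.h}
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Typical use:
#   cudnn_version
```

The printed version should match the libcudnn.so.* file name that the symbolic links point to.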