Installing CUDA on Ubuntu 18.04: problems and solutions

Environment used in this post:

    - Dual GPUs: Intel integrated graphics + NVIDIA discrete GPU

    - Ubuntu 18.04.4

    - CUDA 11.7

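1. The Deb package is a trap (do not use this method!)

I installed from the Deb package cuda-repo-ubuntu1404-8-0-local_8.0.44-1_amd64.deb. After the installation finished, the machine rebooted to a black screen.

    - How to recover from the black screen:

        (1) Press Ctrl + Alt + F1 to switch to a text console, log in as root, and run:

            # apt-get remove --purge 'nvidia*'    # remove all installed NVIDIA components (quoted so the shell does not expand the pattern)

            # apt-get autoremove

            # reboot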

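To fix this I tried all sorts of things...

1) Completely removed everything NVIDIA-related and reinstalled the NVIDIA driver through the system's own Software Update tool. The black screen was gone, but the login screen looped and I could not get in.

2) Some blog posts claim that Ubuntu 14.04 must not be updated after installation, otherwise you get the black screen or the login loop. I was running Ubuntu 14.04.4, which has many updates relative to 14.04.3, and the kernel had been upgraded to 4.0... That conclusion turned out to be nonsense. (A word on this: many blog posts may work on the blogger's own machine, but publishing something that is not a general method does real harm. It nearly made me reinstall the system...)

3) ...

None of this solved the problem. The root cause is that the Deb package by default overwrites the Intel integrated GPU's OpenGL libraries, which breaks the GUI. NVIDIA's documentation describes this behavior.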

2. Install using the runfile

To install cuda_12.3.2_545.23.08_linux on Ubuntu 22.04, gcc-12 is required:

$ sudo apt-get install gcc-12

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12

Before installing with the runfile cuda_11.7.0_515.43.04_linux.run, first disable nouveau, the NVIDIA graphics driver that ships with the system.

1) Create the file /etc/modprobe.d/blacklist-nouveau.conf with the following content:

blacklist nouveau
options nouveau modeset=0

2) Regenerate the kernel initramfs:

$ sudo update-initramfs -u
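
Steps 1) and 2) can also be scripted. The following is only a minimal sketch: the function names write_nouveau_blacklist and nouveau_loaded are made up for illustration, and the check assumes that lsmod lists a loaded module with its name at the start of a line.

```shell
# Write the two blacklist lines to the given file.
# The default path is the one from step 1); pass another path to try it safely.
write_nouveau_blacklist() {
    conf="${1:-/etc/modprobe.d/blacklist-nouveau.conf}"
    printf '%s\n' 'blacklist nouveau' 'options nouveau modeset=0' > "$conf"
}

# Read `lsmod` output on stdin; succeed if a nouveau module line is present.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Usage (as root), before the reboot:
#   write_nouveau_blacklist && update-initramfs -u
# After the reboot:
#   lsmod | nouveau_loaded && echo "nouveau is still loaded" || echo "nouveau is disabled"
```

If nouveau still shows up after the reboot, the blacklist file or the regenerated initramfs is the first thing to recheck.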

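3) Reboot. At the login screen, do not log in; press Ctrl + Alt + F3 to switch to a text console and run:

$ sudo service lightdm stop    # stop the desktop service

$ sudo ./cuda_11.7.0_515.43.04_linux.run --no-opengl-libs

$ sudo service lightdm start    # start the desktop service again

When the installation finishes, set the environment variables according to the hints printed at the end.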
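
The installer's hints typically come down to adding the toolkit's bin and lib64 directories to PATH and LD_LIBRARY_PATH. A minimal sketch, assuming the toolkit landed in /usr/local/cuda-11.7 (the usual default; use whatever path the installer actually printed). The CUDA_HOME variable is just a convenience name chosen here.

```shell
# Assumption: the runfile installed the toolkit under /usr/local/cuda-11.7.
# Override CUDA_HOME beforehand if the installer reported a different path.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-11.7}"

# Prepend the CUDA directories so nvcc and the runtime libraries are found.
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

Putting these lines at the end of ~/.bashrc makes them apply to every new shell.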

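4) After installing the CUDA package above and rebooting, the system reports the graphics driver as "Graphics llvm.......", which means the machine is preferring the NVIDIA card as the display card.

The goal, however, is to use the Intel integrated GPU as the display card and the NVIDIA discrete GPU only as the CUDA compute card. Set this up as follows.

First, run the following command from any directory: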

$ sudo nvidia-xconfig

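This generates the file /etc/X11/xorg.conf (note: do not create this file by hand unless you know what you are doing).

Then look up the bus IDs of the Intel and NVIDIA GPUs with the following commands: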

$ lspci | grep -i intel
...
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 530 (rev 06)
...

$ lspci | grep -i nvidia
01:00.0 3D controller: NVIDIA Corporation GM107M [GeForce GTX 960M] (rev a2)

Open /etc/X11/xorg.conf. By default it uses the NVIDIA card as the display card, configured like this:

...
Section "Device"
    Identifier     "Device0"
    Driver         "nvidia"
    VendorName     "NVIDIA Corporation"
    BusID          "PCI:1:0:0"
EndSection
...

Change it to:

...
Section "Device"
    Identifier     "Device0"
    Driver         "intel"
    VendorName     "Intel Corporation"
    BusID          "PCI:0:2:0"
EndSection
...
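
The BusID values come from the lspci addresses, with one catch: lspci prints bus, device and function in hexadecimal, while BusID in xorg.conf expects decimal. For the two cards here the digits happen to be the same, but a bus such as 0a would have to be written as 10. A small sketch of the conversion; the helper name lspci_to_busid is made up for illustration, and it assumes the short bb:dd.f address form without a PCI domain prefix.

```shell
# Convert an lspci address "bb:dd.f" (hexadecimal) into the
# "PCI:bus:device:function" form (decimal) used by BusID in xorg.conf.
lspci_to_busid() {
    addr="$1"
    bus="${addr%%:*}"    # text before the first ':'
    rest="${addr#*:}"    # remaining "dd.f"
    dev="${rest%%.*}"    # text before the '.'
    fn="${rest#*.}"      # text after the '.'
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

lspci_to_busid 00:02.0    # Intel integrated GPU  -> PCI:0:2:0
lspci_to_busid 01:00.0    # NVIDIA discrete GPU   -> PCI:1:0:0
```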

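Then reboot and check the system information again; the Intel GPU is now the display card.

To keep the system from modifying this file automatically, open /etc/default/grub, add the option "nogpumanager" to GRUB_CMDLINE_LINUX_DEFAULT, and then update grub: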

$ sudo update-grub
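
For those who prefer not to edit /etc/default/grub by hand, the change can be sketched with sed. This is only an illustration: the function name add_nogpumanager is made up, and it assumes the value of GRUB_CMDLINE_LINUX_DEFAULT is wrapped in double quotes and that the closing quote is the last character on the line.

```shell
# Read the contents of /etc/default/grub on stdin and write them to stdout
# with "nogpumanager" appended inside GRUB_CMDLINE_LINUX_DEFAULT="...".
# A line that already contains the option is left unchanged.
add_nogpumanager() {
    sed '/^GRUB_CMDLINE_LINUX_DEFAULT=/{/nogpumanager/!s/"$/ nogpumanager"/;}'
}

# Usage (keeps a backup of the original file):
#   sudo cp /etc/default/grub /etc/default/grub.bak
#   add_nogpumanager < /etc/default/grub.bak | sudo tee /etc/default/grub > /dev/null
#   sudo update-grub
```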

5) Verify the installation:

$ cat /proc/driver/nvidia/version

$ nvcc -V
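
To compare the installed toolkit version in a script, the release number can be pulled out of the nvcc output. A rough sketch, assuming nvcc -V prints a line containing "release X.Y"; the helper name nvcc_release is hypothetical.

```shell
# Read `nvcc -V` output on stdin and print the toolkit release, e.g. "11.7".
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage:
#   [ "$(nvcc -V | nvcc_release)" = "11.7" ] && echo "CUDA 11.7 toolkit found"
```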

Then download the CUDA samples code from GitHub, https://github.com/nvidia/cuda-samples , build it with make, and run deviceQuery as follows:

peterpan@Rescuer:~/cuda-samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GTX 960M"
  CUDA Driver Version / Runtime Version          11.7 / 11.7
  CUDA Capability Major/Minor version number:    5.0
  Total amount of global memory:                 4046 MBytes (4242341888 bytes)
  (005) Multiprocessors, (128) CUDA Cores/MP:    640 CUDA Cores
  GPU Max Clock rate:                            1176 MHz (1.18 GHz)
  Memory Clock rate:                             2505 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            No
  Supports Cooperative Kernel Launch:            No
  Supports MultiDevice Co-op Kernel Launch:      No
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.7, NumDevs = 1
Result = PASS
peterpan@Rescuer:~/cuda-samples/bin/x86_64/linux/release$

This shows that the installation succeeded.
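
The "Result = PASS" line at the end of the deviceQuery output is what to look for, and the check is easy to automate. A minimal sketch; the function name devicequery_passed is made up for illustration.

```shell
# Read deviceQuery output on stdin; succeed only if it contains
# a line starting with "Result = PASS".
devicequery_passed() {
    grep -q '^Result = PASS'
}

# Usage, from the directory that holds the built sample:
#   ./deviceQuery | devicequery_passed && echo "CUDA is working" || echo "CUDA check failed"
```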

For other post-installation actions, see the official documentation: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

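6) On Ubuntu, the /dev/nvidia* device files are created automatically the first time CUDA is used; there is no need to run any command to create them by hand.

Note: because --no-opengl-libs was specified during installation, the installer prints warnings when it finishes,

so the required libraries have to be installed manually: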

$ sudo apt-get install freeglut3-dev

$ sudo apt-get install libxmu-dev

The end.

posted @ 2016-11-16 11:22 Anonymous596
