Deploying TensorFlow on a bare-metal machine

The official site recommends running TensorFlow in Docker, but my boss wants it on a physical machine, so that is what we will do.

Reference: the official documentation (in English)

https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions

First, my environment: CentOS 7.8 with a GTX 1070 Ti GPU.

1. Check the kernel and the kernel-devel / kernel-headers packages

[root@xxxx maintenance-item-match]# uname -r
3.10.0-1160.el7.x86_64

[root@xxxx maintenance-item-match]# rpm -qa|grep kernel
kernel-3.10.0-1160.el7.x86_64
kernel-devel-3.10.0-1160.el7.x86_64
kernel-devel-3.10.0-1160.53.1.el7.x86_64
kernel-headers-3.10.0-1160.53.1.el7.x86_64
[root@node14 maintenance-item-match]#

 

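If the running kernel (uname -r) and the installed kernel-devel / kernel-headers packages are not the same version, uninstall and reinstall them by hand until they match, then reboot.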

I ran into exactly this, probably because the machine was not a clean install. I had to downgrade to

3.10.0-1160.el7.x86_64
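
The version comparison above can also be scripted. This is only a minimal sketch: it assumes the inputs are the text printed by `uname -r` and `rpm -qa|grep kernel`, and the function name `mismatched_kernel_packages` is made up for illustration.

```python
def mismatched_kernel_packages(running_kernel, rpm_lines):
    """Return the kernel-devel / kernel-headers packages whose version
    differs from the running kernel (the string printed by `uname -r`)."""
    mismatched = []
    for line in rpm_lines.splitlines():
        pkg = line.strip()
        for prefix in ("kernel-devel-", "kernel-headers-"):
            if pkg.startswith(prefix) and pkg[len(prefix):] != running_kernel:
                mismatched.append(pkg)
    return mismatched


# Sample data copied from the terminal output above.
running = "3.10.0-1160.el7.x86_64"
packages = """kernel-3.10.0-1160.el7.x86_64
kernel-devel-3.10.0-1160.el7.x86_64
kernel-devel-3.10.0-1160.53.1.el7.x86_64
kernel-headers-3.10.0-1160.53.1.el7.x86_64"""

for pkg in mismatched_kernel_packages(running, packages):
    print("version mismatch:", pkg)
```

Any package it prints is a candidate for the manual uninstall and reinstall described above.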

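Disable the bundled nouveau driver: add the two lines below to dist-blacklist.conf, rebuild the initramfs, switch to the text-mode target, and reboot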

[root@localhost 10:37:41 src]# vim /lib/modprobe.d/dist-blacklist.conf
blacklist nouveau
options nouveau modeset=0

[root@localhost 10:37:41 src]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@localhost 10:37:41 src]# dracut /boot/initramfs-$(uname -r).img $(uname -r)

[root@localhost 10:37:41 src]# systemctl set-default multi-user.target

reboot

lsmod | grep nouveau

No output means nouveau is no longer loaded.
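
The same check can be done on captured `lsmod` output. A minimal sketch, assuming the text passed in is what `lsmod` prints; the helper name `module_loaded` and the sample table (including its sizes) are invented for illustration.

```python
def module_loaded(lsmod_output, module):
    """True if `module` appears in the first column of `lsmod` output."""
    for line in lsmod_output.splitlines():
        fields = line.split()
        if fields and fields[0] == module:
            return True
    return False


# Made-up sample in the shape of `lsmod` output (values are not real).
sample = """Module                  Size  Used by
nvidia_drm             53212  0
nvidia              35000000  1 nvidia_drm
"""

print(module_loaded(sample, "nouveau"))
print(module_loaded(sample, "nvidia"))
```

On a real machine the input could come from `subprocess.run(["lsmod"], capture_output=True, text=True).stdout`.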

 

2. Install CUDA and the driver

 

https://developer.nvidia.com/cuda-11.2.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=CentOS&target_version=7&target_type=rpmlocal

Download the installer package from the page above

wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda-repo-rhel7-11-2-local-11.2.0_460.27.04-1.x86_64.rpm

sudo rpm -i cuda-repo-rhel7-11-2-local-11.2.0_460.27.04-1.x86_64.rpm

sudo yum clean all

sudo yum -y install nvidia-driver-latest-dkms cuda

sudo yum -y install cuda-drivers
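
The environment variables in the next step point at a versioned directory under /usr/local, so it helps to see which toolkit version actually got installed. A minimal sketch, assuming the toolkit lands in `/usr/local/cuda-<version>` as the paths in this post suggest; `installed_cuda_toolkits` is a hypothetical helper name.

```python
from pathlib import Path


def installed_cuda_toolkits(root="/usr/local"):
    """Return the names of versioned CUDA directories under `root`,
    sorted by name, e.g. ['cuda-11.6']."""
    return sorted(p.name for p in Path(root).glob("cuda-*") if p.is_dir())


print(installed_cuda_toolkits())
```

If the list shows a different version than the one in your export lines, adjust the paths to match.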

3. Add the environment variables (append them to /etc/profile)
export PATH=/usr/local/cuda-11.6/bin${PATH:+:${PATH}}

export LD_LIBRARY_PATH=/usr/local/cuda-11.6/lib64\
                         ${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

source /etc/profile
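
The `${PATH:+:${PATH}}` part of the export lines appends ":" plus the old value only when the variable is already set and non-empty, so an empty variable does not leave a stray ":" behind (an empty entry in these lists is read as the current directory). A small Python sketch of that rule; `prepend_path` is a made-up name, not a real API.

```python
def prepend_path(new_dir, current):
    """Mimic the shell idiom NEW${VAR:+:${VAR}}: append ':' + current
    only when current is non-empty."""
    return new_dir + (":" + current if current else "")


print(prepend_path("/usr/local/cuda-11.6/bin", "/usr/bin:/bin"))
print(prepend_path("/usr/local/cuda-11.6/lib64", ""))
```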

4. Verify
systemctl start nvidia-persistenced
systemctl enable nvidia-persistenced

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Thu_Feb_10_18:23:41_PST_2022
Cuda compilation tools, release 11.6, V11.6.112
Build cuda_11.6.r11.6/compiler.30978841_0
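
To compare the toolkit version against the path used in the environment variables, the release number can be pulled out of this output. A minimal sketch, assuming the output keeps the "release X.Y" wording shown above; `cuda_release` is a hypothetical helper.

```python
import re


def cuda_release(nvcc_output):
    """Extract the 'release X.Y' number from `nvcc --version` output,
    or return None if the pattern is not found."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Line copied from the nvcc output above.
sample = "Cuda compilation tools, release 11.6, V11.6.112"
print(cuda_release(sample))
```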

 

 

[root@xxxx maintenance-item-match]# nvidia-smi
Wed Mar 9 16:54:41 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   42C    P8    17W / 180W |   2485MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     32011      C   python3                          2481MiB |
+-----------------------------------------------------------------------------+
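
The table is awkward to read from a script. `nvidia-smi` can also print selected fields as CSV with `--query-gpu=name,driver_version,memory.used,memory.total --format=csv,noheader,nounits`. Below is a minimal sketch of parsing that output. The sample line is assembled by hand from the values in the table above (the table truncates the GPU name, so the name is filled in from the hardware mentioned at the top of this post), and `parse_gpu_query` is a made-up helper.

```python
import csv
import io

FIELDS = ["name", "driver_version", "memory.used", "memory.total"]


def parse_gpu_query(csv_text):
    """Turn the CSV lines printed by the nvidia-smi query into dicts,
    one per GPU, keyed by the field names in FIELDS."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    return [dict(zip(FIELDS, row)) for row in rows if row]


# Hand-written sample line, not captured output.
sample = "NVIDIA GeForce GTX 1070 Ti, 510.47.03, 2485, 8192\n"
for gpu in parse_gpu_query(sample):
    print(gpu["name"], gpu["memory.used"], "/", gpu["memory.total"], "MiB")
```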

 

 

Installation on Ubuntu

Ubuntu 20.04.1

 

1.

Reference: the official documentation

https://docs.nvidia.com/cuda/archive/11.2.0/cuda-installation-guide-linux/index.html

Install gcc 9.4.0

sudo apt update

sudo apt install build-essential

 

hxqc@xxxx:~$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2. Install the kernel headers

sudo apt-get install linux-headers-$(uname -r)
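
A quick way to confirm that headers for the running kernel are present is to look for the `build` entry under /lib/modules. This is a sketch under that assumption (the layout is not taken from this post), with a hypothetical helper name `headers_installed`.

```python
import platform
from pathlib import Path


def headers_installed(release=None, modules_root="/lib/modules"):
    """True if <modules_root>/<release>/build exists.
    `release` defaults to the running kernel, like `uname -r`."""
    release = release or platform.release()
    return (Path(modules_root) / release / "build").exists()


print(headers_installed())
```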

 

Disable the bundled nouveau driver

lsmod | grep nouveau

sudo vim /etc/modprobe.d/blacklist-nouveau.conf
Add the following lines
blacklist nouveau
blacklist lbm-nouveau
options nouveau modeset=0
alias nouveau off
alias lbm-nouveau off
Rebuild the initramfs
sudo update-initramfs -u
Reboot
sudo reboot
After the reboot, this should print nothing
lsmod | grep nouveau
4. Environment variables

export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_VISIBLE_DEVICES=0

source /etc/profile
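
`CUDA_VISIBLE_DEVICES=0` limits CUDA programs to the first GPU. To see what a Python process will inherit, the variable can be read back. A minimal sketch, with `visible_gpu_ids` as a made-up helper name:

```python
import os


def visible_gpu_ids(env=os.environ):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES as strings,
    or None when the variable is not set (no restriction applied)."""
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return [v.strip() for v in value.split(",") if v.strip()]


print(visible_gpu_ids({"CUDA_VISIBLE_DEVICES": "0"}))
print(visible_gpu_ids({}))
```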

systemctl status nvidia-persistenced
sudo systemctl enable nvidia-persistenced

5. Verify

hxqc@xxxx:/home/www/htdocs$ nvidia-smi
Tue Aug 9 09:39:24 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04    Driver Version: 460.27.04    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  On   | 00000000:01:00.0 Off |                  N/A |
|  0%   36C    P8    17W / 250W |      1MiB / 11176MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Install TensorFlow (the GPU-enabled build)
pip install tensorflow -i https://pypi.tuna.tsinghua.edu.cn/simple

Test

$ python
>>> import tensorflow as tf
>>> tf.add(1, 2).numpy()
3
>>> hello = tf.constant('Hello, TensorFlow!')
>>> hello.numpy()
b'Hello, TensorFlow!'
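
The session above only shows that TensorFlow imports and runs; it does not show that the GPU is being used. A minimal sketch of a GPU check using `tf.config.list_physical_devices`; the wrapper `tensorflow_gpus` is a made-up name, and it returns None when TensorFlow is not installed so the script does not crash.

```python
def tensorflow_gpus():
    """Return the names of the GPUs TensorFlow can see,
    or None if TensorFlow is not installed in this interpreter."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(tensorflow_gpus())
```

An empty list means TensorFlow loaded but found no usable GPU, which usually points back at the driver, CUDA, or library path steps above.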



 

posted @ 2022-03-09 16:55  不敲代码