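This guide was tested on Debian 12 (Linux) and covers installing the NVIDIA driver and CUDA for GPUs such as the Tesla P4 and P40.
It explains how to install the NVIDIA GPU driver inside a virtual machine created on PVE (Proxmox VE) 8.2 with the q35 machine type (the i440fx machine type cannot be used).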
Check that the GPU is present
The lspci command can usually identify the family/codename of the installed NVIDIA graphics processing unit (GPU). For example:
$ lspci | grep NVIDIA
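To script this check, the sketch below filters lspci output down to NVIDIA display devices only. It assumes lspci's default one-line format, in which data-center cards such as the Tesla P4/P40 typically appear as "3D controller" and desktop cards as "VGA compatible controller"; the function name list_nvidia_gpus is hypothetical.

```shell
# Hypothetical helper: reads `lspci` output on stdin and prints only
# NVIDIA display-class devices (VGA or 3D controller). Other NVIDIA
# functions, such as the HDMI audio device on consumer cards, are skipped.
# Exit status is non-zero when nothing matches (plain grep semantics).
list_nvidia_gpus() {
  grep -E '(VGA compatible|3D) controller: NVIDIA'
}

# Usage (inside the VM):
#   lspci | list_nvidia_gpus || echo "No NVIDIA GPU visible; check PCI passthrough in PVE"
```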
Enable non-free repositories for apt
vim.tiny /etc/apt/sources.list
# Add the "contrib", "non-free" and "non-free-firmware" components to /etc/apt/sources.list, for example:
# Debian Bookworm
deb http://deb.debian.org/debian/ bookworm main contrib non-free non-free-firmware
# Users in China can switch to the Tsinghua TUNA mirror instead:
vim.tiny /etc/apt/sources.list
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib non-free non-free-firmware
deb https://mirrors.tuna.tsinghua.edu.cn/debian-security bookworm-security main contrib non-free non-free-firmware
# deb-src https://mirrors.tuna.tsinghua.edu.cn/debian-security bookworm-security main contrib non-free non-free-firmware
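If you prefer not to edit the file by hand, the awk sketch below appends any missing "contrib", "non-free" and "non-free-firmware" components to every active deb/deb-src line. It assumes the classic one-line sources.list format (not the deb822 .sources format) and writes to stdout so the result can be reviewed before it replaces the real file; add_nonfree_components is a hypothetical name.

```shell
# Hypothetical helper: reads one-line-format sources.list content on stdin
# and prints it with "contrib non-free non-free-firmware" added to every
# active "deb"/"deb-src" line that lacks them. Commented lines ("# deb ...")
# and lines that already list a component are left as they are.
add_nonfree_components() {
  awk '
    BEGIN { n = split("contrib non-free non-free-firmware", want, " ") }
    ($1 == "deb" || $1 == "deb-src") {
      for (i = 1; i <= n; i++) {
        found = 0
        # Components start after "deb URL suite", i.e. at field 4.
        for (f = 4; f <= NF; f++) if ($f == want[i]) found = 1
        if (!found) $0 = $0 " " want[i]
      }
    }
    { print }
  '
}

# Usage:
#   add_nonfree_components < /etc/apt/sources.list > /tmp/sources.list
#   diff /etc/apt/sources.list /tmp/sources.list   # review, then copy back as root
```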
Update the apt package index
apt update
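Before installing, you can confirm that apt now sees the driver package. apt-cache policy prints a "Candidate:" line, which reads "(none)", or is missing entirely, when no enabled repository provides the package (for example, when non-free is still not enabled). The helper below is a hypothetical sketch that turns that line into an exit status.

```shell
# Hypothetical helper: reads `apt-cache policy <pkg>` output on stdin.
# Succeeds (exit 0) only if a "Candidate:" line exists and is not "(none)".
has_install_candidate() {
  awk '
    $1 == "Candidate:" { found = 1; ok = ($2 != "(none)") }
    END { exit !(found && ok) }
  '
}

# Usage:
#   apt-cache policy nvidia-driver | has_install_candidate \
#     && echo "nvidia-driver is installable" \
#     || echo "nvidia-driver not found; check the non-free components"
```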
Install the GPU driver
apt install nvidia-driver firmware-misc-nonfree
Reboot the machine
Check whether the GPU driver is installed on Debian 12
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 Tesla P4 On | 00000000:03:00.0 Off | Off |
| N/A 38C P8 6W / 75W | 0MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+
The GPU driver is now installed.
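For scripted checks, nvidia-smi can also print selected fields as CSV via --query-gpu and --format=csv,noheader. The sketch below summarizes that output and fails when no GPU is listed; the function name and the choice of fields are assumptions (nvidia-smi --help-query-gpu lists the fields your driver supports).

```shell
# Hypothetical helper: reads lines such as "<index>, <name>, <driver>, <memory>"
# (from the query command shown below) and prints one summary line per GPU.
# Exits non-zero when no GPU line was seen.
summarize_gpus() {
  awk -F', ' '
    NF >= 4 { n++; printf "GPU %s: %s (driver %s, %s)\n", $1, $2, $3, $4 }
    END { exit (n == 0) }
  '
}

# Usage:
#   nvidia-smi --query-gpu=index,name,driver_version,memory.total \
#     --format=csv,noheader | summarize_gpus
```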
Install CUDA
apt install nvidia-cuda-dev nvidia-cuda-toolkit
Check whether NVIDIA CUDA is installed on Debian 12
nvcc --version
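The output of nvcc --version includes a line of the form "Cuda compilation tools, release X.Y, ...". The sed sketch below extracts the X.Y release so a script can compare it with what your framework expects; the function name is hypothetical and the parsing assumes that output format.

```shell
# Hypothetical helper: reads `nvcc --version` output on stdin and prints
# the "release X.Y" number (e.g. "11.8"); prints nothing if no such line exists.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Usage:
#   nvcc --version | cuda_release
```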
Install NVIDIA cuDNN
apt install nvidia-cudnn
When the dialog window appears, press
The NVIDIA cuDNN library has to be downloaded from NVIDIA's official website, so this step takes a while.
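Once the download finishes, one way to confirm which cuDNN version is installed is to read the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL macros from its version header (cudnn_version.h in cuDNN 8 and later; older releases kept them in cudnn.h). Where the Debian package places that header is an assumption here, so the usage comment searches /usr/include for it; cudnn_version is a hypothetical helper.

```shell
# Hypothetical helper: reads a cuDNN version header on stdin and prints
# "MAJOR.MINOR.PATCHLEVEL". Exits non-zero if CUDNN_MAJOR is not found.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { ma = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { mi = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { pa = $3 }
    END { if (ma == "") exit 1; print ma "." mi "." pa }
  '
}

# Usage (header location is an assumption; adjust the search path if needed):
#   hdr=$(find /usr/include -name 'cudnn_version*.h' 2>/dev/null | head -n 1)
#   [ -n "$hdr" ] && cudnn_version < "$hdr"
```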
Disable ECC
Use nvidia-smi | grep Tesla to find the GPU index (the number at the start of the line):
$ nvidia-smi | grep Tesla
| 0 Tesla P40 On | 00000000:03:00.0 Off | Off |
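On hosts with several GPUs, nvidia-smi -L prints one "GPU <index>: <name> (UUID: ...)" line per card, which is easier to parse than the table. The sketch below (hypothetical helper name) prints the index of every GPU whose name contains a given string.

```shell
# Hypothetical helper: reads `nvidia-smi -L` output on stdin and prints the
# index of each GPU whose line contains the string given as $1.
# Exits non-zero when no GPU matches.
gpu_indices_matching() {
  awk -v pat="$1" '
    $1 == "GPU" && index($0, pat) > 0 {
      idx = $2
      sub(/:$/, "", idx)   # "0:" -> "0"
      print idx
      n++
    }
    END { exit (n == 0) }
  '
}

# Usage:
#   nvidia-smi -L | gpu_indices_matching "Tesla P40"
```

Because the match is a plain substring test, "Tesla P4" also matches "Tesla P40"; pass the full model name when both are installed.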
nvidia-smi -i n -e 0/1 disables (0) or enables (1) ECC, where n is the GPU index.
To disable ECC on GPU 0, run:
sudo nvidia-smi -i 0 -e 0
The setting takes effect after a reboot.
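Since the change only applies after a reboot, it can help to compare the current and pending ECC modes. The sketch below assumes the ecc.mode.current and ecc.mode.pending fields of nvidia-smi --query-gpu and lists the GPUs whose pending mode differs from the current one; the helper name is hypothetical.

```shell
# Hypothetical helper: reads lines such as "<index>, <current>, <pending>"
# and reports every GPU whose ECC mode will change after the next reboot.
ecc_pending_changes() {
  awk -F', ' '
    NF >= 3 && $2 != $3 { printf "GPU %s: ECC %s -> %s after reboot\n", $1, $2, $3 }
  '
}

# Usage:
#   nvidia-smi --query-gpu=index,ecc.mode.current,ecc.mode.pending \
#     --format=csv,noheader | ecc_pending_changes
```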