Graham White: My Notes: Installing Tensorflow GPU on Fedora Linux

Friday 13 September 2019

Installing Tensorflow GPU on Fedora Linux

Following on from my previous notes on building Tensorflow for a GPU on Fedora, I find myself back at it again. I recently upgraded my GPU at home and time has moved on too, so this is my current set of notes for what I'm doing with Tensorflow on Fedora. This method differs from my previous notes in that I'm using the pre-built Tensorflow rather than building my own. I've found the Tensorflow build process so brittle that it's much easier to work with the pre-built binaries and set up my system to match their build.

In my previous blog post I benchmarked the CPU versus GPU using the Keras MNIST CNN example, so I thought it would be interesting to offer the same for this new install on my home machine. The results are:

12 minutes and 14 seconds on my CPU
1 minute and 14 seconds on my GPU

That's just over 9.9 times as fast on my GPU as on my CPU!
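
For anyone wanting to check the arithmetic, here is a small sketch that converts the two timings to seconds and divides them. The variable names are just for illustration; the result works out to roughly 9.92.

```shell
# Convert the two benchmark timings into seconds.
cpu_seconds=$(( 12 * 60 + 14 ))   # 12 minutes 14 seconds on the CPU
gpu_seconds=$(( 1 * 60 + 14 ))    # 1 minute 14 seconds on the GPU

# Shell arithmetic is integer-only, so hand the division to awk.
speedup=$(LC_ALL=C awk -v cpu="$cpu_seconds" -v gpu="$gpu_seconds" \
    'BEGIN { printf "%.2f", cpu / gpu }')

echo "CPU: ${cpu_seconds}s, GPU: ${gpu_seconds}s, speedup: ${speedup}x"
```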

Some info on my machine and config:

Custom Built Home PC
Intel Core i5-3570K CPU @ 3.40GHz (4 cores)
16GB RAM
NVidia GeForce GTX 1660 (CUDA Compute Capability 7.5)
Fedora 30 Workstation running kernel 5.2.9-200.fc30.x86_64

Background Information for NVidia Drivers
Previously, I've always used the Negativo17 repository for all my NVidia driver and CUDA needs. However, the software versions available there are too new to allow Tensorflow GPU to be installed in a way that works: the repository provides CUDA 10.1 whereas Tensorflow, currently at version 1.14, only supports CUDA 10.0. So we must use another source for the NVidia software that provides back-level versions. Fortunately, there is an official NVidia repository providing drivers and CUDA for Linux, so let's use that since it also works quite nicely with the RPM Fusion repositories. Hence, this method relies purely on RPM Fusion and the official NVidia repository and does not require or use the Negativo17 repository (although it would be possible to do so).

Install Required NVidia Driver
The RPM Fusion NVidia instructions can be used here for more detail, but in brief simply install the display drivers:

dnf install xorg-x11-drv-nvidia akmod-nvidia xorg-x11-drv-nvidia-cuda

There are some other bits you might want from this repository as well such as:

dnf install vdpauinfo libva-vdpau-driver libva-utils nvidia-modprobe

Wait for the driver to build and reboot to get things up and running.
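
A rough way to tell whether the driver has finished building, and whether it came up after the reboot, is sketched below. The function name nvidia_module_status is made up for this example; modinfo and nvidia-smi are the standard tools, and I'm assuming nvidia-smi arrived with the xorg-x11-drv-nvidia-cuda package installed above.

```shell
# Report whether the akmod-built nvidia kernel module can be found.
# Prints the driver version when modinfo knows the module, otherwise a hint.
nvidia_module_status() {
    local version
    if version=$(modinfo -F version nvidia 2>/dev/null) && [ -n "$version" ]; then
        echo "nvidia kernel module built, version ${version}"
    else
        echo "nvidia kernel module not found (akmods may still be building)"
    fi
}

nvidia_module_status

# After the reboot, nvidia-smi should list the GPU if the module loaded.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi could not talk to the driver"
else
    echo "nvidia-smi not on PATH"
fi
```

If the first check reports the module as not found, giving the build a few more minutes before rebooting is usually the safer bet.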

Install Required NVidia CUDA and Machine Learning Libraries

This step relies on the official NVidia repositories; a little more information is available in the RPM Fusion CUDA instructions.

First of all, add a new yum configuration file. Copy the following to /etc/yum.repos.d/nvidia.repo:

[nvidia-cuda]
name=nvidia-cuda
baseurl=http://developer.download.nvidia.com/compute/cuda/repos/fedora27/x86_64/
enabled=1
gpgcheck=1
gpgkey=http://developer.download.nvidia.com/compute/cuda/repos/fedora27/x86_64/7fa2af80.pub
exclude=akmod-nvidia*,kmod-nvidia*,*nvidia*,nvidia-*,cuda-nvidia-kmod-common,dkms-nvidia,nvidia-libXNVCtrl

[nvidia-machine-learning]
name=nvidia-machine-learning
baseurl=http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/
enabled=1
gpgcheck=1
gpgkey=http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/7fa2af80.pub
exclude=libcudnn7*.cuda10.1,libnccl*.cuda10.1

Note that the configuration above deliberately targets the fedora27 repository from NVidia. This is where we can find libraries built for CUDA 10.0 rather than the CUDA 10.1 libraries found in later repositories. The configuration is likely to need to change over time, but the essential message is that we can match the version of CUDA that Tensorflow requires by targeting the appropriate NVidia repository. These libraries should be binary compatible with later versions of Fedora, so this should be safe to do for some time yet.
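
Since dnf skips any repository left at enabled=0 unless it is enabled on the command line, it can be worth a quick look at what the file actually says before installing. This is only a sketch: repo_summary is a hypothetical helper, and it assumes simple key=value lines with no spaces around the equals sign.

```shell
# Print "<repo id> enabled=<value>" for every section in a .repo file.
repo_summary() {
    awk -F= '
        /^\[.*\]$/      { id = substr($0, 2, length($0) - 2) }  # remember the section name
        $1 == "enabled" { print id, "enabled=" $2 }             # report its enabled flag
    ' "$1"
}

repo_file=/etc/yum.repos.d/nvidia.repo
if [ -r "$repo_file" ]; then
    repo_summary "$repo_file"
else
    echo "$repo_file not found"
fi
```

Both sections should report enabled=1 if you want a plain dnf install to pull packages from them.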

With the above configuration in place we can now install CUDA 10.0 and the machine learning libraries required for Tensorflow GPU support, and all of the libraries get installed in the places that Tensorflow expects.

To install, run:

dnf install cuda libcudnn7 libnccl
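
Once that finishes, a quick check along these lines can confirm what actually got installed. Treat it as a sketch: check_cuda_packages is a made-up helper name, and the /usr/local/cuda-10.0 path is my assumption about where the NVidia packages put the toolkit by default.

```shell
# Ask rpm about each package named in the dnf command above.
check_cuda_packages() {
    local pkg
    for pkg in cuda libcudnn7 libnccl; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            echo "installed: $(rpm -q "$pkg")"
        else
            echo "missing:   $pkg"
        fi
    done
}

check_cuda_packages

# Assumed default prefix for the CUDA 10.0 toolkit; version.txt names the release.
if [ -r /usr/local/cuda-10.0/version.txt ]; then
    cat /usr/local/cuda-10.0/version.txt
else
    echo "no CUDA 10.0 toolkit found under /usr/local/cuda-10.0"
fi
```

If the reported CUDA version is 10.1 rather than 10.0, the repository or exclude lines are the first place to look.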

Install Tensorflow GPU

The final piece of the puzzle is to install Tensorflow GPU which is now as easy as:

pip3 install tensorflow-gpu
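
To see whether Tensorflow can actually use the GPU, something like the following should do. It is a sketch that assumes tensorflow-gpu went into the same python3 that pip3 belongs to; tf_gpu_check is a hypothetical wrapper, and tf.test.is_gpu_available() is the Tensorflow 1.x-era call (still present but deprecated in 2.x).

```shell
# Import Tensorflow and report whether it can see a CUDA-capable GPU.
tf_gpu_check() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 0
    fi
    python3 - <<'EOF'
try:
    import tensorflow as tf
except ImportError:
    print("tensorflow is not importable from this python3")
else:
    print("TensorFlow", tf.__version__)
    # True only if the CUDA libraries loaded and a GPU device was found.
    print("GPU available:", tf.test.is_gpu_available())
EOF
}

tf_gpu_check
```

A result of False usually comes with log messages naming the CUDA or cuDNN library that failed to load, which points back at the version matching described above.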

8 comments:

Anonymous said...: Thanks for this post Graham, that is very useful!
Basically it should work with negativo17 as well, right - only step 1 changes, while step 2 stays (basically, apart from package names) the same?; 23 September 2019 at 13:09:00 BST
Graham White said...: Yes, this process does work with negativo17 as well. I use the negativo17 approach on my work laptop so I have both working (with the RPM Fusion approach working at home). The only difference I can think of in addition to your observations is that the exclude line changes in the yum configuration you'll need. I have exclude=*cuda10.1* on the nvidia-machine-learning repository when using negativo17.; 23 September 2019 at 13:25:00 BST
Graham White said...: oh, and I have exclude=cuda* on the negativo17 repository in order to get the cuda packages from the NVidia repo instead.; 23 September 2019 at 13:27:00 BST
Martijn Faassen said...: A small note: your .repo file has 'enabled=0' in it. It took me a while to figure out that was why it wasn't installing anything. :); 9 December 2019 at 19:22:00 GMT
Graham White said...: Ah yes, thanks Martijn! That would be a cut and paste error since I tend to leave it disabled on a day to day basis. Thanks for letting me know.; 9 December 2019 at 19:24:00 GMT
Graham White said...: Just to confirm these notes are still working for Fedora 33 and Tensorflow-gpu 2.4.1. However, you now need to "dnf install cuda-11-0.x86_64 libcudnn8 libnccl" and hook up your DNF to the RHEL 8 repositories [1][2] because these repositories are the ones with the correct current level of CUDA in them (i.e. CUDA 11.0)

[1] http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64
[2] https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/; 16 March 2021 at 15:55:00 GMT
Unknown said...: For Fedora 33, did you still need to have exclude lines in /etc/yum.repos.d/nvidia.repo? If so, do they need to be altered to make sure that the versions of tensorflow, cuda, and cuDNN all line up correctly?

Thanks for posting this!; 18 March 2021 at 04:52:00 GMT
Graham White said...: No need for the excludes any longer. I'll paste my repo file content below but note I leave them disabled by default (enabled=0) so I can pick and choose manually on the command line when I want to pull new bits from those repositories (using --enablerepo=nvidia-cuda --enablerepo=nvidia-ml)

[nvidia-cuda]
name=NVidia Cuda
baseurl=http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64
enabled=0
gpgcheck=1
gpgkey=http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/7fa2af80.pub

[nvidia-ml]
name=NVidia Machine Learning
baseurl=https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/
enabled=0
gpgcheck=1
gpgkey=http://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/7fa2af80.pub; 18 March 2021 at 08:59:00 GMT