Friday 13 September 2019

Installing Tensorflow GPU on Fedora Linux

Following on from my previous notes on building Tensorflow for a GPU on Fedora, I find myself back at it again.  I recently upgraded my GPU at home and time has moved on too, so these are my current notes on what I'm doing with Tensorflow on Fedora.  This method differs from my previous notes in that I'm using the pre-built Tensorflow rather than building my own.  I've found the Tensorflow build process so brittle that it's much easier to work with pre-built binaries and set up my system to match their build.

In my previous blog post I benchmarked the CPU versus GPU using the Keras MNIST CNN example, so I thought it would be interesting to offer the same for this new install on my home machine.  The results are:
  • 12 minutes and 14 seconds on my CPU
  • 1 minute and 14 seconds on my GPU
That's just over 9.9 times as fast on my GPU as on my CPU!
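
For anyone wanting to check the arithmetic, here is a minimal sketch that converts the two timings quoted above into seconds and divides them.  The variable names are purely illustrative.

```python
# Timings quoted above, converted to seconds.
cpu_seconds = 12 * 60 + 14   # 12 minutes 14 seconds
gpu_seconds = 1 * 60 + 14    # 1 minute 14 seconds

# How many times faster the GPU run was than the CPU run.
speedup = cpu_seconds / gpu_seconds
print(f"GPU speedup: {speedup:.2f}x")
```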

Some info on my machine and config:
  • Custom Built Home PC
  • Intel Core i5-3570K CPU @ 3.40GHz (4 cores)
  • 16GB RAM
  • NVidia GeForce GTX 1660 (CUDA Compute Capability 7.5)
  • Fedora 30 Workstation running kernel 5.2.9-200.fc30.x86_64

Background Information for NVidia Drivers
Previously, I've always used the Negativo17 repository for all my NVidia driver and CUDA needs.  However, the software versions available there are too new to allow Tensorflow GPU to be installed in a way that works.  That repository provides CUDA 10.1, whereas Tensorflow, currently at version 1.14, only supports CUDA 10.0.  So we must use another source for the NVidia software that provides back-level versions.  Fortunately, there is an official NVidia repository providing drivers and CUDA for Linux, so let's use that since it also works quite nicely with the RPM Fusion repositories.  Hence, this method relies purely on RPM Fusion and the official NVidia repository and does not require the Negativo17 repository (although it would be possible to use it).

Install Required NVidia Driver
The RPM Fusion NVidia instructions can be used here for more detail, but in brief simply install the display drivers:
  • dnf install xorg-x11-drv-nvidia akmod-nvidia xorg-x11-drv-nvidia-cuda
There are some other bits you might want from this repository as well such as:
    • dnf install vdpauinfo libva-vdpau-driver libva-utils nvidia-modprobe
    Wait for the driver to build and reboot to get things up and running.
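
After the reboot it can be worth confirming that the driver is actually loaded.  The following is only a sketch: it assumes the nvidia-smi tool is on your PATH (I believe it arrives with the xorg-x11-drv-nvidia-cuda package, but treat that as an assumption), and the driver_summary function name is just something made up for this example.

```python
import shutil
import subprocess


def driver_summary():
    """Return nvidia-smi's GPU name and driver version, or None if unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # tool not installed or not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # often means the kernel module is not loaded yet
    return result.stdout.strip()


print(driver_summary() or "NVidia driver not detected")
```

If this prints "NVidia driver not detected" straight after installing, the akmod build may simply not have finished before the reboot.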

    Install Required NVidia CUDA and Machine Learning Libraries
    This step relies on the official NVidia repositories; a little more information is available in the RPM Fusion CUDA instructions.

    First of all, add a new yum configuration file.  Copy the following to /etc/yum.repos.d/nvidia.repo:

    [nvidia-cuda]
    name=nvidia-cuda
    baseurl=http://developer.download.nvidia.com/compute/cuda/repos/fedora27/x86_64/
    enabled=1
    gpgcheck=1
    gpgkey=http://developer.download.nvidia.com/compute/cuda/repos/fedora27/x86_64/7fa2af80.pub
    exclude=akmod-nvidia*,kmod-nvidia*,*nvidia*,nvidia-*,cuda-nvidia-kmod-common,dkms-nvidia,nvidia-libXNVCtrl

    [nvidia-machine-learning]
    name=nvidia-machine-learning
    baseurl=http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/
    enabled=1
    gpgcheck=1
    gpgkey=http://developer.download.nvidia.com/compute/machine-learning/repos/rhel7/x86_64/7fa2af80.pub
    exclude=libcudnn7*.cuda10.1,libnccl*.cuda10.1



    Note that the configuration above deliberately targets the fedora27 repository from NVidia.  This is because it is the location at which we can find CUDA 10.0 compatible libraries rather than CUDA 10.1 libraries that will be found in later repositories.  So the configuration above is likely to need to change over time but essentially the message here is that we can match the version of CUDA required by targeting the appropriate repository from NVidia.  These libraries will be binary compatible with future versions of Fedora so this action should be safe to do for some time yet.
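
Since a small slip in a .repo file can quietly stop anything from installing, here is a rough sketch of a sanity check using Python's standard configparser module (.repo files are INI-style).  The REQUIRED_KEYS checklist and the missing_keys function are my own assumptions for illustration, not anything DNF itself defines, and the sample repository is deliberately fake.

```python
import configparser

# Assumed checklist: the keys used by the repo sections in this post.
REQUIRED_KEYS = ("baseurl", "enabled", "gpgcheck", "gpgkey")


def missing_keys(repo_text):
    """Map each repo section to the checklist keys it does not define."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    report = {}
    for section in parser.sections():
        missing = [k for k in REQUIRED_KEYS if not parser.has_option(section, k)]
        if missing:
            report[section] = missing
    return report


# Hypothetical sample with the baseurl line left out on purpose.
sample = """
[example-repo]
name=example-repo
enabled=1
gpgcheck=1
gpgkey=http://example.invalid/key.pub
"""
print(missing_keys(sample))
```

To check the real file, pass in the contents of /etc/yum.repos.d/nvidia.repo instead of the sample string.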



    With the above configuration in place we can now install CUDA 10.0 and the machine learning libraries required for Tensorflow GPU support, and all of the libraries get installed in the places that Tensorflow expects.

    To install, run:
    • dnf install cuda libcudnn7 libnccl
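
A quick way to see whether the dynamic loader can find the new libraries is to try loading them with ctypes.  This is a sketch only: the sonames listed are what I would expect for CUDA 10.0 and cuDNN 7, so treat them as assumptions and adjust them for other versions.

```python
import ctypes

# Assumed sonames for CUDA 10.0 and cuDNN 7; adjust for other versions.
EXPECTED_LIBS = ("libcudart.so.10.0", "libcublas.so.10.0", "libcudnn.so.7")


def can_load(soname):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(soname)
        return True
    except OSError:
        return False


results = {name: can_load(name) for name in EXPECTED_LIBS}
for name, ok in results.items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

A library reported as MISSING may still be installed in a directory the loader does not search, so this is a hint rather than proof.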

    Install Tensorflow GPU
    The final piece of the puzzle is to install Tensorflow GPU which is now as easy as:
    • pip3 install tensorflow-gpu
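
To confirm that Tensorflow can actually see the GPU, something along these lines should do.  It is a minimal sketch: gpu_status is a made-up helper name, and it falls back to tf.test.is_gpu_available() on older releases such as 1.14 where tf.config.list_physical_devices is not present.

```python
def gpu_status():
    """Describe whether the installed Tensorflow can see a GPU."""
    try:
        import tensorflow as tf
    except ImportError as exc:
        return f"Tensorflow could not be imported: {exc}"
    config = getattr(tf, "config", None)
    list_devices = getattr(config, "list_physical_devices", None)
    if list_devices is not None:
        found = len(list_devices("GPU")) > 0   # newer releases
    else:
        found = tf.test.is_gpu_available()     # 1.x releases such as 1.14
    state = "available" if found else "not found"
    return f"Tensorflow {tf.__version__}: GPU {state}"


print(gpu_status())
```

An import error mentioning a missing libcudart or libcudnn usually points back at a CUDA version mismatch rather than a problem with Tensorflow itself.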

    8 comments:

    Anonymous said...

    Thanks for this post Graham, that is very useful!
    Basically it should work with negativo17 as well, right - only step 1 changes, while step 2 stays (basically, apart from package names) the same?

    Graham White said...

    Yes, this process does work with negativo17 as well. I use the negativo17 approach on my work laptop so I have both working (with the RPM Fusion approach working at home). The only difference I can think of in addition to your observations is that the exclude line changes in the yum configuration you'll need. I have exclude=*cuda10.1* on the nvidia-machine-learning repository when using negativo17.

    Graham White said...

    oh, and I have exclude=cuda* on the negativo17 repository in order to get the cuda packages from the NVidia repo instead.

    Martijn Faassen said...

    A small note: your .repo file has 'enabled=0' in it. It took me a while to figure out that was why it wasn't installing anything. :)

    Graham White said...

    Ah yes, thanks Martijn! That would be a cut and paste error since I tend to leave it disabled on a day to day basis. Thanks for letting me know.

    Graham White said...

    Just to confirm these notes are still working for Fedora 33 and Tensorflow-gpu 2.4.1. However, you now need to "dnf install cuda-11-0.x86_64 libcudnn8 libnccl" and hook up your DNF to the RHEL 8 repositories [1][2] because these repositories are the ones with the correct current level of CUDA in them (i.e. CUDA 11.0)

    [1] http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64
    [2] https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/

    Unknown said...

    For Fedora 33, did you still need to have exclude lines in /etc/yum.repos.d/nvidia.repo? If so, do they need to be altered to make sure that the versions of tensorflow, cuda, and cuDNN all line up correctly?

    Thanks for posting this!

    Graham White said...

    No need for the excludes any longer. I'll paste my repo file content below but note I leave them disabled by default (enabled=0) so I can pick and choose manually on the command line when I want to pull new bits from those repositories (using --enablerepo=nvidia-cuda --enablerepo=nvidia-ml)

    [nvidia-cuda]
    name=NVidia Cuda
    baseurl=http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64
    enabled=0
    gpgcheck=1
    gpgkey=http://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/7fa2af80.pub

    [nvidia-ml]
    name=NVidia Machine Learning
    baseurl=https://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/
    enabled=0
    gpgcheck=1
    gpgkey=http://developer.download.nvidia.com/compute/machine-learning/repos/rhel8/x86_64/7fa2af80.pub