TensorFlow 0.7.0 Dockerfile with Python 3

February 17, 2016

In 2015, Google released TensorFlow, a new deep learning framework and tensor library similar in many ways to Theano. I enjoy using it a lot more than Theano, mostly because of Theano's long compile times when using Keras, and because of TensorBoard. This post will not go into detail about using Theano, TensorFlow, or Keras. It is about how I built a Docker image that works with a slightly older NVIDIA card and, for my purposes, lets me use multiple GPUs in isolation, running a model on one card without affecting the other.

Updates

12/2016

Everything has since been updated to TensorFlow 0.7.0 and is now built on my base CUDA Dockerfile, which I use with either TensorFlow or Theano. Depending on my goals, Keras gives me a lot of flexibility when switching between training and compiling workflows.

Background

TensorFlow 0.7.0 had just come out, and while I had built a Docker image for Python 3 with TensorFlow 0.6.0, it used CUDA 7.0 and cuDNN v2. The newer versions are not vastly different, but my testing with Keras showed a clear advantage for CNNs when using a more recent cuDNN. A Dockerfile also avoids a lot of the permission changes that otherwise cause frustrating issues. I looked for anyone who had written up something similar and found this post (though it uses CUDA 7.0 and cuDNN v2):

installing-gpu-enabled-tensorflow-with-python-3-4-in-ec2

I had previously built a Docker image that I believe was faster to build than the ~75 minutes stated in that post, because it started from a prebuilt NVIDIA image and did not require going through the process of obtaining cuDNN access. I had not posted much about it since I was unsure how useful it was, and Python 2 Docker builds of TensorFlow with GPU support were already available. While this is not vastly different from using a prebuilt AMI or something similar, it spins up faster than reinstalling everything on multiple VMs. It also makes a multi-GPU system easier to maintain: specific GPUs can be assigned to containers through Docker, and multiple CUDA/cuDNN versions can coexist on the same system with far less hassle. The more I use Docker, the more I enjoy using it for data science and deep learning, because it makes it easy to control and isolate models and to build data pipelines.

The majority of my Dockerfile is based on the original Google Dockerfile, but the original configuration does not work with older NVIDIA cards or Python 3.

Docker Hub Link

TensorFlow 0.6.0 Dockerfile

I wanted to fix it for 0.7.0, found the blog post, and figured I would write this up.

Requirements

GPU with CUDA compute capability >= 3.0
Docker >= 1.9
NVIDIA drivers
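To sanity-check the first requirement, here is a minimal sketch of a hypothetical `meets_min_cc` bash function that compares a `major.minor` compute capability against a minimum (3.0 by default). It does not query the hardware; it assumes you already know your card's capability and pass it in by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if a "major.minor" compute capability
# (e.g. "3.0" for the GRID K2) is at least the minimum (default "3.0").
# It does not query the GPU; the capability is passed in by hand.
meets_min_cc() {
  local have="$1" need="${2:-3.0}"
  local re='^[0-9]+\.[0-9]+$'
  # Reject malformed input such as "3" or "sm_30" with status 2.
  [[ $have =~ $re && $need =~ $re ]] || return 2
  local have_major="${have%%.*}" have_minor="${have#*.}"
  local need_major="${need%%.*}" need_minor="${need#*.}"
  # Compare the major version first, then the minor version
  # (10# forces base 10 so a leading zero is not read as octal).
  if (( 10#$have_major != 10#$need_major )); then
    (( 10#$have_major > 10#$need_major ))
  else
    (( 10#$have_minor >= 10#$need_minor ))
  fi
}
```

For example, `meets_min_cc 3.0` succeeds for the GRID K2, while `meets_min_cc 2.1` fails, which is a quick way to check a value before putting it in TF_CUDA_COMPUTE_CAPABILITIES.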

Note: I am using an NVIDIA GRID K2, so a few parts of this are not the most efficient or generally recommended way to build Docker images for CUDA-capable GPUs. Some of that is just the tradeoff of getting things working with an older, not particularly great GPU.

NVIDIA Drivers

For Docker and the NVIDIA drivers, I have a setup.sh script that installs driver 352, since driver 361 seems to cause issues with this card. This kind of problem, where specific driver versions do not work properly with a particular card, is more common than you would think. A simplified form of what I install via setup.sh is:

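#!/bin/bash
# Run as root on Ubuntu (apt-get and /usr/local/bin both need it).

# Basic tools, plus software-properties-common for add-apt-repository
apt-get update && apt-get install -y sshfs curl wget git htop vim software-properties-common

# NVIDIA driver 352 from the graphics-drivers PPA (361 seems to cause issues on this card)
add-apt-repository ppa:graphics-drivers/ppa -y
apt-get update && apt-get install -y nvidia-352 nvidia-settings

# Docker and Docker Compose
curl -sSL https://get.docker.com/ | sh
curl -L "https://github.com/docker/compose/releases/download/1.5.2/docker-compose-$(uname -s)-$(uname -m)" > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

# nvidia-docker: build and install it, set up the driver volume,
# then smoke-test by running nvidia-smi inside a CUDA container
git clone https://github.com/NVIDIA/nvidia-docker
cd nvidia-docker && make install
nvidia-docker volume setup
nvidia-docker run nvidia/cuda nvidia-smi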
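Because the specific driver branch matters here, it can help to confirm which driver is actually loaded after setup.sh runs. A minimal sketch, assuming the kernel module reports itself on the usual `NVRM version:` line of /proc/driver/nvidia/version; both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the loaded NVIDIA kernel-module version
# (e.g. "352.79") from the "NVRM version:" line of the version file.
nvidia_driver_version() {
  local version_file="${1:-/proc/driver/nvidia/version}"
  [ -r "$version_file" ] || return 1
  # The version is the first "number.number" field on the NVRM line.
  awk '/^NVRM version:/ {
         for (i = 1; i <= NF; i++)
           if ($i ~ /^[0-9]+\.[0-9]+$/) { print $i; exit }
       }' "$version_file"
}

# Hypothetical check: succeed only if the loaded driver is on the expected
# branch (352 by default, since 361 seemed to cause issues on this card).
#   $1  expected branch, $2  optional path to a version file
require_driver_branch() {
  local want="${1:-352}" version
  version="$(nvidia_driver_version "${2:-}")" || return 1
  [ "${version%%.*}" = "$want" ]
}
```

On the host, `require_driver_branch 352 || echo "unexpected NVIDIA driver"` would flag an accidental upgrade to 361. Accepting a file path as the second argument keeps the check testable on a machine without a GPU.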

New Dockerfile

The new Dockerfile is not terribly different from my original and would translate easily to a normal VM on Amazon or elsewhere, but it needed a few changes, which I am listing for my own clarity. The full Dockerfile is here: Dockerfile.

First, I installed pip3 with get-pip.py instead of apt-get install python3-pip, because the severely outdated version in the official 14.04 repository caused version-number issues and a protobuf error.
Second, I used the latest version of Bazel, which seems to work. Since Bazel is a Google product, I am assuming the TensorFlow team uses a reasonably recent version of it.
Third, I set the environment variables needed for the ./configure step. Building on the official NVIDIA Dockerfile makes this straightforward once the required paths are located:
- CUDA_TOOLKIT_PATH=/usr/local/cuda-7.5
- CUDNN_INSTALL_PATH=/usr/lib/x86_64-linux-gnu
- TF_NEED_CUDA=1
- PYTHON_BIN_PATH=/usr/bin/python3
- TF_CUDA_COMPUTE_CAPABILITIES="3.0"
Lastly, I symlinked cudnn.h into CUDNN_INSTALL_PATH, since the header lives in /usr/include rather than alongside the cuDNN library:
- ln -s /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/cudnn.h
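Before running TensorFlow's ./configure, a quick check that these variables are set and point at real files can save a long failed build. A minimal sketch; `check_tf_build_env` is a hypothetical helper, and the specific files it looks for (bin/nvcc under the toolkit path, cudnn.h under the cuDNN path) are my assumptions based on the paths listed above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-configure check: report any build variable that is unset
# or points at a missing file, and return 1 if anything is wrong.
check_tf_build_env() {
  local status=0 var
  for var in CUDA_TOOLKIT_PATH CUDNN_INSTALL_PATH TF_NEED_CUDA \
             PYTHON_BIN_PATH TF_CUDA_COMPUTE_CAPABILITIES; do
    # ${!var} expands the variable whose name is stored in $var.
    if [ -z "${!var:-}" ]; then
      echo "missing: $var" >&2
      status=1
    fi
  done
  # cudnn.h must be visible inside CUDNN_INSTALL_PATH (hence the symlink).
  if [ ! -e "${CUDNN_INSTALL_PATH:-}/cudnn.h" ]; then
    echo "cudnn.h not found in ${CUDNN_INSTALL_PATH:-<unset>}" >&2
    status=1
  fi
  # nvcc is expected under the CUDA toolkit's bin directory.
  if [ ! -x "${CUDA_TOOLKIT_PATH:-}/bin/nvcc" ]; then
    echo "nvcc not found in ${CUDA_TOOLKIT_PATH:-<unset>}/bin" >&2
    status=1
  fi
  if [ ! -x "${PYTHON_BIN_PATH:-}" ]; then
    echo "python not executable: ${PYTHON_BIN_PATH:-<unset>}" >&2
    status=1
  fi
  return "$status"
}
```

In a Dockerfile, this could run in the same RUN step as the build, for example `check_tf_build_env && ./configure`, so a wrong path fails fast instead of partway through Bazel.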

The paths for all of these variables can be found with find | grep (this also works for Python 3.5).
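The find | grep search can be narrowed to just the files behind these variables. A minimal sketch with a hypothetical `locate_build_files` function that lists every nvcc, cudnn.h, and libcudnn.so* file or symlink under a root directory (default /). Stripping /bin/nvcc from an nvcc match gives a CUDA_TOOLKIT_PATH candidate, and the directory of a libcudnn.so* match gives a CUDNN_INSTALL_PATH candidate.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the files that the CUDA/cuDNN build variables
# should point at, searching under a root directory (default "/").
locate_build_files() {
  local root="${1:-/}"
  # Regular files or symlinks only (libcudnn.so is usually a symlink);
  # permission errors, e.g. under /proc, are discarded.
  find "$root" \( -type f -o -type l \) \
       \( -name nvcc -o -name cudnn.h -o -name 'libcudnn.so*' \) \
       2>/dev/null | sort
}
```

Running `locate_build_files /usr` is much faster than searching all of /, and inside the NVIDIA image it should surface candidates for the paths listed above. Printing every match, rather than the first, mirrors find | grep and leaves the choice to you when several CUDA versions are installed.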

The Dockerfile also includes a few extra steps to install Keras and make TensorFlow its default backend. I then run it with a custom script, based on TensorFlow's official docker_run_gpu.sh, that chooses which GPU each model uses and restricts CPU usage. While there are other examples of running TensorFlow on a GPU through Docker, all the ones I found used Python 2.7.
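The two steps in that paragraph can be sketched as follows; this is not the actual script. The hypothetical `run_on_gpu` launcher uses nvidia-docker with the standard CUDA_VISIBLE_DEVICES variable to hide all but one GPU from CUDA, and Docker's --cpuset-cpus flag to pin the container to specific CPUs. The hypothetical `write_keras_config` writes ~/.keras/keras.json with TensorFlow as the backend, assuming Keras reads its backend from that file as it did at the time.

```shell
#!/usr/bin/env bash
# Hypothetical launcher: run one containerized model on one GPU,
# pinned to a set of host CPUs.
#   $1   GPU index as numbered on the host (e.g. 0 or 1)
#   $2   CPU list for --cpuset-cpus (e.g. "0-3")
#   $3   image name
#   $4+  command to run inside the container
# With DRY_RUN=1 it prints the command, one argument per line, instead.
run_on_gpu() {
  if [ "$#" -lt 3 ]; then
    echo "usage: run_on_gpu GPU CPUS IMAGE [CMD...]" >&2
    return 2
  fi
  local gpu="$1" cpus="$2" image="$3"
  shift 3
  local -a cmd
  # CUDA_VISIBLE_DEVICES hides every other GPU from CUDA (and so from
  # TensorFlow); --cpuset-cpus restricts which host CPUs the container uses.
  cmd=(nvidia-docker run --rm
       -e "CUDA_VISIBLE_DEVICES=$gpu"
       --cpuset-cpus="$cpus"
       "$image" "$@")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}

# Hypothetical helper: write a Keras config that selects the TensorFlow
# backend (assumes Keras reads ~/.keras/keras.json for this setting).
write_keras_config() {
  local home="${1:-$HOME}"
  mkdir -p "$home/.keras"
  cat > "$home/.keras/keras.json" <<'EOF'
{"floatx": "float32", "epsilon": 1e-07, "backend": "tensorflow"}
EOF
}
```

For example, `run_on_gpu 0 0-3 IMAGE python3 train.py` and `run_on_gpu 1 4-7 IMAGE python3 other.py` would run two models side by side, one per GPU, with disjoint CPU sets (IMAGE is a placeholder for your image name). CUDA_VISIBLE_DEVICES hides GPUs from CUDA rather than from the container; passing only the chosen /dev/nvidiaN with Docker's --device flag is a stricter alternative.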

Any questions or need help?

I would love to help with anything related to Docker/Keras/TensorFlow. Email Me