Edit: everything has since been updated to TensorFlow 0.7.0, built on top of my base CUDA dockerfile, which I use with either TensorFlow or Theano depending on my goals (Keras makes it easy to switch between the two, which gives great flexibility in trading off training time against compile time).
In 2015 Google released TensorFlow, a new deep learning framework/tensor library similar in many ways to Theano. I enjoy using it a lot more than Theano, mainly because of Theano's long compile times when using Keras, and because of TensorBoard. This post will not go into detail about using Theano, TensorFlow, or Keras; instead it covers how I built a Docker image that works with a slightly older NVIDIA card (which for my purposes is capable of running multiple GPUs in isolation, so a model can exit on one card without affecting the other).
TensorFlow 0.7.0 just came out, and while I had built a Docker image for Python 3 with TensorFlow 0.6.0, it was using CUDA v7.0 and cuDNN v2. While there is nothing vastly different between these versions, there is definitely an advantage to using a more recent cuDNN for CNNs, from my testing with Keras. Along with this, doing the setup in a dockerfile involves far less altering of permissions, which can otherwise cause frustrating issues. I looked to see if anyone had written up anything similar and ran into this post (but it uses CUDA 7.0 and cuDNN v2):
Before that post appeared, I had already built a Docker image that I believe builds faster than the stated ~75 minutes, because it starts from a prebuilt NVIDIA image (and so skips the process of obtaining cuDNN access). I had not really posted much about it since I was unsure how useful it was, and there were Python 2 Docker builds of TensorFlow available with GPU support. While this isn't vastly different from using a prebuilt AMI or something similar, it allows quicker spin-up than reinstalling on multiple VMs, and it is more maintainable on a system with multiple GPUs running Keras, since specific GPUs can be assigned to specific containers (and different CUDA/cuDNN versions can coexist on the same system with far less hassle). Along with this, the more I use Docker, the more I enjoy using it for data science/deep learning tasks, due to the ability to control and isolate models and to create data pipelines with ease.
The majority of my dockerfile is based on the original Google dockerfile, but the original configuration doesn't work with older NVIDIA cards or Python 3.
Docker Hub Link
TensorFlow 0.6.0 Dockerfile
I wanted to update it for 0.7.0, found the blog post above, and figured I would write this up.
- GPU with CUDA compute capability >= 3.0
- Docker >= 1.9
- Nvidia drivers
Note: Since I am using the NVIDIA GRID K2, this may not be the most efficient or best way to build Docker images for GPUs with CUDA compute capability higher than 3.0.
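The Docker version requirement in the list above can be checked before starting a build. This is only a minimal sketch: `version_ge` is a hypothetical helper name (not something from setup.sh), and it assumes GNU `sort` with the `-V` version-sort option is available.

```shell
#!/usr/bin/env bash
# Sketch: check the "Docker >= 1.9" requirement.
# version_ge is a hypothetical helper, not part of the original setup.sh.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v docker >/dev/null 2>&1; then
    # "docker --version" prints something like "Docker version 1.9.1, build a34a1d5".
    docker_version="$(docker --version | sed -E 's/^[^0-9]*([0-9]+(\.[0-9]+)*).*/\1/')"
    if version_ge "$docker_version" "1.9"; then
        echo "Docker $docker_version is new enough"
    else
        echo "Docker $docker_version is older than 1.9" >&2
    fi
else
    echo "docker not found on PATH" >&2
fi
```

A plain string comparison would rank 1.10 below 1.9, which is why the sketch leans on version sort instead.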
For Docker and the NVIDIA drivers I have a setup.sh file that installs driver version 352, since 361 seems to cause issues. A simplified form of what the script installs is:
# Base tools and the graphics-drivers PPA (run as root)
apt-get update && apt-get install -y sshfs curl wget git htop vim software-properties-common
add-apt-repository ppa:graphics-drivers/ppa -y
# NVIDIA driver 352 (361 seems to cause issues)
apt-get update && apt-get install -y nvidia-352 nvidia-settings
# Docker and Docker Compose
curl -sSL https://get.docker.com/ | sh
curl -L "https://github.com/docker/compose/releases/download/1.5.2/docker-compose-$(uname -s)-$(uname -m)" > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
# nvidia-docker, then a smoke test that a container can see the GPUs
git clone https://github.com/NVIDIA/nvidia-docker
cd nvidia-docker && make install
nvidia-docker volume setup
nvidia-docker run nvidia/cuda nvidia-smi
The new dockerfile isn't terribly different from my original (and would translate easily to a normal VM on Amazon or the like), but it requires a few changes, which I am listing for my own clarity. The full dockerfile is here: Dockerfile
First, pip3 is now installed with get-pip.py instead of apt-get install python3-pip, because the version in the official 14.04 repo is severely outdated and causes an error with protobuf.
Second, using the latest version of Bazel seems to work. Tracking the latest release seems smart, since Bazel is a Google product and I am somewhat assuming the TensorFlow team uses a recent version of it.
Third, set the environment variables so that the configure step builds. This is straightforward using the official NVIDIA dockerfile and locating the required files, including:
Lastly, symlink the cudnn.h file since it is not in the default CUDNN_INSTALL_PATH:
- ln -s /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/cudnn.h
The paths for all of these environment variables can be located using find | grep (and this works for Python 3.5 as well).
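The locate-then-symlink step above can be sketched as two small shell functions. The helper names `find_first` and `link_into` are hypothetical, and the search root is an assumption (it defaults to /usr here); the real invocation is left commented out because it needs root and an actual cuDNN install.

```shell
#!/usr/bin/env bash
# Sketch: locate a CUDA/cuDNN file and link it where the build expects it.
# find_first and link_into are hypothetical helper names.

# Print the first path under root ($2, default /usr) whose basename is $1.
find_first() {
    find "${2:-/usr}" -name "$1" 2>/dev/null | head -n 1
}

# Symlink file $1 into directory $2, keeping its basename.
link_into() {
    ln -sf "$1" "$2/$(basename "$1")"
}

# Example (commented out; needs root and a real cuDNN install):
# header="$(find_first cudnn.h /usr/include)"
# link_into "$header" /usr/lib/x86_64-linux-gnu
```

With the paths from the post, the commented example amounts to the same `ln -s /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/cudnn.h` shown above, only with the header path discovered rather than hard-coded.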
The dockerfile also includes a bunch of extra steps to install Keras and have it default to TensorFlow. I then run it with a custom script, based on the official docker_run_gpu.sh from TensorFlow, that controls which GPU is used by which model and allows CPU restriction as well. While there are other examples of people using TensorFlow with GPU through Docker, all the previous examples were using Python 2.7.
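The idea of pinning a container to one GPU and a set of CPU cores can be sketched as a function that builds the relevant `docker run` flags. This is not the actual custom script: `gpu_run_flags` is a hypothetical name, the /dev/nvidia* device paths are the usual driver nodes and are an assumption about the host, and the image and script names in the commented example are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: build "docker run" flags that expose one GPU and pin CPU cores.
# gpu_run_flags is a hypothetical helper; the device paths are assumed to
# be the usual NVIDIA driver nodes on the host.

# $1 = GPU index (e.g. 0), $2 = optional cpuset such as "0-3".
gpu_run_flags() {
    local gpu="$1" cpus="${2:-}"
    local flags="--device /dev/nvidiactl:/dev/nvidiactl"
    flags="$flags --device /dev/nvidia-uvm:/dev/nvidia-uvm"
    flags="$flags --device /dev/nvidia${gpu}:/dev/nvidia${gpu}"
    if [ -n "$cpus" ]; then
        flags="$flags --cpuset-cpus=$cpus"
    fi
    echo "$flags"
}

# Example (commented out; "my-tf-image" and train.py are placeholders):
# docker run -it $(gpu_run_flags 1 4-7) my-tf-image python3 train.py
```

Only the selected card's device node is passed through, so a model crashing in one container cannot touch the other GPU. The driver libraries still have to be visible inside the container, which is handled separately (for example by the volume that nvidia-docker sets up).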
Any questions, or need help?
I would love to help with any aspect of Docker/Keras/TensorFlow.