June 23, 2024


Epicurean computer & technology

Deploy your containerized AI applications with nvidia-docker



More and more products and services are taking advantage of the modeling and prediction capabilities of AI. This article presents the nvidia-docker tool for integrating AI (Artificial Intelligence) software bricks into a microservice architecture. The main advantage explored here is the use of the host system's GPU (Graphical Processing Unit) resources to accelerate multiple containerized AI applications.

To understand the usefulness of nvidia-docker, we will begin by describing what kind of AI can benefit from GPU acceleration. Then we will present how to set up the nvidia-docker tool. Finally, we will describe what tools are available to use GPU acceleration in your applications and how to use them.

Why use GPUs in AI applications?

In the field of artificial intelligence, two main subfields are used: machine learning and deep learning. The latter is part of a larger family of machine learning methods based on artificial neural networks.

In the context of deep learning, where operations are essentially matrix multiplications, GPUs are more efficient than CPUs (Central Processing Units). This is why the use of GPUs has grown in recent years. Indeed, GPUs are considered the heart of deep learning because of their massively parallel architecture.

However, GPUs cannot execute just any program. Indeed, they use a specific language (CUDA for NVIDIA) to take advantage of their architecture. So, how can you use and communicate with GPUs from your applications?

The NVIDIA CUDA technology

NVIDIA CUDA (Compute Unified Device Architecture) is a parallel computing architecture combined with an API for programming GPUs. CUDA translates application code into an instruction set that GPUs can execute.

A CUDA SDK and libraries such as cuBLAS (Basic Linear Algebra Subroutines) and cuDNN (Deep Neural Network) have been developed to communicate easily and efficiently with a GPU. CUDA is available in C, C++ and Fortran. There are wrappers for other languages including Java, Python and R. For example, deep learning libraries like TensorFlow and Keras are based on these technologies.

Why use nvidia-docker?

Nvidia-docker addresses the needs of developers who want to add AI functionality to their applications, containerize them and deploy them on servers powered by NVIDIA GPUs.

The objective is to set up an architecture that allows the development and deployment of deep learning models in services available through an API. Thus, the utilization rate of GPU resources is optimized by making them available to multiple application instances.

In addition, we benefit from the advantages of containerized environments:

  • Isolation of the instances of each AI model.
  • Colocation of several models with their specific dependencies.
  • Colocation of the same model under several versions.
  • Consistent deployment of models.
  • Model performance monitoring.

Natively, using a GPU in a container requires installing CUDA in the container and granting privileges to access the device. With this in mind, the nvidia-docker tool was developed, allowing NVIDIA GPU devices to be exposed in containers in an isolated and secure manner.

At the time of writing this article, the latest version of nvidia-docker is v2. This version differs greatly from v1 in the following ways:

  • Version 1: nvidia-docker is implemented as an overlay to Docker. That is, to create the container, you had to use nvidia-docker (e.g. nvidia-docker run ...), which performs the actions (among others, the creation of volumes) allowing the GPU devices to be seen in the container.
  • Version 2: The deployment is simplified by replacing Docker volumes with the use of Docker runtimes. Indeed, to launch a container, the NVIDIA runtime must now be used via Docker (e.g. docker run --runtime nvidia ...).
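Assuming a hypothetical CUDA-based image named my-gpu-app, the two invocation styles can be sketched as follows:

```shell
# nvidia-docker v1: a wrapper binary creates the volumes and exposes the devices
nvidia-docker run my-gpu-app

# nvidia-docker v2: plain docker, with the NVIDIA runtime selected at launch
docker run --runtime nvidia my-gpu-app
```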

Note that due to their different architecture, the two versions are not compatible. An application written for v1 must be rewritten for v2.

Setting up nvidia-docker

The required elements to use nvidia-docker are:

  • A container runtime.
  • An available GPU.
  • The NVIDIA Container Toolkit (main component of nvidia-docker).



A container runtime is required to run the NVIDIA Container Toolkit. Docker is the recommended runtime, but Podman and containerd are also supported.

The official documentation provides the installation procedure for Docker.
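As a sketch, on most Linux distributions the procedure boils down to Docker's official convenience script (check the documentation for distribution-specific package instructions):

```shell
# Download and run Docker's official convenience install script
curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh
```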


Drivers are required to use a GPU device. In the case of NVIDIA GPUs, the drivers corresponding to a given OS can be obtained from the NVIDIA driver download page, by filling in the information about the GPU model.

The installation of the drivers is done via the executable. For Linux, use the following commands, replacing the name of the downloaded file:

chmod +x NVIDIA-Linux-x86_64-470.94.run
sudo ./NVIDIA-Linux-x86_64-470.94.run

Reboot the host machine at the end of the installation so that the installed drivers are taken into account.
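After the reboot, the driver installation can be checked with the nvidia-smi utility shipped with the driver, which lists the detected GPUs along with the driver and CUDA versions:

```shell
# Query the driver: prints the GPU model, driver version and current utilization
nvidia-smi
```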

Installing nvidia-docker

Nvidia-docker is available on the GitHub project page. To install it, follow the installation manual according to your server and architecture details.
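As an illustration, on an Ubuntu server with Docker already installed, the procedure amounts to adding the NVIDIA package repository, installing the nvidia-docker2 package and restarting the Docker daemon (refer to the manual for other distributions; the CUDA image tag below is only an example):

```shell
# Add the NVIDIA package repository (Ubuntu/Debian sketch)
distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
    sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the package and restart the Docker daemon
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

# Smoke test: the container should see the host GPU
docker run --rm --runtime nvidia nvidia/cuda:11.0-base nvidia-smi
```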

We now have an infrastructure that allows us to have isolated environments giving access to GPU resources. To use GPU acceleration in applications, several tools have been developed by NVIDIA (non-exhaustive list):

  • CUDA Toolkit: a set of tools for developing software/programs that can perform computations using both CPU, RAM, and GPU. It can be used on x86, Arm and Power platforms.
  • NVIDIA cuDNN: a library of primitives to accelerate deep learning networks and optimize GPU performance for major frameworks such as TensorFlow and Keras.
  • NVIDIA cuBLAS: a library of GPU-accelerated linear algebra subroutines.

By using these tools in application code, AI and linear algebra tasks are accelerated. With the GPUs now visible, the application is able to send the data and operations to be processed on the GPU.
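For example, once a container has GPU access, a framework such as TensorFlow discovers the device on its own; a quick check from inside an official TensorFlow GPU image (image tag assumed for illustration):

```shell
# List the GPUs visible to TensorFlow from inside the container
docker run --rm --runtime nvidia tensorflow/tensorflow:latest-gpu \
    python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```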

The CUDA Toolkit is the lowest-level option. It offers the most control (memory and instructions) to build custom applications. Libraries provide an abstraction of CUDA functionality. They allow you to focus on application development rather than the CUDA implementation.

Once all these elements are in place, the architecture using the nvidia-docker service is ready to use.

Here is a diagram to summarize everything we have seen:



We have set up an architecture allowing the use of GPU resources from our applications in isolated environments. To summarize, the architecture is composed of the following bricks:

  • Operating system: Linux, Windows …
  • Docker: isolation of the environment using Linux containers
  • NVIDIA driver: installation of the driver for the hardware in question
  • NVIDIA container runtime: orchestration of the previous three
  • Applications on Docker container:
    • CUDA
    • cuDNN
    • cuBLAS
    • TensorFlow/Keras
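Bringing these bricks together: assuming a hypothetical image my-model-api, built from a CUDA-enabled TensorFlow base image and exposing a prediction API on port 8501, a model service could be deployed and scaled like this:

```shell
# Build the application image (its Dockerfile would start from a GPU-enabled base image)
docker build -t my-model-api .

# Run two isolated instances of the same model API, sharing the host GPU
docker run -d --runtime nvidia -p 8501:8501 my-model-api
docker run -d --runtime nvidia -p 8502:8501 my-model-api
```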

NVIDIA continues to develop tools and libraries around AI technologies, with the goal of establishing itself as a leader. Other technologies may complement nvidia-docker or may be more suitable than nvidia-docker depending on the use case.

