<aside> 💡 This post is just a quick learning/verification exercise; the Dockerfile has not been studied in depth, so if you run into problems, ask GPT. The image here is based on nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

</aside>

The project layout for this walkthrough looks like this (reconstructed from the listings later in the post):

DockerTest/
├── dockerfile
├── environment.yml
├── requirements.txt
└── WorkSpace/
    └── TestCuda/
        └── main.py

The environment is based on NVIDIA's official image (you can pick a different one), with Miniconda installed on top.

# Use the official NVIDIA CUDA image as base
FROM nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04

# Set non-interactive installation to avoid hanging prompts
ARG DEBIAN_FRONTEND=noninteractive

# Install necessary packages and clean up in one layer
RUN apt-get update && apt-get install -y wget bzip2 ca-certificates sudo \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Add a new user 'myuser' with sudo access
RUN useradd -m myuser && echo "myuser:myuser" | chpasswd && adduser myuser sudo

# Switch to the new user
USER myuser
WORKDIR /home/myuser

# Download and install Miniconda
RUN mkdir -p ~/miniconda3 \
    && wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh \
    && bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 \
    && rm -rf ~/miniconda3/miniconda.sh

# Set up the Conda environment
ENV PATH=/home/myuser/miniconda3/bin:$PATH

# Make RUN commands use the new environment
SHELL ["/bin/bash", "--login", "-c"]

# Initialize Conda for both bash and zsh shells
RUN conda init bash && conda init zsh && source ~/.bashrc

# Create a new Conda virtual environment named 'myenv'
RUN conda create -n myenv python=3.9 -y

# Install PyTorch and related packages using 'conda run' to ensure the environment is used
RUN conda run -n myenv conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y

# Ensure the virtual environment is activated in future shells
SHELL ["conda", "run", "-n", "myenv", "/bin/bash", "--login", "-c"]

# Copy the requirements.txt file to the container. Place your requirements.txt file in the same directory as the Dockerfile.
COPY requirements.txt /home/myuser/requirements.txt

# Install additional Python packages via pip in the Conda environment
RUN pip install -r /home/myuser/requirements.txt
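
The requirements.txt itself isn't reproduced in this post; judging from the `pip list` output further down, it probably listed something along these lines (a guess at the contents, not the original file; pin versions as needed):

# requirements.txt (sketch, inferred from the pip list later in the post)
prefetch_generator
tensorboardX
scikit-learn
tqdm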

To run this Dockerfile and mount a local code directory into the container, you need a few steps.

First, make sure the Dockerfile is saved locally, then use Docker commands to build the image and run a container with the local directory mounted.

Here is a step-by-step walkthrough:

Build the image

Open a terminal, change into the directory that contains the Dockerfile, and run the following command to build the Docker image:


> sudo docker build -t my_cuda113_env .
[+] Building 187.4s (15/15) FINISHED                                                                                                     docker:default
 => [internal] load build definition from dockerfile                                                                                               0.0s
 => => transferring dockerfile: 1.88kB                                                                                                             0.0s
 => [internal] load metadata for docker.io/nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04                                                             1.8s
 => [internal] load .dockerignore                                                                                                                  0.0s
 => => transferring context: 2B                                                                                                                    0.0s
 => [ 1/10] FROM docker.io/nvidia/cuda:11.3.1-cudnn8-devel-ubuntu20.04@sha256:bf709b30743d7db557f22a6e10414660f4a4e3cd6ab91b2f9534556c728e5cf7     0.0s
 => [internal] load build context                                                                                                                  0.0s
 => => transferring context: 38B                                                                                                                   0.0s
 => CACHED [ 2/10] RUN apt-get update && apt-get install -y wget bzip2 ca-certificates sudo     && apt-get clean     && rm -rf /var/lib/apt/lists  0.0s
 => CACHED [ 3/10] RUN useradd -m myuser && echo "myuser:myuser" | chpasswd && adduser myuser sudo                                                 0.0s
 => CACHED [ 4/10] WORKDIR /home/myuser                                                                                                            0.0s
 => CACHED [ 5/10] RUN mkdir -p ~/miniconda3     && wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/mi  0.0s
 => [ 6/10] RUN conda init bash && conda init zsh && source ~/.bashrc                                                                              0.6s
 => [ 7/10] RUN conda create -n myenv python=3.9 -y                                                                                               11.6s
 => [ 8/10] RUN conda run -n myenv conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch -y           147.1s
 => [ 9/10] COPY requirements.txt /home/myuser/requirements.txt                                                                                    0.1s 
 => [10/10] RUN pip install -r /home/myuser/requirements.txt                                                                                      16.0s 
 => exporting to image                                                                                                                            10.0s 
 => => exporting layers                                                                                                                           10.0s 
 => => writing image sha256:1714c3e4906c7087ae1bf1b9d021ee1964467367a327cebab57980077945091e                                                       0.0s 
 => => naming to docker.io/library/my_cuda113_env                                                                                                  0.0s 

~/Documents/DockerTest                                                                                                               took 3m 8s py base
> 

> sudo docker images
REPOSITORY        TAG              IMAGE ID       CREATED         SIZE
my_cuda113_env    latest           1714c3e4906c   3 minutes ago   17.7GB
hello-world       latest           d2c94e258dcb   16 months ago   13.3kB
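
The image ends up fairly large (17.7 GB), presumably dominated by the CUDA devel base image and the conda environments. If you want to see which layers contribute what, the standard `docker history` command breaks the image down layer by layer:

sudo docker history my_cuda113_env:latest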

Run the container


> sudo docker run -it -v /home/rylynn/Documents/DockerTest/WorkSpace:/home/myuser/workspace/project -p 20011:22 --gpus all --name hyperdta my_cuda113_env
# Explanation of the command:
#  `docker run` starts a new container.
#  `-it` runs the container in interactive mode and allocates a pseudo-terminal.
#  `--gpus all` gives the container access to all GPUs on the host. This requires a Docker version with GPU support and the NVIDIA Container Toolkit installed on the host.
#  `--name hyperdta` names the container so it is easier to manage later.
#  `-v /home/rylynn/Documents/DockerTest/WorkSpace:/home/myuser/workspace/project` mounts the local directory into the container at `/home/myuser/workspace/project`; changes made to files under that path inside the container show up in the local directory, and vice versa.
#  `-p 20011:22` maps host port 20011 to port 22 in the container (useful if you later set up SSH access).
#  `my_cuda113_env` is the name of the image built earlier.
==========
== CUDA ==
==========

CUDA Version 11.3.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

(base) myuser@6f352176c34d:~$ conda env list
# conda environments:
#
base                  *  /home/myuser/miniconda3
myenv                    /home/myuser/miniconda3/envs/myenv

(base) myuser@6f352176c34d:~$ conda activate myenv
(myenv) myuser@6f352176c34d:~$ pip list
Package            Version
------------------ ---------
Brotli             1.0.9
certifi            2024.8.30
charset-normalizer 3.3.2
idna               3.7
joblib             1.4.2
mkl_fft            1.3.10
mkl_random         1.2.7
mkl-service        2.4.0
numpy              1.26.4
packaging          24.1
pillow             10.4.0
pip                24.2
prefetch_generator 1.0.3
protobuf           5.28.0
PySocks            1.7.1
requests           2.32.3
scikit-learn       1.5.1
scipy              1.13.1
setuptools         72.1.0
tensorboardX       2.6.2.2
threadpoolctl      3.5.0
torch              1.12.1
torchaudio         0.12.1
torchvision        0.13.1
tqdm               4.66.5
typing_extensions  4.11.0
urllib3            2.2.2
wheel              0.43.0
(myenv) myuser@6f352176c34d:~$ cd workspace/project/
(myenv) myuser@6f352176c34d:~/workspace/project$ ls
TestCuda
(myenv) myuser@6f352176c34d:~/workspace/project$ cd TestCuda/
(myenv) myuser@6f352176c34d:~/workspace/project/TestCuda$ ls
main.py
(myenv) myuser@6f352176c34d:~/workspace/project/TestCuda$ python main.py 
CUDA版本: 11.3
Pytorch版本: 1.12.1
显卡是否可用: 可用
显卡数量: 1
是否支持BF16数字格式: 支持
当前显卡型号: NVIDIA GeForce RTX 4060 Ti
当前显卡的CUDA算力: (8, 9)
当前显卡的总显存: 15.69818115234375 GB
是否支持TensorCore: 支持
当前显卡的显存使用率: 0.0 %
(myenv) myuser@6f352176c34d:~/workspace/project/TestCuda$ 
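
The main.py used above isn't included in the post; the following is a minimal sketch that would produce output along those lines using only standard torch.cuda calls (the actual script may differ):

# main.py (sketch)
import torch

print("CUDA版本:", torch.version.cuda)
print("Pytorch版本:", torch.__version__)
print("显卡是否可用:", "可用" if torch.cuda.is_available() else "不可用")
print("显卡数量:", torch.cuda.device_count())
print("是否支持BF16数字格式:", "支持" if torch.cuda.is_bf16_supported() else "不支持")

props = torch.cuda.get_device_properties(0)
print("当前显卡型号:", torch.cuda.get_device_name(0))
print("当前显卡的CUDA算力:", torch.cuda.get_device_capability(0))
print("当前显卡的总显存:", props.total_memory / 1024 ** 3, "GB")
# Tensor Cores are present on GPUs with compute capability 7.0 or higher
print("是否支持TensorCore:", "支持" if props.major >= 7 else "不支持")
print("当前显卡的显存使用率:", torch.cuda.memory_allocated(0) / props.total_memory * 100, "%")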

Save the image


# Ctrl-P then Ctrl-Q detaches from the container (it keeps running)
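# To get back into a detached container: sudo docker attach hyperdta
# Or open a second shell in it without attaching: sudo docker exec -it hyperdta bash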
> sudo docker ps -al
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS         PORTS                                     NAMES
6f352176c34d   my_cuda113_env   "/opt/nvidia/nvidia_…"   2 minutes ago   Up 2 minutes   0.0.0.0:20011->22/tcp, :::20011->22/tcp   hyperdta
# Stop the container
> sudo docker stop hyperdta
hyperdta
> sudo docker ps -al
CONTAINER ID   IMAGE            COMMAND                  CREATED         STATUS                     PORTS     NAMES
6f352176c34d   my_cuda113_env   "/opt/nvidia/nvidia_…"   4 minutes ago   Exited (0) 9 seconds ago             hyperdta

# Save the image to a tar archive
> sudo docker save -o torch121_cu113.tar my_cuda113_env:latest
> ll -alh
total 17G
drwxrwxr-x  3 rylynn rylynn 4.0K 9月   6 18:08 .
drwxr-xr-x 10 rylynn rylynn 4.0K 9月   6 17:15 ..
-rw-rw-r--  1 rylynn rylynn 1.9K 9月   6 17:51 dockerfile
-rw-rw-r--  1 rylynn rylynn 1.6K 9月   6 17:22 environment.yml
-rw-rw-r--  1 rylynn rylynn  171 9月   6 17:27 requirements.txt
-rw-------  1 root   root    17G 9月   6 18:08 torch121_cu113.tar
drwxrwxr-x  3 rylynn rylynn 4.0K 9月   6 17:56 WorkSpace
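
# (optional) The archive can be compressed on the fly; `docker load` accepts gzip-compressed archives directly:
#   sudo docker save my_cuda113_env:latest | gzip > torch121_cu113.tar.gz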

# Remove the image
> sudo docker rmi my_cuda113_env:latest
Error response from daemon: conflict: unable to remove repository reference "my_cuda113_env:latest" (must force) - container 6f352176c34d is using its referenced image 1714c3e4906c
> sudo docker rm hyperdta
hyperdta
> sudo docker rmi my_cuda113_env:latest
Untagged: my_cuda113_env:latest
Deleted: sha256:1714c3e4906c7087ae1bf1b9d021ee1964467367a327cebab57980077945091e

> sudo docker images
REPOSITORY        TAG              IMAGE ID       CREATED         SIZE
my_cuda_env       latest           e567a7c2759b   3 months ago    9.17GB
mortals/codeenv   conda-cuda11.8   2e3c9ce8b870   15 months ago   8GB
hello-world       latest           d2c94e258dcb   16 months ago   13.3kB

# Load the image from the tar file
> sudo docker load -i torch121_cu113.tar
Loaded image: my_cuda113_env:latest
> sudo docker images
REPOSITORY        TAG              IMAGE ID       CREATED          SIZE
my_cuda113_env    latest           1714c3e4906c   28 minutes ago   17.7GB
my_cuda_env       latest           e567a7c2759b   3 months ago     9.17GB
mortals/codeenv   conda-cuda11.8   2e3c9ce8b870   15 months ago    8GB
hello-world       latest           d2c94e258dcb   16 months ago    13.3kB

# Test the container again
> sudo docker run -it -v /home/rylynn/Documents/DockerTest/WorkSpace:/home/myuser/workspace/project -p 20011:22 --gpus all --name hyperdta my_cuda113_env

==========
== CUDA ==
==========

CUDA Version 11.3.1

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

(base) myuser@38b513481f4b:~$ conda env list
# conda environments:
#
base                  *  /home/myuser/miniconda3
myenv                    /home/myuser/miniconda3/envs/myenv

(base) myuser@38b513481f4b:~$ conda activate myenv
(myenv) myuser@38b513481f4b:~$ cd workspace/project/TestCuda/
(myenv) myuser@38b513481f4b:~/workspace/project/TestCuda$ python main.py 
CUDA版本: 11.3
Pytorch版本: 1.12.1
显卡是否可用: 可用
显卡数量: 1
是否支持BF16数字格式: 支持
当前显卡型号: NVIDIA GeForce RTX 4060 Ti
当前显卡的CUDA算力: (8, 9)
当前显卡的总显存: 15.69818115234375 GB
是否支持TensorCore: 支持
当前显卡的显存使用率: 0.0 %
(myenv) myuser@38b513481f4b:~/workspace/project/TestCuda$ 

> sudo docker stop hyperdta
hyperdta
> sudo docker rm hyperdta
hyperdta