Table of Contents
======================================================
0. Situation
1. NGC
2. Building the Official NGC TensorFlow Image and Container
3. Running the Container
4. TensorFlow 1 Test
======================================================
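0. Situation
OS: Ubuntu 20.04 LTS
GPU devices: RTX 3090 x 2
I ended up having to run TensorFlow 1.x code on an RTX 3090 machine.
The lowest CUDA version the RTX 3090 works with is 11.1. (Reference: link)
However, TensorFlow 1.x only runs in a CUDA 10 environment. (Reference: link)
In other words, the requirements conflict: what is needed is a TensorFlow 1 build compiled against CUDA 11, and the stock TensorFlow 1.x releases do not provide one.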
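The mismatch described above can be written down as a simple version comparison. The following is only a minimal sketch: the helper names are hypothetical, and the version numbers are the ones quoted in this post rather than values queried from a driver or library.

```python
# Hypothetical helpers; the version numbers come from the text above.

def parse_version(text):
    """Turn a string such as '11.1' into (11, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def cuda_is_new_enough(framework_cuda, gpu_min_cuda):
    """Return True if the CUDA version a framework was built with meets the GPU's minimum."""
    return parse_version(framework_cuda) >= parse_version(gpu_min_cuda)


RTX3090_MIN_CUDA = "11.1"  # minimum CUDA version stated in this post

# Stock TensorFlow 1.x (CUDA 10) versus a build compiled against CUDA 11.
print(cuda_is_new_enough("10.0", RTX3090_MIN_CUDA))
print(cuda_is_new_enough("11.2", RTX3090_MIN_CUDA))
```

Comparing tuples of integers rather than raw strings avoids mistakes such as "9.2" sorting after "11.1".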
========================================================================
For an overview of compatibility between the GPU device, NVIDIA driver, CUDA, Python, and TensorFlow, I recommend this post: https://sseongju1.tistory.com/4
========================================================================
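1. NGC
The solution I will introduce here is NGC (Nvidia GPU Cloud).
NGC is NVIDIA's platform for GPU software packages. It offers official NVIDIA TensorFlow containers as well as up-to-date models for benchmarking. The TensorFlow containers on NGC are designed to make full use of NVIDIA GPUs and come with everything TensorFlow needs already set up. For a detailed description of NGC, see the official website.
The key point: the TensorFlow 1 container provided by NGC is built in a CUDA 11.2 environment, as the nvcc -V output from inside the container shows.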
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
$ docker run -u $(id -u):$(id -g) args...
root@191aab62c04f:/# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
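If you want to check the CUDA release from a script instead of reading the output by eye, the "release X.Y" field can be extracted with a regular expression. This is a rough sketch; the function name is hypothetical, and the sample text is copied from the output above.

```python
import re


def cuda_release(nvcc_output):
    """Return (major, minor) from the 'release X.Y' field of `nvcc -V` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Sample lines copied from the nvcc -V output shown above.
sample = """nvcc: NVIDIA (R) Cuda compiler driver
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0"""

print(cuda_release(sample))
```

Inside the container, the real text could be obtained with subprocess.run(["nvcc", "-V"], capture_output=True, text=True).stdout, assuming nvcc is on the PATH.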
2. Building the Official NGC TensorFlow Image and Container
Prerequisite 1: install Docker
Prerequisite 2: install nvidia-docker2 (this step alone is enough)
So, to run TensorFlow 1 on the RTX 3090, I decided to use the official NGC TensorFlow image. (The official image can be used as-is, but since I needed some additional packages, I built my own image from a Dockerfile.)
# Dockerfile
######### 1. Official NGC TensorFlow image
ARG BASE_IMAGE=nvcr.io/nvidia/tensorflow:20.10-tf1-py3
FROM $BASE_IMAGE
######### 2. Replace the NVIDIA repository GPG key, then install the required packages
RUN apt-key del 7fa2af80 \
 && apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/3bf863cc.pub \
 && apt-get update \
 && apt-get install -y build-essential \
 && pip install --upgrade pip \
 && pip install setuptools \
                wheel \
                matplotlib \
                pandas \
                numpy \
                pymysql \
                sqlalchemy \
                jupyter \
                scipy \
 && jupyter notebook --generate-config \
# Optional: to set a notebook password, add lines like these to the Jupyter config:
#   from IPython.lib import passwd
#   password = passwd("password")
#   c.NotebookApp.password = password
######### 3. Use Bash as the default shell for Jupyter terminals
 && echo 'c.NotebookApp.terminado_settings = { "shell_command": ["/bin/bash"] }' \
    >> /root/.jupyter/jupyter_notebook_config.py
Reference for item 1 in the Dockerfile: link
Reference for item 2 in the Dockerfile: link
Reference for item 3 in the Dockerfile: link
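The post does not show the build command itself. As a hedged sketch, the image could be built from the Dockerfile above with a standard docker build invocation. The image name tf1-ngc and the tag 20.10 are placeholders chosen for illustration, not names from the original setup.

```python
import shlex


def docker_build_command(image, tag, context=".", dockerfile="Dockerfile"):
    """Assemble a `docker build` argument list for the given image name and tag."""
    return ["docker", "build", "-t", f"{image}:{tag}", "-f", dockerfile, context]


# "tf1-ngc" and "20.10" are placeholder values, not taken from the original post.
cmd = docker_build_command("tf1-ngc", "20.10")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command as a list means it can be passed directly to subprocess.run(cmd, check=True) without going through a shell.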
3. Running the Container
$ docker run -it --rm --gpus all --runtime=nvidia -p 8988:8988 -v "/home/dir:/home/dir" <image-name>:<tag> bash
________ _______________
___ __/__________________________________ ____/__ /________ __
__ / _ _ \_ __ \_ ___/ __ \_ ___/_ /_ __ /_ __ \_ | /| / /
_ / / __/ / / /(__ )/ /_/ / / _ __/ _ / / /_/ /_ |/ |/ /
/_/ \___//_/ /_//____/ \____//_/ /_/ /_/ \____/____/|__/
WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.
To avoid this, run the container by specifying your user's userid:
$ docker run -u $(id -u):$(id -g) args...
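The warning in the banner suggests passing your own user and group ID so that files created in the mounted volume are not owned by root. Below is a minimal sketch that assembles the same run command with an optional -u flag. The function name and the image name my-image:latest are hypothetical, and os.getuid() and os.getgid() are only available on Unix-like hosts.

```python
import os
import shlex


def docker_run_command(image, host_dir, port=8988, as_current_user=True):
    """Assemble the `docker run` arguments used above, optionally adding -u uid:gid."""
    cmd = [
        "docker", "run", "-it", "--rm",
        "--gpus", "all", "--runtime=nvidia",
        "-p", f"{port}:{port}",
        "-v", f"{host_dir}:{host_dir}",
    ]
    if as_current_user:
        # Same effect as `-u $(id -u):$(id -g)` from the warning text.
        cmd += ["-u", f"{os.getuid()}:{os.getgid()}"]
    cmd += [image, "bash"]
    return cmd


# "my-image:latest" is a placeholder for the image built from the Dockerfile.
print(" ".join(shlex.quote(arg) for arg in docker_run_command("my-image:latest", "/home/dir")))
```

One caveat: the Dockerfile above writes the Jupyter configuration under /root, so a container started as a non-root user may not pick that configuration up.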
4. TensorFlow 1 Test
import tensorflow as tf
print(tf.__version__)
# 1. check if tensorflow gpu is installed
print(tf.test.gpu_device_name())
# 2. tensorflow gpu test
tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
root@9cc5942d5a91:/workspace# python
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2022-05-12 08:31:56.425101: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
>>> tf.__version__
'1.15.4'
>>> print(tf.test.gpu_device_name())
2022-05-12 08:32:00.792228: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2022-05-12 08:32:00.793078: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5af91b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-05-12 08:32:00.793101: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2022-05-12 08:32:00.795571: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2022-05-12 08:32:00.941439: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5afb200 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-05-12 08:32:00.941464: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2022-05-12 08:32:00.941469: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (1): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2022-05-12 08:32:00.942351: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2022-05-12 08:32:00.942956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2022-05-12 08:32:00.942979: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-05-12 08:32:00.951084: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2022-05-12 08:32:00.954832: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-05-12 08:32:00.956465: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-05-12 08:32:00.964051: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2022-05-12 08:32:00.965947: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2022-05-12 08:32:00.966057: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-05-12 08:32:00.968509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0, 1
2022-05-12 08:32:00.968955: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-05-12 08:32:01.909406: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-12 08:32:01.909439: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0 1
2022-05-12 08:32:01.909444: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N N
2022-05-12 08:32:01.909447: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 1: N N
2022-05-12 08:32:01.911201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/device:GPU:0 with 21965 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2022-05-12 08:32:01.911981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/device:GPU:1 with 16811 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)
/device:GPU:0
>>> tf.test.is_gpu_available(cuda_only=False, min_cuda_compute_capability=None)
2022-05-12 08:32:29.198005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2022-05-12 08:32:29.198520: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1665] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2022-05-12 08:32:29.198543: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2022-05-12 08:32:29.198556: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2022-05-12 08:32:29.198566: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2022-05-12 08:32:29.198575: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2022-05-12 08:32:29.198583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.11
2022-05-12 08:32:29.198594: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2022-05-12 08:32:29.198603: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2022-05-12 08:32:29.200663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1793] Adding visible gpu devices: 0, 1
2022-05-12 08:32:29.200690: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1206] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-05-12 08:32:29.200696: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] 0 1
2022-05-12 08:32:29.200699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 0: N N
2022-05-12 08:32:29.200702: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1225] 1: N N
2022-05-12 08:32:29.202326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/device:GPU:0 with 21965 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2022-05-12 08:32:29.202828: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] Created TensorFlow device (/device:GPU:1 with 16811 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)
True
Completed successfully!
Reference: TensorFlow 1 GPU test code
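The log above is long, so it can help to pull out only the lines that confirm each GPU was registered. The sketch below assumes the "Created TensorFlow device" line format seen in this log; the function name is hypothetical, and other TensorFlow versions may word the line differently.

```python
import re

# Matches the "Created TensorFlow device" lines as they appear in the log above.
DEVICE_LINE = re.compile(
    r"Created TensorFlow device \((/device:GPU:\d+) with (\d+) MB memory\)"
    r".*name: ([^,]+),.*compute capability: (\d+\.\d+)"
)


def registered_gpus(log_text):
    """Return one dict per GPU that TensorFlow reported as created."""
    gpus = []
    for line in log_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            gpus.append({
                "device": match.group(1),
                "memory_mb": int(match.group(2)),
                "name": match.group(3),
                "compute_capability": match.group(4),
            })
    return gpus


# Two lines copied from the log above.
sample_log = (
    "2022-05-12 08:32:01.911201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] "
    "Created TensorFlow device (/device:GPU:0 with 21965 MB memory) -> physical GPU "
    "(device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)\n"
    "2022-05-12 08:32:01.911981: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1351] "
    "Created TensorFlow device (/device:GPU:1 with 16811 MB memory) -> physical GPU "
    "(device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)\n"
)

for gpu in registered_gpus(sample_log):
    print(gpu)
```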