Note that this post is a summary of the referenced post.
Symptom: importing TensorFlow prints the following message.
"successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero"
2021-09-30 06:01:45.056379: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-09-30 06:01:45.109357: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.110002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:01:45.110055: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.110641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:01:45.110786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:01:45.111547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:01:45.112221: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:01:45.112385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:01:45.113269: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:01:45.113946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:01:45.116032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:01:45.116081: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.116689: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.117274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.117852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.118436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:01:45.118778: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-09-30 06:01:45.140487: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2021-09-30 06:01:45.141043: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b624f0 executing computations on platform Host. Devices:
2021-09-30 06:01:45.141055: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2021-09-30 06:01:45.268408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.275972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.276820: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4c15030 executing computations on platform CUDA. Devices:
2021-09-30 06:01:45.276836: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:01:45.276840: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:01:45.277066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.277710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:01:45.277760: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.278429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:01:45.278458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:01:45.278468: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:01:45.278477: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:01:45.278486: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:01:45.278500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:01:45.278514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:01:45.278530: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:01:45.278574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.279231: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.279897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.280563: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.281211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:01:45.281237: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:01:45.282765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-30 06:01:45.282777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1
2021-09-30 06:01:45.282782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N
2021-09-30 06:01:45.282785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N
2021-09-30 06:01:45.282876: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.283543: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.284214: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.284874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22687 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-09-30 06:01:45.285229: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.285885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22803 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)
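The message seems to say that no NUMA node is assigned to the GPU: sysfs reports -1 for the device's NUMA node, so TensorFlow falls back to node 0. Note the "I" prefix on each line: this is logged at info level, not as an error, and both GPUs are still created in the log above.
So I assigned the node by hand, following the referenced post.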
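The fallback that the log message describes can be sketched as follows. This is only an illustration of the behavior stated in the message, not TensorFlow's actual code, and the function name `effective_numa_node` is made up for this example.

```python
def effective_numa_node(sysfs_value: str) -> int:
    """Mimic the fallback described in the log message (illustration only)."""
    node = int(sysfs_value.strip())
    # A negative value means no NUMA node is recorded for the device,
    # so node zero is used instead.
    return 0 if node < 0 else node


print(effective_numa_node("-1\n"))  # negative value: falls back to node 0
print(effective_numa_node("1\n"))   # valid value: used as-is
```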
1.1. Check the devices
$ lspci | grep -i nvidia
# check this one
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
# check this one
03:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
1.2. Check the devices in sysfs
$ cd /sys/bus/pci/devices
$ ll
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:01:00.0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:01:00.1 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:02:00.0 -> ../../../devices/pci0000:00/0000:00:1b.0/0000:02:00.0/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:03:00.0 -> ../../../devices/pci0000:00/0000:00:1b.4/0000:03:00.0/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:03:00.1 -> ../../../devices/pci0000:00/0000:00:1b.4/0000:03:00.1/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:05:00.0 -> ../../../devices/pci0000:00/0000:00:1c.7/0000:05:00.0/
2. Check the NUMA assignment (-1 means no node has been assigned)
$ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
-1
$ cat /sys/bus/pci/devices/0000\:03\:00.0/numa_node
-1
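The same check can be done for every NVIDIA device at once. Below is a minimal Python sketch, assuming the standard sysfs layout where each device directory has `vendor` and `numa_node` files. The function name `read_numa_nodes` is made up for this example, and `0x10de` is NVIDIA's PCI vendor ID. The result also includes the GPUs' audio functions (such as 01:00.1), since they share the vendor ID.

```python
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID for NVIDIA


def read_numa_nodes(base="/sys/bus/pci/devices", vendor=NVIDIA_VENDOR_ID):
    """Return {pci_address: numa_node} for every device of the given vendor."""
    nodes = {}
    for dev in sorted(Path(base).iterdir()):
        vendor_file = dev / "vendor"
        numa_file = dev / "numa_node"
        # Skip entries that do not expose both files.
        if not (vendor_file.is_file() and numa_file.is_file()):
            continue
        if vendor_file.read_text().strip().lower() != vendor:
            continue
        nodes[dev.name] = int(numa_file.read_text().strip())
    return nodes


if __name__ == "__main__":
    base = Path("/sys/bus/pci/devices")
    if base.is_dir():
        for address, node in read_numa_nodes(base).items():
            print(address, node)
```

Any address printed with -1 is a candidate for the assignment in the next step.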
3. Assign the NUMA node
### Main
$ echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0
$ echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:03\:00.0/numa_node
0
#### Extra (a command that does this for every detected device)
$ for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a $a/numa_node; done
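The loop above writes 0 to every PCI device. A narrower variant, sketched in Python below, touches only NVIDIA devices that currently report a negative value. The function name `assign_numa_node` and its `dry_run` parameter are made up for this example. It assumes the same sysfs layout as above and must run as root to actually write. As far as I know, values written to sysfs do not survive a reboot, so this would need to be rerun after each boot.

```python
from pathlib import Path


def assign_numa_node(base="/sys/bus/pci/devices", node=0, vendor="0x10de", dry_run=True):
    """Write `node` to numa_node for the vendor's devices that report a negative value.

    Returns the PCI addresses that were (or, in dry-run mode, would be) changed.
    """
    changed = []
    for dev in sorted(Path(base).iterdir()):
        vendor_file = dev / "vendor"
        numa_file = dev / "numa_node"
        if not (vendor_file.is_file() and numa_file.is_file()):
            continue
        if vendor_file.read_text().strip().lower() != vendor:
            continue
        if int(numa_file.read_text().strip()) >= 0:
            continue  # already has a node; leave it alone
        if not dry_run:
            numa_file.write_text(f"{node}\n")  # needs root on a real system
        changed.append(dev.name)
    return changed
```

Calling `assign_numa_node()` with the default `dry_run=True` only lists what would change; pass `dry_run=False` (as root) to write the value.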
4. Verify the NUMA assignment
$ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0
$ cat /sys/bus/pci/devices/0000\:03\:00.0/numa_node
0
5. Run TensorFlow again and confirm
2021-09-30 06:16:26.425963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-09-30 06:16:26.474219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:16:26.474847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:16:26.474989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:16:26.475760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:16:26.476429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:16:26.476603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:16:26.477497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:16:26.478204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:16:26.480328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:16:26.482672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:16:26.483023: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-09-30 06:16:26.504654: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2021-09-30 06:16:26.507280: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5053250 executing computations on platform Host. Devices:
2021-09-30 06:16:26.507342: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2021-09-30 06:16:26.625317: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5105d90 executing computations on platform CUDA. Devices:
2021-09-30 06:16:26.625337: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:16:26.625341: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (1): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:16:26.626046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:16:26.626606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:16:26.626632: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:16:26.626640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:16:26.626648: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:16:26.626655: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:16:26.626662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:16:26.626668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:16:26.626675: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:16:26.628835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:16:26.628856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:16:26.630177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-30 06:16:26.630187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1
2021-09-30 06:16:26.630191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N N
2021-09-30 06:16:26.630195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: N N
2021-09-30 06:16:26.631893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22686 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-09-30 06:16:26.632673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22803 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)
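Rather than scanning the log by eye, the NUMA lines can be counted. Below is a minimal sketch, assuming the log has been captured to a string (for example by redirecting stderr to a file and reading it back). The helper name `count_numa_messages` is made up for this example.

```python
NUMA_MESSAGE = "successful NUMA node read from SysFS had negative value (-1)"


def count_numa_messages(log_text: str) -> int:
    """Count the log lines that contain the NUMA fallback message."""
    return sum(NUMA_MESSAGE in line for line in log_text.splitlines())
```

A count of 0 on the new log means the message no longer appears.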
Further reference: https://lifesaver.codes/answer/successful-numa-node-read-from-sysfs-had-negative-value-1-42738