
[Python] Fixing the NUMA Node 0 Error Message

by 성딱이 2021. 9. 30.

Note: this post is a summary of the referenced post.

 

Symptom: importing TensorFlow prints the following message, over and over.

"successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero"

2021-09-30 06:01:45.056379: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-09-30 06:01:45.109357: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.110002: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:01:45.110055: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.110641: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:01:45.110786: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:01:45.111547: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:01:45.112221: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:01:45.112385: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:01:45.113269: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:01:45.113946: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:01:45.116032: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:01:45.116081: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.116689: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.117274: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.117852: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.118436: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:01:45.118778: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-09-30 06:01:45.140487: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2021-09-30 06:01:45.141043: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4b624f0 executing computations on platform Host. Devices:
2021-09-30 06:01:45.141055: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2021-09-30 06:01:45.268408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.275972: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.276820: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4c15030 executing computations on platform CUDA. Devices:
2021-09-30 06:01:45.276836: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:01:45.276840: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:01:45.277066: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.277710: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:01:45.277760: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.278429: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:01:45.278458: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:01:45.278468: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:01:45.278477: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:01:45.278486: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:01:45.278500: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:01:45.278514: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:01:45.278530: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:01:45.278574: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.279231: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.279897: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.280563: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.281211: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:01:45.281237: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:01:45.282765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-30 06:01:45.282777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1
2021-09-30 06:01:45.282782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N
2021-09-30 06:01:45.282785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N
2021-09-30 06:01:45.282876: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.283543: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.284214: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.284874: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22687 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-09-30 06:01:45.285229: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-09-30 06:01:45.285885: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22803 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)

 

 

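As far as I can tell, the message means TensorFlow could not get a valid NUMA node for the GPUs: the kernel reports each device's NUMA node as -1 in sysfs (no node assigned), so TensorFlow falls back to NUMA node 0. The lines are logged at the "I" (INFO) level, and the log still ends with both GPUs created as TensorFlow devices.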
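To make the fallback concrete, here is a small shell sketch of the logic the message itself describes (the check logged from cuda_gpu_executor.cc): read the device's numa_node file and, if the value is negative, use node 0 instead. This is only an illustration, not TensorFlow's actual code, and the function name effective_numa_node is hypothetical.

```shell
#!/usr/bin/env bash
# Illustration only (NOT TensorFlow's real code): mimic the fallback that the
# log message describes. Read a PCI device's numa_node file; if the kernel
# reports a negative value (-1 = no node assigned), treat it as NUMA node 0.
effective_numa_node() {
  local file="$1" node
  node=$(cat "$file") || return 1
  if [ "$node" -lt 0 ]; then
    # Same idea as the TensorFlow message: there must be at least one node.
    echo "numa_node is $node in $file, falling back to node 0" >&2
    echo 0
  else
    echo "$node"
  fi
}
```

For example, effective_numa_node /sys/bus/pci/devices/0000:01:00.0/numa_node would print 0 (plus a note on stderr) on the machine above before the fix.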

So, following the post, I assigned the NUMA node by hand.

 

1.1. Check the devices
$ lspci | grep -i nvidia
# check: GPU 0
01:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)

# check: GPU 1
03:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
03:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
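If you only need the GPU addresses, a minimal sketch like the following filters the lspci output. It assumes pciutils' lspci with the -D flag, which always prints the PCI domain, so addresses look like 0000:01:00.0 and match the directory names under /sys/bus/pci/devices. It also assumes GPUs show up as "VGA compatible controller" or "3D controller"; the function name nvidia_gpu_addrs is hypothetical.

```shell
#!/usr/bin/env bash
# Print the PCI addresses of NVIDIA GPUs (VGA or 3D controllers) from
# `lspci -D` output read on stdin. The NVIDIA HD audio functions
# (e.g. 01:00.1) are skipped because they are "Audio device" lines.
nvidia_gpu_addrs() {
  awk '/NVIDIA/ && /(VGA compatible|3D) controller/ { print $1 }'
}

# Only run the live query if lspci (package: pciutils) is installed.
if command -v lspci >/dev/null 2>&1; then
  lspci -D | nvidia_gpu_addrs
fi
```

On the machine above this should list 0000:01:00.0 and 0000:03:00.0, the two paths checked in step 2.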

 

1.2. Check the device entries in sysfs
$ cd /sys/bus/pci/devices
$ ll    # ll is Ubuntu's default alias for ls -alF
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:01:00.0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:01:00.1 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.1/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:02:00.0 -> ../../../devices/pci0000:00/0000:00:1b.0/0000:02:00.0/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:03:00.0 -> ../../../devices/pci0000:00/0000:00:1b.4/0000:03:00.0/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:03:00.1 -> ../../../devices/pci0000:00/0000:00:1b.4/0000:03:00.1/
lrwxrwxrwx 1 root root 0 Oct 13 08:33 0000:05:00.0 -> ../../../devices/pci0000:00/0000:00:1c.7/0000:05:00.0/
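The same GPUs can also be found from sysfs alone, without lspci. This is a sketch under two assumptions: PCI vendor ID 0x10de is NVIDIA, and a class code starting with 0x03 is a display controller. The function name list_nvidia_gpus and its root argument are hypothetical; the argument only exists so the function can be tried against a fake directory tree.

```shell
#!/usr/bin/env bash
# List NVIDIA display controllers directly from sysfs.
# Assumptions: vendor ID 0x10de = NVIDIA; class 0x03xxxx = display controller
# (0x0300xx = VGA compatible, 0x0302xx = 3D controller).
# First argument: sysfs root, defaulting to /sys.
list_nvidia_gpus() {
  local root="${1:-/sys}" dev
  for dev in "$root"/bus/pci/devices/*; do
    # Skip anything that is not a readable PCI device entry.
    [ -r "$dev/vendor" ] && [ -r "$dev/class" ] || continue
    [ "$(cat "$dev/vendor")" = "0x10de" ] || continue
    case "$(cat "$dev/class")" in
      0x03*) basename "$dev" ;;   # print e.g. 0000:01:00.0
    esac
  done
}
```

Running list_nvidia_gpus with no argument scans the real /sys/bus/pci/devices shown above.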

 

 

2. Check the current NUMA assignment (-1 means no NUMA node is assigned)
$ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
-1

$ cat /sys/bus/pci/devices/0000\:03\:00.0/numa_node
-1
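To see the value for every PCI device at once rather than one cat per path, a quick sketch is to let grep print each numa_node file with its name. grep -H forces the file-name prefix and the pattern "." matches any non-empty line; the function name show_numa_nodes and its root argument (default /sys) are hypothetical and only there to allow testing on a fake tree.

```shell
#!/usr/bin/env bash
# Print "<path>:<value>" for the numa_node file of every PCI device.
# A value of -1 means the kernel has no NUMA node for that device.
show_numa_nodes() {
  local root="${1:-/sys}"
  grep -H . "$root"/bus/pci/devices/*/numa_node
}
```

Each output line looks like .../0000:01:00.0/numa_node:-1, so the affected devices are easy to spot.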

 

 

3. Assign the NUMA node
### Main
$ echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0

$ echo 0 | sudo tee -a /sys/bus/pci/devices/0000\:03\:00.0/numa_node
0

#### Extra (the same command for every detected PCI device)
$ for a in /sys/bus/pci/devices/*; do echo 0 | sudo tee -a "$a/numa_node"; done
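The loop above writes 0 to every PCI device. On a machine with a single NUMA node that makes no difference, but on a multi-socket machine it would also overwrite devices that already have a valid node. A more cautious sketch (not from the original post) only touches the addresses you pass and only where the current value is -1. The function name set_missing_numa_to_zero and its root argument are hypothetical; on a real system it must run as root.

```shell
#!/usr/bin/env bash
# Set numa_node to 0 only for the given PCI addresses, and only where the
# kernel currently reports -1. Devices with a valid node are left alone.
# First argument: sysfs root (normally /sys); the rest: PCI addresses.
# Must run as root on a real system (writing to sysfs needs privileges).
set_missing_numa_to_zero() {
  local root="$1" addr file
  shift
  for addr in "$@"; do
    file="$root/bus/pci/devices/$addr/numa_node"
    if [ "$(cat "$file")" = "-1" ]; then
      echo 0 > "$file" && echo "set $addr numa_node to 0"
    fi
  done
}
```

For the machine above, a script containing this function plus the line set_missing_numa_to_zero /sys 0000:01:00.0 0000:03:00.0 could be run with sudo bash. Keep in mind that sysfs is rebuilt by the kernel at boot, so the value goes back to -1 after a reboot; one option is to run such a script from root's crontab with @reboot.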

 

 

4. Verify the NUMA assignment
$ cat /sys/bus/pci/devices/0000\:01\:00.0/numa_node
0

$ cat /sys/bus/pci/devices/0000\:03\:00.0/numa_node
0
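If you want the check in step 4 to be scriptable, a small sketch can print each value and return a non-zero exit status while any of the given devices still reports -1. The function name report_numa and its root argument are hypothetical.

```shell
#!/usr/bin/env bash
# Print "<pci address> <numa_node>" for each address given and return
# non-zero if any of them is unreadable or still negative (-1).
# First argument: sysfs root (normally /sys); the rest: PCI addresses.
report_numa() {
  local root="$1" addr node status=0
  shift
  for addr in "$@"; do
    node=$(cat "$root/bus/pci/devices/$addr/numa_node") || { status=1; continue; }
    echo "$addr $node"
    [ "$node" -ge 0 ] || status=1
  done
  return "$status"
}
```

For example, report_numa /sys 0000:01:00.0 0000:03:00.0 && echo ok prints both values and, after step 3, "ok".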

 

 

5. Run TensorFlow again to check (the NUMA messages no longer appear)
2021-09-30 06:16:26.425963: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-09-30 06:16:26.474219: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:16:26.474847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:16:26.474989: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:16:26.475760: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:16:26.476429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:16:26.476603: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:16:26.477497: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:16:26.478204: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:16:26.480328: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:16:26.482672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:16:26.483023: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2021-09-30 06:16:26.504654: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3699850000 Hz
2021-09-30 06:16:26.507280: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5053250 executing computations on platform Host. Devices:
2021-09-30 06:16:26.507342: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
2021-09-30 06:16:26.625317: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5105d90 executing computations on platform CUDA. Devices:
2021-09-30 06:16:26.625337: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:16:26.625341: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): NVIDIA GeForce RTX 3090, Compute Capability 8.6
2021-09-30 06:16:26.626046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:01:00.0
2021-09-30 06:16:26.626606: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties:
name: NVIDIA GeForce RTX 3090 major: 8 minor: 6 memoryClockRate(GHz): 1.86
pciBusID: 0000:03:00.0
2021-09-30 06:16:26.626632: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:16:26.626640: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-09-30 06:16:26.626648: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-09-30 06:16:26.626655: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-09-30 06:16:26.626662: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-09-30 06:16:26.626668: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-09-30 06:16:26.626675: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-09-30 06:16:26.628835: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
2021-09-30 06:16:26.628856: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-09-30 06:16:26.630177: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-30 06:16:26.630187: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0 1
2021-09-30 06:16:26.630191: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N N
2021-09-30 06:16:26.630195: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1:   N N
2021-09-30 06:16:26.631893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22686 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:01:00.0, compute capability: 8.6)
2021-09-30 06:16:26.632673: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22803 MB memory) -> physical GPU (device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:03:00.0, compute capability: 8.6)
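Rather than reading the whole log by eye, a sketch like this counts the NUMA lines in a run. TensorFlow writes these logs to stderr, so redirect it with 2>&1 before piping. The function name count_numa_msgs is hypothetical, and the usage below assumes the TensorFlow 1.x setup from the log, where tf.test.is_gpu_available() exists and initializes the GPUs.

```shell
#!/usr/bin/env bash
# Count how many "negative NUMA node" lines appear on stdin.
# grep -c prints 0 and exits 1 when nothing matches; "|| true" keeps the
# 0 output without signalling an error.
count_numa_msgs() {
  grep -c 'successful NUMA node read from SysFS had negative value' || true
}
```

Usage: python -c "import tensorflow as tf; print(tf.test.is_gpu_available())" 2>&1 | count_numa_msgs. After the fix this should print 0, matching the log above.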

 

 

Further reading: https://lifesaver.codes/answer/successful-numa-node-read-from-sysfs-had-negative-value-1-42738

 

 

 
