tensorflow-gpu CUPTI errors
Describe the problem
I upgraded TensorFlow from 1.x to 2.0 and tried to run the same model that I had run successfully with TF 1.x.
I already had the NVIDIA drivers and CUDA toolkit installed, so I just installed tensorflow-gpu in a new virtual environment. On running the model, I got the following output with errors/warnings:
2019-10-03 00:16:43.397431: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-10-03 00:16:45.096352: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Not found: ./bin/ptxas not found Relying on driver to perform ptx compilation. This message will be only logged once.
2019-10-03 00:17:16.303969: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
2019-10-03 00:17:16.304314: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcupti.so.10.0'; dlerror: libcupti.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2019-10-03 00:17:16.304328: W tensorflow/core/profiler/lib/profiler_session.cc:192] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
2019-10-03 00:17:17.242794: I tensorflow/core/platform/default/device_tracer.cc:588] Collecting 0 kernel records, 0 memcpy records.
2019-10-03 00:17:17.276983: E tensorflow/core/platform/default/device_tracer.cc:70] CUPTI error: CUPTI could not be loaded or symbol could not be found.
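The dlerror line in the log names both the missing file and the directories that were searched, so one way to narrow this down is to check each LD_LIBRARY_PATH entry for that file. The following is a minimal sketch, not part of the original report; the helper name find_lib_dirs is hypothetical.

```shell
# Library name taken from the dlerror message in the log.
lib="libcupti.so.10.0"

# Hypothetical helper: print every directory of a colon-separated
# search path that contains the given library file.
find_lib_dirs() {
    # $1: library file name, $2: colon-separated search path
    printf '%s\n' "$2" | tr ':' '\n' | while IFS= read -r dir; do
        [ -n "$dir" ] || continue      # skip empty entries such as '::'
        if [ -e "$dir/$1" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

find_lib_dirs "$lib" "${LD_LIBRARY_PATH:-}"
```

If this prints nothing, none of the listed directories contains the library, which would match the "cannot open shared object file" message.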
To fix these warnings, I ran the NVIDIA/CUDA-related commands listed on the tensorflow-gpu installation page:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
sudo apt-get install --no-install-recommends nvidia-driver-418
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.6.2.24-1+cuda10.0 \
    libcudnn7-dev=7.6.2.24-1+cuda10.0
sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0 \
    libnvinfer-dev=5.1.5-1+cuda10.0
After running this, I got the following error:
E: Version '7.6.2.24-1+cuda10.1' for 'libcudnn7' was not found
E: Unable to locate package libcudnn7-dev
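When apt reports that a pinned version was not found, it can help to see which versions the configured repositories actually offer. This is a small sketch of that check, assuming a Debian/Ubuntu system; it is not a step from the original report.

```shell
# Package name taken from the apt error message.
pkg="libcudnn7"

# apt-cache policy prints the installed version, the candidate version,
# and the version table with the repository each version comes from.
if command -v apt-cache >/dev/null 2>&1; then
    apt-cache policy "$pkg"
else
    echo "apt-cache not found; this check assumes a Debian/Ubuntu system" >&2
fi
```

The version string in the install command has to match one of the entries in that version table exactly, including the +cuda suffix.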
nvidia-smi outputs the following:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   40C    P8     4W / 260W |  11012MiB / 11019MiB |      2%      Default |
+-----------------------------------------------------------------------------+
Any other info / logs
Actually, the biggest issue, which might not be related to these errors, is that the exact same model performs much worse under TF 2.0. I am not sure whether these errors are the cause, although I doubt it. I should probably open a separate issue for this, but I am not sure which category suits it best.
Answer
ymodak
You need to add the CUPTI path to your LD_LIBRARY_PATH environment variable.
It should be something like /usr/local/cuda/extras/CUPTI/lib64.
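As a rough sketch of that suggestion, the directory can be prepended to LD_LIBRARY_PATH in the shell that launches Python. The variable name CUPTI_DIR is only illustrative, and the path is the one from the answer, so adjust it if CUDA is installed elsewhere.

```shell
# Path suggested in the answer; adjust if CUDA lives somewhere else.
CUPTI_DIR="/usr/local/cuda/extras/CUPTI/lib64"

# Prepend the directory, avoiding a dangling ':' when the variable is unset.
export LD_LIBRARY_PATH="${CUPTI_DIR}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

echo "$LD_LIBRARY_PATH"
```

The export only lasts for the current shell session; adding the line to a shell startup file such as ~/.bashrc makes it persistent. Note that the log in the question already lists this directory in LD_LIBRARY_PATH, so it may also be worth checking that the directory really contains libcupti.so.10.0. If /usr/local/cuda points at a different CUDA version, a versioned path such as /usr/local/cuda-10.0/extras/CUPTI/lib64 (an assumption based on the CUDA 10.0 paths in the log) might be the one to add.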