tensorflow-gpu CUPTI errors

<em>Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template</em>

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 18.04):
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version: 2.0
  • Python version:3
  • Installed using virtualenv? pip? conda?: pip
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: GeForce RTX 2080 Ti

Describe the problem

I upgraded TensorFlow from 1.x to 2.0 and tried to run the same model that I had run successfully with TF 1.x.

I already had the NVIDIA drivers and CUDA toolkit installed, so I just installed tensorflow-gpu in a new virtual environment. On running the model, I got the following output with errors/warnings:

2019-10-03 00:16:43.397431: I tensorflow/stream_executor/platform/default/] Successfully opened dynamic library
2019-10-03 00:16:45.096352: W tensorflow/stream_executor/cuda/] Not found: ./bin/ptxas not found Relying on driver to perform ptx compilation. This message will be only logged once.
2019-10-03 00:17:16.303969: I tensorflow/core/profiler/lib/] Profiler session started.
2019-10-03 00:17:16.304314: W tensorflow/stream_executor/platform/default/] Could not load dynamic library ''; dlerror: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.0/lib64::/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
2019-10-03 00:17:16.304328: W tensorflow/core/profiler/lib/] Encountered error while starting profiler: Unavailable: CUPTI error: CUPTI could not be loaded or symbol could not be found.
2019-10-03 00:17:17.242794: I tensorflow/core/platform/default/] Collecting 0 kernel records, 0 memcpy records.
2019-10-03 00:17:17.276983: E tensorflow/core/platform/default/] CUPTI error: CUPTI could not be loaded or symbol could not be found.

To fix these warnings, I ran the NVIDIA/CUDA-related commands listed on the tensorflow-gpu installation page:

Add NVIDIA package repositories

wget
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys
sudo apt-get update
wget
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

Install NVIDIA driver

sudo apt-get install --no-install-recommends nvidia-driver-418

Reboot. Check that GPUs are visible using the command: nvidia-smi

Install development and runtime libraries (~4GB)

sudo apt-get install --no-install-recommends

Install TensorRT. Requires that libcudnn7 is installed above.

sudo apt-get install -y --no-install-recommends libnvinfer5=5.1.5-1+cuda10.0

After running this, I got the following error:

: Version '' for 'libcudnn7' was not found
E: Unable to locate package libcudnn7-dev

nvidia-smi outputs the following:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   40C    P8     4W / 260W |  11012MiB / 11019MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

Provide the exact sequence of commands / steps that you executed before running into the problem

Any other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

Actually, the biggest issue, which might not be related to this error, is that the performance of the exact same model is much worse under TF 2.0. I am not sure whether these errors might be the cause, although I doubt it. I guess I should open another issue for this, but I am not sure which category is most suitable.


Answer questions ymodak

You need to add the CUPTI path to your LD_LIBRARY_PATH environment variable. It should be something like /usr/local/cuda/extras/CUPTI/lib64.
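
The log in the question shows /usr/local/cuda/extras/CUPTI/lib64 already listed in LD_LIBRARY_PATH, so it may be worth checking whether a CUPTI library file actually exists in any of the listed directories. The following is a minimal sketch for doing that, not an official TensorFlow or NVIDIA tool. The helper name find_cupti_libraries is hypothetical, and the glob pattern libcupti.so* is an assumption about how the CUPTI shared library is named on Linux.

```python
import glob
import os


def find_cupti_libraries(ld_library_path):
    """Map each directory in a path string to the libcupti files found in it.

    Hypothetical helper for illustration. The "libcupti.so*" pattern is an
    assumption about the library's file name on Linux.
    """
    results = {}
    for directory in ld_library_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries, such as the one produced by "::"
        pattern = os.path.join(directory, "libcupti.so*")
        results[directory] = sorted(glob.glob(pattern))
    return results


if __name__ == "__main__":
    found = find_cupti_libraries(os.environ.get("LD_LIBRARY_PATH", ""))
    for directory, libraries in found.items():
        print(directory, "->", libraries or "no libcupti found")
```

If every directory reports no match, the path entry points at a location that does not contain the library, and the entry (or the CUDA installation) likely needs to be corrected.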

