rachellim/docs 0

TensorFlow documentation

rachellim/tensorflow 0

An Open Source Machine Learning Framework for Everyone

pull request comment tensorflow/tensorflow

Clarified the DispatchServer creation process in tf.data distribute service docs

Thanks for the ping. @aaudiber - can you review?

kvignesh1420

comment created time in 10 days

issue comment tensorflow/tensor2tensor

Stuck after printing 'Successfully opened dynamic library libcublas.so.10.0'

@sanjoy, can you reassign this to someone on the GPU team to investigate?

zhez6

comment created time in 12 days

issue comment tensorflow/tensor2tensor

Stuck after printing 'Successfully opened dynamic library libcublas.so.10.0'

@harishkashyap - what version of tensorflow are you using? If you use an older version, does it still work? (Trying to diagnose whether it's an issue with your CUDA installation or a regression in TF.)

zhez6

comment created time in 13 days

issue comment tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

@aynesss, that's a different problem: your generator dataset has two components (output_types = (tf.float32, tf.float32)), but the function you supply to map takes only one argument. Assuming your generator produces an (image, label) tuple, you probably want to rewrite your distort_simclr function to take two arguments (def distort_simclr(image, label): ...).

See the documentation of dataset.map for more details ("The input signature of map_func is determined by the structure of each element in this dataset.")
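
As a minimal sketch of the point made in this comment: when each dataset element is a two-component tuple, the function passed to map receives two arguments. The generator `gen` and the body of `distort_simclr` below are hypothetical placeholders (the real distortion logic is not shown in this thread); only the two-argument signature matters here. The `output_types` argument mirrors the usage quoted in the comment.

```python
import numpy as np
import tensorflow as tf


def gen():
    # Hypothetical generator: yields (image, label) pairs.
    for i in range(3):
        yield np.full((2, 2), i, dtype=np.float32), np.float32(i)


# Two components per element, matching output_types = (tf.float32, tf.float32).
dataset = tf.data.Dataset.from_generator(
    gen, output_types=(tf.float32, tf.float32))


def distort_simclr(image, label):
    # Placeholder transformation; the real distortion is an assumption.
    # The function takes one argument per dataset component.
    return image * 2.0, label


dataset = dataset.map(distort_simclr)

# Collect the mapped elements as NumPy values.
results = [(image.numpy(), label.numpy()) for image, label in dataset]
```

A map function written as `def distort_simclr(image): ...` would fail on this dataset, because map unpacks the tuple into separate positional arguments.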

dineshdharme

comment created time in 18 days

started MaggieAppleton/digital-gardeners

started time in 21 days

issue comment tensorflow/tensorflow

DatasetVariantWrapper "No unary variant device copy function found"

Ah, I see that there are two subtly different issues here:

(1)

    dataset = tf.data.Dataset.range(10)

    @tf.function
    def f():
      for i in tf.range(1):
        for x in dataset:
          tf.print(x)

    f()

(2)

    @tf.function
    def f():
      dataset = tf.data.Dataset.range(10)
      for i in tf.range(1):
        for x in dataset:
          tf.print(x)
    f()

(1) has been fixed as of TF 2.3 (see #34519).

(2) is still an open issue.

Depending on how your tf.function is defined, you might still be seeing this bug. I'm reopening this issue to track (2).

mwalmsley

comment created time in 25 days

issue comment tensorflow/tensorflow

tf.range + for x,y in dataset issue

This specific issue (dataset defined outside the @tf.function, and used inside a loop in the tf.function) has been fixed as of TF 2.3. There are still remaining issues with a subtly different setup (dataset defined inside the @tf.function and used inside a loop in the tf.function), see #34112.

SSSxCCC

comment created time in 25 days

issue closed tensorflow/tensorflow

DatasetVariantWrapper "No unary variant device copy function found"

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes (but running on an official image)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04, running the Ubuntu 18.04 tensorflow-gpu Docker image provided by GCloud
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): tensorflow-gpu-2.0.0
  • Python version: 3.5
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: 10.0
  • GPU model and memory: P100 x1

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior

I'm running Tensorflow on GCloud using the gcr.io/deeplearning-platform-release/tf2-gpu.2-0 image. I'm attempting to train a subclassed tf.keras.Model.

I can train my image both on my local machine and GCloud, provided the --runtime=nvidia arg is NOT provided. When I add that argument, the GCloud image fails with the following error:

tensorflow.python.framework.errors_impl.InternalError: No unary variant device copy function found for direction: 1 and Variant type_index: tensorflow::data::(anonymous namespace)::DatasetVariantWrapper [[{{node MapDataset/_8}}]] [Op:__inference_get_input_516]

Describe the expected behavior

I expect TensorFlow to continue to run successfully when using the --runtime=nvidia arg, i.e. enabling CUDA.

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Because I have no understanding of the origin of this internal error, I am not sure how to create such a minimal case. Please advise.

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

Full failing log:

    File "/home/zoobot/zoobot/estimators/run_estimator.py", line 33, in run_estimator
      train_dataset = input_utils.get_input(config=config.train_config)
    File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in call
      result = self._call(*args, **kwds)
    File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/eager/def_function.py", line 526, in _call
      return self._concrete_stateful_fn._filtered_call(canon_args, canon_kwds)  # pylint: disable=protected-access
    File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
      self.captured_inputs)
    File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
      ctx, args, cancellation_manager=cancellation_manager)
    File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
      ctx=ctx)
    File "/root/miniconda3/lib/python3.5/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
      six.raise_from(core._status_to_exception(e.code, message), None)
    File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InternalError: No unary variant device copy function found for direction: 1 and Variant type_index: tensorflow::data::(anonymous namespace)::DatasetVariantWrapper [[{{node MapDataset/_8}}]] [Op:__inference_get_input_516]

Full successful log, without using --runtime=nvidia:

    2019-11-08 23:34:49.753010: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
    2019-11-08 23:34:59.531666: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
    2019-11-08 23:34:59.531739: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
    2019-11-08 23:34:59.531778: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
    2019-11-08 23:34:59.532239: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    2019-11-08 23:34:59.540322: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2300000000 Hz
    2019-11-08 23:34:59.540738: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55cf105c9a70 executing computations on platform Host. Devices:
    2019-11-08 23:34:59.540776: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
    WARNING:root:Loading multiple tfrecords with interleaving, shuffle=False
    WARNING:root:Loading multiple tfrecords with interleaving, shuffle=False
    2019-11-08 23:35:37.148322: I tensorflow/core/profiler/lib/profiler_session.cc:184] Profiler session started.
    2019-11-08 23:35:37.148432: E tensorflow/core/platform/default/device_tracer.cc:70] CUDA error: <unknown>
    ...
    Epoch 1/100
    ...

closed time in a month

mwalmsley

issue comment tensorflow/tensorflow

DatasetVariantWrapper "No unary variant device copy function found"

This issue should be fixed as of TF 2.3.

@Ryan-Rudes, @ppwwyyxx, @mwalmsley, etc. - can you check whether you're still encountering this, and reopen if so?

mwalmsley

comment created time in a month

issue comment tensorflow/tensorflow

tf.range + for x,y in dataset issue

This issue should be fixed as of TF 2.3.

SSSxCCC

comment created time in a month

started commaai/openpilot

started time in a month

issue comment tensorflow/tensor2tensor

Stuck after printing 'Successfully opened dynamic library libcublas.so.10.0'

Based on https://github.com/tensorflow/tensorflow/issues/38100 and https://github.com/f90/FactorGAN/issues/1, I suspect this may be a problem with your CUDA installation.

zhez6

comment created time in 2 months

issue comment tensorflow/tensor2tensor

Stuck after printing 'Successfully opened dynamic library libcublas.so.10.0'

@ramonemiliani93, what version of tensorflow are you running? I was not able to reproduce this issue with the following dataset:

    import numpy as np
    import tensorflow as tf

    def make_tensor(sizes):
      # Fill a tensor of the given shape with 1.0, 2.0, 3.0, ...
      return np.asarray([f * 1.0 for f in range(1, np.prod(sizes) + 1)]).reshape(sizes)

    filter = make_tensor([1, 1, 1, 3, 3])
    x = make_tensor([10, 2, 3, 1, 3])
    dataset = tf.data.Dataset.from_tensors((x, filter))
    # Apply conv3d inside map; each element is an (input, filter) pair.
    dataset = dataset.map(lambda input, filter: tf.nn.conv3d(input, filter, strides=[1, 1, 1, 1, 1], padding="VALID"))
    print(list(dataset))

So it doesn't seem to be an issue with using tf.nn.conv3d inside map. Can you provide a minimal repro?

zhez6

comment created time in 2 months

issue comment tensorflow/tensor2tensor

Stuck after printing 'Successfully opened dynamic library libcublas.so.10.0'

@ramonemiliani93, what version of tensorflow are you running?

zhez6

comment created time in 2 months
