profile
Gaurav Jain (jaingaurav) · Google, Mountain View, CA · gauravjain.org · Software Engineer at Google Brain

jaingaurav/Diamond 0

Diamond is a Python daemon that collects system metrics and publishes them to Graphite (and others). It is capable of collecting CPU, memory, network, I/O, load and disk metrics. Additionally, it features an API for implementing custom collectors that gather metrics from almost any source.

jaingaurav/django-grappelli 0

A jazzy skin for the Django Admin-Interface (official repository).

jaingaurav/docs 0

TensorFlow documentation

jaingaurav/dotfiles 0

YADR - The best vim,git,zsh plugins and the cleanest vimrc you've ever seen

jaingaurav/example 0

Go example projects

jaingaurav/models 0

Models and examples built with TensorFlow

jaingaurav/pyutmp 0

Python binding to Un*x UTMP functionality

issue comment tensorflow/tensorflow

tensorflow 2.0 variable slice assign_add not supported

@ziofil: This is definitely something we want to address, as it is a usability issue. We're still trying to allocate time to work on it. In the meantime, contributions are welcome.

motionlife

comment created time in 6 minutes

Pull request review comment tensorflow/tensorflow

[Features] DLPack functions

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""DLPack modules for Tensorflow"""
+
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from tensorflow.python import pywrap_tfe
+from tensorflow.python.util.tf_export import tf_export
+
+
+# tf.dlpack.to_dlpack/from_dlpack doesn't work. How to fix?

I think you should expose this as tf.experimental.dlpack.*. The tf.experimental.tensorrt CL shows a bit of what is needed: https://github.com/tensorflow/tensorflow/commit/7eee3d7db64af14e9a9ded49031505fa135861b5.

Running bazel test tensorflow/tools/api/tests:api_compatibility_test -- --update_goldens=True should regenerate the necessary golden files.
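
For illustration, a minimal usage sketch, assuming the functions land under tf.experimental.dlpack as suggested above:

import tensorflow as tf

t = tf.constant([1.0, 2.0, 3.0])
capsule = tf.experimental.dlpack.to_dlpack(t)  # PyCapsule wrapping a DLManagedTensor
restored = tf.experimental.dlpack.from_dlpack(capsule)
print(restored.numpy())  # [1. 2. 3.]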

VoVAllen

comment created time in 20 hours

Pull request review comment tensorflow/tensorflow

[Features] DLPack functions

+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/c/eager/dlpack.h"
+#include "include/dlpack/dlpack.h"  // TF:dlpack
+#include "tensorflow/c/eager/c_api_internal.h"
+#include "tensorflow/c/tf_status_helper.h"
+#include "tensorflow/core/framework/tensor.h"
+#include "tensorflow/core/platform/casts.h"
+
+#include "tensorflow/core/framework/tensor_reference.h"
+#include "tensorflow/core/platform/logging.h"
+
+namespace tensorflow {
+
+namespace {
+
+struct TFDLManagedTensorCtx {
+  TensorReference* handle;
+  std::vector<int64_t> shape;
+  DLManagedTensor tensor;
+};
+
+const Tensor* GetTensorFromHandle(TFE_TensorHandle* h, TF_Status* status) {
+  if (h == nullptr || !h->handle->IsValid(&status->status)) {
+    status->status = tensorflow::errors::InvalidArgument(
+        "The passed in handle is a nullptr");
+    return nullptr;
+  }
+  tensorflow::TensorHandle* handle =
+      tensorflow::down_cast<tensorflow::TensorHandleInterface*>(h->handle.get())
+          ->Handle();
+
+  if (handle->IsRemote()) {
+    status->status = tensorflow::errors::InvalidArgument(
+        "DLPack doesn't support remote tensor");
+    return nullptr;
+  }
+  const tensorflow::Tensor* tensor;
+  status->status = handle->Tensor(&tensor);
+  if (!status->status.ok()) {
+    return nullptr;
+  }
+  return tensor;
+};
+
+void DLManagedTensorDeleter(DLManagedTensor* arg) {
+  TFDLManagedTensorCtx* owner =
+      static_cast<TFDLManagedTensorCtx*>(arg->manager_ctx);
+  owner->handle->Unref();
+  delete owner->handle;
+  delete owner;
+}
+
+DLDataType GetDLDataType(TF_DataType data_type, TF_Status* status) {
+  DLDataType dtype;
+  dtype.lanes = 1;
+  dtype.bits = TF_DataTypeSize(data_type) * 8;
+  switch (data_type) {
+    case TF_DataType::TF_HALF:
+    case TF_DataType::TF_FLOAT:
+    case TF_DataType::TF_DOUBLE:
+      dtype.code = DLDataTypeCode::kDLFloat;
+      break;
+    case TF_DataType::TF_INT8:
+    case TF_DataType::TF_INT16:
+    case TF_DataType::TF_INT32:
+    case TF_DataType::TF_INT64:
+      dtype.code = DLDataTypeCode::kDLInt;
+      break;
+    case TF_DataType::TF_BOOL:
+    case TF_DataType::TF_UINT8:
+    case TF_DataType::TF_UINT16:
+    case TF_DataType::TF_UINT32:
+    case TF_DataType::TF_UINT64:
+      dtype.code = DLDataTypeCode::kDLUInt;
+      break;
+    case TF_DataType::TF_BFLOAT16:
+      dtype.code = DLDataTypeCode::kDLBfloat;
+      break;
+    default:
+      status->status = tensorflow::errors::InvalidArgument(
+          DataType_Name(static_cast<DataType>(data_type)),
+          " is not supported by dlpack");
+      break;
+  }
+  return dtype;
+}
+
+DLContext GetDLContext(TFE_TensorHandle* h, TF_Status* status) {
+  DLContext ctx;
+  const char* device_name = h->handle->DeviceName(&status->status);
+  DeviceNameUtils::ParsedName parsed_name;
+  tensorflow::DeviceNameUtils::ParseFullName(device_name, &parsed_name);
+  std::string device_type = parsed_name.type;
+  int device_id = -1;
+  if (parsed_name.has_id) {
+    device_id = parsed_name.id;
+  }  // Question: Is it possible that it doesn't have id?

@sanjoy: I took a scan through various places in the code and I think it's safe to say you should use 0 as the default.

VoVAllen

comment created time in 21 hours

pull request comment tensorflow/tensorflow

NFC - minor spelling tweaks under python directory

@kiszk: Seems like there are still some conflicts. Are you seeing any on your end?

kiszk

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

fix the assert_shapes issue

 def assert_rank(x, rank, data=None, summarize=None, message=None, name=None):
     ValueError:  If static checks determine `x` has wrong rank.
   """
   with ops.name_scope(name, 'assert_rank', (x, rank) + tuple(data or [])):
-    x = ops.convert_to_tensor(x, name='x')
+    x = x if sparse_tensor.is_sparse(x) else ops.convert_to_tensor(x, name='x')

Are numpy arrays supported here? If so, it seems like the correct fix is to call ops.convert_to_tensor only for numpy arrays.

As mentioned earlier, let's just drop the name from the error message as it seems unnecessary.

Leslie-Fang

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

fix the assert_shapes issue

 def assert_rank_in(
   """
   with ops.name_scope(
       name, 'assert_rank_in', (x,) + tuple(ranks) + tuple(data or [])):
-    x = ops.convert_to_tensor(x, name='x')
+    x = x if sparse_tensor.is_sparse(x) else ops.convert_to_tensor(x, name='x')

Similar comment as above.

Leslie-Fang

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

fix the assert_shapes issue

 def assert_rank_at_least(
   """
   with ops.name_scope(
       name, 'assert_rank_at_least', (x, rank) + tuple(data or [])):
-    x = ops.convert_to_tensor(x, name='x')
+    x = x if sparse_tensor.is_sparse(x) else ops.convert_to_tensor(x, name='x')

Similar comment as above.

Leslie-Fang

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

fix the assert_shapes issue

 def assert_rank(x, rank, data=None, summarize=None, message=None, name=None):
     ValueError:  If static checks determine `x` has wrong rank.
   """
   with ops.name_scope(name, 'assert_rank', (x, rank) + tuple(data or [])):
-    x = ops.convert_to_tensor(x, name='x')
+    x = x if sparse_tensor.is_sparse(x) else ops.convert_to_tensor(x, name='x')

So this is very interesting. Why do we need the ops.convert_to_tensor at all? If it's just for name, that seems bad. I would suggest simply removing name altogether here, since it is empty with eager execution anyway. Then I don't believe we need to call ops.convert_to_tensor, since we expect x to be a tensor.
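
For illustration only, a hypothetical shape of that simplification (module names as in the diff above; not the actual fix):

# Hypothetical sketch: convert only raw inputs such as numpy arrays, and
# leave Tensor and SparseTensor values untouched.
if not isinstance(x, (ops.Tensor, sparse_tensor.SparseTensor)):
  x = ops.convert_to_tensor(x, name='x')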

Leslie-Fang

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

fix the assert_shapes issue

 def test_rank_zero_rank_one_size_one_equivalence(self):
         (rank_zero, (1,)),
     ])
 
+  @test_util.run_in_graph_and_eager_modes
+  def test_sparse_tensor_input(self):
+    A = array_ops.ones([2, 2], name="rank_two")

I believe we want the tensor name to be lower snake_case to abide by Python code style guidelines.

Leslie-Fang

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

fix the assert_shapes issue

 def shape(self):
     """
     return tensor_util.constant_value_as_shape(self._dense_shape)
 
+  @property
+  def name(self):
+    """Get the name of the sparse tensor"""
+    return self.__str__()

This does not seem correct. Unless we add a name parameter to SparseTensor, this does not seem like a reasonable value for name.

Leslie-Fang

comment created time in 22 days

pull request comment tensorflow/tensorflow

Add NVTX Ranges

I know very little about NVTX, but with just a cursory look at the PR I'm a bit concerned about how it is injected into the code. It seems like we need a bit more of a general solution where tracing hooks can be added and then different tracers can be registered to receive notifications for critical events. This would avoid the need for environment variables and inflating dependencies.

We're trying to see if we can get someone more knowledgeable internally to help with this PR.

nluehr

comment created time in a month

Pull request review comment tensorflow/tensorflow

[Intel Mkl] Updating MKL implementation of Eager API.

 class EagerOpRewriteTest {
             GetDefaultCustomKernelCreator()));
 
     EagerExecutor executor_(false);
-    const tensorflow::AttrTypeMap* types;
-    bool is_function = false;
-    EXPECT_EQ(Status::OK(), tensorflow::AttrTypeMapForOp(op_name.c_str(),
-                                                         &types, &is_function));
     std::unique_ptr<tensorflow::EagerOperation> op(
-        new tensorflow::EagerOperation(eager_ctx.get(), op_name.c_str(),
-                                       is_function, types, &executor_));
+        new tensorflow::EagerOperation(eager_ctx.get()));
+    op.get()->Reset(op_name.c_str(), nullptr, false, &executor_);

Please wrap with EXPECT_EQ(Status::OK(), ...)
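
A sketch of the requested wrapping, applied to the Reset call in the diff above (assuming Reset returns a Status):

// Assert that Reset succeeds rather than silently dropping its Status.
EXPECT_EQ(Status::OK(),
          op.get()->Reset(op_name.c_str(), nullptr, false, &executor_));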

claynerobison

comment created time in a month

Pull request review comment tensorflow/tensorflow

[Intel Mkl] Updating MKL implementation of Eager API.

 Status MklEagerOpRewrite::Run(
 Status MklEagerOpRewrite::SetupNewOp(
     EagerOperation* orig_op, const string mkl_op_name,
     std::unique_ptr<EagerOperation>* new_mkl_op) {
-  const tensorflow::AttrTypeMap* types;
-  bool is_function = false;
-  TF_RETURN_IF_ERROR(
-      tensorflow::AttrTypeMapForOp(mkl_op_name.c_str(), &types, &is_function));
-  EagerContext* ctx = orig_op->EagerContext();
-  new_mkl_op->reset(new tensorflow::EagerOperation(ctx, mkl_op_name.c_str(),
-                                                   is_function, types));
+  bool is_remote = false;
+  new_mkl_op->reset(new tensorflow::EagerOperation(&orig_op->EagerContext()));
+  new_mkl_op->get()->Reset(mkl_op_name.c_str(), nullptr, is_remote, nullptr);

I believe the Reset call should be wrapped in a TF_RETURN_IF_ERROR since the AttrTypeMapForOp is moved there now.
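
A sketch of the suggested change, applied to the last line of the diff above (assuming Reset returns a Status):

// Propagate a failure from AttrTypeMapForOp, which now runs inside Reset.
TF_RETURN_IF_ERROR(
    new_mkl_op->get()->Reset(mkl_op_name.c_str(), nullptr, is_remote, nullptr));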

claynerobison

comment created time in a month

pull request comment tensorflow/tensorflow

[Intel Mkl] Updating MKL implementation of Eager API.

@gunan: Do we have an internal test target that tests --config=mkl? It would have helped avoid this failure.

claynerobison

comment created time in a month

pull request comment tensorflow/tensorflow

Add complex number support for tf.extract_image_patches

@yongtang: can you rewrite history to strip & merge the commit messages down to something simple? There seems to be a special character in there somewhere which is messing with our infra. Sorry for the inconvenience.

yongtang

comment created time in a month

Pull request review comment tensorflow/tensorflow

Add complex number support for tf.extract_image_patches

 REGISTER_OP("ExtractImagePatches")
     .Attr("ksizes: list(int) >= 4")
     .Attr("strides: list(int) >= 4")
     .Attr("rates: list(int) >= 4")
-    .Attr("T: realnumbertype")
+    .Attr("T: numbertype")

I believe this would also include quantized types, but those are not included in TF_CALL_GPU_ALL_TYPES.

yongtang

comment created time in a month

Pull request review comment tensorflow/tensorflow

Minor optimization

 void TF_OperationGetAttrString(TF_Operation* oper, const char* attr_name,
         InvalidArgument("Attribute '", attr_name, "' is not a string");
     return;
   }
-  if (max_length <= 0) {
+  if (max_length == 0) {
+    status->status = InvalidArgument("Attribute '", max_length, "' is zero");

We had an internal review of this, and unfortunately it breaks the API a bit since we never enforce in the documentation that max_length needs to be a certain size. Could you instead just remove this if condition altogether, and we'll just do a memcpy of 0 bytes?
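
A sketch of the suggested shape of the code, with hypothetical variable names (the real function body differs):

// Drop the max_length guard entirely: copying zero bytes is a well-defined
// no-op for memcpy, so no special case is needed.
const std::string& s = attr_string;  // hypothetical: the attribute's string value
memcpy(value, s.data(), std::min<size_t>(s.size(), max_length));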

gaurav1086

comment created time in a month

pull request comment tensorflow/tensorflow

NFC - minor spelling tweaks under python directory

@jaingaurav I am afraid that one future big PR (currently more than 400 files) may increase the possibility of the conflict. What do you think?

Whatever you think is best. Either way we can make sure we get all the necessary owner approvals.

kiszk

comment created time in 2 months

pull request comment tensorflow/tensorflow

Expose `assign_moving_average` via public API

@robieta: Does it make sense to export this API?

Squadrick

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_strictly_increasing(x, name=None):
    See also:  `is_non_decreasing`
 
+  ```python

Please apply all the same comments from tf.math.is_non_decreasing.

WilliamHYZhang

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_non_decreasing(x, name=None):
    See also:  `is_strictly_increasing`
 
+  ```python
+  >>> x1 = tf.constant([1.0, 1.0, 3.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))
+  1
+
+  >>> x2 = tf.constant([3.0, 1.0, 2.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))
+  0
+

Remove blank line?

WilliamHYZhang

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_non_decreasing(x, name=None):
    See also:  `is_strictly_increasing`
 
+  ```python
+  >>> x1 = tf.constant([1.0, 1.0, 3.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))
+  1
+
+  >>> x2 = tf.constant([3.0, 1.0, 2.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))

Remove tf.print

WilliamHYZhang

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_non_decreasing(x, name=None):
    See also:  `is_strictly_increasing`
 
+  ```python
+  >>> x1 = tf.constant([1.0, 1.0, 3.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))
+  1
+

Remove blank line?

WilliamHYZhang

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_non_decreasing(x, name=None):
    See also:  `is_strictly_increasing`
 
+  ```python

I don't believe the starting or ending triple backticks are needed.

WilliamHYZhang

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_non_decreasing(x, name=None):
    See also:  `is_strictly_increasing`
 
+  ```python
+  >>> x1 = tf.constant([1.0, 1.0, 3.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))
+  1
+
+  >>> x2 = tf.constant([3.0, 1.0, 2.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))
+  0
+
+  ```

Remove triple backticks

WilliamHYZhang

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

add usage examples/doctests

 def is_non_decreasing(x, name=None):
    See also:  `is_strictly_increasing`
 
+  ```python
+  >>> x1 = tf.constant([1.0, 1.0, 3.0])
+  >>> tf.print(tf.math.is_non_decreasing(x1))

Can you please simply remove the tf.print so we see the output boolean tensor?
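
For illustration, a sketch of what the doctest could look like without tf.print, using the eager Tensor repr from TF 2.x:

>>> x1 = tf.constant([1.0, 1.0, 3.0])
>>> tf.math.is_non_decreasing(x1)
<tf.Tensor: shape=(), dtype=bool, numpy=True>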

WilliamHYZhang

comment created time in 2 months

issue comment tensorflow/tensorflow

TF 2.0 distribution strategy throws invalid argument error

I just verified and I believe this is resolved in the nightly and the fix should be available in 2.1.0. Please let me know if you are able to still reproduce with the nightly or the 2.1.0-rc1 release.

SumNeuron

comment created time in 2 months

issue closed tensorflow/tensorflow

TF 2.0 distribution strategy throws invalid argument error

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): somewhat
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): DGX
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
  • TensorFlow installed from (source or binary): docker image
  • TensorFlow version (use command below): 2.0
  • Python version: 3.6
  • Bazel version (if compiling from source): n/a
  • GCC/Compiler version (if compiling from source): n/a
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Tesla V100-SXM2

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with: 1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" 2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior: From the docs:

tf.debugging.set_log_device_placement(True)

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
  inputs = tf.keras.layers.Input(shape=(1,))
  predictions = tf.keras.layers.Dense(1)(inputs)
  model = tf.keras.models.Model(inputs=inputs, outputs=predictions)
  model.compile(loss='mse',
                optimizer=tf.keras.optimizers.SGD(learning_rate=0.2))

I adapted this (I hope correctly for multiple gpus)

gpus = tf.config.experimental.list_physical_devices('GPU')
gpus_to_use = gpus[-3:]

if gpus:
    # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
    try:
        tf.config.experimental.set_visible_devices(gpus_to_use, 'GPU')
        for gpu in gpus_to_use:
            tf.config.experimental.set_memory_growth(gpu, True)        
            gb = 1024
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=12*gb)]
            )
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Virtual devices must be set before GPUs have been initialized
        print(e)

which prints 8 Physical GPUs, 3 Logical GPUs as expected

Then, calling just this line:

strategy = tf.distribute.MirroredStrategy()

throws:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-18-2f6e99f3473c> in <module>
----> 1 strategy = tf.distribute.MirroredStrategy()

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py in __init__(self, devices, cross_device_ops)
    354   def __init__(self, devices=None, cross_device_ops=None):
    355     extended = MirroredExtended(
--> 356         self, devices=devices, cross_device_ops=cross_device_ops)
    357     super(MirroredStrategy, self).__init__(extended)
    358 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py in __init__(self, container_strategy, devices, cross_device_ops)
    394                      "any local devices.")
    395     self._cross_device_ops = cross_device_ops
--> 396     self._initialize_strategy(devices)
    397 
    398     # TODO(b/128995245): Enable last partial batch support in graph mode.

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py in _initialize_strategy(self, devices)
    408         "No duplicates allowed in `devices` argument: %s" % (devices,))
    409     if _is_device_list_local(devices):
--> 410       self._initialize_local(devices)
    411     else:
    412       self._initialize_multi_worker(devices)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/mirrored_strategy.py in _initialize_local(self, devices)
    418     self._input_workers = input_lib.InputWorkers(self._device_map)
    419     self._inferred_cross_device_ops = None if self._cross_device_ops else (
--> 420         cross_device_ops_lib.choose_the_best(devices))
    421     self._host_input_device = numpy_dataset.SingleDevice("/cpu:0")
    422 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/cross_device_ops.py in choose_the_best(devices, session_config)
   1194   """
   1195   requested_devices = set([device_util.canonicalize(d) for d in devices])
-> 1196   machine_devices = device_lib.list_local_devices(session_config=session_config)
   1197   using_devices = set()
   1198   for d in machine_devices:

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/device_lib.py in list_local_devices(session_config)
     39   return [
     40       _convert(s)
---> 41       for s in pywrap_tensorflow.list_devices(session_config=session_config)
     42   ]

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py in list_devices(session_config)
   2247     return ListDevicesWithSessionConfig(session_config.SerializeToString())
   2248   else:
-> 2249     return ListDevices()
   2250 
   2251 

InvalidArgumentError: device CUDA:0 not supported by XLA service
	while setting up XLA_GPU_JIT device number 0

Describe the expected behavior

It should just work as in the docs.

Code to reproduce the issue: Provide a reproducible test case that is the bare minimum necessary to generate the problem. See above. Docker image tensorflow/tensorflow:2.0.0-gpu-py3-jupyter with nvidia-docker.

Other info / logs: Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

closed time in 2 months

SumNeuron

issue comment tensorflow/tensorflow

non_max_suppression GPU version is 3x slower than CPU version in TF 1.15

@sgambient: Could you please verify if int64 helps?

sgambient

comment created time in 2 months

pull request comment tensorflow/tensorflow

NFC - minor spelling tweaks under python directory

@kiszk: I don't think that'll be necessary. We can review this as one bulk PR.

kiszk

comment created time in 3 months

issue closed tensorflow/tensorflow

Possible tf.matmul bug (wrong results) on tensorflow-gpu

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v1.14.0-rc1-22-gaf24dc91b5 1.14.0
  • Python version: 3.6.7
  • GCC/Compiler version (if compiling from source): 8.2.0
  • CUDA/cuDNN version: cuda-10.0
  • GPU model and memory:

Describe the current behavior: tf.matmul on tensorflow-gpu gave wrong results. Here is a simplified version of the code.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

r = [[ 1.0, 0.0],    [0.0, 1.0]]

x, y = np.meshgrid(list(range(400)), list(range(400)))
coords = np.stack([x,y],-1).reshape((400,400,2,1))
coords = tf.convert_to_tensor(coords,dtype=tf.float32)

r1 = tf.constant(r)

newCoords = tf.matmul(r1, coords)

sess = tf.Session()
ret = sess.run(newCoords,feed_dict={r1:r})

plt.matshow(ret[:,:,0,0])
plt.show()

When I ran it on my tensorflow-gpu install, here is the result (see the attached "bug" image): it looks like it stops computing halfway through and gives 0 for the rest.

Describe the expected behavior: Here is the result with CPU (see the attached "CPU" image).


Below is my old post. Initially, I thought the problem was related to TFRecord, but it seems the problem occurs even without using TFRecord. For completeness, I keep the old example code with TFRecord below.

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

def parser(serialized_example):
      fs = tf.io.parse_single_example(
          serialized_example,
          features={ "r": tf.FixedLenFeature([4], tf.float32) })
      fs["r"] = tf.reshape(fs["r"], [2, 2])
      return fs

r = [[ 1.0, 0.0],[0.0, 1.0]]

with tf.io.TFRecordWriter("cc.test") as tfrecord_writer:
    feature = {"r": tf.train.Feature(float_list=tf.train.FloatList(value=np.array(r).flatten() ))}
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    tfrecord_writer.write(example.SerializeToString())
dataset = tf.data.TFRecordDataset(["cc.test"])
dataset = dataset.map(parser).repeat().make_one_shot_iterator()
features = dataset.get_next()

x, y = tf.meshgrid(list(range(400)), list(range(400)))
coords = tf.stack([x, y], -1)     #(h,w,2)
coords = tf.expand_dims(tf.cast(coords,tf.float32),-1) #(h,w,2,1)

r1 = features["r"]
r2 = tf.constant(r)

newCoords = tf.matmul(r1, coords)

sess = tf.Session()
ret = sess.run(newCoords[:,:,0,0])
plt.matshow(ret)
plt.show()

The code will create a "cc.test" file to save the variable r and load it as r1, then matmul r1 with some big variables.

closed time in 3 months

sWizad

issue comment tensorflow/tensorflow

Possible tf.matmul bug (wrong results) on tensorflow-gpu

@fxtentacle: Thank you for your analysis. As the fix is part of CUDA 10.1, we recommend users upgrade to a TF release with support for CUDA 10.1+. We do not plan to release a new 1.x based on CUDA 10.1.

sWizad

comment created time in 3 months

issue comment tensorflow/tensorflow

[2.0] Disable usage of GPU using the new config APIs

We were able to confirm that a small amount of memory is being used by XLA. We're working on a fix now.

llan-ml

comment created time in 3 months

issue comment tensorflow/tensorflow

[2.0] Disable usage of GPU using the new config APIs

Actually hold on, I was using a different build. It seems like memory is still being used even if no visible devices are being set.

llan-ml

comment created time in 3 months

issue comment tensorflow/tensorflow

[2.0] Disable usage of GPU using the new config APIs

tf.config.experimental.set_visible_devices([], 'GPU') seems to work for me. What does tf.config.experimental.list_logical_devices() return? If you are seeing a GPU there, then the visible devices configuration did not apply properly.
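
For reference, a minimal check along those lines (run before executing any other ops):

import tensorflow as tf

tf.config.experimental.set_visible_devices([], 'GPU')  # hide all GPUs
print(tf.config.experimental.list_logical_devices('GPU'))  # expected: []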

llan-ml

comment created time in 3 months

issue comment tensorflow/tensorflow

Applying certain ImgAug augmenters inside of a `tf.py_function` causes AutoGraph errors.

It is likely there is some existing issue in your augment_batch code that is not surfacing clearly due to autograph. Could you try annotating augment_batch with @tf.function(autograph=False) and see if you get a clearer error message?
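
For illustration, a minimal sketch of the suggested annotation (the body is a hypothetical stand-in for the reporter's augmentation logic):

import tensorflow as tf

@tf.function(autograph=False)  # disable autograph so the real error surfaces
def augment_batch(images):
  return tf.image.flip_left_right(images)  # placeholder augmentation

print(augment_batch(tf.zeros([2, 4, 4, 3])).shape)  # (2, 4, 4, 3)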

jamesonthecrow

comment created time in 3 months

issue closed tensorflow/tensorflow

Tensorflow 2.0 too slow when minimizing a custom cost function

System information

  • OS : Windows 10
  • TensorFlow version: v2.0.0-rc2-26-g64c3d382ca 2.0.0
  • Python version: 3.7.4

I have code that looks like the following, where I want to minimize a custom cost function with respect to parameters w. However, when running the code, it appears to me that it is very slow (more than 30 times slower) compared to the same code implemented without TensorFlow (by explicitly defining a function that gives the gradient of the cost).

I am not sure if it's a problem with TF or if I am doing something wrong and unnecessarily re-computing the TF graph each time. I posted this issue on Stack Overflow and was advised to open an issue here.

In the following code, I am using a simple dummy cost function just as an example to show the big difference in performance.

Code with Tensorflow:

import numpy as np
import tensorflow as tf
import time

class ExampleTF:
    def __init__(self, n=100, m=10):
        Z = np.random.randn(n, m)
        self.Z = tf.convert_to_tensor(Z, dtype=tf.float32)
        self.w = tf.Variable(np.ones((m, 1)), dtype=tf.float32)

    # =====================================
    def cost(self, P):
        # This is a simple dummy cost function just as an example
        return tf.reduce_sum((self.Z @ self.w) - P)

    # =====================================
    def optimize_w(self, cost_func, parameters, lr=0.01, iterations=2000):
        optimizer = tf.optimizers.Adam(lr)
        for _ in range(iterations):
            optimizer.minimize(cost_func, var_list=parameters)

    # =====================================
    def update(self, P):
        P = tf.convert_to_tensor(P, dtype=tf.float32)

        self.optimize_w(
            cost_func = lambda: self.cost(P),
            parameters = [self.w]
        )

        #print("===> cost:", self.cost(P).numpy())
        #print("w:", self.w.numpy().reshape(-1)[:10])

# =====================================
n, m = 1000, 100
ex_tf = ExampleTF(n, m)
for _ in range(50):
    P = np.random.uniform(size=n).reshape((-1, 1))

    start = time.time()
    ex_tf.update(P)
    elapsed = time.time() - start

    print("elapsed time:", elapsed)

Code without Tensorflow (just numpy) :

import numpy as np
import tensorflow as tf
import time

class ExampleNonTF:
    def __init__(self, n=100, m=10):
        self.Z = np.random.randn(n, m)
        self.w = np.ones((m, 1))

    # =====================================
    def cost(self, P):
        # This is a simple dummy cost function just as an example
        return np.sum(self.Z @ self.w - P)

    # =====================================
    def gradient_cost(self, P):
        # This is the gradient of the dummy cost function with respect to self.w
        return np.sum(self.Z, axis=0).reshape(self.w.shape)

    # =====================================
    def optimize_w(self, P, lr=0.01, iterations=2000): # This is the ADAM optimizer
        avg_grad1 = 0; avg_grad2 = 0
        beta1 = 0.9; beta2 = 0.999; eps = 1e-07
        for itr in range(iterations):
            grad = self.gradient_cost(P)
            avg_grad1 = beta1 * avg_grad1 + (1 - beta1) * grad
            avg_grad2 = (beta2 * avg_grad2 + (1 - beta2) * (grad ** 2))
            avg_grad1_corr = avg_grad1 / (1 - beta1 ** (itr + 1))
            avg_grad2_corr = avg_grad2 / (1 - beta2 ** (itr + 1))
            self.w = self.w - lr * (avg_grad1_corr / (np.sqrt(avg_grad2_corr) + eps))

    # =====================================
    def update(self, P):
        self.optimize_w(P)

        #print("===> cost:", self.cost(P))
        #print("w:", self.w.reshape(-1)[:10])

# =====================================
n, m = 1000, 100
ex_nontf = ExampleNonTF(n, m)
for _ in range(50):
    P = np.random.uniform(size=n).reshape((-1, 1))

    start = time.time()
    ex_nontf.update(P)
    elapsed = time.time() - start

    print("elapsed time:", elapsed)

closed time in 3 months

HTCode

issue comment tensorflow/tensorflow

Tensorflow 2.0 too slow when minimizing a custom cost function

There are known limitations of TF eager execution performance that are being actively worked on by the team. I'd suggest seeing if you can use tf.function to improve performance.
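
A minimal sketch of that suggestion, applied to a loop like the one in the report (names simplified):

import tensorflow as tf

Z = tf.random.normal((1000, 100))
w = tf.Variable(tf.ones((100, 1)))
optimizer = tf.optimizers.Adam(0.01)

@tf.function  # traced once into a graph, then reused on every call
def train_step(P):
  optimizer.minimize(lambda: tf.reduce_sum(Z @ w - P), var_list=[w])

P = tf.random.uniform((1000, 1))
for _ in range(2000):
  train_step(P)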

HTCode

comment created time in 3 months

issue comment tensorflow/tensorflow

Euclidean distance transform add on to support 3D images?

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there.

If you think we've misinterpreted a bug, please comment again with a clear explanation, as well as all of the information requested in the issue template. Thanks!

helwilliams

comment created time in 3 months

issue closed tensorflow/tensorflow

Euclidean distance transform add on to support 3D images?

Hello as the euclidean distance transform has been implemented with specific input shapes, I was wondering if this would be expanded to more general shapes. I am currently trying to generate a distance matrix based on 3D ground truth data (i.e. W X H X D) which could easily be 4D (Batch size x W x H x D). Is this something that would be possible?

closed time in 3 months

helwilliams

pull request comment tensorflow/tensorflow

Update documentation of tf.debugging.assert_shapes

@amitkumarj441: I'm really sorry, but I think I had a typo in my previous review. My comment should have read: "Though we accept tuples in v2, we only allow dictionaries in v1." Oops.

As the code has changed quite a bit, it seems like the v2 documentation has already been fixed. If you want you can go ahead and change the v1 documentation https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/check_ops.py#L1661 to:

  tf.assert_shapes([
    (x, ('N', 'Q')),
    (y, ('N', 'D')),
    (param, ('Q',)),
    (scalar, ())
  ])

Sorry again for the confusion.

amitkumarj441

comment created time in 3 months

issue comment tensorflow/tensorflow

GradientTape: Allow to execute backward functions on same device as forward functions

@olesalscheider: We unfortunately hit a number of model performance regressions due to this change. We're still trying to investigate, but this isn't surprising given how careful one needs to be about device placement.

I'm not sure which models we have available publicly for you to look into, but the issue seemed pretty widespread (at least for pure eager workloads).

olesalscheider

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

Update documentation of tf.debugging.assert_shapes

 def assert_shapes_v2(shapes, data=None, summarize=None, message=None,
    ```python
   tf.assert_shapes([
-    (x: ('N', 'Q')),
-    (y: ('N', 'D')),
-    (param: ('Q',)),
-    (scalar: ()),
+    (x, ('N', 'Q')),

@amitkumarj441: Seems like the code has changed since your changes. Could you please rebase and try again? Seems like there are 2 call sites to update in the example now.

amitkumarj441

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

Update documentation of tf.debugging.assert_shapes

 def assert_shapes_v2(shapes, data=None, summarize=None, message=None,
    ```python
   tf.assert_shapes([
-    (x: ('N', 'Q')),
-    (y: ('N', 'D')),
-    (param: ('Q',)),
-    (scalar: ()),
+    (x, ('N', 'Q')),

Shouldn't this be:

tf.assert_shapes({
   x: ('N', 'Q'),
   y: ('N', 'D'),
   param: ('Q',),
   scalar: (),
 })
amitkumarj441

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[ROCm] enumerate ROCm GPU devices in grappler.

 limitations under the License.
 #include "tensorflow/core/platform/byte_order.h"
 #include "tensorflow/core/platform/cpu_info.h"
 
-#if GOOGLE_CUDA
+#if GOOGLE_CUDA || TENSORFLOW_USE_ROCM
 #include "tensorflow/core/common_runtime/gpu/gpu_init.h"
 #include "tensorflow/core/platform/stream_executor.h"
-#endif  // GOOGLE_CUDA
+#endif  // GOOGLE_CUDA || TENSORFLOW_USE_ROCM
 
 namespace tensorflow {
 namespace grappler {
 
 int GetNumAvailableGPUs(
-    const std::pair<int, int>& min_cuda_compute_capability) {
+    const GpuVersion& min_gpu_version) {
   int num_eligible_gpus = 0;
+#if GOOGLE_CUDA || TENSORFLOW_USE_ROCM

Using ifdefs in the code doesn't seem right here. Can we just have three separate files (device_default.cc, device_cuda.cc and device_rocm.cc) and fix the build rules to include the correct one?

whchung

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[ROCm] enumerate ROCm GPU devices in grappler.

 limitations under the License.
 #include <functional>
 #include <utility>
 
+#include "absl/types/variant.h"
+
 #include "tensorflow/core/lib/core/status.h"
 #include "tensorflow/core/lib/core/threadpool.h"
 #include "tensorflow/core/platform/types.h"
 
 namespace tensorflow {
 namespace grappler {
 
-// Get the number of available GPUs whose number of multiprocessors is no less
-// than 8 and whose CUDA compute capability is no less than
-// min_cuda_compute_capability.
+// GpuVersion is used to abstract Gpu hardware version. On Cuda platform,
+// it comprises a pair of integers denoting major and minor version.
+// On ROCm platform, it comprises one integer for AMD GCN ISA version.
+using GpuVersion = absl::variant<std::pair<int, int>, int>;
+
+// Get the number of available GPUs.
+// On CUDA platform, look for GPUs whose number of multiprocessors is no less
+// than 8 and whose CUDA compute capability is no less than min_gpu_version,
+// represented as a pair of integers.
+// On ROCm platform, look for GPUs whose ISA version number is no less than
+// min_gpu_version, represented as a single integer.
 int GetNumAvailableGPUs(
-    const std::pair<int, int>& min_cuda_compute_capability = {0, 0});
+#if GOOGLE_CUDA
+    const GpuVersion& min_gpu_version = std::pair<int, int>(0, 0)

Same comment goes here. I don't think we should have preprocessor macros that change the function signature. This should be broken out into different files.

whchung

comment created time in 3 months

pull request comment tensorflow/tensorflow

Fixing decode_raw_op for complex numbers on big endian

Thanks for the fix @namrata-ibm!

namrata-ibm

comment created time in 3 months

pull request comment tensorflow/tensorflow

Support dynamic shapes for IndexedSlices in the _num_elements function

@alexeyr: Yes, but I think ideally we'll want something a bit more isolated to this change if possible.

alexeyr

comment created time in 3 months

push event jaingaurav/tensorflow

wenxizhu

commit sha 22e6ba72f94185fbecdee1300013f8e74a1bdaa1

Fuse "Transpose + Maxpool3d + Transpose".

wenxizhu

commit sha 7659a2f82ef5512f9bdb84e6017fa99f635e09ba

Clang format fix.

wenxizhu

commit sha 59fc4a41b447c6effe62f2eea69d12d4e1d3a6e1

Add test cases for transpose+maxpool3d+transpose.

wenxizhu

commit sha 01d7d186e6c90562ae988404eca02277a2a2c99f

Change code format in CopyAttrsPooling().

wenxizhu

commit sha a3a8322d29e13450805b82e1df2b31ccf8717c47

A negative case for "transpose + maxpool3d + transpose" added.

wenxizhu

commit sha ed06859189722af4dc8e4abd655926df066e587a

Add format check.

wenxizhu

commit sha 7f13d5c2238a61fa3f6be9f6a694692de82a7874

Clang format fix for mkl_layout_pass_test.cc

Evgeniy Zheltonozhskiy

commit sha 8873a49e669ec2c3a6b08b3ca7fc2df7540e465c

Fix android demo build

TengLu

commit sha 33e5c3d78cbd15fda78662711faeaeb74b895b34

Refine the code of Transpose and MaxPooling fusion.

Li, Guizi

commit sha 053e9004da6f307401f6bbace2b66534a435ee1d

[Intel MKL] add missing attr

Li, Guizi

commit sha 18818f86fc44aa85d0ac6fbe7c2a85a520bf1d1c

update API for QuantizedDepthwiseConv2DWithBiasAndReluAndRequantize

Bas Aarts

commit sha 56ec786351eeb4267e6d93bc93f8840417982787

add a 'dbg' build configuration as a shorthand for '--config=opt -c dbg'. Disable arm-neon by default for now to work around a gcc issue.

Bas Aarts

commit sha 9198633368d9d8cefefeed4ed465e101e7169bb5

Don't compile CUDA kernels in debug mode. -G causes kernels to use more registers and memory. This results in lots of kernels using too many resources, causing them to fail when launched. This makes it impossible to run a TensorFlow debug build.

Kaustubh Maske Patil

commit sha 5006295cf7a20de9ef9087127569e9d58b28022d

Updated DeviceSpec docstring and fixed typos. Updated the docstring (as per #34124), adding instructions for using the `device_spec.to_string()` method with eager execution enabled. Fixed missing parentheses in an example.

Kaustubh Maske Patil

commit sha 96662d167df4256097308f9b2f4d926da3dca32d

Update device_spec.py

Kaustubh Maske Patil

commit sha 411535b082dc0e6a18279749a0cf17607e747a2b

Update device_spec.py

Kaustubh Maske Patil

commit sha 1f2f65befd5823ed18d88fe651b4b27e49bb0e43

Removed trailing whitespace

Kaustubh Maske Patil

commit sha 78fa57e0fde06ad2fed87987c6cb575e99d89900

Update device_spec.py

Kaustubh Maske Patil

commit sha 1aaed9b1ef604e970d182662f14b5728cb12f5ba

Fixed one trailing whitespace and one misalignment

Li, Guizi

commit sha 2c749f5b6bb0134d97d27e87fa109d1f53939ed3

change padding_list to paddings

push time in 3 months

issue comment tensorflow/tensorflow

Seemingly unavailable datatypes for Ops.

@Davidvdrm: As you can see from the code snippet you posted, these complex ops are not supported on Windows due to compiler incompatibilities. I think we'll have better luck when we upgrade to MSVC 2019. For now these ops are available only on Linux with clang as the CUDA compiler.

mcourteaux

comment created time in 3 months

issue comment tensorflow/tensorflow

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

@clementpoiret: Please note that the tf.config.experimental.set_memory_growth call is unnecessary: tf.config.experimental.set_virtual_device_configuration overrides that flag, since it slices up the GPU memory and pre-allocates the configured slice.
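
For reference, a minimal sketch of the virtual-device configuration on its own (1 GB limit as an example):

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  # The virtual device pre-allocates a fixed 1 GB slice, so a separate
  # set_memory_growth call would have no effect.
  tf.config.experimental.set_virtual_device_configuration(
      gpus[0],
      [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])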

6etacat

comment created time in 3 months

delete branch jaingaurav/tensorflow

delete branch : cherry-2.0

delete time in 3 months

delete branch jaingaurav/tensorflow

delete branch : cherry-1.15

delete time in 3 months

delete branch jaingaurav/tensorflow

delete branch : cherry-1.15-2

delete time in 3 months

delete branch jaingaurav/tensorflow

delete branch : cherry-2.0-2

delete time in 3 months

delete branch jaingaurav/tensorflow

delete branch : dep-2.0

delete time in 3 months

push event jaingaurav/tensorflow

A. Unique TensorFlower

commit sha db1dce813fcf8b1672574b04694068f297287d38

Lowering from tf.rfft for 1D to xla_hlo.fft. TF_Complex64 and TF_Complex128 were changed to complex<f32> and complex<f64> respectively, along with corresponding tests. PiperOrigin-RevId: 277160944 Change-Id: I14100b7653b60b9b377cc1f1b29e9573bc84cf6f

Lei Zhang

commit sha 9f4f37a2c268e5982ef5c47a1e1726ddd226dde0

[spirv] Support OpPhi using block arguments. This CL adds another control flow instruction in SPIR-V: OpPhi. It is modelled as block arguments to be idiomatic with MLIR. See the rationale.md doc for "Block Arguments vs PHI nodes". Serialization and deserialization are updated to convert between block arguments and SPIR-V OpPhi instructions. PiperOrigin-RevId: 277161545 Change-Id: I120410922f1eac422e3dad77e5693b6055f77874

TensorFlower Gardener

commit sha 6cbcfc5538a6df81b8caf10553579605815ef5ad

Merge pull request #33259 from AyanmoI:amoitra/cudnn_batchnorm_rewriter PiperOrigin-RevId: 277161751 Change-Id: I85d8342974efa0b093fce8c446d2a697132ab14a

Gaurav Jain

commit sha 44cda863cbcd2cbc2849e72fe82b0349c15c4cca

Clean up EagerExecutor to avoid anonymous blocks PiperOrigin-RevId: 277162529 Change-Id: I2401d89eab512460eb9aabd206548f41c0061e7c

Reed Wanderman-Milne

commit sha 82f4a53775ba338dbaed0c329959d9ed53d428e4

Clarify error message when batch size not divisible by num_replicas PiperOrigin-RevId: 277162934 Change-Id: Ic65ae36bdb78f2e75ee14e69d24ac4005b6a7dde

Nat Jeffries

commit sha 284aaf60c5888ca94cbcd2dcbd478145244452e7

Refactor micro activations_test. PiperOrigin-RevId: 277163099 Change-Id: I4b03faa09c27088739192ba3162454357e06d611

Dmitry Kovalev

commit sha 17c9274f9f82742ee1f1016ca4223e6d76ea2ecc

Build binary distribution (bdist) for tflite-runtime. Current code builds both bdist (.tar.gz) and bdist_wheel (.whl) files now. PiperOrigin-RevId: 277164088 Change-Id: I7254ec0ed3a38777bca879f62ca22329188c1e0c

TensorFlower Gardener

commit sha 52bb37cc4972d91e8110244875acaa95afad43e1

Merge pull request #33632 from Williscool13:williscool PiperOrigin-RevId: 277165462 Change-Id: I5cbfdfffdebcba4233a8a2a6325b769ec8470330

A. Unique TensorFlower

commit sha b3479e611c7ad8bff2858a0f30cc60ac0639ee79

TF.ComplexAbs added to generated ops, with hlo lowering PiperOrigin-RevId: 277166556 Change-Id: If099221244333661b0b2d7416876bf860ba52357

Tiezhen WANG

commit sha 9ae3853ccabef46d77a1ef16b6a8d7a28121a7bd

TFLM: nit: Fix an obvious issue in the test. PiperOrigin-RevId: 277166720 Change-Id: Iff23f196c61f837d65bbda30e7c125281bb12ce6

Dmitry Kovalev

commit sha 108178dcb4b51ea635a2f41491d8395645654b49

Add list of classifiers for tflite-runtime package PiperOrigin-RevId: 277167816 Change-Id: Ib057a95d3973e52dfecae15de7e4138585669ff7

Penporn Koanantakool

commit sha 4c762c4018a06e762c0f0a69513d7214fdc4933f

Delay enabling MatrixDiagV2, MatrixDiagPartV2, and MatrixSetDiagV2 since they have conflicting default behaviors with V3 ops (coming soon). PiperOrigin-RevId: 277168805 Change-Id: I0c65b4b4fb981f4cceeac9d2406e8d192f897e9e

George Karpenkov

commit sha de5b7b45f9688e54578ca7bb652eeefa2c15da6f

[XLA] Unify attribute parsing code between ParseAttributesAsProtoMessage and ParseAttributes. Consequently, correctly ignore values ParseAttributes already ignores. PiperOrigin-RevId: 277169837 Change-Id: I74f9e6fea16bdc5bd45311baf83602cb819b5ed4

TensorFlower Gardener

commit sha 888077e65538291964837a2b85e4965a39a21771

Merge pull request #32917 from lamarrr:patch-1 PiperOrigin-RevId: 277170913 Change-Id: Idd5cbd4fa09d99f01b5f8b69082c3cf63a1af036

A. Unique TensorFlower

commit sha 44514b3257836b913e0bcb896fdd6ec0b4798c06

[Tensorflow Metrics] Track the usage of model_to_estimator and its version. NO_IFTTT=Only add tracking metrics. PiperOrigin-RevId: 277171165 Change-Id: I5319e9ca7affac0a1ebe26a363719bd49b32eb3b

Scott Zhu

commit sha ee3bc64a51e4c3dcdf4b69a8d44508c707fcb05d

Disallow Keras LSTMCell to be used with DropoutWrapper. See https://github.com/tensorflow/tensorflow/issues/33690 for details. PiperOrigin-RevId: 277171614 Change-Id: I9925801a2b0c055c595ff20ecf063d6dd876e18e

Reed Wanderman-Milne

commit sha 159d5f6dae0d7e795c57cf4dbe9fa5163c692070

Fix incorrect RNN gradients with TPUStrategy and mixed precision. The core issue is that if you call Variable.value() when a TPUStrategy is used, gradients with respect to that Variable are None (I'm not sure why). All the operator overloads in AutoCastVariable used Variable.value(), so they were broken with TPUStrategy. I fixed this by calling Variable.read_value() instead. This fixes any case where an AutoCastVariable operator is used, such as most RNNs. PiperOrigin-RevId: 277176725 Change-Id: I43e8abcd69f99708d47ec5b3b82b67bab4494db1

Akshay Modi

commit sha cb9319253d81374e6c9b0dc27c28fe8f5ba2ebb1

UnsortedSegmentMax uses outputs to calculate the gradients Fixes #33425 PiperOrigin-RevId: 277176938 Change-Id: I6226bfdd603092aecd8842b43ac04490b7cb8984

TensorFlower Gardener

commit sha 64f06f945503aa6e8cfa41e25132698b972bb403

Merge pull request #33652 from tensorflow:release_tf_java_1.15 PiperOrigin-RevId: 277177739 Change-Id: I47ae88b3800f96e7adfdcdbf8cff7c463a3c2ff0

Deven Desai

commit sha 0b8ff049d2a58409af2141407b9770791b5038f2

[ROCm] Fix for the broken ROCm CSB. The following commit breaks the `--config=rocm` build: https://github.com/tensorflow/tensorflow/commit/ab6524320e616774ce00e195b9cf0efbb991834e. The commit above introduces the test "test_opt_einsum_cached" in //tensorflow/python:special_math_ops_test_gpu. The order of execution of other tests within that file can dictate whether or not the newly added test will pass or fail. The failure (caught by the ROCm Nightly CSB run) does not seem specific to the ROCm platform. The "fix" is to explicitly clear the lru_cache of the routine "special_math_ops._get_opt_einsum_contract_path" (before running the test) to guarantee that the test will pass, irrespective of the order in which it is run relative to the other tests.

push time in 3 months

pull request comment tensorflow/tensorflow

Fixing decode_raw_op for complex numbers on big endian

@namrata-ibm: Yes, that looks perfect.

namrata-ibm

comment created time in 3 months

create branch jaingaurav/tensorflow

branch : cherry-list

created branch time in 3 months

PR opened tensorflow/tensorflow

Support LogicalDevice in MirroredStrategy config

PiperOrigin-RevId: 280290757 Change-Id: I52dfff634e6e0ccdc81cd5cce682d7df3499b618

+30 -20

0 comment

5 changed files

pr created time in 3 months

issue comment tensorflow/tensorflow

Seemingly unavailable datatypes for Ops.

@Davidvdrm: The main option needed to be set is the "Do you want to use clang as CUDA compiler?" to be set when running ./configure. See https://www.tensorflow.org/install/source#sample_session for an example session. Note in the example it is using nvcc instead of clang for the CUDA compiler, which is NOT what you want.

mcourteaux

comment created time in 3 months

issue comment tensorflow/tensorflow

a bizarre mistake-----InvalidArgumentError: slice index -1 of dimension 0 out of bounds.()

@ouy160: With the fixed tutorial, is there any further action needed for this issue, or should we close it?

ouy160

comment created time in 3 months

issue comment tensorflow/tensorflow

[TF2.0] GradientTape.gradient raise SystemError when calling embedding_column layer multiple times with tf.function

I believe this is resolved by https://github.com/tensorflow/tensorflow/pull/33912. I'll test it out and confirm.

GoSz

comment created time in 3 months

pull request comment tensorflow/tensorflow

Fixing decode_raw_op for complex numbers on big endian

Thanks @namrata-ibm! Could you please extend DecodeRawOpTest.testEndianness in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/kernel_tests/decode_raw_op_test.py to include tests for complex values so that we can ensure your code is tested? Note I have recently cleaned up the file to be more 2.0 friendly.

namrata-ibm

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

Fixing decode_raw_op for complex numbers on big endian

 class DecodeRawOp : public OpKernel {
     } else {
       // Otherwise, the data is not in the host's byte order, and rather than a
       // direct copy, we need to reverse the byte ordering of each element.
+      int64 element_size;
+      if (DataTypeString(out_type_) == "complex64" || DataTypeString(out_type_) == "complex128") {

Can you change the test to (out_type_ == DT_COMPLEX64) || (out_type_ == DT_COMPLEX128) in order to avoid the string conversion?

namrata-ibm

comment created time in 3 months

issue comment tensorflow/tensorflow

non_max_suppression GPU version is 3x slower than CPU version in TF 1.15

@sgambient: This could be due to a lot of copies between GPU & CPU. Could you see if using int64 instead of int32 helps?

sgambient

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

Update documentation of tf.debugging.assert_shapes

 def assert_shapes_v2(shapes, data=None, summarize=None, message=None,
    ```python
   tf.assert_shapes([
-    (x: ('N', 'Q')),
-    (y: ('N', 'D')),
-    (param: ('Q',)),
-    (scalar: ()),
+    (x, ('N', 'Q')),

Though we accept tuples in v1 we only allow dictionaries in v2. I think the previous code needs to just use curly brackets instead.

amitkumarj441

comment created time in 3 months

issue comment tensorflow/tensorflow

Seemingly unavailable datatypes for Ops.

@Davidvdrm: Were you able to verify if the problem was resolved with the clang CUDA compiler?

mcourteaux

comment created time in 3 months

issue comment tensorflow/tensorflow

Seemingly unavailable datatypes for Ops.

@Davidvdrm: Unfortunately compiler support for many complex operations was suboptimal. We need to provide the necessary complex overrides in order to provide wider support. Did you try compiling with nvcc or the clang CUDA compiler? Clang should give you access to those kernels.

mcourteaux

comment created time in 4 months

issue comment tensorflow/tensorflow

Are complex variables supported in eager mode?

@Davidvdrm: The fix is already committed and available in the nightly pip package. It will also be included in the upcoming 2.1 release.

ziofil

comment created time in 4 months

issue comment tensorflow/tensorflow

Tensorflow v2 Limit GPU Memory usage

@germanjke: What are you executing prior to this call? If you've executed any ops, it's too late for us to reconfigure the virtual device. The virtual device is simply a way of slicing up a physical device into separate devices with a fixed memory limit.

cassianocasagrande

comment created time in 4 months

issue comment tensorflow/tensorflow

tensorflow 2.0 variable slice assign_add not supported

I haven't had a chance to make much progress on this issue, but it requires doing tricks similar to what we do here for assign: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/array_ops.py#L1091. We'll likely have to add other operators beyond just assign_add.
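
For illustration, a sketch of the current behavior and a workaround (TF 2.x):

import tensorflow as tf

v = tf.Variable([1.0, 2.0, 3.0])
v[0].assign(v[0] + 5.0)  # sliced assign works via the array_ops machinery
print(v.numpy())         # [6. 2. 3.]
# v[0].assign_add(5.0)   # the sliced assign_add this issue tracks; not yet supported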

motionlife

comment created time in 4 months

pull request comment tensorflow/tensorflow

Address problems with use_deterministic_cudnn test decorator

@chsigg: You might be a better person for this PR

duncanriach

comment created time in 4 months

issue comment tensorflow/tensorflow

complex gradient update in optimization

This has been fixed in bf9c196f37b9cbb3109b2891aaf9da85bf5f712a. It should be included in the 2.1 release.

kctezcan

comment created time in 4 months

issue closed tensorflow/tensorflow

complex gradient update in optimization

Is there a plan to allow complex optimization in Tensorflow in the future?

When you try to do it with version 1.3, you can calculate and evaluate gradients, but you cannot apply them with opt.apply_gradients(grds_and_vars). The error message you get is:

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'ApplyAdadelta' with these attrs. Registered devices: [CPU,GPU], Registered kernels: device='GPU'; T in [DT_DOUBLE] device='GPU'; T in [DT_FLOAT] device='GPU'; T in [DT_HALF] device='CPU'; T in [DT_DOUBLE] device='CPU'; T in [DT_FLOAT] device='CPU'; T in [DT_HALF]


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No

  • OS Platform and Distribution: Distributor ID: Debian; Description: Debian GNU/Linux 8.9 (jessie); Release: 8.9

  • TensorFlow installed from (source or binary): using pip install in a virtual conda environment

  • TensorFlow version (use command below): 1.3

  • Python version: Python 3.5.2 |Anaconda 4.3.0 (64-bit)| (default, Jul 2 2016, 17:53:06)

  • Bazel version (if compiling from source): not applicable

  • CUDA/cuDNN version: 8.0/6.0

  • GPU model and memory: Tesla K40c, 11439MiB

  • Exact command to reproduce: not necessary, since feature request/question

Describe the problem

problem description above

Source code / logs

not applicable

closed time in 4 months

kctezcan

Pull request review comment tensorflow/tensorflow

Make tape only watch the tensor with the floating dtype

 def __call__(self, device, token, args):
     """Passes `args` to `self._func`, which is executed eagerly."""
 
     with context.eager_mode(), backprop.GradientTape() as tape:
+      # Only watch tensors with a floating dtype.
       for tensor in args:
-        tape.watch(tensor)
+        for t in nest.flatten(tensor):
+          if t.dtype.is_floating:

This has been fixed in 507325c5b3fa5943485f3f994048fe7683e0f95d.

feihugis

comment created time in 4 months

issue comment tensorflow/tensorflow

Are complex variables supported in eager mode?

@ziofil: That issue is slightly unrelated. I have a fix that adds support for complex values in (some) optimizers that should be landing shortly.

ziofil

comment created time in 4 months

issue closed tensorflow/tensorflow

tf.reduce_mean gives incorrect results on CPU


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): v2.0.0-rc2-26-g64c3d38 2.0.0
  • Python version: 3.7.4

Current Behavior

The following script:

import tensorflow as tf
import numpy as np

(x_train, _), (_, _) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32')

tf_mean = tf.reduce_mean(x_train, axis=[0, 1, 2], keepdims=False)
np_mean = np.mean(x_train, (0,1,2))
print('channel means:')
print('  tf:', tf_mean)
print('  np:', np_mean)

prints:

channel means:
  tf: tf.Tensor([83.88608 83.88608 83.88608], shape=(3,), dtype=float32)
  np: [125.3069  122.95015 113.866  ]

Note: the numpy results are the correct channel-wise means for CIFAR10.

Expected Behavior

TensorFlow and numpy should give at least vaguely similar results.

closed time in 4 months

kazimuth

issue comment tensorflow/tensorflow

tf.reduce_mean gives incorrect results on CPU

@kazimuth: I believe the problem here is that TensorFlow is doing the computation with 32-bit float values, whereas numpy uses 64-bit floats almost all the time. Even if you change np.mean to use dtype=float32, it still seems to perform the calculation with 64-bit float values. If you want the same behavior, I suggest you change x_train to use float64 and then cast to float32 at the very end, like so:

import tensorflow as tf
import numpy as np

(x_train, _), (_, _) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype('float64')

tf_mean = tf.cast(tf.reduce_mean(x_train, axis=[0, 1, 2,], keepdims=False), dtype=tf.float32)
np_mean = np.mean(x_train, (0,1,2), dtype=np.float32)
print('channel means:')
print('  tf:', tf_mean)
print('  np:', np_mean)
kazimuth

comment created time in 4 months

pull request comment tensorflow/tensorflow

suppress unused result in s3_filesystem (-Wunused-result)

luxe@: It is pending approval by a developer internally and it should be merged into GitHub soon.

luxe

comment created time in 4 months

pull request comment tensorflow/tensorflow

[Intel MKL] Adding a unit test for MKL eager rewrite

Okay, we had to disable the test on macOS, but the PR is merged again.

mahmoud-abuzaina

comment created time in 4 months

pull request comment tensorflow/tensorflow

[Intel MKL] Adding a unit test for MKL eager rewrite

ashraf-bhuiyan@: It was some macOS build failure, but the error seems unrelated. I'm trying to roll forward the fix.

mahmoud-abuzaina

comment created time in 4 months

pull request comment tensorflow/tensorflow

[r1.15-CherryPick]:Use correct casts to get right dimensions on s390x

@rthadur: We unfortunately missed the cut-off for 1.15. Unless this is a critical fix it's unlikely to be pulled into the release at the moment.

shahidhs-ibm

comment created time in 4 months

Pull request review comment tensorflow/tensorflow

fix typo

 def reverse_sequence(input,
   >>> seq_lengths = [7, 2, 3, 5]
   >>> input = [[1, 2, 3, 4, 5, 0, 0, 0], [1, 2, 0, 0, 0, 0, 0, 0],
   ...          [1, 2, 3, 4, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7, 8]]
-  >>> output = reverse_sequence(input, seq_lens, seq_dim=1, batch_dim=0)
+  >>> output = reverse_sequence(input, seq_lengths, seq_dim=1, batch_dim=0)

@yashk2810: Is there a reason these errors weren't caught in our doctests?

autoih

comment created time in 4 months

Pull request review comment tensorflow/tensorflow

fix typo

 def reverse_sequence(input,
   >>> seq_lengths = [7, 2, 3, 5]
   >>> input = [[1, 2, 3, 4, 5, 0, 0, 0], [1, 2, 0, 0, 0, 0, 0, 0],
   ...          [1, 2, 3, 4, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7, 8]]
-  >>> output = reverse_sequence(input, seq_lens, seq_dim=1, batch_dim=0)
+  >>> output = reverse_sequence(input, seq_lengths, seq_dim=1, batch_dim=0)

Also, it seems like this should be seq_axis instead of seq_dim.

autoih

comment created time in 4 months

Pull request review comment tensorflow/tensorflow

fix typo

 def reverse_sequence(input,
   >>> seq_lengths = [7, 2, 3, 5]
   >>> input = [[1, 2, 3, 4, 5, 0, 0, 0], [1, 2, 0, 0, 0, 0, 0, 0],
   ...          [1, 2, 3, 4, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6, 7, 8]]
-  >>> output = reverse_sequence(input, seq_lens, seq_dim=1, batch_dim=0)
+  >>> output = reverse_sequence(input, seq_lengths, seq_dim=1, batch_dim=0)

Could you please update the code to call tf.reverse_sequence instead of just reverse_sequence?
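
Combining both review suggestions above, the doctest line would presumably become (a sketch; batch_axis similarly replaces the deprecated batch_dim):

>>> output = tf.reverse_sequence(input, seq_lengths, seq_axis=1, batch_axis=0)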

autoih

comment created time in 4 months

issue closed tensorflow/tensorflow

TF2.0 gradient result with some minor difference


System information

  • I have written very simple custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Mac OS 10.13.16 and Windows 10):
  • No mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source with tag of v2.0.0 ):
  • TensorFlow version (use command below):
  • Python version: 3.7
  • Bazel version (0.26.1 & 0.25.3):
  • GCC/Compiler version (Apple LLVM version 9.1.0 (clang-902.0.39.2)):
  • CUDA/cuDNN version: CUDA 10.0 and cuDNN 7.4.x
  • GPU model and memory: ()


Describe the current behavior

tf.Tensor(31.999998, shape=(), dtype=float32) tf.Tensor(30.0, shape=(), dtype=float32) tf.Tensor(3.0, shape=(), dtype=float32) tf.Tensor(1.0, shape=(), dtype=float32)

Describe the expected behavior

As dy/dx = a^2 + b, I expect a result of 32.0 for dy_dx.

Code to reproduce the issue

import tensorflow as tf

x = tf.constant(3.0)
a = tf.constant(5.0)
b = tf.constant(7.0)
c = tf.constant(9.0)

with tf.GradientTape() as tape:
    tape.watch([x, a, b, c])
    y = a**2 * x + b * x + c

[dy_dx, dy_da, dy_db, dy_dc] = tape.gradient(y, [x, a, b, c])
print(dy_dx, dy_da, dy_db, dy_dc)


closed time in 4 months

XiagenFeng

issue comment tensorflow/tensorflow

TF2.0 gradient result with some minor difference

Unless I'm missing something, 31.999998 is the expected result when considering floating-point precision. If you want further precision, you can change the constants to dtype=tf.float64 and you'll get 32.0.
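
A sketch of that change, reusing the constants from the report above:

import tensorflow as tf

x = tf.constant(3.0, dtype=tf.float64)
a = tf.constant(5.0, dtype=tf.float64)
b = tf.constant(7.0, dtype=tf.float64)
c = tf.constant(9.0, dtype=tf.float64)

with tf.GradientTape() as tape:
    tape.watch([x, a, b, c])
    y = a**2 * x + b * x + c

dy_dx = tape.gradient(y, x)
print(dy_dx)  # tf.Tensor(32.0, shape=(), dtype=float64)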

XiagenFeng

comment created time in 4 months

PR closed tensorflow/tensorflow

cleared numpy deprecation warnings cla: yes size:S

fixed numpy compatibility issues with tensorflow; no longer displays deprecation messages when importing tensorflow on systems with np version > 1.14.5

Tested with numpy versions 1.14.5, 1.16.5, 1.17.0

+6 -6

2 comments

1 changed file

LordGhostX

pr closed time in 4 months

pull request comment tensorflow/tensorflow

cleared numpy deprecation warnings

@LordGhostX: Could you provide a bit more context on why this change is needed? What was the deprecation message you were seeing?

LordGhostX

comment created time in 4 months

Pull request review comment tensorflow/tensorflow

Deprecate tf.test.is_gpu_available

 def decorated(self, *args, **kwargs):
   return decorator

+@deprecation.deprecated(
+    None, "Use `tf.config.experimental.list_physical_devices('GPU')` instead.")

The behavior is subtly different, so I can't apply a re-write rule.

On second thought, I'll close this PR and first make the new API non-experimental in time for 2.1.

jaingaurav

comment created time in 5 months

PR closed tensorflow/tensorflow

Deprecate tf.test.is_gpu_available cla: yes size:XS

The tf.config.experimental.list_physical_devices API is preferable, as the current API results in an initialization of the runtime which may be undesirable.
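
A minimal sketch of the difference (the query below does not initialize the runtime):

import tensorflow as tf

# Preferred: inspect physical devices without side effects.
gpu_available = bool(tf.config.experimental.list_physical_devices('GPU'))

# Deprecated: initializes the runtime as a side effect.
# gpu_available = tf.test.is_gpu_available()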

PiperOrigin-RevId: 270812165 (cherry picked from commit b8e6bc6a6980f79eae332ba9b01e722f571f2c05)

+2 -0

0 comment

1 changed file

jaingaurav

pr closed time in 5 months

PR opened tensorflow/tensorflow

Deprecate tf.test.is_gpu_available

The tf.config.experimental.list_physical_devices API is preferable as the current API results in an initialization of the runtime which may be undesirable.

PiperOrigin-RevId: 270812165 (cherry picked from commit b8e6bc6a6980f79eae332ba9b01e722f571f2c05)

+2 -0

0 comment

1 changed file

pr created time in 5 months

create branch jaingaurav/tensorflow

branch : dep-2.0

created branch time in 5 months

issue comment tensorflow/tensorflow

No documentation for ConfigProto

@MarkDaoust: ConfigProto has been replaced by many tf.config APIs for 2.0. Under the hood it uses ConfigProto, but that is an implementation detail. Please let me know if any known functionality is missing from the tf.config APIs, as not all fields may have been exposed in the new APIs.
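
For illustration, a rough sketch mapping a few common ConfigProto fields to their tf.config counterparts (the pairings are approximate):

import tensorflow as tf

# inter_op_parallelism_threads / intra_op_parallelism_threads:
tf.config.threading.set_inter_op_parallelism_threads(2)
tf.config.threading.set_intra_op_parallelism_threads(4)

# gpu_options.allow_growth:
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_memory_growth(gpus[0], True)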

nbro

comment created time in 5 months

issue comment tensorflow/tensorflow

Problem Passing Tensor Attr to Custom Op in Eager Execution Mode

dtarakanov1@: I agree that your need to use image summaries differs slightly. However, isn't the recommendation for the original issue to use tensors as inputs rather than attributes?

oracle3001

comment created time in 5 months

issue comment tensorflow/tensorflow

Problem Passing Tensor Attr to Custom Op in Eager Execution Mode

Thanks for digging up the code, @dtarakanov1. Regarding the image summary, you should be able to use tf.compat.v2.summary.image, which should be compatible with eager mode execution.
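
A minimal sketch of that API (the log directory is illustrative):

import tensorflow as tf
tf.compat.v1.enable_eager_execution()

writer = tf.compat.v2.summary.create_file_writer('/tmp/logs')
with writer.as_default():
    image = tf.zeros([1, 28, 28, 1])  # batch of one grayscale image
    tf.compat.v2.summary.image('example', image, step=0)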

As @josh11b has mentioned, using a tensor attr is not recommended; instead, the tensor should be an input.

Thus, I am closing the issue, but please feel free to re-open if support for tensor attributes in eager mode is necessary.

oracle3001

comment created time in 5 months

issue closed tensorflow/tensorflow

Problem Passing Tensor Attr to Custom Op in Eager Execution Mode


System information

  • Windows 10:
  • TensorFlow installed from binary:
  • TensorFlow version 1.14:
  • Python version 3.7:
  • CUDA/cuDNN version 10:

I am defining a new custom Op in C++, which takes in a single attribute of type tensor and a single input tensor variable. A stripped version of the Op code is below:

#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"

using namespace tensorflow;

REGISTER_OP("DoStuff")
    .Attr("attr: tensor = { dtype: DT_FLOAT }")
    .Input("in: float")
    .Output("out: float");

class DoStuffOp : public OpKernel {
public:
    explicit DoStuffOp(OpKernelConstruction *context) : OpKernel(context) {
        OP_REQUIRES_OK(context, context->GetAttr("attr", &attr_));
        // ...
    }

    void Compute(OpKernelContext *context) override {
        // ...
    }

private:
    Tensor attr_;
};

REGISTER_KERNEL_BUILDER(Name("DoStuff").Device(DEVICE_CPU), DoStuffOp);

I can compile the Op into a .so file fine. Now, the following code runs.

import tensorflow as tf
dostufflib = tf.load_op_library('build/do_stuff.so')
sess = tf.InteractiveSession() 

sample_in = np.random.rand(3,3)
sample_in_t = tf.convert_to_tensor(sample_in, dtype=np.float32)
sample_atrr = np.zeros([3,3], dtype=np.float32)
sample_attr_t = tf.contrib.util.make_tensor_proto(sample_atrr)

Y = dostufflib.do_stuff(in=sample_in_t, attr=sample_attr_t)

However, if I try to use eager execution mode i.e.

import tensorflow as tf
tf.compat.v1.enable_eager_execution()
dostufflib = tf.load_op_library('build/do_stuff.so')

sample_in = np.random.rand(3,3)
sample_in_t = tf.convert_to_tensor(sample_in, dtype=np.float32)
sample_atrr = np.zeros([3,3], dtype=np.float32)
sample_attr_t = tf.contrib.util.make_tensor_proto(sample_atrr)

Y = dostufflib.do_stuff(in=sample_in_t, attr=sample_attr_t)

I get the following error,

tensorflow.python.framework.errors_impl.UnimplementedError: Attr sample_locs has unhandled type 6

closed time in 5 months

oracle3001

push event jaingaurav/tensorflow

Gaurav Jain

commit sha c09880bd0fb175bf4e938d5b6279e0225328e06c

Fix merge error in 688a03d639

view details

Gaurav Jain

commit sha f882f551d6fcd85cba9ffe3da1ce01453a68ef27

Disallow comparing ObjectIdentityWrapper to others

When using the experimental_ref() API in Tensors & Variables, a common bug I hit was incorrectly comparing a wrapped object with an unwrapped object instead of first calling deref(). To avoid this we now raise an exception instead of returning False. This implies that if Tensors and Variables are kept in the same set or dictionary as other objects, an exception can be raised if there is a hash collision.

PiperOrigin-RevId: 268837575 (cherry picked from commit 57e8769bc4ef1c94ddbcfbe4a39afe8f73b433c5)
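
For context, a minimal sketch of the usage pattern this enforces, assuming the TF 2.x experimental_ref() API:

import tensorflow as tf

v = tf.Variable(1.0)
d = {v.experimental_ref(): 'metadata'}  # wrapped refs are hashable dict keys

ref = next(iter(d))
print(ref.deref() is v)  # True: call deref() before comparing to the raw object
# After this change, `ref == v` raises instead of silently returning False.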

view details

push time in 5 months
