Jiri Simsa (jsimsa), Google, Inc., California, USA

Repositories:

• feihugis/tensorflow - Computation using data flow graphs for scalable machine learning
• jsimsa/alluxio - Memory-Centric Virtual Distributed Storage System
• jsimsa/community - Stores documents used by the TensorFlow developer community
• jsimsa/doomtrooper - Scraper and data for Czech Doomtrooper
• jsimsa/flink - Mirror of Apache Flink
• jsimsa/incubator-zeppelin - Mirror of Apache Zeppelin (Incubating)
• jsimsa/mesos - Mirror of Apache Mesos
• jsimsa/spark - Mirror of Apache Spark
• jsimsa/tensorflow - Computation using data flow graphs for scalable machine learning
• jsimsa/thrift - Mirror of Apache Thrift

issue comment tensorflow/tensorflow

Decoupling preprocessing and training

@yourtheron this question is better suited for Stack Overflow. GitHub is meant to be used for reporting bugs or requesting functionality. I am also going to close this feature request as it has been addressed by the tf.data service, which was released in TF 2.3.
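For reference, a minimal sketch of how the tf.data service can be used to move preprocessing off the training hosts (the dispatcher address and the map function below are placeholders, not part of this issue):

import tensorflow as tf

dataset = tf.data.Dataset.range(1000)
dataset = dataset.map(lambda x: x + 1)  # stand-in for expensive preprocessing

# Offload the preprocessing to a pool of tf.data service workers; the trainer
# only consumes already-preprocessed elements over the network.
dataset = dataset.apply(tf.data.experimental.service.distribute(
    processing_mode="parallel_epochs",
    service="grpc://dispatcher:5000"))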

jsimsa

comment created time in 9 days

issue closed tensorflow/tensorflow

Decoupling preprocessing and training

Describe the feature and the current behavior/state.

Currently, TensorFlow performs preprocessing and training on the same host. In situations where the data preprocessing is efficient and yet the CPU resources available on the host are not sufficient to keep up with the training workload on the accelerator, the preprocessing becomes a bottleneck.

The proposed feature is to extend the tf.data / tf.distribute APIs to make it possible to decouple the preprocessing from training.

Will this change the current api? How?

Yes. At the very least, users will need to specify the set of "input" hosts that should perform the preprocessing and the set of "training" hosts that should perform the training.

The preferred solution would allow users to express their input pipeline in tf.data as if it were executing on the same host as the training, and to express, through tf.data / tf.distribute configuration, how it should be distributed.

Who will benefit from this feature?

Users that run preprocessing-intensive training jobs.

closed time in 9 days

jsimsa

issue comment tensorflow/tensorflow

tf.data.Dataset - You must feed a value for placeholder tensor

@MarkDaoust could you please take a look? thank you

leaguilar

comment created time in 9 days

issue comment tensorflow/tensorflow

Memory leak with tf.shuffle, doesn't release buffer memory

@kamalkraj and @evancasey please create a separate issue with instructions on how to reproduce (and evidence that leads you to believe that the shuffle buffer is not released). You can, for instance, run your workload with --vmodule=dataset=2 to check whether the shuffle dataset iterator (which owns the buffer) is destructed at the end of each epoch. As per my response from January 2nd, I am not able to reproduce the issue with the instructions posted on this issue.

kindernerd

comment created time in 13 days

issue comment tensorflow/tensorflow

tf.data.Dataset.map() ignores eager execution

There is no eager execution of tf.data.

In TF 2 eager mode, the tf.data input pipeline graph is constructed eagerly, an iterator for the input pipeline graph is created eagerly, and then the "give me the next element" op is (repeatedly) executed eagerly. The "give me the next element" op executes the input pipeline graph (and this execution often happens asynchronously ahead of time, so that by the time data is requested it has already been precomputed).

In other words, given how tf.data works, executing user-defined functions passed to tf.data transformations eagerly is not trivially possible. For that to make sense, there would need to be no asynchrony in the input pipeline, and the tf.data (mostly C++) implementation would either need to be updated to support switching between C++ and Python execution (for which the current mechanism is tf.py_function) or have an alternative Python backend which would be used for this "eager" mode.

The tf.data team has no plans to support this.
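For completeness, a small sketch of the tf.py_function escape hatch mentioned above (the function name eager_fn is illustrative):

import tensorflow as tf

def eager_fn(x):
  # Runs as regular Python, so print() and debuggers work here.
  print("processing", x.numpy())
  return x * 2

ds = tf.data.Dataset.range(5)
ds = ds.map(lambda x: tf.py_function(eager_fn, [x], tf.int64))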

habernal

comment created time in 13 days

issue comment tensorflow/tensorflow

Returning tf.data.UNKNOWN_CARDINALITY when the cardinality can be easily computed

I agree that the behavior of the method should be documented better. @aaudiber could you please update the cardinality documentation to address this issue? In particular, call out that:

  1. A known cardinality is returned only when it can be inferred statically (i.e. the computation does not execute the input pipeline). Consequently, input pipelines that make use of file-based source datasets or transformations which construct datasets from their input using a user-defined function are expected to return unknown cardinality.

  2. The users can provide cardinality hints through the assert_cardinality transformation.

@andrescodas as for your suggestion about inspecting flat_map (or interleave, for that matter): it is generally not possible to statically determine what the cardinality would be. The cardinality of flat_map is a function of the values generated by the flat_map input dataset, which are generally not known statically (i.e. without executing the input dataset), so we do not attempt to do it.
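To illustrate (a minimal sketch; the value 45 is simply the true size of this particular example):

import tensorflow as tf

ds = tf.data.Dataset.range(10).flat_map(lambda x: tf.data.Dataset.range(x))
print(tf.data.experimental.cardinality(ds))  # UNKNOWN_CARDINALITY

# If the user knows the true size, it can be supplied as a hint:
ds = ds.apply(tf.data.experimental.assert_cardinality(45))
print(tf.data.experimental.cardinality(ds))  # 45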

andrescodas

comment created time in 14 days

issue comment tensorflow/tensorflow

Returning tf.data.UNKNOWN_CARDINALITY when the cardinality can be easily computed

How do you propose the cardinality could be "easily" computed in this case?

Executing the input pipeline does not qualify as an acceptable solution because it can in general be expensive (and possibly not terminate).

andrescodas

comment created time in 14 days

pull request comment tensorflow/tensorflow

[tf.data] Use output_shapes from python for batch dataset

Hi @zhuzilin sorry for the delay in response. I was out of office for an extended period of time and am still catching up.

What is the motivation for your PR? I would prefer to keep the shape inference in C++ for the following reasons.

In general, we could rewrite the tf.data graph in a way that would allow the C++ shape inference to be more accurate, for instance by setting the drop_remainder attribute to True. If we used the shapes from the original Python shape inference, then we would lose this ability.

In other words, there are reasons for having separate C++ and Python shape inference and I would prefer to keep both.
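As a rough illustration of why drop_remainder matters for static shape inference (a sketch unrelated to the specifics of this PR):

import tensorflow as tf

print(tf.data.Dataset.range(100).batch(32).element_spec)
# shape=(None,): the last batch may be partial, so the batch dimension is unknown

print(tf.data.Dataset.range(100).batch(32, drop_remainder=True).element_spec)
# shape=(32,): partial batches are dropped, so the batch dimension is static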

zhuzilin

comment created time in 15 days

issue closed tensorflow/tensorflow

windows build error(makedataset)

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 x64
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: 2.4.0
  • Python version: 3.8.3
  • Installed using virtualenv? pip? conda?: N/A
  • Bazel version (if compiling from source): 3.4.1
  • GCC/Compiler version (if compiling from source): Visual Studio 2019
  • CUDA/cuDNN version: 11.0/8.0.2
  • GPU model and memory: RTX2070 GDDR6 8GB

Describe the problem

build error ( link error )

Provide the exact sequence of commands / steps that you executed before running into the problem

./configure
bazel build --copt=-nvcc_options=disable-warnings --define=no_tensorflow_py_deps=true //tensorflow/tools/pip_package:build_pip_package

Any other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

optimize_dataset_op.lo.lib(optimize_dataset_op.obj) : error LNK2019: unresolved external symbol "class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > __cdecl tensorflow::port::JobName(void)" (?JobName@port@tensorflow@@YA?AV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ) referenced in function "protected: virtual void __cdecl tensorflow::data::OptimizeDatasetOp::MakeDataset(class tensorflow::OpKernelContext *,class tensorflow::data::DatasetBase *,class tensorflow::data::DatasetBase * *)" (?MakeDataset@OptimizeDatasetOp@data@tensorflow@@MEAAXPEAVOpKernelContext@3@PEAVDatasetBase@23@PEAPEAV523@@Z)
bazel-out\x64_windows-opt\bin\tensorflow\python\_pywrap_tensorflow_internal.so : fatal error LNK1120: 1 unresolved externals
Target //tensorflow/tools/pip_package:build_pip_package failed to build

closed time in 16 days

alanpurple

issue comment tensorflow/tensorflow

windows build error(makedataset)

This has been fixed by https://github.com/tensorflow/tensorflow/commit/a912a8ed6cc873e1b4ed5de0fb0524d2e499ea34

alanpurple

comment created time in 16 days

Pull request review comment tensorflow/tensorflow

Adding log_warning option in tf.data.experimental.ignore_errors

 tf_kernel_library(
     deps = [
         "//tensorflow/core:experimental_dataset_ops_op_lib",
         "//tensorflow/core:framework",
+	"//tensorflow/core/platform:logging",

Please fix the indentation of this line (it uses a tab instead of spaces).

stjohnso98

comment created time in 17 days

Pull request review comment tensorflow/community

Policy to require public types for API

 latter prevents the implementation from distinguishing between the caller not setting the argument vs. the caller setting the argument to the default value, which may be needed when the default behavior is changing.
+#### Documented types
+
+Arguments and return values to public APIs must be either be of public types, or inherit from a public type. This ensures that the arguments and return value types are documented and gives users clearer guidance on what can be passed to a public API, and what can they do with the returned values. If it is not desirable for the user to construct these types on their own, one can choose to expose superclass with no constructor, but adequate docstrings.

"clearer" => "clear"

guptapriya

comment created time in 20 days

Pull request review comment tensorflow/community

Policy to require public types for API

 latter prevents the implementation from distinguishing between the caller not setting the argument vs. the caller setting the argument to the default value, which may be needed when the default behavior is changing.
+#### Documented types
+
+Arguments and return values to public APIs must be either be of public types, or inherit from a public type. This ensures that the arguments and return value types are documented and gives users clearer guidance on what can be passed to a public API, and what can they do with the returned values. If it is not desirable for the user to construct these types on their own, one can choose to expose superclass with no constructor, but adequate docstrings.

"the arguments and return value types" => "the argument types and return value types"

guptapriya

comment created time in 20 days

Pull request review comment tensorflow/community

Policy to require public types for API

 latter prevents the implementation from distinguishing between the caller not setting the argument vs. the caller setting the argument to the default value, which may be needed when the default behavior is changing.
+#### Documented types
+
+Arguments and return values to public APIs must be either be of public types, or inherit from a public type. This ensures that the arguments and return value types are documented and gives users clearer guidance on what can be passed to a public API, and what can they do with the returned values. If it is not desirable for the user to construct these types on their own, one can choose to expose superclass with no constructor, but adequate docstrings.

"either be" => "either"

guptapriya

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Adding log_warning option in tf.data.experimental.ignore_errors

 from tensorflow.python.data.ops import dataset_ops
 from tensorflow.python.ops import gen_experimental_dataset_ops
 from tensorflow.python.util.tf_export import tf_export
-
+from tensorflow.python.compat import compat
 
 @tf_export("data.experimental.ignore_errors")
-def ignore_errors():
+def ignore_errors(log_warning=False):

In addition, the docstring should be updated with an "Args:" section (see other source files for examples) that describes the purpose of the argument.

stjohnso98

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

Adding log_warning option in tf.data.experimental.ignore_errors

 from tensorflow.python.data.ops import dataset_ops
 from tensorflow.python.ops import gen_experimental_dataset_ops
 from tensorflow.python.util.tf_export import tf_export
-
+from tensorflow.python.compat import compat
 
 @tf_export("data.experimental.ignore_errors")
-def ignore_errors():
+def ignore_errors(log_warning=False):

You will need to regenerate the golden API files as your change is changing the public API.

You can do so by running:

$ bazel build tensorflow/tools/api/tests:api_compatibility_test
$ bazel-bin/tensorflow/tools/api/tests/api_compatibility_test --update_goldens True
stjohnso98

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

Adding log_warning option in tf.data.experimental.ignore_errors

 limitations under the License.
 #include "tensorflow/core/framework/dataset.h"
 #include "tensorflow/core/framework/partial_tensor_shape.h"
 #include "tensorflow/core/framework/tensor.h"
+#include "tensorflow/core/platform/logging.h"

Internal tests are failing because this file does not directly depend on a module that provides this header file.

Please add the dependency "tensorflow/core/platform:logging" to this target.

stjohnso98

comment created time in 21 days

pull request comment tensorflow/tensorflow

Per-op tf32 plumbing for GPUs (Stage1: API change)

For @tensorflow/api-owners:

@reedwm and/or @sanjoy could you please review the functionality in this PR? thank you

kaixih

comment created time in 22 days

create branch tensorflow/tensorflow

branch : jsimsa-test

created branch time in 23 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

 Status DatasetBaseIterator::GetNext(IteratorContext* ctx,
   return s;
 }
+Status DatasetBaseIterator::Skip(IteratorContext* ctx, int num_to_skip,
+                                 bool* end_of_sequence, int* num_skipped) {
+  profiler::TraceMe activity([&] { return BuildTraceMeName(); },
+                             profiler::TraceMeLevel::kInfo);
+  DVLOG(3) << prefix() << " Skip enter";
+  RecordStart(ctx, /*stop_output=*/true);
+  Status s = SkipInternal(ctx, num_to_skip, end_of_sequence, num_skipped);
+  if (s.ok() && !*end_of_sequence) RecordElement(ctx);

Sorry for the long turnaround on this PR @zhuzilin. I was out of office for an extended period of time and @aaudiber was waiting for my feedback.

The autotuning implementation assumes that the sum of CPU time spent executing logic of a given iterator GetNext(Internal) call across all of its invocations divided by the number of computed elements (as recorded by RecordElement) represents the average time to process a single element. My concern is that the current PR creates an opportunity for this assumption to be violated.

It seems to me that the element should be recorded only if the GetNextInternal-based default implementation of SkipInternal is invoked. tf.data kernels which provide an "efficient" version of SkipInternal should not increment the element count (which means we will assume that skipping an element has negligible cost compared to computing it).

So my suggestion would be to move RecordElement to the default implementation of SkipInternal and explain in a comment there why we do that.

zhuzilin

comment created time in 24 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "Whether to fuse filter dataset that predicts random_uniform < rate into "
       "a sampling dataset. If None, defaults to False.")
+  hoist_data_discarding_ops = options.create_option(
+      name="hoist_data_discarding_ops",

Rename to hoist_discard.

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {

There are other cardinality preserving transformations in the core API and it would be good if your implementation works for those too: concatenate, enumerate, shuffle, and zip.

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",+};++bool IsDataDiscarding(const NodeDef& node) {+  for (const auto& data_discarding_op : kDataDiscarding) {+    if (node.op() == data_discarding_op) {+      return true;+    }+  }+  return false;+}++bool IsCardinalityPreserving(const NodeDef& node) {+  for (const auto& cardinality_preserving_op : kCardinalityPreserving) {+    if (node.op() == cardinality_preserving_op) {+      return true;+    }+  }+  return false;+}++}  // namepsace++Status HoistDataDiscardingOps::OptimizeAndCollectStats(Cluster* cluster,+                                                       const GrapplerItem& item,+                                                       GraphDef* output,+                                                       OptimizationStats* stats) {+  *output = item.graph;+  MutableGraphView graph(output);+  bool updated;+  do {+    updated = false;+    for (NodeDef node : graph.graph()->node()) {+      if (IsDataDiscarding(node)) {+        NodeDef* start = &node;+        NodeDef* start_parent = graph_utils::GetInputNode(*start, graph);+        while (IsCardinalityPreserving(*start_parent) &&+               NumOutputs(*start_parent, graph.graph()) == 1) {+          start = start_parent;+          start_parent = graph_utils::GetInputNode(*start, graph);+        }+        // no cardinality preserving op with indegree 1.+        if (start->name() == node.name()) {+          continue;+        }+        NodeDef hoisted_node = node;

Why do we need to create a new node? I would expect that we can simply adjust the fanins and fanouts of the existing nodes.

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "Whether to fuse filter dataset that predicts random_uniform < rate into "
       "a sampling dataset. If None, defaults to False.")
+  hoist_data_discarding_ops = options.create_option(
+      name="hoist_data_discarding_ops",
+      ty=bool,
+      docstring=
+      "Whether to hoist ops that will discard data (such as skip, take, shard)"
+      "out of map transformations. If None, defaults to False.")

"out of map transformations" is misleading (and incomplete)

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",+};++bool IsDataDiscarding(const NodeDef& node) {+  for (const auto& data_discarding_op : kDataDiscarding) {+    if (node.op() == data_discarding_op) {+      return true;+    }+  }+  return false;+}++bool IsCardinalityPreserving(const NodeDef& node) {+  for (const auto& cardinality_preserving_op : kCardinalityPreserving) {+    if (node.op() == cardinality_preserving_op) {+      return true;+    }+  }+  return false;+}++}  // namepsace++Status HoistDataDiscardingOps::OptimizeAndCollectStats(Cluster* cluster,+                                                       const GrapplerItem& item,+                                                       GraphDef* output,+                                                       OptimizationStats* stats) {+  *output = item.graph;+  MutableGraphView graph(output);+  bool updated;+  do {+    updated = false;+    for (NodeDef node : graph.graph()->node()) {+      if (IsDataDiscarding(node)) {+        NodeDef* start = &node;+        NodeDef* start_parent = graph_utils::GetInputNode(*start, graph);+        while (IsCardinalityPreserving(*start_parent) &&+               NumOutputs(*start_parent, graph.graph()) == 1) {+          start = start_parent;+          start_parent = graph_utils::GetInputNode(*start, graph);+        }+        // no cardinality preserving op with indegree 1.+        if (start->name() == node.name()) {+          continue;+        }+        NodeDef hoisted_node = node;+        if (!absl::StartsWith(node.name(), "hoist_data_dsicarding_op/")) {

typo in the string, change it to simply "hoist_discard".

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",+};++bool IsDataDiscarding(const NodeDef& node) {+  for (const auto& data_discarding_op : kDataDiscarding) {+    if (node.op() == data_discarding_op) {+      return true;+    }+  }+  return false;+}++bool IsCardinalityPreserving(const NodeDef& node) {+  for (const auto& cardinality_preserving_op : kCardinalityPreserving) {+    if (node.op() == cardinality_preserving_op) {+      return true;+    }+  }+  return false;+}++}  // namepsace++Status HoistDataDiscardingOps::OptimizeAndCollectStats(Cluster* cluster,+                                                       const GrapplerItem& item,+                                                       GraphDef* output,+                                                       OptimizationStats* stats) {+  *output = item.graph;+  MutableGraphView graph(output);+  bool updated;+  do {+    updated = false;+    for (NodeDef node : graph.graph()->node()) {

Would it make sense to change the order of the loops to make this more efficient? Have an outer loop for all the nodes and in the inner loop, if the node is a discarding op, traverse the parent pointers as long as cardinality preserving ops are encountered.

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",+};++bool IsDataDiscarding(const NodeDef& node) {+  for (const auto& data_discarding_op : kDataDiscarding) {+    if (node.op() == data_discarding_op) {+      return true;+    }+  }+  return false;+}++bool IsCardinalityPreserving(const NodeDef& node) {+  for (const auto& cardinality_preserving_op : kCardinalityPreserving) {+    if (node.op() == cardinality_preserving_op) {+      return true;+    }+  }+  return false;+}++}  // namepsace++Status HoistDataDiscardingOps::OptimizeAndCollectStats(Cluster* cluster,+                                                       const GrapplerItem& item,+                                                       GraphDef* output,+                                                       OptimizationStats* stats) {+  *output = item.graph;+  MutableGraphView graph(output);+  bool updated;+  do {+    updated = false;+    for (NodeDef node : graph.graph()->node()) {+      if (IsDataDiscarding(node)) {+        NodeDef* start = &node;+        NodeDef* start_parent = graph_utils::GetInputNode(*start, graph);+        while (IsCardinalityPreserving(*start_parent) &&+               NumOutputs(*start_parent, graph.graph()) == 1) {

What does this check guard against?

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",+};++bool IsDataDiscarding(const NodeDef& node) {+  for (const auto& data_discarding_op : kDataDiscarding) {+    if (node.op() == data_discarding_op) {+      return true;+    }+  }+  return false;+}++bool IsCardinalityPreserving(const NodeDef& node) {+  for (const auto& cardinality_preserving_op : kCardinalityPreserving) {+    if (node.op() == cardinality_preserving_op) {+      return true;+    }+  }+  return false;+}

Make the collection of discarding / cardinality preserving transformations a set so that we can check membership in constant (as opposed to linear) time.

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#ifndef TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_+#define TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_++#include "tensorflow/core/grappler/optimizers/data/optimizer_base.h"++namespace tensorflow {+namespace grappler {++// This optimization hoists the data discarding ops (such as `skip`, `take` and+//  `shard`) to avoid unnecessary computation.+class HoistDataDiscardingOps : public TFDataOptimizerBase {

Rename to HoistDiscard

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#ifndef TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_+#define TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_++#include "tensorflow/core/grappler/optimizers/data/optimizer_base.h"++namespace tensorflow {+namespace grappler {++// This optimization hoists the data discarding ops (such as `skip`, `take` and+//  `shard`) to avoid unnecessary computation.+class HoistDataDiscardingOps : public TFDataOptimizerBase {+ public:+  HoistDataDiscardingOps() = default;+  ~HoistDataDiscardingOps() override = default;++  string name() const override { return "hoist_data_discarding_ops"; };

Rename to hoist_discard

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 cc_library(
     ] + tf_protos_all(),
 )
+cc_library(
+    name = "hoist_data_discarding_ops",

For consistency with the naming of other targets, please rename this to hoist_discard, and likewise the source file (and similarly for the test target).

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/++#include "tensorflow/core/grappler/optimizers/data/hoist_data_discarding_ops.h"++#include "absl/container/flat_hash_set.h"+#include "tensorflow/core/framework/attr_value.pb.h"+#include "tensorflow/core/framework/node_def.pb.h"+#include "tensorflow/core/grappler/clusters/cluster.h"+#include "tensorflow/core/grappler/grappler_item.h"+#include "tensorflow/core/grappler/mutable_graph_view.h"+#include "tensorflow/core/grappler/op_types.h"+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"+#include "tensorflow/core/grappler/utils.h"+#include "tensorflow/core/platform/protobuf.h"++namespace tensorflow {+namespace grappler {+namespace {++constexpr std::array<const char*, 3> kDataDiscarding = {+    "ShardDataset", "SkipDataset", "TakeDataset",+};++constexpr std::array<const char*, 6> kCardinalityPreserving = {+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",

In the presence of errors, you can assume that map transformations are cardinality preserving only when their preserve_cardinality attribute is set to true.

zhuzilin

comment created time in 2 months

issue closed tensorflow/tensorflow

How to store tf dataset object to file?

URL(s) with the issue:

https://www.tensorflow.org/guide/data

Description of issue (what needs changing):

How do I store a tf.data.Dataset object to a file? For instance,

dataset1 = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform([4, 10], minval=1, maxval=10, dtype=tf.int32))
dataset1

How do I store dataset1 to a file?

Clear description

For me, a saved copy of the tokenized dataset saves a lot of training time.

from transformers import AlbertTokenizer
import tensorflow as tf
import DataReader
import Tokenizer


def encode(type, dataPath='./qgdata/nq-train-sample.json'):
    entries = DataReader.read(dataPath)
    encoding = []
    for entry in entries:
        if type == 'context':
            context = Tokenizer.encode(
                entry['passage'], entry['answer'], entry['question'], True)
            encoding.append(context)
        else:
            question = Tokenizer.encode(
                entry['passage'], entry['answer'], entry['question'], False)
            encoding.append(question)
    data = tf.data.Dataset.from_generator(
        lambda: encoding, tf.int64, output_shapes=512)
    return data


def make_dataset(dataPath='./qgdata/nq-train-sample.json', batch_size=1):
    contextData = encode('context', dataPath)
    questionData = encode('question', dataPath)
    dataset = tf.data.Dataset.zip((contextData, questionData))
    return dataset.batch(batch_size)

Instead of running this batching script before each training run, it would be much more efficient to store the tokenized dataset object to a file and avoid retokenizing.

Usage example

Maybe like:

dataset1 = tf.data.Dataset.from_tensor_slices(
    tf.random.uniform([4, 10], minval=1, maxval=10, dtype=tf.int32))
dataset1.save_dataset(path_to_store)

closed time in 2 months

zzj0402

issue comment tensorflow/tensorflow

How to store tf dataset object to file?

My changes were submitted as https://github.com/tensorflow/tensorflow/commit/4d58a67a9f19ab8d0cfbb2d8e461ebb73ce06db6

zzj0402

comment created time in 2 months

pull request comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

For @tensorflow/api-owners, this looks good.

wwwind

comment created time in 2 months

issue closed tensorflow/tensorflow

Support for .next() on tf.data.Dataset

System information

  • TensorFlow version (you are using): 2.1.0
  • Are you willing to contribute it (Yes/No): Yes (If approved)

Describe the feature and the current behavior/state. Currently, when working with https://www.tensorflow.org/api_docs/python/tf/data/Dataset you can iterate a Dataset with a loop, e.g.

for x,y in train:
   tf.print(x,y)

However if you want to get a single batch, the only option would be:

train.__iter__().next()

Currently there is only support for creating a numpy iterator, Example

train.as_numpy_iterator().next()

and the output is not a tf object.

The closest to the expected behavior of .next() would be train.take(1) which still needs to be converted to an iterator.

The current methods of getting a single batch are tedious and create bug prone code, multiple lines, and/or functions called.

Will this change the current api? How?

It will create a .next() function for tf.data.Dataset e.g.

https://www.tensorflow.org/api_docs/python/tf/data/Dataset#next

Who will benefit from this feature? Hopefully the entire community.

Any Other info.

The feature could be implemented by using an existing iterator for the dataset object and when calling .next() simply picking an element from that iterator rather than creating an iterator every time for performance benefits.

closed time in 2 months

fostiropoulos

issue comment tensorflow/tensorflow

Support for .next() on tf.data.Dataset

@lgeiger's suggestion is the idiomatic use of the tf.data APIs.

tf.data.Dataset.next() is problematic. What should happen when you call next() multiple times? I assume you would like different elements to be produced. But in that case tf.data would need to maintain an internal iterator object to provide this functionality. If that's the case, how is the lifetime of this iterator object managed? When you create an iterator explicitly via iter, the lifetime is tied to the lifetime of the Python object, which makes it possible to decide when the iterator object is no longer needed and can be destroyed. In contrast, if the iterator object was hidden inside of tf.data internals, then you would need an API for explicitly destroying it (since iterator objects may allocate a large amount of memory). There are tens of bugs open against Keras leaking memory because of exactly this -- the Keras backend session allocating memory that needs to be explicitly cleared -- and we will not repeat that mistake for tf.data.
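For reference, the idiomatic pattern looks like this (a minimal sketch):

import tensorflow as tf

ds = tf.data.Dataset.range(10).batch(4)

# The iterator's lifetime is tied to the Python object `it`, so its resources
# are released once `it` goes out of scope.
it = iter(ds)
first_batch = next(it)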

fostiropoulos

comment created time in 2 months

issue comment tensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

@zhuzilin the inefficiency related to asynchronously pre-computing more data than is needed is not specific to this issue and applies to any of the existing asynchronous tf.data transformations (parallel map, parallel interleave, fused map + batch, and prefetch).

Pushing transformations that discard data (such as skip, take, and shard) as close to the data source as possible through graph rewrites makes sense and the tf.data team would welcome your contributions to that end. To get started, I suggest you take a look at existing tf.data graph rewrites, such as the no-op elimination optimization.
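To make the idea concrete, this is the kind of reordering such a rewrite would perform automatically (a hand-written sketch; expensive_fn is a placeholder):

import tensorflow as tf

def expensive_fn(x):
  return x * 2  # stand-in for an expensive user-defined function

ds = tf.data.Dataset.range(1000000)

# Maps every element and then throws most of the results away.
slow = ds.map(expensive_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE).take(10)

# Same output, but the discarding happens before the expensive work.
fast = ds.take(10).map(expensive_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)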

eriikj

comment created time in 2 months

issue comment tensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

Thank you for your explanation. I acknowledge the problem. A possible solution for it would be to decouple the configuration of parallelism from the configuration of buffering. I was hoping to achieve this using existing building blocks through my suggestion to use prefetch, but I realized it would not work because of head-of-the-line blocking.

As you pointed out, it is possible to avoid this issue by using non-determinism. What you are requesting is to improve the performance of deterministic parallel map (at the expense of increased memory usage). However, there is a fundamental trade-off between performance and determinism. Even if tf.data allowed for extra buffering in its map transformation, the performance might not be as good as for the non-deterministic transformation (unless the buffering is unlimited, but even then non-deterministic execution might achieve better cache locality).

For what it is worth, most users are happy with the "fast and non-deterministic" and "slower and deterministic" options. Having said that, I think that we should be able to investigate opportunities for decoupling the parallelism and buffer size configurations in the context of our ongoing autotuning efforts.
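For reference, the "fast and non-deterministic" option can be expressed today roughly as follows (a sketch):

import tensorflow as tf

options = tf.data.Options()
options.experimental_deterministic = False  # allow elements to be produced out of order

ds = (tf.data.Dataset.range(1000)
      .map(lambda x: x * 2, num_parallel_calls=tf.data.experimental.AUTOTUNE)
      .with_options(options))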

eriikj

comment created time in 2 months

issue closed tensorflow/tensorflow

TensorFlow tf.data.Dataset API extremely slow for 3D-shaped pipeline

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.3 LTS
  • TensorFlow installed from (source or binary): binary pip
  • TensorFlow version (use command below): v2.1.0-rc2-17-ge5bf8de 2.1.0
  • Python version: Python 3.7.6
  • CUDA/cuDNN version: 7.6.4.38-1
  • GPU model and memory: Quadro RTX 6000 22752 MB

Describe the current behavior In the data pipeline given below, the performance is extremely low. The pipeline produces output of the shape ((128, None, 128), (128, None, 5)) and if either dimension 0 (batching) or dimension 1 (windowing) is removed, the performance is much higher. Unfortunately both batching and windowing are required for efficient and meaningful training.

Additionally, the iteration of the Dataset does not work as expected with 3D output. The shape of the output tensor (loop body at the end: print([x.shape for x in e])) is not printed at all. When removing either windowing or batching, the shape is printed as expected. From this I suspect this issue could also be a bug.

Timings for comparison (note: number of time steps / dim 1 (None) is avg. 312.5 = 40000 / 128):

  • 1x Shape ((128, None, 128), (128, None, 5)): 546.8976650238037 s
  • 1000x Shape ((128, 128), (128, 5)) (no windowing): 8.29618239402771 s * 312.5 / 1000 = 2.592556998133659375 s Speedup: 210x faster
  • 1000x Shape ((None, 128), (None, 5)) (no batching): 65.43172574043274 s * 128 / 1000 = 8.37526089477539072 s Speedup: 65x faster

Describe the expected behavior It is expected for the performance to scale linearly with the amount of data. The shape of the output data should not have a big negative effect on performance.

Standalone code to reproduce the issue

import time
import random
import numpy as np
import tensorflow as tf

print("TensorFlow {}".format(tf.__version__))

num_features = 128
num_labels = 5
batch_size = 128
win_size = 2048
len_max = 600000

# create some sample .tfrecord files with different length

l1 = len_max - 44260
X1, Y1 = (np.ones((l1, num_features + 2)), np.zeros((l1, num_labels)))
l2 = len_max - 121340
X2, Y2 = (np.ones((l2, num_features + 2)), np.zeros((l2, num_labels)))

def write_tfrecord(X, Y, fn):
    with tf.Graph().as_default():
        ds = tf.data.Dataset.from_tensor_slices((X, Y))

        sstring = ds.map(lambda *x: 
           tf.reshape(tf.py_function(lambda *v:
               tf.train.Example(features=tf.train.Features(feature={
                   "features": tf.train.Feature(float_list=tf.train.FloatList(value=v[0].numpy())),
                   "label": tf.train.Feature(float_list=tf.train.FloatList(value=v[1].numpy())),
               })).SerializeToString(), x, tf.string
           ), ())
        )

        writer = tf.data.experimental.TFRecordWriter(fn)
        writer_op = writer.write(sstring)

        sess = tf.compat.v1.Session()
        sess.run(tf.compat.v1.global_variables_initializer())
        sess.run(writer_op)
        sess.close()

files_base = ["./temp1.tfrecord", "./temp2.tfrecord"]

write_tfrecord(X1, Y1, files_base[0])
write_tfrecord(X2, Y2, files_base[1])

# take 200 random files as dataset

files = random.choices(files_base, k=200)

ds_fs = tf.data.Dataset.list_files(files, shuffle=True, seed=1)
fs_len = 0
for f in ds_fs:
    fs_len += 1
print("Reading {} tfrecord files...".format(fs_len))

# create a StaticHashTable holding the length for trimming

ds_f_len_table = tf.lookup.StaticHashTable(tf.lookup.KeyValueTensorInitializer(
        tf.constant(files_base),
        tf.constant([random.randint(30000, 50000), random.randint(30000, 50000)], dtype=tf.int64)
), -1)

# prepare the Dataset

def prep_ds_file(file):
    _ds = tf.data.TFRecordDataset(file)
    _ds = _ds.map(lambda x: tf.io.parse_single_example(x, {
        "features": tf.io.FixedLenFeature([num_features + 2], tf.float32),
        "label": tf.io.FixedLenFeature([num_labels], tf.float32),
    }), num_parallel_calls=tf.data.experimental.AUTOTUNE)
    print(_ds)

    _ds = _ds.flat_map(lambda v: tf.data.Dataset.from_tensors((v["features"][2:], v["label"])))
    print(_ds)

    _trunc = ds_f_len_table.lookup(file)
    _ds = _ds.take(_trunc)
    print(_ds)
    _num_tsteps = _trunc // batch_size

    ####################################################################################################
    # WINDOWING                                                                                        #
    ####################################################################################################
    _ds = _ds.window(size=_num_tsteps, shift=win_size//2, stride=1, drop_remainder=True)               #
    print(_ds)                                                                                         #
    _ds = _ds.flat_map(lambda x, y: tf.data.Dataset.zip((x.batch(_num_tsteps), y.batch(_num_tsteps)))) #
    print(_ds)                                                                                         #
    ####################################################################################################

    ##################################################
    # BATCHING                                       #
    ##################################################
    _ds = _ds.batch(batch_size, drop_remainder=True) #
    print(_ds)                                       #
    ##################################################

    return _ds


def prep_ds(files):
    _ds = files.flat_map(prep_ds_file)
    print(_ds)
    return _ds


ds = prep_ds(ds_fs)

# read/use the Dataset

ts = time.time()
for e in ds.take(1):
    print([x.shape for x in e])
te = time.time()
print("Duration: {} s".format(te - ts))

Output:

TensorFlow 2.1.0
Reading 200 tfrecord files...
<ParallelMapDataset shapes: {features: (130,), label: (5,)}, types: {features: tf.float32, label: tf.float32}>
<FlatMapDataset shapes: ((128,), (5,)), types: (tf.float32, tf.float32)>
<TakeDataset shapes: ((128,), (5,)), types: (tf.float32, tf.float32)>
<WindowDataset shapes: (DatasetSpec(TensorSpec(shape=(128,), dtype=tf.float32, name=None), TensorShape([])), DatasetSpec(TensorSpec(shape=(5,), dtype=tf.float32, name=None), TensorShape([]))), types: (DatasetSpec(TensorSpec(shape=(128,), dtype=tf.float32, name=None), TensorShape([])), DatasetSpec(TensorSpec(shape=(5,), dtype=tf.float32, name=None), TensorShape([])))>
<FlatMapDataset shapes: ((None, 128), (None, 5)), types: (tf.float32, tf.float32)>
<BatchDataset shapes: ((128, None, 128), (128, None, 5)), types: (tf.float32, tf.float32)>
<FlatMapDataset shapes: ((128, None, 128), (128, None, 5)), types: (tf.float32, tf.float32)>
Duration: 546.8976650238037 s

closed time in 2 months

mimxrt

issue comment tensorflow/tensorflow

TensorFlow tf.data.Dataset API extremely slow for 3D-shaped pipeline

Sorry, I didn't realize you are normalizing the runtime by the missing shape. I think I found the problem, which is that your input pipeline with both batch and window does not produce any elements. This is because each dataset will be truncated to ds = ds.take(trunc) elements. Let's say that this is 40k (which is the average). Your batch size is 128, so your window size will be 40000 // 128 = 312. So far so good. The problem is that you are using shift = win_size // 2 and win_size is fixed to 2048 (as opposed to being relative to the window size). Because of this, consecutive windows start 1024 elements apart, and to create a batch of 128 windows you would need roughly 128 * 1024 elements, but you only have 40k. Because you are using batch with drop_remainder=True, partial batches will be dropped.

In other words, the performance you are listing as the baseline (using both window + batch) is the time needed to process all input data (and then throw it all away). You can validate my hypothesis by changing the loop bound of the baseline to an arbitrarily large number, which will not change the E2E time. Another way to validate this is to set drop_remainder=False, which will result in only one partial batch being computed.
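Rough back-of-the-envelope arithmetic for the above (a sketch, assuming trunc is about 40k elements):

trunc = 40000                     # elements per file after take()
batch_size = 128
win_size = 2048

num_tsteps = trunc // batch_size  # window size: 312
shift = win_size // 2             # offset between consecutive windows: 1024

# Elements needed to produce one full batch of 128 windows:
needed = (batch_size - 1) * shift + num_tsteps  # roughly 130k
print(needed > trunc)  # True: with drop_remainder=True, no full batch is ever emitted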

mimxrt

comment created time in 2 months

issue comment tensorflow/tensorflow

How to store tf dataset object to file?

I do not follow your use case. The loaded dataset will be a tf.data.Dataset so you could apply further transformations to it.

zzj0402

comment created time in 2 months

issue comment tensorflow/tensorflow

How to store tf dataset object to file?

Indeed. tf.data snapshot will be released in TF 2.3 as well and the aforementioned save and load API will share its implementation.
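For reference, snapshot usage looks roughly like this (a sketch; the path and the map function are placeholders):

import tensorflow as tf

ds = tf.data.Dataset.range(10).map(lambda x: x + 1)  # stand-in for expensive preprocessing

# Materializes the pipeline's output to disk on the first run and reads it
# back on subsequent runs.
ds = ds.apply(tf.data.experimental.snapshot("/path/to/snapshot_dir"))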

zzj0402

comment created time in 2 months

issue comment tensorflow/tensorflow

How to store tf dataset object to file?

Yes, it will be a streaming API. You will be able to do the following:

  # Save a dataset
  dataset = tf.data.Dataset.range(10)
  tf.data.experimental.save(dataset, "/path/to/data")

  # Load a previously saved dataset
  new_dataset = tf.data.experimental.load(
      "/path/to/data",
      element_spec=tf.TensorSpec(shape=(), dtype=tf.int64))

The load API will require you to specify the type signature for the elements to load, which is needed so that shape inference does not have to perform I/O.

zzj0402

comment created time in 2 months

issue comment tensorflow/tensorflow

How to store tf dataset object to file?

I am working on providing support for save and load and expect it to be available later this month (and certainly for TF 2.3).

zzj0402

comment created time in 2 months

pull request comment tensorflow/addons

super_serial: automate saving and restoring tfrecords

I am actually working on a PR that will provide support for save and load of datasets. It should be available later this month (and before TF 2.3).

markemus

comment created time in 2 months

issue commenttensorflow/tensorflow

tf.data.experimental.prefetch_to_device("/gpu:0") moves tensors back to CPU

This issue should be fixed by https://github.com/tensorflow/tensorflow/commit/8be4d61574f29568c8699708d88945b441bfd317

OutSorcerer

comment created time in 2 months

issue commenttensorflow/tensorflow

Using tf.Dataset in non-eager mode impossible

@tomerk could you please take a look (or triage to someone on the Keras team)? Thank you.

Flamefire

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add shape and type check for IteratorGetNextOp and ToSingleElementOp

 class ToSingleElementOp : public HybridAsyncOpKernel {
       return errors::InvalidArgument("Dataset was empty.");
     }
     for (int i = 0; i < components.size(); ++i) {
-      // TODO(mrry): Check that the shapes match the shape attrs.
+      if (components[i].dtype() != output_types_[i]) {
+        return errors::InvalidArgument(
+            "The result does not match the expected type for "
+            "component ",
+            i, ". Expected: ", DataTypeString(output_types_[i]),
+            ". Actual: ", DataTypeString(components[i].dtype()), ".");
+      }
+      if (!output_shapes_[i].IsCompatibleWith(components[i].shape())) {
+        return errors::InvalidArgument(
+            "The result does not match the expected shape "
+            "for component ",
+            i, ". Expected: ", output_shapes_[i].DebugString(),
+            ". Actual: ", components[i].shape().DebugString(), ".");
+      }

You could achieve what I am asking for (which in essence is to avoid code duplication) by the following refactoring:

Status VerifyTypesMatch(const DataType& expected, const DataType& received, int index) {
  if (expected != received) {
    return errors::InvalidArgument("Data type mismatch at component ", index,
        ": expected ", DataTypeString(expected), " but got ", DataTypeString(received), ".");
  }
  return Status::OK();
}

Status VerifyTypesMatch(const DataTypeVector& expected, const DataTypeVector& received) {
  if (expected.size() != received.size()) {
    return errors::InvalidArgument(
        "Number of components does not match: expected ", expected.size(),
        " types but got ", received.size(), ".");
  }
  for (size_t i = 0; i < expected.size(); ++i) {
    TF_RETURN_IF_ERROR(VerifyTypesMatch(expected[i], received[i], i));
  }
  return Status::OK();
}

Status VerifyTypesMatch(const DataTypeVector& expected, const std::vector<Tensor>& received) {
  if (expected.size() != received.size()) {
    return errors::InvalidArgument(
        "Number of components does not match: expected ", expected.size(),
        " types but got ", received.size(), ".");
  }
  for (size_t i = 0; i < expected.size(); ++i) {
    TF_RETURN_IF_ERROR(VerifyTypesMatch(expected[i], received[i].dtype(), i));
  }
  return Status::OK();
}

(and similarly for VerifyShapesCompatible).

Once you perform this refactoring, you will be able to avoid code duplication (without incurring allocation). Feel free to also update ReduceDatasetOp::DoCompute to replace the currently duplicated type and shape checking code with the new utilities.

zhuzilin

comment created time in 2 months

issue commenttensorflow/tensorflow

TensorFlow tf.data.Dataset API extremely slow for 3D-shaped pipeline

TLDR: Your expectation that the shape of the output data should not affect performance is not grounded in reality. tf.data is a streaming API, and the time it takes for "batch" transformations such as window and batch to produce an output is proportional to the "batch" dimension.

Both window and batch multiply the number of input elements needed to produce an output by a constant factor -- the window size and batch size respectively. For instance, if it takes 1s to fetch a single output from ds, then -- assuming there is no asynchrony or parallelism within ds -- fetching a single output from ds.window(size=10) would be expected to take 10 seconds, and fetching a single output from ds.window(size=10).batch(batch_size=16) would be expected to take 160 seconds. This is expected behavior.

Increasing shift beyond size means that your input pipeline will need to read even more input elements per output. For example, to produce a single output from ds.window(size=10, shift=20), you will need to consume 20 input elements per output element.

I recommend taking a look at the tf.data performance analysis guide to help you understand the performance of your input pipeline.
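
To make the element-count arithmetic concrete, here is a minimal sketch using a toy range dataset (not your pipeline):

import tensorflow as tf

ds = tf.data.Dataset.range(1000)

# Each window consumes 10 upstream elements; materialize each window as a tensor.
windows = ds.window(10).flat_map(lambda w: w.batch(10))

# Each batch consumes 16 windows, i.e. 16 * 10 = 160 upstream elements.
batches = windows.batch(16)

for batch in batches.take(1):
  print(batch.shape)  # (16, 10): one output required 160 upstream elements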

mimxrt

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add shape and type check for IteratorGetNextOp and ToSingleElementOp

 Status IteratorGetNextOp::DoCompute(OpKernelContext* ctx) {
     return errors::OutOfRange("End of sequence");
   }
   for (int i = 0; i < components.size(); ++i) {
-    // TODO(mrry): Check that the shapes match the shape attrs.
+    if (components[i].dtype() != output_types_[i]) {
+      return errors::InvalidArgument(
+          "The result does not match the expected type for "
+          "component ",
+          i, ". Expected: ", DataTypeString(output_types_[i]),
+          ". Actual: ", DataTypeString(components[i].dtype()), ".");
+    }
+    if (!output_shapes_[i].IsCompatibleWith(components[i].shape())) {
+      return errors::InvalidArgument(
+          "The result does not match the expected shape "
+          "for component ",
+          i, ". Expected: ", output_shapes_[i].DebugString(),
+          ". Actual: ", components[i].shape().DebugString(), ".");
+    }

Same comment.

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add shape and type check for IteratorGetNextOp and ToSingleElementOp

 class ToSingleElementOp : public HybridAsyncOpKernel {
       return errors::InvalidArgument("Dataset was empty.");
     }
     for (int i = 0; i < components.size(); ++i) {
-      // TODO(mrry): Check that the shapes match the shape attrs.
+      if (components[i].dtype() != output_types_[i]) {
+        return errors::InvalidArgument(
+            "The result does not match the expected type for "
+            "component ",
+            i, ". Expected: ", DataTypeString(output_types_[i]),
+            ". Actual: ", DataTypeString(components[i].dtype()), ".");
+      }
+      if (!output_shapes_[i].IsCompatibleWith(components[i].shape())) {
+        return errors::InvalidArgument(
+            "The result does not match the expected shape "
+            "for component ",
+            i, ". Expected: ", output_shapes_[i].DebugString(),
+            ". Actual: ", components[i].shape().DebugString(), ".");
+      }

Use VerifyTypesMatch and VerifyShapesCompatible instead.

zhuzilin

comment created time in 2 months

issue commenttensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

My definition of misconfiguration is using a fixed value of num_parallel_calls that results in sub-optimal performance.

I do not follow your explanation for why using a num_parallel_calls value that matches the size of the threadpool did not work for you. I am also not sure how to interpret your comment about the Keras Sequence API: are you currently using the Keras Sequence API and in the process of switching to tf.data, or have you used tf.data and switched to the Keras Sequence API because of issues with num_parallel_calls?

As for parallel map internals: the num_parallel_calls argument controls two things: 1) the maximum degree of parallelism and 2) the size of the internal buffer used for storing results. If the buffer is full, no further computation is performed. If you find the depth of the buffer insufficient (because of variance in the speed of the producer and consumer), you should add prefetch to your input pipeline (as opposed to increasing num_parallel_calls beyond the size of the threadpool).
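
As a hedged illustration of that recommendation (the dataset and expensive_fn below are made up for the example):

import tensorflow as tf

def expensive_fn(x):
  # Stand-in for a costly per-element transformation.
  return tf.math.sqrt(tf.cast(x, tf.float32))

ds = tf.data.Dataset.range(10000)
# Let the runtime pick the degree of parallelism instead of hard-coding a
# value larger than the threadpool.
ds = ds.map(expensive_fn, num_parallel_calls=tf.data.experimental.AUTOTUNE)
# Absorb producer/consumer variance with prefetch rather than by growing the
# map's internal buffer.
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)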

eriikj

comment created time in 2 months

issue commenttensorflow/tensorflow

Dataset.padded_batch doc improvement request

Sounds good. @aaudiber should be able to help with any questions you have as he is the current owner of tf.data API documentation.

harahu

comment created time in 2 months

issue commenttensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

Sorry, didn't mean to close this.

eriikj

comment created time in 2 months

IssuesEvent

issue closedtensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 and Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.2
  • Python version: 3.6.7
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

Describe the current behavior When using a num_parallel_calls larger than the number of worker threads in the threadpool in a Dataset.map call, the order of execution is more or less random, causing bursty output behavior.

If the dataset map transform has a list of 20 elements to process, it typically processes them in an order that looks something like this: 5, 4, 7, 6, 1, 0, 2, 3, 18, 15, 10, 9, 13, 19, 14, 8, 17, 10, 12, 11

This is problematic since the output has to be contiguous, so no output will be available until a large portion of the calls in the threadpool have been processed.

I have attached a file with a self contained example which reproduces this behavior.

There are workarounds, such as allowing non-deterministic output, but in the long run, we want our trainings to be as deterministic as possible to aid debugging, so fixing this behavior would be very helpful for us.

Describe the expected behavior I expect the map call to start processing the next unprocessed element in the dataset whenever it has a free worker thread, so the results can be made available as soon as possible.

Standalone code to reproduce the issue See attached file. parallel_map_test.zip

Other info / logs Example output from the code is also available in the attached file.

closed time in 2 months

eriikj

issue commenttensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

@eriikj tf.data uses Eigen's threadpool under the hood, which does not guarantee FIFO scheduling. Switching to a different threadpool implementation and/or implementing FIFO ordering within tf.data would be non-trivial, and it is not clear to me that it would be worthwhile.

As per tf.data performance guide, users are recommended to use tf.data.experimental.AUTOTUNE for num_parallel_calls and leave the determination of the optimal value to the tf.data runtime. It is much more likely that the tf.data team will prioritize improving the autotuning logic over improving performance of misconfigured input pipelines.

eriikj

comment created time in 2 months

pull request commenttensorflow/tensorflow

Add step argument to SummaryWriter.(set_)as_default.

@gbaned please work with @nfelt and @foxik on adding tests for the new functionality.

foxik

comment created time in 2 months

pull request commenttensorflow/tensorflow

Fixed _save_model not working for batches in ModelCheckpoint Callback

For @tensorflow/api-owners: no need for API review.

ashutosh1919

comment created time in 2 months

pull request commenttensorflow/tensorflow

Add missing double overloads for GetAttr, GetNodeAttr, and TryGetNodeAttr

For @tensorflow/api-owners:

We agree that this is bad as-is, but unfortunately this PR cannot be merged because it would break backwards compatibility. A path forward would be to submit the functionality as a new op.

Bidski

comment created time in 2 months

pull request commenttensorflow/tensorflow

[INTEL MKL] Added input name in the bfloat16 namescope interface so t…

For @tensorflow/api-owners:

@cuixiaom Note that this will not be included in any v1 release. Is there a particular reason we want to update the v1 APIs here?

cuixiaom

comment created time in 2 months

issue commenttensorflow/tensorflow

DynamicPaddedBatchDatasetOp for tf.data

I think this use case could be addressed using the existing tf.data.experimental.group_by_window transformation. For instance, the input pipeline for the official Transformer model uses this transformation to group sentences by length and create padded batches for each bucket separately.
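
A rough sketch of that bucketing pattern (the toy sequences and the bucket boundary below are made up, not taken from the Transformer input pipeline):

import tensorflow as tf

# Toy dataset of variable-length integer sequences.
sequences = tf.data.Dataset.from_generator(
    lambda: ([1] * n for n in [3, 7, 12, 5, 20, 9]),
    output_types=tf.int32,
    output_shapes=tf.TensorShape([None]))

def key_fn(seq):
  # Bucket 0 for short sequences, bucket 1 for long ones.
  return tf.cast(tf.size(seq) >= 10, tf.int64)

def reduce_fn(key, window):
  # Pad each bucket's window into a batch separately.
  return window.padded_batch(2, padded_shapes=[None])

batched = sequences.apply(
    tf.data.experimental.group_by_window(key_fn, reduce_fn, window_size=2))

for batch in batched:
  print(batch.shape)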

feihugis

comment created time in 3 months

issue commenttensorflow/tensorflow

[TF 2.2.0/TPU]: tf.data.Dataset segmentation fault with "the Encode() method is not implemented for DatasetVariantWrapper objects" after calling TPUCusterResolver()

As per TPU documentation, the TPU initialization needs to happen before any TensorFlow operations are executed.

So you will need to do the following:

...
  if use_tpu:
    tpu_cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    tf.config.experimental_connect_to_cluster(tpu_cluster_resolver)
    tf.tpu.experimental.initialize_tpu_system(tpu_cluster_resolver)

  valid_dataset = build_dataset('gs://public-test-data-gs/valid', 64, 2048)
  valid_dataset = valid_dataset.repeat(2)

  if use_tpu:
    strategy = tf.distribute.experimental.TPUStrategy(tpu_cluster_resolver)
  else:
    strategy = tf.distribute.MirroredStrategy()
...
tarrade

comment created time in 3 months

issue commenttensorflow/tensorflow

reshuffle_each_iteration=False ignored on validation (tf.dataset + Keras)

reshuffle_each_iteration does not control whether to shuffle on each iteration. It controls whether a different shuffle order should be used on each iteration.

If your input pipeline contains shuffle, then each epoch will perform shuffling. If you wish to overlap filling up the shuffle buffer with computation, you should put repeat after shuffle. This will result in the filling up of the shuffle buffer needed for the 2nd epoch being amortized over the 1st epoch (as opposed to happening at the beginning of each epoch), and so on. You might also want to put prefetch at the end of your input pipeline to overlap training computation with preprocessing computation (see the tf.data performance guide for more details).
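
For example (a minimal sketch; the buffer sizes are arbitrary):

import tensorflow as tf

ds = tf.data.Dataset.range(100)
# Shuffling happens every epoch; reshuffle_each_iteration (default True) only
# controls whether each epoch uses a different order.
ds = ds.shuffle(buffer_size=100)
# Placing repeat after shuffle amortizes refilling the shuffle buffer across
# epochs instead of paying for it at every epoch boundary.
ds = ds.repeat()
# Overlap preprocessing with training computation.
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)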

Kerybas

comment created time in 3 months

pull request commenttensorflow/tensorflow

exclude_col parameter in CSVDataset.

@rachellim could you please review? thanks

stjohnso98

comment created time in 3 months

issue commenttensorflow/tensorflow

Very slow recording of TFRecordWriter with tf.data.Dataset.shard()

The official documentation:

"Generally it is best if the shard operator is used early in the dataset pipeline. For example, when reading from a set of TFRecord files, shard before converting the dataset to input samples. ..."

This is because shard will evaluate the entire upstream input pipeline, filtering out (num_shards - 1) / num_shards of the data. In your example, you are running the input pipeline inside a for loop, which means that you will execute the entire upstream pipeline num_shards times.

You could use window instead of shard to divide up the dataset into chunks and save each chunk separately along the following lines:

dataset = ...
dataset = dataset.window(ELEMENTS_PER_FILE)
dataset = dataset.enumerate()

def write_data(i, dataset):
  out_path = os.path.join(output_folder, f'{folder}{i+1}.tfrecord')
  dump_dataset(dataset, out_path)
  return out_path

dataset = dataset.map(write_data, num_parallel_calls=tf.data.experimental.AUTOTUNE)

If you get rid of from_generator in your dump_dataset method, you would be able to do the writing in parallel as well (an additional inefficiency of your program is that the writing is unnecessarily serialized on the Python GIL).

theonekeyg

comment created time in 3 months

issue commenttensorflow/tensorflow

Loding dataset with TFRecord throws incompatible with the layer

@tomerk can you please take a look? thanks

ll01

comment created time in 3 months

issue closedtensorflow/tensorflow

TextLineDataset could be more expressive

TextLineDataset could be more expressive. For instance, it could have more arguments like:

train_dataset = tf.data.TextLineDataset(
                                     file_path, # dataset file path
                                     format, # file format: jsonl, csv, etc.
                                     fields # a set of columns
)

It would also be incredible if there was a way to indicate how to tokenize each of the fields.

closed time in 3 months

Ceceu

issue commenttensorflow/tensorflow

TextLineDataset could be more expressive

TextLineDataset is for reading lines of text. For other formats, there are dedicated tf.data sources, such as tf.data.experimental.CsvDataset or JsonIODataset from the tensorflow/io repository.
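
For instance, a CSV file could be read along these lines (the file path and column types below are hypothetical):

import tensorflow as tf

# Each record yields an (int32, float32) tuple, one value per column.
dataset = tf.data.experimental.CsvDataset(
    "/path/to/data.csv",
    record_defaults=[tf.int32, tf.float32],
    header=True)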

Ceceu

comment created time in 3 months

issue commenttensorflow/tensorflow

Dataset iterating different behavior in TF 2.1 and 2.2

@tomerk could you please take a look? this looks to be rooted in Keras and not tf.data.

sondracek

comment created time in 3 months

issue commenttensorflow/tensorflow

Extension of the Data API `take` method to accept percent values

Thanks. If you do not care about selecting the first 20% but any (approximately) 20%, then you can do so with filter:

import tensorflow as tf

dataset = tf.data.Dataset.range(50)
dataset = dataset.filter(lambda _: tf.less(tf.random.uniform(shape=[], maxval=100, dtype=tf.int32, seed=12), 20))

# Iterating through the dataset multiple times will print the same elements 
# because we use a fixed seed for the random uniform call.
for elem in dataset:
  print(elem)

for elem in dataset:
  print(elem) 
milost

comment created time in 3 months

IssuesEvent

issue closedtensorflow/tensorflow

TFRecordDataset mapped with crop is heavily impacted by image sizes

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10, Ubuntu 18.04, Colab
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 2.2
  • Python version: 3.7.7, 3.6.9
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Colab's GPU, V100 32GB, CPU

Describe the current behavior When loading a dataset with large images, and mapping it to random crop, iterating over the dataset is significantly slower, even for a small dataset of 10 images in repeat. In my example I compare a dataset of 10 240x240x3 images, to one with ten 2400x2400x3 images, both of them randomly cropped to 120x120x3 - the latter was about 100 times slower to operate on (reduce_sum, in my toy example).

Describe the expected behavior I would expect there to be no effect from the size of the images in the dataset if they are being cropped right after being loaded once the dataset resides in memory.

Standalone code to reproduce the issue https://colab.research.google.com/drive/1uGeLJJp2gPvgC37WxbP_yANAXB8D83ld#scrollTo=DddLV9gxm5eL

closed time in 3 months

feature-engineer

issue commenttensorflow/tensorflow

TFRecordDataset mapped with crop is heavily impacted by image sizes

You need to run one epoch's worth of data to populate the cache before you start timing things.

Get rid of the trailing repeat() and do:

# Warm up caches
for elem in data_from_small:
  pass
for elem in data_from_big:
  pass

start = time.time()
tf.reduce_sum([x for x in data_from_small.repeat().take(1000)])
print(time.time() - start)

start = time.time()
tf.reduce_sum([x for x in data_from_big.repeat().take(1000)])
print(time.time() - start)

I would expect there to be no difference in runtime at that point.

feature-engineer

comment created time in 3 months

Pull request review commenttensorflow/tensorflow

Allow single element padding_values to be broadcasted to a structure for tf.data.Dataset.padded_batch

 def padded_batch(self,
     * If the dimension is unknown, the component will be padded out to the
       maximum length of all elements in that dimension.

+    The `padding_values` argument determines the values used to pad each
+    component to the respective shape. The `padding_values` should have the
+    same structure as the input dataset. If `padding_values` is a single
+    element and the input dataset has multiple components, then the same
+    `padding_values` will be used to pad every component of the dataset.
+    If `padding_values` is a scalar, then its value will be broadcasted
+    to match the shape of each component.

This should be included in the "Args" section of the docstring instead.
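
For reference, a small sketch of the documented behavior (assuming the broadcast semantics this PR proposes; the toy dataset is made up):

import tensorflow as tf

# A dataset with two variable-length components.
ds = tf.data.Dataset.range(1, 4).map(
    lambda x: (tf.fill([x], x), tf.fill([x + 1], x)))

# A single scalar padding value is broadcast to both components.
batched = ds.padded_batch(
    2,
    padded_shapes=([None], [None]),
    padding_values=tf.constant(-1, dtype=tf.int64))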

yongtang

comment created time in 3 months

issue commenttensorflow/tensorflow

Very slow recording of TFRecordWriter with tf.data.Dataset.shard()

Why not use the method suggested by the official documentation?

theonekeyg

comment created time in 3 months

issue commenttensorflow/tensorflow

TFRecordDataset mapped with crop is heavily impacted by image sizes

repeat re-executes the input pipeline. If you would like to cache the results of previous repeat iterations, you should use the cache transformation.
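
For example (a minimal sketch with a made-up map function):

import tensorflow as tf

ds = tf.data.Dataset.range(10)
ds = ds.map(lambda x: x * 2)
# Cache the preprocessed elements so that subsequent repeat iterations read
# from the cache instead of re-executing the upstream transformations.
ds = ds.cache()
ds = ds.repeat(3)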

feature-engineer

comment created time in 3 months

issue commenttensorflow/tensorflow

Why tf data window not using in tutorials/structured_data/time_series

@lamberta this question is related to a tutorial you worked on, could you please take a look? thank you

jimmy6

comment created time in 3 months

issue commenttensorflow/tensorflow

"shuffle_and_repeat_fusion" optimizer content incorrect on s390x arch (big-endian)

adding @gharibian as he is more familiar with TensorFlow strings

rposts

comment created time in 3 months

issue commenttensorflow/tensorflow

Breakpoints do not stop inside tf.function

Reassigning to @jaingaurav for triage as this is not specific to tf.data.

edurenye

comment created time in 3 months
