Andrew Audibert aaudiber Tensorflow United States

aaudiber/alluxio 1

Memory-Centric Virtual Distributed Storage System

aaudiber/2nd-semester-introduction-to-computer-science-principles 0

A 2nd semester follow-up to the TEALS Intro CS course

aaudiber/algorithms 0

interesting algorithms

aaudiber/alluxio-extensions 0

Alluxio Extensions

aaudiber/alluxio-test-client 0

Basic Alluxio client to help with testing

aaudiber/Alosapien 0

Team building Program

aaudiber/atomix 0

A reactive framework for building fault-tolerant distributed systems for the JVM

aaudiber/clahub 0

Easy contributor license agreements for your GitHub projects.

aaudiber/community 0

Stores documents used by the TensorFlow developer community

aaudiber/copycat 0

A novel implementation of the Raft consensus algorithm

issue comment tensorflow/tensorflow

S3 ParseURI supporting query parameters

I think this support would be implemented by modifying s3_file_system.cc, as opposed to making a change within TFRecordDataset. Reassigning to Mihai, who is more familiar with the filesystem level.
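Until then, a user-side workaround could be to strip the query string before the URI reaches tf.data, if the parameters aren't actually needed by the reader. This is only an illustrative sketch; strip_s3_query is a hypothetical helper, not a TensorFlow API:

import urllib.parse

def strip_s3_query(uri):
  # Drop any "?key=value" suffix so the filesystem only sees the object path.
  parts = urllib.parse.urlsplit(uri)
  return urllib.parse.urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

strip_s3_query("s3://bucket/data.tfrecord?versionId=abc")  # 's3://bucket/data.tfrecord'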

mjnovice

comment created time in 5 days

issue comment tensorflow/tensorflow

tf.data.Dataset API with ImageGenerator => ValueError: as_list() is not defined on an unknown TensorShape.

Hi @vladbph, I get permission denied when I try to access the colab. Can you make it publicly accessible? Thanks.

vladbph

comment created time in 6 days

issue comment tensorflow/tensorflow

Returning tf.data.UNKNOWN_CARDINALITY when the cardinality can be easily computed

Thanks for bringing this up @andrescodas! I've updated the cardinality docs in https://github.com/tensorflow/tensorflow/commit/2fe61b03cd2703de2658ece095edb7a0ada3f681

andrescodas

comment created time in 12 days

Pull request review comment tensorflow/tensorflow

Extended test cases and added eager mode tests for compression ops

 def _test_objects():
   ]
 
+def _test_eager_objects():
+  return [

Got it, thanks!

kvignesh1420

comment created time in 12 days

Pull request review comment tensorflow/tensorflow

Extended test cases and added eager mode tests for compression ops

 def testDatasetCompression(self, element):
     dataset = dataset.map(lambda x: compression_ops.uncompress(x, element_spec))
     self.assertDatasetProduces(dataset, [element])
 
+  @combinations.generate(combinations.times(

Can you add the eager_only objects to the combinations of the existing tests? You can add the combinations like:

@combinations.generate(
    combinations.times(
        test_base.default_test_combinations(),
        combinations.combine(element=_test_objects())) +
    combinations.times(
        test_base.eager_only_combinations(),
        combinations.combine(element=_eager_only_test_objects())))
kvignesh1420

comment created time in 12 days

Pull request review comment tensorflow/tensorflow

Extended test cases and added eager mode tests for compression ops

 def _test_objects():
   ]
 
+def _test_eager_objects():
+  return [

do these objects not work in graph mode?

kvignesh1420

comment created time in 12 days

Pull request review comment tensorflow/tensorflow

Added tests for unsupported types in UniqueDataset

 from tensorflow.python.data.ops import dataset_ops
 from tensorflow.python.framework import combinations
 from tensorflow.python.framework import dtypes
+from tensorflow.python.framework import errors

This import isn't needed any longer

kvignesh1420

comment created time in 18 days

pull request comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

Are you sure the tests are actually covering unique_dataset? It looks like the error occurs in the generator dataset, before reaching the unique dataset.

kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  def _checkDatasetRaises(self, dtype, test_cases, error):
+    """Test whether the dataset raises the appropriate errors
+    while generating the outputs.
+
+    Args:
+      dtype: The actual `dtype` of the elements in each test case.
+      test_cases: A list of lists. The dataset will be created from the list items.
+      error: The expected error to be raised when a corrupted item in encountered.
+    """
+
+    current_test_case = []
+    dataset = dataset_ops.Dataset.from_generator(lambda: current_test_case,
+                                                 dtype).apply(unique.unique())
+
+    for test_case in test_cases:
+      current_test_case = test_case
+      with self.assertRaises(error):
+        _ = self.getDatasetOutput(dataset)

can remove the _ =

kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  def _checkDatasetRaises(self, dtype, test_cases, error):
+    """Test whether the dataset raises the appropriate errors
+    while generating the outputs.
+
+    Args:
+      dtype: The actual `dtype` of the elements in each test case.

This is the expected dtype, not actual

kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  def _checkDatasetRaises(self, dtype, test_cases, error):
...
+  @combinations.generate(test_base.graph_only_combinations())
+  def testUnsupportedTypes(self):
+    """Should raise TypeError when element type doesn't match with the
+    dtypes.int64, dtypes.int32 or dtypes.string (supported types)."""
+
+    sample_unsupported_types = [dtypes.bool, dtypes.double, dtypes.complex64,
+                                dtypes.float32, dtypes.float64, dtypes.qint16, dtypes.qint32]
+    current_test_case = []

can inline this since we don't have a list of test cases here

kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  def _checkDatasetRaises(self, dtype, test_cases, error):
...
+  @combinations.generate(test_base.graph_only_combinations())
+  def testStringTypeMismatch(self):
+    """Should raise InternalError when element type doesn't match
+    with dtypes.string."""
...
+    self._checkDatasetRaises(dtype=dtypes.string, test_cases=test_cases,
+                             error=errors.InternalError)
+
+  @combinations.generate(test_base.graph_only_combinations())
+  def testInt32TypeMismatch(self):
+    """Should raise InvalidArgumentError when element type doesn't
+    match with dtypes.int32"""
...
+    self._checkDatasetRaises(dtype=dtypes.int32, test_cases=test_cases,
+                             error=errors.InvalidArgumentError)

Why does the string case raise InternalError while the int case raises InvalidArgumentError?

kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  def _checkDatasetRaises(self, dtype, test_cases, error):
...
+  @combinations.generate(test_base.graph_only_combinations())
+  def testInt32TypeMismatch(self):
...
+    self._checkDatasetRaises(dtype=dtypes.int32, test_cases=test_cases,
+                             error=errors.InvalidArgumentError)
+
+  @combinations.generate(test_base.graph_only_combinations())
+  def testInt64TypeMismatch(self):

Is the only relevant difference between this and the int32 test the dtype? If so, we could parameterize a single test with:

@combinations.generate(combinations.times(
    test_base.graph_only_combinations(),
    combinations.combine(dtype=[dtypes.int32, dtypes.int64])))
def testIntTypeMismatch(self, dtype):
kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  def _checkDatasetRaises(self, dtype, test_cases, error):
+    """Test whether the dataset raises the appropriate errors
+    while generating the outputs.
+
+    Args:
+      dtype: The actual `dtype` of the elements in each test case.
+      test_cases: A list of lists. The dataset will be created from the list items.
+      error: The expected error to be raised when a corrupted item in encountered.

remove "when a corrupted item is encountered"

kvignesh1420

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
+
+    # Placeholder values are needed to fill in the expected array with dummy value so that,
+    # when the dataset generates the element and observes that there is a type mismatch,
+    # it raises the proper error and not an OutOfRangeError which occurs when it is unable
+    # to fetch an element to compare from the expected array in the first place.
+    string_placeholder = ""
+    int32_placeholder = 0
+    int64_placeholder = 0

It seems like the root issue is that _testSimpleHelper isn't intended for test cases that raise exceptions. Instead of using _testSimpleHelper, can we write a "_checkDatasetRaises" method which runs through the dataset and checks that it raises the expected error? To run through a dataset, you can call self.getDatasetOutput().

It would be good if we can make the test simple and self-explanatory enough that the variable and method names are enough documentation, and no further comments are required. Consider splitting the test into multiple smaller tests which each focus on a particular type of mismatch.

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
+
+    # raises InternalError when dtypes don't match.
+    with self.assertRaises(errors.InternalError):
+      self._testSimpleHelper(dtypes.string, [
+          (["hello", 1, 2, 1], ["hello"]),
+          (["hello", "world", 1], ["hello", "world"]),
+          (["hello", "hello", "world", 1, 2], ["hello", "world"]),
+          (["hello", "world", 1, 1, 2], ["hello", "world"]),
+          ([1, 2, "hello"], ["hello"]),
+          ([1, 1, 2, 3, 3, "hello"], ["hello"]),
+      ])
+
+      self._testSimpleHelper(dtypes.int32, [
+          ([1, "hello", "world"], [1]),
+          ([1, 2, 1, "hello", "hello", "world"], [1, 2]),
+          (["hello", 1, 2], [1, 2]),
+          (["hello", 1, 1, 2, 3, 3], [1, 2, 3]),
+      ])
+
+      self._testSimpleHelper(dtypes.int64, [
+          ([2, 3, "hello", "world"], [2, 3]),
+          ([2, 3, 3, "hello", "hello", "world"], [2, 3]),
+          (["hello", 2, 2], [2]),
+          (["hello", "hello", 1, 1, 2, 3], [1, 2, 3]),

ditto

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
...
+      self._testSimpleHelper(dtypes.int64, [
+          ([2, 3, "hello", "world"], [2, 3]),
+          ([2, 3, 3, "hello", "hello", "world"], [2, 3]),
+          (["hello", 2, 2], [2]),

ditto

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
...
+      self._testSimpleHelper(dtypes.int32, [
+          ([1, "hello", "world"], [1]),
+          ([1, 2, 1, "hello", "hello", "world"], [1, 2]),
+          (["hello", 1, 2], [1, 2]),
+          (["hello", 1, 1, 2, 3, 3], [1, 2, 3]),

ditto

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
+
+    # raises InternalError when dtypes don't match.
+    with self.assertRaises(errors.InternalError):
+      self._testSimpleHelper(dtypes.string, [
+          (["hello", 1, 2, 1], ["hello"]),
+          (["hello", "world", 1], ["hello", "world"]),
+          (["hello", "hello", "world", 1, 2], ["hello", "world"]),
+          (["hello", "world", 1, 1, 2], ["hello", "world"]),
+          ([1, 2, "hello"], ["hello"]),

should this be []? I think it should fail when it reads the first element, and never produce "hello"

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
+
+    # raises InternalError when dtypes don't match.
+    with self.assertRaises(errors.InternalError):
+      self._testSimpleHelper(dtypes.string, [
+          (["hello", 1, 2, 1], ["hello"]),
+          (["hello", "world", 1], ["hello", "world"]),
+          (["hello", "hello", "world", 1, 2], ["hello", "world"]),
+          (["hello", "world", 1, 1, 2], ["hello", "world"]),
+          ([1, 2, "hello"], ["hello"]),
+          ([1, 1, 2, 3, 3, "hello"], ["hello"]),
+      ])
+
+      self._testSimpleHelper(dtypes.int32, [
+          ([1, "hello", "world"], [1]),
+          ([1, 2, 1, "hello", "hello", "world"], [1, 2]),
+          (["hello", 1, 2], [1, 2]),

ditto

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
+
+    # raises InternalError when dtypes don't match.
+    with self.assertRaises(errors.InternalError):
+      self._testSimpleHelper(dtypes.string, [
+          (["hello", 1, 2, 1], ["hello"]),
+          (["hello", "world", 1], ["hello", "world"]),
+          (["hello", "hello", "world", 1, 2], ["hello", "world"]),
+          (["hello", "world", 1, 1, 2], ["hello", "world"]),
+          ([1, 2, "hello"], ["hello"]),
+          ([1, 1, 2, 3, 3, "hello"], ["hello"]),

ditto

kvignesh1420

comment created time in 20 days

issue comment tensorflow/tensorflow

Shuffling then zip tf.data.Dataset

@mathieuorhan,

tf.data.Dataset objects don't eagerly compute all of their data. They work like blueprints, where the data is computed on the fly every time you iterate through the dataset. As a result, iterating through the same dataset multiple times could result in different output.

The Dataset.zip transformation takes multiple datasets and iterates through them in parallel. If you want to iterate through the input just once, use map instead of zip:

import tensorflow as tf
master = tf.data.Dataset.range(10)
master = master.shuffle(10)
dataset = master.map(lambda x: (x, -x))
list(dataset.as_numpy_iterator())
[(3, -3),
 (9, -9),
 (7, -7),
 (5, -5),
 (2, -2),
 (8, -8),
 (1, -1),
 (4, -4),
 (0, 0),
 (6, -6)]
mathieuorhan

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def testSimpleString(self):
         (["foo", "bar", "baz", "baz", "bar", "foo"], ["foo", "bar", "baz"]),
     ])
 
+  @combinations.generate(test_base.graph_only_combinations())
+  def testTypeMismatch(self):
+
+    # raises InternalError when dtypes don't match.
+    # NOTE: Generating the following expected outputs can be considered/taken up as an

In TensorFlow, tensors have a specific dtype which is not allowed to change. I think it makes sense to validate the behavior of raising an error when the types don't match. But having the type change mid-dataset isn't something we plan to support, so we shouldn't have a test suggesting such a plan.

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

Added tests for type mismatches in UniqueDataset

 def _testSimpleHelper(self, dtype, test_cases):
     for test_case, expected in test_cases:
       current_test_case = test_case
       self.assertDatasetProduces(dataset, [
-          compat.as_bytes(element) if dtype == dtypes.string else element
+          compat.as_bytes(
+              element) if dtype == dtypes.string else element

This might cause linting warnings, should probably undo this formatting change.

kvignesh1420

comment created time in 20 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

 class ShardDatasetOp::Dataset : public DatasetBase {
         return Status::OK();
       }
 
-      std::vector<Tensor> result;
-      do {
-        result.clear();
-        TF_RETURN_IF_ERROR(input_impl_->GetNext(ctx, &result, end_of_sequence));
-        if (*end_of_sequence) {
-          input_impl_.reset();
-          return Status::OK();
-        }
-      } while ((next_index_++ % dataset()->num_shards_) != dataset()->index_);
+      int num_to_skip = (dataset()->index_ - next_index_) %
+                        dataset()->num_shards_;
+      if (num_to_skip < 0) {
+        num_to_skip += dataset()->num_shards_;
+      }
+      int num_skipped;
+      TF_RETURN_IF_ERROR(input_impl_->Skip(ctx, num_to_skip, end_of_sequence,
+                                           &num_skipped));
+      next_index_ += num_skipped;

Got it, I missed the require_non_empty_ handling. Your current implementation looks good to me.

zhuzilin

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

Add SkipRecords to RecordReader

 Status RecordReader::ReadRecord(uint64* offset, tstring* record) {
   return Status::OK();
 }
 
-Status RecordReader::SkipRecords(uint64* offset, int num_to_skip) {
+Status RecordReader::SkipRecords(uint64* offset, int num_to_skip,
+                                 int* num_skipped) {
   TF_RETURN_IF_ERROR(PositionInputStream(*offset));
 
   Status s;
   tstring record;
+  *num_skipped = 0;
   for (int i = 0; i < num_to_skip; ++i) {
     s = ReadChecksummed(*offset, sizeof(uint64), &record);
     if (!s.ok()) {
       last_read_failed_ = true;
       return s;
     }
     const uint64 length = core::DecodeFixed64(record.data());
-    input_stream_->SkipNBytes(length + kFooterSize);
+
+    // Skip data
+    s = input_stream_->SkipNBytes(length + kFooterSize);
+    if (!s.ok()) {
+      last_read_failed_ = true;
+      if (errors::IsOutOfRange(s)) {
+        s = errors::DataLoss("truncated record at ", *offset);

include the original error message as well, in case it has useful information

zhuzilin

comment created time in 21 days

PR opened tensorflow/ecosystem

Rename master to dispatcher in tf.data service example.

master was renamed to dispatcher in https://github.com/tensorflow/tensorflow/commit/c3f2d3d5710caa789eb54a4d24280a818b9375ba

+14 -14

0 comment

4 changed files

pr created time in 22 days

create branch aaudiber/ecosystem

branch: dispatcher

created branch time in 22 days

Pull request review comment tensorflow/tensorflow

Add SkipRecords to RecordReader

 Status RecordReader::ReadRecord(uint64* offset, tstring* record) {
   return Status::OK();
 }
 
+Status RecordReader::SkipRecords(uint64* offset, int num_to_skip) {

can this return the number of elements skipped? I could see the caller wanting to know that in some cases. That could be more useful than OUT_OF_RANGE

zhuzilin

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

Add SkipRecords to RecordReader

 TEST(RecordReaderWriterTest, TestBasics) {
   }
 }
 
+TEST(RecordReaderWriterTest, TestSkip) {

Add tests for error cases, such as skipping more records than there are left and reading from a truncated file.

zhuzilin

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

Add SkipRecords to RecordReader

 Status RecordReader::ReadRecord(uint64* offset, tstring* record) {
   return Status::OK();
 }
 
+Status RecordReader::SkipRecords(uint64* offset, int num_to_skip) {
+  TF_RETURN_IF_ERROR(PositionInputStream(*offset));
+
+  Status s;
+  tstring record;
+  for (int i = 0; i < num_to_skip; ++i) {
+    s = ReadChecksummed(*offset, sizeof(uint64), &record);
+    if (!s.ok()) {
+      last_read_failed_ = true;
+      return s;
+    }
+    const uint64 length = core::DecodeFixed64(record.data());
+    input_stream_->SkipNBytes(length + kFooterSize);
+    *offset += kHeaderSize + length + kFooterSize;
+    DCHECK_EQ(*offset, input_stream_->Tell());

Could the DCHECK fail due to SkipNBytes reaching end of file or having an IO error? If so we should return a Status instead of using DCHECK

zhuzilin

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

 Status DatasetBaseIterator::GetNext(IteratorContext* ctx,
   return s;
 }
 
+Status DatasetBaseIterator::Skip(IteratorContext* ctx, int num_to_skip,
+                                 bool* end_of_sequence, int* num_skipped) {
+  profiler::TraceMe activity([&] { return BuildTraceMeName(); },
+                             profiler::TraceMeLevel::kInfo);
+  DVLOG(3) << prefix() << " Skip enter";
+  RecordStart(ctx, /*stop_output=*/true);
+  Status s = SkipInternal(ctx, num_to_skip, end_of_sequence, num_skipped);
+  if (s.ok() && !*end_of_sequence) RecordElement(ctx);

After skipping num_skipped elements, this will record the iterator as having produced one element. This could confuse the autotuning implementation, which uses these recorded times to decide how to tune parallelism levels and buffer sizes. We need to figure out a way to make autotuning interact well with skipping. Looping in @jsimsa, who may have a recommendation for what to do here.

zhuzilin

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

 class ShardDatasetOp::Dataset : public DatasetBase {
         return Status::OK();
       }
 
-      std::vector<Tensor> result;
-      do {
-        result.clear();
-        TF_RETURN_IF_ERROR(input_impl_->GetNext(ctx, &result, end_of_sequence));
-        if (*end_of_sequence) {
-          input_impl_.reset();
-          return Status::OK();
-        }
-      } while ((next_index_++ % dataset()->num_shards_) != dataset()->index_);
+      int num_to_skip = (dataset()->index_ - next_index_) %
+                        dataset()->num_shards_;
+      if (num_to_skip < 0) {
+        num_to_skip += dataset()->num_shards_;
+      }
+      int num_skipped;
+      TF_RETURN_IF_ERROR(input_impl_->Skip(ctx, num_to_skip, end_of_sequence,
+                                           &num_skipped));
+      next_index_ += num_skipped;

Do we need to keep track of next_index_? It seems like each call to GetNext could call input_impl_->Skip(num_shards - 1), then input_impl_->GetNext(), and there's no need to remember the index across calls to GetNext.

zhuzilin

comment created time in 22 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

 class ShardDatasetOp::Dataset : public DatasetBase {
         return Status::OK();
       }
 
-      std::vector<Tensor> result;
-      do {
-        result.clear();
-        TF_RETURN_IF_ERROR(input_impl_->GetNext(ctx, &result, end_of_sequence));
-        if (*end_of_sequence) {
-          input_impl_.reset();
-          return Status::OK();
-        }
-      } while ((next_index_++ % dataset()->num_shards_) != dataset()->index_);
+      int num_to_skip = (dataset()->index_ - next_index_) %
+                        dataset()->num_shards_;
+      if (num_to_skip < 0) {
+        num_to_skip += dataset()->num_shards_;
+      }

Should num_to_skip always be dataset()->num_shards_ - 1?

zhuzilin

comment created time in 22 days

create branch aaudiber/tensorflow

branch: cherrypicks_DM5NO

created branch time in a month

PR opened tensorflow/tensorflow

Update "master" to "dispatch"/"dispatcher" in tf.data service terminology

Dispatcher is more descriptive and follows the guidance in https://developers.google.com/style/word-list#master

PiperOrigin-RevId: 321613785
Change-Id: Iaa576d35f0581e21278101f8b31201ba737a6865

+367 -353

0 comment

30 changed files

pr created time in a month

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@zhuzilin This should merge soon, I just need to fix up a couple internal tests

zhuzilin

comment created time in a month

pull request comment tensorflow/tensorflow

[BugFix] - Prefetching on GPU is actually executed on CPU

@DEKHTIARJonathan The linked commit fixes a bug where the prefetch dataset would be placed on CPU. As demonstrated by the included unit test, the prefetch dataset is now placed on GPU.

There is a separate issue that iterators are not automatically colocated with their inputs. We originally tried to address both issues with this PR, but automatic colocation broke internal tests, so we had to roll this PR back. It will take some time to figure out the root issue. It should be possible to work around the issue by explicitly putting the get_next inside a device scope. Could you open a separate issue for iterator colocation? Please include details about why it is important, to help us prioritize the issue.

DEKHTIARJonathan

comment created time in a month

pull request comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

@zhuzilin Thanks for explaining the motivation, I think it makes sense.

> This is one of the design issues I hope to discuss 😄 . I agree that Skip(num_to_skip) would potentially have better performance than SkipNext, but I'm not sure how to deal with input_impl_.reset() when end_of_sequence is met. Are we going to put the reset in the Skip or shall we return the number of successful skipping back to the iterator?

Good point, I think we will need to report the number of skipped elements as well so that the caller can decide what to do when end_of_sequence is reached before the requested number of elements are skipped.

zhuzilin

comment created time in a month

pull request comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

Thanks for putting this together @zhuzilin! Sorry for the delay in review. Some questions:

  1. Is there a specific use case motivating this change?
  2. Which datasets are you planning to implement skipping for?
  3. What do you think of changing SkipNext(ctx, end_of_input) to Skip(ctx, num_to_skip, end_of_input)? In some cases skipping multiple could be cheaper than skipping one at a time.
zhuzilin

comment created time in a month

Pull request review comment Alosapien/Alosapien

Read from spreadsheet directly

 import itertools
+import pickle
+import os.path
+from googleapiclient.discovery import build
+from google_auth_oauthlib.flow import InstalledAppFlow
+from google.auth.transport.requests import Request
+
+# If modifying these scopes, delete the file token.pickle.
+SCOPES = ['https://www.googleapis.com/auth/spreadsheets.readonly']
+
+SPREADSHEET_ID = '1c2Kvc3y1GrF9EFqAKvPoRU8NmFV_exCrAeCTrmDct2Y'
+ELOS_RANGE = 'TEST!A2:ZZ12'

We could add markings to the sheet to indicate where one section starts and the next ends, or maybe put different sections of the sheet in different tabs.

aaudiber

comment created time in a month

push event aaudiber/Alosapien

Andrew Audibert

commit sha 6fda80b5794007bbd76eeb6c547e1f5a40533252

Read from spreadsheet directly

view details

push time in a month

PR opened Alosapien/Alosapien

Read from spreadsheet directly
  • With this change, the team maker will read the ELOs and current players from the spreadsheet directly, so that we no longer need to update any constants in the code.
  • This change also adds a README to explain how to run things.
+147 -64

0 comment

3 changed files

pr created time in a month

push event aaudiber/Alosapien

Andrew Audibert

commit sha fd9294854ad58969d1f60f9dc109b7960e59a619

Read from spreadsheet directly

view details

push time in a month

create branch aaudiber/Alosapien

branch: test

created branch time in a month

push event aaudiber/Alosapien

Alosapien

commit sha f14b9848bb85f7c90c73506ff7e0a8184bb18e87

Update TeamMaker.py

view details

Alosapien

commit sha 29f80d84fe40394419cd79017f904121458434b1

Create teamMaker2.2.py

view details

Alosapien

commit sha 5afab416422dc9ed9df83dd7d650e77612920324

Rename teamMaker2.2.py to teamMaker2.py

view details

push time in a month

PR closed Alosapien/Alosapien

Rename TeamMaker2

Remove the ".2"

+0 -0

1 comment

1 changed file

aaudiber

pr closed time in a month

PR closed Alosapien/Alosapien

Apply PEP 8 Style

This PR updates the code style to follow the conventions laid out in the official Python Style Guide (PEP 8)

No functionality is affected -- this is only renames and whitespace changes.

+54 -51

1 comment

1 changed file

aaudiber

pr closed time in a month

PR closed Alosapien/Alosapien

Add an alternate implementation of TeamMaker

This approach computes all possible teams, and lists the fairest possibilities (top-3 by default).

Fairness is determined by summing the absolute difference between each team score and the average team score.

+34 -0

0 comment

1 changed file

aaudiber

pr closed time in a month

pull request comment Alosapien/Alosapien

Rename TeamMaker2

@Alosapien to accept these changes, click the merge button on this page

aaudiber

comment created time in a month

PR opened Alosapien/Alosapien

Rename TeamMaker2

Remove the ".2"

+0 -0

0 comment

1 changed file

pr created time in a month

push event aaudiber/Alosapien

Andrew Audibert

commit sha caf9300ef36c0d83eceba89f0f217a7b7ce0cf9e

Rename TeamMaker2

view details

push time in a month

PR opened Alosapien/Alosapien

Add an alternate implementation of TeamMaker

This approach computes all possible teams, and lists the fairest possibilities (top-3 by default).

Fairness is determined by summing the absolute difference between each team score and the average team score.

+34 -0

0 comment

1 changed file

pr created time in a month

push event aaudiber/Alosapien

Andrew Audibert

commit sha 471adb80f7004cd5295ea3b00031caa6363c5b72

Add an alternate implementation of TeamMaker

This approach computes all possible teams, and lists the fairest possibilities (top-3 by default).

Fairness is determined by summing the absolute difference between each team score and the average team score.

view details

push time in a month

pull request comment Alosapien/Alosapien

Apply PEP 8 Style

There's a bunch of conflicts now, so it's not worth merging this. It's mostly useful as an example of how PEP 8 would be applied.

aaudiber

comment created time in a month

push event aaudiber/Alosapien

Andrew Audibert

commit sha db1f7d4392c30d4d7728112f986baa21c0c65505

More style changes

view details

push time in a month

pull request comment tensorflow/tensorflow

[BugFix] - Prefetching on GPU is actually executed on CPU

This should be fixed now by https://github.com/tensorflow/tensorflow/commit/8be4d61574f29568c8699708d88945b441bfd317

DEKHTIARJonathan

comment created time in a month

issue comment tensorflow/tensorflow

random in tf.data.Dataset.map is not random if not coming from tensorflow

This happens because functions passed to tf.data transformations are traced and converted to tensorflow graphs for execution in C++. During tracing, rd.random() and np.random.rand() get evaluated into constants (since tensorflow doesn't have hooks to detect these methods being called, like it does for tensorflow ops). The easiest way to work around this is to stick with tensorflow ops. Another option is to wrap non-tensorflow code in tf.py_function, which executes arbitrary python code as a tensorflow op (with the caveat that this requires grabbing the GIL and generally results in worse performance compared to the equivalent tensorflow ops).
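A minimal sketch of the difference, with an illustrative dataset:

import numpy as np
import tensorflow as tf

ds = tf.data.Dataset.range(3)

# np.random.rand() runs once at trace time, so every element sees the same constant.
constant_noise = ds.map(lambda x: tf.cast(x, tf.float64) + np.random.rand())

# tf.random ops are traced into the graph, so they produce a fresh value per element.
fresh_noise = ds.map(
    lambda x: tf.cast(x, tf.float64) + tf.random.uniform([], dtype=tf.float64))

# tf.py_function escapes tracing and runs arbitrary python per element (slower; holds the GIL).
wrapped = ds.map(lambda x: tf.py_function(
    lambda y: y.numpy() + np.random.rand(), [tf.cast(x, tf.float64)], tf.float64))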

I agree that it's too easy to get an unpleasant surprise from using non-tensorflow ops in tf.data functions. @Wirg, do you think there is anywhere that we could improve the documentation that would have helped you here? Some places that we could add discussion of tracing:

  • Top-level tf.data.Dataset doc: https://www.tensorflow.org/api_docs/python/tf/data/Dataset
  • Individual tf.data.Dataset transformations that take a function argument, e.g. https://www.tensorflow.org/api_docs/python/tf/data/Dataset#map. The map doc does currently say that the function will be traced, but it is buried in the detailed section of the doc, and not mentioned in the short description of the function argument.
  • In the tf.data guide: https://www.tensorflow.org/guide/data
Wirg

comment created time in a month

PR opened Alosapien/Alosapien

Apply PEP 8 Style

This PR updates the code style to follow the conventions laid out in the official Python Style Guide (PEP 8)

No functionality is affected -- this is only renames and whitespace changes.

+50 -50

0 comment

1 changed file

pr created time in a month

push event aaudiber/Alosapien

Andrew Audibert

commit sha 88eb0f1f6a40b52557249aba49fe139935c35dd6

Apply PEP 8 Style

This PR updates the code style to follow the conventions laid out in the official Python Style Guide ([PEP 8](https://www.python.org/dev/peps/pep-0008/))

view details

push time in a month

fork aaudiber/Alosapien

Team building Program

fork in a month

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@zhuzilin That test is currently broken, unrelated to your PR. There is an outstanding change to disable the test while it is fixed.

zhuzilin

comment created time in 2 months

issue comment tensorflow/tensorflow

Iterator.make_initializer returns None

/usr/bin/ld: cannot find -ltensorflow_framework
collect2: error: ld returned 1 exit status

pvnieo

comment created time in 2 months

issue comment tensorflow/tensorflow

Iterator.make_initializer returns None

@pvnieo I tried running the suggested commands but encountered build errors in my environment. If you can isolate the issue to something reproducible in colab, it will help a lot with figuring out the root problem.

pvnieo

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "cardinality preserved transformations, e.g. dataset.map(...).take(3)"

preserved -> preserving

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "cardinality preserved transformations, e.g. dataset.map(...).take(3)"
       "will be optimized to dataset.take(3).map(...). For now this"
       "optimization will move `skip`, `shard` and `take` to the front of"
-      "`cache`, `map` and `prefetch`. And notice this optimization is only"
-      "for performance, it will not affect the output of the dataset."
-      "However, it will influence the cache to the file, for the unused"
-      "data will no longer be saved after this optimization."
-      "If None, defaults to False.")
+      "`map` and `prefetch`. And notice this optimization is only for"
+      "performance, it will not affect the output of the dataset."

semicolon after performance instead of comma

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "cardinality preserved transformations, e.g. dataset.map(...).take(3)"
       "will be optimized to dataset.take(3).map(...). For now this"
       "optimization will move `skip`, `shard` and `take` to the front of"
-      "`cache`, `map` and `prefetch`. And notice this optimization is only"
-      "for performance, it will not affect the output of the dataset."
-      "However, it will influence the cache to the file, for the unused"
-      "data will no longer be saved after this optimization."
-      "If None, defaults to False.")
+      "`map` and `prefetch`. And notice this optimization is only for"

Remove "And notice"

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "cardinality preserved transformations, e.g. dataset.map(...).take(3)"
       "will be optimized to dataset.take(3).map(...). For now this"
       "optimization will move `skip`, `shard` and `take` to the front of"
-      "`cache`, `map` and `prefetch`. And notice this optimization is only"
-      "for performance, it will not affect the output of the dataset."
-      "However, it will influence the cache to the file, for the unused"
-      "data will no longer be saved after this optimization."
-      "If None, defaults to False.")
+      "`map` and `prefetch`. And notice this optimization is only for"
+      "performance, it will not affect the output of the dataset."

each line needs a space at the end so that the last word of the line doesn't get combined with the first word of the next line.
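For example, Python concatenates adjacent string literals with no separator:

("optimization is only for" "performance")   # -> 'optimization is only forperformance'
("optimization is only for " "performance")  # -> 'optimization is only for performance'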

zhuzilin

comment created time in 2 months

push event aaudiber/custom-nightly-op

Andrew Audibert

commit sha 66f075c206ee615fb87e36d20abf6ef2405d68e9

Depend on tf-nightly instead of tensorflow

view details

push time in 2 months

create branch aaudiber/custom-nightly-op

branch: master

created branch time in 2 months

created repository aaudiber/custom-nightly-op

created time in 2 months

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@zhuzilin I think it's important for the optimization to be on by default, so that it can have more impact. If we leave it off by default, not many users will be aware of it, and there will be very little usage. What do you think of removing the interaction with cache for now, and only re-ordering map and prefetch? Then later on we can split CacheDataset into separate MemoryCacheDataset and FileCacheDataset ops, so that we can include MemoryCacheDataset in this optimization.

zhuzilin

comment created time in 2 months

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@zhuzilin Regarding cache-to-file, I think a common use case while debugging could be to add a take() after the call to cache(filename) to inspect some example data. For example, the user builds a dataset with

ds = create_dataset()
ds = ds.cache(filename)

Then they append .take(3) to make sure their data looks correct:

ds = create_dataset()
ds = ds.cache(filename)
for element in ds.take(3):
  print(element)

Then they remove the printing and run their training:

ds = create_dataset()
ds = ds.cache(filename)
train_on_dataset(ds)

The user will be surprised and unhappy that their dataset now produces only 3 elements.

zhuzilin

comment created time in 2 months

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

  1. What exactly is hoisting in this context? What are things hoisted into, and how does this affect the data pipeline in question? (Looking at the one-line example, this seems to be a reordering.)

I agree that reordering is a better term. Hoisting usually means moving to a different scope, as happens with hoist_random_uniform. How about calling it reorder_data_discarding_ops?

  2. What is the full set of transformations that is hoisted? Do these eventually get run somewhere else?

@zhuzilin please update the docstring in optimization_options.py to specify the exact list of which transformations will be reordered, and also to make it clear that the optimization is only for performance, and will not affect the output of the dataset.

  3. Can you add examples to the docstrings so that users will understand how to use this?

More broadly, should this be turned on by default? What would we have to test/ensure first?

It makes sense to turn this on by default, since it is almost always a strict improvement. I see two risks: (1) The user relies on a side-effect of applying their map function to discarded elements and (2) the user expects their entire dataset to be cached to disk, but with the re-ordering only part of the dataset is cached.

(1) is unlikely, and the user would be relying on undefined behavior. (2) is more problematic - I think we should avoid applying the reordering to cache transformations which cache to a file instead of in-memory.

Once we've addressed the file-caching issue, let's turn the optimization on by default, and I will run extra internal tests to make sure nothing is broken.
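For reference, users would opt in through tf.data.Options once this lands. The option name below follows the reorder_data_discarding_ops suggestion above and may not match the final API:

import tensorflow as tf

options = tf.data.Options()
options.experimental_optimization.reorder_data_discarding_ops = True  # assumed name

ds = tf.data.Dataset.range(100).map(lambda x: x * 2).take(3)
ds = ds.with_options(options)  # take(3) can now be moved in front of the map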

zhuzilin

comment created time in 2 months

issue comment tensorflow/tensorflow

Loding dataset with TFRecord throws incompatible with the layer

decode_image is a wrapper around decode_jpeg, decode_bmp, decode_png, and decode_gif. decode_image automatically detects the file format based on a file's contents, then uses one of those functions. The rank of the result will depend on the file contents, since GIFs will have rank 4 while other images have rank 3. Shape inference happens before reading file data, so there is no way to know the rank ahead of time.

In addition to using ensure_shape, there are a few other workarounds (sketched after this list):

  • If the file type is known, use tf.io.decode_bmp, tf.io.decode_jpeg, tf.io.decode_png, or tf.io.decode_gif instead. These produce tensors of known rank (3 for non-GIF, 4 for GIF).
  • If you are ok with truncating GIFs to a single frame, pass expand_animations=False to tf.io.decode_image, so that the rank is guaranteed to be 3.
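A quick sketch of these options (the file path is hypothetical):

import tensorflow as tf

contents = tf.io.read_file("image.png")  # hypothetical file

png = tf.io.decode_png(contents)                               # rank 3, known statically
frame = tf.io.decode_image(contents, expand_animations=False)  # rank 3, GIFs truncated
checked = tf.ensure_shape(tf.io.decode_image(contents), [None, None, 3])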
ll01

comment created time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha eeb8bc6433214b18acd3e7a1a7efef68818eb926

Add example of running the tf.data service in GKE.

view details

push time in 2 months

PR opened tensorflow/ecosystem

Add example of running the tf.data service in GKE.

This directory provides an example of running the tf.data service to horizontally scale tf.data input processing. We use GKE (Google Kubernetes Engine) to manage the tf.data servers.

+282 -0

0 comment

5 changed files

pr created time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha bbad07fb77566e92a6e7f767437fb1f81141f619

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha a4c8f89ae3d5de198e79d27792e18a7c58fa656b

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha 8cb4b6ac9c2ba32143c3cb4ee419461e9294d2d1

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha 2c9f428848da9a651fcfab0879bd86ce22fdf055

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha cb19ab423614a88bcded552c0c4d95d093856b6a

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha 32effdfe07e29bd6a29da6a043c94904770130ba

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha f3d4d29e5d61e4b49bb2e66cc282481d8a598169

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha 1469d6c7e9fa787ce2e12f368028f627b8390ca9

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha d4c7a0da8e91b84c0d8cb228e03d3ba8d9bfb3a2

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha 5d9db29ff501010fb949da99d7a7dedc75061d7a

Add example of running the tf.data service in GKE.

view details

push time in 2 months

push event aaudiber/ecosystem

Andrew Audibert

commit sha 2df5559d6f299eb58c03655eb1ffdb320a057a53

Add data_service examples

view details

push time in 2 months

issue comment tensorflow/tensorflow

Tensorflow's FixedLengthRecordDataset reads binary file significantly slower than pure Python

Thanks for the detailed explanation.

  • Would it be possible for the model to train on batches of examples instead of single examples? That could reduce the overhead and get better utilization from the CPU.
  • Try setting the num_parallel_reads option of FixedLengthRecordDataset, so that multiple files are read in parallel (see the sketch after this list).
  • To fully understand the performance and what is causing the bottleneck, use https://www.tensorflow.org/tensorboard/tensorboard_profiling_keras#overview to generate a trace view that shows how much time is being spent in each dataset.
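Putting the first two suggestions together, roughly (the file pattern and record size are placeholders):

import tensorflow as tf

files = tf.data.Dataset.list_files("data/shard-*.bin")  # placeholder pattern
ds = tf.data.FixedLengthRecordDataset(files, record_bytes=512,  # placeholder size
                                      num_parallel_reads=8)
ds = ds.batch(256)  # amortize per-element overhead across the batch
ds = ds.prefetch(tf.data.experimental.AUTOTUNE)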
mhorlacher

comment created time in 2 months

issue comment tensorflow/tensorflow

Tensorflow's FixedLengthRecordDataset reads binary file significantly slower than pure Python

Is the data stored in a single file? If you split the data into multiple files, you can use Dataset.interleave to read from many files in parallel to speed up the input. If you share the dataset you are using, I can give recommendations
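Roughly like this (the file pattern and record size are placeholders):

import tensorflow as tf

files = tf.data.Dataset.list_files("data/part-*.bin")  # placeholder pattern
ds = files.interleave(
    lambda f: tf.data.FixedLengthRecordDataset(f, record_bytes=512),
    cycle_length=8,  # number of files read concurrently
    num_parallel_calls=tf.data.experimental.AUTOTUNE)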

mhorlacher

comment created time in 2 months

pull request comment tensorflow/community

RFC: tf.data Snapshot

@byronyi Absolutely! We plan to integrate snapshotting and tf.data service so that the tf.data service speeds up the first (normally slow) epoch. The idea is to temporarily use the tf.data service to write the snapshot quickly, then shut down the service (saving resource costs), and read from the snapshot for the remaining epochs.

frankchn

comment created time in 2 months

issue comment tensorflow/tensorflow

Tensorflow's FixedLengthRecordDataset reads binary file significantly slower than pure Python

@mhorlacher For a more apples-to-apples comparison, we should append skip(10000) to the dataset, so that the iteration happens in C++ instead of python. We should also make the benchmark run longer, to reduce the impact of constant overheads. With those changes, the difference is between 3x and 4x. This isn't concerning, since real use cases don't fetch tens of thousands of elements per second from tf.data. The performance advantages of using tf.data show up when doing non-trivial input processing to train a model.
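The benchmark shape being suggested, with a placeholder file name and record size:

import time
import tensorflow as tf

ds = tf.data.FixedLengthRecordDataset("data.bin", record_bytes=512).skip(10000)

start = time.time()
for _ in ds:  # the skipped records are consumed inside the C++ iterator
  pass
print("elapsed:", time.time() - start)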

mhorlacher

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+#include "tensorflow/core/grappler/optimizers/data/hoist_discard.h"
...
+const std::unordered_set<string> kCardinalityPreserving = {
+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",
+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",
+};
...
+bool IsCardinalityPreserving(const NodeDef& node) {
+  auto iter = kCardinalityPreserving.find(node.op());
+  if (iter == kCardinalityPreserving.end()) {
+    return false;
+  }
+  auto attr_iter = node.attr().find("preserve_cardinality");

Add a comment that we check this because MapDataset with preserve_cardinality=false is not cardinality preserving.

zhuzilin

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "Whether to fuse filter dataset that predicts random_uniform < rate into "
       "a sampling dataset. If None, defaults to False.")
 
+  hoist_discard = options.create_option(
+      name="hoist_discard",
+      ty=bool,
+      docstring=
+      "Whether to hoist ops that will discard data (such as skip, take, shard) "
+      "out of unary cardinality preserved transformations. "

Add a basic example, e.g. "dataset.map(...).take(3) gets optimized to dataset.take(3).map(...)".
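Spelled out, the example I'm suggesting for the docstring (expensive_fn is a stand-in name):

import tensorflow as tf

def expensive_fn(x):
  return x * 2  # stand-in for a costly transformation

dataset = tf.data.Dataset.range(100)
# The user writes map(...).take(3); with hoist_discard enabled, grappler
# rewrites it to the equivalent take(3).map(...), so expensive_fn is
# applied to 3 elements instead of 100.
dataset = dataset.map(expensive_fn).take(3)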

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#include "tensorflow/core/grappler/optimizers/data/hoist_discard.h"
+
+#include "absl/container/flat_hash_set.h"
+#include "tensorflow/core/framework/attr_value.pb.h"
+#include "tensorflow/core/framework/node_def.pb.h"
+#include "tensorflow/core/grappler/clusters/cluster.h"
+#include "tensorflow/core/grappler/grappler_item.h"
+#include "tensorflow/core/grappler/mutable_graph_view.h"
+#include "tensorflow/core/grappler/op_types.h"
+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"
+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"
+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"
+#include "tensorflow/core/grappler/utils.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+namespace grappler {
+namespace {
+
+const std::unordered_set<string> kDataDiscarding = {
+    "ShardDataset", "SkipDataset", "TakeDataset",
+};
+
+const std::unordered_set<string> kCardinalityPreserving = {
+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",
+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",
+};
+
+bool IsDataDiscarding(const NodeDef& node) {
+  auto iter = kDataDiscarding.find(node.op());
+  if (iter == kDataDiscarding.end()) {
+    return false;
+  }
+  return true;
+}
+
+bool IsCardinalityPreserving(const NodeDef& node) {
+  auto iter = kCardinalityPreserving.find(node.op());
+  if (iter == kCardinalityPreserving.end()) {
+    return false;
+  }
+  auto attr_iter = node.attr().find("preserve_cardinality");
+  if (attr_iter != node.attr().end() && !attr_iter->second.b()) {
+    return false;
+  }
+  return true;
+}
+
+}  // namespace
+
+Status HoistDiscard::OptimizeAndCollectStats(Cluster* cluster,
+                                             const GrapplerItem& item,
+                                             GraphDef* output,
+                                             OptimizationStats* stats) {
+  *output = item.graph;
+  MutableGraphView graph(output);
+  bool updated;
+  do {
+    updated = false;
+    for (int i = 0; i < graph.graph()->node_size(); i++) {
+      auto node = graph.graph()->mutable_node(i);
+      if (IsDataDiscarding(*node)) {
+        NodeDef* start = node;
+        NodeDef* start_parent = graph_utils::GetInputNode(*start, graph);
+        while (IsCardinalityPreserving(*start_parent)) {
+          start = start_parent;
+          start_parent = graph_utils::GetInputNode(*start, graph);
+        }
+        if (start->name() == node->name()) {
+          continue;
+        }
+        auto parent = graph_utils::GetInputNode(*node, graph);
+        TF_RETURN_IF_ERROR(graph.UpdateFanouts(node->name(), parent->name()));
+        if (!absl::StartsWith(node->name(), "hoist_discard/")) {

make "hoist_discard/" a string constant at the top of the file, similar to https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/data/auto_shard.cc#L39

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#ifndef TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+#define TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+
+#include "tensorflow/core/grappler/optimizers/data/optimizer_base.h"
+
+namespace tensorflow {
+namespace grappler {
+
+// This optimization hoists the data discarding ops (such as `skip`, `take` and
+//  `shard`) to avoid unnecessary computation.
+class HoistDiscard : public TFDataOptimizerBase {
+ public:
+  HoistDiscard() = default;
+  ~HoistDiscard() override = default;
+
+  string name() const override { return "hoist_discard"; };
+
+  bool UsesFunctionLibrary() const override { return false; }
+
+  Status Init(
+      const tensorflow::RewriterConfig_CustomGraphOptimizer* config) override {
+    return Status::OK();
+  }
+
+  Status OptimizeAndCollectStats(Cluster* cluster, const GrapplerItem& item,
+                                 GraphDef* output,
+                                 OptimizationStats* stats) override;
+
+  void Feedback(Cluster* cluster, const GrapplerItem& item,
+                const GraphDef& optimize_output, double result) override;
+};
+
+}  // namespace grappler
+}  // namespace tensorflow
+
+#endif  // TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+

Remove the extra newline at the end of the file.

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for the `HoistDiscard` rewrite."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from absl.testing import parameterized
+
+from tensorflow.python.data.experimental.ops import testing
+from tensorflow.python.data.kernel_tests import test_base
+from tensorflow.python.data.ops import dataset_ops
+from tensorflow.python.framework import combinations
+from tensorflow.python.platform import test
+
+
+class HoistDiscardTest(test_base.DatasetTestBase, parameterized.TestCase):
+
+  @combinations.generate(combinations.combine(tf_api_version=2,
+                                              mode=["eager", "graph"]))
+  def testSimpleHoistingV2(self):
+    dataset = dataset_ops.Dataset.range(100)
+    dataset = dataset.apply(
+        testing.assert_next(["FiniteSkip", "FiniteTake", "Shard",
+                             "ParallelMap", "MemoryCacheImpl"]))
+    dataset = dataset.map(
+        lambda x: x + 1, num_parallel_calls=10)
+    dataset = dataset.skip(10)
+    dataset = dataset.cache()
+    dataset = dataset.take(50)
+    dataset = dataset.shard(2, 0)
+    options = dataset_ops.Options()
+    options.experimental_optimization.apply_default_optimizations = False
+    options.experimental_optimization.hoist_discard = True
+    dataset = dataset.with_options(options)
+    self.assertDatasetProduces(dataset, range(11, 61, 2))
+
+  @combinations.generate(combinations.combine(tf_api_version=1,
+                                              mode=["eager", "graph"]))
+  def testSimpleHoistingV1(self):
+    dataset = dataset_ops.Dataset.range(100)
+    dataset = dataset.apply(
+        testing.assert_next(["ParallelMap", "FiniteSkip", "FiniteTake",

Add a comment in the test explaining why the skip, take, and shard aren't supposed to be hoisted above the parallel map here.
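Something along these lines (exact wording up to you; this reflects my understanding of the v1 behavior):

    # In TF1, map() does not set preserve_cardinality=True on the map op,
    # so the ParallelMap node is not cardinality preserving and the
    # skip/take/shard ops cannot be hoisted above it.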

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#ifndef TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+#define TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+
+#include "tensorflow/core/grappler/optimizers/data/optimizer_base.h"
+
+namespace tensorflow {
+namespace grappler {
+
+// This optimization hoists the data discarding ops (such as `skip`, `take` and
+//  `shard`) to avoid unnecessary computation.

Remove the extra space before `shard` in the comment.

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ==============================================================================
+"""Tests for the `HoistDiscard` rewrite."""
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+from absl.testing import parameterized
+
+from tensorflow.python.data.experimental.ops import testing
+from tensorflow.python.data.kernel_tests import test_base
+from tensorflow.python.data.ops import dataset_ops
+from tensorflow.python.framework import combinations
+from tensorflow.python.platform import test
+
+
+class HoistDiscardTest(test_base.DatasetTestBase, parameterized.TestCase):

Add tests for hoisting before prefetch, and for not hoisting before non-cardinality-preserving ops
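A sketch of the second case, roughly what I have in mind (the op names passed to assert_next are from memory, so double-check them):

  @combinations.generate(combinations.combine(tf_api_version=2,
                                              mode=["eager", "graph"]))
  def testNoHoistingPastFilter(self):
    dataset = dataset_ops.Dataset.range(100)
    # filter() can change the element count, so take() must stay after it.
    dataset = dataset.apply(testing.assert_next(["Filter", "FiniteTake"]))
    dataset = dataset.filter(lambda x: x % 2 == 0)
    dataset = dataset.take(10)
    options = dataset_ops.Options()
    options.experimental_optimization.apply_default_optimizations = False
    options.experimental_optimization.hoist_discard = True
    dataset = dataset.with_options(options)
    self.assertDatasetProduces(dataset, range(0, 20, 2))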

zhuzilin

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+
+#ifndef TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_

This header guard should match the file path, i.e. TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DISCARD_H_ (same for the #define and the #endif comment).

zhuzilin

comment created time in 2 months
