Buggy behaviour of dataset API

System information

Describe the current behavior

At branching points in a Dataset graph, the node at the root of the branching is resampled for each branch during a single round of execution. With non-randomized inputs to the Dataset, this causes no problems. If the root node comes after a .shuffle() call, however, the branches receive different inputs in the same computation round.

Describe the expected behavior

Downstream branches should receive the same data even if shuffle() is applied.

Standalone code to reproduce the issue

https://colab.research.google.com/drive/1AeVRilpcGp8zb0hZijTIxGWdL9GkzfN_
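
The notebook's contents are not reproduced here. As a rough sketch of the pattern described above, assuming TensorFlow 2.x with eager execution, two branches can be built on one shuffled dataset and zipped together. The range dataset and the identity maps are hypothetical stand-ins, not the code from the notebook.

```python
import tensorflow as tf

# Hypothetical stand-in for the index dataset: the integers 0..99, shuffled.
index = tf.data.Dataset.range(100).shuffle(buffer_size=100)

# Two branches rooted at the same shuffled node (identity maps for clarity).
branch_a = index.map(lambda i: i)
branch_b = index.map(lambda i: i)

# Zipping the branches creates a separate iterator over `index` for each one,
# so each branch can see its own shuffle order.
pairs = list(tf.data.Dataset.zip((branch_a, branch_b)).as_numpy_iterator())

# Count how many zipped pairs carry different elements.
mismatches = sum(int(a != b) for a, b in pairs)
print(f"{mismatches} of {len(pairs)} pairs disagree")
```

Each branch still sees every element exactly once; only the order differs between them, which is why the problem shows up only with randomized inputs.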

More info:

This behaviour is also present if the dataset is created from a generator, which handles the shuffling implicitly.

Edit: fixed the links here as well

tensorflow/tensorflow

Answer from aaudiber

Hi @csxeba,

This is working as intended. Datasets can be much larger than the memory of a single machine, so Dataset objects act like blueprints for how to produce data (instead of trying to hold the entire dataset at once). Datasets provide a streaming API for consuming data through an iterator. If you want each iterator created on a shuffled dataset to produce elements in the same order, set the reshuffle_each_iteration argument of Dataset.shuffle to False:

index = index.shuffle(buffer_size=len(self.index), reshuffle_each_iteration=False)
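
To see the effect in isolation, here is a minimal sketch, again assuming TensorFlow 2.x with eager execution. The range dataset is a hypothetical stand-in for the index in the line above.

```python
import tensorflow as tf

# Hypothetical stand-in for the index dataset: the integers 0..99.
# With reshuffle_each_iteration=False the shuffle order is chosen once
# and reused by every iterator created on this dataset.
index = tf.data.Dataset.range(100).shuffle(
    buffer_size=100, reshuffle_each_iteration=False
)

# Each call to as_numpy_iterator() creates a fresh iterator.
first_pass = list(index.as_numpy_iterator())
second_pass = list(index.as_numpy_iterator())

print("same order on both passes:", first_pass == second_pass)
```

Without a seed, the order is still random from one program run to the next; passing seed= to shuffle should pin it across runs as well, if that is needed.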