Error occurred when finalizing GeneratorDataset iterator
System information
Describe the current behavior
Executing TensorFlow's MNIST handwriting example produces the error shown below. The error disappears if the code doesn't use OneDeviceStrategy or MirroredStrategy.
W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
Code to reproduce the issue
import tensorflow as tf
import tensorflow_datasets as tfds
import time
from tensorflow.keras.optimizers import Adam
def build_model():
    filters = 48
    units = 24
    kernel_size = 7
    learning_rate = 1e-4
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters=filters, kernel_size=(kernel_size, kernel_size), activation='relu', input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    model.compile(loss='sparse_categorical_crossentropy', optimizer=Adam(learning_rate), metrics=['accuracy'])
    return model
datasets, info = tfds.load(name='mnist', with_info=True, as_supervised=True)
mnist_train, mnist_test = datasets['train'], datasets['test']
num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples
strategy = tf.distribute.OneDeviceStrategy(device='/gpu:0')
BUFFER_SIZE = 10000
BATCH_SIZE = 32
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label
train_dataset = mnist_train.map(scale).shuffle(BUFFER_SIZE).repeat().batch(BATCH_SIZE).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
eval_dataset = mnist_test.map(scale).repeat().batch(BATCH_SIZE).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
with strategy.scope():
    model = build_model()
epochs=5
start = time.perf_counter()
model.fit(
    train_dataset,
    validation_data=eval_dataset,
    steps_per_epoch=num_train_examples/epochs,
    validation_steps=num_test_examples/epochs,
    epochs=epochs)
elapsed = time.perf_counter() - start
print('elapsed: {:0.3f}'.format(elapsed))
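A side note on the script above, which may well be unrelated to the warning: steps_per_epoch and validation_steps are computed by dividing the example counts by epochs, which yields a float and ignores the batch size. A minimal sketch of computing whole-number step counts instead, assuming the intent is one full pass over the data per epoch. steps_for is a hypothetical helper, and 60000 and 10000 are the MNIST train and test split sizes:

```python
import math


def steps_for(num_examples, batch_size):
    """Batches needed to see every example once (hypothetical helper)."""
    return math.ceil(num_examples / batch_size)


BATCH_SIZE = 32  # same value as in the script above

# MNIST split sizes, written out so the sketch runs without tensorflow_datasets.
steps_per_epoch = steps_for(60000, BATCH_SIZE)
validation_steps = steps_for(10000, BATCH_SIZE)

print(steps_per_epoch, validation_steps)
```

These integers could then be passed to model.fit in place of num_train_examples/epochs and num_test_examples/epochs.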
Answer (olk)
I've downgraded my system:
Still facing the error:
Train for 30000.0 steps, validate for 5000.0 steps
Epoch 1/2
2019-12-17 19:21:54.361240: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2019-12-17 19:21:55.824790: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-12-17 19:21:56.980785: W tensorflow/stream_executor/gpu/redzone_allocator.cc:312] Not found: ./bin/ptxas not found
Relying on driver to perform ptx compilation. This message will be only logged once.
30000/30000 [==============================] - 115s 4ms/step - loss: 0.0856 - accuracy: 0.9761 - val_loss: 0.0376 - val_accuracy: 0.9879
Epoch 2/2
29990/30000 [============================>.] - ETA: 0s - loss: 0.0152 - accuracy: 0.9958
2019-12-17 19:25:28.372294: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
30000/30000 [==============================] - 111s 4ms/step - loss: 0.0152 - accuracy: 0.9958 - val_loss: 0.0375 - val_accuracy: 0.9889
2019-12-17 19:25:40.010887: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
2019-12-17 19:25:40.031138: W tensorflow/core/kernels/data/generator_dataset_op.cc:103] Error occurred when finalizing GeneratorDataset iterator: Cancelled: Operation was cancelled
elapsed: 226.391
Seems to be related to tensorflow-2.1.0-rc1.
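In the log above the message is logged at W (warning) level and training still runs to completion. If the goal is only to keep it out of the console, one possible workaround is to raise TensorFlow's C++ log threshold before TensorFlow is imported. This is a sketch that hides the message; it does not address whatever causes it, and it also hides every other INFO and WARNING line from the C++ side:

```python
import os

# "2" filters out INFO and WARNING messages from TensorFlow's C++ logging.
# The variable has to be set before TensorFlow is imported to take effect.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

# The rest of the script would follow from here, starting with:
# import tensorflow as tf
```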