Restoring Keras model fails inside a distribution strategy scope

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Arch Linux
  • TensorFlow installed from (source or binary): binary (using pip)
  • TensorFlow version (use command below): both v1.14.0-rc1-22-gaf24dc9 1.14.0 and v2.0.0-beta0-17-g8e423e3 2.0.0-beta1
  • Python version: 3.7.3
  • CUDA/cuDNN version: CUDA 10.1.168-4, cuDNN 7.6.1.34-1
  • GPU model and memory: NVIDIA Quadro P2000, 4GB

Describe the current behavior

Inside a distribution strategy scope, restoring a Keras model that has undergone any training with tf.keras.models.load_model raises the exception shown below. The failure appears to occur while the optimizer is being handled.

(Looks a bit similar to #28599 if you squint, but many details differ.)

Describe the expected behavior

Restoring the model should succeed.

Code to reproduce the issue

import numpy as np, tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
path = "/tmp/model.hdf5"

with strategy.scope():
    # Construct model.
    model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
    model.compile(optimizer=tf.keras.optimizers.SGD(), loss=tf.keras.metrics.mse)
    # Do a fit so the optimizer weights are created. Removing this lets the restore succeed.
    model.fit(np.array([[1]]), np.array([[1]]))
    # Save and attempt to restore.
    tf.keras.models.save_model(model, path)
    tf.keras.models.load_model(path)

Other info / logs

Traceback for TF 2.0 (TF 1.14 is the same except for line numbers):

  File ".../tensorflow/python/keras/saving/save.py", line 137, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File ".../tensorflow/python/keras/saving/hdf5_format.py", line 187, in load_model_from_hdf5
    model._make_train_function()
  File ".../tensorflow/python/keras/engine/training.py", line 1974, in _make_train_function
    params=self._collected_trainable_weights, loss=self.total_loss)
  File ".../tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 491, in get_updates
    grads = self.get_gradients(loss, params)
  File ".../tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 391, in get_gradients
    grads = gradients.gradients(loss, params)
  File ".../tensorflow/python/ops/gradients_impl.py", line 158, in gradients
    unconnected_gradients)
  File ".../tensorflow/python/ops/gradients_util.py", line 543, in _GradientsHelper
    for x in xs
  File ".../tensorflow/python/ops/gradients_util.py", line 543, in <listcomp>
    for x in xs
  File ".../tensorflow/python/distribute/values.py", line 643, in handle
    raise ValueError("`handle` is not available outside the replica context"
ValueError: `handle` is not available outside the replica context or a `tf.distribute.Strategy.update()` call.
tensorflow/tensorflow

Answer from guptapriya

Hi, we don't support saving with the HDF5 format. However, you can save and restore with the standard TF format: just remove the .hdf5 extension from the file path. See https://www.tensorflow.org/beta/tutorials/distribute/save_and_load for more information.
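
A minimal sketch of the suggested workaround, assuming a TF 2.x release in which Keras still accepts save_format="tf" (newer Keras 3 based releases handle extension-less paths differently). The helper names savedmodel_path and save_and_reload are hypothetical and exist only for illustration, and the set of extensions treated as HDF5 is an assumption.

```python
import os

# Assumption: these are the suffixes to strip so Keras does not pick HDF5.
HDF5_EXTENSIONS = (".h5", ".hdf5")


def savedmodel_path(path):
    """Return `path` without a trailing HDF5 extension, if it has one."""
    root, ext = os.path.splitext(path)
    return root if ext.lower() in HDF5_EXTENSIONS else path


def save_and_reload(model, strategy, path):
    """Save `model` in the TF SavedModel format and reload it under `strategy`.

    Hypothetical helper: assumes TF 2.x Keras, where save_format="tf" is valid.
    """
    # Imported lazily so the path helper above can be used without TensorFlow.
    import tensorflow as tf

    export_dir = savedmodel_path(path)
    # An extension-less path plus save_format="tf" selects the SavedModel format.
    model.save(export_dir, save_format="tf")
    with strategy.scope():
        return tf.keras.models.load_model(export_dir)
```

With the reproduction script above, calling save_and_reload(model, strategy, path) in place of the save_model/load_model pair would write to /tmp/model rather than /tmp/model.hdf5.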
