Qianli Scott Zhu (qlzh727). Google, California. Software Engineer @google, Brain, TensorFlow Keras.

keras-team/keras 48855

Deep Learning for humans

tensorflow/addons 901

Useful extra functionality for TensorFlow 2.x maintained by SIG-addons

qlzh727/addons 0

Useful extra functionality for TensorFlow 2.0 maintained by SIG-addons

qlzh727/community 0

Stores documents used by the TensorFlow developer community

qlzh727/keras 0

Deep Learning for humans

qlzh727/keras-applications 0

Reference implementations of popular deep learning models.

qlzh727/keras-tuner 0

Hyperparameter tuning for humans

qlzh727/models 0

Models and examples built with TensorFlow

qlzh727/tensorboard 0

TensorFlow's Visualization Toolkit

qlzh727/tensorflow 0

Computation using data flow graphs for scalable machine learning

issue comment tensorflow/tensorflow

RNN with cell get_initial_state and state_size incompatible

Please unwrap the [inputs], since the RNN layer only expects one input and it shouldn't be a list.
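A minimal sketch of that suggestion, with a stock LSTMCell standing in for the custom cell from the issue:

import tensorflow as tf

# A stock LSTMCell stands in for the custom cell with get_initial_state/state_size.
cell = tf.keras.layers.LSTMCell(4)
layer = tf.keras.layers.RNN(cell)

x = tf.random.normal((2, 5, 3))  # (batch, timesteps, features)

out = layer(x)    # pass the tensor directly, not wrapped in a list
# layer([x])      # wrapping the single input in a list is what triggers the mismatch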

zwenju

comment created time in 5 days

Pull request review comment tensorflow/addons

Move the tf.keras.layers.PeepholeLSTMCell to tfa

 def get_config(self):
         }
         base_config = super().get_config()
         return {**base_config, **config}
+
+
+@tf.keras.utils.register_keras_serializable(package="Addons")
+class PeepholeLSTMCell(tf.keras.layers.LSTMCell):
+    """Equivalent to `tf.keras.layers.LSTMCell` class but adds peephole connections.
+
+    Peephole connections allow the gates to utilize the previous internal state as
+    well as the previous hidden state (which is what LSTMCell is limited to).
+    This allows PeepholeLSTMCell to better learn precise timings over LSTMCell.
+
+    From [Gers et al., 2002](
+    http://www.jmlr.org/papers/volume3/gers02a/gers02a.pdf):
+
+    "We find that LSTM augmented by 'peephole connections' from its internal
+    cells to its multiplicative gates can learn the fine distinction between
+    sequences of spikes spaced either 50 or 49 time steps apart without the help
+    of any short training exemplars."
+
+    The peephole implementation is based on:
+
+    [Sak et al., 2014](https://research.google.com/pubs/archive/43905.pdf)
+
+    Example:
+
+    ```python
+    # Create 2 PeepholeLSTMCells
+    peephole_lstm_cells = [PeepholeLSTMCell(size) for size in [128, 256]]
+    # Create a layer composed sequentially of the peephole LSTM cells.
+    layer = RNN(peephole_lstm_cells)
+    input = keras.Input((timesteps, input_dim))
+    output = layer(input)
+    ```
+    """
+
+    def build(self, input_shape):
+        super(PeepholeLSTMCell, self).build(input_shape)

Done.

qlzh727

comment created time in 10 days

push event qlzh727/addons

qlzh727

commit sha 5a99ecdd669b13f5a72dd08e769ba4c51a21ec5a

Update build method to be more aligned with py3 style.

view details

push time in 10 days

pull request comment tensorflow/addons

Move the tf.keras.layers.PeepholeLSTMCell to tfa

Yeah, that makes sense. In addons, we're trying to show users how to write idiomatic TF 2.x code. Even if tf.compat is not going away anytime soon, we would prefer to use modern, future-proof TF code to minimize technical debt. That was the rationale when we decided to forbid tf.compat. In this case, a good alternative would be to hardcode some numbers in the tests and check against them. That would prevent any regressions, since we know that the current implementation is correct.
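A rough sketch of that golden-value style of test, assuming the cell lands as tfa.rnn.PeepholeLSTMCell; the hard-coded numbers would come from a verified run, not from this snippet:

import tensorflow as tf
import tensorflow_addons as tfa  # assumes the cell is exposed as tfa.rnn.PeepholeLSTMCell

# Deterministic setup: fixed seed and all-ones kernels so the output is reproducible.
tf.random.set_seed(1234)
cell = tfa.rnn.PeepholeLSTMCell(
    units=2,
    kernel_initializer='ones',
    recurrent_initializer='ones',
    bias_initializer='zeros')
layer = tf.keras.layers.RNN(cell)
output = layer(tf.ones((1, 3, 2)))

# Run once against the known-correct implementation and record the printed numbers,
# then hard-code them in the test, e.g.:
#   expected = np.array([[...]])  # golden values from the verified run
#   np.testing.assert_allclose(output.numpy(), expected, atol=1e-6)
print(output.numpy())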

Done.

qlzh727

comment created time in 11 days

push event qlzh727/addons

qlzh727

commit sha deb57c329c92e844ad2eb2a9381f02f1526d009a

Fix format.

view details

push time in 11 days

push event qlzh727/addons

qlzh727

commit sha 4b64158a6ff7ed4f37b1966b45026ebcad1e965f

Add PeepholeLSTMCell to the exception list for the typehint check. The cell itself doesn't have an __init__ and inherits it from keras.LSTMCell, which doesn't have type hints yet.

view details

push time in 11 days

push event qlzh727/addons

qlzh727

commit sha 860814c760c303376ed756290a4a0ed29ceeb3e0

Update peephole lstm cell test with golden values. We removed the v1 compat API since TFA only works with TF v2.

view details

push time in 11 days

issue closed tensorflow/tensorflow

model.reset_states() does not work for bidirectional-RNNs in tf.keras

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): YES

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04 LTS

  • TensorFlow installed from (source or binary): binary

  • TensorFlow version (use command below):
    TF 2.1 and tf-nightly==2.2.0.dev20200407 (both have bug around this issue, but different issues)

  • Python version: 3.7.4

  • CUDA/cuDNN version: 10.1, 7.6.5

  • GPU model and memory: 2080ti, 11GB. Bug is on both CPU/GPU.

Describe the current behavior

model.reset_states() does not work for bidirectional, stateful recurrent layers (bidi-RNNs).

  • TF 2.1: model.reset_states() does nothing for stateful bidi-RNNs.
  • tf-nightly: calling model.reset_states() for stateful bidi-RNNs causes a crash.

Describe the expected behavior

This was reported as a bug in TF 2.0: model.reset_states() does nothing for bidi-RNNs. I thought this was fixed in tf-nightly at the time, but it has returned in TF 2.1.

model.reset_states() for standard RNNs changed in TF 2.1 and has the following behavior:

  • if model is stateful with NO initial state input: resets state to zero
  • if model is stateful with initial state input: resets state to state input
  • otherwise the state is carried over from the last call.

Thus the expected behavior for stateful bidi-RNNs is:

  • if model is stateful with NO initial state input: resets fwd and bwd state to zero
  • if model is stateful with initial state input: resets fwd state to fwd state input and resets bwd state to bwd state input
  • otherwise the fwd state and bwd state are carried over from the last call (as is done in stateful bidi-RNNs).
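For contrast, a minimal sketch of the standard (non-bidirectional) stateful case with no initial-state input, where reset_states() does restore the zero state:

import numpy as np
from tensorflow.keras.layers import Input, LSTM
from tensorflow.keras.models import Model

inp = Input(batch_shape=(1, 3, 1))
out = LSTM(1, return_sequences=True, stateful=True)(inp)
model = Model(inputs=inp, outputs=out)

x = np.random.normal(size=(1, 3, 1)).astype('float32')
first = model.predict(x)
second = model.predict(x)   # differs from `first`: the state is carried over between calls
model.reset_states()
third = model.predict(x)    # matches `first`: the state was reset to zero
np.testing.assert_allclose(first, third, atol=1e-6)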

Standalone code to reproduce the issue

Code to show this behavior with no state-inputs:

import os
os.environ['CUDA_DEVICE_ORDER']='PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES']=''

import numpy as np
from tensorflow.keras.layers import Input, Dense, SimpleRNN, GRU, LSTM, Bidirectional
from tensorflow.keras.models import Model

REC = LSTM

sequence_length = 3
feature_dim = 1
features_in = Input(batch_shape=(1, sequence_length, feature_dim)) 

rnn_out = Bidirectional( REC(1, activation=None, use_bias=False, return_sequences=True, return_state=False, stateful=False))(features_in)
stateless_model = Model(inputs=[features_in], outputs=[rnn_out])

stateful_rnn_out = Bidirectional( REC(1, activation=None, use_bias=False, return_sequences=True, return_state=False, stateful=True))(features_in)
stateful_model = Model(inputs=features_in, outputs=stateful_rnn_out)

stateful_model.set_weights( stateless_model.get_weights() )

x_in = np.random.normal(0,10,sequence_length)
x_in = x_in.reshape( (1, sequence_length, feature_dim) )

def print_bidi_out(non_stateful_out, stateful_out):
	fb = ['FWD::', 'BWD::']

	for i in range(2):
		print(fb[i])
		print(f'non_stateful: {non_stateful_out.T[i]}')
		print(f'stateful: {stateful_out.T[i]}')
		print(f'delta: {stateful_out.T[i]-non_stateful_out.T[i]}')


non_stateful_out = stateless_model.predict(x_in).reshape((sequence_length,2))
stateful_out = stateful_model.predict(x_in).reshape((sequence_length,2))
print_bidi_out(non_stateful_out, stateful_out)

non_stateful_out = stateless_model.predict(x_in).reshape((sequence_length,2))
stateful_out = stateful_model.predict(x_in).reshape((sequence_length,2))
print_bidi_out(non_stateful_out, stateful_out)

print('\n** RESETING STATES in STATEFUL MODEL **\n')
stateful_model.reset_states()
non_stateful_out = stateless_model.predict(x_in).reshape((sequence_length,2))
stateful_out = stateful_model.predict(x_in).reshape((sequence_length,2))
print_bidi_out(non_stateful_out, stateful_out)

Code to demo with initial-state inputs:

import os
os.environ['CUDA_DEVICE_ORDER']='PCI_BUS_ID'
os.environ['CUDA_VISIBLE_DEVICES']=''

import numpy as np
from tensorflow.keras.layers import Input, Dense, SimpleRNN, GRU, LSTM, Bidirectional
from tensorflow.keras.models import Model

REC = LSTM

sequence_length = 3
feature_dim = 1
features_in = Input(batch_shape=(1, sequence_length, feature_dim)) 
state_h_fwd_in = Input(batch_shape=(1, 1))
state_h_bwd_in = Input(batch_shape=(1, 1))
state_c_fwd_in = Input(batch_shape=(1, 1))
state_c_bwd_in = Input(batch_shape=(1, 1))

four_state_shape = [state_h_fwd_in, state_c_fwd_in, state_h_bwd_in, state_c_bwd_in]
two_state_shape = [state_h_fwd_in, state_h_bwd_in]

if REC == LSTM:
    rnn_out = Bidirectional( REC(1, activation='linear', use_bias=False, return_sequences=True, return_state=False, stateful=False))(features_in, initial_state=four_state_shape)
    stateful_rnn_out = Bidirectional( REC(1, activation='linear', use_bias=False, return_sequences=True, return_state=False, stateful=True))(features_in, initial_state=four_state_shape)
    rnn_inputs = [features_in, state_h_fwd_in, state_c_fwd_in, state_h_bwd_in, state_c_bwd_in]
else:
    if REC == SimpleRNN:
        rnn_out = Bidirectional( REC(1, activation='linear', use_bias=False, return_sequences=True, return_state=False, stateful=False))(features_in, initial_state=two_state_shape)
        stateful_rnn_out = Bidirectional( REC(1, activation='linear', use_bias=False, return_sequences=True, return_state=False, stateful=True))(features_in, initial_state=two_state_shape)
    else:
        rnn_out = Bidirectional( REC(1, activation='linear', use_bias=False, return_sequences=True, return_state=False, stateful=False))(features_in, initial_state=two_state_shape)
        stateful_rnn_out = Bidirectional( REC(1, activation='linear', use_bias=False, return_sequences=True, return_state=False, stateful=True))(features_in, initial_state=two_state_shape)
    rnn_inputs = [features_in, state_h_fwd_in, state_h_bwd_in]

stateless_model = Model(inputs=rnn_inputs, outputs=rnn_out)
stateful_model = Model(inputs=rnn_inputs, outputs=stateful_rnn_out)


# toy_weights = [np.asarray([[ 1.0]], dtype=np.float32), np.asarray([[0.5 ]], dtype=np.float32), np.asarray([[ -1.0 ]], dtype=np.float32), np.asarray([[ -0.5 ]], dtype=np.float32)]
# stateless_model.set_weights(toy_weights)
# stateful_model.set_weights(toy_weights)

stateful_model.set_weights( stateless_model.get_weights() )

stateful_model.save('temp_stateful.h5')
stateless_model.save('temp_stateless.h5')

x_in = np.random.normal(0,10,sequence_length)
x_in = np.asarray([1,0,0])
x_in = x_in.reshape( (1, sequence_length, feature_dim) )

fwd_initial_h = np.asarray(2.75).reshape(1,1)
fwd_initial_c = np.asarray(1.3).reshape(1,1)
bwd_initial_h = np.asarray(-2.0).reshape(1,1)
bwd_initial_c = np.asarray(-1.2).reshape(1,1)

# fwd_initial_h = np.asarray(np.random.normal(0,10)).reshape(1,1)
# fwd_initial_h = np.asarray(np.random.normal(0,10)).reshape(1,1)
# bwd_initial_h = np.asarray(np.random.normal(0,10)).reshape(1,1)
# fwd_initial_c = np.asarray(np.random.normal(0,10)).reshape(1,1)
# bwd_initial_c = np.asarray(np.random.normal(0,10)).reshape(1,1)

if REC == LSTM:
    rnn_input = [x_in, fwd_initial_h, fwd_initial_c, bwd_initial_h, bwd_initial_c]
else:
    rnn_input = [x_in, fwd_initial_h, bwd_initial_h] 
    

def print_bidi_out(non_stateful_out, stateful_out):
	fb = ['FWD::', 'BWD::']

	for i in range(2):
		print(fb[i])
		print(f'non_stateful: {non_stateful_out.T[i]}')
		print(f'stateful: {stateful_out.T[i]}')
		print(f'delta: {stateful_out.T[i]-non_stateful_out.T[i]}')

non_stateful_out = stateless_model.predict(rnn_input).reshape((sequence_length,2))
stateful_out = stateful_model.predict(rnn_input).reshape((sequence_length,2))
print_bidi_out(non_stateful_out, stateful_out)

non_stateful_out = stateless_model.predict(rnn_input).reshape((sequence_length,2))
stateful_out = stateful_model.predict(rnn_input).reshape((sequence_length,2))
print_bidi_out(non_stateful_out, stateful_out)

print('\n** RESETING STATES in STATEFUL MODEL **\n')
stateful_model.reset_states()
non_stateful_out = stateless_model.predict(rnn_input).reshape((sequence_length,2))
stateful_out = stateful_model.predict(rnn_input).reshape((sequence_length,2))
print_bidi_out(non_stateful_out, stateful_out)

Other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

sample output for a SimpleRNN with input states -- using TF 2.1:

FWD::
non_stateful: [7.375   3.6875  1.84375]
stateful: [7.375   3.6875  1.84375]
delta: [0. 0. 0.]
BWD::
non_stateful: [ 11.5 -25.   50. ]
stateful: [ 11.5 -25.   50. ]
delta: [0. 0. 0.]
FWD::
non_stateful: [7.375   3.6875  1.84375]
stateful: [1.921875   0.9609375  0.48046875]
delta: [-5.453125  -2.7265625 -1.3632812]
BWD::
non_stateful: [ 11.5 -25.   50. ]
stateful: [-2.4375  2.875  -5.75  ]
delta: [-13.9375  27.875  -55.75  ]

** RESETING STATES in STATEFUL MODEL **

FWD::
non_stateful: [7.375   3.6875  1.84375]
stateful: [1.2402344 0.6201172 0.3100586]
delta: [-6.1347656 -3.0673828 -1.5336914]
BWD::
non_stateful: [ 11.5 -25.   50. ]
stateful: [-0.6953125 -0.609375   1.21875  ]
delta: [-12.1953125  24.390625  -48.78125  ]

Crash when using the 4/7 tf-nightly (2.2.0.dev20200407):

Traceback (most recent call last):
  File "temp_bidi_state_in.py", line 89, in <module>
    stateful_model.reset_states()
  File "/home/keith/.pyenv/versions/tfn/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py", line 473, in reset_states
    layer.reset_states()
  File "/home/keith/.pyenv/versions/tfn/lib/python3.7/site-packages/tensorflow/python/keras/layers/wrappers.py", line 676, in reset_states
    self.forward_layer.reset_states()
  File "/home/keith/.pyenv/versions/tfn/lib/python3.7/site-packages/tensorflow/python/keras/layers/recurrent.py", line 903, in reset_states
    spec_shape = nest.flatten(self.input_spec[0])[0].shape
AttributeError: 'NoneType' object has no attribute 'shape'

closed time in 11 days

keithchugg

issue comment tensorflow/tensorflow

model.reset_states() does not work for bidirectional-RNNs in tf.keras

Oh, thanks for the notice. I think this has already been fixed; I just forgot to close the GitHub issue.

keithchugg

comment created time in 11 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

 def check_match(keras_block, tf_block, keras_weight_names, tf_weight_names, model_name_tf):
     keras_weight_names: list of str, each string is a name for weights in keras implementation
     tf_weight_names: list of str, each string is a name for weights in tf implementation
   """
-  match_lst = []
+  names_from_keras = set()
   for x in keras_weight_names:
     if keras_block in x:
       y = keras_name_to_tf_name_block(x, keras_block=keras_block, tf_block=tf_block, model_name_tf=model_name_tf)
-      match_lst.append(y)
+      names_from_keras.add(y)
 
-  assert len(match_lst) > 0, f'there is no weight in block {keras_block}'
+  names_from_tf = set()
   for x in tf_weight_names:
     if tf_block in x and x.split('/')[1].endswith(tf_block):
-      try:
-        match_lst.remove(x)
-      except:
-        raise ValueError(f'{x} not in tf_weight_names')
-  assert len(match_lst) == 0 , f'{len(match_lst)} variables in {tf_block} are not in {keras_block} of keras model. '
+      names_from_tf.add(x)
+
+  names_missing = names_from_keras - names_from_tf
+  if len(names_missing) > 0:
+    raise ValueError(f'{len(names_missing)} variables not found in checkpoint file: {names_missing}')
+
+  names_unused = names_from_keras - names_from_tf

I guess the names_unused should be names_from_tf - names_from_keras?
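In other words, with both sets in hand the check can report every mismatch at once; a toy sketch with made-up weight names:

# Toy names standing in for the real h5/ckpt weight names of one block.
names_from_keras = {'blocks_0/conv2d/kernel', 'blocks_0/se/conv2d/bias'}
names_from_tf = {'blocks_0/conv2d/kernel', 'blocks_0/tpu_batch_normalization/beta'}

names_missing = names_from_keras - names_from_tf  # expected but absent from the ckpt
names_unused = names_from_tf - names_from_keras   # in the ckpt but never mapped to keras

if names_missing:
  raise ValueError(f'{len(names_missing)} variables not found in checkpoint file: {names_missing}')
if names_unused:
  raise ValueError(f'{len(names_unused)} checkpoint variables not used by the keras model: {names_unused}')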

yixingfu

comment created time in 11 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

 def keras_name_to_tf_name_block(keras_name, keras_block='block1a', tf_block='blocks_0', use_ema=True, model_name_tf='efficientnet-b0'):
         tf_name.append(x)
   if use_ema:
     tf_name.append('ExponentialMovingAverage')
-  return '/'.join(tf_name) + ':0'
+  return '/'.join(tf_name)
 
 
-def check_match(keras_block, tf_block, keras_weights, tf_weights):
+def check_match(keras_block, tf_block, keras_weight_names, tf_weight_names, model_name_tf):
   """ Check if the weights in h5 and ckpt match
 
-  we match each name from keras_weights that is in keras_block
-  and check if there is 1-1 correspondence to names from tf_weights
+  we match each name from keras_weight_names that is in keras_block
+  and check if there is 1-1 correspondence to names from tf_weight_names
   that is in tf_block
 
   Args:
     keras_block: str, the block name for keras implementation (e.g. 'block1a')
     tf_block: str, the block name for tf implementation (e.g. 'blocks_0')
-    keras_weights: list of str, each string is a name for weights in keras implementation
-    tf_weights: list of str, each string is a name for weights in tf implementation
+    keras_weight_names: list of str, each string is a name for weights in keras implementation
+    tf_weight_names: list of str, each string is a name for weights in tf implementation
   """

I think rather than adding to and removing from one list, can we populate two sets, one from the keras weights and one from the tf weights, and compare the diff? That will allow us to raise an error with all mismatched items, rather than just one.

yixingfu

comment created time in 12 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+# Copyright 2018 The TensorFlow Authors. All Rights Reserved.

2020

yixingfu

comment created time in 13 days

Pull request review comment tensorflow/tensorflow

Add show_dtype support for plot_model, update related tests.

 def _serialize_hyperparameter(self, hyperparameter_name):
       return value()
     if tensor_util.is_tensor(value):
       return backend.get_value(value)
-    return value
+    return float(value)

Not sure why this is included in this PR.

jonah-kohn

comment created time in 13 days

pull request comment tensorflow/addons

Move the tf.keras.layers.PeepholeLSTMCell to tfa

@qlzh727 Thanks for the pull request!

  1. Sure, we can allow that, no problem. I'll need to modify the library that does the check. In the meantime, you can add this class to the list of exceptions to avoid having the tests bother you: https://github.com/tensorflow/addons/blob/master/tools/testing/source_code_test.py#L40

Sure, will do.

  2. I see tf.compat.v1 is used three times. I believe it should be possible to find alternatives. Maybe using tf.random.set_seed, tf.keras.layers.LSTMCell, and tf.keras.initializers.Ones?

I can't change to keras.LSTMCell, since only tf.nn.rnn_cell.LSTMCell contains the peephole implementation. It was the baseline I used to verify the numbers. Could we allow tf.compat.v1 in tests only?

qlzh727

comment created time in 13 days

pull request comment tensorflow/addons

Move the tf.keras.layers.PeepholeLSTMCell to tfa

@seanpmorgan, currently the CI test is failing for 2 reasons: https://github.com/tensorflow/addons/pull/1944/checks?check_run_id=796216452.

  1. The typehint is not available for the new cell, since it uses the __init__ from the keras LSTMCell. For a class that doesn't have its own __init__(), can we suppress the warning?

  2. The unit test was failing since it uses some tf.compat.v1 APIs. I copied the existing test since it does a numerical comparison against the tf v1 API as a correctness check. Can we somehow allow that?

qlzh727

comment created time in 13 days

push event qlzh727/addons

qlzh727

commit sha ca2f3b4d8334cbf8fa40c6232a7621225e41598b

Fix code style

view details

push time in 13 days

PR opened tensorflow/addons

Move the tf.keras.layers.PeepholeLSTMCell to tfa

This cell has been exported under the TF core API as experimental since 2.0, but I think TFA is a better place for this implementation.

We are planning to deprecate and eventually remove the PeepholeLSTMCell from the TF core API once this lands.

+111 -0

0 comments

3 changed files

pr created time in 15 days

push event qlzh727/addons

qlzh727

commit sha da5d0bec037e2e79ac627858bfdccabcad479d68

Move the tf.keras.layers.PeepholeLSTMCell to tfa. This cell has been exported under the TF core API as experimental since 2.0, but I think TFA is a better place for this implementation. We are planning to deprecate and eventually remove the PeepholeLSTMCell from the TF core API once this lands.

view details

push time in 15 days

push event qlzh727/addons

Guillaume Klein

commit sha 4830c25a4fddf861f2f022cb4d15bb7d972acc46

Add missing documentation for some output tuples (#1916)

view details

Guillaume Klein

commit sha 65cdbbcfd56eccd772745b3f79f75f6d2faa3ae8

Use a tf.function with an input signature instead of tf.placeholder (#1921)

view details

Guillaume Klein

commit sha 69db5e4653a52a2a06088eaa7ff49f22fb738e42

Update context checks in dynamic_decode (#1919)

There are several things here:
- The XLA check is updated to match what is done in `tf.keras.backend.rnn` and also mentioned in https://github.com/tensorflow/tensorflow/commit/e67e29e3c8a150be847e17bbfe4f5d399e7b08ea
- As far as I know, setting a caching device on the variable scope only impacts variables created with `tf.compat.v1.get_variable`, which is no longer used in V2. The current check was wrong in V2 anyway (see https://github.com/tensorflow/tensorflow/commit/e67e29e3c8a150be847e17bbfe4f5d399e7b08ea)
- If we don't need to configure the variable scope, we can change it to a name scope, which has the same effect in V2.

view details

who who who

commit sha 8586698dafb1a99bc93841afbebe50143cf2f1a3

use raw ops (#1914)

view details

Xavier Holt

commit sha fcaf462ae63fc32601356652b6ebeeb3d762138e

Fix typo in doctoring of CohenKappa __init__ (#1928)

view details

Guillaume Klein

commit sha 8d1473b4220c02addb90b0005ec4bd1723b9bfe7

Add a fixture to test both the custom and py ops (#1929)

view details

Guillaume Klein

commit sha dcdd3ea00bba2ae768e9dcc5fcf0ab31668c75a3

Remove 0 as a valid value for maximum_iterations (#1930)

* Remove 0 as a valid value for maximum_iterations
* Update docstring

view details

Sean Morgan

commit sha 6a91f5648554166f22c0db643f0301df9701eb82

* Update to informative list names (#1932)

view details

Dheeraj R Reddy

commit sha def106c0745527e1c1d2e219ff8e82652820fa6b

Remove tf.control_dependencies from SWA (#1805)

* Remove tf.control_dependencies from SWA
* Wrap `average_op` in `tf.function`
* Replace tf.where with python if
* Use locking param for assign
* Remove SWA from exceptions

view details

bhack

commit sha bd5bbfc66e750cba94360236e7f85481662ba4de

Add experimental_aggregate_gradients support (#1924)

* Add experimental_aggregate_gradients support
* Try to reset precision with the right method

view details

Marissa Ikonomidis

commit sha a0bfe3f35f310bdfee53078bdee15e10af342f53

MovingAverage: add dynamic decay and swap weights (#1726)

* Add dynamic decay to MovingAverage. Adding dynamic decay to MovingAverage improves early accuracy. When using dynamic decay, decay starts at 0.1 and gradually increases up to `average_decay`.
* Add ability to swap weights to MovingAverage. This patch makes it easier to swap the model weights and the MovingAverage weights before eval and swap them back after eval.

view details

qlzh727

commit sha 377dc47ce7b51cccab5aa3386aac5b7c21bcb3b8

Merge remote-tracking branch 'upstream/master'

view details

push time in 15 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]+  else:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' not in x[0]]++  # remove util variables used for RMSprop+  tf2_listvar = [x for x in tf2_listvar if 'RMS' not in x[0]]++  tf2_listvar = [x[0] for x in tf2_listvar]+  return tf2_listvar+++def get_tf_blocks(tf_weights):+  """Extract the block names from list of full weight names"""+  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])+  tf_blocks = sorted(tf_blocks, key=lambda x:int(x.split('_')[1]))+  return tf_blocks+++def get_keras_blocks(keras_weights):+  """Extract the block names from list of full weight names"""+  return sorted(set([x.split('_')[0] for x in keras_weights if 'block' in x]))+++def keras_name_to_tf_name_stem_top(keras_name, use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that is in stem or top (head)+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    KeyError if we cannot parse the keras_name+  """+  if use_ema:+    ema = '/ExponentialMovingAverage'+  else:+    ema = ''++  stem_top_dict = {+      'probs/probs/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'probs/probs/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'predictions/predictions/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'predictions/predictions/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'stem_conv/stem_conv/kernel:0':f'{model_name_tf}/stem/conv2d/kernel{ema}:0',+      'top_conv/top_conv/kernel:0':f'{model_name_tf}/head/conv2d/kernel{ema}:0',+  }++  # stem batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'stem_bn/stem_bn/{bn_weights}:0'] = f'{model_name_tf}/stem/tpu_batch_normalization/{bn_weights}{ema}:0'+  # top / head batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'top_bn/top_bn/{bn_weights}:0'] = f'{model_name_tf}/head/tpu_batch_normalization/{bn_weights}{ema}:0'++  if keras_name in stem_top_dict:+    return stem_top_dict[keras_name]+  else:+    raise KeyError(f'{keras_name} from h5 file cannot be parsed')+++def keras_name_to_tf_name_block(keras_name, keras_block='block1a', tf_block='blocks_0', use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that belongs to a block+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    keras_block: str, the block name for keras implementation (e.g. 
'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    ValueError if keras_block does not show up in keras_name+  """++  if f'{keras_block}' not in keras_name:+    raise ValueError(f'block name {keras_block} not found in {keras_name}')++  # all blocks in the first group will not have expand conv and bn+  is_first_blocks = (keras_block[5]=='1')++  tf_name = [model_name_tf, tf_block]++  # depthwide conv+  if 'dwconv' in keras_name:+    tf_name.append('depthwise_conv2d')+    tf_name.append('depthwise_kernel')++  # conv layers+  if is_first_blocks:+    # first blocks only have one conv2d+    if 'project_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+  else:+    if 'project_conv' in keras_name:+      tf_name.append('conv2d_1')+      tf_name.append('kernel')+    elif 'expand_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+      +  # squeeze expansion layers +  if '_se_' in keras_name:+    if 'reduce' in keras_name:+      tf_name.append('se/conv2d')+    elif 'expand' in keras_name:+      tf_name.append('se/conv2d_1')++    if 'kernel' in keras_name:+      tf_name.append('kernel')+    elif 'bias' in keras_name:+      tf_name.append('bias')++  # batch normalization layers +  if 'bn' in keras_name:+    if is_first_blocks:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_1')+      else:+        tf_name.append('tpu_batch_normalization')+    else:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_2')+      elif 'expand' in keras_name:+        tf_name.append('tpu_batch_normalization')+      else:+        tf_name.append('tpu_batch_normalization_1')++    for x in ['moving_mean', 'moving_variance', 'beta', 'gamma']:+      if x in keras_name:+        tf_name.append(x)+  if use_ema:+    tf_name.append('ExponentialMovingAverage')+  return '/'.join(tf_name) + ':0'+++def check_match(keras_block, tf_block, keras_weights, tf_weights):+  """ Check if the weights in h5 and ckpt match+  +  we match each name from keras_weights that is in keras_block +  and check if there is 1-1 correspondence to names from tf_weights+  that is in tf_block+  +  Args:+    keras_block: str, the block name for keras implementation (e.g. 'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    keras_weights: list of str, each string is a name for weights in keras implementation+    tf_weights: list of str, each string is a name for weights in tf implementation+  """+  for x in keras_weights:+    if keras_block in x:+      y = keras_name_to_tf_name_block(x, keras_block=bk, tf_block=bt)+    match_lst.append(y)++  assert len(match_lst) > 0++  for x in tf_weights:+    if tf_block in x[0] and x[0].split('/')[1].endswith(tf_block):+      match_lst.remove(x[0]+':0')+  assert len(match_lst) == 0 +++if __name__ == '__main__':+  parser = argparse.ArgumentParser(description="Load Models ")+  parser.add_argument("--model", required=True, type=str, help="name of efficient model. e.g. 
b2 or b5notop")+  parser.add_argument("--ckpt", required=True, type=str, help="checkpoint path")+  parser.add_argument("--o", required=True, type=str, help="output (h5) file path")+  args = parser.parse_args()++  include_top = True+  if args.model.endswith('notop'):+    include_top = False++  arg_to_model = {+      'b0':EfficientNetB0,+      'b1':EfficientNetB1,+      'b2':EfficientNetB2,+      'b3':EfficientNetB3,+      'b4':EfficientNetB4,+      'b5':EfficientNetB5,+      'b6':EfficientNetB6,+      'b7':EfficientNetB7+  }

Please make the model name an enum.
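Something along these lines would work (a sketch only; functools.partial is used because bare functions assigned in an Enum body become methods rather than members, and the 'notop' suffix handling is left out):

import argparse
import functools
from enum import Enum

from tensorflow.keras.applications.efficientnet import (
    EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3,
    EfficientNetB4, EfficientNetB5, EfficientNetB6, EfficientNetB7)


class EfficientNetModel(Enum):
  """Maps the --model flag value to the keras.applications constructor."""
  b0 = functools.partial(EfficientNetB0)
  b1 = functools.partial(EfficientNetB1)
  b2 = functools.partial(EfficientNetB2)
  b3 = functools.partial(EfficientNetB3)
  b4 = functools.partial(EfficientNetB4)
  b5 = functools.partial(EfficientNetB5)
  b6 = functools.partial(EfficientNetB6)
  b7 = functools.partial(EfficientNetB7)


parser = argparse.ArgumentParser(description='Load Models')
parser.add_argument('--model', required=True,
                    choices=[m.name for m in EfficientNetModel],
                    help='name of efficientnet model, e.g. b0')
args = parser.parse_args()

keras_model = EfficientNetModel[args.model].value()  # e.g. calls EfficientNetB0()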

yixingfu

comment created time in 16 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]

Can you add a comment about the structure of tf2_listvar? It seems to be a list of lists?
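For reference, tf.train.list_variables returns (name, shape) pairs, so a comment along these lines would cover it (the example entries are illustrative, and path_ckpt must point at a downloaded checkpoint):

import tensorflow as tf

path_ckpt = 'efficientnet-b0/model.ckpt'  # hypothetical checkpoint path

# tf.train.list_variables returns a list of (name, shape) tuples, e.g.
#   [('efficientnet-b0/stem/conv2d/kernel/ExponentialMovingAverage', [3, 3, 3, 32]),
#    ('efficientnet-b0/blocks_0/depthwise_conv2d/depthwise_kernel', [3, 3, 32, 1]),
#    ...]
tf2_listvar = tf.train.list_variables(path_ckpt)

# x[0] is the variable name, so this keeps only the ExponentialMovingAverage variables.
tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]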

yixingfu

comment created time in 16 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]+  else:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' not in x[0]]++  # remove util variables used for RMSprop+  tf2_listvar = [x for x in tf2_listvar if 'RMS' not in x[0]]++  tf2_listvar = [x[0] for x in tf2_listvar]+  return tf2_listvar+++def get_tf_blocks(tf_weights):+  """Extract the block names from list of full weight names"""+  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])

For string manipulation like this, please provide an example in the comment.
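A sketch of the kind of docstring example the comment asks for (the sample weight name is illustrative):

def get_tf_blocks(tf_weights):
  """Extract the block names from list of full weight names.

  For example, a name like
  'efficientnet-b0/blocks_0/conv2d/kernel/ExponentialMovingAverage'
  splits on '/' into ['efficientnet-b0', 'blocks_0', ...], so split('/')[1]
  yields the block name 'blocks_0'.
  """
  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])
  # sort numerically on the index so that 'blocks_10' comes after 'blocks_2'
  tf_blocks = sorted(tf_blocks, key=lambda x: int(x.split('_')[1]))
  return tf_blocks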

yixingfu

comment created time in 16 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]+  else:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' not in x[0]]++  # remove util variables used for RMSprop+  tf2_listvar = [x for x in tf2_listvar if 'RMS' not in x[0]]++  tf2_listvar = [x[0] for x in tf2_listvar]+  return tf2_listvar+++def get_tf_blocks(tf_weights):+  """Extract the block names from list of full weight names"""+  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])+  tf_blocks = sorted(tf_blocks, key=lambda x:int(x.split('_')[1]))+  return tf_blocks+++def get_keras_blocks(keras_weights):+  """Extract the block names from list of full weight names"""+  return sorted(set([x.split('_')[0] for x in keras_weights if 'block' in x]))+++def keras_name_to_tf_name_stem_top(keras_name, use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that is in stem or top (head)+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    KeyError if we cannot parse the keras_name+  """+  if use_ema:+    ema = '/ExponentialMovingAverage'+  else:+    ema = ''++  stem_top_dict = {+      'probs/probs/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'probs/probs/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'predictions/predictions/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'predictions/predictions/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'stem_conv/stem_conv/kernel:0':f'{model_name_tf}/stem/conv2d/kernel{ema}:0',+      'top_conv/top_conv/kernel:0':f'{model_name_tf}/head/conv2d/kernel{ema}:0',+  }++  # stem batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'stem_bn/stem_bn/{bn_weights}:0'] = f'{model_name_tf}/stem/tpu_batch_normalization/{bn_weights}{ema}:0'+  # top / head batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'top_bn/top_bn/{bn_weights}:0'] = f'{model_name_tf}/head/tpu_batch_normalization/{bn_weights}{ema}:0'++  if keras_name in stem_top_dict:+    return stem_top_dict[keras_name]+  else:+    raise KeyError(f'{keras_name} from h5 file cannot be parsed')+++def keras_name_to_tf_name_block(keras_name, keras_block='block1a', tf_block='blocks_0', use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that belongs to a block+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    keras_block: str, the block name for keras implementation (e.g. 
'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    ValueError if keras_block does not show up in keras_name+  """++  if f'{keras_block}' not in keras_name:+    raise ValueError(f'block name {keras_block} not found in {keras_name}')++  # all blocks in the first group will not have expand conv and bn+  is_first_blocks = (keras_block[5]=='1')++  tf_name = [model_name_tf, tf_block]++  # depthwide conv+  if 'dwconv' in keras_name:+    tf_name.append('depthwise_conv2d')+    tf_name.append('depthwise_kernel')++  # conv layers+  if is_first_blocks:+    # first blocks only have one conv2d+    if 'project_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+  else:+    if 'project_conv' in keras_name:+      tf_name.append('conv2d_1')+      tf_name.append('kernel')+    elif 'expand_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+      +  # squeeze expansion layers +  if '_se_' in keras_name:+    if 'reduce' in keras_name:+      tf_name.append('se/conv2d')+    elif 'expand' in keras_name:+      tf_name.append('se/conv2d_1')++    if 'kernel' in keras_name:+      tf_name.append('kernel')+    elif 'bias' in keras_name:+      tf_name.append('bias')++  # batch normalization layers +  if 'bn' in keras_name:+    if is_first_blocks:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_1')+      else:+        tf_name.append('tpu_batch_normalization')+    else:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_2')+      elif 'expand' in keras_name:+        tf_name.append('tpu_batch_normalization')+      else:+        tf_name.append('tpu_batch_normalization_1')++    for x in ['moving_mean', 'moving_variance', 'beta', 'gamma']:+      if x in keras_name:+        tf_name.append(x)+  if use_ema:+    tf_name.append('ExponentialMovingAverage')+  return '/'.join(tf_name) + ':0'+++def check_match(keras_block, tf_block, keras_weights, tf_weights):+  """ Check if the weights in h5 and ckpt match+  +  we match each name from keras_weights that is in keras_block +  and check if there is 1-1 correspondence to names from tf_weights+  that is in tf_block+  +  Args:+    keras_block: str, the block name for keras implementation (e.g. 'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    keras_weights: list of str, each string is a name for weights in keras implementation+    tf_weights: list of str, each string is a name for weights in tf implementation+  """+  for x in keras_weights:+    if keras_block in x:+      y = keras_name_to_tf_name_block(x, keras_block=bk, tf_block=bt)+    match_lst.append(y)++  assert len(match_lst) > 0++  for x in tf_weights:+    if tf_block in x[0] and x[0].split('/')[1].endswith(tf_block):+      match_lst.remove(x[0]+':0')+  assert len(match_lst) == 0 

same here.

yixingfu

comment created time in 16 days

Pull request review comment tensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]+  else:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' not in x[0]]++  # remove util variables used for RMSprop+  tf2_listvar = [x for x in tf2_listvar if 'RMS' not in x[0]]++  tf2_listvar = [x[0] for x in tf2_listvar]+  return tf2_listvar+++def get_tf_blocks(tf_weights):+  """Extract the block names from list of full weight names"""+  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])+  tf_blocks = sorted(tf_blocks, key=lambda x:int(x.split('_')[1]))+  return tf_blocks+++def get_keras_blocks(keras_weights):+  """Extract the block names from list of full weight names"""+  return sorted(set([x.split('_')[0] for x in keras_weights if 'block' in x]))+++def keras_name_to_tf_name_stem_top(keras_name, use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that is in stem or top (head)+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    KeyError if we cannot parse the keras_name+  """+  if use_ema:+    ema = '/ExponentialMovingAverage'+  else:+    ema = ''++  stem_top_dict = {+      'probs/probs/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',

Please add a space before and after ":" so that the key and value are easier to tell apart.
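For example, the first entry would become (same value, only the whitespace around ':' changes):

      stem_top_dict = {
          'probs/probs/bias:0': f'{model_name_tf}/head/dense/bias{ema}:0',
      }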

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break

I see. If you are trying to find the v in v_all with the matching name, how about [v for v in v_all if v.name == tf_name]?
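A minimal sketch of what that could look like inside write_ckpt_to_h5 (matches is a hypothetical local name):

    matches = [v for v in v_all if v.name == tf_name]
    if not matches:
      raise ValueError(f'{keras_name} has no match in ckpt file')
    v = matches[0]
    v_val = sess.run(v)
    v_all.remove(v)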

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):

This function name doesn't properly describe the functionality.

How about get_variable_names_from_h5()?

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break

Why break here? Shouldn't we convert all the v from v_all?

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'

What items get filtered by this? Can you provide an example here?

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]+  else:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' not in x[0]]++  # remove util variables used for RMSprop+  tf2_listvar = [x for x in tf2_listvar if 'RMS' not in x[0]]++  tf2_listvar = [x[0] for x in tf2_listvar]+  return tf2_listvar+++def get_tf_blocks(tf_weights):+  """Extract the block names from list of full weight names"""+  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])+  tf_blocks = sorted(tf_blocks, key=lambda x:int(x.split('_')[1]))+  return tf_blocks+++def get_keras_blocks(keras_weights):+  """Extract the block names from list of full weight names"""+  return sorted(set([x.split('_')[0] for x in keras_weights if 'block' in x]))+++def keras_name_to_tf_name_stem_top(keras_name, use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that is in stem or top (head)+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    KeyError if we cannot parse the keras_name+  """+  if use_ema:+    ema = '/ExponentialMovingAverage'+  else:+    ema = ''++  stem_top_dict = {+      'probs/probs/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'probs/probs/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'predictions/predictions/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'predictions/predictions/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'stem_conv/stem_conv/kernel:0':f'{model_name_tf}/stem/conv2d/kernel{ema}:0',+      'top_conv/top_conv/kernel:0':f'{model_name_tf}/head/conv2d/kernel{ema}:0',+  }++  # stem batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'stem_bn/stem_bn/{bn_weights}:0'] = f'{model_name_tf}/stem/tpu_batch_normalization/{bn_weights}{ema}:0'+  # top / head batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'top_bn/top_bn/{bn_weights}:0'] = f'{model_name_tf}/head/tpu_batch_normalization/{bn_weights}{ema}:0'++  if keras_name in stem_top_dict:+    return stem_top_dict[keras_name]+  else:+    raise KeyError(f'{keras_name} from h5 file cannot be parsed')+++def keras_name_to_tf_name_block(keras_name, keras_block='block1a', tf_block='blocks_0', use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that belongs to a block+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    keras_block: str, the block name for keras implementation (e.g. 
'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    ValueError if keras_block does not show up in keras_name+  """++  if f'{keras_block}' not in keras_name:+    raise ValueError(f'block name {keras_block} not found in {keras_name}')++  # all blocks in the first group will not have expand conv and bn+  is_first_blocks = (keras_block[5]=='1')++  tf_name = [model_name_tf, tf_block]++  # depthwide conv+  if 'dwconv' in keras_name:+    tf_name.append('depthwise_conv2d')+    tf_name.append('depthwise_kernel')++  # conv layers+  if is_first_blocks:+    # first blocks only have one conv2d+    if 'project_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+  else:+    if 'project_conv' in keras_name:+      tf_name.append('conv2d_1')+      tf_name.append('kernel')+    elif 'expand_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+      +  # squeeze expansion layers +  if '_se_' in keras_name:+    if 'reduce' in keras_name:+      tf_name.append('se/conv2d')+    elif 'expand' in keras_name:+      tf_name.append('se/conv2d_1')++    if 'kernel' in keras_name:+      tf_name.append('kernel')+    elif 'bias' in keras_name:+      tf_name.append('bias')++  # batch normalization layers +  if 'bn' in keras_name:+    if is_first_blocks:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_1')+      else:+        tf_name.append('tpu_batch_normalization')+    else:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_2')+      elif 'expand' in keras_name:+        tf_name.append('tpu_batch_normalization')+      else:+        tf_name.append('tpu_batch_normalization_1')++    for x in ['moving_mean', 'moving_variance', 'beta', 'gamma']:+      if x in keras_name:+        tf_name.append(x)+  if use_ema:+    tf_name.append('ExponentialMovingAverage')+  return '/'.join(tf_name) + ':0'+++def check_match(keras_block, tf_block, keras_weights, tf_weights):+  """ Check if the weights in h5 and ckpt match+  +  we match each name from keras_weights that is in keras_block +  and check if there is 1-1 correspondence to names from tf_weights+  that is in tf_block+  +  Args:+    keras_block: str, the block name for keras implementation (e.g. 'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    keras_weights: list of str, each string is a name for weights in keras implementation+    tf_weights: list of str, each string is a name for weights in tf implementation+  """+  for x in keras_weights:+    if keras_block in x:+      y = keras_name_to_tf_name_block(x, keras_block=bk, tf_block=bt)+    match_lst.append(y)++  assert len(match_lst) > 0

Please provide a proper error message for the assert; otherwise it will be very hard to debug when this assertion fails.
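For example, something like this (the message text is just a suggestion):

  assert len(match_lst) > 0, (
      f'no weights found for keras block {keras_block} (tf block {tf_block}); '
      f'check that the block names are parsed correctly')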

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst

Just curious: can we get the same values from model.weights, which should also return a list of variable tensors with proper names?

Having to save the model and parse the h5 file seems a bit complicated.
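A rough sketch of that alternative, assuming the names from keras_model.weights line up with the h5 names used elsewhere in this script (keras_name and v_val as in write_ckpt_to_h5):

  # collect the variable names directly from the model
  keras_weight_names = [w.name for w in keras_model.weights]

  # and write values back without going through the h5 file, e.g.
  for w in keras_model.weights:
    if w.name == keras_name:
      w.assign(v_val)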

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):+  """Get list of tensor names from checkpoint"""++  tf2_listvar = tf.train.list_variables(path_ckpt)++  if use_ema:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' in x[0]]+  else:+    tf2_listvar = [x for x in tf2_listvar if 'ExponentialMovingAverage' not in x[0]]++  # remove util variables used for RMSprop+  tf2_listvar = [x for x in tf2_listvar if 'RMS' not in x[0]]++  tf2_listvar = [x[0] for x in tf2_listvar]+  return tf2_listvar+++def get_tf_blocks(tf_weights):+  """Extract the block names from list of full weight names"""+  tf_blocks = set([x.split('/')[1] for x in tf_weights if 'block' in x])+  tf_blocks = sorted(tf_blocks, key=lambda x:int(x.split('_')[1]))+  return tf_blocks+++def get_keras_blocks(keras_weights):+  """Extract the block names from list of full weight names"""+  return sorted(set([x.split('_')[0] for x in keras_weights if 'block' in x]))+++def keras_name_to_tf_name_stem_top(keras_name, use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that is in stem or top (head)+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    KeyError if we cannot parse the keras_name+  """+  if use_ema:+    ema = '/ExponentialMovingAverage'+  else:+    ema = ''++  stem_top_dict = {+      'probs/probs/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'probs/probs/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'predictions/predictions/bias:0':f'{model_name_tf}/head/dense/bias{ema}:0',+      'predictions/predictions/kernel:0':f'{model_name_tf}/head/dense/kernel{ema}:0',+      'stem_conv/stem_conv/kernel:0':f'{model_name_tf}/stem/conv2d/kernel{ema}:0',+      'top_conv/top_conv/kernel:0':f'{model_name_tf}/head/conv2d/kernel{ema}:0',+  }++  # stem batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'stem_bn/stem_bn/{bn_weights}:0'] = f'{model_name_tf}/stem/tpu_batch_normalization/{bn_weights}{ema}:0'+  # top / head batch normalization+  for bn_weights in ['beta', 'gamma', 'moving_mean', 'moving_variance']:+    stem_top_dict[f'top_bn/top_bn/{bn_weights}:0'] = f'{model_name_tf}/head/tpu_batch_normalization/{bn_weights}{ema}:0'++  if keras_name in stem_top_dict:+    return stem_top_dict[keras_name]+  else:+    raise KeyError(f'{keras_name} from h5 file cannot be parsed')+++def keras_name_to_tf_name_block(keras_name, keras_block='block1a', tf_block='blocks_0', use_ema=True, model_name_tf='efficientnet-b0'):+  """ map name in h5 to ckpt that belongs to a block+  +  we map name keras_name that points to a weight in h5 file +  to a name of weight in ckpt file. +  +  Args:+    keras_name: str, the name of weight in the h5 file of keras implementation+    keras_block: str, the block name for keras implementation (e.g. 
'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    use_ema: Bool, use the ExponentialMovingAverage resuolt in ckpt or not +    model_name_tf: str, the name of model in ckpt.++  Returns:+    String for the name of weight as in ckpt file.++  Raises:+    ValueError if keras_block does not show up in keras_name+  """++  if f'{keras_block}' not in keras_name:+    raise ValueError(f'block name {keras_block} not found in {keras_name}')++  # all blocks in the first group will not have expand conv and bn+  is_first_blocks = (keras_block[5]=='1')++  tf_name = [model_name_tf, tf_block]++  # depthwide conv+  if 'dwconv' in keras_name:+    tf_name.append('depthwise_conv2d')+    tf_name.append('depthwise_kernel')++  # conv layers+  if is_first_blocks:+    # first blocks only have one conv2d+    if 'project_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+  else:+    if 'project_conv' in keras_name:+      tf_name.append('conv2d_1')+      tf_name.append('kernel')+    elif 'expand_conv' in keras_name:+      tf_name.append('conv2d')+      tf_name.append('kernel')+      +  # squeeze expansion layers +  if '_se_' in keras_name:+    if 'reduce' in keras_name:+      tf_name.append('se/conv2d')+    elif 'expand' in keras_name:+      tf_name.append('se/conv2d_1')++    if 'kernel' in keras_name:+      tf_name.append('kernel')+    elif 'bias' in keras_name:+      tf_name.append('bias')++  # batch normalization layers +  if 'bn' in keras_name:+    if is_first_blocks:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_1')+      else:+        tf_name.append('tpu_batch_normalization')+    else:+      if 'project' in keras_name:+        tf_name.append('tpu_batch_normalization_2')+      elif 'expand' in keras_name:+        tf_name.append('tpu_batch_normalization')+      else:+        tf_name.append('tpu_batch_normalization_1')++    for x in ['moving_mean', 'moving_variance', 'beta', 'gamma']:+      if x in keras_name:+        tf_name.append(x)+  if use_ema:+    tf_name.append('ExponentialMovingAverage')+  return '/'.join(tf_name) + ':0'+++def check_match(keras_block, tf_block, keras_weights, tf_weights):+  """ Check if the weights in h5 and ckpt match+  +  we match each name from keras_weights that is in keras_block +  and check if there is 1-1 correspondence to names from tf_weights+  that is in tf_block+  +  Args:+    keras_block: str, the block name for keras implementation (e.g. 'block1a')+    tf_block: str, the block name for tf implementation (e.g. 'blocks_0')+    keras_weights: list of str, each string is a name for weights in keras implementation+    tf_weights: list of str, each string is a name for weights in tf implementation+  """+  for x in keras_weights:+    if keras_block in x:+      y = keras_name_to_tf_name_block(x, keras_block=bk, tf_block=bt)+    match_lst.append(y)++  assert len(match_lst) > 0++  for x in tf_weights:+    if tf_block in x[0] and x[0].split('/')[1].endswith(tf_block):+      match_lst.remove(x[0]+':0')+  assert len(match_lst) == 0 +++if __name__ == '__main__':+  parser = argparse.ArgumentParser(description="Load Models ")+  parser.add_argument("--model", required=True, type=str, help="name of efficient model. e.g. b2 or b5notop")+  parser.add_argument("--ckpt", required=True, type=str, help="checkpoint path")+  parser.add_argument("--o", required=True, type=str, help="output (h5) file path")+  args = parser.parse_args()++  include_top = True+  if args.model.endswith('notop'):

This seems quite magical; how about making notop a separate arg?
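For example, with a separate flag (the flag name here is just a suggestion):

  parser.add_argument("--notop", action="store_true",
                      help="exclude the top (fully-connected) layer from the model")
  args = parser.parse_args()

  include_top = not args.notop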

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')+  return model_name_tf+++def get_h5_names(path_h5):+  """Get list of variable names from the h5 file """+  h5_namelst = []+  def append_to_lst(x):+    
h5_namelst.append(x)++  with h5py.File(path_h5, 'r') as f:+    for x in f.keys():+      f[x].visit(append_to_lst)++  # all weights end with ':0'+  h5_namelst = [x for x in h5_namelst if ':' in x]++  # append group name to the front+  h5_namelst = ['/'.join([x.split('/')[0], x]) for x in h5_namelst]+    +  return h5_namelst+++def get_tf_names(path_ckpt, use_ema=True):

Same here, please rename this function to get_variable_names_from_tf_checkpoints

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *++def write_ckpt_to_h5(path_h5, path_ckpt, keras_model, use_ema=True):+  """ Map the weights in checkpoint file (tf) to h5 file (keras)+  +  Args:+    path_h5: str, path to output hdf5 file to write weights loaded+      from ckpt files.+    path_ckpt: str, path to the ckpt files (e.g. 'efficientnet-b0/model.ckpt')+      that records efficientnet weights from original repo +      https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet+    keras_model: keras model, built from keras.applications efficientnet+      functions (e.g. EfficientNetB0)+    use_ema: Bool, whether to use ExponentialMovingAverage result or not+  """+  model_name_keras = keras_model.name+  model_name_tf = model_name_keras_to_tf(model_name_keras)+  keras_model.save_weights(path_h5)++  keras_weights = get_h5_names(path_h5)+  tf_weights = get_tf_names(path_ckpt)+  blocks_keras = get_keras_blocks(keras_weights)+  blocks_tf = get_tf_blocks(tf_weights)++  with tf.compat.v1.Session() as sess:+    saver = tf.compat.v1.train.import_meta_graph(f'{path_ckpt}.meta')+    saver.restore(sess, path_ckpt)+    graph = tf.compat.v1.get_default_graph()+    v_all = tf.compat.v1.global_variables()++    for keras_block, tf_block in zip(blocks_keras, blocks_tf):+      print(f'working on block {keras_block}, {tf_block}')+      for keras_name in keras_weights:+        if keras_block in keras_name:+          tf_name = keras_name_to_tf_name_block(keras_name, keras_block=keras_block, tf_block=tf_block, use_ema=use_ema, model_name_tf=model_name_tf)+          for v in v_all:+            if v.name == tf_name:+              v_val = sess.run(v)+              with h5py.File(path_h5, 'a') as f:+                v_prev = f[keras_name][()]+                f[keras_name].write_direct(v_val)+              print(f'writing from: {tf_name}\n  to: {keras_name}')+              print(f'  average change: {abs(v_prev - v_val).mean()}')+              v_all.remove(v)+              break+          else:+            raise ValueError(f'{keras_name} has no match in ckpt file')++    for keras_name in keras_weights:+      if any([x in keras_name for x in ['stem', 'top', 'predictions', 'probs']]):+        tf_name = keras_name_to_tf_name_stem_top(keras_name, use_ema=use_ema, model_name_tf=model_name_tf)+        for v in v_all:+          if v.name == tf_name:+            v_val = sess.run(v)+            with h5py.File(path_h5, 'a') as f:+              v_prev = f[keras_name][()]+              try:+                f[keras_name].write_direct(v_val)+              except:+                raise ValueError(f'weight in {tf_name} does not ift into {keras_name}')+            print(f'writing from: {tf_name}\n  to: {keras_name}')+            print(f'  average change: {abs(v_prev - v_val).mean()}')+            v_all.remove(v)+            break+++def model_name_keras_to_tf(model_name_keras):+  """Infer model name in both keras and tf implementations"""+  model_name_tf = model_name_keras.replace('efficientnet', 'efficientnet-')

This is basically a one-line function, which is discouraged. Let's remove this function and move the body to its caller.
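i.e. just inline it at the call site, something like:

  model_name_tf = keras_model.name.replace('efficientnet', 'efficientnet-')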

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.

Please add a sample usage for this script, so that people can easily copy-paste it to get started.
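For example, something along these lines in the docstring (the script file name below is a placeholder; the flags match the argparse definitions in this PR):

  Usage:
    # script name is a placeholder
    python efficientnet_weight_update_util.py --model b0 \
        --ckpt efficientnet-b0/model.ckpt --o efficientnetb0.h5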

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo+(https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet)+to h5 file for keras implementation of the models.+"""++import tensorflow as tf+import tensorflow.keras as keras+import h5py+import numpy as np+import argparse+from tensorflow.keras.applications.efficientnet import *

Importing * is generally bad practice; can we explicitly list out all the needed functions/classes?

Or you can do from tensorflow.keras.applications import efficientnet.
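i.e. either of these (the first assumes the EfficientNetB* symbols are exported from that module):

from tensorflow.keras.applications.efficientnet import EfficientNetB0, EfficientNetB7

# or simply
from tensorflow.keras.applications import efficientnet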

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

add script for updating keras application's efficientnet weights from ckpt

+"""Utils for EfficientNet models for Keras.+Write weights from  ckpt file as in original repo

Please add the standard OSS license header to this file.

yixingfu

comment created time in 16 days

Pull request review commenttensorflow/tensorflow

Add show_dtype support for plot_model, update related tests.

(excerpt of the diff under review)

 def model_to_dot(model,
       label = '{}: {}'.format(layer_name, class_name)
     else:
       label = class_name
+
+    # Rebuild the label as a table including the layer's dtype.
+    if show_dtype:
+      def format_dtype(dtype):

It seems that this function is not used here; should the label below be:

label = '%s|%s' % (label, format_dtype(dtype))
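i.e. a rough sketch of how it could be wired up (assuming layer is the layer being processed in this loop of model_to_dot):

    if show_dtype:
      def format_dtype(dtype):
        # fall back to a placeholder when the layer has no dtype set
        if dtype is None:
          return '?'
        return str(dtype)

      label = '%s|%s' % (label, format_dtype(layer.dtype))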
jonah-kohn

comment created time in 16 days

issue commenttensorflow/tensorflow

Request to have ConvLSTM2D for TFLite

Note that the new fixes will be included in nightly tomorrow.

beniroquai

comment created time in 17 days

pull request commenttensorflow/tensorflow

correct summing total blocks in EfficientNet

@yixingfu discovered that the keras-applications weights are exactly the same as the ones from the checkpoint in https://github.com/tensorflow/tpu/tree/30b0889ee1933c3dbe90f1c46461a1a89370a5ba/models/official/efficientnet, which means the weights don't need to be updated in this PR. Approving this PR.

yixingfu

comment created time in 17 days

issue closedtensorflow/tensorflow

Cannot save RNN-based model as a saved model format

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04.3 LTS
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): v2.2.0-rc4-8-g2b96f3662b (GIT_VERSION), 2.2.0 (VERSION)
  • Python version: 3.7.5
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Titan RTX, ~25GB

Describe the current behavior

I wrote a simple script that builds an RNN using GRUCells and saves it with a signature function.

import tensorflow as tf

gru_encoder = tf.keras.layers.RNN([tf.keras.layers.GRUCell(200) for _ in range(4)], return_sequences=True)


gru_encoder(tf.keras.Input((32, 200)))


@tf.function(
    input_signature=[tf.TensorSpec(shape=[None, None, None], dtype=tf.float32)]  # batch, sequence length, hidden size
)
def _signature_fn(input_embedding):
    return gru_encoder(input_embedding)


tf.saved_model.save(gru_encoder, "./test-model/1", signatures=_signature_fn)

And the result of the above script is

$ python test.py
2020-06-02 18:42:20.388075: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-02 18:42:20.525918: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:19:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:20.527134: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:1a:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:20.528153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:67:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:20.529164: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:68:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:20.536014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-02 18:42:20.537691: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-02 18:42:20.539509: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-02 18:42:20.539851: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-02 18:42:20.541872: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-02 18:42:20.542990: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-02 18:42:20.547193: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-02 18:42:20.556527: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3
2020-06-02 18:42:20.556829: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-06-02 18:42:20.590996: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3500000000 Hz
2020-06-02 18:42:20.593061: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f254c000b20 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-02 18:42:20.593123: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-06-02 18:42:21.337111: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f0e31589f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-02 18:42:21.337154: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): TITAN RTX, Compute Capability 7.5
2020-06-02 18:42:21.337163: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (1): TITAN RTX, Compute Capability 7.5
2020-06-02 18:42:21.337171: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (2): TITAN RTX, Compute Capability 7.5
2020-06-02 18:42:21.337179: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (3): TITAN RTX, Compute Capability 7.5
2020-06-02 18:42:21.339651: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:19:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:21.341408: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 1 with properties: 
pciBusID: 0000:1a:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:21.343151: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 2 with properties: 
pciBusID: 0000:67:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:21.344891: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 3 with properties: 
pciBusID: 0000:68:00.0 name: TITAN RTX computeCapability: 7.5
coreClock: 1.77GHz coreCount: 72 deviceMemorySize: 23.65GiB deviceMemoryBandwidth: 625.94GiB/s
2020-06-02 18:42:21.344938: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-02 18:42:21.344955: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-06-02 18:42:21.344969: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-06-02 18:42:21.344984: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-06-02 18:42:21.345001: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-06-02 18:42:21.345016: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-06-02 18:42:21.345031: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-06-02 18:42:21.356549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0, 1, 2, 3
2020-06-02 18:42:21.356628: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-06-02 18:42:21.367494: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-02 18:42:21.367544: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 1 2 3 
2020-06-02 18:42:21.367553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N N N N 
2020-06-02 18:42:21.367559: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 1:   N N N N 
2020-06-02 18:42:21.367565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 2:   N N N N 
2020-06-02 18:42:21.367570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 3:   N N N N 
2020-06-02 18:42:21.374361: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22604 MB memory) -> physical GPU (device: 0, name: TITAN RTX, pci bus id: 0000:19:00.0, compute capability: 7.5)
2020-06-02 18:42:21.376397: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 22604 MB memory) -> physical GPU (device: 1, name: TITAN RTX, pci bus id: 0000:1a:00.0, compute capability: 7.5)
2020-06-02 18:42:21.377956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 22604 MB memory) -> physical GPU (device: 2, name: TITAN RTX, pci bus id: 0000:67:00.0, compute capability: 7.5)
2020-06-02 18:42:21.379349: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 22581 MB memory) -> physical GPU (device: 3, name: TITAN RTX, pci bus id: 0000:68:00.0, compute capability: 7.5)
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    tf.saved_model.save(gru_encoder, "./test-model/1", signatures=_signature_fn)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/saved_model/save.py", line 951, in save
    obj, export_dir, signatures, options, meta_graph_def)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/saved_model/save.py", line 1012, in _build_meta_graph
    signature_serialization.validate_saveable_view(checkpoint_graph_view)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/saved_model/signature_serialization.py", line 268, in validate_saveable_view
    saveable_view.root):
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/saved_model/save.py", line 108, in list_dependencies
    extra_dependencies = self.list_extra_dependencies(obj)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/saved_model/save.py", line 137, in list_extra_dependencies
    self._serialization_cache)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2746, in _list_extra_dependencies_for_serialization
    .list_extra_dependencies_for_serialization(serialization_cache))
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/base_serialization.py", line 74, in list_extra_dependencies_for_serialization
    return self.objects_to_serialize(serialization_cache)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py", line 73, in objects_to_serialize
    serialization_cache).objects_to_serialize)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/layer_serialization.py", line 94, in _get_serialized_attributes
    serialized_attr.set_and_validate_objects(object_dict)
  File "{PROJECT_DIRECTORY}/env/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/serialized_attributes.py", line 212, in set_and_validate_objects
    ' {})'.format(object_dict[key], key))
ValueError: Object dictionary contained a non-trackable object: (None, None, None, None) (for key states)

Describe the expected behavior

I think this script should run without any exception; it actually works fine with tensorflow==2.1.0.

closed time in 17 days

jeongukjae

issue commenttensorflow/tensorflow

Cannot save RNN-based model as a saved model format

Should be fixed by https://github.com/tensorflow/tensorflow/commit/47582983cb1064b5bb81233db4f0adeeaa10b74d.

jeongukjae

comment created time in 17 days

issue commenttensorflow/tensorflow

Cannot save RNN-based model as a saved model format

Sorry for the late reply. This issue has the same root cause as https://github.com/tensorflow/tensorflow/issues/40328, and we are sending a fix for it. It should be fixed in the 2.3 release.

jeongukjae

comment created time in 17 days

Pull request review commenttensorflow/community

# RFC: Multihead Attention and EinsumDense on Keras

+# RFC: Multihead Attention and EinsumDense on Keras++| Status        | (Proposed / Accepted / Implemented / Obsolete)          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [260](https://github.com/tensorflow/community/pull/260) |+| **Author(s)** | Hongkun Yu (hongkuny@google.com), Mark Omernick (momernick@google.com)    |+| **Sponsor**   | Francois Chollet (fchollet@google.com)                  |+| **Updated**   | 2020-06-16                                              |++## Objective++Introduce the MultiHeadAttention layer and EinsumDense layer to tf.keras.++## Motivation++MultiHeadAttention is very popular and has become standard for deep learning+libraries. We propose to contribute a flexible well-defined implementation+inside Keras absorbing common best practices from reference libraries.++## User Benefit++We can standardize the implementation of Transformer layers and use the best+practice. We offer a rich set of functionalities to different use cases, e.g.+different project spaces, outputing multi-head attention scores for analysis,+etc. We also modularize computations to make the MultiHeadAttention layer+extensible to variants.++## Design Proposal++### Key Features++*   Returns multi-headed attention scores, which is commonly useful for+    attention visualization and analysis.+*   Supports query (Q), key (K), value (V) tensors as individual inputs and+    supports projecting Q, K, V to different dimensions.+*   Final outputs projects to user specified dimensions.+*   Using tf.einsum to express high-dimensional computation and adopts+    [tf.keras.layers.experimental.EinsumDense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/EinsumDense)+    layer.+*   Supports high-dimension attention when target and source are 2D, 3D, etc.++### Code Examples++*   How to write a TransformerBlock for an encoder.++```python+class TransformerBlock(tf.keras.layers.Layer):+  def __init__(self, embed_dim, num_heads, ff_dim):+    super(TransformerBlock, self).__init__()+    self.att = attention.MultiHeadAttention(embed_dim, num_heads)+    self.ffn = tf.keras.Sequential(+        [tf.keras.layers.Dense(ff_dim, activation="relu"),+         tf.keras.layers.Dense(embed_dim),]+    )+    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)+    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)++  def call(self, inputs, attention_mask=None):+    attn_output = self.att([inputs, inputs], attention_mask=attention_mask)+    out1 = self.layernorm1(inputs + attn_output)+    ffn_output = self.ffn(out1)+    return self.layernorm2(out1 + ffn_output)+```++*   Use attention mask to avoid performing attention on padding token indices.++```python+test_layer = TransformerBlock(+    embed_dim=2,+    num_heads=2,+    ff_dim=4)+query = np.array([[[0.1, 0.2], [0.0, 0.0]]])+mask = np.array([[[1, 0], [1, 0]]], dtype='bool')+output = test_layer(query, mask)+```++*   Inside a Transformer decoder, we often want to output the cross-attention+    scores to analyze how the target sequence attend to the source sequence. We+    are able to visualize the alignment according to attention scores.++```python+test_layer = MultiHeadAttention(+    num_heads=2, key_size=2, return_attention_scores=True)+target = np.array([[[0.1, 0.2], [0.0, 0.0]]])+source = np.array([[[0.1, 0.2], [3.0, 1.0]]])+output, scores = test_layer([target, source])+scores = tf.math.reduce_sum(scores, axis=1) # shape = (1, 2, 2)+```++*   Attention beyound sequences. 
Taking 2D, 3D target and source.++```python+query_shape = [2, 3, 4, 4]  # batch, target, target, embedding.+value_shape = [2, 3, 2, 4]  # batch, source, source, embedding.+mask_shape = [2, 3, 4, 3, 2]+query = 10 * np.random.random_sample(query_shape)+value = 10 * np.random.random_sample(value_shape)+mask_data = np.random.randint(2, size=mask_shape).astype("bool")+output = test_layer([query, value], mask_data)+```++### Interface++```python+class MultiHeadAttention(tf.keras.layers.Layer):+  """MultiHeadAttention layer.++  This is an implementation of multi-headed attention based on "Attention+  is all you Need". If `query`, `key,` `value` are the same, then+  this is self-attention. Each timestep in `query` attends to the+  corresponding sequence in `key`, and returns a fixed-width vector.++  This layer first projects `query`, `key` and `value`. These are+  (effectively) a list of tensors of length `num_attention_heads`, where the+  corresponding shapes are [batch_size, <query dimensions>, key_size],+  [batch_size, <key/value dimensions>, key_size],+  [batch_size, <key/value dimensions>, value_size].++  Then, the query and key tensors are dot-producted and scaled. These are+  softmaxed to obtain attention probabilities. The value tensors are then+  interpolated by these probabilities, then concatenated back to a single+  tensor.++  Finally, the result tensor with the last dimension as value_size can take an+  linear projection and return.++  Examples:++  Performs 1D cross-attention over two sequence inputs with an attention mask.+  Returns the additional attention weights over heads.++  >>> layer = MultiHeadAttention(num_heads=2, key_size=2,+  ...                            return_attention_scores=True)+  >>> target = tf.keras.Input(shape=[8, 16])+  >>> source = tf.keras.Input(shape=[4, 16])+  >>> mask_tensor = tf.keras.Input(shape=[8, 4])+  >>> output_tensor, weights = layer([target, source])

I think the first input tensor was specially handled before Tomer's change, which means you had to use a list of tensors if there were multiple inputs. Now you can pass individual tensors as kwargs. I think the kwargs will be more readable, and this could serve as an example for users when there are multiple inputs. @fchollet and @tomerk, WDYT?

saberkun

comment created time in 17 days

Pull request review commenttensorflow/community

# RFC: Multihead Attention and EinsumDense on Keras

+# RFC: Multihead Attention and EinsumDense on Keras++| Status        | (Proposed / Accepted / Implemented / Obsolete)          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [260](https://github.com/tensorflow/community/pull/260) |+| **Author(s)** | Hongkun Yu (hongkuny@google.com), Mark Omernick (momernick@google.com)    |+| **Sponsor**   | Francois Chollet (fchollet@google.com)                  |+| **Updated**   | 2020-06-16                                              |++## Objective++Introduce the MultiHeadAttention layer and EinsumDense layer to tf.keras.++## Motivation++MultiHeadAttention is very popular and has become standard for deep learning+libraries. We propose to contribute a flexible well-defined implementation+inside Keras absorbing common best practices from reference libraries.++## User Benefit++We can standardize the implementation of Transformer layers and use the best+practice. We offer a rich set of functionalities to different use cases, e.g.+different project spaces, outputing multi-head attention scores for analysis,+etc. We also modularize computations to make the MultiHeadAttention layer+extensible to variants.++## Design Proposal++### Key Features++*   Returns multi-headed attention scores, which is commonly useful for+    attention visualization and analysis.+*   Supports query (Q), key (K), value (V) tensors as individual inputs and+    supports projecting Q, K, V to different dimensions.+*   Final outputs projects to user specified dimensions.+*   Using tf.einsum to express high-dimensional computation and adopts+    [tf.keras.layers.experimental.EinsumDense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/EinsumDense)+    layer.+*   Supports high-dimension attention when target and source are 2D, 3D, etc.++### Code Examples++*   How to write a TransformerBlock for an encoder.++```python+class TransformerBlock(tf.keras.layers.Layer):+  def __init__(self, embed_dim, num_heads, ff_dim):+    super(TransformerBlock, self).__init__()+    self.att = attention.MultiHeadAttention(embed_dim, num_heads)+    self.ffn = tf.keras.Sequential(+        [tf.keras.layers.Dense(ff_dim, activation="relu"),+         tf.keras.layers.Dense(embed_dim),]+    )+    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)+    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)++  def call(self, inputs, attention_mask=None):+    attn_output = self.att([inputs, inputs], attention_mask=attention_mask)+    out1 = self.layernorm1(inputs + attn_output)+    ffn_output = self.ffn(out1)+    return self.layernorm2(out1 + ffn_output)+```++*   Use attention mask to avoid performing attention on padding token indices.++```python+test_layer = TransformerBlock(+    embed_dim=2,+    num_heads=2,+    ff_dim=4)+query = np.array([[[0.1, 0.2], [0.0, 0.0]]])+mask = np.array([[[1, 0], [1, 0]]], dtype='bool')+output = test_layer(query, mask)+```++*   Inside a Transformer decoder, we often want to output the cross-attention+    scores to analyze how the target sequence attend to the source sequence. We+    are able to visualize the alignment according to attention scores.++```python+test_layer = MultiHeadAttention(+    num_heads=2, key_size=2, return_attention_scores=True)+target = np.array([[[0.1, 0.2], [0.0, 0.0]]])+source = np.array([[[0.1, 0.2], [3.0, 1.0]]])+output, scores = test_layer([target, source])+scores = tf.math.reduce_sum(scores, axis=1) # shape = (1, 2, 2)+```++*   Attention beyound sequences. 
Taking 2D, 3D target and source.++```python+query_shape = [2, 3, 4, 4]  # batch, target, target, embedding.+value_shape = [2, 3, 2, 4]  # batch, source, source, embedding.+mask_shape = [2, 3, 4, 3, 2]+query = 10 * np.random.random_sample(query_shape)+value = 10 * np.random.random_sample(value_shape)+mask_data = np.random.randint(2, size=mask_shape).astype("bool")+output = test_layer([query, value], mask_data)+```++### Interface++```python+class MultiHeadAttention(tf.keras.layers.Layer):+  """MultiHeadAttention layer.++  This is an implementation of multi-headed attention based on "Attention+  is all you Need". If `query`, `key,` `value` are the same, then+  this is self-attention. Each timestep in `query` attends to the+  corresponding sequence in `key`, and returns a fixed-width vector.++  This layer first projects `query`, `key` and `value`. These are+  (effectively) a list of tensors of length `num_attention_heads`, where the+  corresponding shapes are [batch_size, <query dimensions>, key_size],+  [batch_size, <key/value dimensions>, key_size],+  [batch_size, <key/value dimensions>, value_size].++  Then, the query and key tensors are dot-producted and scaled. These are+  softmaxed to obtain attention probabilities. The value tensors are then+  interpolated by these probabilities, then concatenated back to a single+  tensor.++  Finally, the result tensor with the last dimension as value_size can take an+  linear projection and return.++  Examples:++  Performs 1D cross-attention over two sequence inputs with an attention mask.+  Returns the additional attention weights over heads.++  >>> layer = MultiHeadAttention(num_heads=2, key_size=2,+  ...                            return_attention_scores=True)+  >>> target = tf.keras.Input(shape=[8, 16])+  >>> source = tf.keras.Input(shape=[4, 16])+  >>> mask_tensor = tf.keras.Input(shape=[8, 4])+  >>> output_tensor, weights = layer([target, source])

@tomerk, since we started the effort to treat all tensor inputs the same and not specialize the first call arg, should we encourage users not to pass in a list of tensors, and instead accept individual kwargs for tensor inputs, like call(query, key, value=None, mask=None)?

I think the explicit kwargs will be more readable, since a plain list doesn't carry any information about the individual tensors.
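
To make the comparison concrete, here is a minimal, hypothetical sketch of the keyword-argument call style; this toy layer is illustrative only and is not the proposed MultiHeadAttention implementation.

import tensorflow as tf

# Toy attention layer that exposes its tensor inputs as named call arguments
# instead of a packed list; shapes and logic are deliberately simplified.
class ToyAttention(tf.keras.layers.Layer):
    def call(self, query, value, key=None, attention_mask=None):
        key = value if key is None else key
        scores = tf.matmul(query, key, transpose_b=True)
        if attention_mask is not None:
            scores += (1.0 - tf.cast(attention_mask, scores.dtype)) * -1e9
        return tf.matmul(tf.nn.softmax(scores), value)

layer = ToyAttention()
query = tf.random.normal([2, 8, 16])
value = tf.random.normal([2, 4, 16])
# Each tensor's role is visible at the call site, unlike layer([query, value]).
output = layer(query, value)  # shape (2, 8, 16)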

saberkun

comment created time in 18 days

Pull request review commenttensorflow/community

# RFC: Multihead Attention and EinsumDense on Keras

+# RFC: Multihead Attention and EinsumDense on Keras++| Status        | (Proposed / Accepted / Implemented / Obsolete)          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [260](https://github.com/tensorflow/community/pull/260) |+| **Author(s)** | Hongkun Yu (hongkuny@google.com), Mark Omernick (momernick@google.com)    |+| **Sponsor**   | Francois Chollet (fchollet@google.com)                  |+| **Updated**   | 2020-06-16                                              |++## Objective++Introduce the MultiHeadAttention layer and EinsumDense layer to tf.keras.++## Motivation++MultiHeadAttention is very popular and has become standard for deep learning+libraries. We propose to contribute a flexible well-defined implementation+inside Keras absorbing common best practices from reference libraries.++## User Benefit++We can standardize the implementation of Transformer layers and use the best+practice. We offer a rich set of functionalities to different use cases, e.g.+different project spaces, outputing multi-head attention scores for analysis,+etc. We also modularize computations to make the MultiHeadAttention layer+extensible to variants.++## Design Proposal++### Key Features++*   Returns multi-headed attention scores, which is commonly useful for+    attention visualization and analysis.+*   Supports query (Q), key (K), value (V) tensors as individual inputs and+    supports projecting Q, K, V to different dimensions.+*   Final outputs projects to user specified dimensions.+*   Using tf.einsum to express high-dimensional computation and adopts+    [tf.keras.layers.experimental.EinsumDense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/EinsumDense)+    layer.+*   Supports high-dimension attention when target and source are 2D, 3D, etc.++### Code Examples++*   How to write a TransformerBlock for an encoder.++```python+class TransformerBlock(tf.keras.layers.Layer):+  def __init__(self, embed_dim, num_heads, ff_dim):+    super(TransformerBlock, self).__init__()+    self.att = attention.MultiHeadAttention(embed_dim, num_heads)+    self.ffn = tf.keras.Sequential(+        [tf.keras.layers.Dense(ff_dim, activation="relu"),+         tf.keras.layers.Dense(embed_dim),]+    )+    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)+    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)++  def call(self, inputs, attention_mask=None):+    attn_output = self.att([inputs, inputs], attention_mask=attention_mask)+    out1 = self.layernorm1(inputs + attn_output)+    ffn_output = self.ffn(out1)+    return self.layernorm2(out1 + ffn_output)+```++*   Use attention mask to avoid performing attention on padding token indices.++```python+test_layer = TransformerBlock(+    embed_dim=2,+    num_heads=2,+    ff_dim=4)+query = np.array([[[0.1, 0.2], [0.0, 0.0]]])+mask = np.array([[[1, 0], [1, 0]]], dtype='bool')+output = test_layer(query, mask)+```++*   Inside a Transformer decoder, we often want to output the cross-attention+    scores to analyze how the target sequence attend to the source sequence. We+    are able to visualize the alignment according to attention scores.++```python+test_layer = MultiHeadAttention(+    num_heads=2, key_size=2, return_attention_scores=True)+target = np.array([[[0.1, 0.2], [0.0, 0.0]]])+source = np.array([[[0.1, 0.2], [3.0, 1.0]]])+output, scores = test_layer([target, source])+scores = tf.math.reduce_sum(scores, axis=1) # shape = (1, 2, 2)+```++*   Attention beyound sequences. 
Taking 2D, 3D target and source.++```python+query_shape = [2, 3, 4, 4]  # batch, target, target, embedding.+value_shape = [2, 3, 2, 4]  # batch, source, source, embedding.+mask_shape = [2, 3, 4, 3, 2]+query = 10 * np.random.random_sample(query_shape)+value = 10 * np.random.random_sample(value_shape)+mask_data = np.random.randint(2, size=mask_shape).astype("bool")+output = test_layer([query, value], mask_data)+```++### Interface++```python+class MultiHeadAttention(tf.keras.layers.Layer):+  """MultiHeadAttention layer.++  This is an implementation of multi-headed attention based on "Attention+  is all you Need". If `query`, `key,` `value` are the same, then+  this is self-attention. Each timestep in `query` attends to the+  corresponding sequence in `key`, and returns a fixed-width vector.++  This layer first projects `query`, `key` and `value`. These are+  (effectively) a list of tensors of length `num_attention_heads`, where the+  corresponding shapes are [batch_size, <query dimensions>, key_size],+  [batch_size, <key/value dimensions>, key_size],+  [batch_size, <key/value dimensions>, value_size].++  Then, the query and key tensors are dot-producted and scaled. These are+  softmaxed to obtain attention probabilities. The value tensors are then+  interpolated by these probabilities, then concatenated back to a single+  tensor.++  Finally, the result tensor with the last dimension as value_size can take an+  linear projection and return.++  Examples:++  Performs 1D cross-attention over two sequence inputs with an attention mask.+  Returns the additional attention weights over heads.++  >>> layer = MultiHeadAttention(num_heads=2, key_size=2,+  ...                            return_attention_scores=True)+  >>> target = tf.keras.Input(shape=[8, 16])+  >>> source = tf.keras.Input(shape=[4, 16])+  >>> mask_tensor = tf.keras.Input(shape=[8, 4])+  >>> output_tensor, weights = layer([target, source])+  >>> print(output_tensor.shape), print(weights.shape)+  (None, 8, 16)  (None, 2, 8, 4)++  Performs 2D self-attention over a 5D input tensor on axes 2 and 3.++  >>> layer = MultiHeadAttention(num_heads=2, key_size=2, attention_axes=(2, 3))+  >>> input_tensor = tf.keras.Input(shape=[5, 3, 4, 16])+  >>> output_tensor = layer([input_tensor, input_tensor])+  >>> print(output_tensor.shape)+  (None, 5, 3, 4, 16)++  Arguments:+    num_heads: Number of attention heads.+    key_size: Size of each attention head for query and key.+    value_size:  Size of each attention head for value.+    dropout: Dropout probability for a Dropout layer on attention_scores.+    use_bias: Boolean, whether the dense layers use bias vectors/matrices.+    output_shape: The expected shape of an output tensor, besides the batch and+      sequence dims. If not specified, projects back to the key feature dim.+    attention_axes: axes over which the attention is applied. 
`None` means+      attention over all axes, but batch, heads, and features.+    return_attention_scores: bool, if `True`, returns the multi-head+      attention scores as an additional output argument.+    kernel_initializer: Initializer for dense layer kernels.+    bias_initializer: Initializer for dense layer biases.+    kernel_regularizer: Regularizer for dense layer kernels.+    bias_regularizer: Regularizer for dense layer biases.+    activity_regularizer: Regularizer for dense layer activity.+    kernel_constraint: Constraint for dense layer kernels.+    bias_constraint: Constraint for dense layer kernels.+  """++  def call(self, inputs, attention_mask=None):+    """Implements the forward pass.++    Size glossary:+      * Number of heads (H): the number of attention heads.+      * Value size (V): the size of each value embedding per head.+      * Key size (K): the size of each key embedding per head. Equally, the size+          of each query embedding per head. Typically K <= V.+      * Batch dimensions (B).+      * Query (target) attention axes shape (T).+      * Value (source) attention axes shape (S), the rank must match the target.++    Args:+      inputs: List of the following tensors:+        * query: Query `Tensor` of shape `[B, T, dim]`.+        * value: Value `Tensor` of shape `[B, S, dim]`.+        * key: Optional key `Tensor` of shape `[B, S, dim]`. If not given, will+          use `value` for both `key` and `value`, which is the most common case.+      attention_mask: a boolean mask of shape `[B, T, S]`, that prevents+        attention to certain positions.++    Returns:+      attention_output: The result of the computation, of shape [B, T, E],+        where `T` is for target sequence shapes and `E` is the query input last+        dimension if `output_shape` is `None`. Otherwise, the multi-head outputs+        are project to the shape specified by `output_shape`.+      attention_scores: [Optional] multi-head attention coeffients over+        attention axes.+    """+```++### Auxiliary Layers and Changes++*   EinsumDense layer++We use `tf.einsum` to implement a dense layer can perform einsum calculations of+arbitrary dimensionality. This example shows how to instantiate a layer that+applies the same dense operation to every element in a sequence. Here, the+'output_shape' has two values (since there are two non-batch dimensions in the+output); the first dimension in the output_shape is `None`, because the sequence+dimension `b` has an unknown shape.++```python+layer = EinsumDense("abc,cd->abd", output_shape=(None, 64), bias_axes="d")+input_tensor = tf.keras.Input(shape=[32, 128])+output_tensor = layer(input_tensor) # output shape is (None, 32, 64)+```++*   Masked Softmax++Inside the attention computation, we need to mask logits before softmax and it+has become a common treatment in many applications. We propose to add an+optional `mask` argument to `tf.nn.softmax`. The downstream keras `Softmax`+layer will also take an optional `mask` tensor. This `mask` tensor should have+the same rank as the input tensor and mask elements on the axis which will+perform softmax.++Inside `MultiHeadAttention` keras layer, we will use the keras `Softmax` layer+with mask and adjust attention mask shape to match the inputs. 
The dimension+expension logic and multi-axes softmax will be handled locally in+`MultiHeadAttention` layer.++*   Keras Dense Attention++[tf.keras.layers.Attention](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention)+layer call method takes an optional argument, `mask`, which requires two+tensors, `q_mask` and `v_mask`. They are following keras framework requirements+with (batch_size, target_length) and (batch_size, source_length) as shapes. This+limits the flexibility of masking and `MultiHeadAttention` layer generalize the+attention mask to be (batch dims, target dims, source dims). To be consistent,+we would like to introduce an optional argument `attention_mask` for+`tf.keras.layers.Attention`. In the reduced case of `tf.keras.layers.Attention`,+the shape is (batch_size, target_length, source_length). Whenever+`attention_mask` is specified, the `mask` argument is OK to be skipped.++* TFA `MultiHeadAttention` Deprecation and Re-mapping++[MultiHeadAttention](https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/multihead_attention.py) has been released. The proposed `MultiHeadAttention` has similar `__init__` arguments+and `call` interface, where the minor differences are argument names and the attention `mask` shape.+We expect the new `MultiHeadAttention` keras layer will +cover the functionalities. Once the implementation are merged as experimental layers,+we will work with TF Addons team to design the deprecation and re-mapping procedure.+++### Alternatives Considered++We examined multi-head attention layer implemented in various libraries. There+are a few features that we do not include inside this keras layer and we feel it+is better to subclass the `MultiHeadAttention` layer to fulfill the needs.++*   Attention caching for decoding. Implemented in+    [Flax](https://github.com/google/flax/blob/master/flax/nn/attention.py#L301).+    The caching is a special treatment for inference and we noticied that+    different treatments are required for dynamic or static shape programs.+    Thus, subclassing as a+    [CachedAttention](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/attention.py)+    layer is the solution inside the model garden.+*   [MultiHeadAttention](https://github.com/tensorflow/addons/blob/master/tensorflow_addons/layers/multihead_attention.py)+    keras layer is also implemented in TF-Addons. The design in this doc covers+    the features in TF-addons implementation but generalizes to more use cases.++### Performance Implications++*   We will add microbenchmarks following the common practices of keras layers.+*   We have end-to-end integration/regression tests for models using this layer,+    e.g. BERT.++### Dependencies++No dependencies.++### Engineering Impact++*   The keras layer can be tested inside the package.+*   TensorFlow team will maintain the code.++### Platforms and Environments++*   Work for all platforms and environments++### Best Practices++*   No change for Tensorflow best practices.++### Tutorials and Examples++*   Code examples can be found inside Tensorflow Model Garden. 
For example, an+    encoder+    [Transformer](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/transformer.py).++*   2D attention example in the+    [unit test](https://github.com/tensorflow/models/blob/master/official/nlp/modeling/layers/attention_test.py#L135).++### Compatibility++*   This is a new layer without compatibility concerns.+*   The proposal works with TFLite, distribution strategy, tf.function, GPU/TPU+    and serializable to SavedModel. These are tested inside TensorFlow Model+    Garden applications.++### User Impact++*   We will first introduce the layer as+    `tf.keras.layers.experimental.MultiHeadAttention` and+    `tf.keras.layers.experimental.EinsumDense`. When the APIs are stable and+    functionalities are fully verified, the next step is to+    graduate as core keras layers by removing `experimental` scope.+    

@fchollet, for the existing keras.layers.Attention and keras.layers.AdditiveAttention, I think we should add some clarification so users don't confuse them with the new multi-head attention. I expect most users will use multi-head attention.

saberkun

comment created time in 18 days

Pull request review commenttensorflow/community

# RFC: Multihead Attention and EinsumDense on Keras

+# RFC: Multihead Attention and EinsumDense on Keras++| Status        | (Proposed / Accepted / Implemented / Obsolete)          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [260](https://github.com/tensorflow/community/pull/260) |+| **Author(s)** | Hongkun Yu (hongkuny@google.com), Mark Omernick (momernick@google.com)    |+| **Sponsor**   | Francois Chollet (fchollet@google.com)                  |+| **Updated**   | 2020-06-16                                              |++## Objective++Introduce the MultiHeadAttention layer and EinsumDense layer to tf.keras.++## Motivation++MultiHeadAttention is very popular and has become standard for deep learning+libraries. We propose to contribute a flexible well-defined implementation+inside Keras absorbing common best practices from reference libraries.++## User Benefit++We can standardize the implementation of Transformer layers and use the best+practice. We offer a rich set of functionalities to different use cases, e.g.+different project spaces, outputing multi-head attention scores for analysis,+etc. We also modularize computations to make the MultiHeadAttention layer+extensible to variants.++## Design Proposal++### Key Features++*   Returns multi-headed attention scores, which is commonly useful for+    attention visualization and analysis.+*   Supports query (Q), key (K), value (V) tensors as individual inputs and+    supports projecting Q, K, V to different dimensions.+*   Final outputs projects to user specified dimensions.+*   Using tf.einsum to express high-dimensional computation and adopts+    [tf.keras.layers.experimental.EinsumDense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/experimental/EinsumDense)+    layer.+*   Supports high-dimension attention when target and source are 2D, 3D, etc.++### Code Examples++*   How to write a TransformerBlock for an encoder.++```python+class TransformerBlock(tf.keras.layers.Layer):+  def __init__(self, embed_dim, num_heads, ff_dim):+    super(TransformerBlock, self).__init__()+    self.att = attention.MultiHeadAttention(embed_dim, num_heads)+    self.ffn = tf.keras.Sequential(+        [tf.keras.layers.Dense(ff_dim, activation="relu"),+         tf.keras.layers.Dense(embed_dim),]+    )+    self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)+    self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)++  def call(self, inputs, attention_mask=None):+    attn_output = self.att([inputs, inputs], attention_mask=attention_mask)+    out1 = self.layernorm1(inputs + attn_output)+    ffn_output = self.ffn(out1)+    return self.layernorm2(out1 + ffn_output)+```++*   Use attention mask to avoid performing attention on padding token indices.++```python+test_layer = TransformerBlock(+    embed_dim=2,+    num_heads=2,+    ff_dim=4)+query = np.array([[[0.1, 0.2], [0.0, 0.0]]])+mask = np.array([[[1, 0], [1, 0]]], dtype='bool')+output = test_layer(query, mask)+```++*   Inside a Transformer decoder, we often want to output the cross-attention+    scores to analyze how the target sequence attend to the source sequence. We+    are able to visualize the alignment according to attention scores.++```python+test_layer = MultiHeadAttention(+    num_heads=2, key_size=2, return_attention_scores=True)+target = np.array([[[0.1, 0.2], [0.0, 0.0]]])+source = np.array([[[0.1, 0.2], [3.0, 1.0]]])+output, scores = test_layer([target, source])+scores = tf.math.reduce_sum(scores, axis=1) # shape = (1, 2, 2)+```++*   Attention beyound sequences. 
Taking 2D, 3D target and source.++```python+query_shape = [2, 3, 4, 4]  # batch, target, target, embedding.+value_shape = [2, 3, 2, 4]  # batch, source, source, embedding.+mask_shape = [2, 3, 4, 3, 2]+query = 10 * np.random.random_sample(query_shape)+value = 10 * np.random.random_sample(value_shape)+mask_data = np.random.randint(2, size=mask_shape).astype("bool")+output = test_layer([query, value], mask_data)+```++### Interface++```python+class MultiHeadAttention(tf.keras.layers.Layer):+  """MultiHeadAttention layer.++  This is an implementation of multi-headed attention based on "Attention+  is all you Need". If `query`, `key,` `value` are the same, then+  this is self-attention. Each timestep in `query` attends to the+  corresponding sequence in `key`, and returns a fixed-width vector.++  This layer first projects `query`, `key` and `value`. These are+  (effectively) a list of tensors of length `num_attention_heads`, where the+  corresponding shapes are [batch_size, <query dimensions>, key_size],+  [batch_size, <key/value dimensions>, key_size],+  [batch_size, <key/value dimensions>, value_size].++  Then, the query and key tensors are dot-producted and scaled. These are+  softmaxed to obtain attention probabilities. The value tensors are then+  interpolated by these probabilities, then concatenated back to a single+  tensor.++  Finally, the result tensor with the last dimension as value_size can take an+  linear projection and return.++  Examples:++  Performs 1D cross-attention over two sequence inputs with an attention mask.+  Returns the additional attention weights over heads.++  >>> layer = MultiHeadAttention(num_heads=2, key_size=2,+  ...                            return_attention_scores=True)+  >>> target = tf.keras.Input(shape=[8, 16])+  >>> source = tf.keras.Input(shape=[4, 16])+  >>> mask_tensor = tf.keras.Input(shape=[8, 4])

The mask_tensor does not seem to be used in this code block.
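
Presumably the doctest would need to wire the mask through for it to have any effect; based on the call signature shown in this RFC, something like:

# Hypothetical fix for the doctest above: actually pass the mask to the call.
output_tensor, weights = layer([target, source], attention_mask=mask_tensor)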

saberkun

comment created time in 18 days

Pull request review commenttensorflow/addons

Update context checks in dynamic_decode

 def dynamic_decode(     Raises:       ValueError: if `maximum_iterations` is provided but is not a scalar.     """-    with tf.compat.v1.variable_scope(scope, "decoder") as varscope:-        # Determine context types.-        ctxt = tf.compat.v1.get_default_graph()._get_control_flow_context()-        is_xla = control_flow_util.GetContainingXLAContext(ctxt) is not None-        in_while_loop = control_flow_util.GetContainingWhileContext(ctxt) is not None-        # Properly cache variable values inside the while_loop.-        # Don't set a caching device when running in a loop, since it is-        # possible that train steps could be wrapped in a tf.while_loop. In that-        # scenario caching prevents forward computations in loop iterations from-        # re-reading the updated weights.-        if not tf.executing_eagerly() and not in_while_loop:-            if varscope.caching_device is None:-                varscope.set_caching_device(lambda op: op.device)+    with tf.name_scope(scope or "decoder"):+        is_xla = not tf.executing_eagerly() and control_flow_util.GraphOrParentsInXlaContext(+            tf.compat.v1.get_default_graph()

Ah, you are right.

guillaumekln

comment created time in 25 days

Pull request review commenttensorflow/addons

Update context checks in dynamic_decode

 def dynamic_decode(     Raises:       ValueError: if `maximum_iterations` is provided but is not a scalar.     """-    with tf.compat.v1.variable_scope(scope, "decoder") as varscope:-        # Determine context types.-        ctxt = tf.compat.v1.get_default_graph()._get_control_flow_context()-        is_xla = control_flow_util.GetContainingXLAContext(ctxt) is not None-        in_while_loop = control_flow_util.GetContainingWhileContext(ctxt) is not None-        # Properly cache variable values inside the while_loop.-        # Don't set a caching device when running in a loop, since it is-        # possible that train steps could be wrapped in a tf.while_loop. In that-        # scenario caching prevents forward computations in loop iterations from-        # re-reading the updated weights.-        if not tf.executing_eagerly() and not in_while_loop:-            if varscope.caching_device is None:

Ah, you are right. The code is only expected to work in TF v1. Removing it for v2 should be fine.

guillaumekln

comment created time in a month

Pull request review commenttensorflow/addons

Update context checks in dynamic_decode

 def dynamic_decode(     Raises:       ValueError: if `maximum_iterations` is provided but is not a scalar.     """-    with tf.compat.v1.variable_scope(scope, "decoder") as varscope:-        # Determine context types.-        ctxt = tf.compat.v1.get_default_graph()._get_control_flow_context()-        is_xla = control_flow_util.GetContainingXLAContext(ctxt) is not None-        in_while_loop = control_flow_util.GetContainingWhileContext(ctxt) is not None-        # Properly cache variable values inside the while_loop.-        # Don't set a caching device when running in a loop, since it is-        # possible that train steps could be wrapped in a tf.while_loop. In that-        # scenario caching prevents forward computations in loop iterations from-        # re-reading the updated weights.-        if not tf.executing_eagerly() and not in_while_loop:-            if varscope.caching_device is None:

The caching device gives a big performance boost when the code runs in a remote parameter server and worker setting, since the variables are not re-read on every iteration of the while loop. Is there any reason to remove it?
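
For reference, a minimal sketch of the caching-device pattern under discussion (TF1-style graph mode; the scope and variable names are illustrative):

import tensorflow as tf

# Graph mode only: cache variable reads on the device of the consuming op, so
# a variable hosted on a remote parameter server is fetched once per session
# step instead of once per while_loop iteration.
graph = tf.Graph()
with graph.as_default():
    with tf.compat.v1.variable_scope("decoder") as varscope:
        if varscope.caching_device is None:
            varscope.set_caching_device(lambda op: op.device)
        kernel = tf.compat.v1.get_variable("kernel", shape=[4, 4])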

guillaumekln

comment created time in a month

Pull request review commenttensorflow/addons

Update context checks in dynamic_decode

 def dynamic_decode(     Raises:       ValueError: if `maximum_iterations` is provided but is not a scalar.     """-    with tf.compat.v1.variable_scope(scope, "decoder") as varscope:-        # Determine context types.-        ctxt = tf.compat.v1.get_default_graph()._get_control_flow_context()-        is_xla = control_flow_util.GetContainingXLAContext(ctxt) is not None-        in_while_loop = control_flow_util.GetContainingWhileContext(ctxt) is not None-        # Properly cache variable values inside the while_loop.-        # Don't set a caching device when running in a loop, since it is-        # possible that train steps could be wrapped in a tf.while_loop. In that-        # scenario caching prevents forward computations in loop iterations from-        # re-reading the updated weights.-        if not tf.executing_eagerly() and not in_while_loop:-            if varscope.caching_device is None:-                varscope.set_caching_device(lambda op: op.device)+    with tf.name_scope(scope or "decoder"):+        is_xla = not tf.executing_eagerly() and control_flow_util.GraphOrParentsInXlaContext(+            tf.compat.v1.get_default_graph()

I think you are missing a _get_control_flow_context() call on get_default_graph().

guillaumekln

comment created time in a month

push eventqlzh727/addons

Sean Morgan

commit sha b92da7031647ec626ff85c18f078fcceee8e994c

HOTFIX: pin docker container pip install to py36 (#1117) * * HOTIFX pin docker container pip install to py36 * Better way to handle

view details

Gabriel de Marmiesse

commit sha 6b202169c95fb2cafa111f6c47995bd0eaad1dd2

Test that the wheel work in a fresh environement. (#1113)

view details

Qianli Scott Zhu

commit sha a8e04afeab32e6ffda13c3d0a63ce0a05b5d3dfe

Update AttentionStateWrapper to work with Keras RNN layer (#1118) * Update AttentionStateWrapper to work with Keras. * Fix lint errors.

view details

Gabriel de Marmiesse

commit sha 51a5940584b24410ed361bb960669b08065ce4a4

Used the official supported image. (#1120)

view details

Sean Morgan

commit sha f78818c2e480e7ab51fd496ed141dd2036b65b94

Fix label of macos version in wheels (#1119) * Fix labeled of macos wheels * Update plat version * Fix string building

view details

Gabriel de Marmiesse

commit sha d4be864fde22c0fc78ae120ca5173bc058e8b3d3

We now test the release code at each commit. (#1127) * We now test the release code at each commit.

view details

Gabriel de Marmiesse

commit sha 0581edf3956796977c1664cedefef03d9cdfc252

Grouped the versions for tensorflow-cpu. (#1130)

view details

Gabriel de Marmiesse

commit sha 39556383c657ee55f0503058302625d9881802a7

Used the official tf gpu image. (#1133)

view details

Gabriel de Marmiesse

commit sha 66e8ca89fe443639eee462fc2411842973c010ba

Added a py function for hardshrink (#1128) * Added a py function. * Added a simple test.

view details

Gabriel de Marmiesse

commit sha b961b1135a2e857df1e9ef2ea49a470bacf0db82

Made the attention_wrapper_test faster. (#1141)

view details

Gabriel de Marmiesse

commit sha 34132dfdd593b031adf6c290f3b166e038b7616c

Made the rectified adam test faster. (#1142)

view details

Gabriel de Marmiesse

commit sha 2c1ed4f45b2a21dc7fbb36dbc6a235722d22d297

Added python implementation for mish (#1139) * Added py implementation for mish

view details

Gabriel de Marmiesse

commit sha cc376d0a55a2f05ef2ed0b954d7a5f5ec7bfd4e0

Add python implementation of softshrink (#1140) * Add softshrink python op * Added check.

view details

Gabriel de Marmiesse

commit sha a4b2ae45fde5dbc807f31f9a0211d21d1c628eca

Added pure python implementation of lish (#1138)

view details

Gabriel de Marmiesse

commit sha e2005e23f2372112515a251d0c5ea8aeeff4ec03

Finish the bazel install before building. (#1135)

view details

Gabriel de Marmiesse

commit sha 49637c3bc8c53b11a2a6490048ef824cf4c46b44

Grouped dependencies together. (#1131)

view details

Gabriel de Marmiesse

commit sha c0f4a249d3875c4c8568c0ebf5095d70e5932c40

Relax requirements for dependencies (#1149) * Relax required versions. * Relax the requirements for some dependencies.

view details

Dheeraj R Reddy

commit sha eb416bfd5442296d32e8f3d2bee512b2e4c7743f

Remove extra space. (#1150)

view details

who who who

commit sha 07febff964e45ad8930f6e487abcf43848335c86

add tanhshrink_py (#1146) * add tanhshrink_py * format code * remove useless test cases

view details

Gabriel de Marmiesse

commit sha 062f02638a1e35185cbf1fb9787d207ff7c2f8c6

Remove nightly files. (#1136) * Remove nightly files.

view details

push time in a month

push eventqlzh727/addons

qlzh727

commit sha f107be50ddc7c9cf603b3065cfc1700f55b6dad1

Revert "Fix lint errors." This reverts commit 63d83d2f62f08c09dbbc6d31aacf22f86b09280b.

view details

qlzh727

commit sha f7b4fd95d326958593026e33e3e93690d353048c

Revert "Update AttentionStateWrapper to work with Keras." This reverts commit c30ac7b64838772eb578e23406780b5ce3e6e0a1.

view details

push time in a month

issue commenttensorflow/tensorflow

MobileNetV3

It seems that the code was added to keras-team/keras, but hasn't been ported to tf/keras yet. We will fix it and make the pre-trained weights available.

CRosero

comment created time in a month

pull request commenttensorflow/tensorflow

correct summing total blocks in EfficientNet

Yea, I think that aligns with my expectation as well; we will have to update the weights. We need to check with @fchollet about how the weights are created and uploaded on the Keras side. There is also a checksum in the source code, which I think needs to be updated within the PR as well.

yixingfu

comment created time in a month

pull request commenttensorflow/tensorflow

correct summing total blocks in EfficientNet

Since this change could potentially affect the saved weights, could you check whether the existing h5 weights need any update with the default parameters?

yixingfu

comment created time in a month

pull request commenttensorflow/tensorflow

correct summing total blocks in EfficientNet

Thanks for reporting the issue and sending the PR.

I found the link to the original implementation in https://arxiv.org/abs/1905.11946, and the corresponding code is at https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/efficientnet_model.py#L693.

drop_rate = 1.0 - survival_prob
survival_prob = 1.0 - drop_rate * float(idx) / len(self._blocks)

self._blocks is the list that contains all the blocks, and each block is created from the rounded value of "repeats". The existing implementation is incorrect since summing the unrounded values can give a total larger than the actual number of blocks.
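
To illustrate, a minimal sketch of computing the drop rate against the rounded repeat counts so that idx / total_blocks never exceeds 1.0 (the per-group repeat counts and the helper name are made up for this example):

import math

def round_repeats(repeats, depth_coefficient=1.2):
    # EfficientNet scales the repeat count by the depth coefficient and rounds up.
    return int(math.ceil(depth_coefficient * repeats))

repeats_per_group = [1, 2, 2, 3, 3, 4, 1]  # illustrative values only
total_blocks = sum(round_repeats(r) for r in repeats_per_group)

drop_connect_rate = 0.2
idx = 0
for repeats in repeats_per_group:
    for _ in range(round_repeats(repeats)):
        # idx < total_blocks always holds because both use the rounded counts.
        survival_prob = 1.0 - drop_connect_rate * float(idx) / total_blocks
        idx += 1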

yixingfu

comment created time in a month

pull request commentkeras-team/keras-tuner

Fix #284. Parent condition should not be checked.

@omalleyt12, what's the process for reviewing PRs in this repo? Is there any active member monitoring the incoming PRs and triaging them?

yixingfu

comment created time in a month

issue commenttensorflow/tensorflow

Memory leak on TF2.1 model.fit with validation_split

Memory log before change:
#--- Run 1 of 20 memory used (MB): 420.94592
#--- Run 2 of 20 memory used (MB): 455.458816
#--- Run 3 of 20 memory used (MB): 480.89088
#--- Run 4 of 20 memory used (MB): 504.799232
#--- Run 5 of 20 memory used (MB): 465.563648
#--- Run 6 of 20 memory used (MB): 485.797888
#--- Run 7 of 20 memory used (MB): 506.544128
#--- Run 8 of 20 memory used (MB): 526.76608
#--- Run 9 of 20 memory used (MB): 547.782656
#--- Run 10 of 20 memory used (MB): 487.981056
#--- Run 11 of 20 memory used (MB): 508.862464
#--- Run 12 of 20 memory used (MB): 528.904192
#--- Run 13 of 20 memory used (MB): 549.933056
#--- Run 14 of 20 memory used (MB): 570.032128
#--- Run 15 of 20 memory used (MB): 510.455808
#--- Run 16 of 20 memory used (MB): 530.501632
#--- Run 17 of 20 memory used (MB): 551.559168
#--- Run 18 of 20 memory used (MB): 571.408384
#--- Run 19 of 20 memory used (MB): 529.518592
#--- Run 20 of 20 memory used (MB): 549.376

Memory log after change:
#--- Run 1 of 20 memory used (MB): 441.933824
#--- Run 2 of 20 memory used (MB): 463.753216
#--- Run 3 of 20 memory used (MB): 465.801216
#--- Run 4 of 20 memory used (MB): 466.366464
#--- Run 5 of 20 memory used (MB): 467.0464
#--- Run 6 of 20 memory used (MB): 467.709952
#--- Run 7 of 20 memory used (MB): 468.668416
#--- Run 8 of 20 memory used (MB): 468.62336
#--- Run 9 of 20 memory used (MB): 474.35776
#--- Run 10 of 20 memory used (MB): 474.353664
#--- Run 11 of 20 memory used (MB): 474.472448
#--- Run 12 of 20 memory used (MB): 474.648576
#--- Run 13 of 20 memory used (MB): 474.697728
#--- Run 14 of 20 memory used (MB): 474.750976
#--- Run 15 of 20 memory used (MB): 474.804224
#--- Run 16 of 20 memory used (MB): 474.800128
#--- Run 17 of 20 memory used (MB): 474.857472
#--- Run 18 of 20 memory used (MB): 474.918912
#--- Run 19 of 20 memory used (MB): 475.086848
#--- Run 20 of 20 memory used (MB): 475.348992

dannyfriar

comment created time in a month

issue closedtensorflow/tensorflow

Memory leak on TF2.1 model.fit with validation_split

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.3
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): 2.1.0
  • Python version: 3.7.5
  • GPU model and memory: No GPU

Describe the current behavior Fitting a simple LSTM model causes memory leak

Standalone code to reproduce the issue

import os
import psutil

import numpy as np
from tensorflow.keras.layers import (
    LSTM,
    Bidirectional,
    Dense,
    Input,
)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam


def dummy_model(n_classes, n_features, seq_length):
    input = Input(shape=(seq_length, n_features))
    main = Bidirectional(LSTM(128, return_sequences=False))(input)

    prediction = Dense(n_classes, activation="softmax")(main)
    optimiser = Adam(lr=1e-3)
    model = Model(inputs=[input], outputs=prediction)
    model.compile(optimiser, "categorical_crossentropy", metrics=["accuracy"])
    return model


def fit_model():
    x_train = np.random.random_sample((1000, 50, 100))
    x_train = x_train.astype(np.float32)
    y_train = np.zeros((1000, 10), dtype=np.float32)
    model = dummy_model(n_classes=10, n_features=100, seq_length=50)
    model.fit(
        x_train, y_train, epochs=1,
        validation_split=0.1, batch_size=64,
        verbose=0
    )


if __name__ == "__main__":
    process = psutil.Process(os.getpid())
    n = 20

    for i in range(n):
        fit_model()
        print(f"#--- Run {i + 1} of {n} memory used (MB): {process.memory_info().rss / 1e6}")

Other info / logs

#--- Run 1 of 20 memory used (MB): 731.81184
#--- Run 2 of 20 memory used (MB): 991.137792
#--- Run 3 of 20 memory used (MB): 1027.985408
#--- Run 4 of 20 memory used (MB): 1089.724416
#--- Run 5 of 20 memory used (MB): 1127.616512
#--- Run 6 of 20 memory used (MB): 1165.471744
#--- Run 7 of 20 memory used (MB): 1211.96544
#--- Run 8 of 20 memory used (MB): 1247.653888
#--- Run 9 of 20 memory used (MB): 1272.410112
#--- Run 10 of 20 memory used (MB): 1289.478144
#--- Run 11 of 20 memory used (MB): 1298.57536
#--- Run 12 of 20 memory used (MB): 1317.429248
#--- Run 13 of 20 memory used (MB): 1346.781184
#--- Run 14 of 20 memory used (MB): 1369.145344
#--- Run 15 of 20 memory used (MB): 1402.55232
#--- Run 16 of 20 memory used (MB): 1409.634304
#--- Run 17 of 20 memory used (MB): 1413.599232
#--- Run 18 of 20 memory used (MB): 1419.091968
#--- Run 19 of 20 memory used (MB): 1450.14784
#--- Run 20 of 20 memory used (MB): 1468.604416

Issue also appears to occur with 2.2.0rc3

Explicitly setting the following (as was recommended in this blog post - http://gregoryzynda.com/python/tensorflow/memory/leak/rnn/lstm/2019/10/17/lstm-memory-leak.html) appears to help mitigate this (although memory still increases):

tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(5)

As does explicitly running garbage collection gc.collect() after each iteration.

closed time in a month

dannyfriar

issue commenttensorflow/tensorflow

Memory leak on TF2.1 model.fit with validation_split

With the fix in https://github.com/tensorflow/tensorflow/commit/ce2f9824eee904d60fdc0444ac8a82b217ea9149, the memory usage for validation_split is reduced.

Please note that I updated the snippet to define the model only once and call fit within the for loop. This is more aligned with the typical user workflow.

If you need to define multiple models in a loop, you might want to use keras.backend.clear_session() to remove the unused models from the Keras global graph, as in the sketch below.
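
A minimal sketch of that pattern, reusing dummy_model() from the reproduction script above (the data shapes simply mirror that script):

import gc
import numpy as np
import tensorflow as tf

x_train = np.random.random_sample((1000, 50, 100)).astype(np.float32)
y_train = np.zeros((1000, 10), dtype=np.float32)

for i in range(20):
    model = dummy_model(n_classes=10, n_features=100, seq_length=50)
    model.fit(x_train, y_train, epochs=1, validation_split=0.1,
              batch_size=64, verbose=0)
    del model
    # Drop the graphs built for previous models so they can be garbage collected.
    tf.keras.backend.clear_session()
    gc.collect()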

dannyfriar

comment created time in a month

issue closedtensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): VERSION: 2.2.0; GIT_VERSION: v2.2.0-rc4-8-g2b96f3662b
  • Python version: 3.7
  • CUDA/cuDNN version: CUDA 10.1, cuDNN 7.6.5
  • GPU model and memory: GTX 1080 8Gb; 16 Gb RAM

Describe the current behavior My code works great on 2.1.1 but does not work on 2.2.0 (Error log №1 below). Empirically, the problem appears if dropout or recurrent_dropout is used in the GRU layers. Changing GRU to LSTM gives the same problem. Setting tf.compat.v1.experimental.output_all_intermediates() to True or False has no effect. On 2.2.0 it works ONLY if I remove the dropout and recurrent_dropout options from the GRU layers AND disable eager execution with tf.compat.v1.disable_eager_execution(). But if I remove the dropouts and eager execution is enabled, I get another error (Error log №2 below).

Standalone code to reproduce the issue
Test case with this problem: https://colab.research.google.com/drive/1HUayaLsHNZ30JaBlxvLyQz7Evf1FnsD5?usp=sharing

Other info / logs Error log №1:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

14 frames
InvalidArgumentError: Node 'training/SGD/gradients/gradients/GRU_1/while_grad/GRU_1/while_grad': Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs. Try using tf.compat.v1.experimental.output_all_intermediates(True).

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385 
   1386   def _extend_graph(self):

InvalidArgumentError: Node 'training/SGD/gradients/gradients/GRU_1/while_grad/GRU_1/while_grad': Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs. Try using tf.compat.v1.experimental.output_all_intermediates(True).

Error log №2:

tf.keras.utils.plot_model(model)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-8c47125ededc> in <module>()
----> 1 tf.keras.utils.plot_model(model)

1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir, expand_nested, dpi, subgraph)
    141 
    142     # Append a wrapped layer's label to node's label, if it exists.
--> 143     layer_name = layer.name
    144     class_name = layer.__class__.__name__
    145 

AttributeError: 'dict' object has no attribute 'name'
model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-1d2f84f55c4b> in <module>()
----> 1 model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)

10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    966           except Exception as e:  # pylint:disable=broad-except
    967             if hasattr(e, "ag_error_metadata"):
--> 968               raise e.ag_error_metadata.to_exception(e)
    969             else:
    970               raise

AttributeError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:543 train_step  **
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:391 update_state
        self._build(y_pred, y_true)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:322 _build
        self._metrics, y_true, y_pred)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1118 map_structure_up_to
        **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1214 map_structure_with_tuple_paths_up_to
        *flat_value_lists)]
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1213 <listcomp>
        results = [func(*args, **kwargs) for args in zip(flat_path_list,
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1116 <lambda>
        lambda _, *values: func(*values),  # Discards the path arg.
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:421 _get_metric_objects
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:421 <listcomp>
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:442 _get_metric_object
        y_t_rank = len(y_t.shape.as_list())

    AttributeError: 'NoneType' object has no attribute 'shape'

closed time in a month

Bocharick

issue commenttensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

I tested your colab with the latest nightly, and it is working now. Closing this issue.

Bocharick

comment created time in a month

issue closedtensorflow/tensorflow

MemoryOptimizer produces broken graph with AlreadyExistsError exception while running GRU layer on Tensorflow 2.2.0rc_3

System information

  • Custom model built using keras
  • MacBook Pro, 8-Core Intel Core i9, macOS Catalina 10.15.4
  • TensorFlow installed from pip in virtual environment
  • TensorFlow v2.2.0-rc2-77-gaad398b5e9 2.2.0-rc3
  • Python 3.7.5
  • Running on CPU

Describe the current behavior The code snippet listed below outputs multiple tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at variable_ops.cc:104 : Already exists: Resource warnings and finally exits with a tensorflow.python.framework.errors_impl.AlreadyExistsError exception.

Note: the code works correctly if the GRU layer size is decreased from 320 to 80. It also works if TensorFlow is downgraded to version 2.0.1.

The issue is related to https://github.com/tensorflow/tensorflow/issues/23780 issue reported in 2018. This issue offers code to reproduce it and occurs on the latest version of TensorFlow.

Describe the expected behavior The code should work without exception.

Standalone code to reproduce the issue

import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Bidirectional, GRU
from tensorflow.keras.layers import Conv1D, MaxPooling1D

x = np.random.rand(1000, 401, 17)
y = np.random.choice([0, 1], size=(1000, 301))

model = Sequential()
model.add(Conv1D(filters=320, kernel_size=26, activation='relu', input_shape=(401, x.shape[2])))
model.add(MaxPooling1D(pool_size=13, strides=13))
model.add(Bidirectional(GRU(320, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)))
model.add(Flatten())
model.add(Dense(2000, activation="relu"))
model.add(Dense(301, activation="sigmoid"))
model.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=["accuracy"])

model.summary()

model.fit(x=x, y=y, epochs=1, verbose=1)

The Google Colab notebook is available here. The error is reproducible.

Other info / logs The code above generates following output:

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv1d (Conv1D)              (None, 376, 320)          141760    
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 28, 320)           0         
_________________________________________________________________
bidirectional (Bidirectional (None, 28, 640)           1232640   
_________________________________________________________________
flatten (Flatten)            (None, 17920)             0         
_________________________________________________________________
dense (Dense)                (None, 2000)              35842000  
_________________________________________________________________
dense_1 (Dense)              (None, 301)               602301    
=================================================================
Total params: 37,818,701
Trainable params: 37,818,701
Non-trainable params: 0
_________________________________________________________________
2020-04-26 10:19:57.349570: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at variable_ops.cc:104 : Already exists: Resource __per_step_0/gradient_tape/sequential/bidirectional/backward_gru/while/sequential/bidirectional/backward_gru/while_grad/body/_877/gradients/AddN_8/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
2020-04-26 10:19:57.363399: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at variable_ops.cc:104 : Already exists: Resource __per_step_0/gradient_tape/sequential/bidirectional/backward_gru/while/sequential/bidirectional/backward_gru/while_grad/body/_877/gradients/AddN_7/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
2020-04-26 10:19:57.377361: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at variable_ops.cc:104 : Already exists: Resource __per_step_0/gradient_tape/sequential/bidirectional/backward_gru/while/sequential/bidirectional/backward_gru/while_grad/body/_877/gradients/AddN_8/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
... (repeated multiple times) ...
2020-04-26 10:19:57.677304: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at variable_ops.cc:104 : Already exists: Resource __per_step_0/gradient_tape/sequential/bidirectional/forward_gru/while/sequential/bidirectional/forward_gru/while_grad/body/_577/gradients/AddN_7/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
Traceback (most recent call last):
  File "alreadyexists_err.py", line 21, in <module>
    model.fit(x=x, y=y, epochs=1, verbose=1)
  File "venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 66, in _method_wrapper
    return method(self, *args, **kwargs)
  File "venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 851, in fit
    tmp_logs = train_function(iterator)
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 644, in _call
    return self._stateless_fn(*args, **kwds)
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.AlreadyExistsError:  Resource __per_step_0/gradient_tape/sequential/bidirectional/backward_gru/while/sequential/bidirectional/backward_gru/while_grad/body/_877/gradients/AddN_8/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
	 [[{{node gradient_tape/sequential/bidirectional/backward_gru/while/sequential/bidirectional/backward_gru/while_grad/body/_877/gradients/AddN_8/tmp_var}}]] [Op:__inference_train_function_7551]

Function call stack:
train_function

closed time in a month

jakublipinski

issue commenttensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

Btw https://github.com/tensorflow/tensorflow/commit/80a93674eafc224a45cbe96c65e993e9735634a3 should fix the issue for training. Let me verify it when we have a new nightly PIP.

Bocharick

comment created time in a month

issue commenttensorflow/tensorflow

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

@Krishnarohith10.

Please take a closer look at the suggestions in https://forums.developer.nvidia.com/t/could-not-create-cudnn-handle-cudnn-status-alloc-failed/108261, which involve upgrading the CUDA/cuDNN kernel. Please note that we don't have access to any user device, and hence can't reproduce the issue on our side. From the error log above, this is clearly a local environment issue, and there isn't any action I can take here. I would suggest seeking more help on Stack Overflow, where the community can help each other.

Also, please be respectful when commenting. I don't think any comment like "Why the hell" is helpful here.

Krishnarohith10

comment created time in a month

issue closedtensorflow/tensorflow

absl.flags._exceptions.UnparsedFlagAccessError if used flags in tf.TensorSpec

System information

  • Have I written custom code: Yes
  • OS Platform and Distribution: Win10
  • Mobile device: No
  • TensorFlow installed from (source or binary):
  • TensorFlow version (use command below):
  • Python version: 2.1.0

Standalone code to reproduce the issue code snippets url: https://gitee.com/songhaohao2018/codes/cmunjs6zqbe895y7h0gpa21

closed time in a month

songs18

issue commenttensorflow/tensorflow

absl.flags._exceptions.UnparsedFlagAccessError if used flags in tf.TensorSpec

This is expected. The flags are parsed when app.run(main) is executed. However, the @tf.function annotation is evaluated when the class BiRNN and all of its methods are parsed. Note that a method body is not parsed until it gets invoked, but the method signature (including its decorators) is evaluated when the class is parsed. This means that you can't use FLAG.hight in the annotation for the type spec.

Note that input_signature is used to control the shape and dtype of the tensors passed to the function. It's fine for it to be None, and that won't cause any issue / retracing, as long as the data passed in always has the same dtype and shape.
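A minimal sketch of the timing involved (the flag name height and the class body below are illustrative, not the code from the report):

from absl import app, flags
import tensorflow as tf

FLAGS = flags.FLAGS
flags.DEFINE_integer("height", 28, "Input height (illustrative flag).")


class BiRNN:
    # The decorator argument is evaluated at class-definition time, before
    # app.run(main) has parsed the flags, so reading FLAGS.height there raises
    # UnparsedFlagAccessError:
    #
    #   @tf.function(input_signature=[tf.TensorSpec([None, FLAGS.height], tf.float32)])
    #
    # Leaving input_signature unset works, as long as calls keep the same dtype
    # and shape so no retracing happens.
    @tf.function
    def call(self, x):
        return tf.reduce_sum(x)


def main(_):
    # Flags are parsed by now, so FLAGS.height is safe to read inside main().
    model = BiRNN()
    print(model.call(tf.zeros([2, FLAGS.height])))


if __name__ == "__main__":
    app.run(main)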

songs18

comment created time in a month

issue closedtensorflow/tensorflow

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

<em>Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • TensorFlow version (use command below): 2.1.0
  • Python version: 3.6.9
  • CUDA/cuDNN version: 10.1.105/ 7.6.5
  • GPU model and memory: GTX 1660 ti 6GB and 32GB memory

Describe the current behavior 1

Describe the expected behavior 2

Standalone code to reproduce the issue

import tensorflow as tf

inp = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
out = lstm(inp)

Other info / logs As you can see in the expected behavior, it worked, but I always have to set GPU memory growth, which is not a permanent solution. I used to get no issue before; then I upgraded TensorFlow to 2.2.0 and this started. I also downgraded to the previous version and am still getting this error. Can someone please help me? Thank You in advance.

closed time in a month

Krishnarohith10

issue commenttensorflow/tensorflow

Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED

I think it's an environment config issue, and not related to TF code. Searching Google for "CUDNN_STATUS_ALLOC_FAILED", the first result shows some ways to fix issues like this.

https://forums.developer.nvidia.com/t/could-not-create-cudnn-handle-cudnn-status-alloc-failed/108261
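For reference, the memory-growth setting the reporter mentions looks like this (a workaround, not a root-cause fix; it must run before any GPU op initializes cuDNN):

import tensorflow as tf

# Enable GPU memory growth before the CUDA/cuDNN context is created.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)

inp = tf.random.normal([32, 10, 8])
lstm = tf.keras.layers.LSTM(4)
out = lstm(inp)
print(out.shape)  # (32, 4)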

Krishnarohith10

comment created time in a month

issue commenttensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

The model.fit is failing the same way as https://github.com/tensorflow/tensorflow/issues/38906, and we will send a fix very soon.

Bocharick

comment created time in a month

Pull request review commenttensorflow/tensorflow

bugfix: add a method to check penalty number availability

 def get_config(self):
     """
     raise NotImplementedError(str(self) + ' does not implement get_config()')
 
+  def _check_penalty_number(self, x):

I think this doesn't need to be an instance method. It can be just a util function.

howl-anderson

comment created time in a month

issue commenttensorflow/tensorflow

Memory leak on TF2.1 model.fit with validation_split

@dannyfriar. Thanks for the update.

I updated your code to also create the model outside of the for loop, which is the more standard behavior. With that, there isn't much of a memory leak.

Having said that, the memory leak still appears when validation_split is used. I will focus on that.

/Users/scottzhu/tf-2.2/bin/python /Users/scottzhu/Library/Preferences/PyCharmCE2018.1/scratches/scratch_17.py
2020-06-01 20:50:38.363168: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-01 20:50:38.385048: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x13b428490 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-01 20:50:38.385063: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
#--- Run 1 of 20 memory used (MB): 424.79616
#--- Run 2 of 20 memory used (MB): 431.009792
#--- Run 3 of 20 memory used (MB): 433.84832
#--- Run 4 of 20 memory used (MB): 434.60608
#--- Run 5 of 20 memory used (MB): 435.093504
#--- Run 6 of 20 memory used (MB): 436.195328
#--- Run 7 of 20 memory used (MB): 437.420032
#--- Run 8 of 20 memory used (MB): 437.837824
#--- Run 9 of 20 memory used (MB): 438.714368
#--- Run 10 of 20 memory used (MB): 439.410688
#--- Run 11 of 20 memory used (MB): 440.037376
#--- Run 12 of 20 memory used (MB): 440.000512
#--- Run 13 of 20 memory used (MB): 440.000512
#--- Run 14 of 20 memory used (MB): 439.865344
#--- Run 15 of 20 memory used (MB): 440.037376
#--- Run 16 of 20 memory used (MB): 441.430016
#--- Run 17 of 20 memory used (MB): 441.430016
#--- Run 18 of 20 memory used (MB): 441.28256
#--- Run 19 of 20 memory used (MB): 441.704448
#--- Run 20 of 20 memory used (MB): 441.704448

Process finished with exit code 0

dannyfriar

comment created time in a month

issue commenttensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

Thanks for reporting the issue. I think there are several issues we need to address here:

  1. The model is built with 2 inputs and 4 outputs. However, the training data only has 2 inputs and 2 outputs. This is causing the issue in Error log №2. After I removed the 2 extra outputs when building the model, model.fit() still failed with the error below. I need to check with the runtime team to see what the root cause is there.
2020-05-31 22:05:45.953509: W tensorflow/core/framework/op_kernel.cc:1760] OP_REQUIRES failed at variable_ops.cc:100 : Already exists: Resource __per_step_0/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_7/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
Traceback (most recent call last):
  File "/Users/scottzhu/Library/Preferences/PyCharmCE2018.1/scratches/scratch_15.py", line 73, in <module>
    model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1090, in fit
    tmp_logs = train_function(iterator)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 766, in __call__
    result = self._call(*args, **kwds)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 826, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2812, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1838, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1915, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 549, in call
    ctx=ctx)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.AlreadyExistsError:  Resource __per_step_0/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_8/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
	 [[{{node gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_8/tmp_var}}]] [Op:__inference_train_function_7797]

Function call stack:
train_function
  2. I can also confirm that Error log №1 appears when tf.compat.v1.disable_eager_execution() is added together with dropout and recurrent_dropout. It is probably related to the runtime and how the gradient is generated, which I need to confirm with the runtime team again.
  1. It's OK for the model to have more outputs than labels. In my case it's just two extra outputs for the two previous classification outputs with a simple argmax. So when I use this trained model in production, the argmax values will be counted as model output for me. Very useful. And when I start this training on TF 2.1.1 it shows only one gentle warning:
WARNING:tensorflow:Output y_before_argmaxed missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to y_before_argmaxed.
WARNING:tensorflow:Output y_after_argmaxed missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to y_after_argmaxed.

Ok. I guess this might be a regression, since we refactored the training logic a bit between 2.1 and 2.2. Currently the code expects each output of the model to have a matching label.

@omalleyt12

Bocharick

comment created time in a month

issue closedtensorflow/tensorflow

Passing call arguments to individual layers of a model

<em>Please make sure that this is a feature request. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:feature_template</em>

System information

  • TensorFlow version (you are using): 2.1.0
  • Are you willing to contribute it (Yes/No): No

Describe the feature and the current behavior/state. There does not seem to be a way to pass call arguments to individual layers of a model. For example, I have been unable to find a way to pass the initial_state call argument to the LSTM layers inside a model. I explored multiple pathways and all failed.

Will this change the current api? How?

Who will benefit with this feature? everyone

Any Other info. Seems to be a simple fix

closed time in a month

imansb

issue commenttensorflow/tensorflow

Passing call arguments to individual layers of a model

I think you can build your model by subclassing tf.keras.Model like below:

import tensorflow as tf

class Model(tf.keras.Model):

  def __init__(self):
    super(Model, self).__init__()
    self.lstm = tf.keras.layers.LSTM(10)

  def call(self, inputs, initial_state):
    output = self.lstm(inputs, initial_state=initial_state)
    return output

model = Model()
model(tf.zeros((8, 2, 5)), initial_state=[tf.zeros((8, 10)), tf.zeros((8, 10))])
imansb

comment created time in a month

issue commenttensorflow/tensorflow

Missing positional argument error when deepcopy a LSTMCell

Thanks for reporting the issue. Will send a fix very soon.

guillaumekln

comment created time in a month

issue commenttensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

Btw, I think the issue is probably related to https://github.com/tensorflow/tensorflow/issues/38906, where we saw the same tensorflow.python.framework.errors_impl.AlreadyExistsError.

Btw, disable_eager_execution() will probably cause some side effects in our code base, since it falls back to some legacy behavior, which might not be recommended for current users. Do you really need eager mode turned off, or are you just trying that to see if it can work around the issue?

Bocharick

comment created time in a month

issue commenttensorflow/tensorflow

Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs..

Thanks for reporting the issue.

I think there are several issues we need to address here:

  1. The model is built with 2 inputs and 4 outputs. However, the training data only has 2 inputs and 2 outputs. This is causing the issue in Error log №2. After I removed the 2 extra outputs when building the model, model.fit() still failed with the error below. I need to check with the runtime team to see what the root cause is there.
2020-05-31 22:05:45.953509: W tensorflow/core/framework/op_kernel.cc:1760] OP_REQUIRES failed at variable_ops.cc:100 : Already exists: Resource __per_step_0/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_7/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
Traceback (most recent call last):
  File "/Users/scottzhu/Library/Preferences/PyCharmCE2018.1/scratches/scratch_15.py", line 73, in <module>
    model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1090, in fit
    tmp_logs = train_function(iterator)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 766, in __call__
    result = self._call(*args, **kwds)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 826, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2812, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1838, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1915, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 549, in call
    ctx=ctx)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.AlreadyExistsError:  Resource __per_step_0/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_8/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
	 [[{{node gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_8/tmp_var}}]] [Op:__inference_train_function_7797]

Function call stack:
train_function
  2. I can also confirm that Error log №1 appears when tf.compat.v1.disable_eager_execution() is added together with dropout and recurrent_dropout. It is probably related to the runtime and how the gradient is generated, which I need to confirm with the runtime team again.
Bocharick

comment created time in a month

issue openedkeras-team/keras-tuner

Keras Tuner stopped at the end of first epoch when train on TPU

This was discovered while testing https://www.kaggle.com/kivlichangoogle/jigsaw-multilingual-getting-started. Note that this issue only happens when TPU is enabled for the notebook.

When Keras Tuner is added, the search process fails at the start of the second epoch, when saving/loading the checkpoint. I think the root cause is in TF and TPU, but a temporary workaround in Keras Tuner would be nice, e.g. disabling checkpointing if possible.

The error stack is like below:

---------------------------------------------------------------------------
UnimplementedError                        Traceback (most recent call last)
<ipython-input-14-a9734e750f48> in <module>
      6              verbose=1,
      7              validation_data=nonenglish_val_datasets['Combined'],
----> 8              validation_steps=100)

/opt/conda/lib/python3.7/site-packages/kerastuner/engine/base_tuner.py in search(self, *fit_args, **fit_kwargs)
    128 
    129             self.on_trial_begin(trial)
--> 130             self.run_trial(trial, *fit_args, **fit_kwargs)
    131             self.on_trial_end(trial)
    132         self.on_search_end()

/opt/conda/lib/python3.7/site-packages/kerastuner/engine/multi_execution_tuner.py in run_trial(self, trial, *fit_args, **fit_kwargs)
     94 
     95             model = self.hypermodel.build(trial.hyperparameters)
---> 96             history = model.fit(*fit_args, **copied_fit_kwargs)
     97             for metric, epoch_values in history.history.items():
     98                 if self.oracle.objective.direction == 'min':

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    817         max_queue_size=max_queue_size,
    818         workers=workers,
--> 819         use_multiprocessing=use_multiprocessing)
    820 
    821   def evaluate(self,

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py in fit(self, model, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_freq, max_queue_size, workers, use_multiprocessing, **kwargs)
    340                 mode=ModeKeys.TRAIN,
    341                 training_context=training_context,
--> 342                 total_epochs=epochs)
    343             cbks.make_logs(model, epoch_logs, training_result, ModeKeys.TRAIN)
    344 

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2.py in run_one_epoch(model, iterator, execution_function, dataset_size, batch_size, strategy, steps_per_epoch, num_samples, mode, training_context, total_epochs)
    126         step=step, mode=mode, size=current_batch_size) as batch_logs:
    127       try:
--> 128         batch_outs = execution_function(iterator)
    129       except (StopIteration, errors.OutOfRangeError):
    130         # TODO(kaftan): File bug about tf function and errors.OutOfRangeError?

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in execution_function(input_fn)
     96     # `numpy` translates Tensors to values in Eager mode.
     97     return nest.map_structure(_non_none_constant_value,
---> 98                               distributed_function(input_fn))
     99 
    100   return execution_function

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/util/nest.py in map_structure(func, *structure, **kwargs)
    566 
    567   return pack_sequence_as(
--> 568       structure[0], [func(*x) for x in entries],
    569       expand_composites=expand_composites)
    570 

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/util/nest.py in <listcomp>(.0)
    566 
    567   return pack_sequence_as(
--> 568       structure[0], [func(*x) for x in entries],
    569       expand_composites=expand_composites)
    570 

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/training_v2_utils.py in _non_none_constant_value(v)
    128 
    129 def _non_none_constant_value(v):
--> 130   constant_value = tensor_util.constant_value(v)
    131   return constant_value if constant_value is not None else v
    132

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/tensor_util.py in constant_value(tensor, partial)
    820   """
    821   if isinstance(tensor, ops.EagerTensor):
--> 822     return tensor.numpy()
    823   if not is_tensor(tensor):
    824     return tensor

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in numpy(self)
    940     """
    941     # TODO(slebedev): Consider avoiding a copy for non-CPU or remote tensors.
--> 942     maybe_arr = self._numpy()  # pylint: disable=protected-access
    943     return maybe_arr.copy() if isinstance(maybe_arr, np.ndarray) else maybe_arr
    944 

/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py in _numpy(self)
    908       return self._numpy_internal()
    909     except core._NotOkStatusException as e:
--> 910       six.raise_from(core._status_to_exception(e.code, e.message), None)
    911 
    912   @property

/opt/conda/lib/python3.7/site-packages/six.py in raise_from(value, from_value)

UnimplementedError: File system scheme '[local]' not implemented (file: 'keras-tuner-dir/jigsaw-multilingual/trial_5a9ddcb2f29ca8ba91966d1ca1862a84/checkpoints/epoch_0/checkpoint_temp_59890bc9cb8a40159f0a31dc22a070ff')
	Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.

created time in a month

pull request commenttensorflow/tensorflow

Fix an exception on using tf.keras.utils.Sequence with tf.distribute.MirroredStrategy under multi-GPU environment

Could you add a test case for that to showcase the failure in the multi-GPU case?

debuggerD

comment created time in a month

issue commenttensorflow/tensorflow

Layer Names tf.keras.applications vs keras.applications not matching

Sure. Sorry for the late reply.

MiWeiss

comment created time in a month


issue closedtensorflow/tensorflow

[Proposal] use numpy.asarray_chkfinite in tf.keras.backend.cast_to_floatx

Maybe we can use numpy.asarray_chkfinite instead of numpy.asarray in tf.keras.backend.cast_to_floatx?

@keras_export('keras.backend.cast_to_floatx')
def cast_to_floatx(x):
  if isinstance(x, (ops.Tensor,
                    variables_module.Variable,
                    sparse_tensor.SparseTensor)):
    return math_ops.cast(x, dtype=floatx())
  return np.asarray(x, dtype=floatx())

It can help prevent users from passing a value which contains None, inf, np.nan or np.inf as input.

related to issue #37196
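For reference, the behavioral difference in a minimal demo:

import numpy as np

np.asarray([1.0, np.nan], dtype="float32")
# -> array([ 1., nan], dtype=float32), silently accepted

np.asarray_chkfinite([1.0, np.nan], dtype="float32")
# -> ValueError: array must not contain infs or NaNs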

closed time in a month

howl-anderson

issue commenttensorflow/tensorflow

[Proposal] use numpy.asarray_chkfinite in tf.keras.backend.cast_to_floatx

Replied on #37196. See https://github.com/tensorflow/tensorflow/pull/37634#issuecomment-635080461 for more details.

howl-anderson

comment created time in a month

PR closed tensorflow/tensorflow

[Feature] use np.asarray_chkfinite in tf.keras.backend.cast_to_floatx awaiting review cla: yes comp:keras size:XS

Use numpy.asarray_chkfinite instead of numpy.asarray in tf.keras.backend.cast_to_floatx.

It can help prevent users from passing a value which contains None, inf, np.nan or np.inf as input.

related to issue #37196 fixes #37627

+1 -1

1 comment

1 changed file

howl-anderson

pr closed time in a month

pull request commenttensorflow/tensorflow

[Feature] use np.asarray_chkfinite in tf.keras.backend.cast_to_floatx

Thanks for sending the PR, and sorry for the late reply.

From the API perspective, I think we probably want to keep the existing behavior. Casting an np array with NaN in it might be a legit use case. For the particular issue #37196, I think the proper fix is to add a check in the L2 regularizer.
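For context, a minimal sketch of the kind of check meant (the helper name comes from the follow-up PR; the body here is illustrative, not the exact Keras implementation):

import math


def _check_penalty_number(x):
    """Reject penalty values that are not real, finite numbers (None, NaN, inf, ...)."""
    if not isinstance(x, (float, int)):
        raise ValueError(
            "Value {!r} is not a valid regularization penalty; expected a float or an int.".format(x)
        )
    if math.isinf(x) or math.isnan(x):
        raise ValueError(
            "Value {!r} is not a valid regularization penalty; it must be finite.".format(x)
        )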

I am closing this PR; feel free to send a new PR for the regularizer fix.

Thanks.

howl-anderson

comment created time in a month

Pull request review commenttensorflow/addons

Zoneout LSTM Cell

 /tensorflow_addons/optimizers/yogi.py @manzilz
 /tensorflow_addons/optimizers/tests/yogi_test.py @manzilz
-/tensorflow_addons/rnn/cell.py @qlzh727 @pedrolarben
-/tensorflow_addons/rnn/tests/cell_test.py @qlzh727 @pedrolarben
+/tensorflow_addons/rnn/cell.py @qlzh727 @pedrolarben @failure-to-thrive

I think we probably want to split the file in the future, so that PRs/issues will be forwarded to the correct person.

failure-to-thrive

comment created time in a month

Pull request review commenttensorflow/addons

Zoneout LSTM Cell

 def get_config(self):
         }
         base_config = super().get_config()
         return {**base_config, **config}
+
+
+@tf.keras.utils.register_keras_serializable(package="Addons")
+class ZoneoutLSTMCell(keras.layers.LSTMCell):
+    """LSTM cell with recurrent zoneout.
+
+    https://arxiv.org/abs/1606.01305
+    """
+
+    @typechecked
+    def __init__(
+        self,
+        units: int,
+        zoneout_h: float = 0,
+        zoneout_c: float = 0,
+        seed: int = None,
+        **kwargs
+    ):
+        """
+        """
+        super().__init__(units, **kwargs)
+        self.zoneout_h = zoneout_h
+        self.zoneout_c = zoneout_c
+        self.seed = seed
+
+    def _zoneout(self, t, tm1, rate, training):
+        dt = tf.cast(
+            tf.random.uniform(t.shape, seed=self.seed) >= rate * training, t.dtype
+        )
+        return dt * t + (1 - dt) * tm1
+
+    def call(self, inputs, states, training=None):
+        if training is None:
+            training = keras.backend.learning_phase()
+        output, new_states = super().call(inputs, states, training)

Since the output is not used here, probably replace it with _.

failure-to-thrive

comment created time in a month

Pull request review commenttensorflow/addons

Zoneout LSTM Cell

 def get_config(self):
         }
         base_config = super().get_config()
         return {**base_config, **config}
+
+
+@tf.keras.utils.register_keras_serializable(package="Addons")
+class ZoneoutLSTMCell(keras.layers.LSTMCell):
+    """LSTM cell with recurrent zoneout.
+
+    https://arxiv.org/abs/1606.01305
+    """
+
+    @typechecked
+    def __init__(
+        self,
+        units: int,
+        zoneout_h: float = 0,
+        zoneout_c: float = 0,
+        seed: int = None,
+        **kwargs
+    ):
+        """
+        """
+        super().__init__(units, **kwargs)
+        self.zoneout_h = zoneout_h
+        self.zoneout_c = zoneout_c
+        self.seed = seed
+
+    def _zoneout(self, t, tm1, rate, training):
+        dt = tf.cast(
+            tf.random.uniform(t.shape, seed=self.seed) >= rate * training, t.dtype

Please note that training can be a tensor, which means an if check will yield an error. In the case where you want to disable the zoneout in inference mode, please use tf.cond. Please check keras.backend.dropout as an example.
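A minimal sketch of that pattern (illustrative only, not the final cell implementation), where training may be a Python bool or a scalar tensor:

import tensorflow as tf


def zoneout(t, tm1, rate, training, seed=None):
    """Zoneout of new state `t` against previous state `tm1`, safe for tensor `training`."""

    def zoned():
        # Training branch: randomly keep some units at their previous value.
        keep = tf.cast(tf.random.uniform(tf.shape(t), seed=seed) >= rate, t.dtype)
        return keep * t + (1.0 - keep) * tm1

    # tf.cond works whether `training` is a Python bool or a scalar bool tensor;
    # in inference the new state is passed through unchanged (zoneout disabled).
    return tf.cond(tf.cast(training, tf.bool), zoned, lambda: t)


# e.g. zoneout(tf.ones([2, 3]), tf.zeros([2, 3]), rate=0.3, training=tf.constant(True))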

failure-to-thrive

comment created time in a month

issue commenttensorflow/tensorflow

Bidirectional LSTM fail on TF2.0

I think there is some issue randomly happening in the cudnn kernel, and yet I still don't have a stable way to reproduce it.

@kaixih, could you check this more on the Nvidia side? There isn't much action I can take on my end.

PierrePivert

comment created time in a month

issue commenttensorflow/tensorflow

Request to have ConvLSTM2D for TFLite

I think the code failed since it is trying to use keras rather than tf.keras. The implementation details of tf.keras and keras have diverged significantly, and they can't be interchanged with each other.

I have tried the code with nightly tf.keras and it worked.
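For illustration, the import difference being referred to (minimal form; the rest of the reporter's model is not shown here):

# tf.keras implementation (the one that worked with the nightly build):
from tensorflow.keras.layers import ConvLSTM2D

# Standalone Keras implementation (different internals, not interchangeable with tf.keras):
# from keras.layers import ConvLSTM2D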

beniroquai

comment created time in a month

fork qlzh727/keras-tuner

Hyperparameter tuning for humans

fork in a month

issue commenttensorflow/tensorflow

LSTM, cudnn and masking

Thanks @kaixih for the findings. Please let us know if and when this will be fixed on the nvidia side.

gargargar

comment created time in a month

issue commenttensorflow/tensorflow

Weird block of RNN in TF2.2

Thanks for reporting the issue. I think there is some regression/corner case in tf.function. I will check with the core team first and reply.

BlueFisher

comment created time in a month

issue closedtensorflow/tensorflow

Different behavior tf.keras and Keras for `stateful=True`

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): All platforms (tested on Ubuntu 18.04 and macOS)
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 2.1
  • Python version: 3.8

Describe the current behavior The original Keras API specifies that, when an LSTM is set stateful=True, its batch size must be known beforehand (by specifying batch_shape). The same is true for tf.keras, but it adds another hidden requirement that was not there in the original Keras: tf.keras requires that the full input shape (including batch size) is known. If one of the dimensions is None, it emits the "If a RNN is stateful, it needs to know its batch size." error.

Describe the expected behavior As with the Keras API, it should be allowed to have None dimensions besides the batch_size.

Code to reproduce the issue Keras:

from keras.models import Model
from keras.layers import Input, LSTM, Reshape

def model():
    input_layer = Input(batch_shape=(1, None))
    reshape_layer = Reshape((1, 100))(input_layer)
    lstm_layer = LSTM(units=100, stateful=True)(reshape_layer)
    return Model(inputs=input_layer, outputs=lstm_layer)

model = model()

# Code runs perfectly fine.

tf.keras:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Reshape

def model():
    input_layer = Input(batch_shape=(1, None))
    reshape_layer = Reshape((1, 100))(input_layer)
    lstm_layer = LSTM(units=100, stateful=True)(reshape_layer)
    return Model(inputs=input_layer, outputs=lstm_layer)

model = model()

"""
ValueError: If a RNN is stateful, it needs to know its batch size. Specify the batch size of your input tensors: 
- If using a Sequential model, specify the batch size by passing a `batch_input_shape` argument to your first layer.
- If using the functional API, specify the batch size by passing a `batch_shape` argument to your Input layer.
"""

There are some very legitimate use cases for allowing non-batch dimensions to be unknown! This change in functionality prevents me from migrating a (variable) multi-stream CNN model from Keras to tf.keras.

closed time in a month

gerwin3

issue commenttensorflow/tensorflow

Different behavior tf.keras and Keras for `stateful=True`

This should now be fixed.

gerwin3

comment created time in a month

issue commenttensorflow/tensorflow

LSTM, cudnn and masking

Thanks @kaixih for the debug, let me add @guptapriya from distribution strategy team.

btw, when the error occurs with CUDNN_STATUS_EXECUTION_FAILED, what's the error detail from the cudnn kernel?

gargargar

comment created time in a month

issue commenttensorflow/tensorflow

LSTM, cudnn and masking

Seems to be a cudnn kernel issue, which might be related to https://github.com/tensorflow/tensorflow/issues/33148.

Adding @kaixih from Nvidia side for more insights.

gargargar

comment created time in 2 months

issue closedtensorflow/tensorflow

TimeDistributed(Dropout()) with the same dropout mask

System information

  • TensorFlow version (you are using): 1.14
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Here is an example block of my code. I am trying to apply a time-distributed dropout to the output of a many-to-many GRU. I would like the dropout to use the same dropout mask for all time steps. However, I did not find a solution for this based on the current API. Did I miss anything, or is it a new feature on the roadmap? Thanks a lot!

from tensorflow.keras.layers import Dense, Input, GRU, Dropout, TimeDistributed
from tensorflow.keras.regularizers import l2  # needed for the l2() penalties below
x = TimeDistributed(Dense(512, activation='relu', kernel_regularizer=l2(1e-5), \
                bias_regularizer=l2(1e-5), name='cam_fc'))(input_tensor)
out = GRU(
                512,
                dropout=0.1,
                recurrent_dropout=0.1,
                activation='relu', 
                kernel_regularizer=l2(1e-5),
                bias_regularizer=l2(1e-5),
                return_sequences=True, 
                name='intentNet_gru')(x, training=self.is_train)

out = TimeDistributed(Dropout(0.1))(out, training=self.is_train)

closed time in 2 months

bzhong2
more