Dense does not flatten inputs with rank >2 and behaves exactly like TimeDistributed(Dense)

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.6
  • TensorFlow installed from (source or binary): from pip install
  • TensorFlow version (use command below): v2.0.0-beta0-16-g1d91213fe7 2.0.0-beta1
  • Python version: v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14

Describe the current behavior

A note in the Dense documentation says:

Note: If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

I don't see this happening in practice. Instead, on a rank-3 tensor, Dense behaves exactly as it would if wrapped in a TimeDistributed layer, which makes me question the utility of TimeDistributed at all.
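For reference, here is a minimal sketch of the behavior I actually observe; the tensordot formulation is my own illustration of the apparent behavior, not a claim about the internal implementation:

import tensorflow as tf

# Sketch: on a rank-3 input, Dense appears to contract only the last axis
# with the kernel, roughly like a tensordot over the final dimension.
x = tf.random.normal((1, 5, 3))        # (batch, timesteps, features)
kernel = tf.random.normal((3, 2))      # (features, units)
bias = tf.zeros((2,))
out = tf.tensordot(x, kernel, axes=[[2], [0]]) + bias
print(out.shape)  # (1, 5, 2): the last axis is transformed, nothing is flattened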

Describe the expected behavior

Dense should flatten its input as the documentation says. In the first example below, the kernel of dense should have shape (5 * 3, 2) = (15, 2) instead of (3, 2). Its actual shape, (3, 2), matches the kernel shape of dense2, where that shape is expected because dense2 is explicitly wrapped in TimeDistributed.
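For comparison, this is what the documented flattening would look like if done explicitly (my own illustration using Flatten; the docs do not give this code):

import tensorflow as tf

# Sketch of what the documentation describes: flattening a (1, 5, 3) input
# to (1, 15) before the dot product would require a kernel of shape (15, 2).
x = tf.random.normal((1, 5, 3))
flat = tf.keras.layers.Flatten()(x)    # (1, 15)
dense = tf.keras.layers.Dense(2)
print(dense(flat).shape)               # (1, 2)
print(dense.kernel.shape)              # (15, 2), as the note would suggest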

Code to reproduce the issue

First example:

import tensorflow as tf
import numpy as np

print('Using Tensorflow version {} (git version {})'.format(tf.version.VERSION, tf.version.GIT_VERSION))

tf.random.set_seed(12)
np.random.seed(12)

init = tf.keras.initializers.GlorotUniform(seed=12)

inp = tf.constant(np.random.normal(0, 1, (1, 5, 6)))
inp = tf.cast(inp, dtype=tf.float32)

gru = tf.keras.layers.GRU(3, return_sequences=True)(inp)
print(gru.shape)
#(1, 5, 3)

dense = tf.keras.layers.Dense(2, kernel_initializer=init, bias_initializer=init)
print(dense(gru))
#tf.Tensor(
#[[[ 1.5456871  -0.5280464 ]
#  [ 0.11647969 -0.20553198]
#  [ 0.58126366 -0.16031623]
#  [-0.22882831 -0.22649539]
#  [ 0.62777793 -0.32470667]]], shape=(1, 5, 2), dtype=float32)

for w in dense.weights:
    print(w.shape)
#(3, 2) instead of (5 * 3, 2) if Dense indeed flattened its input
#(2,)

tddense = tf.keras.layers.TimeDistributed(dense)
print(tddense(gru))
#tf.Tensor(
#[[[ 1.5456871  -0.5280464 ]
#  [ 0.11647969 -0.20553198]
#  [ 0.58126366 -0.16031623]
#  [-0.22882831 -0.22649539]
#  [ 0.62777793 -0.32470667]]], shape=(1, 5, 2), dtype=float32)
# if the Dense kernel had shape (15, 2), this call would raise:
# InvalidArgumentError: Matrix size-incompatible: In[0]: [5,3], In[1]: [15,2] [Op:MatMul]
# but instead we get the same output as without TimeDistributed,
# with no error

dense2 = tf.keras.layers.Dense(2, kernel_initializer=init, bias_initializer=init)
tddense = tf.keras.layers.TimeDistributed(dense2)
print(tddense(gru))
#tf.Tensor(
#[[[ 1.5456871  -0.5280464 ]
#  [ 0.11647969 -0.20553198]
#  [ 0.58126366 -0.16031623]
#  [-0.22882831 -0.22649539]
#  [ 0.62777793 -0.32470667]]], shape=(1, 5, 2), dtype=float32)

for w in dense2.weights:
    print(w.shape)
#(3, 2) as expected
#(2,)
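As a quick sanity check (assuming the tensors from the snippet above are still in scope), the two outputs are numerically identical:

import numpy as np

# dense and dense2/tddense were built with the same seeded initializer,
# so their outputs on the same rank-3 input should match exactly.
print(np.allclose(dense(gru).numpy(), tddense(gru).numpy()))
# True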

Second example, with an input of rank even greater than 3:

import tensorflow as tf

print('Using Tensorflow version {} (git version {})'.format(tf.version.VERSION, tf.version.GIT_VERSION))

inp = tf.keras.Input(shape=(10, 25, 25, 3))
dense_layer1 = tf.keras.layers.Dense(78)
x = dense_layer1(inp)
print('Output shape without TimeDistributed:')
print(x.shape)

dense_layer2 = tf.keras.layers.Dense(78)
y = tf.keras.layers.TimeDistributed(dense_layer2)(inp)
print('Output shape with TimeDistributed:')
print(y.shape)

print('Weight shapes without TimeDistributed:')
for weight in dense_layer1.trainable_weights:
    if len(weight.shape) == 2:
        print('    kernel shape:')
    else:
        print('    bias shape:')
    print(weight.shape)
    
print('Weight shapes with TimeDistributed:')
for weight in dense_layer2.trainable_weights:
    if len(weight.shape) == 2:
        print('    kernel shape:')
    else:
        print('    bias shape:')
    print(weight.shape)

which outputs:

Using Tensorflow version 2.0.0-beta1 (git version v2.0.0-beta0-16-g1d91213fe7)
Output shape without TimeDistributed:
(None, 10, 25, 25, 78)
Output shape with TimeDistributed:
(None, 10, 25, 25, 78)
Weight shapes without TimeDistributed:
    kernel shape:
(3, 78)
    bias shape:
(78,)
Weight shapes with TimeDistributed:
    kernel shape:
(3, 78)
    bias shape:
(78,)

This example again shows that Dense and TimeDistributed(Dense) behave the same: both only operate on the last dimension of the input.
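To make this concrete, here is a small sketch (my own illustration) showing that Dense on a rank-3 tensor is equivalent to collapsing the leading axes, applying a single matmul, and restoring the shape, which is exactly the per-timestep computation TimeDistributed performs:

import tensorflow as tf

x = tf.random.normal((2, 10, 4))
dense = tf.keras.layers.Dense(6)
y = dense(x)                                      # (2, 10, 6)

# Manual equivalent: merge the leading axes, matmul over the last axis,
# then restore the original leading shape.
flat = tf.reshape(x, (-1, 4))                     # (20, 4)
manual = tf.reshape(tf.matmul(flat, dense.kernel) + dense.bias, (2, 10, 6))
print(tf.reduce_max(tf.abs(y - manual)).numpy())  # ~0.0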


Answer from pavithrasv:

They are the same. TimeDistributed doesn't just apply to Dense layers, but I see that the main example in the TimeDistributed docs uses a Dense layer; I'll update that.
