
issue closed: tensorflow/tensorflow

tf.data.experimental.make_csv_dataset header flag not working as described

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Custom code
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
os kernel version: #22-Ubuntu SMP Tue Jul 2 13:27:33 UTC 2019
os release version: 5.0.0-21-generic
os platform: Linux-5.0.0-21-generic-x86_64-with-debian-buster-sid
linux distribution: ('debian', 'buster/sid', '')
linux os distribution: ('debian', 'buster/sid', '')
mac version: ('', ('', '', ''), '')
uname: uname_result(system='Linux', node='dev-XPS-13-9343', release='5.0.0-21-generic', version='#22-Ubuntu SMP Tue Jul 2 13:27:33 UTC 2019', machine='x86_64', processor='x86_64')
architecture: ('64bit', '')
machine: x86_64
GNU/Linux
  • TensorFlow installed from (source or binary): binary using pip
  • TensorFlow version (use command below):
Version: 2.0.0b1
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /home/dev/miniconda3/envs/pythonapu/lib/python3.6/site-packages
Required-by: 
  • Python version:
Python 3.6.8 :: Anaconda, Inc.
python version: 3.6.8
python branch: 
python build version: ('default', 'Dec 30 2018 01:22:34')
python compiler version: GCC 7.3.0
python implementation: CPython

Describe the current behavior

tf.data.experimental.make_csv_dataset(header=True) includes the header data in the dataset

Describe the expected behavior

According to the docs:

header: A bool that indicates whether the first rows of provided CSV files
      correspond to header lines with column names, and should not be included
      in the data.

The header row should not be included in the data.

Code to reproduce the issue

Provide a reproducible test case that is the bare minimum necessary to generate the problem.

I have a CSV dataset at /tmp/foo.csv:

A,B,C
1,NA,1
1,NA,1
1,NA,1
1,NA,1

I can run something like the following with header=True:

dataset_file = tf.keras.utils.get_file("foo2" + str(uuid.uuid4()) + ".csv", "file:///tmp/foo.csv")
dataset = tf.data.experimental.make_csv_dataset(
    dataset_file, batch_size=4, header=True,
    label_name="A", na_value='NA', column_names=["A", "B", "C"],
    field_delim=',')
for feature_batch, label_batch in dataset.take(1):
    print(label_batch)
    print("features:")
    for key, value in feature_batch.items():
        print(key + ' ' + value)

which gives:

Traceback (most recent call last):
  File "/home/xyz/workspace/pythonapi/main/services/dataloader.py", line 149, in <module>
    load()
  File "/home/xyz/workspace/pythonapi/main/services/dataloader.py", line 143, in load
    print(key + ' ' + value)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 909, in r_binary_op_wrapper
    x = ops.convert_to_tensor(x, dtype=y.dtype.base_dtype, name="x")
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1100, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1158, in convert_to_tensor_v2
    as_ref=False)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1237, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 305, in _constant_tensor_conversion_function
    return constant(v, dtype=dtype, name=name)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 246, in constant
    allow_broadcast=True)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 254, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/xyz/miniconda3/envs/pythonapu/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 115, in convert_to_eager_tensor
    return ops.EagerTensor(value, handle, device, dtype)
TypeError: Cannot convert provided value to EagerTensor. Provided value: C  Requested dtype: int32

Even though "C" should not be in the dataset.

Other info / logs

With header=False, it runs without error, but not as described in the documentation, since that setting implies the CSV does not contain a header row. It prints:

tf.Tensor([b'1' b'1' b'1' b'A'], shape=(4,), dtype=string)
features:
tf.Tensor([b'B ' b'B ' b'B ' b'B B'], shape=(4,), dtype=string)
tf.Tensor([b'C 1' b'C 1' b'C 1' b'C C'], shape=(4,), dtype=string)

closed time in a day

gridcellcoder

issue comment: tensorflow/tensorflow

tf.data.experimental.make_csv_dataset header flag not working as described

Apologies for the delayed response. Since you don't provide the column_defaults parameter, make_csv_dataset tries to infer the type of your data.

When the header=True flag is set, it ignores the first line of data. Therefore, it infers that column A contains ints, B contains strings, and C contains ints. The issue here is that in the line

print(key + ' ' + value)

key is a Python string, and value is an int tensor (in the A and C case). A string cannot be added to an int tensor, which is why you get the above error.

In the case where header=False, the first (header) line is not ignored. If you read the documentation, the header parameter is "A bool that indicates whether the first rows of provided CSV files correspond to header lines with column names, and should not be included in the data." -- so when header=False, the header line IS included in the data, i.e. it interprets the header line as regular data that's part of your dataset. Because this line has the data "A,B,C", it infers that all your columns are strings. Since string tensors can be added together, you don't get an error.

Does that make sense?
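The failing line can be reproduced and worked around in isolation. Below is a minimal sketch (not from the original report; it assumes TF 2.x eager execution) that builds the same string/int mismatch and sidesteps it with tf.strings:

```python
import tensorflow as tf

# `value` stands in for the int32 feature tensor that make_csv_dataset infers
# for column C; adding a Python str to it raises the TypeError shown above.
value = tf.constant([1, 1, 1, 1], dtype=tf.int32)

# Workaround: convert the tensor to strings before joining.
as_text = tf.strings.as_string(value)                    # [b'1', b'1', b'1', b'1']
joined = tf.strings.reduce_join(as_text, separator=",")  # b'1,1,1,1'
line = tf.strings.join(["C", joined], separator=" ")
print(line.numpy().decode())  # C 1,1,1,1
```

Equivalently, printing key and value.numpy() separately avoids mixing Python strings with tensors at all.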

gridcellcoder

comment created time in a day

issue comment: tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

@Nixon59-lab, you're encountering a different issue (https://github.com/tensorflow/tensorflow/issues/34469) that has since been fixed at head. You might want to try the tf nightly build. This will also be fixed in 2.2.

dineshdharme

comment created time in 2 days

issue comment: tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

@dartlune -- please see my comment above (https://github.com/tensorflow/tensorflow/issues/24520#issuecomment-577325475). Does the workaround of using set_shape help?

dineshdharme

comment created time in 4 days

issue comment: tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

Update: Based on talking to @robieta, keras expects its inputs to have at least known rank (even if dimensions are unknown). So, if the dataset has components with unknown rank, keras will not work.

In some cases, tf.data is not able to statically infer the rank of its outputs (e.g. if you use a py_func), so you have to manually use set_shape to tell the dataset what shapes its outputs are, as @adriancaruana suggested in https://github.com/tensorflow/tensorflow/issues/24520#issuecomment-532958834. Note that you don't have to know the shape fully, you just need to know the number of dimensions. So, you could do something like:

def map_fn(x):
  result_tensor = ...  # e.g. the output of a py_func with statically unknown shape
  # `rank` is the known number of dimensions of result_tensor
  result_tensor.set_shape([None for _ in range(rank)])
  return result_tensor
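As a concrete (hypothetical) illustration of the pattern above, assume a tf.py_function loader whose output is known to be rank 2:

```python
import numpy as np
import tensorflow as tf

def load(i):
    # Stand-in loader; tf.data cannot statically infer its output shape.
    return np.full((3, 4), i.numpy(), dtype=np.float32)

def map_fn(i):
    result = tf.py_function(load, [i], tf.float32)
    result.set_shape([None, None])  # rank known (2), dimensions unknown
    return result

ds = tf.data.Dataset.range(2).map(map_fn)
print(ds.element_spec.shape)  # (None, None): rank is now statically known
```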

Does this resolve the issue for all who've encountered it?

On the keras side, we should surface a more informative error. @karmel, can you reassign this to someone on the keras team to add a clearer error message when the input shapes are of unknown rank?

dineshdharme

comment created time in a month

issue comment: tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

There are two separate issues here. @rodyt, I think you're encountering an issue with datasets with unknown shape + distribution strategy, which we've fixed (#34469) at head. For other people, a temporary workaround while we work on a fix is using set_shape if you know the output shapes, as @adriancaruana pointed out.

dineshdharme

comment created time in a month

issue comment: tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

Thanks for flagging this. Looks like we don't check the shape correctly in several different places -- fix in progress.

dineshdharme

comment created time in a month

Pull request review comment: tensorflow/tensorflow

Usage example using IRIS dataset to tf.data.experimental.make_csv_dataset()

 def make_csv_dataset_v2(
   tuple that corresponds to a batch of CSV rows. The features dictionary
   maps feature column names to `Tensor`s containing the corresponding
   feature data, and labels is a `Tensor` containing the batch's label data.
-
+  
+  Usage Example:
+  
+  Using IRIS dataset to show how to convert .csv file into a dataset.
+  
+  ```python

Ah, I wasn't aware earlier that >>> implies doctest, and asked the author to use backtick notation for consistency with the rest of the file. Apologies for the confusion.

boronhub

comment created time in 2 months

Pull request review comment: tensorflow/tensorflow

Usage example using IRIS dataset to tf.data.experimental.make_csv_dataset()

 def make_csv_dataset_v2(
   tuple that corresponds to a batch of CSV rows. The features dictionary
   maps feature column names to `Tensor`s containing the corresponding
   feature data, and labels is a `Tensor` containing the batch's label data.
-
+  
+  Usage Example:
+  
+  Using IRIS dataset to show how to convert .csv file into a dataset.
+  
+    ```python

this code block should be at the same indentation level as line 346

boronhub

comment created time in 2 months

Pull request review comment: tensorflow/tensorflow

Usage example using IRIS dataset to tf.data.experimental.make_csv_dataset()

 def make_csv_dataset_v2(
   tuple that corresponds to a batch of CSV rows. The features dictionary
   maps feature column names to `Tensor`s containing the corresponding
   feature data, and labels is a `Tensor` containing the batch's label data.
-
+  
+  Usage Example:
+  
+  Using IRIS dataset to show how to convert .csv file into a dataset.
+  
+    ```python
+    train_dataset_url = "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"
+    train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url), origin=train_dataset_url)
+
+    column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
+    feature_names = column_names[:-1]
+    label_name = column_names[-1]
+
+    batch_size = 32
+    train_dataset = tf.data.experimental.make_csv_dataset(
+      train_dataset_fp,
+      batch_size,
+      column_names=column_names,
+      label_name=label_name,
+      num_epochs=1)
+    features_batch, labels_batch = next(iter(train_dataset))
+    print(features_batch)
+
+    <OrderedDict([(..., <tf.Tensor: shape=(32,), dtype=float32, numpy=array([...], dtype=float32)>)])>

Use >> for this output line

boronhub

comment created time in 2 months

started: mdda/colab_helper

started time in 2 months

pull request comment: tensorflow/tensorflow

Usage example using IRIS dataset to tf.data.experimental.make_csv_dataset()

I mean in the style of the other example code in the file. Thanks!

boronhub

comment created time in 2 months

Pull request review comment: tensorflow/tensorflow

Usage example using IRIS dataset to tf.data.experimental.make_csv_dataset()

 def make_csv_dataset_v2(
   tuple that corresponds to a batch of CSV rows. The features dictionary
   maps feature column names to `Tensor`s containing the corresponding
   feature data, and labels is a `Tensor` containing the batch's label data.
+  
+  Usage Example:
+  
+  Using IRIS dataset to show how to convert .csv file into a dataset.
+  
+  >>> train_dataset_url = "https://storage.googleapis.com/download.tensorflow.org/data/iris_training.csv"
+  >>> train_dataset_fp = tf.keras.utils.get_file(fname=os.path.basename(train_dataset_url), origin=train_dataset_url)
+  >>>
+  >>> column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']
+  >>> feature_names = column_names[:-1]
+  >>> label_name = column_names[-1]
+  >>>
+  >>> batch_size = 32
+  >>> train_dataset = tf.data.experimental.make_csv_dataset(
+  ...   train_dataset_fp,
+  ...   batch_size,
+  ...   column_names=column_names,
+  ...   label_name=label_name,
+  ...   num_epochs=1)
+  >>> features_batch, labels_batch = next(iter(train_dataset))
+  >>> print(features_batch)
+  <OrderedDict([('sepal_length', <tf.Tensor: shape=(32,), dtype=float32, numpy=

+1

boronhub

comment created time in 2 months

issue comment: tensorflow/tensorflow

ValueError with tf.data.Dataset and model.fit and tf.distribute

The issue here seems to be that your dataset has some elements with (completely) unknown shape, and we fail to handle that case correctly. Thanks for flagging this, I have a fix in review.

Side note: I don't expect the following code to work:

    # Set output shapes, types, and classes
    dataset.output_classes = (
        (tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor),
        (tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor, tf.Tensor),
    )
    dataset.output_types = (
        (tf.dtypes.string, tf.dtypes.int64, tf.dtypes.int64, tf.dtypes.uint8, tf.dtypes.uint8,),
        (tf.dtypes.string, tf.dtypes.int64, tf.dtypes.int64, tf.dtypes.float32, tf.dtypes.float32,),
    )
    dataset.output_shapes = (
        ([], [], [], tf.TensorShape([None, None, 3]), tf.TensorShape([None, None, 3]),),
        ([], [], [], tf.TensorShape([None, None, 1]), tf.TensorShape([None, None, 1]),),
    )

You'd get an error if you try to set dataset output shapes/types explicitly... unless you meant this as pseudocode?
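For reference, here is a sketch of a supported way to declare element dtypes and shapes up front, using Dataset.from_generator. The output_signature argument shown exists in newer TF releases (older releases take output_types/output_shapes arguments instead), and the generator and shapes below are illustrative, not from the original report:

```python
import numpy as np
import tensorflow as tf

def gen():
    # Illustrative generator yielding (name, image) pairs.
    yield "id-0", np.zeros((8, 8, 3), np.float32)

# Declare dtypes and (partially unknown) shapes explicitly via the signature,
# instead of assigning to dataset.output_shapes/output_types after the fact.
ds = tf.data.Dataset.from_generator(
    gen,
    output_signature=(
        tf.TensorSpec(shape=(), dtype=tf.string),
        tf.TensorSpec(shape=(None, None, 3), dtype=tf.float32),
    ),
)
print(ds.element_spec)
```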

Bidski

comment created time in 2 months

issue closed: tensorflow/tensorflow

tf.data.Dataset fixed size batching with subsequent map() under tf.distribute.MirroredStrategy leads to a crash

System information

The same environment as in https://github.com/tensorflow/tensorflow/issues/33531

Code to reproduce the issue

It took me a few weeks of debugging to reproduce! IMPORTANT: I DO NOT THINK IT WILL REPRODUCE IN COLAB, YOU NEED AT LEAST 2 GPUS.

#!/usr/bin/env python3
import sys
import tensorflow as tf

def main():
    strategy = tf.distribute.MirroredStrategy()
    batch_size = 12
    features_shape = 372, 558, 3
    labels = 10
    sample = tf.random.uniform(features_shape)

    def batch_print(b, l):
        tf.print("shape", b.shape, tf.shape(b))
        tf.print(b[10])  # <<< crash here
        return b, l

    ds_train = tf.data.Dataset.from_tensors([sample]).map(lambda s: (tf.squeeze(s), tf.ones((labels,)))) \
        .repeat().batch(batch_size, drop_remainder=True).map(batch_print)
    ds_val = tf.data.Dataset.from_tensors([sample]).map(lambda s: (tf.squeeze(s), tf.ones((labels,)))) \
        .repeat().batch(batch_size, drop_remainder=True).take(10)

    import tensorflow_core.python.keras.layers
    original_input = tensorflow_core.python.keras.layers.Input

    def create_input(*args, **kwargs):
        return original_input(*args, batch_size=batch_size, **kwargs)

    # monkey-patch the input layer to ensure the fixed tensor shape
    tensorflow_core.python.keras.layers.Input = create_input

    with strategy.scope():
        model = tf.keras.applications.DenseNet121(
            weights=None, input_shape=features_shape, classes=labels)
        model.build((batch_size,) + features_shape)
        model.summary()
        optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.001)
        cross_entropy = tf.keras.losses.CategoricalCrossentropy(label_smoothing=0.1)
        model.compile(optimizer=optimizer, loss=cross_entropy, metrics=["accuracy"])
    model.fit(ds_train, validation_data=ds_val, epochs=1, steps_per_epoch=100)


if __name__ == "__main__":
    sys.exit(main())

As you see, I am feeding a tf.data.Dataset pipeline to a Keras model under tf.distribute.MirroredStrategy. In my case, there are 4 GPUs. Here is the log which indicates a crash:

<details> <summary>Full log</summary> <pre>
2019-11-06 11:09:37.077575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1, 2, 3
2019-11-06 11:09:37.077858: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-06 11:09:37.077880: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0 1 2 3
2019-11-06 11:09:37.077894: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N Y N N
2019-11-06 11:09:37.077904: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 1: Y N N N
2019-11-06 11:09:37.077914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 2: N N N Y
2019-11-06 11:09:37.077923: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 3: N N Y N
2019-11-06 11:09:37.084775: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:0 with 10470 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
2019-11-06 11:09:37.086075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:1 with 10470 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0, compute capability: 6.1)
2019-11-06 11:09:37.087140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:2 with 10470 MB memory) -> physical GPU (device: 2, name: GeForce GTX 1080 Ti, pci bus id: 0000:82:00.0, compute capability: 6.1)
2019-11-06 11:09:37.088126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/device:GPU:3 with 10470 MB memory) -> physical GPU (device: 3, name: GeForce GTX 1080 Ti, pci bus id: 0000:83:00.0, compute capability: 6.1)
Model: "densenet121"


Layer (type)                     Output Shape          Param #    Connected to
================================================================================
input_1 (InputLayer)             [(3, 372, 558, 3)]    0
zero_padding2d (ZeroPadding2D)   (3, 378, 564, 3)      0          input_1[0][0]
conv1/conv (Conv2D)              (3, 186, 279, 64)     9408       zero_padding2d[0][0]
conv1/bn (BatchNormalization)    (3, 186, 279, 64)     256        conv1/conv[0][0]
conv1/relu (Activation)          (3, 186, 279, 64)     0          conv1/bn[0][0]
(... remaining DenseNet121 layer rows elided; the pasted summary is truncated ...)


conv4_block7_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block7_1_relu[0][0]


conv4_block7_concat (Concatenat (3, 23, 35, 480) 0 conv4_block6_concat[0][0] conv4_block7_2_conv[0][0]


conv4_block8_0_bn (BatchNormali (3, 23, 35, 480) 1920 conv4_block7_concat[0][0]


conv4_block8_0_relu (Activation (3, 23, 35, 480) 0 conv4_block8_0_bn[0][0]


conv4_block8_1_conv (Conv2D) (3, 23, 35, 128) 61440 conv4_block8_0_relu[0][0]


conv4_block8_1_bn (BatchNormali (3, 23, 35, 128) 512 conv4_block8_1_conv[0][0]


conv4_block8_1_relu (Activation (3, 23, 35, 128) 0 conv4_block8_1_bn[0][0]


conv4_block8_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block8_1_relu[0][0]


conv4_block8_concat (Concatenat (3, 23, 35, 512) 0 conv4_block7_concat[0][0] conv4_block8_2_conv[0][0]


conv4_block9_0_bn (BatchNormali (3, 23, 35, 512) 2048 conv4_block8_concat[0][0]


conv4_block9_0_relu (Activation (3, 23, 35, 512) 0 conv4_block9_0_bn[0][0]


conv4_block9_1_conv (Conv2D) (3, 23, 35, 128) 65536 conv4_block9_0_relu[0][0]


conv4_block9_1_bn (BatchNormali (3, 23, 35, 128) 512 conv4_block9_1_conv[0][0]


conv4_block9_1_relu (Activation (3, 23, 35, 128) 0 conv4_block9_1_bn[0][0]


conv4_block9_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block9_1_relu[0][0]


conv4_block9_concat (Concatenat (3, 23, 35, 544) 0 conv4_block8_concat[0][0] conv4_block9_2_conv[0][0]


conv4_block10_0_bn (BatchNormal (3, 23, 35, 544) 2176 conv4_block9_concat[0][0]


conv4_block10_0_relu (Activatio (3, 23, 35, 544) 0 conv4_block10_0_bn[0][0]


conv4_block10_1_conv (Conv2D) (3, 23, 35, 128) 69632 conv4_block10_0_relu[0][0]


conv4_block10_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block10_1_conv[0][0]


conv4_block10_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block10_1_bn[0][0]


conv4_block10_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block10_1_relu[0][0]


conv4_block10_concat (Concatena (3, 23, 35, 576) 0 conv4_block9_concat[0][0] conv4_block10_2_conv[0][0]


conv4_block11_0_bn (BatchNormal (3, 23, 35, 576) 2304 conv4_block10_concat[0][0]


conv4_block11_0_relu (Activatio (3, 23, 35, 576) 0 conv4_block11_0_bn[0][0]


conv4_block11_1_conv (Conv2D) (3, 23, 35, 128) 73728 conv4_block11_0_relu[0][0]


conv4_block11_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block11_1_conv[0][0]


conv4_block11_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block11_1_bn[0][0]


conv4_block11_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block11_1_relu[0][0]


conv4_block11_concat (Concatena (3, 23, 35, 608) 0 conv4_block10_concat[0][0] conv4_block11_2_conv[0][0]


conv4_block12_0_bn (BatchNormal (3, 23, 35, 608) 2432 conv4_block11_concat[0][0]


conv4_block12_0_relu (Activatio (3, 23, 35, 608) 0 conv4_block12_0_bn[0][0]


conv4_block12_1_conv (Conv2D) (3, 23, 35, 128) 77824 conv4_block12_0_relu[0][0]


conv4_block12_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block12_1_conv[0][0]


conv4_block12_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block12_1_bn[0][0]


conv4_block12_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block12_1_relu[0][0]


conv4_block12_concat (Concatena (3, 23, 35, 640) 0 conv4_block11_concat[0][0] conv4_block12_2_conv[0][0]


conv4_block13_0_bn (BatchNormal (3, 23, 35, 640) 2560 conv4_block12_concat[0][0]


conv4_block13_0_relu (Activatio (3, 23, 35, 640) 0 conv4_block13_0_bn[0][0]


conv4_block13_1_conv (Conv2D) (3, 23, 35, 128) 81920 conv4_block13_0_relu[0][0]


conv4_block13_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block13_1_conv[0][0]


conv4_block13_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block13_1_bn[0][0]


conv4_block13_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block13_1_relu[0][0]


conv4_block13_concat (Concatena (3, 23, 35, 672) 0 conv4_block12_concat[0][0] conv4_block13_2_conv[0][0]


conv4_block14_0_bn (BatchNormal (3, 23, 35, 672) 2688 conv4_block13_concat[0][0]


conv4_block14_0_relu (Activatio (3, 23, 35, 672) 0 conv4_block14_0_bn[0][0]


conv4_block14_1_conv (Conv2D) (3, 23, 35, 128) 86016 conv4_block14_0_relu[0][0]


conv4_block14_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block14_1_conv[0][0]


conv4_block14_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block14_1_bn[0][0]


conv4_block14_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block14_1_relu[0][0]


conv4_block14_concat (Concatena (3, 23, 35, 704) 0 conv4_block13_concat[0][0] conv4_block14_2_conv[0][0]


conv4_block15_0_bn (BatchNormal (3, 23, 35, 704) 2816 conv4_block14_concat[0][0]


conv4_block15_0_relu (Activatio (3, 23, 35, 704) 0 conv4_block15_0_bn[0][0]


conv4_block15_1_conv (Conv2D) (3, 23, 35, 128) 90112 conv4_block15_0_relu[0][0]


conv4_block15_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block15_1_conv[0][0]


conv4_block15_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block15_1_bn[0][0]


conv4_block15_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block15_1_relu[0][0]


conv4_block15_concat (Concatena (3, 23, 35, 736) 0 conv4_block14_concat[0][0] conv4_block15_2_conv[0][0]


conv4_block16_0_bn (BatchNormal (3, 23, 35, 736) 2944 conv4_block15_concat[0][0]


conv4_block16_0_relu (Activatio (3, 23, 35, 736) 0 conv4_block16_0_bn[0][0]


conv4_block16_1_conv (Conv2D) (3, 23, 35, 128) 94208 conv4_block16_0_relu[0][0]


conv4_block16_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block16_1_conv[0][0]


conv4_block16_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block16_1_bn[0][0]


conv4_block16_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block16_1_relu[0][0]


conv4_block16_concat (Concatena (3, 23, 35, 768) 0 conv4_block15_concat[0][0] conv4_block16_2_conv[0][0]


conv4_block17_0_bn (BatchNormal (3, 23, 35, 768) 3072 conv4_block16_concat[0][0]


conv4_block17_0_relu (Activatio (3, 23, 35, 768) 0 conv4_block17_0_bn[0][0]


conv4_block17_1_conv (Conv2D) (3, 23, 35, 128) 98304 conv4_block17_0_relu[0][0]


conv4_block17_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block17_1_conv[0][0]


conv4_block17_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block17_1_bn[0][0]


conv4_block17_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block17_1_relu[0][0]


conv4_block17_concat (Concatena (3, 23, 35, 800) 0 conv4_block16_concat[0][0] conv4_block17_2_conv[0][0]


conv4_block18_0_bn (BatchNormal (3, 23, 35, 800) 3200 conv4_block17_concat[0][0]


conv4_block18_0_relu (Activatio (3, 23, 35, 800) 0 conv4_block18_0_bn[0][0]


conv4_block18_1_conv (Conv2D) (3, 23, 35, 128) 102400 conv4_block18_0_relu[0][0]


conv4_block18_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block18_1_conv[0][0]


conv4_block18_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block18_1_bn[0][0]


conv4_block18_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block18_1_relu[0][0]


conv4_block18_concat (Concatena (3, 23, 35, 832) 0 conv4_block17_concat[0][0] conv4_block18_2_conv[0][0]


conv4_block19_0_bn (BatchNormal (3, 23, 35, 832) 3328 conv4_block18_concat[0][0]


conv4_block19_0_relu (Activatio (3, 23, 35, 832) 0 conv4_block19_0_bn[0][0]


conv4_block19_1_conv (Conv2D) (3, 23, 35, 128) 106496 conv4_block19_0_relu[0][0]


conv4_block19_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block19_1_conv[0][0]


conv4_block19_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block19_1_bn[0][0]


conv4_block19_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block19_1_relu[0][0]


conv4_block19_concat (Concatena (3, 23, 35, 864) 0 conv4_block18_concat[0][0] conv4_block19_2_conv[0][0]


conv4_block20_0_bn (BatchNormal (3, 23, 35, 864) 3456 conv4_block19_concat[0][0]


conv4_block20_0_relu (Activatio (3, 23, 35, 864) 0 conv4_block20_0_bn[0][0]


conv4_block20_1_conv (Conv2D) (3, 23, 35, 128) 110592 conv4_block20_0_relu[0][0]


conv4_block20_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block20_1_conv[0][0]


conv4_block20_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block20_1_bn[0][0]


conv4_block20_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block20_1_relu[0][0]


conv4_block20_concat (Concatena (3, 23, 35, 896) 0 conv4_block19_concat[0][0] conv4_block20_2_conv[0][0]


conv4_block21_0_bn (BatchNormal (3, 23, 35, 896) 3584 conv4_block20_concat[0][0]


conv4_block21_0_relu (Activatio (3, 23, 35, 896) 0 conv4_block21_0_bn[0][0]


conv4_block21_1_conv (Conv2D) (3, 23, 35, 128) 114688 conv4_block21_0_relu[0][0]


conv4_block21_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block21_1_conv[0][0]


conv4_block21_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block21_1_bn[0][0]


conv4_block21_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block21_1_relu[0][0]


conv4_block21_concat (Concatena (3, 23, 35, 928) 0 conv4_block20_concat[0][0] conv4_block21_2_conv[0][0]


conv4_block22_0_bn (BatchNormal (3, 23, 35, 928) 3712 conv4_block21_concat[0][0]


conv4_block22_0_relu (Activatio (3, 23, 35, 928) 0 conv4_block22_0_bn[0][0]


conv4_block22_1_conv (Conv2D) (3, 23, 35, 128) 118784 conv4_block22_0_relu[0][0]


conv4_block22_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block22_1_conv[0][0]


conv4_block22_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block22_1_bn[0][0]


conv4_block22_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block22_1_relu[0][0]


conv4_block22_concat (Concatena (3, 23, 35, 960) 0 conv4_block21_concat[0][0] conv4_block22_2_conv[0][0]


conv4_block23_0_bn (BatchNormal (3, 23, 35, 960) 3840 conv4_block22_concat[0][0]


conv4_block23_0_relu (Activatio (3, 23, 35, 960) 0 conv4_block23_0_bn[0][0]


conv4_block23_1_conv (Conv2D) (3, 23, 35, 128) 122880 conv4_block23_0_relu[0][0]


conv4_block23_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block23_1_conv[0][0]


conv4_block23_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block23_1_bn[0][0]


conv4_block23_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block23_1_relu[0][0]


conv4_block23_concat (Concatena (3, 23, 35, 992) 0 conv4_block22_concat[0][0] conv4_block23_2_conv[0][0]


conv4_block24_0_bn (BatchNormal (3, 23, 35, 992) 3968 conv4_block23_concat[0][0]


conv4_block24_0_relu (Activatio (3, 23, 35, 992) 0 conv4_block24_0_bn[0][0]


conv4_block24_1_conv (Conv2D) (3, 23, 35, 128) 126976 conv4_block24_0_relu[0][0]


conv4_block24_1_bn (BatchNormal (3, 23, 35, 128) 512 conv4_block24_1_conv[0][0]


conv4_block24_1_relu (Activatio (3, 23, 35, 128) 0 conv4_block24_1_bn[0][0]


conv4_block24_2_conv (Conv2D) (3, 23, 35, 32) 36864 conv4_block24_1_relu[0][0]


conv4_block24_concat (Concatena (3, 23, 35, 1024) 0 conv4_block23_concat[0][0] conv4_block24_2_conv[0][0]


pool4_bn (BatchNormalization) (3, 23, 35, 1024) 4096 conv4_block24_concat[0][0]


pool4_relu (Activation) (3, 23, 35, 1024) 0 pool4_bn[0][0]


pool4_conv (Conv2D) (3, 23, 35, 512) 524288 pool4_relu[0][0]


pool4_pool (AveragePooling2D) (3, 11, 17, 512) 0 pool4_conv[0][0]


conv5_block1_0_bn (BatchNormali (3, 11, 17, 512) 2048 pool4_pool[0][0]


conv5_block1_0_relu (Activation (3, 11, 17, 512) 0 conv5_block1_0_bn[0][0]


conv5_block1_1_conv (Conv2D) (3, 11, 17, 128) 65536 conv5_block1_0_relu[0][0]


conv5_block1_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block1_1_conv[0][0]


conv5_block1_1_relu (Activation (3, 11, 17, 128) 0 conv5_block1_1_bn[0][0]


conv5_block1_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block1_1_relu[0][0]


conv5_block1_concat (Concatenat (3, 11, 17, 544) 0 pool4_pool[0][0] conv5_block1_2_conv[0][0]


conv5_block2_0_bn (BatchNormali (3, 11, 17, 544) 2176 conv5_block1_concat[0][0]


conv5_block2_0_relu (Activation (3, 11, 17, 544) 0 conv5_block2_0_bn[0][0]


conv5_block2_1_conv (Conv2D) (3, 11, 17, 128) 69632 conv5_block2_0_relu[0][0]


conv5_block2_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block2_1_conv[0][0]


conv5_block2_1_relu (Activation (3, 11, 17, 128) 0 conv5_block2_1_bn[0][0]


conv5_block2_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block2_1_relu[0][0]


conv5_block2_concat (Concatenat (3, 11, 17, 576) 0 conv5_block1_concat[0][0] conv5_block2_2_conv[0][0]


conv5_block3_0_bn (BatchNormali (3, 11, 17, 576) 2304 conv5_block2_concat[0][0]


conv5_block3_0_relu (Activation (3, 11, 17, 576) 0 conv5_block3_0_bn[0][0]


conv5_block3_1_conv (Conv2D) (3, 11, 17, 128) 73728 conv5_block3_0_relu[0][0]


conv5_block3_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block3_1_conv[0][0]


conv5_block3_1_relu (Activation (3, 11, 17, 128) 0 conv5_block3_1_bn[0][0]


conv5_block3_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block3_1_relu[0][0]


conv5_block3_concat (Concatenat (3, 11, 17, 608) 0 conv5_block2_concat[0][0] conv5_block3_2_conv[0][0]


conv5_block4_0_bn (BatchNormali (3, 11, 17, 608) 2432 conv5_block3_concat[0][0]


conv5_block4_0_relu (Activation (3, 11, 17, 608) 0 conv5_block4_0_bn[0][0]


conv5_block4_1_conv (Conv2D) (3, 11, 17, 128) 77824 conv5_block4_0_relu[0][0]


conv5_block4_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block4_1_conv[0][0]


conv5_block4_1_relu (Activation (3, 11, 17, 128) 0 conv5_block4_1_bn[0][0]


conv5_block4_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block4_1_relu[0][0]


conv5_block4_concat (Concatenat (3, 11, 17, 640) 0 conv5_block3_concat[0][0] conv5_block4_2_conv[0][0]


conv5_block5_0_bn (BatchNormali (3, 11, 17, 640) 2560 conv5_block4_concat[0][0]


conv5_block5_0_relu (Activation (3, 11, 17, 640) 0 conv5_block5_0_bn[0][0]


conv5_block5_1_conv (Conv2D) (3, 11, 17, 128) 81920 conv5_block5_0_relu[0][0]


conv5_block5_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block5_1_conv[0][0]


conv5_block5_1_relu (Activation (3, 11, 17, 128) 0 conv5_block5_1_bn[0][0]


conv5_block5_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block5_1_relu[0][0]


conv5_block5_concat (Concatenat (3, 11, 17, 672) 0 conv5_block4_concat[0][0] conv5_block5_2_conv[0][0]


conv5_block6_0_bn (BatchNormali (3, 11, 17, 672) 2688 conv5_block5_concat[0][0]


conv5_block6_0_relu (Activation (3, 11, 17, 672) 0 conv5_block6_0_bn[0][0]


conv5_block6_1_conv (Conv2D) (3, 11, 17, 128) 86016 conv5_block6_0_relu[0][0]


conv5_block6_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block6_1_conv[0][0]


conv5_block6_1_relu (Activation (3, 11, 17, 128) 0 conv5_block6_1_bn[0][0]


conv5_block6_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block6_1_relu[0][0]


conv5_block6_concat (Concatenat (3, 11, 17, 704) 0 conv5_block5_concat[0][0] conv5_block6_2_conv[0][0]


conv5_block7_0_bn (BatchNormali (3, 11, 17, 704) 2816 conv5_block6_concat[0][0]


conv5_block7_0_relu (Activation (3, 11, 17, 704) 0 conv5_block7_0_bn[0][0]


conv5_block7_1_conv (Conv2D) (3, 11, 17, 128) 90112 conv5_block7_0_relu[0][0]


conv5_block7_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block7_1_conv[0][0]


conv5_block7_1_relu (Activation (3, 11, 17, 128) 0 conv5_block7_1_bn[0][0]


conv5_block7_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block7_1_relu[0][0]


conv5_block7_concat (Concatenat (3, 11, 17, 736) 0 conv5_block6_concat[0][0] conv5_block7_2_conv[0][0]


conv5_block8_0_bn (BatchNormali (3, 11, 17, 736) 2944 conv5_block7_concat[0][0]


conv5_block8_0_relu (Activation (3, 11, 17, 736) 0 conv5_block8_0_bn[0][0]


conv5_block8_1_conv (Conv2D) (3, 11, 17, 128) 94208 conv5_block8_0_relu[0][0]


conv5_block8_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block8_1_conv[0][0]


conv5_block8_1_relu (Activation (3, 11, 17, 128) 0 conv5_block8_1_bn[0][0]


conv5_block8_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block8_1_relu[0][0]


conv5_block8_concat (Concatenat (3, 11, 17, 768) 0 conv5_block7_concat[0][0] conv5_block8_2_conv[0][0]


conv5_block9_0_bn (BatchNormali (3, 11, 17, 768) 3072 conv5_block8_concat[0][0]


conv5_block9_0_relu (Activation (3, 11, 17, 768) 0 conv5_block9_0_bn[0][0]


conv5_block9_1_conv (Conv2D) (3, 11, 17, 128) 98304 conv5_block9_0_relu[0][0]


conv5_block9_1_bn (BatchNormali (3, 11, 17, 128) 512 conv5_block9_1_conv[0][0]


conv5_block9_1_relu (Activation (3, 11, 17, 128) 0 conv5_block9_1_bn[0][0]


conv5_block9_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block9_1_relu[0][0]


conv5_block9_concat (Concatenat (3, 11, 17, 800) 0 conv5_block8_concat[0][0] conv5_block9_2_conv[0][0]


conv5_block10_0_bn (BatchNormal (3, 11, 17, 800) 3200 conv5_block9_concat[0][0]


conv5_block10_0_relu (Activatio (3, 11, 17, 800) 0 conv5_block10_0_bn[0][0]


conv5_block10_1_conv (Conv2D) (3, 11, 17, 128) 102400 conv5_block10_0_relu[0][0]


conv5_block10_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block10_1_conv[0][0]


conv5_block10_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block10_1_bn[0][0]


conv5_block10_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block10_1_relu[0][0]


conv5_block10_concat (Concatena (3, 11, 17, 832) 0 conv5_block9_concat[0][0] conv5_block10_2_conv[0][0]


conv5_block11_0_bn (BatchNormal (3, 11, 17, 832) 3328 conv5_block10_concat[0][0]


conv5_block11_0_relu (Activatio (3, 11, 17, 832) 0 conv5_block11_0_bn[0][0]


conv5_block11_1_conv (Conv2D) (3, 11, 17, 128) 106496 conv5_block11_0_relu[0][0]


conv5_block11_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block11_1_conv[0][0]


conv5_block11_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block11_1_bn[0][0]


conv5_block11_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block11_1_relu[0][0]


conv5_block11_concat (Concatena (3, 11, 17, 864) 0 conv5_block10_concat[0][0] conv5_block11_2_conv[0][0]


conv5_block12_0_bn (BatchNormal (3, 11, 17, 864) 3456 conv5_block11_concat[0][0]


conv5_block12_0_relu (Activatio (3, 11, 17, 864) 0 conv5_block12_0_bn[0][0]


conv5_block12_1_conv (Conv2D) (3, 11, 17, 128) 110592 conv5_block12_0_relu[0][0]


conv5_block12_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block12_1_conv[0][0]


conv5_block12_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block12_1_bn[0][0]


conv5_block12_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block12_1_relu[0][0]


conv5_block12_concat (Concatena (3, 11, 17, 896) 0 conv5_block11_concat[0][0] conv5_block12_2_conv[0][0]


conv5_block13_0_bn (BatchNormal (3, 11, 17, 896) 3584 conv5_block12_concat[0][0]


conv5_block13_0_relu (Activatio (3, 11, 17, 896) 0 conv5_block13_0_bn[0][0]


conv5_block13_1_conv (Conv2D) (3, 11, 17, 128) 114688 conv5_block13_0_relu[0][0]


conv5_block13_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block13_1_conv[0][0]


conv5_block13_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block13_1_bn[0][0]


conv5_block13_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block13_1_relu[0][0]


conv5_block13_concat (Concatena (3, 11, 17, 928) 0 conv5_block12_concat[0][0] conv5_block13_2_conv[0][0]


conv5_block14_0_bn (BatchNormal (3, 11, 17, 928) 3712 conv5_block13_concat[0][0]


conv5_block14_0_relu (Activatio (3, 11, 17, 928) 0 conv5_block14_0_bn[0][0]


conv5_block14_1_conv (Conv2D) (3, 11, 17, 128) 118784 conv5_block14_0_relu[0][0]


conv5_block14_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block14_1_conv[0][0]


conv5_block14_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block14_1_bn[0][0]


conv5_block14_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block14_1_relu[0][0]


conv5_block14_concat (Concatena (3, 11, 17, 960) 0 conv5_block13_concat[0][0] conv5_block14_2_conv[0][0]


conv5_block15_0_bn (BatchNormal (3, 11, 17, 960) 3840 conv5_block14_concat[0][0]


conv5_block15_0_relu (Activatio (3, 11, 17, 960) 0 conv5_block15_0_bn[0][0]


conv5_block15_1_conv (Conv2D) (3, 11, 17, 128) 122880 conv5_block15_0_relu[0][0]


conv5_block15_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block15_1_conv[0][0]


conv5_block15_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block15_1_bn[0][0]


conv5_block15_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block15_1_relu[0][0]


conv5_block15_concat (Concatena (3, 11, 17, 992) 0 conv5_block14_concat[0][0] conv5_block15_2_conv[0][0]


conv5_block16_0_bn (BatchNormal (3, 11, 17, 992) 3968 conv5_block15_concat[0][0]


conv5_block16_0_relu (Activatio (3, 11, 17, 992) 0 conv5_block16_0_bn[0][0]


conv5_block16_1_conv (Conv2D) (3, 11, 17, 128) 126976 conv5_block16_0_relu[0][0]


conv5_block16_1_bn (BatchNormal (3, 11, 17, 128) 512 conv5_block16_1_conv[0][0]


conv5_block16_1_relu (Activatio (3, 11, 17, 128) 0 conv5_block16_1_bn[0][0]


conv5_block16_2_conv (Conv2D) (3, 11, 17, 32) 36864 conv5_block16_1_relu[0][0]


conv5_block16_concat (Concatena (3, 11, 17, 1024) 0 conv5_block15_concat[0][0] conv5_block16_2_conv[0][0]


bn (BatchNormalization) (3, 11, 17, 1024) 4096 conv5_block16_concat[0][0]


relu (Activation) (3, 11, 17, 1024) 0 bn[0][0]


avg_pool (GlobalAveragePooling2 (3, 1024) 0 relu[0][0]


fc1000 (Dense) (3, 10) 10250 avg_pool[0][0]

Total params: 7,047,754
Trainable params: 6,964,106
Non-trainable params: 83,648



This is how the log ends - the crash:

Train for 100 steps, validate for 10 steps
2019-11-06 11:12:15.235702: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamiclibrary libcublas.so.10.0
shape TensorShape([12, 372, 558, 3]) [12 372 558 3]
2019-11-06 11:12:15.481528: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index 10 of dimension 0 out of bounds.
shape TensorShape([12, 372, 558, 3]) [12 372 558 3]
2019-11-06 11:12:15.485747: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index 10 of dimension 0 out of bounds.
shape TensorShape([12, 372, 558, 3]) [12 372 558 3]
2019-11-06 11:12:15.488817: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index 10 of dimension 0 out of bounds.
2019-11-06 11:12:15.489183: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_batch_print_35}} slice index 10 of dimension 0 out of bounds.
         [[{{node strided_slice}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext_2]]
         [[Identity_4/_188]]
2019-11-06 11:12:15.489398: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_batch_print_35}} slice index 10 of dimension 0 out of bounds.
         [[{{node strided_slice}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext_2]]
shape TensorShape([12, 372, 558, 3]) [12 372 558 3]
2019-11-06 11:12:15.494247: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at strided_slice_op.cc:108 : Invalid argument: slice index 10 of dimension 0 out of bounds.
2019-11-06 11:12:15.887854: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: {{function_node __inference_Dataset_map_batch_print_35}} slice index 10 of dimension 0 out of bounds.
         [[{{node strided_slice}}]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext_2]]
         [[replica_2/metrics/accuracy/AssignAddVariableOp_1/_39]]
  1/100 [..............................] - ETA: 3:57:11Traceback (most recent call last):
  File "/user/vmarkovtsev/images/efficientoffice/efficientoffice/shape_bug.py", line 45, in <module>
    sys.exit(main())
  File "/user/vmarkovtsev/images/efficientoffice/efficientoffice/shape_bug.py", line 41, in main
    model.fit(ds_train, validation_data=ds_val, epochs=1, steps_per_epoch=100)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 728, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 324, in fit
    total_epochs=epochs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 123, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 86, in execution_function
    distributed_function(input_fn))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 511, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: 4 root error(s) found.
  (0) Invalid argument:   slice index 10 of dimension 0 out of bounds.
         [[node strided_slice (defined at /local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext_2]]
         [[Identity_4/_188]]
  (1) Invalid argument:   slice index 10 of dimension 0 out of bounds.
         [[node strided_slice (defined at /local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1751) ]]
         [[MultiDeviceIteratorGetNextFromShard]]
         [[RemoteCall]]
         [[IteratorGetNext_2]]
  (2) Cancelled:
  (3) Cancelled:
0 successful operations.
1 derived errors ignored. [Op:__inference_distributed_function_166689]

Function call stack:
distributed_function -> distributed_function -> distributed_function -> distributed_function -> distributed_function -> distributed_function

This is why the bug is so spicy: both the static and dynamic shapes are 12, but if you try to access an element at index 3 or above (3 = 12 / 4), you crash. I am really interested in why.

If you remove drop_remainder=True, the code works.
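The arithmetic behind the crash can be sketched in plain Python (no TensorFlow; the numbers are taken from the report above, and this is an illustration, not TF's actual rebatching implementation): the graph-rewrite rebatching split each global batch of 12 across the 4 replicas into per-replica batches of 3, so a map() written against the advertised size-12 batch indexes out of bounds.

```python
# Plain-Python sketch of how rebatching shrinks the batch that map() sees.
GLOBAL_BATCH = 12
NUM_REPLICAS = 4
PER_REPLICA = GLOBAL_BATCH // NUM_REPLICAS  # 3

batch = list(range(GLOBAL_BATCH))
# Each replica receives one contiguous slice of the global batch:
shards = [batch[r * PER_REPLICA:(r + 1) * PER_REPLICA]
          for r in range(NUM_REPLICAS)]
assert [len(s) for s in shards] == [3, 3, 3, 3]

# A map() that still assumes the global batch size then indexes out of
# bounds, mirroring "slice index 10 of dimension 0 out of bounds":
try:
    shards[0][10]
except IndexError:
    print("index 10 out of bounds; per-replica batch is only", PER_REPLICA)
```

This also shows why dropping drop_remainder=True sidesteps the crash: without it, the static batch dimension is unknown, so nothing promises map() a size-12 batch in the first place.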

closed time in 2 months

vmarkovtsev

issue comment tensorflow/tensorflow

tf.data.Dataset fixed size batching with subsequent map() under tf.distribute.MirroredStrategy leads to a crash

This has to do with how tf.data rebatches datasets for distribution strategy. We recently changed the implementation of rebatching such that this will no longer happen; the fix will be available in TF 2.1. Let us know if this is still an issue.

vmarkovtsev

comment created time in 2 months

issue comment tensorflow/tensorflow

tf.data.experimental.choose_from_datasets freezes if one of the datasets is empty

Hi @thijskooi, I'm not able to reproduce your error with TF 1.14.0 (and at HEAD). Can you provide a repro?

thijskooi

comment created time in 3 months

push event frreiss/tensorflow-fred

Rachel Lim

commit sha c89247a90e206d1983b12e0665cd503ab4dbaf36

Update lmdb_dataset_op_test.cc Use `TensorFlowSrcRoot()` to get the tensorflow root path instead of the "tensorflow" path directly

view details

push time in 3 months

pull request comment tensorflow/tensorflow

Add tests of LMDBDataset

You should use testing::TensorFlowSrcRoot() instead of the tensorflow path directly, i.e.

string src_loc = io::JoinPath(testing::TensorFlowSrcRoot(), kDataFileLoc, kDataFileName);

frreiss

comment created time in 3 months

delete branch rachellim/tensorflow

delete branch : cherrypicks_WKNER

delete time in 3 months

push event rachellim/timey-wimey

Rachel Lim

commit sha 3b72ba54066552fbf88d8fd34245b0f3f3ce0bfb

better error handling

view details

Rachel Lim

commit sha d8ebf5f9fdccdba4dd34ecd36f8dab94d83531e5

Merge branch 'master' of https://github.com/rachellim/timey-wimey

view details

push time in 3 months

PR opened tensorflow/tensorflow

[tf.data] Fix deadlock with Prefetch+ParallelMap

PiperOrigin-RevId: 281851149 Change-Id: I1b776edb68b45eabc9a0e931135470cae1b6e8f1

+12 -8

0 comment

1 changed file

pr created time in 3 months

create branch rachellim/tensorflow

branch : cherrypicks_WKNER

created branch time in 3 months

create branch rachellim/tensorflow

branch : cherrypicks_ZJDYY

created branch time in 3 months

push event rachellim/tensorflow

nuka137

commit sha f7537260013b0b626bd3898e447414908bc6426c

Fix: typo in tensorflow/compiler/xla/g3doc/shapes.md

view details

A. Unique TensorFlower

commit sha 9f495d9a34fece2b8d8c03779a4a5c8e3ee5ea63

TFL: Replace %d with %PRId32 Use %PRId32 to give the correct format specifier for int32_t on any platform. PiperOrigin-RevId: 281835274 Change-Id: I3b20c654367b3ec0cddc4427f1e7e07a31b75e3a

view details

Hye Soo Yang

commit sha 96dfaba7c63972a905bb59e2587f4cf48102a60f

Python 3 Migration. //tensorflow/(contrib|python) PiperOrigin-RevId: 281835503 Change-Id: I675e3188389186941ef55964d31b309d16fddcaa

view details

David Majnemer

commit sha 44a2fcb38dcaa341d161bee719290fbced9b60e5

[XLA] Simplify the SplitF64ToF32 algorithm Compute hi by rounding the input double to float. This should be identical numerically but with fewer operations. PiperOrigin-RevId: 281836757 Change-Id: I982288ead9b4d8986db3ea0a17c29ce92e8dbd96

view details

Shanqing Cai

commit sha c974f7ed5f63278c24165d626e9e5dd63f18f7ae

[tfdbg] Support enable_check_numerics() and enable_dump_debug_info() callback on TPUs - Skip a set of TPU compilation-specific ops from tfdbg's op callbacks. PiperOrigin-RevId: 281836861 Change-Id: Ic7ff59a32eba26d5bb3ee2ac4f5f9166c78928c8

view details

Yanhua Sun

commit sha 0f2f9d54eefc6490609988e7b0eed4b4f747fc39

dropout optimization 1. remove .numpy() call, which is expensive 2. instead of using 1, using tensor(1), which avoid convert_to_tensor cost 3. In the case rate is scale, directly calculate instead of generating ops 4. Instead of calling math_op binary ops, directly call gen_math_op since the inputs are known to be tensor. PiperOrigin-RevId: 281837216 Change-Id: I9f96ae248f7528d715e353826c79df24a055c39a

view details

Yifei Feng

commit sha c5b2ef01e0c9e6758b3e0bf82bd64767555e6c6c

Do not use common script to install bazel the paths in docker image are different. PiperOrigin-RevId: 281837917 Change-Id: Id62738e214ef66ffd17226bc0c90ae9912fd9bf9

view details

Yifei Feng

commit sha 5b8ed8148bbcdf1775aa5140dbd86fcf5d2347b7

add missing "echo". PiperOrigin-RevId: 281838161 Change-Id: I396accac7be4e74f8afa00abb554d5e504e0b4f2

view details

Advait Jain

commit sha dfe292a4a58a255e80e08f46b14cf78b9007eb78

Minor cleanup of visibility. PiperOrigin-RevId: 281839738 Change-Id: Ib2e91e4787b42858722431f3e6f2bafb4a864243

view details

Yunxing Dai

commit sha 455656d1c4906f0ecbd8ec2df02d648db2eaa475

Optimize hash_set insertion/lookup into one operation in CSE. PiperOrigin-RevId: 281840130 Change-Id: Icd9f2298b4bd3f665940e614edde1f25890fb064

view details

TensorFlower Gardener

commit sha 3937a2d43ed23ac5d021befd2f0bd9d243ef165b

Merge pull request #34467 from nuka137:fix_xla_g3doc_shapes PiperOrigin-RevId: 281842189 Change-Id: I7da2eb98e4756228517f98d2df9911c5327747f9

view details

Daniel Situnayake

commit sha 5a1c283c85ba2c83606d1083e1e92c5d0c71320d

Improvements to TensorFlow Lite for Microcontrollers hello_world examples PiperOrigin-RevId: 281843628 Change-Id: I9e175c7b8fcb07395a6e832097975bcb1afcbfc8

view details

Nicolas Vasilache

commit sha af687256f42c247f2afd5833da437fc740e639f8

Move Linalg Transforms that are actually Conversions - NFC PiperOrigin-RevId: 281844602 Change-Id: Idc7cc22f1b3d86f3459696e452c75140690a4e8c

view details

Scott Zhu

commit sha 3b0ce23b5dcb8797b1800493c7efd9148a2cf397

Split the build file for keras/applications. PiperOrigin-RevId: 281844845 Change-Id: I6c6f34f8e377d7740b16a26b1383d017bd597185

view details

George Karpenkov

commit sha fea0136cd58c5abe9a3b7555da727f193d2182cc

[XLA GPU] [NFC] Invert guards in control flow for readability PiperOrigin-RevId: 281846806 Change-Id: I39fd55848f76d325818d7a7ac91e98d9b2427aa6

view details

YoungSeok Yoon

commit sha dd9f5e27a80324500af660f900895fe41fa68778

Use TensorFlowLiteBenchmarkC framework to build iOS benchmark app PiperOrigin-RevId: 281849264 Change-Id: I83aa932c3bfd3922c1f9c4b8e33fcb99f1b954bd

view details

Daniel Situnayake

commit sha 8816bdf4d69577002539cc99609c40f1a492aff8

Add Adafruit devices to TensorFlow Lite for Microcontrollers docs PiperOrigin-RevId: 281850325 Change-Id: Ifbd1be47595fad869cf29294db8b869ba2736344

view details

YoungSeok Yoon

commit sha f560b2745e59ac663bb554fc75304d8d3fa9d34e

Enable "use_gpu" option for iOS benchmarking PiperOrigin-RevId: 281850369 Change-Id: Ia2125e3f464804224c91ba378b30e101f0bf8239

view details

Mihai Maruseac

commit sha 70e2ec9d8744ad22e89857c701ade2555a31aa31

Extract directory contents under modular POSIX filesystem. We also provide tests to make sure all API requirements are satisfied. Just a small sized part of work for modular filesystem plugins. For more details, consult the RFC at https://github.com/tensorflow/community/blob/master/rfcs/20190506-filesystem-plugin-modular-tensorflow.md PiperOrigin-RevId: 281850777 Change-Id: Ie3d81c44611bb1fc648d0adb0b92c6204e5fa654

view details

Rachel Lim

commit sha 93a68feb340aca45b3c30fddc02229fc1d667993

[tf.data] Fix deadlock with Prefetch+ParallelMap PiperOrigin-RevId: 281851149 Change-Id: I1b776edb68b45eabc9a0e931135470cae1b6e8f1

view details

push time in 3 months

delete branch rachellim/tensorflow

delete branch : cherrypicks_9AS72

delete time in 3 months

delete branch rachellim/tensorflow

delete branch : cherrypicks_K34FB

delete time in 3 months

create branch rachellim/tensorflow

branch : cherrypicks_K34FB

created branch time in 3 months

push event rachellim/tensorflow

William D. Irons

commit sha 73f2e5666cce244852c369d7ba4945a6557ab94e

Update README.md for community ppc64le 2.x links Copying what ROCm did, for ppc64le we are providing stable build links for TensorFlow 1.15 and 2.X releases.

view details

Deven Desai

commit sha f5f3ef09671dabdedf6fdd422dda0d144efbd058

[ROCm] Update ROCm CI builds to use ROCm 2.8 This PR/commit updates the Dockerfile.rocm file to use ROCm version 2.8 (from the current 2.6). Switching to ROCm version 2.8, also adds to the requirement of specifying a couple of extra option to the `docker run` command. That change is also a part of this PR/commit.

view details

Hye Soo Yang

commit sha b44cbb442ae7422440d53a9268e5505de76130e9

Python 3 Migration. //tensorflow/(contrib|python) PiperOrigin-RevId: 281786340 Change-Id: I1c0d6428e2560722d7d244d26ec91af93ea73f4a

view details

Hye Soo Yang

commit sha 6be93e0bcab4a438b15191144adaa2025b09ce9b

Python 3 Migration. //tensorflow/(tools|contrib|python) PiperOrigin-RevId: 281786348 Change-Id: Idc03f3af289295fa478e42b2c83a4269c20b6a06

view details

Yifei Feng

commit sha 1067835d21f81d96b091bbf93b6b4e0e430b8893

Make docker has the same bazel version as the env that invokes it. PiperOrigin-RevId: 281789221 Change-Id: I6b2ebbe4bf787bb2e591905c8e5368cfac793e0e

view details

George Karpenkov

commit sha 746b4d5ac7170eb9c181238999e224a87edd236c

[XLA] Enable dot-strength-reduction rewrite for F64 PiperOrigin-RevId: 281790641 Change-Id: I7137839f0e52d190517fb6c5d3c8c7cfc6fb65bc

view details

Christian Sigg

commit sha b5dfef0366f09ae803ec8e09a47ce2941e98f7c1

Change CUDA tests to use print_memref. Swap dimensions in all-reduce-op test. PiperOrigin-RevId: 281791744 Change-Id: I96e48bb7a273936318523f59ea520b4cdf1b6b7d

view details

Yuanzhong Xu

commit sha 7c986d97cc4123172606d4ba83aa40995aeaf9f3

[MLIR:TF] Define VarHandleOp PiperOrigin-RevId: 281797761 Change-Id: I6fb9fa7a416e7849c0f5d180f654fbe5cb65b4a9

view details

A. Unique TensorFlower

commit sha 1703690e1e9e5f4847e8334f5a633d995d1ccde9

Use default strategy in distributed SavedModel tests. PiperOrigin-RevId: 281797938 Change-Id: I9337207d2272e57b8b18c88de790ccaecc92c577

view details

Benjamin Kramer

commit sha d04bfee67922810d4814c874ba4bd0c5fd451243

[XLA:CPU] Remove the global/module-level fast math flags These are deprecated in favor of instruction-level fast math, and most of LLVM's backend code was updated to use those instead. Not having them gives us more fine-grained control of fast math flags without loss of performance. Disabling UnsafeFPMath has the side effect of requiring __truncdfhf2 for double->half conversions, so provide that. Also always allow FMA formation, while it's not IEEE754 compliant it never decreases accuracy. PiperOrigin-RevId: 281801638 Change-Id: I2d96220fefebad4d11b1dab8f75b06ccb88a05bf

view details

Ashwin Murthy

commit sha 82e2a2f18fbc6b81e0b9047a6ea535becd64942e

[TFLite] - Add function inlining pass in the TF to TFLite pass pipeline. It is flag protected and not included by default. PiperOrigin-RevId: 281802221 Change-Id: I0541801b5d6e0ec8ba5a38dc2ec9cb773f072b6a

view details

Benjamin Kramer

commit sha 34f155c87279a3cc3a4cc9fff0bd3f1bce731586

[XLA:CPU] Set the denormal-fp-math function flag This tells LLVM that flushing denormals to zero is safe. This is what TF does and XLA also gets it from TF's thread pool. FTZ is necessary for getting vectorized code with ARM's neon instructions, which don't support denormal numbers. Add a test case to make sure ARM vectorization is working. This still requires fast math until everything is wired up in LLVM's backend. PiperOrigin-RevId: 281806316 Change-Id: I42ac5e83f0018b51003bb6154457b26eebd7e3f7

view details

Yuefeng Zhou

commit sha 713abe5391bbb1f1064c991df49c2e24fd28c405

Enable util_with_v1_optimizers_test.py test on kokoro windows. PiperOrigin-RevId: 281808681 Change-Id: Ifd51194fc65e8e5d50fc930e4e51b9d36067d0d9

view details

TensorFlower Gardener

commit sha 87b09a592974107df8d210ce3e2fcf8cd4ca0e8a

Merge pull request #34320 from ROCmSoftwarePlatform:google_upsrteam_rocm_update_to_rocm28 PiperOrigin-RevId: 281809391 Change-Id: Iee5883227c89f9a2fe8a6f698890febb566233cf

view details

Henry Tan

commit sha dea9dcde1e5369b17e488df19491400cee8d690b

Fixing grpc namespace to fix some OSS build compatibility issue. PiperOrigin-RevId: 281811660 Change-Id: I3dcb7f9f9760c8d4b9463bd9ac0cfd4206543a30

view details

TensorFlower Gardener

commit sha aee43406353af272d75940c17a0bf49aba066159

Merge pull request #33990 from wdirons:update_readme_for_ppc64le_2_0_builds PiperOrigin-RevId: 281812376 Change-Id: Ia6ddda35bf03943a56f8688129481f374ebe2ee2

view details

Srinivas Vasudevan

commit sha 0c48625520e43bad6c3d461ab88ac6e07b1cf812

Expose ndtri and erfinv under tf.math.ndtri and tf.math.erfinv. PiperOrigin-RevId: 281816005 Change-Id: Idded0bb39c0d32288f1bfa3d0288ba5847aa6fc1

view details

Berkin Ilbeyi

commit sha b97f564442116ce4ea9f6f87ad41f6830337ecfa

[XLA] Ensure begin time of async copy doesn't go beyond end time. PiperOrigin-RevId: 281819839 Change-Id: I1cc1c60e561d10f1f996574f05e4e921d77211ef

view details

Andy Ly

commit sha 847701500a43a40cc44ece767021229b67387e8d

Restrict single island graph canonicalization to only canonicalize if there are no target nodes/control rets. PiperOrigin-RevId: 281819882 Change-Id: Id3b67f0a38ac6db2a6a92a6416a2b07e3670f771

view details

Smit Hinsu

commit sha fee453de92f654161a2b8cd74fc914befbdc064c

Lower tf.ZerosLike op of int or float types to tf.BroadCastTo op PiperOrigin-RevId: 281825299 Change-Id: Ie8a191d8b7f60e712850d98a59a3f45b09e4d0de

view details

push time in 3 months

issue comment tensorflow/tensorflow

Dataset.map() with tf.data.experimental.AUTOTUNE runs out of memory when using batch size=1

This was indeed intriguing. I've submitted a fix, which will be in TF 2.1 :)

EduardoGRocha

comment created time in 3 months

delete branch rachellim/tensorflow

delete branch : cherrypicks_I5UCH

delete time in 3 months

PR opened tensorflow/tensorflow

[tf.data] Fix OOM when tf.data map_and_batch is used with num_parallel_calls = autotune, batch_size = 1.

Closes #33516.

PiperOrigin-RevId: 281775472 Change-Id: Ie10cea0ef1515d5aff8e3dddadc069ddee1a5a76

+8 -3

0 comment

2 changed files

pr created time in 3 months

create branch rachellim/tensorflow

branch : cherrypicks_I5UCH

created branch time in 3 months

push event rachellim/tensorflow

Nicolas Vasilache

commit sha 3878d7aa226126baaf6f9fd21c4f5db392f9c7a0

Update Linalg to use std.view Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear. PiperOrigin-RevId: 279073591

view details

Nicolas Vasilache

commit sha 148f07323f97ef54998f28cd95c195064ce2c426

Update Linalg to use std.view Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear. PiperOrigin-RevId: 279073591 Change-Id: I999b9ec25c924cd895b3d72cb301a43d6fc6bd74

view details

Guangda Lai

commit sha b0f61d8fbbc28a09de0017b93171ad99fb3a30be

Add some changes to trigger copybara import again.

view details

Jacques Pienaar

commit sha db3fb53f0964cd0544c137afa831845877491bd6

Add compatible query method to infer type interface A return type that differs from the inferred return type need not indicate that an operation is invalid (e.g., tensor<*xf32> vs tensor<10xf32>) but they should be compatible for the operation to be considered valid. Add method to query if inferred type is compatible with return type. Also add InferTypeOpIntefaceDefault trait that considers equality and compatibility as the same. Currently an op has to opt in to using it explicitly. PiperOrigin-RevId: 279085639

view details

Nicolas Vasilache

commit sha 184d722b6abdc4cde4d04e065dfef2aecaa70feb

Fix parameter name and document option in linalg::promoteSubViews PiperOrigin-RevId: 279086352

view details

TensorFlower Gardener

commit sha a44da74d9408a103e01a009a9f552b905fcf4ea9

Merge pull request #33157 from jvicenti:master PiperOrigin-RevId: 279075967 Change-Id: I2a6b4f6d0be2a8d5d590149acdf98f2904189890

view details

Andy Davis

commit sha 6fc957dba97726ef3c817cf4124fea658fee0833

Add canonicalizer for ViewOp which folds constants into the ViewOp memref shape and layout map strides and offset. PiperOrigin-RevId: 279088023

view details

Christian Sigg

commit sha d38c6a6434455b331f161bd58a2afb489fa2684d

Temporarily disable CUPTI tests on Windows. PiperOrigin-RevId: 279081007 Change-Id: I88810034398dd98ee7c1c3e07f901b87f4bb7c8d

view details

Adrian Kuegel

commit sha a080453a3b8fa2eb1951a9a667fd0462919432bf

Migrate backend_configs from xla_proto_library to tf_proto_library_cc. PiperOrigin-RevId: 279084129 Change-Id: I6b0dd551b14897ec893feef61c8a8056e820e6ca

view details

Adrian Kuegel

commit sha 504753ea04a9a4dd422b5e6e99d49b9432802c00

Migrate hlo_execution_profile_data and hlo_profile_printer_data to tf_proto_library_cc. PiperOrigin-RevId: 279084247 Change-Id: If8346cfcbf3046f4dcccad486eb4c58856922249

view details

Jacques Pienaar

commit sha 7d4eccf17ed1ffd61216bb8b1170bdefcc2c99ab

Add compatible query method to infer type interface A return type that differs from the inferred return type need not indicate that an operation is invalid (e.g., tensor<*xf32> vs tensor<10xf32>) but they should be compatible for the operation to be considered valid. Add method to query if inferred type is compatible with return type. Also add InferTypeOpIntefaceDefault trait that considers equality and compatibility as the same. Currently an op has to opt in to using it explicitly. PiperOrigin-RevId: 279085639 Change-Id: Ic702e4c3f6d0b5fb249ab7ceb9208074df31cd69

view details

Nicolas Vasilache

commit sha af202872507aae6f544e59167a5626d17b9d65bb

Fix parameter name and document option in linalg::promoteSubViews PiperOrigin-RevId: 279086352 Change-Id: Ib16645867f438db8530c353880c349ff29237924

view details

A. Unique TensorFlower

commit sha 700263d02a8b52c0ff4a2fc2d37416f4a8e3b71d

Add canonicalizer for ViewOp which folds constants into the ViewOp memref shape and layout map strides and offset. PiperOrigin-RevId: 279088023 Change-Id: I36794dc276ed15c5b735603981a5d08b2ec5f465

view details

A. Unique TensorFlower

commit sha 1746a9229a7e232bd2869e2ad3d7cb666c34fedc

Re-write a long macro as a template class instead. This will facilitate changes in the functions that were previously defined by macro calls. The re-write tries to be quite literal: no other refactors or optimizations were done in this change. PiperOrigin-RevId: 279088175 Change-Id: I2d6f24ad53eb70e330b3d3448a5616cb7f6b6cf7

view details

scentini

commit sha 88d59ed7338b67ab7cce136651b669f43ddb7349

Clarify the state of third_party/com_google_absl.patch

view details

TensorFlower Gardener

commit sha 8977382c533ad43864f73b630b496d440acf52f8

Merge pull request #28754 from samikama:GenerateBoxProposalsOp PiperOrigin-RevId: 279101236 Change-Id: Icf3e1b03365161708b906b44b7d544b2a4adba10

view details

Amit Patankar

commit sha b66e4e833c5aacc31d0feaa629f2d064766a7a0b

Export the checkpoint reader classes and functions from C++ to Python with pybind11 instead of swig. This is part of a larger effort to deprecate swig and eventually with modularization break pywrap_tensorflow into smaller components. It will also make exporting C++ ops to Python significantly easier. XLA is using the pybind11 macros already. Please refer to https://github.com/tensorflow/community/blob/master/rfcs/20190208-pybind11.md for more information. PiperOrigin-RevId: 279101529 Change-Id: I25502ed3d3718499abca41f5614681f41e4c7199

view details

River Riddle

commit sha f79fc6d0ae7a4c6a25940ddb027ae011f78380b4

Add Ch-7 of the toy tutorial detailing how to define new types. This chapter adds a new composite type to Toy, and shows the process of adding a new type to the IR, adding and updating operations to use it, and constant folding operations producing it. PiperOrigin-RevId: 279107885

view details

Robert David

commit sha 901537da7f28a3344d7da882018a82394627601f

Vectorize EvalUsingLookupTable using Aarch64 NEON. PiperOrigin-RevId: 279105405 Change-Id: I94b9a4e350cb2dd7bbdbb3c6c5fd1e27adfc85bf

view details

Tom Hennigan

commit sha 2ff9149a7747a48ef177cd13dd8453e3fd3ba05b

Support `nest.{flatten,map_structure}` with reference objects. PiperOrigin-RevId: 279107719 Change-Id: Ifffb644790fe05ede02e4efede127e062091d5b9

view details

push time in 3 months

issue comment tensorflow/tensorflow

Dataset.map() with tf.data.experimental.AUTOTUNE runs out of memory when using batch size=1

Thanks for the repro. Looking into it.

EduardoGRocha

comment created time in 3 months

push event rachellim/timey-wimey

Rachel Lim

commit sha 3cd9ab56dc754868facf87f7c366bab201c60f89

Create README.md

view details

push time in 3 months

create branch rachellim/timey-wimey

branch : master

created branch time in 3 months

created repository rachellim/timey-wimey

created time in 3 months

started fastai/fastai_dev

started time in 3 months

started WorldBrain/Memex

started time in 4 months

Pull request review comment tensorflow/tensorflow

Add tests of RandomDataset

+/* Copyright 2019 The TensorFlow Authors. All Rights Reserved.
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+==============================================================================*/
+#include "tensorflow/core/kernels/data/experimental/random_dataset_op.h"
+
+#include "tensorflow/core/kernels/data/dataset_test_base.h"
+#include "tensorflow/core/lib/random/philox_random.h"
+#include "tensorflow/core/lib/random/random_distributions.h"
+
+namespace tensorflow {
+namespace data {
+namespace experimental {
+namespace {
+
+constexpr char kNodeName[] = "random_dataset";
+constexpr char kIteratorPrefix[] = "Iterator";
+
+// Number of random samples generated per test
+constexpr int kCount = 10;
+
+// Generate the first `count` random numbers that the kernel should produce
+// for a given seed/seed2 combo.
+// For compatibility with the test harness, return value is a vector of scalar
+// Tensors.
+std::vector<Tensor> GenerateExpectedData(int64 seed, int64 seed2, int count) {
+  std::vector<Tensor> ret;
+  auto parent_generator = random::PhiloxRandom(seed, seed2);
+  auto generator =
+      random::SingleSampleAdapter<random::PhiloxRandom>(&parent_generator);
+
+  for (int i = 0; i < count; ++i) {
+    ret.push_back(CreateTensor<int64>(TensorShape({}), {generator()}));
+  }
+  return ret;
+}
+
+class RandomDatasetParams : public DatasetParams {
+ public:
+  RandomDatasetParams(int64 seed, int64 seed2, DataTypeVector output_dtypes,
+                      std::vector<PartialTensorShape> output_shapes,
+                      string node_name)
+      : DatasetParams(std::move(output_dtypes), std::move(output_shapes),
+                      std::move(node_name)),
+        seed_(CreateTensor<int64>(TensorShape({}), {seed})),
+        seed2_(CreateTensor<int64>(TensorShape({}), {seed2})) {}
+
+  virtual std::vector<Tensor> GetInputTensors() const override {
+    return {seed_, seed2_};
+  }
+
+  virtual Status GetInputNames(
+      std::vector<string>* input_placeholder) const override {
+    *input_placeholder = {RandomDatasetOp::kSeed, RandomDatasetOp::kSeed2};

s/placeholder/names

frreiss

comment created time in 4 months

pull request comment tensorflow/tensorflow

Add tests of LMDBDataset

This is failing internal tests (click on Details > {TARGET} > Target Log). Can you fix? Thanks.

frreiss

comment created time in 4 months

delete branch rachellim/tensorflow

delete branch : cherrypicks_OG6BP

delete time in 5 months

pull request comment tensorflow/tensorflow

Cherrypick RebatchDataset performance fix

Will defer to @goldiegadde as to whether this should be cherry picked.

rachellim

comment created time in 5 months

pull request comment tensorflow/tensorflow

Cherrypick RebatchDataset performance fix

Note that this changes the behavior of some tf.data.Dataset + distribution strategy code; as such, it may break things that rely on the old (less correct) behavior. I'm not sure what all code exists in 1.15 that this might affect, but thought I should give a heads up, given @tfboyd's comment here: https://github.com/tensorflow/tensorflow/pull/32245#issuecomment-536658498

rachellim

comment created time in 5 months

PR opened tensorflow/tensorflow

Cherrypick RebatchDataset performance fix
  • [tf.data] Add a new RebatchDatasetV2 op that does rebatching (instead of rebatching via graph rewrites) for performance and correctness.

PiperOrigin-RevId: 268544896

+249 -176

0 comment

5 changed files

pr created time in 5 months

create branch rachellim/tensorflow

branch : cherrypicks_OG6BP

created branch time in 5 months

pull request comment tensorflow/tensorflow

When GlobalJitLevel is on, disable the Grappler memory opt.

Note that this might not be an XLA specific issue; https://github.com/tensorflow/tensorflow/commit/96a407a9186082045e368a680cc3e8af15d85d00 affects performance & behavior of tf.data datasets with rebatching (i.e. distribution strategies).

trentlo

comment created time in 5 months

pull request comment tensorflow/tensorflow

When GlobalJitLevel is on, disable the Grappler memory opt.

We'll cherry pick this into r1.15. As for 2.x, I'll let @goldiegadde comment further.

trentlo

comment created time in 5 months

pull request comment tensorflow/tensorflow

When GlobalJitLevel is on, disable the Grappler memory opt.

Does https://github.com/tensorflow/tensorflow/commit/96a407a9186082045e368a680cc3e8af15d85d00 fix the performance issue? If so, maybe we can cherrypick it into the release?

trentlo

comment created time in 5 months

issue comment tensorflow/tensorflow

Training stalls after saving checkpoint 0

The cause of the hanging here is that parallel_interleave_dataset_op.cc doesn't handle iterator creation errors correctly when the sloppy=True param is set. The fix above makes it handle the error more gracefully (i.e. it raises an error instead of hanging), but the actual dataset iterator creation error here is:

  (0) Unimplemented:  The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
         [[{{node Conv2D}}]]
  (1) Unimplemented:  The Conv2D op currently only supports the NHWC tensor format on the CPU. The op was given the format: NCHW
         [[{{node Conv2D}}]]
         [[Shape_3/_8]]

in the preprocessing function in the t2t code.
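As an aside, the layout mismatch in that error is purely a matter of axis order: NCHW and NHWC differ by the permutation (N, C, H, W) -> (N, H, W, C), i.e. transposing axes (0, 2, 3, 1) (in TF, tf.transpose(x, perm=[0, 2, 3, 1]) or passing data_format="NHWC" to the conv). The helper below is a plain-Python sketch with a hypothetical name, just to show the permutation:

```python
# Hypothetical helper illustrating the NCHW -> NHWC fix implied by the error
# above: move the channel axis to the end, i.e. transpose axes (0, 2, 3, 1).
def nchw_to_nhwc_shape(shape):
    """Permute a 4-D NCHW shape into NHWC order."""
    n, c, h, w = shape
    return (n, h, w, c)

print(nchw_to_nhwc_shape((32, 3, 224, 224)))  # (32, 224, 224, 3)
```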

Victor-Almeida

comment created time in 5 months

issue closedtensorflow/tensorflow

Training stalls after saving checkpoint 0

Hello.

I'm trying to run the LibriSpeech problem using tensor2tensor on Google Colab's GPU runtime, but the training stalls after saving checkpoint 0 and opening the dynamic library libcublas.so.10.0. There is no error message; it just stops there forever. I'm posting it here because the stall happens inside TensorFlow's packages.

Python version: 3.6.8
TensorFlow version: 1.14.0
tensor2tensor version: 1.14.0
CUDA version: 10.1
OS: Ubuntu 18.04

This is the code

from tensor2tensor import models
from tensor2tensor.utils import registry

!t2t-trainer \
    --tmp_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/tmp/' \
    --problem='librispeech_clean_small' \
    --model='transformer' \
    --train_steps=10 \
    --hparams_set='transformer_librispeech' \
    --data_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/data/' \
    --output_dir='/content/gdrive/My Drive/TCC/T2T LibriSpeech/output/' \
    --worker-gpu=0

And here's the output :

WARNING: Logging before flag parsing goes to stderr.
W0827 17:43:33.747592 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/expert_utils.py:68: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.

W0827 17:43:35.111425 139969908836224 lazy_loader.py:50] 
The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.

W0827 17:43:36.899833 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/adafactor.py:27: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

W0827 17:43:36.900365 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/multistep_optimizer.py:32: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead.

W0827 17:43:36.911696 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/mesh_tensorflow/ops.py:4237: The name tf.train.CheckpointSaverListener is deprecated. Please use tf.estimator.CheckpointSaverListener instead.

W0827 17:43:36.911862 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/mesh_tensorflow/ops.py:4260: The name tf.train.SessionRunHook is deprecated. Please use tf.estimator.SessionRunHook instead.

W0827 17:43:36.928164 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/models/research/neural_stack.py:38: The name tf.nn.rnn_cell.RNNCell is deprecated. Please use tf.compat.v1.nn.rnn_cell.RNNCell instead.

W0827 17:43:36.975095 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/rl/gym_utils.py:235: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.

W0827 17:43:36.993708 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:111: The name tf.OptimizerOptions is deprecated. Please use tf.compat.v1.OptimizerOptions instead.

W0827 17:43:37.006869 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensorflow_gan/python/contrib_utils.py:305: The name tf.estimator.tpu.TPUEstimator is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimator instead.

W0827 17:43:37.007008 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensorflow_gan/python/contrib_utils.py:310: The name tf.estimator.tpu.TPUEstimatorSpec is deprecated. Please use tf.compat.v1.estimator.tpu.TPUEstimatorSpec instead.

W0827 17:43:38.449517 139969908836224 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.

W0827 17:43:38.449705 139969908836224 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:32: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.

W0827 17:43:38.449807 139969908836224 deprecation_wrapper.py:119] From /usr/local/bin/t2t-trainer:33: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.

I0827 17:43:38.450179 139969908836224 t2t_trainer.py:155] Found unparsed command-line arguments. Checking if any start with --hp_ and interpreting those as hparams settings.
W0827 17:43:38.450768 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/bin/t2t_trainer.py:165: The name tf.logging.warn is deprecated. Please use tf.compat.v1.logging.warn instead.

W0827 17:43:38.450837 139969908836224 t2t_trainer.py:165] Found unknown flag: --worker-gpu=0
W0827 17:43:38.451183 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/hparams_lib.py:49: The name tf.gfile.Exists is deprecated. Please use tf.io.gfile.exists instead.

W0827 17:43:38.451832 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:839: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.

W0827 17:43:38.452693 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:123: The name tf.GraphOptions is deprecated. Please use tf.compat.v1.GraphOptions instead.

W0827 17:43:38.452859 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:129: The name tf.GPUOptions is deprecated. Please use tf.compat.v1.GPUOptions instead.

W0827 17:43:38.453019 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/trainer_lib.py:242: RunConfig.__init__ (from tensorflow.contrib.learn.python.learn.estimators.run_config) is deprecated and will be removed in a future version.
Instructions for updating:
When switching to tf.estimator.Estimator, use tf.estimator.RunConfig instead.
I0827 17:43:38.453181 139969908836224 trainer_lib.py:265] Configuring DataParallelism to replicate the model.
I0827 17:43:38.453252 139969908836224 devices.py:76] schedule=continuous_train_and_eval
I0827 17:43:38.453314 139969908836224 devices.py:77] worker_gpu=1
I0827 17:43:38.453381 139969908836224 devices.py:78] sync=False
W0827 17:43:38.453437 139969908836224 devices.py:141] Schedule=continuous_train_and_eval. Assuming that training is running on a single machine.
I0827 17:43:38.453504 139969908836224 devices.py:170] datashard_devices: ['gpu:0']
I0827 17:43:38.453559 139969908836224 devices.py:171] caching_devices: None
I0827 17:43:38.454001 139969908836224 devices.py:172] ps_devices: ['gpu:0']
I0827 17:43:38.454567 139969908836224 estimator.py:209] Using config: {'_task_type': None, '_task_id': 0, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f4cda9f8438>, '_master': '', '_num_ps_replicas': 0, '_num_worker_replicas': 0, '_environment': 'local', '_is_chief': True, '_evaluation_master': '', '_train_distribute': None, '_eval_distribute': None, '_experimental_max_worker_delay_secs': None, '_device_fn': None, '_tf_config': gpu_options {
  per_process_gpu_memory_fraction: 1.0
}
, '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_secs': None, '_log_step_count_steps': 100, '_protocol': None, '_session_config': gpu_options {
  per_process_gpu_memory_fraction: 0.95
}
allow_soft_placement: true
graph_options {
  optimizer_options {
    global_jit_level: OFF
  }
}
isolate_session_state: true
, '_save_checkpoints_steps': 1000, '_keep_checkpoint_max': 20, '_keep_checkpoint_every_n_hours': 10000, '_model_dir': '/content/gdrive/My Drive/TCC/T2T LibriSpeech/output/', 'use_tpu': False, 't2t_device_info': {'num_async_replicas': 1}, 'data_parallelism': <tensor2tensor.utils.expert_utils.Parallelism object at 0x7f4cda9f84a8>}
W0827 17:43:38.454751 139969908836224 model_fn.py:630] Estimator's model_fn (<function T2TModel.make_estimator_model_fn.<locals>.wrapping_model_fn at 0x7f4cda9e7ae8>) includes params argument, but params are not passed to Estimator.
W0827 17:43:38.454877 139969908836224 trainer_lib.py:783] ValidationMonitor only works with --schedule=train_and_evaluate
W0827 17:43:38.455530 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/bin/t2t_trainer.py:328: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead.

W0827 17:43:38.458196 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/bin/t2t_trainer.py:344: The name tf.gfile.Open is deprecated. Please use tf.io.gfile.GFile instead.

I0827 17:43:38.487565 139969908836224 estimator_training.py:186] Not using Distribute Coordinator.
I0827 17:43:38.487942 139969908836224 training.py:612] Running training and evaluation locally (non-distributed).
I0827 17:43:38.488237 139969908836224 training.py:700] Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
W0827 17:43:38.493283 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
I0827 17:43:38.502703 139969908836224 problem.py:644] Reading data files from /content/gdrive/My Drive/TCC/T2T LibriSpeech/data/librispeech_clean_small-train*
I0827 17:43:38.543926 139969908836224 problem.py:670] partition: 0 num_data_files: 100
W0827 17:43:38.545797 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/data_generators/problem.py:680: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE)` instead. If sloppy execution is desired, use `tf.data.Options.experimental_determinstic`.
W0827 17:43:38.581830 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/common_audio.py:92: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0827 17:43:38.823341 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/common_audio.py:115: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
W0827 17:43:38.987241 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:275: tf_record_iterator (from tensorflow.python.lib.io.tf_record) is deprecated and will be removed in a future version.
Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`
W0827 17:43:40.327878 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:395: DatasetV1.output_shapes (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.compat.v1.data.get_output_shapes(dataset)`.
W0827 17:43:40.328149 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:398: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.

W0827 17:43:40.328256 139969908836224 data_reader.py:399] Shapes are not fully defined. Assuming batch_size means tokens.
W0827 17:43:40.374079 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/ops/grouping.py:193: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
W0827 17:43:40.414666 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/data_reader.py:231: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead.

I0827 17:43:40.470206 139969908836224 estimator.py:1145] Calling model_fn.
I0827 17:43:40.481091 139969908836224 t2t_model.py:2248] Setting T2TModel mode to 'train'
W0827 17:43:40.552857 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/t2t_model.py:244: The name tf.summary.text is deprecated. Please use tf.compat.v1.summary.text instead.

I0827 17:43:41.160171 139969908836224 api.py:255] Using variable initializer: uniform_unit_scaling
I0827 17:43:41.531091 139969908836224 t2t_model.py:2248] Transforming feature 'inputs' with speech_recognition_modality.bottom
W0827 17:43:41.532868 139969908836224 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/modalities.py:439: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.keras.layers.Conv2D` instead.
I0827 17:43:41.922302 139969908836224 t2t_model.py:2248] Transforming feature 'targets' with symbol_modality_256_384.targets_bottom
I0827 17:43:42.037450 139969908836224 t2t_model.py:2248] Building model body
W0827 17:43:42.094394 139969908836224 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/models/transformer.py:96: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0827 17:43:42.130389 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/common_layers.py:3077: The name tf.layers.Dense is deprecated. Please use tf.compat.v1.layers.Dense instead.

W0827 17:43:42.473380 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/layers/common_attention.py:1249: The name tf.summary.image is deprecated. Please use tf.compat.v1.summary.image instead.

I0827 17:43:49.011597 139969908836224 t2t_model.py:2248] Transforming body output with symbol_modality_256_384.top
W0827 17:43:49.118912 139969908836224 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/tensor2tensor/utils/learning_rate.py:120: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead.

I0827 17:43:49.120072 139969908836224 learning_rate.py:29] Base learning rate: 2.000000
I0827 17:43:49.131614 139969908836224 optimize.py:338] Trainable Variables Total size: 70343552
I0827 17:43:49.131888 139969908836224 optimize.py:338] Non-trainable variables Total size: 5
I0827 17:43:49.132170 139969908836224 optimize.py:193] Using optimizer adam
I0827 17:43:59.596418 139969908836224 estimator.py:1147] Done calling model_fn.
I0827 17:43:59.597772 139969908836224 basic_session_run_hooks.py:541] Create CheckpointSaverHook.
I0827 17:44:03.685569 139969908836224 monitored_session.py:240] Graph was finalized.
2019-08-27 17:44:03.685968: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-27 17:44:03.708726: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-08-27 17:44:03.898700: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 17:44:03.899340: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x207fb80 executing computations on platform CUDA. Devices:
2019-08-27 17:44:03.899389: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla T4, Compute Capability 7.5
2019-08-27 17:44:03.901408: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-08-27 17:44:03.901570: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x207ea00 executing computations on platform Host. Devices:
2019-08-27 17:44:03.901594: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2019-08-27 17:44:03.901797: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 17:44:03.902276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:00:04.0
2019-08-27 17:44:03.902614: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-27 17:44:03.907500: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-08-27 17:44:03.908556: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-08-27 17:44:03.911851: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-08-27 17:44:03.916549: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-08-27 17:44:03.917606: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-08-27 17:44:03.925044: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-08-27 17:44:03.925147: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 17:44:03.925681: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 17:44:03.926137: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-27 17:44:03.926182: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-08-27 17:44:03.927269: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-27 17:44:03.927290: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-08-27 17:44:03.927300: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-08-27 17:44:03.927408: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 17:44:03.927907: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-08-27 17:44:03.928376: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:40] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2019-08-27 17:44:03.928411: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14325 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:04.0, compute capability: 7.5)
2019-08-27 17:44:07.049904: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
I0827 17:44:09.037412 139969908836224 session_manager.py:500] Running local_init_op.
I0827 17:44:09.280463 139969908836224 session_manager.py:502] Done running local_init_op.
I0827 17:44:18.882892 139969908836224 basic_session_run_hooks.py:606] Saving checkpoints for 0 into /content/gdrive/My Drive/TCC/T2T LibriSpeech/output/model.ckpt.
2019-08-27 17:44:39.361151: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0

issue closed 5 months ago

Victor-Almeida

issue comment tensorflow/tensorflow

Training stalls after saving checkpoint 0

Thanks for flagging this and for the repro instructions. I found the issue in parallel_interleave_dataset_op.cc. Fix: https://github.com/tensorflow/tensorflow/commit/6274f037d4acc9d04cd4aafbda7547a3d89e5674
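For readers unfamiliar with the op in question: the cycle/block semantics that `parallel_interleave` (and its replacement, `Dataset.interleave`) implement can be sketched in plain Python. This is an illustration only, not the actual implementation; the real op is C++ code in parallel_interleave_dataset_op.cc, and the function below is a hypothetical single-threaded model of its deterministic (`sloppy=False`) ordering.

```python
from collections import deque

def interleave(sources, cycle_length, block_length):
    """Round-robin over up to cycle_length open iterators, pulling
    block_length elements from each before moving to the next one.
    Exhausted iterators are dropped and replaced from the pending queue."""
    pending = deque(iter(s) for s in sources)
    active = deque()
    out = []
    while pending or active:
        # Refill the active set up to cycle_length iterators.
        while len(active) < cycle_length and pending:
            active.append(pending.popleft())
        it = active.popleft()
        exhausted = False
        for _ in range(block_length):
            try:
                out.append(next(it))
            except StopIteration:
                exhausted = True
                break
        if not exhausted:
            active.append(it)  # Rotate it to the back of the cycle.
    return out

# Interleaving two 3-element sources one element at a time:
print(interleave([[1, 2, 3], [10, 20, 30]], cycle_length=2, block_length=1))
# → [1, 10, 2, 20, 3, 30]
```

The hang reported here concerns the multi-threaded C++ version of this logic, where worker threads fill per-iterator buffers concurrently; the pure-Python model above only shows the output ordering being produced.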

Victor-Almeida

comment created 5 months ago

issue comment tensorflow/tensorflow

Training stalls after saving checkpoint 0

I managed to get a repro, now trying to get to the bottom of it! (I'm using a GTX 1080 with driver version 430.34)


issue comment tensorflow/tensorflow

Training stalls after saving checkpoint 0

Yup -- just trying to further narrow down the possible sources of this issue. Let me dig into this a little further. Thanks!


issue comment tensorflow/tensorflow

Training stalls after saving checkpoint 0

I'm having trouble reproducing this with the following setup:

tensorflow-gpu: 1.14.0
tensor2tensor: 1.14.0

$ t2t-trainer --problem=librispeech_clean_small --model=transformer --output_dir=/tmp/t2t_output --data_dir=/tmp/t2t_data/ --save_checkpoints_secs=1800 --schedule=train --hparams_set=transformer_librispeech

(It runs multiple steps without hanging)

Are you encountering this issue with non-gpu tensorflow as well, or just tensorflow-gpu?

