profile
viewpoint

Ask questionsProblem with read and get batch from 2d array tfrecords dataset

URL(s) with the issue:

https://www.tensorflow.org/tutorials/load_data/tfrecord#tfrecord_files_using_tfdata

Description of issue (what needs changing):

Problem with read and get batch from 2d array tfrecords dataset

Clear description

Hello. I use Tensorflow 2.0 version. I have some problems with reading Tfrecords file when get batch.

First, this is my read_tfrecords.py file.

import tensorflow as tf
import os
from glob import glob
import numpy as np


def serialize_example(batch, list1, list2):
    filename = "./train_set.tfrecords"
    writer = tf.io.TFRecordWriter(filename)

    for i in range(batch):
        feature = {}
        feature1 = np.load(list1[i])
        feature2 = np.load(list2[i])
        print('feature1 shape {} feature2 shape {}'.format(feature1.shape, feature2.shape)) 
        feature['input'] = tf.train.Feature(float_list=tf.train.FloatList(value=feature1.flatten()))
        feature['target'] = tf.train.Feature(float_list=tf.train.FloatList(value=feature2.flatten()))

        features = tf.train.Features(feature=feature)
        example = tf.train.Example(features=features)
        serialized = example.SerializeToString()
        writer.write(serialized)
        print("{}th input {} target {} finished".format(i, list1[i], list2[i]))



list_inp = sorted(glob('./input/2d_magnitude/*'))
list_tar = sorted(glob('./target/2d_magnitude/*'))


print(len(list_inp))
serialize_example(len(list_inp), list_inp, list_tar)

My input and target shapes are 2d array (Material of dataset is spectrogram). Therefore, my Tfrecords file includes two features likes [number_of_dataset, x, y]. About 100,000 dataset was successfully saved as Tfrecords file.

And I have problem when I read Tfrecords file to get batch. This is my code read_tfrecords.py:

import tensorflow as tf
import os
import numpy as np

shuffle_buffer_size = 50000
batch_size = 10
record_file = '/data2/dataset/tfrecords/train_set.tfrecords'

raw_dataset = tf.data.TFRecordDataset(record_file)
print('raw_dataset', raw_dataset) # ==> raw_dataset <TFRecordDatasetV2 shapes: (), types: tf.string>

raw_dataset = raw_dataset.repeat()
print('repeat', raw_dataset) # ==> repeat <RepeatDataset shapes: (), types: tf.string>

raw_dataset = raw_dataset.shuffle(shuffle_buffer_size)
print('shuffle', raw_dataset) # ==> shuffle <ShuffleDataset shapes: (), types: tf.string>

raw_dataset = raw_dataset.batch(batch_size, drop_remainder=True)
print('batch', raw_dataset) # ==> batch <BatchDataset shapes: (10,), types: tf.string>

raw_example = next(iter(raw_dataset)) 

parsed = tf.train.Example.FromString(raw_example.numpy()) # ==> read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted

print('parsed', parsed) # ==> ''

input = parsed.features.feature['input'].float_list.value
print('input', input) # ==> []
target = parsed.features.feature['target'].float_list.value
print('target', target) # ==> []

Here are results from code:

raw_dataset <TFRecordDatasetV2 shapes: (), types: tf.string>
repeat <RepeatDataset shapes: (), types: tf.string>
shuffle <ShuffleDataset shapes: (), types: tf.string>
batch <BatchDataset shapes: (10,), types: tf.string>
read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted
  parsed = tf.train.Example.FromString(raw_example.numpy())
parsed
input []
target []

As a result, I wonder how I get the batch from Tfrecords file to train. read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted Could you give advice? Thank you very much.

Usage example

Maybe...

raw_dataset = tf.data.TFRecordDataset(record_file)

raw_dataset = raw_dataset.repeat()

raw_dataset = raw_dataset.shuffle(shuffle_buffer_size)

raw_dataset = raw_dataset.batch(batch_size, drop_remainder=True)

raw_example = next(iter(raw_dataset)) 

parsed = tf.train.Example.FromString(raw_example.numpy())

input = parsed.features.feature['input'].float_list.value
target = parsed.features.feature['target'].float_list.value
tensorflow/tensorflow

Answer questions aaudiber

If your data is complex numbers, you can use the tf.complex64 Tensorflow type

useful!

Related questions

ModuleNotFoundError: No module named 'tensorflow.contrib' hot 8
Error occurred when finalizing GeneratorDataset iterator hot 6
ModuleNotFoundError: No module named 'tensorflow.contrib'
When importing TensorFlow, error loading Hadoop
tf.keras.layers.Conv1DTranspose ?
tensorflow-gpu CUPTI errors hot 4
[TF 2.0] tf.keras.optimizers.Adam hot 4
Lossy conversion from float32 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning. hot 4
TF2.0 AutoGraph issue hot 4
Tf.Keras metrics issue hot 4
module 'tensorflow' has no attribute 'ConfigProto' hot 4
TF 2.0 'Tensor' object has no attribute 'numpy' while using .numpy() although eager execution enabled by default hot 4
ModuleNotFoundError: No module named 'tensorflow.examples.tutorials' hot 4
AttributeError: module &#39;tensorflow.python.framework.op_def_registry&#39; has no attribute &#39;register_op_list&#39; hot 4
tensorflow2.0 detected 'xla_gpu' , but 'gpu' expected hot 3
Github User Rank List