June-Woo Kim kaen2891 Kyungpook National University Daegu, South Korea https://kaen2891.tistory.com/

kaen2891/BEP 1

Basic Education Programming: Image Processing

kaen2891/spec_augment 1

🔦 A Pytorch implementation of GoogleBrain's SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

kaen2891/End-to-end-ASR-Pytorch 0

This is an open source project (formerly named Listen, Attend and Spell - PyTorch Implementation) for end-to-end ASR implemented with Pytorch, the well known deep learning toolkit.

kaen2891/kaen2891.github.io 0

Research Results

kaen2891/Multi_DTW 0

Multi_DTW_with_Spectrogram

kaen2891/specAugment 0

Tensor2tensor experiment with SpecAugment

kaen2891/SpecAugment-1 0

An implementation of SpecAugment with TensorFlow & PyTorch, introduced by Google Brain

kaen2891/tensor2tensor 0

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

kaen2891/utils 0

Some util things for using deep learning

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 8ab650d36bac6f8b80706c0fa5dea51ef4e5528e

update

push time in 6 days

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 1fd8bc9efcbef4c43c20832647a782a2c0ccd11a

update README

push time in 6 days

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha d3a43c242120ec3e4edf9ff93c70a6b352042334

upload results

push time in 6 days

push event kaen2891/utils

June-Woo Kim

commit sha d07431ae941192c2a60e3a5fedea00f637fc0809

add normalize

push time in 23 days

push event kaen2891/utils

June-Woo Kim

commit sha 73dd05d683436d727a126f3b038ef0704a0375a9

add read tfrecords file for training in tensorflow 2.0

push time in a month

push event kaen2891/utils

June-Woo Kim

commit sha 004c8d4eee216e8c8b0a9c7247905528c9bc65f9

add write tfrecords code

push time in a month

create branch kaen2891/utils

branch : master

created branch time in a month

created repository kaen2891/utils

Some util things for using deep learning

created time in a month

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 3e1ee4c1f0cc61779e15b278b5287f3dd6ba304b

upload main

push time in a month

issue closed tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

URL(s) with the issue:

https://www.tensorflow.org/tutorials/load_data/tfrecord#tfrecord_files_using_tfdata

Description of issue (what needs changing):

Problem with read and get batch from 2d array tfrecords dataset

Clear description

Hello. I am using TensorFlow 2.0, and I have a problem getting batches when reading a TFRecords file.

First, this is the script I use to write the TFRecords file:

import tensorflow as tf
from glob import glob
import numpy as np


def serialize_example(num_examples, list1, list2):
    filename = "./train_set.tfrecords"
    # Use the writer as a context manager so the file is flushed and closed.
    with tf.io.TFRecordWriter(filename) as writer:
        for i in range(num_examples):
            feature = {}
            feature1 = np.load(list1[i])
            feature2 = np.load(list2[i])
            print('feature1 shape {} feature2 shape {}'.format(feature1.shape, feature2.shape))
            # FloatList holds a flat 1-D list, so each 2-D array is flattened here.
            feature['input'] = tf.train.Feature(float_list=tf.train.FloatList(value=feature1.flatten()))
            feature['target'] = tf.train.Feature(float_list=tf.train.FloatList(value=feature2.flatten()))

            features = tf.train.Features(feature=feature)
            example = tf.train.Example(features=features)
            serialized = example.SerializeToString()
            writer.write(serialized)
            print("{}th input {} target {} finished".format(i, list1[i], list2[i]))


list_inp = sorted(glob('./input/2d_magnitude/*'))
list_tar = sorted(glob('./target/2d_magnitude/*'))

print(len(list_inp))
serialize_example(len(list_inp), list_inp, list_tar)

My input and target are 2-D arrays (the data are spectrograms). Therefore, my TFRecords file contains two features, shaped like [number_of_examples, x, y]. About 100,000 examples were successfully saved to the TFRecords file.
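`tf.train.FloatList` holds only a flat list of floats, so after `flatten()` each record no longer knows the (x, y) shape of its spectrogram, and each record is one flattened example rather than a [number_of_examples, x, y] block. A minimal sketch, assuming you want every record to carry its own shape, adds two `int64` features next to the data. The feature names `input_shape`/`target_shape` and the helpers `make_example` and `write_records` are hypothetical, not part of the original code.

```python
import numpy as np
import tensorflow as tf


def make_example(inp: np.ndarray, tar: np.ndarray) -> tf.train.Example:
    """Build one tf.train.Example from two 2-D float arrays (hypothetical helper)."""
    def float_feature(arr):
        # FloatList only holds a flat list, so flatten the 2-D array.
        return tf.train.Feature(float_list=tf.train.FloatList(value=arr.ravel()))

    def int64_feature(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))

    return tf.train.Example(features=tf.train.Features(feature={
        "input": float_feature(inp),
        "target": float_feature(tar),
        # Store the original shapes so a reader can undo the flattening.
        "input_shape": int64_feature(inp.shape),
        "target_shape": int64_feature(tar.shape),
    }))


def write_records(path, inputs, targets):
    # The context manager flushes and closes the file when the loop ends.
    with tf.io.TFRecordWriter(path) as writer:
        for inp, tar in zip(inputs, targets):
            writer.write(make_example(inp, tar).SerializeToString())
```

A reader can then parse `input_shape` with `tf.io.FixedLenFeature([2], tf.int64)` and pass it to `tf.reshape`.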

The problem occurs when I read the TFRecords file to get a batch. This is my read_tfrecords.py:

import tensorflow as tf
import os
import numpy as np

shuffle_buffer_size = 50000
batch_size = 10
record_file = '/data2/dataset/tfrecords/train_set.tfrecords'

raw_dataset = tf.data.TFRecordDataset(record_file)
print('raw_dataset', raw_dataset) # ==> raw_dataset <TFRecordDatasetV2 shapes: (), types: tf.string>

raw_dataset = raw_dataset.repeat()
print('repeat', raw_dataset) # ==> repeat <RepeatDataset shapes: (), types: tf.string>

raw_dataset = raw_dataset.shuffle(shuffle_buffer_size)
print('shuffle', raw_dataset) # ==> shuffle <ShuffleDataset shapes: (), types: tf.string>

raw_dataset = raw_dataset.batch(batch_size, drop_remainder=True)
print('batch', raw_dataset) # ==> batch <BatchDataset shapes: (10,), types: tf.string>

raw_example = next(iter(raw_dataset)) 

parsed = tf.train.Example.FromString(raw_example.numpy()) # ==> read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted

print('parsed', parsed) # ==> ''

input = parsed.features.feature['input'].float_list.value
print('input', input) # ==> []
target = parsed.features.feature['target'].float_list.value
print('target', target) # ==> []

Here are results from code:

raw_dataset <TFRecordDatasetV2 shapes: (), types: tf.string>
repeat <RepeatDataset shapes: (), types: tf.string>
shuffle <ShuffleDataset shapes: (), types: tf.string>
batch <BatchDataset shapes: (10,), types: tf.string>
read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted
  parsed = tf.train.Example.FromString(raw_example.numpy())
parsed
input []
target []

How can I get batches from the TFRecords file for training, and what causes read_tfrecords.py:25: RuntimeWarning: Unexpected end-group tag: Not all data was converted? Could you give me some advice? Thank you very much.
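The printed `BatchDataset shapes: (10,)` is the clue: after `.batch(batch_size)`, each element is a tensor holding ten separate serialized `Example` strings, so `raw_example.numpy()` is a NumPy array of ten byte strings rather than one serialized message, and `tf.train.Example.FromString` does not decode it as a single Example. For manual inspection only, a sketch that indexes one element of the batch before decoding (the helper name `first_example_from_batch` is hypothetical):

```python
import tensorflow as tf


def first_example_from_batch(record_file, batch_size=10):
    """Decode the first serialized Example in the first batch (debugging only)."""
    batched = tf.data.TFRecordDataset(record_file).batch(batch_size)
    raw_batch = next(iter(batched))  # shape (batch_size,), dtype tf.string
    # Index one element: each entry is a complete serialized tf.train.Example.
    return tf.train.Example.FromString(raw_batch[0].numpy())
```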

Usage example

Maybe...

raw_dataset = tf.data.TFRecordDataset(record_file)

raw_dataset = raw_dataset.repeat()

raw_dataset = raw_dataset.shuffle(shuffle_buffer_size)

raw_dataset = raw_dataset.batch(batch_size, drop_remainder=True)

raw_example = next(iter(raw_dataset)) 

parsed = tf.train.Example.FromString(raw_example.numpy())

input = parsed.features.feature['input'].float_list.value
target = parsed.features.feature['target'].float_list.value
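For training, the usual `tf.data` pattern is to parse each serialized record with `tf.io.parse_single_example` inside `map()` before shuffling and batching, so that batches contain float tensors instead of strings. This sketch uses `tf.io.FixedLenSequenceFeature`, which the thread later reports as the working fix, and assumes 22 input bins and 23 target bins per frame, taken from the example shapes (201, 22) and (201, 23) mentioned later in the thread. `parse_record`, `make_dataset`, `INPUT_BINS` and `TARGET_BINS` are hypothetical names.

```python
import tensorflow as tf

# Assumed bins per frame, from the example shapes quoted later in the thread.
INPUT_BINS = 22
TARGET_BINS = 23

FEATURE_DESCRIPTION = {
    # allow_missing=True lets a variable-length FloatList parse as a 1-D tensor.
    "input": tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    "target": tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
}


def parse_record(serialized):
    """Parse ONE serialized Example and restore the 2-D shapes."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_DESCRIPTION)
    inp = tf.reshape(parsed["input"], [-1, INPUT_BINS])
    tar = tf.reshape(parsed["target"], [-1, TARGET_BINS])
    return inp, tar


def make_dataset(record_file, batch_size=10, shuffle_buffer_size=50000):
    ds = tf.data.TFRecordDataset(record_file)
    # Parse before batching so each map call sees a single serialized record.
    ds = ds.map(parse_record, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    # Same repeat-then-shuffle order as the original code.
    ds = ds.repeat().shuffle(shuffle_buffer_size)
    ds = ds.batch(batch_size, drop_remainder=True)
    return ds.prefetch(tf.data.experimental.AUTOTUNE)
```

`batch()` stacks examples, so every example must have the same number of frames; variable-length spectrograms would need `padded_batch` instead.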

closed time in a month

kaen2891

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha a63b85c71abc672b5c0761865373a9e17e802417

0515

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha d511c9c89551f1e1900eaab2892f178c62dde2e5

update

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 8199bd645b020059db208c52f5c4aedb7abd092b

README update

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 11c642101e4e2fb848d9cc5ee735e44f9d85c355

status

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 3981e449ceb6aefab1413db6b0762a8b513f1818

README change

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 16ed7586b2f3c0a6f25c6b30180996b51f171bb8

0514

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 2de64267f9d004c0099313ce83865541c31517d5

change abstract

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 16b5349815ca1b1cba3d7edc1e89e7e1928dd03e

add

push time in 2 months

started F-Tag/python-vad

started time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@aaudiber, thanks to you, I solved my problem. Using FixedLenSequenceFeature as you suggested worked. Thank you for your kindness.

On another note, is it possible to convert a complex-valued ndarray to a TFRecords file? If so, what format should I use for the conversion? tf.float32?
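`tf.train.Feature` has no complex list type, so one option (an assumption here, not something the thread confirms) is to store the real and imaginary parts as two float32 lists plus the array shape, and recombine them with `tf.complex` when reading. Values are stored as float32, so complex128 input loses precision. The feature names `real`, `imag`, `shape` and the helpers `complex_to_example`/`parse_complex` are hypothetical.

```python
import numpy as np
import tensorflow as tf


def complex_to_example(spec: np.ndarray) -> tf.train.Example:
    """Store a complex 2-D array as two float lists plus its shape."""
    flat = spec.ravel()
    return tf.train.Example(features=tf.train.Features(feature={
        "real": tf.train.Feature(float_list=tf.train.FloatList(value=flat.real)),
        "imag": tf.train.Feature(float_list=tf.train.FloatList(value=flat.imag)),
        "shape": tf.train.Feature(int64_list=tf.train.Int64List(value=list(spec.shape))),
    }))


FEATURES = {
    "real": tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    "imag": tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True),
    "shape": tf.io.FixedLenFeature([2], tf.int64),
}


def parse_complex(serialized):
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    # tf.complex combines two float32 tensors into one complex64 tensor.
    spec = tf.complex(parsed["real"], parsed["imag"])
    return tf.reshape(spec, parsed["shape"])
```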

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@aaudiber, I followed your comment, but it doesn't work; I get the same error message. Could you check the error message?

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@aaudiber, no, not yet. Could you explain in more detail? In the example, I can't see how to get a batch, and I don't understand your suggestion. When I add the code from your comment to mine, I get the following error: tensorflow.python.framework.errors_impl.InvalidArgumentError: Key: input. Can't parse serialized Example. I wrote the code following this reference.
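The code behind that error is not shown, so this is an assumption: `Key: input. Can't parse serialized Example.` commonly appears when the feature spec does not match the stored list length, for example `tf.io.FixedLenFeature([], tf.float32)`, which expects exactly one float per record. A small sketch with a hypothetical record of 201 × 22 floats compares three specs (`try_parse` is a hypothetical helper):

```python
import tensorflow as tf

# Hypothetical record holding one flattened (201, 22) spectrogram.
values = [0.5] * (201 * 22)
serialized = tf.train.Example(features=tf.train.Features(feature={
    "input": tf.train.Feature(float_list=tf.train.FloatList(value=values)),
})).SerializeToString()


def try_parse(spec):
    """Return the parsed 'input' tensor, or None if parsing fails."""
    try:
        return tf.io.parse_single_example(serialized, {"input": spec})["input"]
    except tf.errors.InvalidArgumentError as err:
        print("parse failed:", err.message.splitlines()[0])
        return None


# shape=[] means "exactly one float", so 4422 stored values cannot be parsed.
scalar = try_parse(tf.io.FixedLenFeature([], tf.float32))
# Declaring the full flattened length works when every record has that length.
fixed = try_parse(tf.io.FixedLenFeature([201 * 22], tf.float32))
# FixedLenSequenceFeature accepts any number of values.
variable = try_parse(tf.io.FixedLenSequenceFeature([], tf.float32, allow_missing=True))
```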

I'll share my code via Colab. Could you solve this? If the data were small, I could load it directly, but in my case it is over 700 GB, so I must use TFRecord to run the model. Please help me.
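`tf.data.TFRecordDataset` streams records from disk, so the 700 GB total is never loaded into memory at once. With data this large, a common practice (an assumption here, not something the thread discusses) is to split the records over several shard files and read them in parallel with `interleave`. The naming scheme `prefix-00000-of-00010.tfrecords` and the helpers `write_shards`/`read_shards` are hypothetical.

```python
import tensorflow as tf


def write_shards(serialized_examples, path_prefix, num_shards):
    """Spread serialized Examples round-robin over several TFRecord files."""
    paths = ["{}-{:05d}-of-{:05d}.tfrecords".format(path_prefix, i, num_shards)
             for i in range(num_shards)]
    writers = [tf.io.TFRecordWriter(p) for p in paths]
    try:
        for i, record in enumerate(serialized_examples):
            writers[i % num_shards].write(record)
    finally:
        # Close every shard so all records are flushed to disk.
        for w in writers:
            w.close()
    return paths


def read_shards(file_pattern):
    """Read all shards matching a glob pattern, interleaving records across files."""
    files = tf.data.Dataset.list_files(file_pattern, shuffle=True)
    return files.interleave(tf.data.TFRecordDataset,
                            cycle_length=4,
                            num_parallel_calls=tf.data.experimental.AUTOTUNE)
```

Shuffling the file list and interleaving gives coarse-grained mixing across shards before any `shuffle()` buffer is applied.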

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@aaudiber, okay. Then how can I read a TFRecord file to get batches? It's pretty hard to find information about reading TFRecord files in batches. Could you modify the Colab code?

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@Saduf2019 I don't understand what you mean, because the shared Colab code has not changed. Should I just wait for this problem to be resolved?

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@Saduf2019

The TFRecords file I made as an example is converted from arrays shaped inp (100, 201, 22) and tar (100, 201, 23). This is the same shape and format as my actual speech dataset. For convenience, the data is generated with np.random.uniform(0, 1, (100, 201, 22)). Could you check it, please?
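A sketch of a reproducible stand-in for that example file, assuming one `Example` per (201, bins) slice, generates random arrays with the quoted shapes, writes them, and reads them back with fixed-length features so the round trip can be checked. `write_random_dataset`, `read_all` and the fixed seed are hypothetical additions; `RandomState` replaces the bare `np.random.uniform` call only to make the data repeatable.

```python
import numpy as np
import tensorflow as tf

NUM, FRAMES, IN_BINS, TAR_BINS = 100, 201, 22, 23  # shapes quoted in the comment above


def write_random_dataset(path, seed=0):
    """Write random stand-in data shaped like inp (100, 201, 22) and tar (100, 201, 23)."""
    rng = np.random.RandomState(seed)
    inp = rng.uniform(0, 1, (NUM, FRAMES, IN_BINS)).astype(np.float32)
    tar = rng.uniform(0, 1, (NUM, FRAMES, TAR_BINS)).astype(np.float32)
    with tf.io.TFRecordWriter(path) as writer:
        for x, y in zip(inp, tar):  # one Example per (201, bins) pair
            ex = tf.train.Example(features=tf.train.Features(feature={
                "input": tf.train.Feature(float_list=tf.train.FloatList(value=x.ravel())),
                "target": tf.train.Feature(float_list=tf.train.FloatList(value=y.ravel())),
            }))
            writer.write(ex.SerializeToString())
    return inp, tar


def read_all(path):
    """Parse every record back into (201, bins) tensors using fixed-length features."""
    spec = {
        "input": tf.io.FixedLenFeature([FRAMES * IN_BINS], tf.float32),
        "target": tf.io.FixedLenFeature([FRAMES * TAR_BINS], tf.float32),
    }

    def parse(serialized):
        p = tf.io.parse_single_example(serialized, spec)
        return (tf.reshape(p["input"], [FRAMES, IN_BINS]),
                tf.reshape(p["target"], [FRAMES, TAR_BINS]))

    return tf.data.TFRecordDataset(path).map(parse)
```

Comparing what `read_all` returns against the arrays returned by `write_random_dataset` confirms that the writer and the parser agree on layout before moving to the full dataset.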

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@Saduf2019

Okay. I changed the code, and now you can load the TFRecords file here.

kaen2891

comment created time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 7f24bddaaaebfd775113229cd5aba591c4b50e8f

name changed

push time in 2 months

push event kaen2891/kaen2891.github.io

June-Woo Kim

commit sha 588d74ded9c0ebe7ccddfd27803b93b6f0a60a21

Notice for dataset

push time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@Saduf2019

I don't know how to share the file with you directly, so instead I will share a link via Google Drive.

Thanks a lot.

kaen2891

comment created time in 2 months

issue comment tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

@Saduf2019

I uploaded the code here.

Could you give me some advice? Thanks.

kaen2891

comment created time in 2 months

issue opened tensorflow/tensorflow

Problem with read and get batch from 2d array tfrecords dataset

created time in 3 months
