
CoreyCole/.emacs.d

Emacs configuration

CoreyCole/alexa-skills-kit-nodejs-lambda-boilerplate

:hotsprings: A boilerplate for Alexa Skills on Node.js Lambdas

CoreyCole/angular

One framework. Mobile & desktop.

CoreyCole/angular2-seed-sass

Seed project for Angular 2 apps using Sass

CoreyCole/angular2-starter

:star: Angular 2 Starter for TypeScript with Gulp, SystemJS, SASS, Codelyzer, Karma, Jasmine, Istanbul, Protractor with CI and Coveralls (Updated to 2.0.0-rc.1!)

CoreyCole/angular2-store-example

WIP: @ngrx/store Best Practices

CoreyCole/audio-record

Python Application which records and displays raw audio data

CoreyCole/aws-cloudformation-fargate

Sample CloudFormation templates for how to run Docker containers in AWS Fargate with various networking configurations

started adamstark/Chord-Detector-and-Chromagram

started time in 3 days

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

@bjacob setting ruy to false on arm throws a compile error, but that is likely out of scope for this issue.

I ran an additional x86 benchmark compiled with some different flags and managed to eke out a bit more performance, but the TRANSPOSE_CONV operator still takes up the vast majority of compute time:

bazel build -c opt \
  --config=mkl --copt=-mavx --copt=-msse4.1 --copt=-msse4.2 \
  tensorflow/lite/tools/benchmark:benchmark_model_plus_flex
============================== Top by Computation Time ==============================
	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	          TRANSPOSE_CONV	          145.739	  334.845	  327.682	 30.711%	 30.711%	     0.000	        1	[device_0/g_ae/dec_0/conv2d_transpose1]:102
	          TRANSPOSE_CONV	          474.519	   98.657	   98.284	  9.211%	 39.922%	     0.000	        1	[device_0/g_ae/dec_1/conv2d_transpose1]:113
	          TRANSPOSE_CONV	          573.890	   68.139	   67.579	  6.334%	 46.255%	     0.000	        1	[device_0/g_ae/dec_2/conv2d_transpose1]:124
	          TRANSPOSE_CONV	          975.095	   56.976	   57.059	  5.348%	 51.603%	     0.000	        1	[device_0/g_ae/dec_9/conv2d_transpose1]:201
	          TRANSPOSE_CONV	          902.487	   52.394	   53.175	  4.984%	 56.587%	     0.000	        1	[device_0/g_ae/dec_8/conv2d_transpose1]:190
	          TRANSPOSE_CONV	          643.654	   51.886	   51.722	  4.847%	 61.434%	     0.000	        1	[device_0/g_ae/dec_3/conv2d_transpose1]:135
	          TRANSPOSE_CONV	          848.040	   44.327	   45.076	  4.225%	 65.659%	     0.000	        1	[device_0/g_ae/dec_7/conv2d_transpose1]:179
	          TRANSPOSE_CONV	          794.084	   43.691	   44.680	  4.187%	 69.846%	     0.000	        1	[device_0/g_ae/dec_6/conv2d_transpose1]:168
	          TRANSPOSE_CONV	          697.544	   47.731	   44.555	  4.176%	 74.022%	     0.000	        1	[device_0/g_ae/dec_4/conv2d_transpose1]:146
	          TRANSPOSE_CONV	          746.798	   42.475	   42.731	  4.005%	 78.027%	     0.000	        1	[device_0/g_ae/dec_5/conv2d_transpose1]:157

Number of nodes executed: 216
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	          TRANSPOSE_CONV	       11	   847.637	    79.448%	    79.448%	     0.000	       11
	                     ABS	       21	   106.461	     9.978%	    89.426%	     0.000	       21
	                 CONV_2D	       11	    79.186	     7.422%	    96.848%	     0.000	       11
	                     ADD	       32	     6.867	     0.644%	    97.492%	     0.000	       32
	                 RESHAPE	       44	     6.518	     0.611%	    98.103%	     0.000	       44
	                     MUL	       42	     6.445	     0.604%	    98.707%	     0.000	       42
	                     SUB	       21	     5.671	     0.532%	    99.239%	     0.000	       21
	                    RELU	       21	     4.125	     0.387%	    99.625%	     0.000	       21
	           CONCATENATION	       11	     3.426	     0.321%	    99.946%	     0.000	       11
	      TfLiteFlexDelegate	        1	     0.461	     0.043%	    99.990%	     0.000	        1
	                    TANH	        1	     0.111	     0.010%	   100.000%	     0.000	        1

Timings (microseconds): count=50 first=1067538 curr=1117788 min=1041534 max=1142071 avg=1.067e+06 std=24935

float32_mavx_msse4.1_msse4.2_benchmark.txt

CoreyCole

comment created time in 2 months

issue comment tensorflow/tensorflow

How to compile tensorflow using SSE4.1, SSE4.2, and AVX.

This article was a good tutorial on how to build from source, including the flags: https://medium.com/@pierreontech/setup-a-high-performance-conda-tensorflow-environment-976995158cb1

Try forcing the inclusion of the appropriate extensions using additional bazel options such as --copt=-mavx --copt=-msse4.1 --copt=-msse4.2
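Before rebuilding, it can help to confirm the CPU actually advertises these extensions in /proc/cpuinfo (note the flag names there use underscores, e.g. sse4_1, while the compiler options use dots). A minimal sketch in Python; the has_extensions helper and the sample string are illustrative, not part of TensorFlow:

```python
def has_extensions(cpuinfo_text, wanted):
    """Return True if every extension in `wanted` appears in a 'flags' line."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return wanted <= flags  # subset test
    return False

# On Linux the real input would be open("/proc/cpuinfo").read();
# here a shortened sample line stands in for it.
sample = "flags\t\t: fpu vme sse4_1 sse4_2 avx avx2"
print(has_extensions(sample, {"avx", "sse4_1", "sse4_2"}))  # True
```

If any required flag is missing on the target machine, building with those --copt options would produce a binary that dies with illegal-instruction errors there.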

ajl412860

comment created time in 2 months

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

How can I disable ruy on arm so I can get a benchmark comparable to the x86 one without ruy?

CoreyCole

comment created time in 2 months

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

From comparing these two architectures, the model runs in roughly 80% less time on arm, even though the x86 clock speed is more than 2x faster.

x86 avg time = 1.1e+07 µs; arm avg time = 1.9e+06 µs

x86 max clock = 5.0 GHz; arm max clock = 2.3 GHz

x86 timings:
Timings (microseconds): count=15 first=10617766 curr=10610651 min=10610651 max=10833023 avg=1.06631e+07 std=73060

arm timings:
Timings (microseconds): count=50 first=1849736 curr=1847769 min=1844926 max=1857462 avg=1.85012e+06 std=2777

(1.1e7 - 1.9e6) / 1.1e7 ≈ 0.83, i.e. roughly an 80% time reduction

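The same comparison on the exact averages reported above, as a quick sketch:

```python
# Average per-run times from the two benchmark outputs, in microseconds.
x86_avg_us = 1.06631e7  # x86: avg=1.06631e+07
arm_avg_us = 1.85012e6  # arm: avg=1.85012e+06

reduction = (x86_avg_us - arm_avg_us) / x86_avg_us
speedup = x86_avg_us / arm_avg_us

print(f"time reduction: {reduction:.1%}")  # 82.6%
print(f"speedup: {speedup:.2f}x")          # 5.76x
```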
CoreyCole

comment created time in 2 months

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

I wanted to see if this problem was specific to x86, so I built the benchmark tool on an AWS A1 arm instance. I did two benchmarks of the same model, one with --define=ruy_profiler=true and one without.

bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_flex \
  --graph=.../converted_model_float32.tflite --num_threads=4 --enable_op_profiling=true --verbose=true > ~/float32_arm_benchmark.txt

float32_arm_ruy_benchmark.txt float32_arm_benchmark.txt

CoreyCole

comment created time in 2 months

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

That flag seemed to work! It produced additional output about cpu_backend_gemm::Gemm

float32_ruy_benchmark.txt

CoreyCole

comment created time in 2 months

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

I did some profiling of the TransposeConvV2 while running the benchmarking.

transpose_conv_profiling.txt

Evidently, the largest time sink in the function is the cpu_backend_gemm::Gemm call inside the batch_size for loop.

I stepped through the cpu_backend_gemm.h code with gdb to check what this cpu_backend_gemm::Gemm function was doing. I noticed that must_use_ruy and try_custom_gemv are both false, whether or not I build the benchmark tool with --define=tflite_with_ruy=true.

CoreyCole

comment created time in 2 months

issue comment tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

Hey @renjie-liu, I added that ruy flag when building the benchmarking tool (from master 86db5756535f70f1b1fab61c6f3f0483141510e8)

bazel build -c opt \
  --config=monolithic --define=tflite_with_ruy=true \
  tensorflow/lite/tools/benchmark:benchmark_model_plus_flex

And then I ran the benchmarking tool on my float32 TFLite model

bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_flex \
  --graph=.../converted_model_float32.tflite --num_threads=4 --enable_op_profiling=true > ~/Desktop/float32_ruy_benchmark.txt

It seems to be running even more slowly, and there is no additional information in the output (compared to the original benchmark output linked in my original post). Do I need to run the benchmarking tool a different way to see the results of the ruy profiler?

float32_ruy_benchmark.txt

CoreyCole

comment created time in 2 months

delete branch CoreyCole/HypDB

delete branch : dependabot/npm_and_yarn/demo/client/papaparse-5.2.0

delete time in 2 months

push event CoreyCole/HypDB

dependabot[bot]

commit sha f06e7c5d67470b2793772e1ccf2e81c2576c6ed1

chore(deps): bump papaparse from 4.4.0 to 5.2.0 in /demo/client Bumps [papaparse](https://github.com/mholt/PapaParse) from 4.4.0 to 5.2.0. - [Release notes](https://github.com/mholt/PapaParse/releases) - [Commits](https://github.com/mholt/PapaParse/compare/4.4.0...5.2.0) Signed-off-by: dependabot[bot] <support@github.com>

view details

Corey Cole

commit sha 77d6e77e75c158ec53d7bde567dd6cf3461f4d8f

Merge pull request #65 from CoreyCole/dependabot/npm_and_yarn/demo/client/papaparse-5.2.0 chore(deps): bump papaparse from 4.4.0 to 5.2.0 in /demo/client

view details

push time in 2 months

PR merged CoreyCole/HypDB

chore(deps): bump papaparse from 4.4.0 to 5.2.0 in /demo/client dependencies

Bumps papaparse from 4.4.0 to 5.2.0.

Release notes (sourced from papaparse's releases: https://github.com/mholt/PapaParse/releases)

Release 5.2.0
We are happy to announce version 5.2.0. This version contains a new feature that allows performing a POST request when downloading files. It also fixes a ReDoS vulnerability; see mholt/PapaParse#777 for more details.

Release 5.1.0
This release brings the option to use a function to determine which fields will be quoted. The function accepts the cell value and column index as parameters. Thanks to @Puzzleton for contributing this feature. This release also includes some bug fixes; thanks to all who contributed.

Release 5.0.0
We are happy to announce a new major release of PapaParse. This release (5.0.0) introduces the following changes:
  • Support for the Node 6.x branch is dropped.
  • Workers are now loaded as inline blobs: it is only necessary to specify the worker: true option and PapaParse will load its code from a blob.
  • The step function returns only a single row.
  • A function to transform header columns is added; the trimheader option is removed, as the same result can be achieved with the new transform function.
  • The API now throws Error objects instead of using error strings.
  • Delimiter guessing is handled when not all of the fields are quoted.
  • Added the ability to support escapeChar on unparse.
  • Allow specifying the columns used for unparse.
  • Added the DelimitersToGuess config option.

Release 5.0.0-beta.0
This is the first beta of the 5.x major version, introducing the changes listed above (dropped Node 6.x support, inline blob workers, single-row step function, header transform function, Error objects). Please test it and report any issues. As this is a beta version, it should be installed with: npm install papaparse@beta

Release 4.6.0
This release brings the option to skip lines with no content but with separators. Thanks to @MonkeyDZeke for the contribution.

Release 4.5.0
This release brings several bug fixes and improvements.

Commits
  • 4b192de Minor version bump
  • 235a127 Avoid ReDoS on float dynamic typing (#779)
  • a4cf371 Improve downloadRequestBody documentation
  • e934deb Support POST method when download is true
  • 7ec146c Using self instead of this to preserve binding (#769)
  • 3497ded Patch version bump
  • ae73d2a Use chunk size to determine the processed length
  • a318396 Reword newline docs
  • 47b356d #727 update delimiter and newline index if they are earlier than the current ...
  • 7ad8dda Address deepEqual using compare by JSON strings (#724)
  • Additional commits viewable in the compare view: https://github.com/mholt/PapaParse/compare/4.4.0...5.2.0

Dependabot compatibility score

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

+49 -17

0 comments

2 changed files

dependabot[bot]

pr closed time in 2 months

PullRequestReviewEvent

issue opened tensorflow/tensorflow

TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): tf-nightly 2.4.0.dev20200902
  • Python version: 3.6.9
  • Bazel version (if compiling from source): 3.5.0
  • GCC/Compiler version (if compiling from source): gcc 7.5.0
  • CUDA/cuDNN version: CUDA 10.1
  • GPU model and memory: Using CPU (Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz)

Describe the current behavior

The TRANSPOSE_CONV operator takes up >80% of total computation time when using the TFLite benchmarking tool.

============================== Top by Computation Time ==============================
	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	          TRANSPOSE_CONV	          235.882	  487.948	  482.097	 26.927%	 26.927%	     0.000	        1	[device_0/g_ae/dec_0/conv2d_transpose1]:102
	          TRANSPOSE_CONV	          719.253	  163.323	  162.187	  9.059%	 35.986%	     0.000	        1	[device_0/g_ae/dec_1/conv2d_transpose1]:113
	          TRANSPOSE_CONV	         1634.919	  108.162	  111.988	  6.255%	 42.241%	     0.000	        1	[device_0/g_ae/dec_9/conv2d_transpose1]:201
	          TRANSPOSE_CONV	         1501.813	  116.089	  109.471	  6.114%	 48.355%	     0.000	        1	[device_0/g_ae/dec_8/conv2d_transpose1]:190
	          TRANSPOSE_CONV	          882.714	  111.952	  108.459	  6.058%	 54.413%	     0.000	        1	[device_0/g_ae/dec_2/conv2d_transpose1]:124
	          TRANSPOSE_CONV	          993.583	  103.796	   97.807	  5.463%	 59.876%	     0.000	        1	[device_0/g_ae/dec_3/conv2d_transpose1]:135
	          TRANSPOSE_CONV	         1287.885	   92.329	   95.829	  5.352%	 65.229%	     0.000	        1	[device_0/g_ae/dec_6/conv2d_transpose1]:168
	          TRANSPOSE_CONV	         1394.631	  109.527	   95.786	  5.350%	 70.579%	     0.000	        1	[device_0/g_ae/dec_7/conv2d_transpose1]:179
	          TRANSPOSE_CONV	         1093.908	   92.043	   93.959	  5.248%	 75.827%	     0.000	        1	[device_0/g_ae/dec_4/conv2d_transpose1]:146
	          TRANSPOSE_CONV	         1193.164	   88.003	   89.509	  4.999%	 80.826%	     0.000	        1	[device_0/g_ae/dec_5/conv2d_transpose1]:157

Number of nodes executed: 216
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	          TRANSPOSE_CONV	       11	  1466.204	    81.898%	    81.898%	     0.000	       11
	                 CONV_2D	       11	   159.086	     8.886%	    90.785%	     0.000	       11
	                     ABS	       21	   111.310	     6.217%	    97.002%	     0.000	       21
	                     ADD	       32	    11.551	     0.645%	    97.647%	     0.000	       32
	                     MUL	       42	    10.792	     0.603%	    98.250%	     0.000	       42
	                 RESHAPE	       44	     9.645	     0.539%	    98.789%	     0.000	       44
	                     SUB	       21	     9.366	     0.523%	    99.312%	     0.000	       21
	                    RELU	       21	     6.514	     0.364%	    99.676%	     0.000	       21
	           CONCATENATION	       11	     5.129	     0.286%	    99.962%	     0.000	       11
	      TfLiteFlexDelegate	        1	     0.430	     0.024%	    99.986%	     0.000	        1
	                    TANH	        1	     0.245	     0.014%	   100.000%	     0.000	        1
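The per-operator shares in the summary above can be cross-checked from the avg ms column; a minimal sketch with the rows copied from the table:

```python
# (node type, count, avg ms) rows from the "Summary by node type" table.
rows = [
    ("TRANSPOSE_CONV", 11, 1466.204),
    ("CONV_2D", 11, 159.086),
    ("ABS", 21, 111.310),
    ("ADD", 32, 11.551),
    ("MUL", 42, 10.792),
    ("RESHAPE", 44, 9.645),
    ("SUB", 21, 9.366),
    ("RELU", 21, 6.514),
    ("CONCATENATION", 11, 5.129),
    ("TfLiteFlexDelegate", 1, 0.430),
    ("TANH", 1, 0.245),
]

total_ms = sum(ms for _, _, ms in rows)
share = {name: ms / total_ms for name, _, ms in rows}
print(f"TRANSPOSE_CONV share: {share['TRANSPOSE_CONV']:.1%}")  # 81.9%
```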

Describe the expected behavior

Faster execution of this operator. I expected my model to run inference faster once converted to TFLite, but it currently runs more slowly than regular TensorFlow on the same hardware.

Standalone code to reproduce the issue

bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_flex \
  --graph=.../converted_model_float32.tflite --num_threads=4 --enable_op_profiling=true > .../float32_benchmark.txt

(The benchmarking tool was built from tf master source commit 86db5756535f70f1b1fab61c6f3f0483141510e8)

Other info / logs

Full TFLite benchmark output

I'd appreciate any tips on how to profile this operator. How can I find out why it is taking so much time in my network? Can I use C++ profiling tools to find the computation time sinks in transpose_conv.h?

created time in 2 months

PR opened mfarhan12/audio-record

Update README.txt

typo

+1 -1

0 comments

1 changed file

pr created time in 2 months

push event CoreyCole/audio-record

Corey Cole

commit sha 21603bb543d098da4a947a2d424b3172d0978388

Update README.txt typo

view details

push time in 2 months

fork CoreyCole/audio-record

Python Application which records and displays raw audio data

fork in 2 months

issue closed tensorflow/tensorflow

TFLite Int8 Quantization Conversion - There are unresolved custom ops: []

Error

RuntimeError: Failed to initialize op resolver for calibration:
There are unresolved custom ops: []Encountered unresolved custom op: RandomStandardNormal.Node number 0 (RandomStandardNormal) failed to prepare.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (or github SHA if from source): 2.3.0

Command used to run the converter or code if you’re using the Python API

If possible, please share a link to Colab/Jupyter/any notebook.

import os, glob
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow.compat.v1 as tf

from tensorflow.python.client import device_lib
import numpy as np

import faulthandler
faulthandler.enable()

import pdb

path = os.path.dirname(os.path.abspath(__file__))
pb_model_name = "model.pb"
pb_model_path = os.path.join(path, pb_model_name)

model_name = 'model_int8.tflite'

if __name__ == '__main__':
  with tf.device("/cpu:0"):
    configuration = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=configuration) as sess:
      converter = tf.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file=pb_model_path,
        # input_arrays=["device_0/wav_and_noisy:1"],
        input_arrays=["device_0/wav_and_noisy:1"],
        output_arrays=["device_0/g_ae_1/Tanh"],
        input_shapes={"device_0/wav_and_noisy:1": [100, 16384]}
      )
      converter.allow_custom_ops = True
      # converter.experimental_new_converter = True
      converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
                                             tf.lite.OpsSet.SELECT_TF_OPS]
      # converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
      #                                        tf.lite.OpsSet.TFLITE_BUILTINS,
      #                                        tf.lite.OpsSet.SELECT_TF_OPS]
      converter.optimizations = [tf.lite.Optimize.DEFAULT]
      # converter.target_spec.supported_types = [tf.int8]
      converter.inference_input_type = tf.int8  # or tf.uint8
      converter.inference_output_type = tf.int8  # or tf.uint8

      def test():
        pdb.set_trace()
        zeros = np.zeros(shape=(1, 100, 16384), dtype='int8')
        dataset = tf.data.Dataset.from_tensor_slices(zeros).batch(1)
        yield [zeros]
      converter.representative_dataset = test

      # pdb.set_trace()
      tflite_model = converter.convert()

      tflite_model_size = open(model_name, 'wb').write(tflite_model)
      print('TFLite Model is %d bytes' % tflite_model_size)

I've also tried including TFLITE_BUILTINS in addition to TFLITE_BUILTINS_INT8. In other issues on GitHub, I've seen that adding SELECT_TF_OPS fixes errors similar to mine, but it did not fix it for me.
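Separately from the RandomStandardNormal error: the representative_dataset in the script above yields int8 zeros, but calibration for post-training int8 quantization normally expects samples in the model's original float dtype (the converter derives the int8 mapping from them). A minimal sketch of the expected generator shape, using the input shape from the script's input_shapes; the sample count is arbitrary:

```python
import numpy as np

def representative_dataset():
    """Yield a list containing one float32 array per model input."""
    for _ in range(10):  # a handful of calibration samples
        sample = np.zeros(shape=(100, 16384), dtype=np.float32)
        yield [sample]

first = next(representative_dataset())
print(first[0].dtype, first[0].shape)  # float32 (100, 16384)
```

Real calibration data drawn from the model's actual input distribution would give much better quantization ranges than zeros.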

The output from the converter invocation

2020-08-25 02:22:18.873117: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-25 02:22:18.873149: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From convert_model_to_tflite_int8.py:24: experimental_run_functions_eagerly (from tensorflow.python.eager.def_function) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.run_functions_eagerly` instead of the experimental version.
2020-08-25 02:22:19.959195: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-25 02:22:19.959229: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-25 02:22:19.959253: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-5-134): /proc/driver/nvidia/version does not exist
2020-08-25 02:22:19.959493: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-25 02:22:19.984669: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2999995000 Hz
2020-08-25 02:22:19.984899: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d64580 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-25 02:22:19.984922: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-25 02:22:19.987088: I tensorflow/core/common_runtime/direct_session.cc:360] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2020-08-25 02:22:22.412284: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-25 02:22:22.412445: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-25 02:22:23.196542: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] Optimization results for grappler item: graph_to_optimize
2020-08-25 02:22:23.196595: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0.005ms.
2020-08-25 02:22:23.196614: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0.001ms.
2020-08-25 02:22:24.510075: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:313] Ignored output_format.
2020-08-25 02:22:24.510118: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored drop_control_dependency.
Traceback (most recent call last):
  File "convert_model_to_tflite_int8.py", line 102, in <module>
    tflite_model = converter.convert()
  File "../python3.6/site-packages/tensorflow/lite/python/lite.py", line 1970, in convert
    return super(TFLiteConverter, self).convert()
  File "../python3.6/site-packages/tensorflow/lite/python/lite.py", line 1339, in convert
    result = self._calibrate_quantize_model(result, **flags)
  File "../python3.6/site-packages/tensorflow/lite/python/lite.py", line 452, in _calibrate_quantize_model
    inference_output_type, allow_float, activations_type)
  File "../python3.6/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 91, in calibrate_and_quantize
    self._calibrator.Prepare([list(s.shape) for s in sample])
RuntimeError: Failed to initialize op resolver for calibration:
There are unresolved custom ops: []Encountered unresolved custom op: RandomStandardNormal.Node number 0 (RandomStandardNormal) failed to prepare.

Also, please include a link to the saved model or GraphDef

Unfortunately I cannot provide it publicly at this time. I can share it privately if needed.

Failure details

The conversion hits a RuntimeError and stops. From PDB (below) I know it hits my representative_dataset function twice before crashing.

Any other info / logs

With a break point inside my representative_dataset function:

2020-08-25 02:29:39.675277: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-25 02:29:39.675311: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py:18: experimental_run_functions_eagerly (from tensorflow.python.eager.def_function) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.run_functions_eagerly` instead of the experimental version.
2020-08-25 02:29:40.563260: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-25 02:29:40.563287: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-25 02:29:40.563307: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-5-134): /proc/driver/nvidia/version does not exist
2020-08-25 02:29:40.563535: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-25 02:29:40.588667: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2999995000 Hz
2020-08-25 02:29:40.588892: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c1a8b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-25 02:29:40.588920: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-25 02:29:40.591168: I tensorflow/core/common_runtime/direct_session.cc:360] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2020-08-25 02:29:42.998639: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-25 02:29:42.998794: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-25 02:29:43.773981: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] Optimization results for grappler item: graph_to_optimize
2020-08-25 02:29:43.774020: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0.006ms.
2020-08-25 02:29:43.774031: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0ms.
2020-08-25 02:29:45.082195: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:313] Ignored output_format.
2020-08-25 02:29:45.082241: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored drop_control_dependency.
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(93)test()
-> zeros = np.zeros(shape=(1, 100, 16384), dtype='int8')
(Pdb) n
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(94)test()
-> dataset = tf.data.Dataset.from_tensor_slices(zeros).batch(1)
(Pdb)
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(97)test()
-> yield [zeros]
(Pdb)
GeneratorExit
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(97)test()
-> yield [zeros]
(Pdb)
Traceback (most recent call last):
  File "/usr/lib/python3.6/pdb.py", line 1667, in main
    pdb._runscript(mainpyfile)
  File "/usr/lib/python3.6/pdb.py", line 1548, in _runscript
    self.run(statement)
  File "/usr/lib/python3.6/bdb.py", line 434, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py", line 18, in <module>
    '''
  File ".../python3.6/site-packages/tensorflow/lite/python/lite.py", line 1970, in convert
    return super(TFLiteConverter, self).convert()
  File ".../python3.6/site-packages/tensorflow/lite/python/lite.py", line 1339, in convert
    result = self._calibrate_quantize_model(result, **flags)
  File ".../python3.6/site-packages/tensorflow/lite/python/lite.py", line 452, in _calibrate_quantize_model
    inference_output_type, allow_float, activations_type)
  File ".../python3.6/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 91, in calibrate_and_quantize
    self._calibrator.Prepare([list(s.shape) for s in sample])
RuntimeError: Failed to initialize op resolver for calibration:
There are unresolved custom ops: []Encountered unresolved custom op: RandomStandardNormal.Node number 0 (RandomStandardNormal) failed to prepare.

Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> .../python3.6/site-packages/tensorflow/lite/python/optimize/calibrator.py(91)calibrate_and_quantize()
-> self._calibrator.Prepare([list(s.shape) for s in sample])

closed time in 2 months

CoreyCole

issue comment on tensorflow/tensorflow

TFLite Int8 Quantization Conversion - There are unresolved custom ops: []

Thank you for the support @jvishnuvardhan and @abattery

I believe it turned out to be a problem with how I was exporting the saved model before converting it to tflite. By using the correct input node/shape and upgrading to tensorflow==2.3.0, the problem went away.
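In case it helps anyone else: the export mistake would have been caught by a simple pre-flight check on the converter arguments. A stdlib-only sketch (the helper name and its checks are my own, not part of the TFLite API):

```python
def check_converter_args(input_arrays, output_arrays, input_shapes):
    """Hypothetical pre-flight check for TFLiteConverter.from_frozen_graph
    arguments: the node lists must be non-empty and every declared input
    array needs a matching shape entry. Not part of the TFLite API."""
    if not input_arrays or not output_arrays:
        raise ValueError("input_arrays and output_arrays must be non-empty")
    missing = [name for name in input_arrays if name not in input_shapes]
    if missing:
        raise ValueError("input_shapes is missing entries for: %s" % missing)
    return True
```

Running this before constructing the converter turns a silent shape/name mismatch into an immediate, readable error.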

CoreyCole

comment created time in 2 months

started simon-weber/gmusicapi

started time in 2 months

issue opened on tensorflow/tensorflow

TFLite Int8 Quantization Conversion - There are unresolved custom ops: []

Error

RuntimeError: Failed to initialize op resolver for calibration:
There are unresolved custom ops: []Encountered unresolved custom op: RandomStandardNormal.Node number 0 (RandomStandardNormal) failed to prepare.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (or github SHA if from source): 2.3.0

Command used to run the converter or code if you’re using the Python API
If possible, please share a link to Colab/Jupyter/any notebook.

import os, glob
os.environ["CUDA_VISIBLE_DEVICES"]="-1"
import tensorflow.compat.v1 as tf

from tensorflow.python.client import device_lib
import numpy as np

import faulthandler
faulthandler.enable()

import pdb

path = os.path.dirname(os.path.abspath(__file__))
pb_model_name = "model.pb"
pb_model_path = os.path.join(path, pb_model_name)

model_name = 'model_int8.tflite'

if __name__ == '__main__':
  with tf.device("/cpu:0"):
    configuration = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
    with tf.Session(config=configuration) as sess:
      converter = tf.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file=pb_model_path,
        # input_arrays=["device_0/wav_and_noisy:1"],
        input_arrays=["device_0/wav_and_noisy:1"],
        output_arrays=["device_0/g_ae_1/Tanh"],
        input_shapes={"device_0/wav_and_noisy:1": [100, 16384]}
      )
      converter.allow_custom_ops = True
      # converter.experimental_new_converter = True
      converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
                                             tf.lite.OpsSet.SELECT_TF_OPS]
      # converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8,
      #                                        tf.lite.OpsSet.TFLITE_BUILTINS,
      #                                        tf.lite.OpsSet.SELECT_TF_OPS]
      converter.optimizations = [tf.lite.Optimize.DEFAULT]
      # converter.target_spec.supported_types = [tf.int8]
      converter.inference_input_type = tf.int8  # or tf.uint8
      converter.inference_output_type = tf.int8  # or tf.uint8

      def test():
        pdb.set_trace()
        zeros = np.zeros(shape=(1, 100, 16384), dtype='int8')
        dataset = tf.data.Dataset.from_tensor_slices(zeros).batch(1)
        yield [zeros]
      converter.representative_dataset = test

      # pdb.set_trace()
      tflite_model = converter.convert()

      tflite_model_size = open(model_name, 'wb').write(tflite_model)
      print('TFLite Model is %d bytes' % tflite_model_size)

I've also tried including TFLITE_BUILTINS in addition to TFLITE_BUILTINS_INT8. In other issues on github, I've seen that adding SELECT_TF_OPS solves errors similar to mine, but it did not fix this one for me.
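For reference, the generator contract that converter.representative_dataset expects can be sketched without TensorFlow at all. Shapes below are small placeholders, and real calibration samples are normally float32 tensors covering the input distribution rather than the int8 zeros in my script above:

```python
def representative_dataset(num_samples=3, rows=4, cols=8):
    """Sketch of the calibration-generator contract: each yielded item
    is a list with one array-like per model input, including the leading
    batch dimension. Placeholder shapes, all-zero placeholder data."""
    for _ in range(num_samples):
        # One sample of shape (rows, cols); a real dataset would draw
        # float32 tensors from actual model inputs.
        sample = [[0.0 for _ in range(cols)] for _ in range(rows)]
        batch = [sample]   # add the batch dimension -> (1, rows, cols)
        yield [batch]      # one entry per model input
```

The calibrator iterates this generator to completion during `convert()`, which is why the breakpoint in my `test()` function is hit before the crash.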

The output from the converter invocation

2020-08-25 02:22:18.873117: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-25 02:22:18.873149: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From convert_model_to_tflite_int8.py:24: experimental_run_functions_eagerly (from tensorflow.python.eager.def_function) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.run_functions_eagerly` instead of the experimental version.
2020-08-25 02:22:19.959195: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-25 02:22:19.959229: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-25 02:22:19.959253: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-5-134): /proc/driver/nvidia/version does not exist
2020-08-25 02:22:19.959493: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-25 02:22:19.984669: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2999995000 Hz
2020-08-25 02:22:19.984899: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3d64580 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-25 02:22:19.984922: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-25 02:22:19.987088: I tensorflow/core/common_runtime/direct_session.cc:360] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2020-08-25 02:22:22.412284: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-25 02:22:22.412445: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-25 02:22:23.196542: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] Optimization results for grappler item: graph_to_optimize
2020-08-25 02:22:23.196595: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0.005ms.
2020-08-25 02:22:23.196614: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0.001ms.
2020-08-25 02:22:24.510075: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:313] Ignored output_format.
2020-08-25 02:22:24.510118: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored drop_control_dependency.
Traceback (most recent call last):
  File "convert_model_to_tflite_int8.py", line 102, in <module>
    tflite_model = converter.convert()
  File "../python3.6/site-packages/tensorflow/lite/python/lite.py", line 1970, in convert
    return super(TFLiteConverter, self).convert()
  File "../python3.6/site-packages/tensorflow/lite/python/lite.py", line 1339, in convert
    result = self._calibrate_quantize_model(result, **flags)
  File "../python3.6/site-packages/tensorflow/lite/python/lite.py", line 452, in _calibrate_quantize_model
    inference_output_type, allow_float, activations_type)
  File "../python3.6/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 91, in calibrate_and_quantize
    self._calibrator.Prepare([list(s.shape) for s in sample])
RuntimeError: Failed to initialize op resolver for calibration:
There are unresolved custom ops: []Encountered unresolved custom op: RandomStandardNormal.Node number 0 (RandomStandardNormal) failed to prepare.

Also, please include a link to the saved model or GraphDef
Unfortunately I cannot provide it publicly at this time. I can share privately if needed.

Failure details
The conversion hits a RuntimeError and stops. From PDB (below) I know it is hitting my representative_dataset function twice before crashing.

Any other info / logs

With a break point inside my representative_dataset function:

2020-08-25 02:29:39.675277: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-08-25 02:29:39.675311: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
WARNING:tensorflow:From /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py:18: experimental_run_functions_eagerly (from tensorflow.python.eager.def_function) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.run_functions_eagerly` instead of the experimental version.
2020-08-25 02:29:40.563260: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-08-25 02:29:40.563287: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-08-25 02:29:40.563307: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (ip-172-31-5-134): /proc/driver/nvidia/version does not exist
2020-08-25 02:29:40.563535: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-25 02:29:40.588667: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2999995000 Hz
2020-08-25 02:29:40.588892: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c1a8b0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-25 02:29:40.588920: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-08-25 02:29:40.591168: I tensorflow/core/common_runtime/direct_session.cc:360] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device

2020-08-25 02:29:42.998639: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0
2020-08-25 02:29:42.998794: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2020-08-25 02:29:43.773981: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:816] Optimization results for grappler item: graph_to_optimize
2020-08-25 02:29:43.774020: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0.006ms.
2020-08-25 02:29:43.774031: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:818]   function_optimizer: function_optimizer did nothing. time = 0ms.
2020-08-25 02:29:45.082195: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:313] Ignored output_format.
2020-08-25 02:29:45.082241: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored drop_control_dependency.
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(93)test()
-> zeros = np.zeros(shape=(1, 100, 16384), dtype='int8')
(Pdb) n
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(94)test()
-> dataset = tf.data.Dataset.from_tensor_slices(zeros).batch(1)
(Pdb)
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(97)test()
-> yield [zeros]
(Pdb)
GeneratorExit
> /home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py(97)test()
-> yield [zeros]
(Pdb)
Traceback (most recent call last):
  File "/usr/lib/python3.6/pdb.py", line 1667, in main
    pdb._runscript(mainpyfile)
  File "/usr/lib/python3.6/pdb.py", line 1548, in _runscript
    self.run(statement)
  File "/usr/lib/python3.6/bdb.py", line 434, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/home/ubuntu/denoise-gst/denoise_deploy/inference_module/convert_model_to_tflite_int8.py", line 18, in <module>
    '''
  File "/home/ubuntu/denoise-gst/venv/lib/python3.6/site-packages/tensorflow/lite/python/lite.py", line 1970, in convert
    return super(TFLiteConverter, self).convert()
  File "/home/ubuntu/denoise-gst/venv/lib/python3.6/site-packages/tensorflow/lite/python/lite.py", line 1339, in convert
    result = self._calibrate_quantize_model(result, **flags)
  File "/home/ubuntu/denoise-gst/venv/lib/python3.6/site-packages/tensorflow/lite/python/lite.py", line 452, in _calibrate_quantize_model
    inference_output_type, allow_float, activations_type)
  File "/home/ubuntu/denoise-gst/venv/lib/python3.6/site-packages/tensorflow/lite/python/optimize/calibrator.py", line 91, in calibrate_and_quantize
    self._calibrator.Prepare([list(s.shape) for s in sample])
RuntimeError: Failed to initialize op resolver for calibration:
There are unresolved custom ops: []Encountered unresolved custom op: RandomStandardNormal.Node number 0 (RandomStandardNormal) failed to prepare.

Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /home/ubuntu/denoise-gst/venv/lib/python3.6/site-packages/tensorflow/lite/python/optimize/calibrator.py(91)calibrate_and_quantize()
-> self._calibrator.Prepare([list(s.shape) for s in sample])

created time in 2 months

started ageron/handson-ml2

started time in 2 months

issue closed on tensorflow/tensorflow

TFLiteConverter Segmentation Fault during integer quantization representative_dataset --add_postprocessing_op=true

I'm using tensorflow==1.15.3 and I'm hitting a segmentation fault attempting int8 post-training quantization. The documentation for the 1.15 version of the TFLiteConverter can be found here.

I found a similar issue on github, but their solution to provide --add_postprocessing_op=true has not solved the segmentation fault.

I've debugged it using PDB and found exactly where it crashes. It never reaches my representative_dataset function; it faults when running CreateWrapperCPPFromBuffer(model_content).

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 18.04.4 LTS
  • TensorFlow installed from (source or binary): source/pip tensorflow==1.15.3
  • TensorFlow version (or github SHA if from source): 1.15.3

Command used to run the converter or code if you’re using the Python API
If possible, please share a link to Colab/Jupyter/any notebook.

python -m pdb convert_model_to_tflite_int8.py --add_postprocessing_op=true

The output from the converter invocation

2020-08-20 20:25:22.552188: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-08-20 20:25:22.573942: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999995000 Hz
2020-08-20 20:25:22.574163: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x566ecb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-20 20:25:22.574183: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2020-08-20 20:25:22.574411: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2020-08-20 20:25:33.546206: I tensorflow/core/grappler/devices.cc:60] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)
2020-08-20 20:25:33.546355: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session

2020-08-20 20:25:37.496345: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: graph_to_optimize
2020-08-20 20:25:37.496382: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 433 nodes (-65), 484 edges (-65), time = 2221.83691ms.
2020-08-20 20:25:37.496394: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 433 nodes (0), 484 edges (0), time = 935.13ms.
> .../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py(51)__init__()
-> .CreateWrapperCPPFromBuffer(model_content))
(Pdb) s
Fatal Python error: Segmentation fault

Current thread 0x00007ff40ee9f740 (most recent call first):
  File ".../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 51 in __init__
  File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 236 in _calibrate_quantize_model
  File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 993 in convert
  File ".../convert_model_to_tflite_int8.py", line 97 in <module>
  File "<string>", line 1 in <module>
  File "/usr/lib/python3.6/bdb.py", line 434 in run
  File "/usr/lib/python3.6/pdb.py", line 1548 in _runscript
  File "/usr/lib/python3.6/pdb.py", line 1667 in main
  File "/usr/lib/python3.6/pdb.py", line 1694 in <module>
  File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
  File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[1]    17668 segmentation fault (core dumped)  python -m pdb convert_model_to_tflite_int8.py  --add_postprocessing_op=true

converter = tf.lite.TFLiteConverter.from_frozen_graph(
  graph_def_file=pb_model_path,
  input_arrays=["device_0/input_node_name:1"],
  output_arrays=["device_0/output_node_name"],
  input_shapes={"device_0/input_node_name:1": [100, 16384]}
)
converter.allow_custom_ops = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def test():
  pdb.set_trace()
  print(' ! ! ! representative_dataset_gen ! ! ! ')
  zeros = np.zeros(shape=(1, 100, 16384), dtype='int8')
  ds = tf.data.Dataset.from_tensor_slices((zeros)).batch(1)
  for input_value in ds.take(1):
    yield [input_value]
converter.representative_dataset = test

pdb.set_trace()
tflite_model = converter.convert()

tflite_model_size = open(model_name, 'wb').write(tflite_model)
print('TFLite Model is %d bytes' % tflite_model_size)

Failure details

  • int8 quantization with representative_dataset
  • Segmentation fault occurs when running CreateWrapperCPPFromBuffer(model_content)

Any other info / logs

  • tf.float16 conversion works (without representative_dataset property)
  • asked on stack overflow here

closed time in 2 months

CoreyCole

issue comment on tensorflow/tensorflow

TFLiteConverter Segmentation Fault during integer quantization representative_dataset --add_postprocessing_op=true

Upgrading my tf version to 2.3 solved the segmentation fault. My model code isn't compatible with tf==2.x yet, but luckily the conversion code is independent of that, so the upgrade went smoothly.
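Since the crash only reproduced under 1.15 for me, I now guard the conversion script so it fails loudly on an old interpreter environment instead of segfaulting. A stdlib-only sketch; in practice you would pass tf.__version__:

```python
def meets_min_version(version, minimum=(2, 3)):
    """Return True if a 'major.minor[.patch]' version string is at least
    `minimum`. Intended as a guard at the top of the conversion script,
    e.g. meets_min_version(tf.__version__)."""
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum
```

Tuple comparison handles the two-digit minor case correctly ("2.10" compares as (2, 10), not as the string "2.10").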

CoreyCole

comment created time in 2 months

issue opened on tensorflow/tensorflow

TFLiteConverter Segmentation Fault during integer quantization representative_dataset --add_postprocessing_op=true

I'm using tensorflow==1.15.3 and I'm hitting a segmentation fault attempting int8 post-training quantization. The documentation for the 1.15 version of the TFLiteConverter can be found here.

I found a similar issue on github, but their solution to provide --add_postprocessing_op=true has not solved the segmentation fault.

I've debugged it using PDB and found exactly where it crashes. It never reaches my representative_dataset function; it faults when running CreateWrapperCPPFromBuffer(model_content).

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 18.04.4 LTS
  • TensorFlow installed from (source or binary): source/pip tensorflow==1.15.3
  • TensorFlow version (or github SHA if from source): 1.15.3

Command used to run the converter or code if you’re using the Python API
If possible, please share a link to Colab/Jupyter/any notebook.

python -m pdb convert_model_to_tflite_int8.py --add_postprocessing_op=true

The output from the converter invocation

2020-08-20 20:25:22.552188: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-08-20 20:25:22.573942: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2999995000 Hz
2020-08-20 20:25:22.574163: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x566ecb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-20 20:25:22.574183: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2020-08-20 20:25:22.574411: I tensorflow/core/common_runtime/direct_session.cc:359] Device mapping:
/job:localhost/replica:0/task:0/device:XLA_CPU:0 -> device: XLA_CPU device
2020-08-20 20:25:33.546206: I tensorflow/core/grappler/devices.cc:60] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 0 (Note: TensorFlow was not compiled with CUDA support)
2020-08-20 20:25:33.546355: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session

2020-08-20 20:25:37.496345: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:786] Optimization results for grappler item: graph_to_optimize
2020-08-20 20:25:37.496382: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 433 nodes (-65), 484 edges (-65), time = 2221.83691ms.
2020-08-20 20:25:37.496394: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:788]   constant_folding: Graph size after: 433 nodes (0), 484 edges (0), time = 935.13ms.
> .../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py(51)__init__()
-> .CreateWrapperCPPFromBuffer(model_content))
(Pdb) s
Fatal Python error: Segmentation fault

Current thread 0x00007ff40ee9f740 (most recent call first):
  File ".../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 51 in __init__
  File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 236 in _calibrate_quantize_model
  File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 993 in convert
  File ".../convert_model_to_tflite_int8.py", line 97 in <module>
  File "<string>", line 1 in <module>
  File "/usr/lib/python3.6/bdb.py", line 434 in run
  File "/usr/lib/python3.6/pdb.py", line 1548 in _runscript
  File "/usr/lib/python3.6/pdb.py", line 1667 in main
  File "/usr/lib/python3.6/pdb.py", line 1694 in <module>
  File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
  File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[1]    17668 segmentation fault (core dumped)  python -m pdb convert_model_to_tflite_int8.py  --add_postprocessing_op=true

converter = tf.lite.TFLiteConverter.from_frozen_graph(
  graph_def_file=pb_model_path,
  input_arrays=["device_0/input_node_name:1"],
  output_arrays=["device_0/output_node_name"],
  input_shapes={"device_0/input_node_name:1": [100, 16384]}
)
converter.allow_custom_ops = True
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

def test():
  pdb.set_trace()
  print(' ! ! ! representative_dataset_gen ! ! ! ')
  zeros = np.zeros(shape=(1, 100, 16384), dtype='float64')
  ds = tf.data.Dataset.from_tensor_slices((zeros)).batch(1)
  for input_value in ds.take(1):
    yield [input_value]
converter.representative_dataset = test

pdb.set_trace()
tflite_model = converter.convert()

tflite_model_size = open(model_name, 'wb').write(tflite_model)
print('TFLite Model is %d bytes' % tflite_model_size)

Failure details

  • int8 quantization with representative_dataset
  • Segmentation fault occurs when running CreateWrapperCPPFromBuffer(model_content)

Any other info / logs

  • tf.float16 conversion works (without representative_dataset property)
  • asked on stack overflow here

created time in 2 months

issue comment on tensorflow/tensorflow

Segmentation fault using tf.lite.TFLiteConverter with representative_dataset

Sorry to piggyback on an old issue. I'm using tf==1.15 and hitting a segmentation fault while attempting int8 quantization using converter.representative_dataset. I set a breakpoint at the beginning of my representative_dataset_gen function and it is never hit.

I get a segmentation fault here when running the conversion:

.../tensorflow_core/lite/python/optimize/calibrator.py", line 51 in __init__

The full stack trace is here:

> /home/ubuntu/denoise-gst/venv/lib/python3.6/site-packages/tensorflow_core/lite/python/lite.py(993)convert()
-> inference_output_type)
(Pdb) 
Fatal Python error: Segmentation fault

Current thread 0x00007ff40ee9f740 (most recent call first):
  File ".../python3.6/site-packages/tensorflow_core/lite/python/optimize/calibrator.py", line 51 in __init__
  File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 236 in _calibrate_quantize_model
  File ".../python3.6/site-packages/tensorflow_core/lite/python/lite.py", line 993 in convert
  File ".../convert_model_to_tflite_int8.py", line 97 in <module>
  File "<string>", line 1 in <module>
  File "/usr/lib/python3.6/bdb.py", line 434 in run
  File "/usr/lib/python3.6/pdb.py", line 1548 in _runscript
  File "/usr/lib/python3.6/pdb.py", line 1667 in main
  File "/usr/lib/python3.6/pdb.py", line 1694 in <module>
  File "/usr/lib/python3.6/runpy.py", line 85 in _run_code
  File "/usr/lib/python3.6/runpy.py", line 193 in _run_module_as_main
[1]    17668 segmentation fault (core dumped)  python -m pdb convert_model_to_tflite_int8.py

Let me know if this is unrelated and I will open a new issue. I can't find any other issue closer to my problem than this one.

Any other debugging tips to get to the root of my problem would be appreciated :)
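One thing that at least localizes crashes like this is the stdlib faulthandler module, which prints the current Python stack traces to stderr on SIGSEGV instead of letting the process die silently:

```python
import faulthandler

# With the fault handler enabled, a segfault inside a native extension
# (here, the TFLite calibrator's C++ wrapper) dumps the Python stack
# before the process exits, pointing at the offending Python frame.
faulthandler.enable()
assert faulthandler.is_enabled()
```

Combined with pdb, this is how I narrowed the fault down to the `CreateWrapperCPPFromBuffer(model_content)` call above.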

hvico

comment created time in 2 months
