Suharsh Sivakumar suharshs Google San Francisco, CA https://twitter.com/suharshs

tensorflow/model-optimization 664

A suite of tools that users, both novice and advanced, can use to optimize machine learning models for deployment and execution.

rishibajekal/facebook-mood-graphs 5

A web-application to detect your mood over time and other parameters based on your Facebook data. Created in 24-hours for UIUC Facebook Hackathon 2012.

suharshs/piazzaapi 4

Python api to access and post data from piazza.

suharshs/bindibot 3

A classifier that attempts to correctly answer piazza questions based on historical course data.

suharshs/giraph 3

An incomplete javascript graph library geared towards educational use.

rishibajekal/fitquo 2

UIUC CS411 Fall 2012 - Database Systems

irtefa/skoop 1

Extensible search visualizer (ESV).

rishibajekal/drimp 1

A place to share why you're drinking what you're drinking...

suharshs/fingerpaint 1

make your fingers do things you never thought they could

suharshs/models 1

Models and examples built with TensorFlow

pull request comment tensorflow/tensorflow

Add 16 bit support to kernel operator TRANSPOSE_CONVOLUTION

I am still making some internal tests pass with this change. Should be in soon.

On Wed, Feb 26, 2020 at 3:11 AM Peng Sun notifications@github.com wrote:

I had a look at the Windows Bazel build and the Windows Bazel GPU build: none of the logs seem to be failing in relation to this PR

https://source.cloud.google.com/results/invocations/ea48732b-6791-400f-9343-274a6bdce49f/log

https://source.cloud.google.com/results/invocations/1f43a2e1-15b2-4cbf-b0c8-a098297e2eed/log

@suharshs https://github.com/suharshs @rthadur https://github.com/rthadur can you help to have a look? Shall I rebase this PR?


psunn

comment created time in 15 hours

pull request comment tensorflow/tensorflow

TransposeConv with Bias

Hi Peng, this is messing with some internal tests that I am cleaning up. It should be merged once that is done. Thanks!

On Fri, Feb 21, 2020, 3:37 AM Peng Sun notifications@github.com wrote:

@psunn https://github.com/psunn requested your review on: #34903 https://github.com/tensorflow/tensorflow/pull/34903 TransposeConv with Bias.


psunn

comment created time in 6 days

issue comment tensorflow/tensorflow

Error while converting to quantized tflite.

Hmm, yes, it seems that REDUCE_MAX doesn't yet have a quantized implementation. Can you print all of the ops in your TensorFlow graph and see if Max is in there for some reason?
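If it helps, a minimal sketch of one way to list the op types (assuming the graph is available as a frozen GraphDef; the path below is a placeholder):

import tensorflow as tf

# Load the frozen GraphDef ("frozen_graph.pb" is a placeholder path).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Print the distinct op types and check whether "Max" (REDUCE_MAX) appears.
op_types = sorted({node.op for node in graph_def.node})
print(op_types)
print("Max present:", "Max" in op_types)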

shashank-industrail

comment created time in 8 days

Pull request review comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

 struct OperatorProperty {    // Op version.   int version = 1;++  // When we quantize activations into 16 bit and weights into 8 bit,+  // we want to quantize all inputs, including constant tensors,+  // for the operators like Add, Mul into 16-bit as well. The constant+  // inputs are quantized as weights and this variable indicates+  // that we want to do quantizations of these tensors as activations.+  bool quantize_input_as_activations = false;

SGTM.

wwwind

comment created time in 8 days

Pull request review comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

 def _is_calibration_quantize(self):    def _calibrate_quantize_model(self, result, inference_input_type,                                 inference_output_type, enable_mlir_quantizer):-    allow_float = not self._is_int8_target_required()+    allow_float = not self._is_int8_target_required() and not self._is_int16x8_target_required()     calibrate_quantize = _calibrator.Calibrator(result)+    activations_type = constants.INT16 if self._is_int16x8_target_required() else constants.INT8

Got it. So the plan is to do that in a followup?

wwwind

comment created time in 8 days

Pull request review comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

 struct OperatorProperty {    // Op version.   int version = 1;++  // When we quantize activations into 16 bit and weights into 8 bit,+  // we want to quantize all inputs, including constant tensors,+  // for the operators like Add, Mul into 16-bit as well. The constant+  // inputs are quantized as weights and this variable indicates+  // that we want to do quantizations of these tensors as activations.+  bool quantize_input_as_activations = false;

I don't fully understand what this flag is exactly.

Does this flag force that all the inputs to an operation are 16bit?

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

 TfLiteStatus QuantizeModel(flatbuffers::FlatBufferBuilder* builder, TfLiteStatus QuantizeModel(flatbuffers::FlatBufferBuilder* builder,                            ModelT* input_model, const TensorType& input_type,                            const TensorType& output_type, bool allow_float,+                           const TensorType& activations_type,

This is the only call invoked by the python tflite converter code. So I think instead of adding activations_type to every QuantizeModel function here, perhaps we can create a new QuantizeModel function that just adds activations_type to this function signature. (I am concerned about callers of the c++ code getting unexpected breakages.)

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

 def _is_calibration_quantize(self):    def _calibrate_quantize_model(self, result, inference_input_type,                                 inference_output_type, enable_mlir_quantizer):-    allow_float = not self._is_int8_target_required()+    allow_float = not self._is_int8_target_required() and not self._is_int16x8_target_required()     calibrate_quantize = _calibrator.Calibrator(result)+    activations_type = constants.INT16 if self._is_int16x8_target_required() else constants.INT8

This should not check self._is_int16x8_target_required, right? Shouldn't it be sufficient if the int16 flag is just in the list? For instance, in the allow_float case where both float and int16x8 are specified in the optimizations?

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

 void GetAsymmetricQuantizationParams(   quantization_params->zero_point = std::vector<int64_t>(1, zero_point); } +void GetSymmetricQuantizationParams(+    float min, float max, const int half_quant_range,+    QuantizationParametersT* quantization_params) {+  // Adjust the boundaries to guarantee 0 is included.+  min = std::min(min, 0.0f);+  max = std::max(max, 0.0f);+  const float scale = std::max(std::abs(max), std::abs(min)) / half_quant_range;+  int64_t zero_point = 0;+  quantization_params->min = std::vector<float>(1, min);+  quantization_params->max = std::vector<float>(1, max);+  quantization_params->scale = std::vector<float>(1, scale);+  quantization_params->zero_point = std::vector<int64_t>(1, 0);+}++TfLiteStatus GetQuantizationParams(TensorT* tensor, TensorType activations_type,+                                   QuantizationParametersT* quantization_params,+                                   ErrorReporter* error_reporter) {+  if (activations_type == TensorType_INT8) {+    GetAsymmetricQuantizationParams(+        tensor->quantization->min[0], tensor->quantization->max[0],+        std::numeric_limits<int8_t>::min(), std::numeric_limits<int8_t>::max(),+        quantization_params);+  } else if (activations_type == TensorType_INT16) {+    float range = std::max(std::abs(tensor->quantization->min[0]),

Did you want to call the above GetSymmetricQuantizationParams function here?

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit version of ADD/SUB reference kernel operators

 inline void Add(const ArithmeticParams& params,                 const RuntimeShape& output_shape, int16* output_data) {   TFLITE_DCHECK_LE(params.quantized_activation_min,                    params.quantized_activation_max);+  const int flat_size =+      MatchingElementsSize(input1_shape, input2_shape, output_shape);++  int max_value = std::numeric_limits<int16>::max();++  TFLITE_DCHECK_GT(params.input1_offset, -max_value);+  TFLITE_DCHECK_GT(params.input2_offset, -max_value);+  TFLITE_DCHECK_LT(params.input1_offset, max_value);+  TFLITE_DCHECK_LT(params.input2_offset, max_value);+  AddElementwise(flat_size, params, input1_data, input2_data, output_data);+}++inline void AddLSTM(const ArithmeticParams& params,

I am not sure I understand: are these Add and Sub operations specific to LSTM?

How are they created in the TFLite graph? If possible, we should avoid having unfused versions of Add and Sub that are specific to LSTM.

wwwind

comment created time in 21 days

issue comment tensorflow/tensorflow

Error while converting to quantized tflite.

Have you tried converting the model without any optimizations specified? I want to make sure first that your model converts successfully as a float model before we try to add quantization.
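For reference, a rough sketch of what I mean (the converter constructor and path are placeholders; use whichever from_* method matches your model):

import tensorflow as tf

# Placeholder: load from a SavedModel directory (adapt to your model format).
converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")

# Step 1: no optimizations set, so this produces a plain float model.
float_model = converter.convert()

# Step 2: only once the float conversion succeeds, enable post-training quantization.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_model = converter.convert()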

Thanks! -Suharsh

shashank-industrail

comment created time in 24 days

issue closed tensorflow/tensorflow

fake_quant_with_min_max_vars innefficiencies


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04.3
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: NA
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): v1.12.1-21401-gd908b50 2.1.0-dev20191230
  • Python version: 3.6.9
  • Bazel version (if compiling from source): NA
  • GCC/Compiler version (if compiling from source): NA
  • CUDA/cuDNN version: CUDA Version 10.1.243 / cuDNN 7.6.4.38-1
  • GPU model and memory: TITAN V, 12GB


Describe the current behavior

A call to fake_quant_with_min_max_vars consistently results in a couple of D2H transfers before the quantization kernel is executed. I believe these transfers are part of ValidateInputTypeAndPlacement and largely dominate the operation cost. This slows the training of large NNs with fake quantization nodes to unbearable levels.

An image of a profile resulting from back to back dependent quantization calls:

[profiler screenshot attached to the issue]

Describe the expected behavior

I would expect little to zero overhead before the actual quantization kernel.

Code to reproduce the issue

import tensorflow as tf
import numpy as np
import time
from tensorflow.python import eager
import os

x = tf.random.uniform(shape=[10000,1000])
xmax = tf.Variable(0.5)

eager.profiler.start()

xmax_val = xmax.value()

for n in range(10):
    x = tf.quantization.fake_quant_with_min_max_vars(inputs=x, min=xmax_val, max=xmax_val, num_bits=8)

profiler_result = eager.profiler.stop()
eager.profiler.save(os.path.join('quant','log'), profiler_result)


closed time in a month

ramonmatas

issue comment tensorflow/tensorflow

fake_quant_with_min_max_vars innefficiencies

Hi, I dug into this and the issue is that the FakeQuant operation seems to do some computation using the min and max values that must happen on the CPU host. The issue can be resolved by forcing the min/max variable to be placed on the CPU. I see ValidateInputTypeAndPlacement disappear from my profiler when I do this.

with tf.device('/cpu:0'):
    xmax = ...
    xmax_val = ...
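Applied to the reproduction script from the issue, the workaround looks roughly like this (a sketch, not a tested drop-in):

import tensorflow as tf

x = tf.random.uniform(shape=[10000, 1000])

# Pin the min/max variable (and its read) to the CPU so the host-side work in
# FakeQuant does not trigger device-to-host transfers.
with tf.device('/cpu:0'):
    xmax = tf.Variable(0.5)
    xmax_val = xmax.value()

for n in range(10):
    x = tf.quantization.fake_quant_with_min_max_vars(
        inputs=x, min=xmax_val, max=xmax_val, num_bits=8)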

Hope that helps!

ramonmatas

comment created time in a month

Pull request review comment tensorflow/tensorflow

[TFLite int16] Added 16/8 bit support to kernel operator CONCAT

 TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {       TF_LITE_CONCATENATION(int32);       break;     case kTfLiteUInt8:-      TF_LITE_CONCATENATION_QUANTIZED();+      TF_LITE_CONCATENATION_QUANTIZED(uint8_t);       break;     case kTfLiteInt8:       TF_LITE_CONCATENATION(int8_t);

Can we instead make this code path shared with the int8 path rather than the uint8 path? The uint8 path does rescaling, whereas for int8 we updated the code to require scales to match, and any rescaling of inputs should happen outside of the operation.

wwwind

comment created time in a month

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit version of MUL reference kernel operator

 TfLiteStatus EvalQuantized(TfLiteContext* context, TfLiteNode* node,           TF_LITE_MUL(optimized_integer_ops, Mul, int8_t);         }       }+    } else if (input1->type == kTfLiteInt16) {+      // We have this check, because in case of int16+      // input1_val*input2_val can overflow int32:+      // see MulElementwise -+      // tensorflow/lite/kernels/internal/reference/integer_ops/mul.h in case of+      // 16-bit this function is used in symmetric quantization, so offset+      // should be zero.+      TF_LITE_ENSURE_EQ(context, op_params.input1_offset, 0.0);

For 16bit we are only supporting symmetric for now.

wwwind

comment created time in a month

Pull request review comment tensorflow/tensorflow

16-bit version of ADD/SUB reference kernel operators

 inline void Add(const ArithmeticParams& params,                 const RuntimeShape& output_shape, int16* output_data) {   TFLITE_DCHECK_LE(params.quantized_activation_min,                    params.quantized_activation_max);+  const int flat_size =+      MatchingElementsSize(input1_shape, input2_shape, output_shape);++  int max_value = std::numeric_limits<int16>::max();++  TFLITE_DCHECK_GT(params.input1_offset, -max_value);+  TFLITE_DCHECK_GT(params.input2_offset, -max_value);+  TFLITE_DCHECK_LT(params.input1_offset, max_value);+  TFLITE_DCHECK_LT(params.input2_offset, max_value);+  AddElementwise(flat_size, params, input1_data, input2_data, output_data);+}++inline void AddLSTM(const ArithmeticParams& params,

What is AddLSTM?

wwwind

comment created time in a month

Pull request review comment tensorflow/tensorflow

Symmetric 16-bit activations and 8-bit weights: reference kernel CONV_2D

 class SingleOpModel {   // Quantize and populate data for bias with per channel quantization.   void PerChannelQuantizeBias(int index, const std::vector<float>& input_data) {     const int32_t num_inputs = input_data.size();-    std::vector<int32_t> quantized_output(num_inputs);     TfLiteTensor* t = interpreter_->tensor(index);     auto* params =         reinterpret_cast<TfLiteAffineQuantization*>(t->quantization.params);-    for (int i = 0; i < num_inputs; ++i) {-      quantized_output[i] = input_data[i] / params->scale->data[i];+    CHECK(t->type == kTfLiteInt32 || t->type == kTfLiteInt64);+    if (t->type == kTfLiteInt32) {+      std::vector<int32_t> quantized_output(num_inputs);+      for (int i = 0; i < num_inputs; ++i) {+        const float scale = params->scale->size == 1 ? params->scale->data[0]+                                              : params->scale->data[i];+        quantized_output[i] = input_data[i] / scale;+      }+      PopulateTensor(index, /*offset=*/0, quantized_output.data(),+                     quantized_output.data() + quantized_output.size());

Is this portion of the code the same in both cases? If so, can we share it?

wwwind

comment created time in a month

Pull request review comment tensorflow/tensorflow

Symmetric 16-bit activations and 8-bit weights: reference kernel CONV_2D

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/+#include <sys/types.h>++#include <stdio.h>+#include <algorithm>+#include <cmath>+#include <cstdint>+#include <cstdlib>+#include <iterator>+#include <limits>+#include <string>+#include <type_traits>+#include <vector>++#include <gtest/gtest.h>+#include "tensorflow/lite/kernels/internal/common.h"+#include "tensorflow/lite/kernels/internal/quantization_util.h"+#include "tensorflow/lite/kernels/internal/reference/conv.h"+#include "tensorflow/lite/kernels/internal/reference/integer_ops/conv.h"+#include "tensorflow/lite/kernels/internal/test_util.h"+#include "tensorflow/lite/kernels/internal/types.h"++namespace tflite {+namespace {++void PickOutputMultiplier(+    const ConvParams& params, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    float* output_multiplier) {+  const int stride_width = params.stride_width;+  const int stride_height = params.stride_height;+  const int dilation_width_factor = params.dilation_width_factor;+  const int dilation_height_factor = params.dilation_height_factor;+  const int pad_width = params.padding_values.width;+  const int pad_height = params.padding_values.height;+  const int32 input_offset = params.input_offset;++  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int input_depth = input_shape.Dims(3);+  const int filter_height = filter_shape.Dims(1);+  const int filter_width = filter_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);+  const int output_depth = output_shape.Dims(3);++  std::int64_t output_accu_min = std::numeric_limits<std::int64_t>::max();+  std::int64_t output_accu_max = std::numeric_limits<std::int64_t>::min();++  for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int output_channel = 0; output_channel < output_depth;+             ++output_channel) {+          const int in_x_origin = (out_x * stride_width) - pad_width;+          const int in_y_origin = (out_y * stride_height) - pad_height;+          std::int64_t acc = 0;+          for (int filter_y = 0; filter_y < filter_height; ++filter_y) {+            for (int filter_x = 0; filter_x < filter_width; ++filter_x) {+              for (int in_channel = 0; in_channel < input_depth; ++in_channel) {+                const int in_x = in_x_origin + dilation_width_factor * filter_x;+                const int in_y =+                    in_y_origin + dilation_height_factor * filter_y;+                // Zero padding by omitting the areas outside the image.+          
      const bool is_point_inside_image =+                    (in_x >= 0) && (in_x < input_width) && (in_y >= 0) &&+                    (in_y < input_height);+                if (is_point_inside_image) {+                  int32 input_val = input_data[Offset(input_shape, batch, in_y,+                                                      in_x, in_channel)];+                  int32 filter_val =+                      filter_data[Offset(filter_shape, output_channel, filter_y,+                                         filter_x, in_channel)];+                  acc += (std::int64_t)filter_val *

Please use static_cast for the casts here and throughout the code.

wwwind

comment created time in a month

pull request comment tensorflow/tensorflow

symmetric 16-bit activation quantization

Just to clarify - are you suggesting that we can merge this PR as is, and introduce the changes you propose for supported_ops as a separate PR? Do you have any other issues that need to be resolved for this PR to progress?

No, I am suggesting that once the API comment is addressed in https://github.com/tensorflow/tensorflow/pull/33343#issuecomment-572251334, we can move forward with the tooling change and review this PR as is. I meant that for additional PRs one reference kernel implementation per PR would make sense and help us review things faster.

Our initial plans are to introduce the tooling and reference kernels for 16-bit activations. We have tentative plans for optimized implementations. Our other relevant efforts are currently focused on implementing corresponding kernels in TensorFlow Lite Micro, but that work depends on the progress with introducing the reference code to TFLite first.

Makes sense, thanks!

wwwind

comment created time in 2 months

pull request comment tensorflow/tensorflow

symmetric 16-bit activation quantization

I think separate PRs for the reference implementations and a separate PR for the tooling changes would be great! That way we can iterate on the tooling and pipeline the review of each reference op implementation.

Also, are optimized CPU implementations for these ops on your roadmap as well, once the reference code and tooling are submitted?

wwwind

comment created time in 2 months

pull request comment tensorflow/tensorflow

symmetric 16-bit activation quantization

Hi, sorry for the delays, been in and out frequently during the break.

Let's do the following when exposing this via the API; there should be two valid states:

supported_ops = [TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8]: should throw an error if an op doesn't have a 16-bit activations / 8-bit weights version.

supported_ops = [TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8, TFLITE_BUILTINS]: should fall back to the floating point builtin operation and add the corresponding quantize and dequantize operations between 16-bit and float operations, so that only supported 16-bit operations are quantized and no error is thrown.

Does that make sense?
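As a rough sketch of how this would look on the converter (the int16x8 op-set enum name below is an assumption based on this proposal, and the path/dataset are placeholders):

import numpy as np
import tensorflow as tf

def representative_dataset():
    for _ in range(10):
        # Placeholder input; replace with real calibration samples.
        yield [np.zeros((1, 224, 224, 3), dtype=np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")  # placeholder path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Assumed enum name for the 16-bit activations / 8-bit weights op set.
int16x8 = tf.lite.OpsSet.EXPERIMENTAL_TFLITE_BUILTINS_ACTIVATIONS_INT16_WEIGHTS_INT8

# State 1: strict -- error out if any op lacks a 16x8 kernel.
converter.target_spec.supported_ops = [int16x8]

# State 2: fall back to float builtins (with quantize/dequantize inserted) for
# ops without a 16x8 kernel, so no error is thrown.
converter.target_spec.supported_ops = [int16x8, tf.lite.OpsSet.TFLITE_BUILTINS]

tflite_model = converter.convert()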

wwwind

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

symmetric 16-bit activation quantization

+/* Copyright 2019 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");

Let's rename this file to conv_per_channel_quantized_16x8_test.cc

wwwind

comment created time in 2 months

pull request comment tensorflow/tensorflow

[tflite] Fix and tests for the operator PACK

Hi, I believe the arbitrary_inputs and restrict_same_input_output_scale params are sufficient; the tests are welcome additions though! Thanks!

wwwind

comment created time in 3 months

pull request comment tensorflow/tensorflow

Fix for minimum/maximum quantization problem

The c++ tests would still be welcome additions though. Thanks!

wwwind

comment created time in 3 months

pull request comment tensorflow/tensorflow

Fix for minimum/maximum quantization problem

Comment from Anton: "Hi @suharshs. Thanks for checking this PR. Can you please review the PR #34484 we raised to replace this one? - It extends your fix to cover for the case of inputs with different quantization parameters, and includes the C++ tests to cover both aspects of the problem."

Reply: I think the current code should apply rescales on the inputs correctly as it does for Concat operations. The combination of arbitrary_inputs and restrict_same_input_output_scale should be sufficient AFAICT. Does that make sense, or am I missing something?

wwwind

comment created time in 3 months

pull request comment tensorflow/tensorflow

[TFLite] Fix for minimum/maximum quantization problem

Sure, will reply on that PR, thanks.

konstantinARM

comment created time in 3 months

PR closed tensorflow/tensorflow

[TFLite] Fix for minimum/maximum quantization problem

Labels: cla: yes, comp:lite, size:M, stat:awaiting response

Fix for minimum/maximum quantization problem

Problem description: MINIMUM and MAXIMUM operations were not quantized properly. Specifically, the problem was that only one of the inputs was quantized while the other was left in the original data type. Because of this, the TFLite interpreter was failing to prepare, since the quantization params of the ops did not match.

Problem cause: MINIMUM and MAXIMUM operators were defined in such a way that they only had one input, not two. Hence, when looping through inputs while quantizing them, only one of the two inputs was quantized.

Problem fix: Change the definition of the aforementioned properties.

This patch contains fixes for the problem described above.

  1. Properties of MINIMUM and MAXIMUM operators were altered to have two inputs rather than one. This is safe since both 1.* and 2.* branches have only two inputs for these ops
  2. Test suite for testing Minimum and Maximum ops quantization is added
  3. Two small binaries have been added for testing purposes
+77 -2

6 comments

7 changed files

konstantinARM

pr closed time in 3 months

pull request comment tensorflow/tensorflow

[TFLite] Fix for minimum/maximum quantization problem

Hi, sorry for the delay; this change actually requires a different solution, similar to Concat. I have submitted the fix here: https://github.com/tensorflow/tensorflow/commit/481366eab297011fed94ccc599e27825c905a18c

konstantinARM

comment created time in 3 months

pull request comment tensorflow/tensorflow

lite: enable (u)int8 quantization and int32 for ABS

Trying re-approving

jackwish

comment created time in 3 months

pull request comment tensorflow/tensorflow

lite: enable (u)int8 quantization and int32 for ABS

Hmm, strange: I have approved, but am not seeing the internal review request.

jackwish

comment created time in 3 months

issue comment tensorflow/tensorflow

why invoke() take so much time? use pos_training_quantized=True to convert pb into a tflite model.

Right now post-training quantization is only optimized for ARM CPUs, not x86, so you will see a speedup on mobile but not on desktop just yet. We are working on x86 support.
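One way to measure this is to time invoke() directly; a minimal sketch (the model path and iteration count are placeholders):

import time
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # placeholder path
interpreter.allocate_tensors()

# Feed a zero tensor matching the model's input signature.
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))

runs = 50
start = time.time()
for _ in range(runs):
    interpreter.invoke()
print("average invoke time (ms):", (time.time() - start) / runs * 1000.0)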

zhuohuiyuan

comment created time in 3 months

issue comment tensorflow/tensorflow

Error trying to convert a model using full integer quantization

Hi @mapeima

We have code that is about to be submitted to enable the uint8 parameters in the 2.0 converter. The fastest path for you will be to wait until that is in the nightly, and then upgrade to using the new version of the converter that has support for IdentityN. Will let you know when that is ready.

Thanks! -Suharsh

mapeima

comment created time in 3 months

issue comment tensorflow/tensorflow

Tensorflow Lite fully integer quantization error :Got tensor of type STRING but expected type FLOAT32

It seems that your representative dataset is feeding string tensors. Can you double check the type of input_value and make sure it is the correct type (FLOAT)?
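For example, a representative dataset should yield float32 arrays shaped like the model input; a minimal sketch (shape and sample count are placeholders):

import numpy as np

def representative_dataset():
    for _ in range(100):
        # Replace with real samples from your input pipeline; the key point is
        # that each yielded input is float32, not a string/bytes tensor.
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

# converter.representative_dataset = representative_dataset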

Eagle223

comment created time in 3 months

issue comment tensorflow/tensorflow

No module named 'tensorflow.tools.graph_transforms' in TF2.0

This is WAI (working as intended). The code is still in the repository, but it is intentionally not exposed in TF 2.0, since graphs are not central in 2.0; look to Grappler as the place for TF rewrites now.

1duo

comment created time in 3 months

issue closed tensorflow/tensorflow

No module named 'tensorflow.tools.graph_transforms' in TF2.0

This functionality still seems to be included in TensorFlow 2.0; however, it raises an error when called.

% ipython
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 16:52:21) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.8.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from tensorflow.tools.graph_transforms import TransformGraph                                                 
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-1-1fd86d9792e0> in <module>
----> 1 from tensorflow.tools.graph_transforms import TransformGraph

ModuleNotFoundError: No module named 'tensorflow.tools.graph_transforms'

TensorFlow version:

% pip show tensorflow
Name: tensorflow
Version: 2.0.0
Summary: TensorFlow is an open source machine learning framework for everyone.
Home-page: https://www.tensorflow.org/
Author: Google Inc.
Author-email: packages@tensorflow.org
License: Apache 2.0
Location: /Volumes/data/venv-tf2/lib/python3.7/site-packages
Requires: gast, six, tensorboard, grpcio, termcolor, tensorflow-estimator, protobuf, keras-preprocessing, keras-applications, numpy, opt-einsum, absl-py, wrapt, wheel, astor, google-pasta
Required-by: tfcoreml

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.15
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): binary pip install
  • TensorFlow version (use command below): 2.0.0
  • Python version: 3.7.3
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A

closed time in 3 months

1duo

issue comment tensorflow/tensorflow

[tflite] Support INT8 quantization for PACK with TFLITE_BUILTINS_INT8 OpsSet

Hi, I have a CL in progress to fix this.

dreamibor

comment created time in 3 months

issue closed tensorflow/tensorflow

Inference time by quantized model is longer than that by non-quantized model in Tensorflow


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 1806
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: ARMv8 AARCH64
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.12.3
  • Python version: 2.7.15
  • Bazel version (if compiling from source): 0.15.0-dist
  • GCC/Compiler version (if compiling from source): 7.3
  • CUDA/cuDNN version: NA
  • GPU model and memory: NA


Describe the current behavior

I want to measure TensorFlow inference time with different models on my 4-core ARMv8 CPU. I downloaded quantized and floating point models from https://www.tensorflow.org/lite/guide/hosted_models, then ran benchmark_model to measure inference time. Please see the attached picture for my benchmarking results. From the results, you can see that the inference time of the quantized models is always a little bit longer than that of the floating point models. I also did the same benchmark with TFLite, where the inference time of the quantized models is much lower, as I expect.

Describe the expected behavior

I think the inference time of quantized models should be much lower than that of floating point models.

Code to reproduce the issue

The TensorFlow commands are as below.

For quantized models:

echo "inception_v4"
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=model/classfication_int8/inception_v4/inception_v4.pb --input_layer=input --input_layer_type=float --input_layer_shape=1,299,299,3 --output_layer=InceptionV4/Logits/Predictions --show_run_order=false --num_threads=1 --show_flops

echo "mobilenet_v1_1.0_224"
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=model/classfication_int8/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,224,224,3 --output_layer=MobilenetV1/Predictions/Reshape_1  --show_run_order=false --num_threads=1

echo "mobilenet_v2_1.0_224"
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=model/classfication_int8/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.pb  --output_layer=output  --show_run_order=false --num_threads=1 --show_flops

For floating point models:

echo "inception_v4"
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=model/classfication-fp32/inception_v4/inception_v4.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,299,299,3 --output_layer=InceptionV4/Logits/Predictions --show_run_order=false --num_threads=1

echo "mobilenet_v1_1.0_224"
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=model/classfication-fp32/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.pb --show_flops --input_layer=input --input_layer_type=float --input_layer_shape=1,224,224,3 --output_layer=MobilenetV1/Predictions/Reshape_1  --show_run_order=false --num_threads=1

echo "mobilenet_v2_1.0_224"
bazel-bin/tensorflow/tools/benchmark/benchmark_model --graph=model/classfication-fp32/mobilenet_v2_1.0_224/mobilenet_v2_1.0_224.pb --show_flops --input_layer=input --input_layer_type=float  --output_layer=MobilenetV2/Predictions/Reshape_1  --show_run_order=false --num_threads=1


closed time in 5 months

ZhangShuoAlreadyExists

issue comment tensorflow/tensorflow

Inference time by quantized model is longer than that by non-quantized model in Tensorflow

Hi,

This is because the included models are not fully quantized for TensorFlow; they are instead quantization-aware trained models. This means they are float models, but they have FakeQuantWithMinMaxVars operations emulating the effect of quantization during training. The intention of these models is to emulate the effects of quantization in TensorFlow graphs so that they can be easily converted to TensorFlow Lite models.
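For completeness, converting such a quantization-aware trained frozen graph to a fully quantized TFLite model looks roughly like this (a TF1-style sketch; the path and input stats are placeholders, and the array names are taken from your benchmark commands):

import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    "mobilenet_v1_1.0_224_quant_frozen.pb",  # placeholder path to the QAT frozen graph
    input_arrays=["input"],
    output_arrays=["MobilenetV1/Predictions/Reshape_1"])
converter.inference_type = tf.uint8
converter.quantized_input_stats = {"input": (128.0, 128.0)}  # placeholder (mean, std) for the input
tflite_model = converter.convert()
open("mobilenet_v1_quant.tflite", "wb").write(tflite_model)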

Thanks!

ZhangShuoAlreadyExists

comment created time in 5 months

issue comment tensorflow/tensorflow

Controlling the Quantization Graph-Rewriter

We are working on a Keras approach to this that will allow the proper configurability. Closing this issue.

smcgregor

comment created time in 5 months

issue closed tensorflow/tensorflow

Controlling the Quantization Graph-Rewriter

System information

Not applicable to this feature request.

Describe the problem

Our (Syntiant Corp's) neural network inference chips support a continuous range of parameter and activation quantization levels for reducing power consumption. Consequently, we aggressively tune our quantization levels for each application. Based on the research literature and product datasheets we are seeing, it is highly likely there are other chip makers with similar requirements. TF's current graph re-writer approach finds matching blocks in the graph and wraps them in fake quantization operations. This approach poorly serves our use cases for the following reasons:

  1. Different layers can have different quantizations. The graph re-writing approach is global to the graph.
  2. The graph re-writer attempts to heuristically match the properties of operations that should be re-written. This will generally work for traditional stored-program architectures, but when you are meddling with layers to match silicon you need to drop into TensorBoard to figure out whether the re-writer picked up the unit. If the unit is not picked up, then you are better off not using the re-writer.
  3. We have little transparency into changes in the TF codebase on these features. With more explicit specification of layer quantization it is possible to know when the quantization assumptions change and we can track the latest releases of TF.

Our request: We would like to work with an API in which the quantization operations are more explicitly specified at the layer (Keras) or op level. We could then plug the API into our specification of neural network layers built to explicitly match the low-level operations implemented in silicon.

Thank you for open sourcing TF and your efforts in supporting the community. :)

For reference:

Source code / logs

Not applicable to this feature request.

closed time in 5 months

smcgregor

issue closed tensorflow/tensorflow

Fine-Grained Control Over TOCO Quantization

Our (Syntiant Corp’s) neural network inference chips use quantized weights and biases in order to minimize storage and energy consumption. The new Tensorflow experimental quantization feature tf.contrib.quantize.experimental_create_training_graph supports quantizing weights to between 2 and n bits, but the tf.contrib.lite.toco_convert tool currently only supports 8 bit quantization. As a result, we have to internally fork the TFLite pipeline before generating the Flatbuffer.

Feature request: Update TOCO to support arbitrary (i.e., 2 to n bit) signed fixed point quantization of weights and biases for both symmetric and asymmetric quantization. Our desired solution would process the quantization specified at the op or Keras layer level and not involve quantization specification within the TOCO tool API.

closed time in 5 months

deepconvneuralnet

issue comment tensorflow/tensorflow

Fine-Grained Control Over TOCO Quantization

We are working to make things more configurable in new converter iterations; closing this issue since it's a bit stale.

deepconvneuralnet

comment created time in 5 months

issue closed tensorflow/models

[tflite][quantization][deeplabv3] Constant array MobilenetV2/expanded_conv_7/depthwise/depthwise_weights lacks MinMax information

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):Ubuntu 16.04
  • TensorFlow installed from (source or binary):Source
  • TensorFlow version (use command below):1.9.0
  • Python version:2.7.12
  • Bazel version (if compiling from source):0.12.0
  • GCC/Compiler version (if compiling from source):5.4.0
  • CUDA/cuDNN version:cuda-9.0/7.0
  • GPU model and memory:GeForce GTX 1080/8105MiB
  • Phone:xiaomi5 (Snapdragon 820)
  • Exact command to reproduce: bazel run --config=opt //tensorflow/contrib/lite/toco:toco -- --input_file=/external_home/data/model/deeplabv3_mnv2_pascal_train_aug/frozen_inference_graph.pb --output_file=/external_home/data/model/deeplabv3_mnv2_pascal_train_aug/kanul.tflite --inference_type=QUANTIZED_UINT8 --input_shape=1,513,513,3 --input_array=sub_7 --output_array=ResizeBilinear_3

Describe the problem

I have tried to quantize MobileNetV2 for DeepLabV3+ with TFLite, but I fail to convert the model. From the following issue, I saw that the operations were not supported with the quantization option.

https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md Checkpoint name: mobilenetv2_coco_voc_trainaug

As we can see from the graphs in TensorBoard, there is one big problem.

In "import/MobilenetV2/expanded_conv_7/depthwise/depthwise",

the depthwise operation consists of a subgraph with 3 nodes: (depthwise) plus BatchToSpaceND and SpaceToBatchND.

But in "import/MobilenetV2/expanded_conv_6/depthwise/depthwise",

the depthwise operation is DepthwiseConv2dNative itself.

Because of this difference, we cannot quantize DeepLabV3 based on MobileNetV2.

One issue is that MobilenetV2/expanded_conv_7~16 do not have the min/max values needed for quantization with TFLite.

Although I implemented the needed min/max values in hardcode_min_max.cc,

this model does not run well in mobile environments.

The ultimate problem is caused by the fact that depthwise_conv_7~16 consist of 3 nodes, including BatchToSpaceND and SpaceToBatchND.

Please let me know how to resolve the above issues.

Source code / logs

bazel run --config=opt //tensorflow/contrib/lite/toco:toco -- --input_file=/external_home/data/model/deeplabv3_mnv2_pascal_train_aug/frozen_inference_graph.pb --output_file=/external_home/data/model/deeplabv3_mnv2_pascal_train_aug/kanul.tflite --inference_type=QUANTIZED_UINT8 --input_shape=1,513,513,3 --input_array=sub_7 --output_array=ResizeBilinear_3

2018-07-11 04:40:01.330069: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before pre-quantization graph transformations: 166 operators, 340 arrays (1 quantized) 2018-07-11 04:40:01.330711: W tensorflow/contrib/lite/toco/graph_transformations/hardcode_min_max.cc:339] Tweaking the MinMax of array ResizeBilinear_1, which is an input to {Concatenation operator with output concat}, because we want all inputs and outputs of a Concatenation operator to have the same MinMax so that it can be implemented as a pure byte-copy, no arithmetic. 2018-07-11 04:40:01.332983: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] After pre-quantization graph transformations pass 1: 111 operators, 285 arrays (1 quantized) 2018-07-11 04:40:01.335731: I tensorflow/contrib/lite/toco/graph_transformations/graph_transformations.cc:39] Before quantization graph transformations: 111 operators, 285 arrays (1 quantized) 2018-07-11 04:40:01.337575: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_7/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.337670: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_7/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.337695: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_7/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.338553: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_8/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.338711: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_8/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.338786: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_8/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. 
To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.339777: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_9/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.339918: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_9/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.339985: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_9/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.340933: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_10/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.341034: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_10/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.341059: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_10/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.342497: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_11/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 
2018-07-11 04:40:01.342593: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_11/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.342620: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_11/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.344311: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_12/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.344422: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_12/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.344452: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_12/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.345978: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_13/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.346094: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_13/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.346122: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_13/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. 
To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.349163: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_14/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.349318: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_14/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.349351: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_14/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.353356: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_15/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.353511: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_15/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.353545: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_15/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.357264: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_16/depthwise/depthwise_weights lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 
2018-07-11 04:40:01.357400: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_16/depthwise/BatchNorm/FusedBatchNorm_mul_0_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy. 2018-07-11 04:40:01.357432: W tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:92] Constant array MobilenetV2/expanded_conv_16/depthwise/BatchNorm/FusedBatchNorm_add_param lacks MinMax information. To make up for that, we will now compute the MinMax from actual array elements. That will result in quantization parameters that probably do not match whichever arithmetic was used during training, and thus will probably be a cause of poor inference accuracy.

closed time in 5 months

kanul

issue commenttensorflow/models

[tflite][quantization][deeplabv3] Constant array MobilenetV2/expanded_conv_7/depthwise/depthwise_weights lacks MinMax information

This should be resolved in the newest version of TF AFAIK.

kanul

comment created time in 5 months

issue commenttensorflow/models

[tflite][quantization][deeplabv3] Constant array MobilenetV2/expanded_conv_7/depthwise/depthwise_weights lacks MinMax information

Hi @kanul ,

Are you able to convert to a float TFLite model and check whether the SpaceToBatch and BatchToSpace ops are fused?

kanul

comment created time in 5 months

issue closedtensorflow/models

TFLite toco failed to convert quantized model (mobilenet_v1_1.0_224) to tflite format

Download the network model

MobileNet_v1_1.0_224_quant

https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet_v1.md

Error when converting to tflite:

bazel run --config=opt
tensorflow/contrib/lite/toco:toco --
--input_file=/Users/dchealth/Desktop/mobilenet/quantized_graph.pb
--output_file=/Users/dchealth/Desktop/mobilenet/frozen_graphnew.tflite
--input_type=FLOAT
--input_shape=1,128,128,3
--input_arrays=input
--output_arrays=MobilenetV1/Predictions/Reshape_1

INFO: Options provided by the client:
Inherited 'common' options: --isatty=1 --terminal_columns=102
INFO: Reading rc options for 'run' from /Users/dchealth/tensorflow/tools/bazel.rc:
Inherited 'build' options: --distinct_host_configuration=false --define framework_shared_object=true --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true --define=grpc_no_ares=true --spawn_strategy=standalone --genrule_strategy=standalone -c opt
ERROR: Config value opt is not defined in any .rc file
DCHealthdeMac-mini:toco dchealth$ --inference_type=QUANTIZED_UINT8
-bash: --inference_type=QUANTIZED_UINT8: command not found
DCHealthdeMac-mini:toco dchealth$ --std_values=128
-bash: --std_values=128: command not found
DCHealthdeMac-mini:toco dchealth$ --mean_values=128


System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.5

TensorFlow installed from (source or binary): pip

TensorFlow version (use command below): 1.8.0

Python version: 3.6.4

Bazel version (if compiling from source):
Build label: 0.14.0-homebrew
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jun 1 14:26:58 2018 (1527863218)
Build timestamp: 1527863218
Build timestamp as int: 1527863218

GCC/Compiler version (if compiling from source): no

CUDA/cuDNN version: no

GPU model and memory: no

Exact command to reproduce: no

closed time in 5 months

baihualinxin

issue commenttensorflow/models

TFLite toco failed to convert quantized model (mobilenet_v1_1.0_224) to tflite format

It looks like you entered the command in your terminal incorrectly:

DCHealthdeMac-mini:toco dchealth$ --inference_type=QUANTIZED_UINT8
-bash: --inference_type=QUANTIZED_UINT8: command not found
DCHealthdeMac-mini:toco dchealth$ --std_values=128
-bash: --std_values=128: command not found
DCHealthdeMac-mini:toco dchealth$ --mean_values=128

Each of those flags needs to be part of the single bazel run command (after the --), joined with trailing backslashes, rather than entered as separate shell commands.

For object detection please follow the instructions here: https://github.com/tensorflow/models/tree/master/research/object_detection

If the error persists, please file a new issue with exact steps we can follow to reproduce it. Thanks!

baihualinxin

comment created time in 5 months

pull request commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

I will spend some time thinking more about the requantization case for abs; for now, I think going to int32 like you are doing is fine. Thanks!

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 inline std::vector<float> Dequantize(const std::vector<T>& data, float scale,
   return f;
 }
+float GetQuantizeTolerance(int min, int max) {
+  float QuantizedStep = (max - min) / 255.0;

nit: rename QuantizedStep to quantized_step, and do the same for the other local variable names.
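For illustration, a minimal sketch of the rename; the real helper's body is longer and may compute the tolerance differently, and the 255.0 divisor is assumed to correspond to the uint8 quantization range:

// Minimal sketch of the suggested rename; local variables use snake_case.
// The actual helper in the test utilities may do more than this.
float GetQuantizeTolerance(int min, int max) {
  // One quantization step over a 256-level (8-bit) range.
  float quantized_step = (max - min) / 255.0;
  return quantized_step;
}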

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 ::tensorflow::Status HardcodeMinMax::Run(Model* model, std::size_t op_index,
       changed = HardcodeMinMaxForAverageOrMaxPool(model, op);
       break;
+    case OperatorType::kAbs:

Is this hardcode case intended? It hardcodes the min/max of the output to the min/max of the input, which means that if the abs input range is [-2, -1] then the output range will be the same, which seems off.

A couple of options: (1) don't add a hardcode case here and instead rely on a FakeQuant from contrib/quantize, or (2) hardcode to a symmetric range [-r, r] with r = max(abs(input_min), abs(input_max)).

WDYT?
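For illustration, a rough sketch of what option (2) could look like; the helper name and the standalone MinMax struct here are illustrative stand-ins rather than the actual toco code:

#include <algorithm>
#include <cmath>

// Illustrative stand-in for toco's MinMax.
struct MinMax {
  double min = 0.0;
  double max = 0.0;
};

// Option (2): pick a symmetric output range [-r, r] with
// r = max(abs(input_min), abs(input_max)), so an input range of [-2, -1]
// yields an output range of [-2, 2] instead of reusing [-2, -1].
MinMax HardcodeAbsOutputMinMax(const MinMax& input_minmax) {
  const double r =
      std::max(std::abs(input_minmax.min), std::abs(input_minmax.max));
  MinMax output_minmax;
  output_minmax.min = -r;
  output_minmax.max = r;
  return output_minmax;
}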

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 class Lstm : public BuiltinOperator<LstmCellOperator, ::tflite::LSTMOptions,
       case LstmCellOperator::KERNEL_BASIC:
         // KERNEL_BASIC was added in version 2.
         return 2;
+      default:
+        return -1;

This was intentionally left out so that we get a compiler error if we miss a case.
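For context, a small self-contained sketch of the pattern being preserved; the enum here is illustrative, not the real LstmCellOperator. Leaving out the default case lets -Wswitch (or -Werror=switch) flag any unhandled enumerator at compile time, which a default: return -1; would silently hide.

// Illustrative enum; the real code switches over LstmCellOperator's kernel type.
enum class LstmKernelType { kFull, kBasic };

int GetLstmVersion(LstmKernelType kernel_type) {
  // No default case: if a new kernel type is added and not handled here,
  // the compiler warns (-Wswitch) instead of silently returning -1.
  switch (kernel_type) {
    case LstmKernelType::kFull:
      return 1;
    case LstmKernelType::kBasic:
      // KERNEL_BASIC was added in version 2.
      return 2;
  }
  return -1;  // Unreachable for valid enum values.
}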

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 TEST_F(OperatorTest, SimpleOperators) {
   CheckSimpleOperator<TensorFlowRankOperator>("RANK", OperatorType::kRank);
 }
+TEST_F(OperatorTest, BuiltinAbs) {
+  AbsOperator abs_op;
+  abs_op.inputs = {"input"};
+  auto operator_by_type_map = BuildOperatorByTypeMap(false /*enable_flex_ops*/);
+  const BaseOperator* op = operator_by_type_map.at(abs_op.type).get();
+
+  Model float_model;
+  Array& input_float_array = float_model.GetOrCreateArray(abs_op.inputs[0]);
+  input_float_array.data_type = ArrayDataType::kFloat;
+  OperatorSignature float_signature = {.op = &abs_op, .model = &float_model};

Unfortunately this .op notation breaks on some compilers :(
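For illustration, a portable alternative to the {.op = ..., .model = ...} form. The structs below are stand-ins for the real toco types, but the shape of the fix is just to default-construct and assign:

// Stand-ins for the real toco types, for illustration only.
struct AbsOperator {};
struct Model {};
struct OperatorSignature {
  const AbsOperator* op = nullptr;
  const Model* model = nullptr;
};

// Designated initializers are not standard C++ before C++20 and are rejected
// by some of the toolchains TensorFlow supports, so assign members explicitly.
OperatorSignature MakeFloatSignature(const AbsOperator& abs_op,
                                     const Model& float_model) {
  OperatorSignature float_signature;
  float_signature.op = &abs_op;
  float_signature.model = &float_model;
  return float_signature;
}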

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 class DepthwiseConvolution
   }
 };
+class Abs : public SimpleOperator<FloorDivOperator> {
+ public:
+  explicit Abs() : SimpleOperator("ABS", OperatorType::kAbs) {}
+  int GetVersion(const OperatorSignature& op_signature) const override {
+    const string& input_name = op_signature.op->inputs[0];
+    const Array& input_array = op_signature.model->GetArray(input_name);
+    // Version 2 supports signed/unsigned int8 and signed int32 input types.
+    if (input_array.data_type == ArrayDataType::kInt8 ||
+        input_array.data_type == ArrayDataType::kUint8 ||
+        input_array.data_type == ArrayDataType::kInt32) {

Thanks! Please also update the op version in the op resolver: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/register.cc#L161
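Roughly what that registration change looks like; this is a sketch against the linked register.cc (inside BuiltinOpResolver's constructor), and the version numbers assume version 2 is the highest ABS version after this change:

// In tensorflow/lite/kernels/register.cc: bump the max version so the runtime
// accepts models exported with the new ABS op version.
AddBuiltin(BuiltinOperator_ABS, Register_ABS(),
           /* min_version = */ 1,
           /* max_version = */ 2);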

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 struct ArithmeticParams {
   int broadcast_shape[5];
 };
+struct AbsParams {
+  // uint8 inference params.

nit: uint8/int8 (update this comment to cover both quantized types).

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

 _RELU_TYPES = {'Relu', 'Relu6'}
 _QUANTIZATION_OP = {'FakeQuantWithMinMaxVars'}
-_VALID_SRC_OP = {'Add', 'AddV2', 'Mul'}
-_INTERMEDIATE_OP = {'Add', 'AddV2', 'Mul'}
+_VALID_SRC_OP = {'Add', 'AddV2', 'Mul', 'Sub', 'ConcatV2'}

Could we add a test for the contrib/quantize changes, and move the contrib/quantize changes into a separate PR from the TFLite changes?

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

Enhance Quantization-aware Training and TFLite tools/runtime for some operators

+/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.

nit: could you make these 2019

jackwish

comment created time in 6 months

Pull request review commenttensorflow/tensorflow

[tflite] Fix for CalibratorTest

 def input_gen():
   def test_invalid_model_buffer(self):
     float_model = b'\0' * 100
-    with self.assertRaisesWithRegexpMatch(ValueError,
+    with self.assertRaisesRegex(ValueError,
                                           'Failed to parse the model'):

Thanks! Could you also fix the formatting here, now that the indent level has changed?

akarmi

comment created time in 6 months
