issue closed tensorflow/tensorflow

name: Bug Issue about: why the performance of NNAPI is much lower than the CPU. labels: 'type:bug_template'

#36088

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Android O
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: GLK-AL00, Mi 6, vivo NEX, Mi 8 SE

Describe the problem

I wrote an NNAPI delegate in my engine, modeled on the NNAPI delegate in TensorFlow Lite. When I run the same model, inference takes more than 200 ms on the GLK-AL00, Mi 6, vivo NEX, and Mi 8 SE, about 25 ms on the Pixel 2 XL, and under 10 ms on the Mi 9 and Samsung G9700. Can anyone explain why performance on the GLK-AL00, Mi 6, vivo NEX, and Mi 8 SE is so much worse than on the Pixel 2 XL?

closed time in 3 days

songxuemei

issue comment tensorflow/tensorflow

name: Bug Issue about: why the performance of NNAPI is much lower than the CPU. labels: 'type:bug_template'

NNAPI is implemented by the OEMs; it's possible the hardware vendors have not provided a fast implementation yet.

songxuemei

comment created time in 5 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/+#include <sys/types.h>++#include <stdio.h>+#include <algorithm>+#include <cmath>+#include <cstdint>+#include <cstdlib>+#include <iterator>+#include <limits>+#include <string>+#include <type_traits>+#include <vector>++#include <gtest/gtest.h>+#include "tensorflow/lite/kernels/internal/common.h"+#include "tensorflow/lite/kernels/internal/quantization_util.h"+#include "tensorflow/lite/kernels/internal/reference/depthwiseconv_float.h"+#include "tensorflow/lite/kernels/internal/reference/integer_ops/depthwise_conv.h"+#include "tensorflow/lite/kernels/internal/test_util.h"+#include "tensorflow/lite/kernels/internal/types.h"++namespace tflite {+namespace {++void PickOutputMultiplier(+    const DepthwiseParams& params, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    float* output_multiplier) {+  const int stride_width = params.stride_width;+  const int stride_height = params.stride_height;+  const int dilation_width_factor = params.dilation_width_factor;+  const int dilation_height_factor = params.dilation_height_factor;+  const int pad_width = params.padding_values.width;+  const int pad_height = params.padding_values.height;+  const int depth_multiplier = params.depth_multiplier;++  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int input_depth = input_shape.Dims(3);+  const int filter_height = filter_shape.Dims(1);+  const int filter_width = filter_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);++  std::int64_t output_accu_min = std::numeric_limits<std::int64_t>::max();+  std::int64_t output_accu_max = std::numeric_limits<std::int64_t>::min();++  for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int in_channel = 0; in_channel < input_depth; ++in_channel) {+          for (int m = 0; m < depth_multiplier; ++m) {+            const int output_channel = m + in_channel * depth_multiplier;+            const int in_x_origin = (out_x * stride_width) - pad_width;+            const int in_y_origin = (out_y * stride_height) - pad_height;+            std::int64_t acc = 0;+            for (int filter_y = 0; filter_y < filter_height; ++filter_y) {+              for (int filter_x = 0; filter_x < filter_width; ++filter_x) {+                const int in_x = in_x_origin + dilation_width_factor * filter_x;+                const int in_y =+                    in_y_origin + dilation_height_factor * filter_y;+                // Zero padding by omitting the areas outside 
the image.+                const bool is_point_inside_image =+                    (in_x >= 0) && (in_x < input_width) && (in_y >= 0) &&+                    (in_y < input_height);+                if (is_point_inside_image) {+                  int32 input_val = input_data[Offset(input_shape, batch, in_y,+                                                      in_x, in_channel)];+                  int32 filter_val = filter_data[Offset(+                      filter_shape, 0, filter_y, filter_x, output_channel)];+                  acc += static_cast<int64_t>(filter_val) *+                         static_cast<int64_t>(input_val);+                }+              }+            }+            if (bias_data) {+              acc += bias_data[output_channel];+            }+            output_accu_max = std::max(acc, output_accu_max);+            output_accu_min = std::min(acc, output_accu_min);+          }+        }+      }+    }+  }++  // Since int16 ranges from -32768 to 32767, we need to squeeze the accumulator+  // min/max fit in those ranges correspondingly as much as possible.+  if (std::abs(output_accu_max) > std::abs(output_accu_min)) {+    *output_multiplier = 32767.0f / std::abs(output_accu_max);+  } else {+    *output_multiplier = 32768.0f / std::abs(output_accu_min);+  }+}++void PickReasonableMultiplier(+    const DepthwiseParams& params, int output_activation_min,+    int output_activation_max, int output_depth,+    const RuntimeShape& input_shape_inference, const std::int16_t* input_data,+    const RuntimeShape& filter_shape_inference, const std::int8_t* filter_data,+    const RuntimeShape& bias_shape_inference, const std::int64_t* bias_data,+    const RuntimeShape& output_shape_inference,+    std::int32_t* output_multiplier_ptr, std::int32_t* output_shift_ptr,+    std::int16_t* output_data) {+  float output_multiplier;+  PickOutputMultiplier(params, input_shape_inference, input_data,+                       filter_shape_inference, filter_data,+                       bias_shape_inference, bias_data, output_shape_inference,+                       &output_multiplier);++  int base_multiplier;+  int base_shift;+  QuantizeMultiplier(output_multiplier, &base_multiplier, &base_shift);+  for (int i = 0; i < output_depth; ++i) {+    // multipliers typically range in [2^30 ; 2^31 - 1].+    // Values in [0, 2^30 - 1] are normally unused, but harmless.+    // Thus a good way to randomize multipliers is to subtract from them+    // a random value smaller than 2^30 but still significant compared to it.+    output_multiplier_ptr[i] = base_multiplier - (std::rand() % (1 << 26));+    output_shift_ptr[i] = base_shift - 1 + (std::rand() % 4);+  }+}++bool GenerateValidShapeConfigurations(+    int filter_width, int filter_height, int depth_multiplier,+    int dilation_width_factor, int dilation_height_factor,+    RuntimeShape* input_shape_inference, RuntimeShape* filter_shape_inference,+    RuntimeShape* output_shape_inference, int* pad_width, int* pad_height,+    int* stride) {+  const int batch = UniformRandomInt(1, 3);+  const int input_depth = 8 * ExponentialRandomPositiveInt(0.9f, 10, 50);+  const int input_width = UniformRandomInt(5, 50);+  const int input_height = UniformRandomInt(5, 50);+  *stride = UniformRandomInt(1, 2);+  const bool test_pad = UniformRandomInt(0, 1);+  const auto padding_type = test_pad ? 
PaddingType::kValid : PaddingType::kSame;++  const int output_depth = input_depth * depth_multiplier;++  input_shape_inference->BuildFrom(+      {batch, input_height, input_width, input_depth});++  filter_shape_inference->BuildFrom(+      {1, filter_height, filter_width, output_depth});++  EXPECT_TRUE(ComputeConvSizes(+      *input_shape_inference, output_depth, filter_width, filter_height,+      *stride, dilation_width_factor, dilation_height_factor, padding_type,+      output_shape_inference, pad_width, pad_height));++  return true;+}++void IntToFloat(std::vector<float>* d, std::vector<std::int8_t>* s) {+  for (unsigned int i = 0; i < s->size(); i++) {+    d->data()[i] = (float)s->data()[i];+  }+}++void IntToFloat(std::vector<float>* d, std::vector<std::int64_t>* s) {+  for (unsigned int i = 0; i < s->size(); i++) {+    d->data()[i] = (float)s->data()[i];+  }+}++void TryTestOneDepthwiseConv3x3Filter() {

The reference implementation is not limited to 3x3, right? Consider making the filter size configurable and extending the test, thanks!
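As a hedged illustration of that suggestion (not the PR's code), a driver that randomizes the filter dimensions could look roughly like the sketch below, reusing the helpers already present in the test file; the specific ranges are assumptions.

void TryTestOneDepthwiseConvWithRandomFilter() {
  // Assumed ranges; pick whatever coverage makes sense for the kernel.
  const int filter_width = UniformRandomInt(1, 5);
  const int filter_height = UniformRandomInt(1, 5);
  const int depth_multiplier = UniformRandomInt(1, 3);
  const int dilation_width_factor = UniformRandomInt(1, 2);
  const int dilation_height_factor = UniformRandomInt(1, 2);

  RuntimeShape input_shape, filter_shape, output_shape;
  int pad_width, pad_height, stride;
  if (!GenerateValidShapeConfigurations(
          filter_width, filter_height, depth_multiplier, dilation_width_factor,
          dilation_height_factor, &input_shape, &filter_shape, &output_shape,
          &pad_width, &pad_height, &stride)) {
    return;  // Skip configurations that do not yield a valid output shape.
  }
  // ...then fill the tensors and compare the int16 reference kernel against
  // the float reference, as in the existing 3x3 test body.
}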

wwwind

comment created time in 15 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 TEST_P(PerChannelQuantizedDepthwiseConvolutionOpTest, Simple3x3FilterTest) {               ElementsAreArray(ArrayFloatNear({9, 18, 0, 0, 36, 54, 0, 0}))); } +class PerChannelQuantizedDepthwiseConvolutionOpModel16x8

Looks like the test class is set up, but the test cases are not added?

wwwind

comment created time in 15 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 TfLiteStatus EvalQuantizedPerChannel(TfLiteContext* context, TfLiteNode* node,   return kTfLiteOk; } +TfLiteStatus EvalQuantizedPerChannel16x8(+    TfLiteContext* context, TfLiteNode* node, TfLiteDepthwiseConvParams* params,+    OpData* data, const TfLiteTensor* input, const TfLiteTensor* filter,+    const TfLiteTensor* bias, TfLiteTensor* output) {+  DepthwiseParams op_params;+  op_params.padding_type = PaddingType::kSame;

I'm not sure what the best way is to leave a TODO for OSS. +jdduke

wwwind

comment created time in 15 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 TfLiteStatus EvalQuantizedPerChannel(TfLiteContext* context, TfLiteNode* node,   return kTfLiteOk; } +TfLiteStatus EvalQuantizedPerChannel16x8(TfLiteDepthwiseConvParams* params,+                                         OpData* data,

params and data can be const right?

wwwind

comment created time in 15 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 TfLiteStatus EvalQuantizedPerChannel(TfLiteContext* context, TfLiteNode* node,   return kTfLiteOk; } +TfLiteStatus EvalQuantizedPerChannel16x8(+    TfLiteContext* context, TfLiteNode* node, TfLiteDepthwiseConvParams* params,+    OpData* data, const TfLiteTensor* input, const TfLiteTensor* filter,+    const TfLiteTensor* bias, TfLiteTensor* output) {+  DepthwiseParams op_params;+  op_params.padding_type = PaddingType::kSame;

This should read from params, right? If valid padding is not supported for now, please leave a TODO.
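For illustration only (a minimal sketch, assuming the standard TfLitePadding enum on TfLiteDepthwiseConvParams; not the PR's actual code), reading the padding from params instead of hard-coding kSame could look like:

// Sketch: map the builtin padding setting onto the internal PaddingType.
switch (params->padding) {
  case kTfLitePaddingSame:
    op_params.padding_type = PaddingType::kSame;
    break;
  case kTfLitePaddingValid:
    op_params.padding_type = PaddingType::kValid;
    break;
  default:
    // TODO: decide how unknown padding should be handled here.
    op_params.padding_type = PaddingType::kNone;
    break;
}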

wwwind

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/+#include <sys/types.h>++#include <stdio.h>+#include <algorithm>+#include <cmath>+#include <cstdint>+#include <cstdlib>+#include <iterator>+#include <limits>+#include <string>+#include <type_traits>+#include <vector>++#include <gtest/gtest.h>+#include "tensorflow/lite/kernels/internal/common.h"+#include "tensorflow/lite/kernels/internal/quantization_util.h"+#include "tensorflow/lite/kernels/internal/reference/depthwiseconv_float.h"+#include "tensorflow/lite/kernels/internal/reference/integer_ops/depthwise_conv.h"+#include "tensorflow/lite/kernels/internal/test_util.h"+#include "tensorflow/lite/kernels/internal/types.h"++namespace tflite {+namespace {++void PickOutputMultiplier(+    const DepthwiseParams& params, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    float* output_multiplier) {+  const int stride_width = params.stride_width;+  const int stride_height = params.stride_height;+  const int dilation_width_factor = params.dilation_width_factor;+  const int dilation_height_factor = params.dilation_height_factor;+  const int pad_width = params.padding_values.width;+  const int pad_height = params.padding_values.height;+  const int depth_multiplier = params.depth_multiplier;++  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int input_depth = input_shape.Dims(3);+  const int filter_height = filter_shape.Dims(1);+  const int filter_width = filter_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);++  std::int64_t output_accu_min = std::numeric_limits<std::int64_t>::max();+  std::int64_t output_accu_max = std::numeric_limits<std::int64_t>::min();++  for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int in_channel = 0; in_channel < input_depth; ++in_channel) {+          for (int m = 0; m < depth_multiplier; ++m) {+            const int output_channel = m + in_channel * depth_multiplier;+            const int in_x_origin = (out_x * stride_width) - pad_width;+            const int in_y_origin = (out_y * stride_height) - pad_height;+            std::int64_t acc = 0;+            for (int filter_y = 0; filter_y < filter_height; ++filter_y) {+              for (int filter_x = 0; filter_x < filter_width; ++filter_x) {+                const int in_x = in_x_origin + dilation_width_factor * filter_x;+                const int in_y =+                    in_y_origin + dilation_height_factor * filter_y;+                // Zero padding by omitting the areas outside 
the image.+                const bool is_point_inside_image =+                    (in_x >= 0) && (in_x < input_width) && (in_y >= 0) &&+                    (in_y < input_height);+                if (is_point_inside_image) {+                  int32 input_val = input_data[Offset(input_shape, batch, in_y,+                                                      in_x, in_channel)];+                  int32 filter_val = filter_data[Offset(+                      filter_shape, 0, filter_y, filter_x, output_channel)];+                  acc += static_cast<int64_t>(filter_val) *+                         static_cast<int64_t>(input_val);+                }+              }+            }+            if (bias_data) {+              acc += bias_data[output_channel];+            }+            output_accu_max = std::max(acc, output_accu_max);+            output_accu_min = std::min(acc, output_accu_min);+          }+        }+      }+    }+  }++  // Since int16 ranges from -32768 to 32767, we need to squeeze the accumulator+  // min/max fit in those ranges correspondingly as much as possible.+  if (std::abs(output_accu_max) > std::abs(output_accu_min)) {+    *output_multiplier = 32767.0f / std::abs(output_accu_max);+  } else {+    *output_multiplier = 32768.0f / std::abs(output_accu_min);+  }+}++void PickReasonableMultiplier(+    const DepthwiseParams& params, int output_activation_min,+    int output_activation_max, int output_depth,+    const RuntimeShape& input_shape_inference, const std::int16_t* input_data,+    const RuntimeShape& filter_shape_inference, const std::int8_t* filter_data,+    const RuntimeShape& bias_shape_inference, const std::int64_t* bias_data,+    const RuntimeShape& output_shape_inference,+    std::int32_t* output_multiplier_ptr, std::int32_t* output_shift_ptr,+    std::int16_t* output_data) {+  float output_multiplier;+  PickOutputMultiplier(params, input_shape_inference, input_data,+                       filter_shape_inference, filter_data,+                       bias_shape_inference, bias_data, output_shape_inference,+                       &output_multiplier);++  int base_multiplier;+  int base_shift;+  QuantizeMultiplier(output_multiplier, &base_multiplier, &base_shift);+  for (int i = 0; i < output_depth; ++i) {+    // multipliers typically range in [2^30 ; 2^31 - 1].+    // Values in [0, 2^30 - 1] are normally unused, but harmless.+    // Thus a good way to randomize multipliers is to subtract from them+    // a random value smaller than 2^30 but still significant compared to it.+    output_multiplier_ptr[i] = base_multiplier - (std::rand() % (1 << 26));+    output_shift_ptr[i] = base_shift - 1 + (std::rand() % 4);+  }+}++bool GenerateValidShapeConfigurations(+    int filter_width, int filter_height, int depth_multiplier,+    int dilation_width_factor, int dilation_height_factor,+    RuntimeShape* input_shape_inference, RuntimeShape* filter_shape_inference,+    RuntimeShape* output_shape_inference, int* pad_width, int* pad_height,+    int* stride) {+  const int batch = UniformRandomInt(1, 3);+  const int input_depth = 8 * ExponentialRandomPositiveInt(0.9f, 10, 50);+  const int input_width = UniformRandomInt(5, 50);+  const int input_height = UniformRandomInt(5, 50);+  *stride = UniformRandomInt(1, 2);+  const bool test_pad = UniformRandomInt(0, 1);+  const auto padding_type = test_pad ? 
PaddingType::kValid : PaddingType::kSame;++  const int output_depth = input_depth * depth_multiplier;++  input_shape_inference->BuildFrom(+      {batch, input_height, input_width, input_depth});++  filter_shape_inference->BuildFrom(+      {1, filter_height, filter_width, output_depth});++  EXPECT_TRUE(ComputeConvSizes(+      *input_shape_inference, output_depth, filter_width, filter_height,+      *stride, dilation_width_factor, dilation_height_factor, padding_type,+      output_shape_inference, pad_width, pad_height));++  return true;+}++void IntToFloat(std::vector<float>* d, std::vector<std::int8_t>* s) {+  for (unsigned int i = 0; i < s->size(); i++) {+    d->data()[i] = (float)s->data()[i];+  }+}++void IntToFloat(std::vector<float>* d, std::vector<std::int64_t>* s) {+  for (unsigned int i = 0; i < s->size(); i++) {+    d->data()[i] = (float)s->data()[i];+  }+}++void TryTestOneDepthwiseConv3x3Filter() {

What does this test against? The original test uses the reference implementation as the ground truth and makes sure that both the NEON impl and the asm impl (for the fast 3x3 kernel) agree with it.

It seems the NEON impl is not there yet?

wwwind

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 TfLiteStatus EvalQuantizedPerChannel(TfLiteContext* context, TfLiteNode* node,   return kTfLiteOk; } +TfLiteStatus EvalQuantizedPerChannel16x8(+    TfLiteContext* context, TfLiteNode* node, TfLiteDepthwiseConvParams* params,

nit: please remove unused arguments like node or context, and make arguments const wherever possible.

wwwind

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 inline void DepthwiseConvPerChannel(   } } +inline void DepthwiseConvPerChannel(+    const DepthwiseParams& params, const int32* output_multiplier,+    const int32* output_shift, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    int16* output_data) {+  // Get parameters.+  const int stride_width = params.stride_width;+  const int stride_height = params.stride_height;+  const int dilation_width_factor = params.dilation_width_factor;+  const int dilation_height_factor = params.dilation_height_factor;+  const int pad_width = params.padding_values.width;+  const int pad_height = params.padding_values.height;+  const int depth_multiplier = params.depth_multiplier;+  const int32 output_activation_min = params.quantized_activation_min;+  const int32 output_activation_max = params.quantized_activation_max;++  // Check dimensions of the tensors.+  TFLITE_DCHECK_EQ(input_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(filter_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(output_shape.DimensionsCount(), 4);++  TFLITE_DCHECK_LE(output_activation_min, output_activation_max);+  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int output_depth = MatchingDim(filter_shape, 3, output_shape, 3);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int input_depth = input_shape.Dims(3);+  const int filter_height = filter_shape.Dims(1);+  const int filter_width = filter_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);+  TFLITE_DCHECK_EQ(output_depth, input_depth * depth_multiplier);+  TFLITE_DCHECK_EQ(bias_shape.FlatSize(), output_depth);++  for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int in_channel = 0; in_channel < input_depth; ++in_channel) {+          for (int m = 0; m < depth_multiplier; ++m) {+            const int output_channel = m + in_channel * depth_multiplier;+            const int in_x_origin = (out_x * stride_width) - pad_width;+            const int in_y_origin = (out_y * stride_height) - pad_height;+            std::int64_t acc = 0;+            for (int filter_y = 0; filter_y < filter_height; ++filter_y) {+              for (int filter_x = 0; filter_x < filter_width; ++filter_x) {+                const int in_x = in_x_origin + dilation_width_factor * filter_x;+                const int in_y =+                    in_y_origin + dilation_height_factor * filter_y;+                // Zero padding by omitting the areas outside the image.+                const bool is_point_inside_image =+                    (in_x >= 0) && (in_x < input_width) && (in_y >= 0) &&+                    (in_y < input_height);+                if (is_point_inside_image) {+                  int32 input_val = input_data[Offset(input_shape, batch, in_y,+                                                      in_x, in_channel)];+                  int32 filter_val = filter_data[Offset(+                      filter_shape, 0, filter_y, filter_x, output_channel)];+                  // Accumulate with 64 bits accumulator.+                  // We assume maximum of 2^16 accumulations as with the 8-bit+                  // case so actually the value in the accumulator should not+              
    // exceed 40 bits+                  acc += static_cast<int64_t>(filter_val) *+                         static_cast<int64_t>(input_val);+                }+              }+            }+            if (bias_data) {+              acc += bias_data[output_channel];+            }+            int32 scaled_acc = MultiplyByQuantizedMultiplier(

MultiplyByQuantizedMultiplier takes int32 input right?

wwwind

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

added int8 support for negate kernel

 namespace neg { constexpr int kInputTensor = 0; constexpr int kOutputTensor = 0; +struct OpDataInt8 {

I mean the name should just be OpData; int8, uint8, and other data types can simply be different fields in OpData. (It's easier for us to allocate one type.)
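As a sketch of what that could look like (the field name is hypothetical, taken from the zero-point discussion elsewhere in this thread):

// One OpData type shared by all supported input types.
struct OpData {
  // Precomputed sum of the input and output zero points, used by the int8
  // path; fields for other data types can be added here as needed.
  int32_t zero_point_sum;
};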

lamarrr

comment created time in 19 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 inline void DepthwiseConvPerChannel(   } } +inline void DepthwiseConvPerChannel(+    const DepthwiseParams& params, const int32* output_multiplier,+    const int32* output_shift, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    int16* output_data) {+  // Get parameters.+    const int stride_width = params.stride_width;

nit: indent

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 inline void DepthwiseConvPerChannel(   } } +inline void DepthwiseConvPerChannel(+    const DepthwiseParams& params, const int32* output_multiplier,+    const int32* output_shift, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    int16* output_data) {+  // Get parameters.+    const int stride_width = params.stride_width;+  const int stride_height = params.stride_height;+  const int dilation_width_factor = params.dilation_width_factor;+  const int dilation_height_factor = params.dilation_height_factor;+  const int pad_width = params.padding_values.width;+  const int pad_height = params.padding_values.height;+  const int depth_multiplier = params.depth_multiplier;+  const int32 output_activation_min = params.quantized_activation_min;+  const int32 output_activation_max = params.quantized_activation_max;+    // Check dimensions of the tensors.+  TFLITE_DCHECK_EQ(input_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(filter_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(output_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_LE(output_activation_min, output_activation_max);+  +  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int output_depth = MatchingDim(filter_shape, 3, output_shape, 3);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int input_depth = input_shape.Dims(3);+  const int filter_height = filter_shape.Dims(1);+  const int filter_width = filter_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);+  +  TFLITE_DCHECK_EQ(output_depth, input_depth * depth_multiplier);+  TFLITE_DCHECK_EQ(bias_shape.FlatSize(), output_depth);+  +    for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int in_channel = 0; in_channel < input_depth; ++in_channel) {+          for (int m = 0; m < depth_multiplier; ++m) {+            const int output_channel = m + in_channel * depth_multiplier;+            const int in_x_origin = (out_x * stride_width) - pad_width;+            const int in_y_origin = (out_y * stride_height) - pad_height;+            +            std::int64_t acc = 0;+            +            for (int filter_y = 0; filter_y < filter_height; ++filter_y) {+              for (int filter_x = 0; filter_x < filter_width; ++filter_x) {+                const int in_x = in_x_origin + dilation_width_factor * filter_x;+                const int in_y =+                    in_y_origin + dilation_height_factor * filter_y;+                // Zero padding by omitting the areas outside the image.+                const bool is_point_inside_image =+                    (in_x >= 0) && (in_x < input_width) && (in_y >= 0) &&+                    (in_y < input_height);+                if (is_point_inside_image) {+                  int32 input_val = input_data[Offset(input_shape, batch, in_y,+                                                      in_x, in_channel)];+                  int32 filter_val = filter_data[Offset(+                      filter_shape, 0, filter_y, filter_x, output_channel)];+                  +                  // Accumulate with 64 bits accumulator.+                  // We assume maximum of 2^16 accumulations as with the 8-bit+                  // case so 
actually the value in the accumulator should not+                  // exceed 40 bits+                  acc += static_cast<int64_t>(filter_val) *+                         static_cast<int64_t>(input_val);+                }+              }+            }+            if (bias_data) {+              acc += bias_data[output_channel];+            }+            int32 scaled_acc = MultiplyByQuantizedMultiplier(+                acc, output_multiplier[output_channel],+                output_shift[output_channel]);+            scaled_acc = std::max(scaled_acc, output_activation_min);+            scaled_acc = std::min(scaled_acc, output_activation_max);+            output_data[Offset(output_shape, batch, out_y, out_x,+                               output_channel)] =+                static_cast<int16_t>(scaled_acc);+            +            acc += filter_val * (input_val - input_offset[batch]);+                }

This style looks a little bit odd to me; can you run clang-format on the file? Thanks!

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel for DEPTHWISE_CONV_2D

 inline void DepthwiseConvPerChannel(   } } +inline void DepthwiseConvPerChannel(+    const DepthwiseParams& params, const int32* output_multiplier,+    const int32* output_shift, const RuntimeShape& input_shape,+    const int16* input_data, const RuntimeShape& filter_shape,+    const int8* filter_data, const RuntimeShape& bias_shape,+    const std::int64_t* bias_data, const RuntimeShape& output_shape,+    int16* output_data) {+  // Get parameters.+    const int stride_width = params.stride_width;+  const int stride_height = params.stride_height;+  const int dilation_width_factor = params.dilation_width_factor;+  const int dilation_height_factor = params.dilation_height_factor;+  const int pad_width = params.padding_values.width;+  const int pad_height = params.padding_values.height;+  const int depth_multiplier = params.depth_multiplier;+  const int32 output_activation_min = params.quantized_activation_min;+  const int32 output_activation_max = params.quantized_activation_max;+    // Check dimensions of the tensors.+  TFLITE_DCHECK_EQ(input_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(filter_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(output_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_LE(output_activation_min, output_activation_max);+  +  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int output_depth = MatchingDim(filter_shape, 3, output_shape, 3);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int input_depth = input_shape.Dims(3);+  const int filter_height = filter_shape.Dims(1);+  const int filter_width = filter_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);+  +  TFLITE_DCHECK_EQ(output_depth, input_depth * depth_multiplier);+  TFLITE_DCHECK_EQ(bias_shape.FlatSize(), output_depth);+  +    for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int in_channel = 0; in_channel < input_depth; ++in_channel) {+          for (int m = 0; m < depth_multiplier; ++m) {+            const int output_channel = m + in_channel * depth_multiplier;+            const int in_x_origin = (out_x * stride_width) - pad_width;+            const int in_y_origin = (out_y * stride_height) - pad_height;+            +            std::int64_t acc = 0;+            +            for (int filter_y = 0; filter_y < filter_height; ++filter_y) {+              for (int filter_x = 0; filter_x < filter_width; ++filter_x) {+                const int in_x = in_x_origin + dilation_width_factor * filter_x;+                const int in_y =+                    in_y_origin + dilation_height_factor * filter_y;+                // Zero padding by omitting the areas outside the image.+                const bool is_point_inside_image =+                    (in_x >= 0) && (in_x < input_width) && (in_y >= 0) &&+                    (in_y < input_height);+                if (is_point_inside_image) {+                  int32 input_val = input_data[Offset(input_shape, batch, in_y,+                                                      in_x, in_channel)];+                  int32 filter_val = filter_data[Offset(+                      filter_shape, 0, filter_y, filter_x, output_channel)];+                  +                  // Accumulate with 64 bits accumulator.+                  // We assume maximum of 2^16 accumulations as with the 8-bit+                  // case so 
actually the value in the accumulator should not+                  // exceed 40 bits+                  acc += static_cast<int64_t>(filter_val) *+                         static_cast<int64_t>(input_val);+                }+              }+            }+            if (bias_data) {+              acc += bias_data[output_channel];+            }+            int32 scaled_acc = MultiplyByQuantizedMultiplier(+                acc, output_multiplier[output_channel],+                output_shift[output_channel]);+            scaled_acc = std::max(scaled_acc, output_activation_min);+            scaled_acc = std::min(scaled_acc, output_activation_max);+            output_data[Offset(output_shape, batch, out_y, out_x,+                               output_channel)] =+                static_cast<int16_t>(scaled_acc);+            +            acc += filter_val * (input_val - input_offset[batch]);+                }+              }+            }+            float acc_float = static_cast<float>(acc);+            acc_float *=

Why is float accumulation done here, and where does scaling_factors_ptr come from? Is it misplaced?

wwwind

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel operators MAX_POOL_2D and AVERAGE_POOL_2D

 inline void MaxPool(const PoolParams& params, const RuntimeShape& input_shape,   } } +inline void AveragePool(const PoolParams& params,+                        const RuntimeShape& input_shape,+                        const int16* input_data,+                        const RuntimeShape& output_shape, int16* output_data) {+  TFLITE_DCHECK_LE(params.quantized_activation_min,+                   params.quantized_activation_max);+  TFLITE_DCHECK_EQ(input_shape.DimensionsCount(), 4);+  TFLITE_DCHECK_EQ(output_shape.DimensionsCount(), 4);+  const int batches = MatchingDim(input_shape, 0, output_shape, 0);+  const int depth = MatchingDim(input_shape, 3, output_shape, 3);+  const int input_height = input_shape.Dims(1);+  const int input_width = input_shape.Dims(2);+  const int output_height = output_shape.Dims(1);+  const int output_width = output_shape.Dims(2);+  const int stride_height = params.stride_height;+  const int stride_width = params.stride_width;+  for (int batch = 0; batch < batches; ++batch) {+    for (int out_y = 0; out_y < output_height; ++out_y) {+      for (int out_x = 0; out_x < output_width; ++out_x) {+        for (int channel = 0; channel < depth; ++channel) {+          const int in_x_origin =+              (out_x * stride_width) - params.padding_values.width;+          const int in_y_origin =+              (out_y * stride_height) - params.padding_values.height;+          // Compute the boundaries of the filter region clamped so as to+          // ensure that the filter window fits in the input array.+          const int filter_x_start = std::max(0, -in_x_origin);+          const int filter_x_end =+              std::min(params.filter_width, input_width - in_x_origin);+          const int filter_y_start = std::max(0, -in_y_origin);+          const int filter_y_end =+              std::min(params.filter_height, input_height - in_y_origin);+          int32 acc = 0;+          int filter_count = 0;+          for (int filter_y = filter_y_start; filter_y < filter_y_end;+               ++filter_y) {+            for (int filter_x = filter_x_start; filter_x < filter_x_end;+                 ++filter_x) {+              const int in_x = in_x_origin + filter_x;+              const int in_y = in_y_origin + filter_y;+              acc +=+                  input_data[Offset(input_shape, batch, in_y, in_x, channel)];+              filter_count++;+            }+          }+          // Round to the closest integer value.+          acc = acc > 0 ? (acc + filter_count / 2) / filter_count

nit: ideally this should be integer-only, but I guess it's a little bit tricky for "SAME" padding case since the filter window is not a constant.

feel free to ignore this comment as well.

wwwind

comment created time in 21 days

pull request comment tensorflow/tensorflow

Symmetric quantization with activations 16-bit and weights 8-bit: interface

Sorry for the late response, adding Suharsh to take a look. thanks!

wwwind

comment created time in 21 days

pull request comment tensorflow/tensorflow

added int8 support for negate kernel

adding Tiezhen to take a look at the micro kernel & test

lamarrr

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

added int8 support for negate kernel

+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/+#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_+#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_++#include "tensorflow/lite/kernels/internal/common.h"+#include "tensorflow/lite/kernels/internal/types.h"++namespace tflite {+namespace reference_integer_ops {++// Quantized Negate with int8 input and output, input and output must have an+// equal scale+// [zero_point_sum] represents the sum of the input and output zero points+inline void Negate(const RuntimeShape& input_shape, const int8_t* input_data,+                   const RuntimeShape& output_shape, int8_t* output_data,+                   int16_t zero_point_sum) {+  // where: output, output_zero_point, input_zero_point ∈ [-128, 127] : int8+  // zero_point_sum = (input_zero_point + output_zero_point)+  // equation: output = zero_point_sum - input+  // highest possible value for zero_point_sum = 127 + 127 = 254+  // lowest possible value for zero_point_sum = -128 + (-128) = -256+  // lowest possible neg value = lowest zero_point_sum - 127 = -256 - 127 =+  // -383+  // highest possible neg value = highest zero_point_sum - (-128) = 254 + 128+  // = 382+  // thus, accumulate on int16 [-383, 382]++  constexpr auto kI8Min =

nit: prefer explicit type

lamarrr

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

added int8 support for negate kernel

 TEST(NegOpModel, NegInt64) {   EXPECT_THAT(m.GetOutput<int64_t>(), ElementsAreArray({2, 1, 0, -1, -2, -3})); } +class QuantizedNegOpModel : public NegOpModel {+ public:+  using NegOpModel::NegOpModel;++  int input() { return input_; }++  template <typename integer_dtype>+  std::vector<float> GetDequantizedOutput() {+    return Dequantize<integer_dtype>(ExtractVector<integer_dtype>(output_),+                                     GetScale(output_), GetZeroPoint(output_));+  }+};++constexpr float GetToleranceInt8(int min, int max) {+  float kQuantizedStep = (max - min) / 255.0;+  return kQuantizedStep;+}++constexpr float GetToleranceInt16(float min, float max) {+  float kQuantizedStep = (max - min) / std::numeric_limits<int16_t>::max();+  return kQuantizedStep;+}++// input_quantization_buffer: buffer used for the quantization data+TEST(QuantizedNegOpModel, NegQuantizedInt8) {+  constexpr auto min = -6.0f;+  constexpr auto max = 6.0f;+  constexpr auto quantized_tolerance = GetToleranceInt8(min, max);++  const auto expected_output =+      std::vector<float>{3.5f, 2.0f, 1.0f, 0.0f, -1.0f, -2.0f, -3.0f, -3.5f};++  QuantizedNegOpModel model{{TensorType_INT8, {1, 2, 2, 2, 1}, min, max},+                            {TensorType_INT8, {1, 2, 2, 2, 1}, min, max}};+  model.QuantizeAndPopulate<int8_t>(+      model.input(), {-3.5f, -2.f, -1.f, 0.f, 1.f, 2.f, 3.f, 3.5f});+  model.Invoke();++  EXPECT_THAT(+      model.GetDequantizedOutput<int8_t>(),+      ElementsAreArray(ArrayFloatNear(expected_output, quantized_tolerance)));

nit: since we can get the result quite accurately, we don't need a large tolerance, right?
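For concreteness (just plugging this test's min/max into GetToleranceInt8 above): with min = -6 and max = 6, the tolerance works out to (6 - (-6)) / 255 ≈ 0.047, i.e. one full quantization step.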

lamarrr

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

added int8 support for negate kernel

 namespace neg { constexpr int kInputTensor = 0; constexpr int kOutputTensor = 0; +struct OpDataInt8 {

just OpData

lamarrr

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

added int8 support for negate kernel

+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/+#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_+#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_++#include "tensorflow/lite/kernels/internal/common.h"+#include "tensorflow/lite/kernels/internal/types.h"++namespace tflite {+namespace reference_integer_ops {++// Quantized Negate with int8 input and output, input and output must have an+// equal scale+// [zero_point_sum] represents the sum of the input and output zero points+inline void Negate(const RuntimeShape& input_shape, const int8_t* input_data,+                   const RuntimeShape& output_shape, int8_t* output_data,+                   int16_t zero_point_sum) {+  // where: output, output_zero_point, input_zero_point ∈ [-128, 127] : int8+  // zero_point_sum = (input_zero_point + output_zero_point)+  // equation: output = zero_point_sum - input+  // highest possible value for zero_point_sum = 127 + 127 = 254+  // lowest possible value for zero_point_sum = -128 + (-128) = -256+  // lowest possible neg value = lowest zero_point_sum - 127 = -256 - 127 =+  // -383+  // highest possible neg value = highest zero_point_sum - (-128) = 254 + 128+  // = 382+  // thus, accumulate on int16 [-383, 382]++  constexpr auto kI8Min =+      static_cast<int16_t>(std::numeric_limits<int8_t>::min());+  constexpr auto kI8Max =+      static_cast<int16_t>(std::numeric_limits<int8_t>::max());++  const int flat_size = MatchingFlatSize(input_shape, output_shape);++  for (int i = 0; i < flat_size; ++i) {+    // all operations accumulated on int16+    const int16_t neg = zero_point_sum - static_cast<int16_t>(input_data[i]);+    const auto clamped_neg = std::min(std::max(neg, kI8Min), kI8Max);+    output_data[i] = static_cast<int8_t>(clamped_neg);+  }+}++}  // namespace reference_integer_ops+}  // namespace tflite+#endif  // TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_

nit: please add a newline

lamarrr

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

[TFLite int16] 16-bit reference kernel FULLY_CONNECTED

 inline TfLiteStatus CheckTypes(TfLiteContext* context,    // optional bias tensor.   const bool is_optional_bias_float = !bias || (bias->type == kTfLiteFloat32);-  const bool is_optional_bias_int = !bias || (bias->type == kTfLiteInt32);+  const bool is_optional_bias_int =+      !bias || (bias->type == kTfLiteInt32) || (bias->type == kTfLiteInt64);

curious why bias needs to be int64?

wwwind

comment created time in a month

pull request comment tensorflow/tensorflow

[TFLite] 16-bit version of MUL reference kernel operator

My understanding is that the kernel implementation needs to reflect the quantization scheme. +Suharsh Sivakumar suharshs@google.com may help answer that.

Thanks!

On Wed, Jan 22, 2020 at 6:18 PM Elena Zhelezina notifications@github.com wrote:

@wwwind commented on this pull request.

In tensorflow/lite/kernels/mul.cc https://github.com/tensorflow/tensorflow/pull/36101#discussion_r369474903 :

@@ -183,6 +184,22 @@ TfLiteStatus EvalQuantized(TfLiteContext* context, TfLiteNode* node, TF_LITE_MUL(optimized_integer_ops, Mul, int8_t); } }

  • } else if (input1->type == kTfLiteInt16) {
  •  // We have this check, because in case of int16
    
  •  // input1_val*input2_val can overflow int32:
    
  •  // see MulElementwise -
    
  •  // tensorflow/lite/kernels/internal/reference/integer_ops/mul.h in case of
    
  •  // 16-bit this function is used in symmetric quantization, so offset
    
  •  // should be zero.
    
  •  TF_LITE_ENSURE_EQ(context, op_params.input1_offset, 0.0);
    

Hi @renjie-liu https://github.com/renjie-liu, Thank you for the review.

We introduce this kernel for the quantization scheme: activations in 16-bit and weights in 8-bit. The range of 16-bit is quite large and it is sufficient, if we do only symmetric quantization. That's why we consider only symmetric quantization for 16-bit reference kernels. Should operator specifications reflect this case ? Should we have a special "restricted_value" property for 16-bit that shows that zero point is zero in this case ?


wwwind

comment created time in a month

Pull request review comment tensorflow/tensorflow

[TFLite] 16-bit version of MUL reference kernel operator

 TfLiteStatus EvalQuantized(TfLiteContext* context, TfLiteNode* node,           TF_LITE_MUL(optimized_integer_ops, Mul, int8_t);         }       }+    } else if (input1->type == kTfLiteInt16) {+      // We have this check, because in case of int16+      // input1_val*input2_val can overflow int32:+      // see MulElementwise -+      // tensorflow/lite/kernels/internal/reference/integer_ops/mul.h in case of+      // 16-bit this function is used in symmetric quantization, so offset+      // should be zero.+      TF_LITE_ENSURE_EQ(context, op_params.input1_offset, 0.0);

are you sure symmetric quantization is the only case?

didn't see the specifications: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/optimize/operator_property.cc#L772-L775

wwwind

comment created time in a month

pull request comment tensorflow/tensorflow

added int8 support for negate kernel

Prepare should be run only once; Eval will be called every time the TFLite interpreter is invoked. I'm fine with it if you find putting the pre-compute logic in Prepare tricky.

But can you add the kernel to TFLite (not just micro), with a test, as well?

thanks!

lamarrr

comment created time in a month

pull request comment tensorflow/tensorflow

added int8 support for negate kernel

You should be able to find examples of how to precompute things in Prepare; for example, see fully_connected (which precomputes the multiplier): https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/fully_connected.cc#L168-L178

Also, you can get the zero point from the quantization params: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/c/common.h#L349

lamarrr

comment created time in a month

issue comment tensorflow/tensorflow

Creating outputs from the final_state returned by dynamic_rnn causes conversion to fail

great, looks like you got an answer already! :)

xkr1

comment created time in a month

issue comment tensorflow/tensorflow

Bug when convert to tflite models.

Can you try setting

"converter.experimental_new_converter = True"?

Thanks,

On Wed, Jan 15, 2020 at 2:14 PM dathudeptrai notifications@github.com wrote:

Still same error @renjie-liu https://github.com/renjie-liu, i cast it to tf.int32.


dathudeptrai

comment created time in a month

issue comment tensorflow/tensorflow

Bug when convert to tflite models.

Do you think you can insert a cast op before the concat?

Thanks,

On Wed, Jan 15, 2020 at 12:04 PM dathudeptrai notifications@github.com wrote:

we don't allow things like

range(0, 0, 1),

can you change

"max_repeat = gen_math_ops.maximum(0, gen_math_ops._max(repeats, self._all_dimensions(repeats)))"

to "max_repeat = gen_math_ops.maximum(1, gen_math_ops._max(repeats, self._all_dimensions(repeats)))" ?

it occur another issue :D.


RuntimeError Traceback (most recent call last) <ipython-input-7-c61c3de97313> in <module> 24 25 ---> 26 interpreter.invoke()

~/anaconda3/lib/python3.7/site-packages/tensorflow_core/lite/python/interpreter.py in invoke(self) 491 """ 492 self._ensure_safe() --> 493 self._interpreter.Invoke() 494 495 def reset_all_variables(self):

~/anaconda3/lib/python3.7/site-packages/tensorflow_core/lite/python/interpreter_wrapper/tensorflow_wrap_interpreter_wrapper.py in Invoke(self) 111 112 def Invoke(self): --> 113 return _tensorflow_wrap_interpreter_wrapper.InterpreterWrapper_Invoke(self) 114 115 def InputIndices(self):

RuntimeError: tensorflow/lite/kernels/concatenation.cc:85 output->type != input_type (1 != 2)Node number 20 (CONCATENATION) failed to prepare.


dathudeptrai

comment created time in a month

issue comment tensorflow/tensorflow

Bug when convert to tflite models.

we don't allow things like

range(0, 0, 1),

can you change

"max_repeat = gen_math_ops.maximum(0, gen_math_ops._max(repeats, self._all_dimensions(repeats)))"

to "max_repeat = gen_math_ops.maximum(1, gen_math_ops._max(repeats, self._all_dimensions(repeats)))" ?

dathudeptrai

comment created time in a month

issue comment tensorflow/tensorflow

Creating outputs from the final_state returned by dynamic_rnn causes conversion to fail

Yeah, the final_state is not supported for the fused LSTM op.

Can you use a normal Keras LSTM instead? Please see the example here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/experimental_new_converter/keras_lstm.ipynb

xkr1

comment created time in a month

pull request comment tensorflow/tensorflow

added int8 support for negate kernel

Oh, if you intend to change it for micro as well, feel free to add another prepare function.

On Wed, Jan 15, 2020 at 10:00 AM Basit Ayantunde notifications@github.com wrote:


Thanks. I was looking at tensorflow/lite/micro/kernels/neg.cc. I'll make the necessary changes later today.


lamarrr

comment created time in a month

pull request comment tensorflow/tensorflow

added int8 support for negate kernel

The prepare function is here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/neg.cc#L31-L40

On Wed, Jan 15, 2020 at 7:19 AM Basit Ayantunde notifications@github.com wrote:

@lamarrr commented on this pull request.

In tensorflow/lite/kernels/internal/reference/integer_ops/neg.h https://github.com/tensorflow/tensorflow/pull/35461#discussion_r366624355 :

  •  static_cast<int16_t>(std::numeric_limits<int8_t>::max());
    
  • constexpr int16_t kI8Min =
  •  static_cast<int16_t>(std::numeric_limits<int8_t>::min());
    
  • // within: [-128, 127]
  • TFLITE_DCHECK_GE(input_zero_point, static_cast<int32_t>(kI8Min));
  • TFLITE_DCHECK_LE(input_zero_point, static_cast<int32_t>(kI8Max));
  • // within: [-128, 127]
  • TFLITE_DCHECK_GE(output_zero_point, static_cast<int32_t>(kI8Min));
  • TFLITE_DCHECK_LE(output_zero_point, static_cast<int32_t>(kI8Max));
  • const int flat_size = MatchingFlatSize(input_shape, output_shape);
  • // already within int8 range, stored in int32
  • const auto prior = static_cast<int16_t>(input_zero_point + output_zero_point);

Yes, it is only computed once. The pattern I used is the same as in other kernels (the non-quantized Negate especially). Also, there is no Prepare function for Register_NEG's TfLiteRegistration. Are you sure you still want me to change this (i.e. create a Prepare function)?


lamarrr

comment created time in a month

Pull request review commenttensorflow/tensorflow

added int8 support for negate kernel

+/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.++Licensed under the Apache License, Version 2.0 (the "License");+you may not use this file except in compliance with the License.+You may obtain a copy of the License at++    http://www.apache.org/licenses/LICENSE-2.0++Unless required by applicable law or agreed to in writing, software+distributed under the License is distributed on an "AS IS" BASIS,+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+See the License for the specific language governing permissions and+limitations under the License.+==============================================================================*/+#ifndef TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_+#define TENSORFLOW_LITE_KERNELS_INTERNAL_REFERENCE_INTEGER_OPS_NEG_H_++#include "tensorflow/lite/kernels/internal/common.h"+#include "tensorflow/lite/kernels/internal/types.h"++namespace tflite {+namespace reference_integer_ops {++// Quantized Negate with int8 input and output, input and output must have the+// equal scale+inline void Negate(const RuntimeShape& input_shape, const int8_t* input_data,+                   int32_t input_zero_point, const RuntimeShape& output_shape,+                   int8_t* output_data, int32_t output_zero_point) {+  // equation: out =  in_zp + out_zp - in+  // where out, out_zp, in_zp ∈ [-128, 127] : int8+  // highest possible value: 127 + 127 - (-128) = 382+  // lowest possible value: (-128) + (-128) - (127)  = -383+  // accumulate on int16++  constexpr int16_t kI8Max =+      static_cast<int16_t>(std::numeric_limits<int8_t>::max());+  constexpr int16_t kI8Min =+      static_cast<int16_t>(std::numeric_limits<int8_t>::min());++  // within: [-128, 127]+  TFLITE_DCHECK_GE(input_zero_point, static_cast<int32_t>(kI8Min));+  TFLITE_DCHECK_LE(input_zero_point, static_cast<int32_t>(kI8Max));++  // within: [-128, 127]+  TFLITE_DCHECK_GE(output_zero_point, static_cast<int32_t>(kI8Min));+  TFLITE_DCHECK_LE(output_zero_point, static_cast<int32_t>(kI8Max));++  const int flat_size = MatchingFlatSize(input_shape, output_shape);++  // already within int8 range, stored in int32+  const auto prior = static_cast<int16_t>(input_zero_point + output_zero_point);

this is a constant, so we can pre-compute it in the prepare function
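
For illustration, here is a minimal sketch of what precomputing that constant in a prepare function could look like. The OpData struct, the namespace, and the function are hypothetical (this is not the actual neg.cc code), and the Init/Free allocation of OpData plus error checking are omitted for brevity.

#include "tensorflow/lite/c/common.h"
#include "tensorflow/lite/kernels/kernel_util.h"

namespace tflite {
namespace ops {
namespace builtin {
namespace neg_sketch {

struct OpData {
  // input_zero_point + output_zero_point, computed once at Prepare time.
  int32_t zero_point_sum;
};

TfLiteStatus Prepare(TfLiteContext* context, TfLiteNode* node) {
  const TfLiteTensor* input = GetInput(context, node, /*index=*/0);
  TfLiteTensor* output = GetOutput(context, node, /*index=*/0);
  auto* data = reinterpret_cast<OpData*>(node->user_data);
  // Eval then only computes "zero_point_sum - input_q" per element.
  data->zero_point_sum = input->params.zero_point + output->params.zero_point;
  return kTfLiteOk;
}

}  // namespace neg_sketch
}  // namespace builtin
}  // namespace ops
}  // namespace tflite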

lamarrr

comment created time in a month

Pull request review commenttensorflow/tensorflow

added int8 support for negate kernel

 void TestNegFloat(std::initializer_list<int> input_dims_data,   } } +// input_quantization_buffer: buffer used for the quantization data+void TestNegQuantizedInt8(float* input, int8_t* quantized_input_data,+                          float input_min, float input_max,+                          float* expected_output,+                          int8_t* quantized_expected_output_data,+                          int8_t* quantized_output_data, float output_min,+                          float output_max,+                          std::initializer_list<int> dimension_data) {+  TfLiteIntArray* tensor_dims = IntArrayFromInitializer(dimension_data);+  const int element_count = ElementCount(*tensor_dims);+  constexpr int inputs_size = 1;+  constexpr int outputs_size = 1;+  constexpr int tensors_size = inputs_size + outputs_size;++  // quantize input+  std::transform(input, input + element_count, quantized_input_data,

can you add the test in tensorflow/lite/kernels as well (and the kernel itself)?

FYI, micro and tflite share most of the kernel implementation.

lamarrr

comment created time in a month

startedrenjie-liu/quantization-kernel-codelab

started time in a month

push eventrenjie-liu/quantization-kernel-codelab

Renjie Liu

commit sha 9433a017842523ab93be1800b72f8d175e7b27c0

update readme

view details

push time in a month

push eventrenjie-liu/quantization-kernel-codelab

Renjie Liu

commit sha e6bdf63c14f4e263f87112241a455029fa72baf9

adding tanh

view details

push time in a month

pull request commenttensorflow/tensorflow

added int8 support for negate kernel

Given it's negate, where output = -input, I think we should make sure that output_scale == input_scale, but note that input_zero_point may be different from output_zero_point.

So we have: input_float = (input_q - input_zp) * input_scale and output_float = (output_q - output_zp) * output_scale

since output_float = -input_float, we have:

(output_q - output_zp) * output_scale = -(input_q - input_zp) * input_scale

Since output_scale == input_scale, we can get:

output_q - output_zp = input_zp - input_q

which is:

output_q = (input_zp + output_zp) - input_q

Note that input_zp + output_zp is a constant, so we can pre-compute it; you will just end up with

output_q = constant - input_q

and yes, you will need to saturate the result to make sure it's within the range.

Hope this helps.
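
For reference, here is a minimal standalone sketch of the formula above (assuming input_scale == output_scale, as discussed). The function name and signature are made up for illustration; this is not the kernel code in the PR.

#include <algorithm>
#include <cstdint>

// Zero-point-aware negate: output_q = (input_zp + output_zp) - input_q,
// saturated to the int8 range.
inline void NegateInt8(const int8_t* input_data, int num_elements,
                       int32_t input_zero_point, int32_t output_zero_point,
                       int8_t* output_data) {
  // Constant part, precomputable outside the loop (or in Prepare).
  const int32_t zero_point_sum = input_zero_point + output_zero_point;
  for (int i = 0; i < num_elements; ++i) {
    int32_t value = zero_point_sum - static_cast<int32_t>(input_data[i]);
    value = std::min<int32_t>(127, std::max<int32_t>(-128, value));
    output_data[i] = static_cast<int8_t>(value);
  }
}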

lamarrr

comment created time in a month

Pull request review commenttensorflow/tensorflow

TANH/Sigmoid 16-bit activation functions using LUT

 TfLiteStatus TanhEval(TfLiteContext* context, TfLiteNode* node) {     case kTfLiteInt16: {       TanhParams params;       params.input_left_shift = data->input_left_shift;-      if (kernel_type == kReference) {

we intended to keep the reference ones

wwwind

comment created time in a month

Pull request review commenttensorflow/tensorflow

TANH/Sigmoid 16-bit activation functions using LUT

 TEST_P(LogisticOpTest, SigmoidInt16) {   const float kMax = 32767.f / 32768.f;   QuantizedActivationsOpModel m(       GetRegistration(), BuiltinOperator_LOGISTIC,-      /*input=*/{TensorType_INT16, {1, 2, 4, 1}, 8 * kMin, 8 * kMax},-      /*output=*/{TensorType_INT16, {1, 2, 4, 1}, kMin, kMax});+      /*input=*/{TensorType_INT16, {1, 2, 6, 1}, 8 * kMin, 8 * kMax},+      /*output=*/{TensorType_INT16, {1, 2, 6, 1}, kMin, kMax});   m.SetInput<int16_t>({       0, -6, 2, 4,   //-      3, -2, 10, 1,  //

why delete this line?

wwwind

comment created time in a month

pull request commenttensorflow/tensorflow

added int8 support for negate kernel

A simple example:

Let's say your zero_point is 6 and your scale is 1.5.

And your current int8 value is 8, so your float value is float_value = (quantized_value - zero_point) * scale which is (8 - 6) * 1.5 = 3.

The negate value of that one should be -3.

Your implementation is a direct negate, so 8 -> -8, and the resulting float value will be (-8 - 6) * 1.5 = -21, which is way off from -3.

Does that make sense?
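
A quick standalone check of the numbers above (assuming the output zero point is also 6, since the example only mentions a single zero point):

#include <cstdio>

int main() {
  const float scale = 1.5f;
  const int input_zp = 6, output_zp = 6;  // assumption: same zero point on both sides
  const int input_q = 8;                  // float value: (8 - 6) * 1.5 = 3.0

  const int direct_q = -input_q;                           // naive negate: 8 -> -8
  const int correct_q = (input_zp + output_zp) - input_q;  // 6 + 6 - 8 = 4

  std::printf("direct negate   -> %.1f\n", (direct_q - output_zp) * scale);   // prints -21.0
  std::printf("zp-aware negate -> %.1f\n", (correct_q - output_zp) * scale);  // prints -3.0
  return 0;
}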

On Mon, Jan 6, 2020 at 10:44 AM Basit Ayantunde notifications@github.com wrote:

@lamarrr commented on this pull request.

In tensorflow/lite/micro/kernels/neg.cc https://github.com/tensorflow/tensorflow/pull/35461#discussion_r363141413 :

@@ -31,15 +31,22 @@ TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) { const TfLiteTensor* input = GetInput(context, node, kInputTensor); TfLiteTensor* output = GetOutput(context, node, kOutputTensor); switch (input->type) {

  • // TODO(wangtz): handle for kTfLiteInt8
  • case kTfLiteInt8:
  •  reference_ops::Negate(GetTensorShape(input), GetTensorData<int8_t>(input),
    

Works in what sense? My assumption was that, regardless of whether the input is quantized or not, negation is agnostic of the zero-point. Is this assumption flawed?


lamarrr

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

added int8 support for negate kernel

 TfLiteStatus Eval(TfLiteContext* context, TfLiteNode* node) {   const TfLiteTensor* input = GetInput(context, node, kInputTensor);   TfLiteTensor* output = GetOutput(context, node, kOutputTensor);   switch (input->type) {-    // TODO(wangtz): handle for kTfLiteInt8+    case kTfLiteInt8:+      reference_ops::Negate(GetTensorShape(input), GetTensorData<int8_t>(input),

it seems this only works when zero_point == 0.

can you handle other cases as well?

thanks

lamarrr

comment created time in 2 months

issue commenttensorflow/tensorflow

TensorFlow Lite BroadcastTo

Hi, I wonder if you can explain your usage a little bit? Is 5D/6D necessary?

Thanks!

DoritoDog

comment created time in 2 months

issue commenttensorflow/tensorflow

TensorFlow Lite BroadcastTo

Sure, will take a look.

On Fri, Dec 27, 2019 at 12:40 AM Jared Duke notifications@github.com wrote:

@renjie-liu https://github.com/renjie-liu can you take a look to see which operators would need to be updated to support this model (w/ 5/6D tensors)? Thanks.


DoritoDog

comment created time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

Renjie Liu

commit sha a45931f03a53e027011372aa4ce8dce1eed2fec4

adding support for sin from -pi/2 to pi/2

view details

push time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

Wayne Wei

commit sha faaa8f71d211ec6d6a549004900954ce8e9e566d

')' has to be escape

view details

renjie-liu

commit sha c3dbdda7fba6d0a3eecb43e200bd7148ed01febd

Merge pull request #1 from windmaple/patch-1 ')' has to be escape

view details

push time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

Renjie Liu

commit sha dc8f2952709d1f687a08dc07bc1436d920f3ef3f

add fixed point sin

view details

push time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

Renjie Liu

commit sha 5cf8844a2dfc03919be1c0586222c9c02b83f05a

adding fixed point arithmetic codelab

view details

push time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

Renjie Liu

commit sha 658d79adf1b0a4f5eae2d2ad7b1ff9822bb6d649

Refactoring the quantization utils

view details

push time in 2 months

Pull request review commenttensorflow/tensorflow

NFC - minor spelling tweaks under lite directory

 MaxUnpooling& MaxUnpooling::operator=(MaxUnpooling&& kernel) { }  Status MaxUnpooling::Compile(const CreationContext& creation_context) {-  const auto code = GetMaxUnoolingKernelCode(+  const auto code = GetMaxUnroolingKernelCode(

this should be changed as well?

kiszk

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

NFC - minor spelling tweaks under lite directory

 def aggregate_and_return_name_for_input(self, out_graphdef):      In particular, if you have 4 inputs to a hint stub, this will be the     node that you can use as an output. I.e. you have 4 timesteps from a-    static rnn, then a fused UnidriecitonalLSTM will expect 1 input with+    static rnn, then a fused UndirectionalLSTM will expect 1 input with

this does not seem to be changed?

this should be UnidirectionalLSTM

kiszk

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

NFC - minor spelling tweaks under lite directory

 Status AllocateTensorMemory(const CLContext& context, const CLDevice& device,     case TensorStorageType::SINGLE_TEXTURE_2D: {       if (depth != 1) {         return InvalidArgumentError(absl::StrCat(-            "SINGLE_TEXTURE_2D support only cnannels in range [1-4], but ",+            "SINGLE_TEXTURE_2D support only chnannels in range [1-4], but ",

channels

kiszk

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

NFC - minor spelling tweaks under lite directory

 namespace gpu { namespace cl { namespace { -std::string GetMaxUnoolingKernelCode(+std::string GetMaxUnroolingKernelCode(

should this be unpooling?

kiszk

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

NFC - minor spelling tweaks under lite directory

 TEST(CommandLineFlagsTest, UsageString) {   bool some_switch = false;   std::string some_name = "something";   // Don't test float in this case, because precision is hard to predict and-  // match against, and we don't want a flakey test.+  // match against, and we don't want a franky test.

flaky

kiszk

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

NFC - minor spelling tweaks under lite directory

 def aggregate_and_return_name_for_input(self, out_graphdef):      In particular, if you have 4 inputs to a hint stub, this will be the     node that you can use as an output. I.e. you have 4 timesteps from a-    static rnn, then a fused UnidriecitonalLSTM will expect 1 input with+    static rnn, then a fused UndirecitonalLSTM will expect 1 input with

UnidirectionalLSTM

kiszk

comment created time in 2 months

issue commenttensorflow/tensorflow

Provided LSTM example doesn't work.

Hi Jian, can you help take a look at the LSTM quantization? Thanks

mpariente

comment created time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

renjie-liu

commit sha 9622e8bf515683acfd07c8ac8870f832aa31c3d1

Update README.md

view details

push time in 2 months

push eventrenjie-liu/quantization-kernel-codelab

renjie-liu

commit sha 27984e6287a0d7930da30d029fd687499190787d

Create README.md

view details

push time in 2 months

issue commenttensorflow/tensorflow

Provided LSTM example doesn't work.

I used Colab to import the Python notebook and ran it directly.

Can you try the new approach and see if post-training quantization works for you?

Thanks a lot!

On Thu, Dec 19, 2019 at 3:12 PM Pariente Manuel notifications@github.com wrote:

Thanks a lot for the example, I'm going to try that out.

the old ophint-based method does not work well with resource variables (but it should work fine with 1.15, I have just tried in colab) Hmm, I did run this script with tf1.15.. Did you try the snippet I provided? Are you able to reproduce the error? And could I ask what you tried in colab?

for new keras lstm/rnn, please refer here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/experimental_new_converter/keras_lstm.ipynb

Thanks for that, where can I find some more info on the experimental_new_converter please? I'm struggling with post-training quantization with pretrained LSTMs for a while now..


mpariente

comment created time in 2 months

pull request commenttensorflow/tensorflow

TANH/Sigmoid 16-bit activation functions using LUT

Hi,

thanks a lot for the testing, and thanks a lot for the work!

I wonder, should we sum the absolute values instead of the raw values? Also, the large sigmoid error makes me suspect a plain overflow: -32767.042969 really looks like -32768 plus some small number (maybe we can just try with the extreme input values?), or maybe every value is systematically shifted by -0.5?

Also, given you're verifying the int16 values, judging from the numbers both approaches have good results: +- 12/(1 << 15) for tanh.

I think it's fine to keep both approaches. You can refer here for prepare: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/activations.cc#L395-L422

and here for eval: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/kernels/activations.cc#L813-L824
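
To make the "sum the absolute values" suggestion concrete, here is a small sketch of how a test could report error in output LSBs so that positive and negative errors don't cancel out. The helper name, its signature, and the 1/32768 output scale are assumptions for illustration, not code from this PR.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Compare quantized int16 outputs against a float reference, reporting the
// mean and max absolute error in output LSBs (for int16 tanh, 1 LSB == 1/32768).
void ReportAbsError(const std::vector<int16_t>& actual,
                    const std::vector<float>& reference,
                    float output_scale) {
  double sum_abs = 0.0;
  double max_abs = 0.0;
  for (size_t i = 0; i < actual.size(); ++i) {
    const double expected_q = reference[i] / output_scale;  // reference value in LSBs
    const double abs_err = std::fabs(actual[i] - expected_q);
    sum_abs += abs_err;
    max_abs = std::max(max_abs, abs_err);
  }
  std::printf("mean abs error: %f LSB, max abs error: %f LSB\n",
              sum_abs / actual.size(), max_abs);
}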

wwwind

comment created time in 2 months

issue commenttensorflow/tensorflow

Provided LSTM example doesn't work.

the old ophint-based method does not work well with resource variables (but it should work fine with 1.15, I have just tried in colab)

for new keras lstm/rnn, please refer here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/experimental_new_converter/keras_lstm.ipynb

thanks

mpariente

comment created time in 2 months

issue commenttensorflow/tensorflow

tensorflow-lite-gpu.aar built from source is much slower than downloaded prebuilt library

Hi Juhyun, can you help take a look? thanks!

gimgane

comment created time in 2 months

create barnchrenjie-liu/quantization-kernel-codelab

branch : master

created branch time in 2 months

created repositoryrenjie-liu/quantization-kernel-codelab

created time in 2 months

issue commenttensorflow/tensorflow

[tflite] Output difference for simple MobileNetV2 model

Can you run on some real dataset and check if we indeed lose some accuracy?

@multiverse-tf don't we have regression test for accuracy?

Thanks,

hgaiser

comment created time in 2 months

pull request commenttensorflow/tensorflow

TANH/Sigmoid 16-bit activation functions using LUT

Thanks Elena for the PR. Like Benoit pointed out, a LUT is not necessarily desirable from the performance point of view.

Another thing: I'm not sure the accuracy measurement is really "correct". Do you have any real model to benchmark the accuracy on? Like you said, your method will use the range from -10.7 to 10.7, but shouldn't that be based on the real scenario?

Jaesung can probably comment more, but if you want to cover more range for your scenario, you could probably just add one more bit to the integer part for sigmoid (currently it's 3 bits, so it represents -8 to 8)?
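
As a side note on what "one more bit to the integer part" buys: with a 16-bit signed fixed-point input (one sign bit, N integer bits, 15 - N fractional bits, the usual gemmlowp-style convention assumed here), each extra integer bit doubles the representable range and halves the resolution. A tiny standalone illustration:

#include <cstdio>

int main() {
  for (int integer_bits = 3; integer_bits <= 5; ++integer_bits) {
    const int fractional_bits = 15 - integer_bits;
    const double range = 1 << integer_bits;            // e.g. 3 integer bits -> +-8
    const double step = 1.0 / (1 << fractional_bits);  // smallest representable step
    std::printf("Q%d.%d: range [-%g, %g), step %g\n", integer_bits,
                fractional_bits, range, range, step);
  }
  return 0;
}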

wwwind

comment created time in 2 months

issue commenttensorflow/tensorflow

gru convert tflite err(KeyError: 'kernel')

Hi Nupur, can you help take a look? thanks

mkz0930

comment created time in 3 months

issue commenttensorflow/tensorflow

Tensorflow lite conversion problem

please check here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/op_hint.py#L38-L41

but ignore the toco_convert part (it should be TFLiteConverter)

tulasiram58827

comment created time in 4 months

issue commenttensorflow/tensorflow

Tensorflow lite conversion problem

I see.

I think you have two options:

  1. freeze the graph & call tf.lite.experimental.convert_op_hints_to_stubs, then pass the result to TFLiteConverter

  2. refactor your model-building code: keep [None, None, 13] for training, but export a fixed-shape graph for inference

tulasiram58827

comment created time in 4 months

issue commenttensorflow/tensorflow

Tensorflow lite conversion problem

are you trying to use a different shape? from_session takes in tensors, which have shapes implicitly

tulasiram58827

comment created time in 4 months


issue commenttensorflow/tensorflow

The use of tflite Model of C3D Network in Android

Hi,

Great. For LSTM support, you can refer here https://www.tensorflow.org/lite/convert/rnn.

Thanks,

On Tue, Oct 29, 2019 at 10:20 AM zxj11838 notifications@github.com wrote:

I use 3D convolution to solve the classification problem of continuous gestures, such as sliding to the left or sliding to the right; I refer to the following project to train the model: https://github.com/hx173149/C3D-tensorflow Now, I am trying to use the lstm method. Do you have any better suggestions?


zxj11838

comment created time in 4 months

Pull request review commenttensorflow/tensorflow

Fix Crash When input_size Is an int

 def call(self, inputs, state):         index_override=18)      input_size = inputs.shape.with_rank(2)[1]-    if input_size.value is None:+    if not isinstance(input_size, int) and input_size.value is None:

can you change that? thanks a lot!

abduelhamit

comment created time in 4 months

issue commenttensorflow/tensorflow

The use of tflite Model of C3D Network in Android

Hi,

I wonder if it's possible to share your usage and a code snippet, so we can evaluate whether C3D is really necessary? Maybe it's possible to avoid C3D. Thanks!

zxj11838

comment created time in 4 months

Pull request review commenttensorflow/tensorflow

Fix Crash When input_size Is an int

 def call(self, inputs, state):         index_override=18)      input_size = inputs.shape.with_rank(2)[1]-    if input_size.value is None:+    if not isinstance(input_size, int) and input_size.value is None:

Thanks for the commit. How about we do it like this:

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell_impl.py#L1022-L1024

abduelhamit

comment created time in 4 months

pull request commenttensorflow/models

Fix BERT EmbeddingPostprocessor

Gather is supported in tflite

Vooblin

comment created time in 4 months

issue commenttensorflow/tensorflow

Tensorflow lite conversion problem

Great to know it's working. :)

If you converted from frozen graph, you need to use tf.lite.experimental.convert_op_hints_to_stubs

like here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/python/op_hint.py#L36-L40

using 'tf.lite.TFLiteConverter.from_session' is much simpler. :)

tulasiram58827

comment created time in 4 months

issue closedtensorflow/tensorflow

Tensorflow lite conversion problem

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow installed from (source or binary):
  • TensorFlow version (or github SHA if from source):

Provide the text output from tflite_convert

Some of the operators in the model are not supported by the standard TensorFlow Lite runtime and are not recognized by TensorFlow. If you have a custom implementation for them you can disable this error with --allow_custom_ops, or by setting allow_custom_ops=True when calling tf.lite.TFLiteConverter(). Here is a list of builtin operators you are using: FULLY_CONNECTED, RESHAPE, TRANSPOSE. Here is a list of operators for which you will need custom implementations: EmptyTensorList, TensorListFromTensor, TensorListReserve, TensorListStack, While.

Graph code:

    self.inputs = tf.placeholder(tf.float32, [None, None, num_features], name='inputs')

    # # Here we use sparse_placeholder that will generate a
    # # SparseTensor required by ctc_loss op.
    # # https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor
    # # https://www.tensorflow.org/api_docs/python/tf/nn/ctc_loss
    self.targets = tf.sparse_placeholder(tf.int32, name='targets')

    # # 1d array of size [batch_size]
    self.seq_len = tf.placeholder(tf.int32, [None], name='seq_len')
    self.lstm_cell = tf.lite.experimental.nn.TFLiteLSTMCell(num_hidden)
    self.outputs, _ =tf.lite.experimental.nn.dynamic_rnn(self.lstm_cell,self.inputs,dtype='float32')

    self.shape = tf.shape(self.inputs)
    self.batch_s, self.max_time_steps = self.shape[0], self.shape[1]

    # Reshaping to apply the same weights over the timesteps
    self.outputs = tf.reshape(self.outputs, [-1, num_hidden])

    # Truncated normal with mean 0 and stdev=0.1
    # Tip: Try another initialization
    # see https://www.tensorflow.org/versions/r0.9/api_docs/python/contrib.layers.html#initializers
    self.W = tf.Variable(tf.truncated_normal([num_hidden,
                                         num_classes],
                                        stddev=0.1))
    # Zero initialization
    # Tip: Is tf.zeros_initializer the same?
    self.b = tf.Variable(tf.constant(0., shape=[num_classes]))

    # Doing the affine projection
    self.logits = tf.matmul(self.outputs, self.W) + self.b

    # Reshaping back to the original shape
    self.logits = tf.reshape(self.logits, [self.batch_s, -1, num_classes])

    # Time major
    self.logits = tf.transpose(self.logits, (1, 0, 2))

    self.logits = tf.identity(self.logits,name="output")

    self.loss = tf.nn.ctc_loss(self.targets, self.logits, self.seq_len)
    self.cost = tf.reduce_mean(self.loss)

    with tf.variable_scope("gs"):
        self.global_step = tf.Variable(0, name='global_step', trainable=False)

    # optimizer = tf.train.AdamOptimizer().minimize(cost)
    # optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9).minimize(cost)
    self.optimizer = tf.train.AdamOptimizer(learning_rate=5e-4)
    self.gvs = self.optimizer.compute_gradients(self.cost)
    self.clipped = []
    for grad, var in self.gvs:
        grad = tf.clip_by_value(grad, -1., 1.)
        self.clipped.append((grad, var))
        self.train_op = self.optimizer.apply_gradients(self.clipped, global_step=self.global_step)

Any other info / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

closed time in 4 months

tulasiram58827

issue commenttensorflow/tensorflow

Tensorflow lite conversion problem

Hi, you don't need to manually freeze the graph, you can just

use converter = tf.lite.TFLiteConverter.from_session(sess, input_arrays, output_arrays,input_shapes = {"inputs":[1,100,13]})

like here https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/experimental/examples/lstm/g3doc#3-lets-define-the-export-to-tensorflow-lite-model-function

also please note tf.lite.experimental.nn.dynamic_rnn only accepts time_major=True (and it's the default value).

tulasiram58827

comment created time in 4 months

issue commenttensorflow/tensorflow

Problem to transform an custom efficient-net with an unofficial API to tf-lite version

Hi Momo,

I mean your function needs to be serializable by Keras.

Also added Tiezhen who is the expert in Keras model.

momo1986

comment created time in 4 months

issue commenttensorflow/tensorflow

Problem to transform an custom efficient-net with an unofficial API to tf-lite version

It seems the problem hasn't come down to TFLite yet; I think you need to make your function serializable by Keras.

momo1986

comment created time in 4 months

issue commenttensorflow/tensorflow

java.nio.BufferOverflowException TensorFlowYoloDetector

Tei has made Yolo-v3 work.

Hi Miguel, is it possible to attach the problematic model for us to debug? thanks

mkulisic

comment created time in 4 months

issue commenttensorflow/tensorflow

The use of tflite Model of C3D Network in Android

Hi all,

sorry for the late response. (I just came back from vacation)

Currently 5-D strided_slice is not supported, and conv_3d is not supported either. :(

We would love to know your use cases as well: if conv 3d is necessary, maybe we can find some workaround methods.

cheers,

zxj11838

comment created time in 4 months
