TFLite TransposeConvV2 Operator Slow on x86 CPU Ubuntu

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • TensorFlow installed from (source or binary): Binary
  • TensorFlow version (use command below): tf-nightly 2.4.0.dev20200902
  • Python version: 3.6.9
  • Bazel version (if compiling from source): 3.5.0
  • GCC/Compiler version (if compiling from source): gcc 7.5.0
  • CUDA/cuDNN version: CUDA 10.1
  • GPU model and memory: Using CPU (Intel(R) Core(TM) i7-8086K CPU @ 4.00GHz)

Describe the current behavior
The TRANSPOSE_CONV operator accounts for over 80% of total computation time when profiled with the TFLite benchmarking tool.

============================ Top by Computation Time ==============================
	             [node type]	          [start]	  [first]	 [avg ms]	     [%]	  [cdf%]	  [mem KB]	[times called]	[Name]
	          TRANSPOSE_CONV	          235.882	  487.948	  482.097	 26.927%	 26.927%	     0.000	        1	[device_0/g_ae/dec_0/conv2d_transpose1]:102
	          TRANSPOSE_CONV	          719.253	  163.323	  162.187	  9.059%	 35.986%	     0.000	        1	[device_0/g_ae/dec_1/conv2d_transpose1]:113
	          TRANSPOSE_CONV	         1634.919	  108.162	  111.988	  6.255%	 42.241%	     0.000	        1	[device_0/g_ae/dec_9/conv2d_transpose1]:201
	          TRANSPOSE_CONV	         1501.813	  116.089	  109.471	  6.114%	 48.355%	     0.000	        1	[device_0/g_ae/dec_8/conv2d_transpose1]:190
	          TRANSPOSE_CONV	          882.714	  111.952	  108.459	  6.058%	 54.413%	     0.000	        1	[device_0/g_ae/dec_2/conv2d_transpose1]:124
	          TRANSPOSE_CONV	          993.583	  103.796	   97.807	  5.463%	 59.876%	     0.000	        1	[device_0/g_ae/dec_3/conv2d_transpose1]:135
	          TRANSPOSE_CONV	         1287.885	   92.329	   95.829	  5.352%	 65.229%	     0.000	        1	[device_0/g_ae/dec_6/conv2d_transpose1]:168
	          TRANSPOSE_CONV	         1394.631	  109.527	   95.786	  5.350%	 70.579%	     0.000	        1	[device_0/g_ae/dec_7/conv2d_transpose1]:179
	          TRANSPOSE_CONV	         1093.908	   92.043	   93.959	  5.248%	 75.827%	     0.000	        1	[device_0/g_ae/dec_4/conv2d_transpose1]:146
	          TRANSPOSE_CONV	         1193.164	   88.003	   89.509	  4.999%	 80.826%	     0.000	        1	[device_0/g_ae/dec_5/conv2d_transpose1]:157

Number of nodes executed: 216
============================== Summary by node type ==============================
	             [Node type]	  [count]	  [avg ms]	    [avg %]	    [cdf %]	  [mem KB]	[times called]
	          TRANSPOSE_CONV	       11	  1466.204	    81.898%	    81.898%	     0.000	       11
	                 CONV_2D	       11	   159.086	     8.886%	    90.785%	     0.000	       11
	                     ABS	       21	   111.310	     6.217%	    97.002%	     0.000	       21
	                     ADD	       32	    11.551	     0.645%	    97.647%	     0.000	       32
	                     MUL	       42	    10.792	     0.603%	    98.250%	     0.000	       42
	                 RESHAPE	       44	     9.645	     0.539%	    98.789%	     0.000	       44
	                     SUB	       21	     9.366	     0.523%	    99.312%	     0.000	       21
	                    RELU	       21	     6.514	     0.364%	    99.676%	     0.000	       21
	           CONCATENATION	       11	     5.129	     0.286%	    99.962%	     0.000	       11
	      TfLiteFlexDelegate	        1	     0.430	     0.024%	    99.986%	     0.000	        1
	                    TANH	        1	     0.245	     0.014%	   100.000%	     0.000	        1

Timings (microseconds): count=50 first=1811076 curr=1775110 min=1738229 max=1895058 avg=1.79038e+06 std=32080

Describe the expected behavior
Faster execution of this operator. I expected my model to run inference faster once converted to TFLite, but it currently runs more slowly than regular TensorFlow on the same hardware.

Standalone code to reproduce the issue

bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_flex \
  --graph=.../converted_model_float32.tflite --num_threads=4 --enable_op_profiling=true > .../float32_benchmark.txt

(The benchmarking tool was built from tf master source commit 86db5756535f70f1b1fab61c6f3f0483141510e8)
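To get a sense of whether the TRANSPOSE_CONV time is dominated by raw arithmetic or by a slow kernel, it can help to estimate the operator's multiply-accumulate (MAC) count. Below is a minimal back-of-the-envelope sketch; the layer shapes are hypothetical placeholders, not the actual shapes of the model above (substitute your real kernel size, channels, and stride).

```python
# Rough cost model for a 2-D transposed convolution with SAME padding.
# All shape values below are illustrative assumptions, not from the model.

def transpose_conv_cost(in_h, in_w, in_ch, k_h, k_w, out_ch, stride):
    """Return ((out_h, out_w, out_ch), MAC count) for SAME padding."""
    out_h, out_w = in_h * stride, in_w * stride
    # Each input element is multiplied against the full k_h x k_w x out_ch
    # kernel slice across its in_ch channels, so the MAC count matches a
    # forward convolution evaluated over the *input* grid.
    macs = in_h * in_w * in_ch * k_h * k_w * out_ch
    return (out_h, out_w, out_ch), macs

shape, macs = transpose_conv_cost(in_h=1, in_w=1024, in_ch=512,
                                  k_h=1, k_w=31, out_ch=256, stride=2)
print(shape, macs)  # (2, 2048, 256) 4160749568
```

Comparing this estimate against the measured milliseconds gives an effective GMAC/s figure; if it is far below what the CPU sustains on CONV_2D, the kernel implementation (rather than the workload size) is the likely bottleneck.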

Other info / logs
Full TFLite benchmark output

I'd appreciate any tips on how to profile this operator. How can I find out why it is taking so much time in my network? Can I use C++ profiling tools to find the computation-time sinks in transpose_conv.h?
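Since the benchmark binary is a native executable, a standard Linux profiler can attribute time to individual C++ symbols. A minimal sketch with perf (paths are illustrative and must be adjusted to your build output and model location):

```shell
# Record a call-graph profile of the benchmark run (illustrative paths).
perf record -g bazel-bin/tensorflow/lite/tools/benchmark/benchmark_model_plus_flex \
  --graph=converted_model_float32.tflite --num_threads=4

# Show the hottest symbols; the transpose-conv kernel should appear near the top.
perf report --sort dso,symbol
```

Building with `-c opt` plus debug symbols (e.g. `--copt=-g`) makes the symbol names in the report readable.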

Answer (renjie-liu)

I see, this is really strange. Can you also add --define=ruy_profiler=true when building the benchmark tool? Then we should be able to see the detailed profiling.
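A sketch of the rebuild with that flag, assuming the same Bazel target as the reproduce command above:

```shell
# Rebuild the benchmark tool with ruy's built-in instrumentation enabled,
# so per-kernel timing breakdowns are printed after the run.
bazel build -c opt --define=ruy_profiler=true \
  //tensorflow/lite/tools/benchmark:benchmark_model_plus_flex
```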
