[TFLu] int8 ops slower than f32
Describe the problem
I compared the time spent by MicroInterpreter::Invoke() to perform different ops on the same model with int8 quantization and without. I also tried the CMSIS-NN kernels for some of the ops. The problem is that, aside from the fully connected op, every other op is the same speed or slower with the int8 kernels.
Here is a table showing the average time in ticks spent by each op's Eval(). The first column shows the model with int8 quantization using CMSIS-NN kernels for mul, add, and fully connected; the second column uses the reference int8 kernels; and the third is floating point.
The tanh kernel performs worst, at 13x slower than its floating-point equivalent. Is this expected or known behavior?
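For reference, here is a minimal sketch of how per-op tick counts like these can be collected with TFLM's MicroProfiler. It assumes a recent TFLM API (the MicroInterpreter constructor signature has changed across versions); the op list and arena size are placeholders for the attached models.

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/micro_profiler.h"
#include "tensorflow/lite/schema/schema_generated.h"

// Placeholder: the flatbuffer bytes of one of the attached models.
extern const unsigned char model_data[];

constexpr size_t kArenaSize = 16 * 1024;  // placeholder; size to the model
uint8_t tensor_arena[kArenaSize];

void ProfileOneInvoke() {
  const tflite::Model* model = tflite::GetModel(model_data);

  // Register only the ops the model uses.
  tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddFullyConnected();
  resolver.AddMul();
  resolver.AddAdd();
  resolver.AddTanh();

  // Passing a MicroProfiler makes the interpreter time each op's Eval().
  tflite::MicroProfiler profiler;
  tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                       kArenaSize,
                                       /*resource_variables=*/nullptr,
                                       &profiler);
  interpreter.AllocateTensors();

  // ... fill interpreter.input(0) with test data ...

  interpreter.Invoke();
  // Dumps total ticks per op type as CSV to the debug log.
  profiler.LogTicksPerTagCsv();
}
```

Averaging over repeated Invoke() calls, as in the table above, is then a matter of rerunning and aggregating the logged ticks.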
Please provide the exact sequence of commands/steps when you ran into the problem
I have attached the models that I used for profiling: profiling_models.zip
Answer questions (renjie-liu)
We have optimized for NEON on Arm, but for micro those SIMD instructions are unfortunately not available.
Hi Pete, do you have any suggestions?
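To illustrate why the missing SIMD matters for these numbers: the reference int8 kernels rescale every accumulator through a fixed-point rounding multiply (gemmlowp's SaturatingRoundingDoublingHighMul, followed by a rounding shift). With NEON this vectorizes across many lanes at once; a Cortex-M executes it scalar, with a 64-bit multiply per element. A simplified sketch of that primitive (illustrative, not the exact TFLM source):

```cpp
#include <cstdint>
#include <limits>

// Rounding-doubling high multiply used by quantized kernels to apply the
// output scale. Scalar cost per element: one 64-bit multiply, a nudge for
// round-to-nearest, and a division by 2^31, versus a single float multiply
// in the float path.
int32_t SaturatingRoundingDoublingHighMul(int32_t a, int32_t b) {
  // The only case that overflows: both inputs are INT32_MIN.
  const bool overflow =
      (a == b) && (a == std::numeric_limits<int32_t>::min());
  const int64_t ab = static_cast<int64_t>(a) * static_cast<int64_t>(b);
  const int64_t nudge = ab >= 0 ? (1LL << 30) : (1 - (1LL << 30));
  const int32_t result = static_cast<int32_t>((ab + nudge) / (1LL << 31));
  return overflow ? std::numeric_limits<int32_t>::max() : result;
}
```

Int8 tanh pays this cost on top of a fixed-point exponential approximation, which is consistent with it showing the largest slowdown in the table.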