Our (Syntiant Corp.'s) neural network inference chips use quantized weights and biases to minimize storage and energy consumption. The new experimental TensorFlow quantization feature, tf.contrib.quantize.experimental_create_training_graph, supports quantizing weights to anywhere from 2 to n bits, but the tf.contrib.lite.toco_convert tool currently supports only 8-bit quantization. As a result, we have to maintain an internal fork of the TFLite pipeline before generating the FlatBuffer.

Feature request: Update TOCO to support arbitrary-bit-width (i.e., 2- to n-bit) signed fixed-point quantization of weights and biases, for both symmetric and asymmetric quantization. Ideally, TOCO would pick up the quantization already specified at the op or Keras layer level, rather than requiring it to be specified through the TOCO API.
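To make the request concrete, here is a minimal sketch of the arithmetic involved in n-bit signed fixed-point quantization. It assumes the affine scheme TFLite already uses for 8 bits (real_value = scale * (q - zero_point)), generalized to a configurable bit width. The function names (quantize_symmetric, quantize_asymmetric, dequantize) are hypothetical illustrations, not part of TOCO or any TensorFlow API.

```python
from typing import List, Tuple


def quantize_symmetric(weights: List[float], num_bits: int) -> Tuple[List[int], float]:
    """Symmetric signed quantization: zero_point is fixed at 0 and the integer
    range is restricted to [-(2^(b-1) - 1), 2^(b-1) - 1] so it is symmetric."""
    if num_bits < 2:
        raise ValueError("num_bits must be >= 2")
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Python's round() rounds half to even; clamp guards against float slop.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def quantize_asymmetric(weights: List[float], num_bits: int) -> Tuple[List[int], float, int]:
    """Asymmetric signed quantization over the full range [-2^(b-1), 2^(b-1) - 1]
    with an integer zero_point chosen so that 0.0 is exactly representable."""
    if num_bits < 2:
        raise ValueError("num_bits must be >= 2")
    qmin = -(2 ** (num_bits - 1))
    qmax = 2 ** (num_bits - 1) - 1
    # Widen the float range to include 0.0 so zero maps to an exact integer.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    zero_point = max(qmin, min(qmax, int(round(qmin - w_min / scale))))
    q = [max(qmin, min(qmax, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int = 0) -> List[float]:
    """Map integers back to floats: real = scale * (q - zero_point)."""
    return [scale * (v - zero_point) for v in q]
```

For example, quantize_symmetric([-1.0, 0.5, 0.25, 1.0], 4) produces 4-bit codes in [-7, 7] with a scale of 1/7. Asymmetric quantization spends one extra code (-8 at 4 bits) and a zero point to cover skewed ranges such as all-positive biases.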


We are working to make the converter more configurable in upcoming iterations. Closing this issue since it's a bit stale.


