tensorflow/model-optimization 913

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

HW 3 skeleton for doing BDD with RottenPotatoes

A web page that reports a browser's WebGL capabilities, including supported extensions and implementation specific capabilities, such as the maximum number of texture units.

ToDo List example

Adds a table of contents to HTML based presentations including Reveal.js, html5slides, io-2013-slide, Shower and impress.js

This is a Slack bot that publishes a team's pull requests to their Slack Channel, once provided the organisation name, the team members' github names, and a list of repos to follow. It is my first 20% project at GDS.

An Open Source Machine Learning Framework for Everyone

push event wwwind/tensorflow

commit sha d49cf4c9dcee414a543b39bdfa5a15c4778e72e6

Added a small fix to the test. Change-Id: Ia7f43fe9d77596e6ba9ff85d9776b94440362b0a

push time in 11 hours

push event wwwind/tensorflow

commit sha c0d20ffd82c603334177a690d65f36bae3a6744d

Support batching all-reduce with concat and split Collective v2 doesn't support scoped allocator. While it's possible to make scoped allocator work with it. Concat/split is much simpler. PiperOrigin-RevId: 337021023 Change-Id: I6e6e2fdc3c94ffbc59a52c20a451dcd74fd864e4

commit sha 016bcfcb193260c5db2c58056aeb7aa4f6e5dc29

Removing the unused HostComputeMetadata field. PiperOrigin-RevId: 337022622 Change-Id: Ibd191b66095b7df84173f42a4f115bab316ef625

commit sha bf14e371d28dd546f5b196c7f13192d1c47a68c7

Fix an internal error. PiperOrigin-RevId: 337038590 Change-Id: Ie2dd4a6e9c129093b2fed0a35bebdbcb6b47d103

commit sha 276f8c445219579545e95aadd69f29c5b12e228c

Add BUILD rules to build a benchmark model APK with flex delegate. PiperOrigin-RevId: 337043349 Change-Id: Ieada6f1fc756b09b5b889076dd28b9ea96aec999

commit sha 74283f4e6dcbfb097f8bfbe2479c40053d7ba04c

Update GraphDef version to 554. PiperOrigin-RevId: 337045178 Change-Id: I19dc22f8b05bd4f4d95d42f8d56a0d93e14569b4

commit sha 61d034993954c30ae3b347ef910f988343d6fd2c

compat: Update forward compatibility horizon to 2020-10-14 PiperOrigin-RevId: 337045191 Change-Id: I745a903726e2d1884098adc2b1aa82255f5220c6

commit sha 7144107de938c94f1e5e972ceed63401fcb8b986

Change SparseApplyFtrl operator() to return Status

commit sha fccc8b6179a24daf3d9779d236a4143b618a0a38

Integrate LLVM at llvm/llvm-project@3b33b4160478 Updates LLVM usage to match [3b33b4160478](https://github.com/llvm/llvm-project/commit/3b33b4160478) PiperOrigin-RevId: 337052227 Change-Id: Idbd8bd42775111dbeb6e5888320eb1fc5bcb50e3

commit sha dae99c2076e7647d3640aa019d3870aed47ac44f

Update TimeDistributed documentation Based on notes.

commit sha c38c7fbaa789b1a6e081204e28a0f056bc100880

Last update to TimeDistributed layer

commit sha 261bc3aba4e5c1611a417cf9d916c916996afad2

Integrate LLVM at llvm/llvm-project@d0c95808e50c Updates LLVM usage to match [d0c95808e50c](https://github.com/llvm/llvm-project/commit/d0c95808e50c) PiperOrigin-RevId: 337061504 Change-Id: Ifbfdbda59b77eeddcbde57b711274b35201a2dcb

commit sha 7a65acd4273040d6c324554a02c26c84fe44d7ea

Merge pull request #43980 from tensorflow:av8ramit-patch-1 PiperOrigin-RevId: 337061618 Change-Id: Ie32f14bd1b6b6e17fca09ac4f3c5592e200cdd24

commit sha 76c685252469e800aa4486c50f6a390f26b806a7

Disable remote_cluster_test on tsan as it's failing. PiperOrigin-RevId: 337061784 Change-Id: I7793a638079c49f3602050cff57973051dcde88d

commit sha 4cc4ff39a9a4b591d3239e3df2822498cd1d1578

Update download_and_extract script for faster CMSIS patching Change-Id: I7320b187877b57e43c9b53b86eb9ac792439fe48

commit sha a3205b8c9cabc77b0212d718066d8f42bc444617

Add correct scatter file values for STM32F4 target Change-Id: Iee3b278d7cfa98b348d0d908bf62a21fcdcf1ace

commit sha c538deb93e13a4583ea60234d1bf9c3edbf12b0c

Docstring update: clarify non-portability of a saved Keras Lambda layer. PiperOrigin-RevId: 337076364 Change-Id: Icade511cf6721b64b05720797d5ba980615ab461

commit sha 83050d565c99c4aa5eac91e917055d381c544dbf

Integrate LLVM at llvm/llvm-project@9b3c2a72e4cb Updates LLVM usage to match [9b3c2a72e4cb](https://github.com/llvm/llvm-project/commit/9b3c2a72e4cb) PiperOrigin-RevId: 337077072 Change-Id: I88a61c166112d4f40f9e3dc7aa4bb12d9bd5ef64

commit sha 0602ad74c389a6b040a28d35e4290bfd032603ea

Merge pull request #43935 from vnvo2409:gradients PiperOrigin-RevId: 337081150 Change-Id: I79311086235fca9be19d10926167653bf2d91a94

commit sha 261016b03461ab3cbd86ccd756c441c98daa1bf0

Include stm32f4 target Change-Id: I4c57b84a3ea735360a4bde89a906401121dbad0b

commit sha 6c44255bd402b4502c87831a9740cb7274f47bcb

fix host device tuple bug

push time in a day

pull request comment tensorflow/model-optimization

[Clustering] Support for clustering of a subclassed model.

Hi @alanchiao Yes, all issues mentioned in the doc are addressed. The clustering case is simpler, because we always cluster an already-trained model. The MNIST example demonstrates that we use the same clustering API - I just removed summary() and HDF5 save/load(), as they are not supported by subclassed models. Clustering MobileBert is on my TODO list. Thanks.
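For readers unfamiliar with weight clustering, here is a toy NumPy sketch of the underlying idea (nearest-centroid assignment); this is an illustration only, not the tfmot implementation:

```python
import numpy as np

def cluster_weights(weights, number_of_clusters, iters=20):
    """Toy k-means weight clustering: map every weight to its nearest centroid."""
    flat = weights.flatten()
    # Linear centroid initialization between min and max weight.
    centroids = np.linspace(flat.min(), flat.max(), number_of_clusters)
    for _ in range(iters):
        # Assign each weight to its nearest centroid (the "pulling indices").
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(number_of_clusters):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    # Replace each weight with its centroid, preserving the tensor shape.
    return centroids[idx].reshape(weights.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))
clustered = cluster_weights(w, number_of_clusters=4)
print(len(np.unique(clustered)))  # at most 4 unique values remain
```

After stripping the clustering wrapper, checking `len(np.unique(...))` against the requested number of clusters (as the tests in this PR do) verifies exactly this property.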

comment created time in 16 days

pull request comment tensorflow/model-optimization

[Clustering] Support for clustering of a subclassed model.

Hi @alanchiao Please take a look at this PR: it enables clustering for a subclassed model. When the whole model is passed, it is wrapped, so we have control over the 'build'/'call' functions of each layer (this is the wrapper). The approach has been tested on the subclassed model from this tutorial. Any comments/suggestions are much appreciated. Thanks!

comment created time in 16 days

Pull request review comment tensorflow/model-optimization

[Clustering] Support for clustering of a subclassed model.

 def testClusterFunctionalModelPreservesBuiltState(self):
     json.loads(clustered_model.to_json()))
     self.assertEqual(loaded_model.built, True)

+  @keras_parameterized.run_all_keras_modes

It looks like this block of tests has not been upstreamed; I can remove it from my PR, as it is not relevant.

comment created time in 16 days

Pull request review comment tensorflow/model-optimization

[Clustering] Support for clustering of a subclassed model.

 def clusters_check(stripped_model):
     self.end_to_end_testing(original_model, clusters_check)

+  @keras_parameterized.run_all_keras_modes(always_skip_v1=True)
+  def testEndToEndSubclassedModel(self):
+    """Test End to End clustering for the subclassed model.
+    In this test we pass the whole subclassed model for clustering.
+    We check that the number of weights is less the requested
+    number of clusters after stripping clustering wrapper.
+    """
+    subclassed_model = SubclassedModel()
+
+    clustered_model = cluster.cluster_weights(subclassed_model, **self.params)
+
+    clustered_model.compile(
+        loss=keras.losses.categorical_crossentropy,
+        optimizer="adam",
+        metrics=["accuracy"]
+    )
+
+    # The model should be trained a little bit.
+    clustered_model.fit(x=self.dataset_generator(), steps_per_epoch=1)
+    stripped_model = cluster.strip_clustering(clustered_model)
+
+    nr_unique_weights = len(np.unique(stripped_model.layers[0].\
+        trainable_weights[0].numpy().flatten()))
+    self.assertLessEqual(nr_unique_weights, self.params["number_of_clusters"])
+
+  @keras_parameterized.run_all_keras_modes(always_skip_v1=True)
+  def testEndToEndSubclassedModelTwoLayers(self):
+    """Test End to End clustering for the subclass model.
+
+    This test demonstrates another approach.
+    All layers that are present in the subclassed model
+    (see SubclassedModelTwoLayers definition above) are wrapped
+    manually. The model should be re-built in this case.
+
+    We need to strip clustering away manually as well (see how it is
+    done inside the test).
+
+    Clustering is working well and clusters are updated during
+    training."""
+    subclassed_model = SubclassedModelTwoLayers()
+    input_shape = (1, 5)
+
+    # We need to build the model
+    subclassed_model.build(input_shape=input_shape)
+
+    # Check that the number of weights is bigger than the number of clusters.
+    nr_unique_weights = len(np.unique(subclassed_model.layers[0].\
+        trainable_weights[0].numpy().flatten()))
+    self.assertGreater(nr_unique_weights, self.params["number_of_clusters"])
+    nr_unique_weights = len(np.unique(subclassed_model.layers[1].\
+        trainable_weights[0].numpy().flatten()))
+    self.assertGreater(nr_unique_weights, self.params["number_of_clusters"])
+
+    # Now we apply cluster_weights for each layer.
+    subclassed_model.dense_layer1 = cluster.cluster_weights(
+        subclassed_model.dense_layer1, **self.params)
+    subclassed_model.dense_layer2 = cluster.cluster_weights(
+        subclassed_model.dense_layer2, **self.params)
+
+    # We need to re-build the model again.
+    subclassed_model.build(input_shape=input_shape)
+
+    subclassed_model.compile(
+        loss=keras.losses.categorical_crossentropy,
+        optimizer="adam",
+        metrics=["accuracy"]
+    )
+
+    subclassed_model.fit(x=self.dataset_generator(), steps_per_epoch=1)
+
+    # We strip from layers that were wrapped.
+    subclassed_model.dense_layer1 = cluster.strip_clustering(subclassed_model.dense_layer1)
+    subclassed_model.dense_layer2 = cluster.strip_clustering(subclassed_model.dense_layer2)
+
+    # Checks that the number of unique values is less than the requested
+    # number of clusters.
+    nr_unique_weights = len(np.unique(subclassed_model.layers[0].\
+        trainable_weights[0].numpy().flatten()))
+    self.assertLessEqual(nr_unique_weights, self.params["number_of_clusters"])
+    nr_unique_weights = len(np.unique(subclassed_model.layers[1].\
+        trainable_weights[0].numpy().flatten()))
+    self.assertLessEqual(nr_unique_weights, self.params["number_of_clusters"])
+
+  @keras_parameterized.run_all_keras_modes(always_skip_v1=True)
+  def testEndToEndSubclassedModelAsDeepLayer(self):
+    """Test End to End clustering for the model with the layer as a subclass model."""
+    # This case is not supported currently.

This case will be enabled later, once the current approach is approved.

comment created time in 16 days

Pull request review comment tensorflow/model-optimization

[Clustering] Support for clustering of a subclassed model.

 def clusters_check(stripped_model):
     self.end_to_end_testing(original_model, clusters_check)

+  @keras_parameterized.run_all_keras_modes(always_skip_v1=True)
+  def testEndToEndSubclassedModel(self):
+    """Test End to End clustering for the subclassed model.
+    In this test we pass the whole subclassed model for clustering.
+    We check that the number of weights is less the requested
+    number of clusters after stripping clustering wrapper.
+    """
+    subclassed_model = SubclassedModel()
+
+    clustered_model = cluster.cluster_weights(subclassed_model, **self.params)
+
+    clustered_model.compile(
+        loss=keras.losses.categorical_crossentropy,
+        optimizer="adam",
+        metrics=["accuracy"]
+    )
+
+    # The model should be trained a little bit.
+    clustered_model.fit(x=self.dataset_generator(), steps_per_epoch=1)
+    stripped_model = cluster.strip_clustering(clustered_model)
+
+    nr_unique_weights = len(np.unique(stripped_model.layers[0].\
+        trainable_weights[0].numpy().flatten()))
+    self.assertLessEqual(nr_unique_weights, self.params["number_of_clusters"])
+
+  @keras_parameterized.run_all_keras_modes(always_skip_v1=True)
+  def testEndToEndSubclassedModelTwoLayers(self):

This test reproduces the approach tested here: https://github.com/tensorflow/model-optimization/pull/554

comment created time in 16 days

PR opened tensorflow/model-optimization

This PR adds support for clustering of a subclassed model. Added an example for clustering of a subclassed model taken from the tutorial: https://www.tensorflow.org/tutorials/quickstart/advanced

pr created time in 16 days

create branch wwwind/model-optimization

branch : clustering_subclassed_models

created branch time in 16 days

PR opened tensorflow/model-optimization

Small tidy up: KMEANS_PLUS_PLUS is now included in the test as well.

pr created time in 21 days

create branch wwwind/model-optimization

branch : clustering_tidy_up_test

created branch time in 21 days

Pull request review comment tensorflow/tensorflow

[TFLite 16x8] ADD/SUB operators: fixes + tests for versioning

 OperatorProperty GetOperatorProperty(const ModelT* model, int subgraph_index,
       property.inputs = {{0, {}}, {1, {}}};
       property.outputs = {{0, {}}};
       property.version = 2;
+      property.restrict_same_input_output_scale = true;

Thanks for spotting! Corrected.

comment created time in 23 days

push event wwwind/tensorflow

commit sha 902a60281163d68fbbd32b8d77dc3d8199bfb113

Addressed reviewer's comment. Change-Id: I0ddc0cd4db9604d3e84f5ce55ecbf8acc55c08d0

push time in 23 days

pull request comment tensorflow/model-optimization

Re-factoring of the clustering example

Hi @alanchiao,

Thanks for your comment.

It would be nice to have a uniform experience across the techniques: currently, if I install tfmot, I get an examples/ directory with only a quantization/ subdirectory containing scripts. It would be better to ship either none, or scripts for all techniques. If we include these scripts in the package, they should also be covered by CI so they don't go stale - which increases their maintenance cost.

As a developer, for debugging purposes I personally use the integration tests, as they are much faster. For experiments, as you pointed out, Jupyter notebooks are more convenient.

Re: discussion regarding examples in the tfmot package: @akarmi @Ruomei @psunn @benkli01

comment created time in 23 days

pull request comment tensorflow/model-optimization

[Clustering] Small tidy up - removed unused variables

Hi @alanchiao I will investigate. During debugging I have not seen them update, but I will re-test this.

comment created time in 23 days

pull request comment tensorflow/model-optimization

Re-factoring of the clustering example

@benkli01 Thanks! Sorry, I didn't explain myself clearly in my previous comment - I think it is worth adding model size metrics to the example, because they show the main purpose of clustering: model vs stripped_model. The numbers mentioned above are very good!
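As a rough illustration of why a size metric makes the point, here is a hedged sketch comparing gzipped sizes of a random weight tensor versus one restricted to 16 centroid values (synthetic data, not the example's real numbers):

```python
import gzip
import numpy as np

def gzipped_size(weights):
    """Serialized-then-gzipped size in bytes, a common proxy for deployed model size."""
    return len(gzip.compress(weights.tobytes()))

rng = np.random.default_rng(42)
dense = rng.normal(size=50_000).astype(np.float32)

# A "clustered" tensor with only 16 unique values compresses far better,
# because gzip can exploit the repeated 4-byte centroid patterns.
centroids = np.linspace(dense.min(), dense.max(), 16).astype(np.float32)
clustered = centroids[np.argmin(np.abs(dense[:, None] - centroids[None, :]), axis=1)]

print(gzipped_size(dense), gzipped_size(clustered))
```

Reporting exactly this kind of before/after number for model vs stripped_model is what the comment above suggests adding to the example.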

comment created time in a month

pull request comment tensorflow/model-optimization

Re-factoring of the clustering example

This example doesn't demonstrate the benefit of clustering: that the model size is reduced.

comment created time in a month

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for conv_activations

Hi @jdduke I can reproduce this error; it is due to the OS error "File name too long" that happens during Archive -> Unzip: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/testing/generated_examples_zip_test.cc#L280

It started happening after a recent sync with the master branch, when the new parameter 'dynamic_range_quantize=False' was added; together with 'quant_16x8' the name is just too long. Example of the file name:

conv_relu_forward-compat_channel_multiplier=2,constant_filter=True,data_format='NHWC',dilations=[1,1,1,1],dynamic_range_quantize=True,filter_shape=[3,3],fully_quantize=False,input_shape=[1,3,4,3],padding='VALID',quant_16x8=False,strides=[1,2,3,1]_tests.txt

There are three ways to solve this:

- replace some parameter names with shorter versions - I don't know which ones I should shorten
- split the arrays of dilations, input_shapes and strides into one per test/file
- implement a special check and fallback code in generated_examples_zip_test.cc

Could you please advise how I should go forward with this PR? Thanks

comment created time in a month

push event wwwind/tensorflow

commit sha a2b53623a2c0f81c32b02e2d4b3bac153daaacf9

Addressed reviewer's comments. Change-Id: I798bf7919b6a268a4631984ed07a242943ca0b72

push time in a month

pull request comment tensorflow/model-optimization

Clustering for models with deep layers.

Hi @alanchiao If you have a minute, please take a look at this PR - I have addressed your comment by adding a check. Thanks!

comment created time in a month

PR closed tensorflow/model-optimization

This PR is a small improvement to performance: we skip a layer if it has the number of unique weights less than the requested number of clusters. Test added.

pr closed time in a month

pull request comment tensorflow/model-optimization

Hi @alanchiao Thanks for the review! Case 2a is absolutely valid and I missed it - the problem could be resolved by removing "unique" from the condition, as you pointed out in 2b. But since, as you noted in case 1, this PR introduces an inconsistency in behaviour between the different techniques and can confuse the user, I will close it. The performance benefit of this change is not significant.

comment created time in a month

Pull request review comment tensorflow/model-optimization

 def testStripSelectivelyClusteredSequentialModel(self):
     self.assertEqual(self._count_clustered_layers(stripped_model), 0)
     self.assertIsInstance(stripped_model.layers[0], layers.Dense)

+  @keras_parameterized.run_all_keras_modes
+  def testClusteringModelWithTrainableParamsLessNumberOfClusters(self):
+    """
+    Verifies that we skip a layer with the number of trainable parameters
+    less than the requested number of clusters.
+    """
+    model = keras.Sequential([
+        layers.Dense(2),
+        layers.Dense(1),
+        layers.Dense(4),
+        layers.Dense(3)  # 12 weights in kernel:0
+    ])
+    model.build(input_shape=(2, 1))
+
+    clustered_model = cluster.cluster_weights(model, **self.params)
+    self.assertTrue(not isinstance(clustered_model.layers[0], cluster_wrapper.ClusterWeights))
+    self.assertTrue(not isinstance(clustered_model.layers[1], cluster_wrapper.ClusterWeights))
+    self.assertTrue(not isinstance(clustered_model.layers[2], cluster_wrapper.ClusterWeights))
+    self.assertTrue(isinstance(clustered_model.layers[3], cluster_wrapper.ClusterWeights))
+
+    stripped_model = cluster.strip_clustering(clustered_model)
+
+    self.assertEqual(self._count_clustered_layers(stripped_model), 0)
+    self.assertEqual(model.get_config(), stripped_model.get_config())

Hi @akarmi Sorry, I don't understand which check to add. We don't properly train in this test: we just check that we don't wrap layers for clustering if their number of unique weights is smaller than the requested number of clusters - so three layers are skipped and the last one is wrapped. Then we can run strip_clustering and remove what was added.
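The skip condition described here can be sketched as follows (a simplified stand-in for the real wrapper logic, with hypothetical helper names):

```python
import numpy as np

def should_cluster(kernel, number_of_clusters):
    """Skip clustering when a layer has fewer unique weights than requested clusters."""
    nr_unique = len(np.unique(kernel.flatten()))
    return nr_unique >= number_of_clusters

small = np.arange(4.0).reshape(2, 2)    # 4 unique weights -> skipped for 8 clusters
large = np.arange(12.0).reshape(3, 4)   # 12 unique weights -> wrapped for 8 clusters
print(should_cluster(small, 8), should_cluster(large, 8))  # False True
```

This mirrors the test above: the Dense layers with 2, 1 and 4 weights are skipped, while the last Dense layer with 12 kernel weights is wrapped.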

comment created time in a month

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for conv_activations

Sure, I will check these failures!

comment created time in a month

PR opened tensorflow/model-optimization

Small tidy up: removed variables that are not set and not used - some leftovers.

pr created time in a month

create branch wwwind/model-optimization

branch : clustering_tidy_up_trainable_weights

created branch time in a month

push event wwwind/model-optimization

commit sha 39089cee94922c529968d78963ceb776a906363d

Replaced SubClass with subclassed. Change-Id: Ibf43764cbf4024890b68a2550ae44c6e52aade31

push time in a month

pull request comment tensorflow/model-optimization

Clustering for models with deep layers.

I added the mentioned check for whether the model is a subclassed model, plus a test for this case. @alanchiao @akarmi Please take a look and approve this PR if everything is ok. Thanks!

comment created time in a month

push event wwwind/model-optimization

commit sha 6c9941e5ea846fb0ee82c40fa421b3e8e5491813

Added a check before clone_model when we copy layers: if a layer is a subclassed model, we throw an exception. This PR addresses the reviewer's comment. Change-Id: I0bd72324fe60da7eda3d3c440c68d1797beecd6c

push time in a month

PR opened tensorflow/model-optimization

This PR is a small improvement to performance: we skip a layer if it has the number of unique weights less than the requested number of clusters. Test added.

pr created time in a month

create branch wwwind/model-optimization

branch : clustering_skip_small_layers

created branch time in a month

push event wwwind/tensorflow

commit sha f6d870d8f3990e98d5453ed641210b41ec1697a4

Small fix in the model number. Change-Id: I894fa91f544612d0e34ef2ad38c56c3f73445221

push time in a month

push event wwwind/tensorflow

commit sha 4cd4a57c3b844bc01e92ca7843010b3581771413

Move hlo_metrics_db to OpStats PiperOrigin-RevId: 330028129 Change-Id: I924a828087528633feebdcdfcba1021494ef69f4

commit sha a982edc0045d5c0e7cc08c2ff9d87d3e4fe6fce1

Added comparison operations to OpenCL backend. PiperOrigin-RevId: 330028303 Change-Id: I320ebb451c508c639479a6729470a9cd883df882

commit sha 78ffc027e97bae384a16c8b7c14b77a98a075985

[SE] Include absl/memory/memory.h

commit sha 7f2e2bb276791fbdbb70593f41a0ea86e44501d1

Fix server address assignment when constructing TpuPodState. PiperOrigin-RevId: 330032670 Change-Id: I686cb5d2ba665480acfc16f06ac54228e5b1177b

commit sha 6a00b43334d9c7de2b8746927a155be45df42e3a

Added BatchedMatMul parser for model reader. Added OpenCL transformation to execute Batched mat mul with batch = 1. PiperOrigin-RevId: 330033938 Change-Id: Ieabbe49a063373efe4d6d1647cdd8a100db1c38c

commit sha bd69906ff9c4d4abd1b7656d7e129a20e25f7be2

Add a MeanStddevNormalization test case with large vectors for three implementations: float, experimental/FP16 and GPU delegate. The GPU delegate version didn't support non-power-of-two vector, fix it. (Also add some comments.) PiperOrigin-RevId: 330038282 Change-Id: I380ce7276e42f41e54cdddfa35ad38421da89b15

commit sha 0ab43a1d18fc545767d85195bfc4de4a1c761b9b

Implement V2->v1 conversion for TransformTensorBilinear operation. PiperOrigin-RevId: 330038731 Change-Id: I2011e17a3935c27f4cd7bd1f3d7a1dca6a0fc6c3

commit sha 1dbe0671a05257244fd9eae5701092c24540d872

[XLA] fix bug in conditional_code_motion.cc by checking whether an instruction is dead before removing it. The bug has to do with instructions in alternative branches (other than branch(0)) of a conditional may be placed into boundaries to move out multiple times, if they happen to be identical to those in branch(0) and are shared multiple times (while those in branch(0) are not shared). The fix tries to avoid deleting them if they are already deleted, or if they still have uses inside their conditional branch. PiperOrigin-RevId: 330045927 Change-Id: I7a786eaa77085dd65609cc8639019874140474c0

commit sha 312c11c4ecdc2c466895fc3be0a16d60c692422e

Move hlo_metrics_db to OpStats PiperOrigin-RevId: 330047117 Change-Id: I5f1ee035217f489f25c97c7ccf6a77fcf1115e13

commit sha f66d1cb1d4a9e3b37059728243c03f1e7d4a6412

Integrate LLVM at llvm/llvm-project@2dd9a4d855f6 Updates LLVM usage to match [2dd9a4d855f6](https://github.com/llvm/llvm-project/commit/2dd9a4d855f6) PiperOrigin-RevId: 330048272 Change-Id: Id0723a12ba327ba3945e2b927c3a33f98d2bb208

commit sha 0f1b6731b9369450ef5a6c2ad850d0fae788e92b

Add verifications on the output type for slice. - Checks the output rank if input rank is known. - Checks the output shape if sizes can be calculated. PiperOrigin-RevId: 330050644 Change-Id: Ifb1e1c404c85d3f19144f78fd5e99736f2e06ed1

commit sha 83c9227a4b8faa502d34e70e1e6752123f2adb93

Remove redundant lines to make dockerfiles - remove exact identical lines in spec.yml which used to make dockerfiles Signed-off-by: anencore94 <anencore94@kaist.ac.kr>

commit sha 19d13e8b412faa4d940f0213f058e2d79f8411c6

bash syntax change

commit sha 340a16ac5466dbe6c49ca1f6ca686594c9e8aecf

Some style changes to splitV. Changed tests to use a single function template.

commit sha e0800968d9971de780b7f9837a2e02fb2a89b087

Specify the optimization level in a variable. Make it possible to specify the optimization level from the command line and make it the same regardless of the BUILD_TYPE.

commit sha aa34a55f9e001fca85814276c0a71cdbbc9323cf

Update GraphDef version to 514. PiperOrigin-RevId: 330070025 Change-Id: Ic2251524782139d7affd548ded29a7fd60ad4e1e

commit sha cd035167ca4a985b43db3e15fb98baf63ea02aae

compat: Update forward compatibility horizon to 2020-09-04 PiperOrigin-RevId: 330070027 Change-Id: I29dc83b8eac290bf7759cc856e78ecca4b8d1eb0

commit sha c1348607d126e9cb3ea8e226f640758152743f6a

Fix uint8 MUL operator with broadcast The MulSimpleBroadcast function must use MultiplyByQuantizedMultiplier instead of MultiplyByQuantizedMultiplierSmallerThanOneExp as the MUL operator doesn't guarantee that the quantized multiplier is smaller than 1.

commit sha 4cb07aab95da1f8e4113a546bc7186181d16bdf5

Fix issue with return value of evaluate() in models that add custom metrics via overriding train_step. PiperOrigin-RevId: 330111888 Change-Id: If78cfdc5754c362b5ada683f7c23e3ef019ee1fb

commit sha 247e34e920ace90ffc08e9a7d68b47895d0734d8

Improve VLOGS for multi-device function execution in ProcessFLR. PiperOrigin-RevId: 330117207 Change-Id: I216d9013bff3fcf8ab72fee81799693483777a78

push time in a month

PR opened tensorflow/model-optimization

In this PR we enable clustering for models with sub-models: all deep layers that are supported for clustering are clustered. Added such cases to the integration tests.

pr created time in a month

create branch wwwind/model-optimization

branch : clustering_deep_layers

created branch time in a month

Pull request review comment tensorflow/model-optimization

Add support for tf.distribute after enabling the update of cluster indices

 def build(self, input_shape):
         shape=pulling_indices.shape,
         dtype=tf.int32,
         trainable=False,
+        synchronization=tf.VariableSynchronization.ON_READ,
+        aggregation=tf.VariableAggregation.ONLY_FIRST_REPLICA,
         initializer=initializers.Constant(
             value=k.batch_get_value([pulling_indices])[0]
         )
     )

     # We store these pairs to easily update this variables later on
-    self.clustered_vars.append((weight_name, weight))
+    self.ori_weights_vars_tf[weight_name] = self.add_weight(
+        'ori_weights_vars_tf',
+        shape=weight.shape,
+        dtype=weight.dtype,
+        trainable=True,
+        initializer=initializers.Constant(
+            value=k.batch_get_value([weight])[0]
+        )
+    )

     # We use currying here to get an updater which can be triggered at any time
     # in future and it would return the latest version of clustered weights
     def get_updater(for_weight_name):
       def fn():
-        return self.clustering_impl[for_weight_name].get_clustered_weight(
-            self.pulling_indices_tf[for_weight_name]
-        )
+        # Get the clustered weights
+        pulling_indices = self.pulling_indices_tf[for_weight_name]
+        clustered_weights = self.clustering_impl[for_weight_name].\
+            get_clustered_weight(pulling_indices)
+        return clustered_weights
       return fn

     # This will allow us to restore the order of weights later
     # This loop stores pairs of weight names and how to restore them
-    for ct, weight in enumerate(self.layer.weights):
       name = self._weight_name(weight.name)
-      full_name = self.layer.name + "/" + name
+      full_name = '{}{}{}'.format(self.layer.name, '/', name)

The character '/' could also occur inside the string itself.
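A small sketch of the concern raised above: if the weight name itself may contain '/', naive splitting on the separator is fragile, so only the first '/' should be treated as the layer/weight boundary (names here are hypothetical):

```python
def strip_layer_prefix(full_name):
    """Split '<layer_name>/<weight_name>' at the FIRST '/' only,
    so any '/' inside the weight name survives intact."""
    layer_name, _, weight_name = full_name.partition('/')
    return layer_name, weight_name

print(strip_layer_prefix('dense_1/kernel:0'))       # ('dense_1', 'kernel:0')
print(strip_layer_prefix('dense_1/cell/kernel:0'))  # ('dense_1', 'cell/kernel:0')
```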

comment created time in 2 months

pull request comment tensorflow/tensorflow

[TFLite 16x8] Fixes for TANH and LOGISTIC

Hi @gbaned Thanks for the reminder! I have replied.

comment created time in 2 months

Pull request review comment tensorflow/tensorflow

[TFLite 16x8] Fixes for TANH and LOGISTIC

 TfLiteStatus TanhPrepare(TfLiteContext* context, TfLiteNode* node) {
       (data->input_left_shift == 0 || data->input_left_shift == 1);
   if (!param_scale_pot) {
-    // In case of general scale parameter, we need to do a rescaling.
-    // Magic constant 4096:
-    // We need to scale down to (-2^3, 2^3) / 3 is kInputIntegerBits/ interval
-    // from 16-bit (-2^15, 2^15),
-    // so we need to multiply by
-    // 2^(15 - kInputIntegerBits) = 2^12 = 4096.
-    data->input_multiplier = static_cast<int32_t>(input->params.scale * 4096);
+    // Calculate multiplier to change input scale to 1/(3*4096)

Hi @renjie-liu Sorry for the delay with the reply. I missed the comment.

The initial implementation was updated to support a general input scale, which introduced the 1/4096 scaling factor (from 16-bit down to [-8, 8]); however, that change only allows integer multiples of the 1/4096 factor. Here we do data->input_multiplier = static_cast<int32_t>(input->params.scale * 4096); and then int32_t input_data_mul = (input_multiplier > 0) ? input_multiplier : 1; so the input scale had to be a multiple of 1/4096.

This fix handles the general case. The 3.0 in the multiplier comes from the interval being [-10.7, 10.7] instead of [-8, 8]: https://github.com/tensorflow/tensorflow/pull/42671/files#diff-7ae159b53f418105dff8194481058709R66 With this change, the numbers in activations_test.cc are identical to the values from the calculator.
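To make the arithmetic above concrete, a small Python sketch (illustrative values, not the TFLite kernel itself):

```python
# The int16 input covers (-2**15, 2**15); the reference tanh implementation
# works on roughly (-2**3, 2**3), so the rescale factor is 2**(15 - 3) = 4096.
scale_factor = 2 ** (15 - 3)

# The old code truncated the multiplier to an integer, so only input scales
# that are integer multiples of 1/4096 survived the conversion exactly:
input_scale = 1.5 / 4096              # hypothetical non-integer multiple
old_multiplier = int(input_scale * 4096)
print(scale_factor, old_multiplier)   # 4096 1  (the 0.5 is lost)
```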

comment created time in 2 months

pull request comment tensorflow/tensorflow

[TFLite 16x8, documentation] Added section on 16x8 quantization scheme to model-optimization.

Hi @jdduke Corrected, could you please re-approve? Thanks.

comment created time in 2 months

push event wwwind/tensorflow

commit sha 26bf5ac3e0d93a8fbe9e607d723790e708ea2741

Addressed reviewer's comments. Change-Id: I27fc31bbc32597d5e8f0e50fc78d2f5602dfb9d0

push time in 2 months

Pull request review comment tensorflow/tensorflow

[TFLite 16x8, documentation] Added section on 16x8 quantization scheme to model-optimization.

the numbers here:

 </figcaption>
 </figure>

+### Full integer quantization with int16 activations and int8 weights
+
+[Quantization with int16 activations](https://www.tensorflow.org/model_optimization/guide/quantization/post_training) is a full integer quantization scheme with activations in int16 and weights in int8. This mode can improve accuracy of the quantized model in comparison to the full integer quantization scheme with both activations and weights in int8 keeping a similar model size. It is recommended when activations are sensitive to the quantization.
+
+Currently only non-optimized reference kernel implementation is available in TFLite so that by default the performance will be slow compared to int8 kernels. Full advantages of this mode can currently be accessed via specialised hardware, or custom software.
+
+Below are the accuracy results for some models that benefit from this mode.
+<figure>
+ <table>
+  <tr>
+   <th>Model</th>
+   <th>Accuracy metric type</th>
+   <th>Accuracy (float32 activations)</th>
+   <th>Accuracy (int8 activations)</th>
+   <th>Accuracy (int16 activations)</th>
+  </tr>
+  <tr><td>Wav2letter</td><td>WER</td><td>6.7%</td><td>7.7%</td><td>7.2%</td></tr>
+  <tr><td>DeepSpeech 0.51 (unrolled)</td><td>CER</td><td>6.13%</td><td>43.67%</td><td>6.52%</td></tr>
+  <tr><td>YoloV3</td><td>mAP(IOU=0.5)</td><td>0.577</td><td>0.563</td><td>0.574</td></tr>
+  <tr><td>MobileNetV1</td><td>Top-1 Accuracy</td><td>0.7062</td><td>0.694</td><td>0.6936</td></tr>
+  <tr><td>MobileNetV2</td><td>Top-1 Accuracy</td><td>0.718</td><td>0.7126</td><td>0.7137</td></tr>
+  <tr><td>MobileBert</td><td>F1(Exact match)</td><td>88.81(81.23)</td><td>2.08(0)</td>

identical to this one: https://github.com/google-research/google-research/tree/master/mobilebert

comment created time in 2 months

push event wwwind/model-optimization

commit sha 33a803013e386004f3c12d037c9c968caec43c91

Improved the test for non_clusterable_layer that demonstrates that the layer can be clusterable if weights are not allocated. Change-Id: If26bf41355c380dc564aff5e2309dc502abe91f2

push time in 2 months

PR opened tensorflow/model-optimization

In this PR we simplify the clustering registry: if a layer does not have weights, it is enabled for clustering automatically. Examples of such layers: Reshape, Pooling, Maximum/Minimum. This PR also addresses the problem with TensorFlowOpLayer from this PR.

Added a test for the DepthwiseConv2D layer as well: we don't cluster it, as the accuracy loss is big and can't be recovered during re-training.
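The registry rule described above can be sketched with toy stand-ins (these classes are illustrative only, not the actual tfmot registry):

```python
import numpy as np

class FakeLayer:
    """Minimal stand-in for a Keras layer exposing only its weights list."""
    def __init__(self, weights):
        self.weights = weights

def clusterable_by_default(layer):
    """A layer with no weights has nothing to cluster, so it can pass
    through clustering automatically without a registry entry."""
    return len(layer.weights) == 0

reshape_like = FakeLayer([])                 # e.g. Reshape / Pooling / Maximum
dense_like = FakeLayer([np.zeros((3, 4))])   # has a kernel to cluster
print(clusterable_by_default(reshape_like), clusterable_by_default(dense_like))
```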

pr created time in 2 months

create branch wwwind/model-optimization

branch : clustering_registry_improvement

created branch time in 2 months

Pull request review comment tensorflow/tensorflow

[TFLite 16x8, documentation] Added section on 16x8 quantization scheme to model-optimization.

the numbers here: </figcaption> </figure>
+### Full integer quantization with int16 activations and int8 weights
+
+[Quantization with int16 activations](https://www.tensorflow.org/model_optimization/guide/quantization/post_training) is a full integer quantization scheme with activations in int16 and weights in int8. This mode can improve the accuracy of the quantized model compared to the full integer quantization scheme with both activations and weights in int8, while keeping a similar model size. It is recommended when activations are sensitive to quantization.

Hi @jianlijianli I added an explanation regarding performance. Could you please take a look and re-approve if it is ok? Thanks

comment created time in 2 months

push event wwwind/tensorflow

commit sha 354111f60d754bef196406f8690200e5e34bf39a

Added clarification regarding to expected performance of 16x8. Change-Id: I9b6d2cc6e93e45a47539667d12c8e50f8a25f595

push time in 2 months

pull request comment tensorflow/tensorflow

[TFLite 16x8] Notebook for 16x8 post-training quantization.

Hi @khanhlvg Thanks for the review! I updated the notebook as suggested. Could you please re-approve ?

comment created time in 2 months

push event wwwind/tensorflow

commit sha 06ba4be784174a174b9e67391cbfd14d6e67ac47

Addressed reviewer's comments. Change-Id: I5066e5a4cb0a4709a2f2b37eb2a5d34a3adf49f9

push time in 2 months

push event wwwind/tensorflow

commit sha 18b9a48d679a97758d5108b0c723bf651aa4d40a

Use same context between maybe_define_function and define_function_with_shape_relaxation

commit sha 317f67e9f9b9657d315879e153ed655e982f96bc

Update tf._TPUCompileMlir creation to always create a tensor<2x!tf.string> typed output for its program outputs, matching its shape function (NFC). PiperOrigin-RevId: 327750291 Change-Id: I22a83a3775830d1c6e011a00ad2860fcaace1c26

commit sha 4b9f9d1bf1f49ca385594d8deeeb22d76a548b1f

Early return for nest flatten when the input is None PiperOrigin-RevId: 327752828 Change-Id: I6d60c87daea1df08a8515421a85e8583c4fd2eb8

commit sha 4c73899b3fdb7aa3a3d84d0f6d11c0a23bcae1a1

Create BUILD files and corresponding targets for `tensorflow/core/ops/compat/ops_history_v*`. PiperOrigin-RevId: 327752941 Change-Id: I4ea5c7b882fd3db9f9962e7a85ef79a63f4989fc

commit sha 2dae6b39672362ad86da11754feba0b02af9d2bc

Disable flaky test: //third_party/tensorflow/python/keras/distribute:multi_worker_tutorial_test PiperOrigin-RevId: 327753644 Change-Id: I36498f8677e890cb3146f7f2bf43aa0029bdaf99

commit sha 8a002f2269310f513d326eaeff5c73f679c73f77

Update TFLite Converter API Updates doc PiperOrigin-RevId: 327757194 Change-Id: Ice1e28ebe174f37020992b763e34cecb13444a55

commit sha e918c5c7ea3ab71f6cc0a48395bfb445c71c27d7

[XLA] Fix issue in conditional code motion regarding sharing of computations in conditionals and cleanup generated code. The branch computations inside a conditional may be shared among different Hlo instructions, e.g., different conditionals. When moving instructions across the boundaries of two computations, specifically the branch computations and the parent of a conditional, we must make sure the branch computations being modified are not shared --- if shared, they must be cloned first before being modified. The transformation code and the cost calculation for moving instructions inside branches are also modified to produce cleaner result and to refrain from modifying a conditional back and forth. The original implementation for moving instructions inside branches merely extends the old roots of the branches with new instructions. The improved transformation now folds the tuple/getTupleElement instructions in the branches to eliminate unnecessary tuple/getTupleElement pairs. PiperOrigin-RevId: 327764642 Change-Id: Ia7d7fda3f6e8d8d9af6e091f92a94946af096a7e

commit sha f74cc7a696c66db173b321d51fb05f032652a6c7

Use MPR for fault tolerance test PiperOrigin-RevId: 327766188 Change-Id: I247539f5561940a29fef658818b1e815dd194c1d

commit sha 4d022d6e2cbc924fbff1ffa8c1d98383c4ecaeae

Enable conv + (bias/bn) + leakyrelu fusion

commit sha 6aabcb1923fce33dd3cd17ab6875f5d2ec9d49b1

Update remapper tests for copy leakyrelu alpha

commit sha 62db9a72f1d5d980ee6c8f8f9c69510b8220c5d5

Fix missing activation in conv bn leakyrelu copyattr

commit sha fb36d7d5200d371e17159091d21d8de340bb189f

Integrate LLVM at llvm/llvm-project@a54eb9b7c509 Updates LLVM usage to match [a54eb9b7c509](https://github.com/llvm/llvm-project/commit/a54eb9b7c509) PiperOrigin-RevId: 327772531 Change-Id: I75d50abc1b22a9bf67ba916b70f3ee59a2381868

commit sha f22fa8a28b8d172b8983d2bef2bb701ba59ae5d8

Update GraphDef version to 500. PiperOrigin-RevId: 327776829 Change-Id: I7bffa531a9c5158bf808ab255412a026e32e5d9d

commit sha 33f55fcccb29aa01d08b6ac9aecc9a0504562e80

compat: Update forward compatibility horizon to 2020-08-21 PiperOrigin-RevId: 327776834 Change-Id: I82f5373d84ce5474a57ab9d853a4579519b57096

commit sha 51caa173eeee32ae6346320d6ff479df0d020ece

Fixed merge typo

commit sha 88e357042ea79e0c8a4b906f76494ab306754c7d

Integrate LLVM at llvm/llvm-project@e1cd7cac8a36 Updates LLVM usage to match [e1cd7cac8a36](https://github.com/llvm/llvm-project/commit/e1cd7cac8a36) PiperOrigin-RevId: 327792209 Change-Id: I3d6b883cfbe467d5d588bbc3d6cd121a118efbd5

commit sha e0bb74087f440394f6df00c5c7ff36e50d23132a

Make Grappler also ignore functions transitively called by XlaLaunch ops

commit sha 6a35d0baed07a8599174eba800cd2e13a54dd860

Integrate LLVM at llvm/llvm-project@3f7985e6ec21 Updates LLVM usage to match [3f7985e6ec21](https://github.com/llvm/llvm-project/commit/3f7985e6ec21) PiperOrigin-RevId: 327805023 Change-Id: Ie359643871d7a44b257b2c07273c98d9fa558515

commit sha 3af35558779ed6d7e3ccc0ed69302cdb51b4b03f

Integrate LLVM at llvm/llvm-project@c1dd5df4255c Updates LLVM usage to match [c1dd5df4255c](https://github.com/llvm/llvm-project/commit/c1dd5df4255c) PiperOrigin-RevId: 327818600 Change-Id: I615bd546ba2d743453050fcc7b16cd88ed328fb8

commit sha 8924394e1715db2f696c867c8f7006e87403082c

[XLA:SPMD] Support partial replicate to parital replicate resharding. PiperOrigin-RevId: 327824687 Change-Id: I7a5a12dacb14f00483c0beb29793914a0b9cc5f2

push time in 2 months

push event wwwind/tensorflow

commit sha 1622cca6dc0c48bdad0e909d729a67e935636321

- Added checks that zero point iz zero for ADD/SUB. - POT int16x8: create a new BroadcastSub16POTSlow function to manage the POT scaling. - General int16x8: the BroadcastAdd4DSlow should be used instead of BroadcastSubSlow as the sign of input2 multiplier is changed in PrepareGeneralSubOp. Change-Id: Id8042d089af51f402cba72b1db9bb5d948ba5cbc

push time in 2 months

Pull request review comment tensorflow/model-optimization

Simplify the clustering example

     'Output directory to hold tensorboard events')
 
-def build_sequential_model(input_shape):
-  return tf.keras.Sequential([
-      l.Conv2D(
-          32, 5, padding='same', activation='relu', input_shape=input_shape),
-      l.MaxPooling2D((2, 2), (2, 2), padding='same'),
-      l.BatchNormalization(),
-      l.Conv2D(64, 5, padding='same', activation='relu'),
-      l.MaxPooling2D((2, 2), (2, 2), padding='same'),
-      l.Flatten(),
-      l.Dense(1024, activation='relu'),
-      l.Dropout(0.4),
-      l.Dense(num_classes, activation='softmax')
-  ])
+def load_mnist_dataset():
+  mnist = keras.datasets.mnist
+  (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
 
+  # Normalize the input image so that each pixel value is between 0 to 1.
+  train_images = train_images / 255.0
+  test_images = test_images / 255.0
 
-def build_functional_model(input_shape):
-  inp = tf.keras.Input(shape=input_shape)
-  x = l.Conv2D(32, 5, padding='same', activation='relu')(inp)
-  x = l.MaxPooling2D((2, 2), (2, 2), padding='same')(x)
-  x = l.BatchNormalization()(x)
-  x = l.Conv2D(64, 5, padding='same', activation='relu')(x)
-  x = l.MaxPooling2D((2, 2), (2, 2), padding='same')(x)
-  x = l.Flatten()(x)
-  x = l.Dense(1024, activation='relu')(x)
-  x = l.Dropout(0.4)(x)
-  out = l.Dense(num_classes, activation='softmax')(x)
-
-  return tf.keras.models.Model([inp], [out])
-
-def train_and_save(models, x_train, y_train, x_test, y_test):
-  for model in models:
-    model.compile(
-        loss=tf.keras.losses.categorical_crossentropy,
-        optimizer='adam',
-        metrics=['accuracy'])
-
-    # Print the model summary.
-    model.summary()
-
-    # Model needs to be clustered after initial training
-    # and having achieved good accuracy
-    model.fit(
-        x_train,
-        y_train,
-        batch_size=batch_size,
-        epochs=epochs,
-        verbose=1,
-        validation_data=(x_test, y_test))
-    score = model.evaluate(x_test, y_test, verbose=0)
-    print('Test loss:', score[0])
-    print('Test accuracy:', score[1])
-
-    print('Clustering model')
-
-    clustering_params = {
-        'number_of_clusters': 8,
-        'cluster_centroids_init': cluster_config.CentroidInitialization.DENSITY_BASED
-    }
-
-    # Cluster model
-    clustered_model = cluster.cluster_weights(model, **clustering_params)
-
-    # Use smaller learning rate for fine-tuning
-    # clustered model
-    opt = tf.keras.optimizers.Adam(learning_rate=1e-5)
-
-    clustered_model.compile(
-        loss=tf.keras.losses.categorical_crossentropy,
-        optimizer=opt,
-        metrics=['accuracy'])
+  return (train_images, train_labels), (test_images, test_labels)
 
-    # Fine-tune model
-    clustered_model.fit(
-        x_train,
-        y_train,
-        batch_size=batch_size,
-        epochs=epochs_fine_tuning,
-        verbose=1,
-        validation_data=(x_test, y_test))
 
-    score = clustered_model.evaluate(x_test, y_test, verbose=0)
-    print('Clustered Model Test loss:', score[0])
-    print('Clustered Model Test accuracy:', score[1])
+def build_sequential_model():
+  "Define the model architecture."
 
-    #Ensure accuracy persists after stripping the model
-    stripped_model = cluster.strip_clustering(clustered_model)
+  return keras.Sequential([
+      keras.layers.InputLayer(input_shape=(28, 28)),
+      keras.layers.Reshape(target_shape=(28, 28, 1)),
+      keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
+      keras.layers.MaxPooling2D(pool_size=(2, 2)),
+      keras.layers.Flatten(),
+      keras.layers.Dense(10)
+  ])
 
-    stripped_model.compile(
-        loss=tf.keras.losses.categorical_crossentropy,
+
+def train_model(model, x_train, y_train, x_test, y_test):
+  model.compile(
+      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
+      optimizer='adam',
+      metrics=['accuracy'])
+
+  # Print the model summary.
+  model.summary()
+
+  # Model needs to be clustered after initial training
+  # and having achieved good accuracy
+  model.fit(
+      x_train,
+      y_train,
+      batch_size=batch_size,
+      epochs=epochs,
+      verbose=1,
+      validation_split=0.1)
+
+  score = model.evaluate(x_test, y_test, verbose=0)
+  print('Test loss:', score[0])
+  print('Test accuracy:', score[1])
+
+  return model
+
+
+def cluster_model(model, x_train, y_train, x_test, y_test):
+  print('Clustering model')
+
+  clustering_params = {
+      'number_of_clusters': 8,
+      'cluster_centroids_init': cluster_config.CentroidInitialization.DENSITY_BASED
+  }
+
+  # Cluster model
+  clustered_model = cluster.cluster_weights(model, **clustering_params)
+
+  # Use smaller learning rate for fine-tuning
+  # clustered model
+  opt = tf.keras.optimizers.Adam(learning_rate=1e-5)

I've got better accuracy with learning rate 1e-3: 0.9724 (1e-5) vs. 0.9786 (1e-3). Could be noise, but it trains faster with 1e-3.

comment created time in 2 months

PR opened tensorflow/tensorflow

In this PR the section on the 16x8 quantization scheme is added to the model-optimization overview document.

pr created time in 2 months

PR opened tensorflow/tensorflow

This PR provides fixes for Tanh/Logistic in the int16 case. The previous implementation of Tanh/Logistic allowed only integer multiples of the 1/4096 scaling factor; it has been changed to handle the more general case. Another fixed issue is that the Logistic code had no overflow check on the table lookup. Tests are improved as well.

pr created time in 2 months
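The overflow issue the PR above describes is the classic hazard of table-lookup activations: an input outside the covered range can index past the end of the table. Below is a hedged sketch of a lookup-table tanh with an explicit clamp; the table size, input range, and rounding here are invented for the example and do not mirror TFLite's int16 kernel:

```python
import math

TABLE_SIZE = 256
INPUT_RANGE = 8.0  # the table covers tanh over [-8, 8]
TABLE = [math.tanh(-INPUT_RANGE + i * (2 * INPUT_RANGE) / (TABLE_SIZE - 1))
         for i in range(TABLE_SIZE)]

def tanh_lut(x):
    # Map x to a table index, then clamp it so inputs outside the covered
    # range cannot read past the end of the table (the overflow check).
    idx = round((x + INPUT_RANGE) * (TABLE_SIZE - 1) / (2 * INPUT_RANGE))
    idx = max(0, min(TABLE_SIZE - 1, idx))
    return TABLE[idx]

assert abs(tanh_lut(0.5) - math.tanh(0.5)) < 0.05
assert tanh_lut(100.0) == TABLE[-1]   # clamped instead of an out-of-range read
assert tanh_lut(-100.0) == TABLE[0]
```

A real kernel would also interpolate between table entries to reduce the step error, which is where the scaling-factor restriction mentioned above comes in.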

push event wwwind/model-optimization

commit sha 307ebd834b7a6859af7c938a1be8dd319b8738cf

Fix to make the test more stable. Change-Id: Id9edf30a211cdace462737e2f390d6d697ec458c

push time in 2 months

push event wwwind/model-optimization

commit sha 7dc76a4b409dd4cb78c2550080957d852b40143a

Addressed reviewer's comments. Change-Id: Ie5395e5cafc3544f909294650ff1b8de4c6153f7

push time in 2 months

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for leaky_relu

Hi @jdduke , @gbaned Could you please re-approve this PR ? This failure has been fixed.

comment created time in 2 months

push event wwwind/tensorflow

commit sha 985adde3214ae97a6d7b1a37f630a21edcc8c67e

Fix test failure. Change-Id: Id7a7c31fca509507f7098097da8abc793506f833

push time in 2 months

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for leaky_relu

Hi @gbaned Thanks! I am looking at these failures.

comment created time in 2 months

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for conv_activations

Hi @rthadur ! I have removed TODO comment. Could you please re-approve ? Thanks!

comment created time in 2 months

push event wwwind/tensorflow

commit sha 687718d57af5b83415ad6417dba496f4011f74f1

Addressed reviewer's comments. Change-Id: I04a0580c3b794fd674d7ae17a71ffa56495e5e28

push time in 2 months

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for conv_activations

I found the source of problems. They should be fixed now. I used the following commands to run these tests locally:

bazel test --compilation_mode=opt //tensorflow/lite/testing:zip_test_conv_relu
bazel test --compilation_mode=opt //tensorflow/lite/testing:zip_test_conv_relu1
bazel test --compilation_mode=opt //tensorflow/lite/testing:zip_test_conv_relu6

If I combine the parameters as before, I hit the error: "There are at least 744 combinations while the upper limit is 500. Having too many combinations will slow down the tests."

Hi @suharshs, @rthadur Could you please re-approve this PR ? Thanks

comment created time in 2 months
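The combination limit hit above comes from zip tests expanding a Cartesian product of test parameters, so splitting one axis (here, the activation) into separate targets divides the product. The parameter names and counts below are made up for illustration, not the real conv test parameters:

```python
from itertools import product

# Hypothetical test parameter axes; each axis multiplies the total count.
params = {
    "activation": ["relu", "relu1", "relu6"],
    "input_shape": [(1, 8, 8, 3), (1, 16, 16, 1)],
    "filter_size": [1, 3],
    "stride": [1, 2],
}

# One combined target expands the full product: 3 * 2 * 2 * 2 = 24 cases.
combined = len(list(product(*params.values())))

# Splitting by activation gives three targets, each a third of the size.
per_target = combined // len(params["activation"])

assert combined == 24
assert per_target == 8
```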

push event wwwind/tensorflow

commit sha 88ce54b3eb29a0049da875bfa98125f5473aea7c

Fixed failures. Change-Id: Id1ca1d3bfff5b418c583ba0efa15d3bc375870f7

push time in 2 months

pull request comment tensorflow/tensorflow

[TFLite 16x8] Notebook for 16x8 post-training quantization.

Hi @renjie-liu ! Could you please re-approve this PR?

- I think I enabled python3 correctly now in this notebook by setting
`"kernelspec": { "display_name": "Python 3", "name": "python3" }`

- I checked that all output cells are empty:
`"outputs": [],`

- added the 'experimental' status as well

Thanks for the review!

comment created time in 2 months

push event wwwind/tensorflow

commit sha 161b105b059c96ed1d640d0cbbbab43fdc74a3c8

Addressed reviewer's comments. Change-Id: I56a81e42ac121550d6dfdc9e795ec01f033b80a2

push time in 2 months

PR opened tensorflow/model-optimization

Fix for a bug in clustering: the names of the weights/bias are not the same for the original model and the stripped model.

pr created time in 2 months

create branch wwwind/model-optimization

branch : bug_inconsistency_weights_name

created branch time in 2 months

fork wwwind/model-optimization

A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.

https://www.tensorflow.org/model_optimization

fork in 2 months

PR opened tensorflow/tensorflow

In this PR:

- tests for versioning of operators ADD/SUB
- some fixes: the maximum version of ADD has been changed to 3 for 16x8, so I updated the places where it was mentioned as 4; both inputs of the SUB operator should be quantized to int16, as is done for ADD
- as discussed: only the general case of the reference kernel for 16x8 SUB/ADD will be used for new models. It was suggested to modify the quantize_model.cc file and set the option pot_scale_int16 to false during quantization; the implementation is done this way.

pr created time in 2 months
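The versioning change in the PR description above (16x8 ADD is version 3, not 4) can be sketched as a dtype-to-version mapping. The function name is invented and the versions outside the int16 case are illustrative assumptions, not a transcription of TFLite's actual version tables:

```python
def add_op_version(input_dtype):
    """Hypothetical sketch of op-version selection for ADD."""
    if input_dtype == "int16":
        return 3  # 16x8 ADD, per the PR description above
    if input_dtype == "int8":
        return 2  # assumed version for the int8 quantized path
    return 1      # assumed baseline float path

assert add_op_version("int16") == 3
assert add_op_version("int8") == 2
assert add_op_version("float32") == 1
```

Keeping the mapping in one place is what makes "updated the places where it was mentioned as 4" a small change: only the table entry moves, not the kernels.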

pull request comment tensorflow/tensorflow

[TFLite] Added coverage test for 16x8 quantization post-training mode.

@jdduke fixed. Could you please re-approve ? Thanks!

comment created time in 3 months

push event wwwind/tensorflow

commit sha 5a448b101f8c7c0adb373b66c1f4217e247912d1

Fix pylint. Change-Id: I13cf512f970e0f0e5cc6e7b26f2a2b7f49eb110f

push time in 3 months

pull request comment tensorflow/tensorflow

[TFLite] Added op tests for conv_activations

Hi @rthadur Could you please mention which bazel test task is failing ? What bazel test command should I call to reproduce these failures ? Thanks!

comment created time in 3 months

pull request comment tensorflow/tensorflow

[TFLite] Added coverage test for 16x8 quantization post-training mode.

Hi @gbaned This failure does not look relevant to my changes.

The test //tensorflow/core/grappler/costs:op_level_cost_estimator_test is failing, but this failure is due to a timeout.
`tensorflow/core/grappler/costs/op_level_cost_estimator_test.cc:984 Expected equality of these values: cost.compute_time Which is: 5000ns Costs::Duration(expected_compute_time) Which is: 4300ns Softmax`

I checked that it passes locally. Could you please re-run the Ubuntu Sanity Checks on this PR?

comment created time in 3 months

pull request comment tensorflow/tensorflow

[TFLite 16x8] Notebook for 16x8 post-training quantization.

Hi @renjie-liu Sorry, but all output cells are empty in this notebook as far as I can see. Also, what do you mean by "enable python3 for the colab" ? Thanks a lot for the review!

comment created time in 3 months

pull request comment tensorflow/tensorflow

[TFLite 16x8] Notebook for 16x8 post-training quantization.

Hi @khanhlvg Thanks for the review! I have changes to update these pages, but I want to upstream them in a separate PR. Is it okay ?

comment created time in 3 months

pull request comment tensorflow/tensorflow

[TFLite] Added coverage test for 16x8 quantization post-training mode.

Hi @jdduke Thanks for the review. I renamed. Please take a look.

comment created time in 3 months

push event wwwind/tensorflow

commit sha bba56b756fafc6584b0da7c42034fb97a46241bf

Addressed reviewer's comments. Change-Id: I18f870b2bfdb73beceff94f510b69033b0d5f451

push time in 3 months

PR opened tensorflow/tensorflow

This PR provides a notebook with a tutorial on 16x8 post-training quantization. It uses a simple MNIST example to demonstrate how this mode should be used.

pr created time in 3 months

pull request comment tensorflow/tensorflow

[TFLite] 16x8 quantization: fixes for SLICE, TRANSPOSE operators

Hi @renjie-liu I added versioning for 16x8 case to these operators. Please take a look. Thanks!

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[TFLite] 16x8 quantization: fixes for SLICE, TRANSPOSE operators

std::string GetMinimumRuntimeVersionForModel(const Model& model) { {{OperatorType::kTranspose, 1}, "1.6.0"}, {{OperatorType::kTranspose, 2}, "1.14.0"}, {{OperatorType::kTranspose, 3}, "1.15.0"},+ {{OperatorType::kTranspose, 5}, kPendingReleaseOpVersion},

this place confused me: should I add version 4 with kPendingReleaseOpVersion?

comment created time in 3 months

push event wwwind/tensorflow

commit sha f81de276db3504a7cf6334a3afe5f21212182940

Added versioning. Change-Id: Ie0d7bbb9798dd4d06493537a0f44ca18f8a07a5f

push time in 3 months