
shivaram/matrix-bench 6

Single machine matrix benchmarks to compare various implementations

tomerk/spark 2

Mirror of Apache Spark

mbalazin/cse599c-17sp-projects 1

Final class projects for CSE599c-17sp Big Data Management Systems

tomerk/models 1

Models and examples built with TensorFlow

shivaram/keystone 0

The biggest, baddest pipelines around.

shivaram/spark-ml 0

proposal for the new interfaces

tomerk/addons 0

Useful extra functionality for TensorFlow 2.0 maintained by SIG-addons

tomerk/baselines 0

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms

tomerk/benchmark 0

Large scale query engine benchmark

tomerk/calculating-spaceweather-keywords 0

calculate space-weather keywords (using python)

issue comment: tensorflow/tensorflow

tf-nightly-cpu couldn't trace any graph with subclass models.

Does this work in the nightlies?

AlexanderJLiu

comment created 2 hours ago

Pull request review comment: tensorflow/community

RFC: tf.internal API namespace.

[diff context: RFC "tf.internal API namespace" — "Current candidate" table]
    |Symbol location                        |API Name                 |
    |python.framework.func_graph.FuncGraph  |tf.internal.FuncGraph    |

And also related to this: would it trace twice rather than once? That would also be a performance hit to our existing usages.

qlzh727

comment created 2 hours ago

Pull request review comment: tensorflow/community

RFC: tf.internal API namespace.

[diff context: the same FuncGraph row of the RFC's "Current candidate" table quoted above]

Would the tf.function traces get saved in some sort of never-cleared cache? Or is it possible to explicitly gc them? Part of our usage of FuncGraph is to encapsulate traces that can then be immediately cleaned up.

qlzh727

comment created 2 hours ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testBernouilliMatrixFraction]
    counts = []
    for ratio in self.ratios:
      ratio_list = []
      for _ in range(100):
        output = self.get_bernouilli(ratio)(shape, seed=seed)
        ratio_list.append(np.count_nonzero(output))

Since this is structured w/ just count_nonzero, it never verifies that all the values are 0s or 1s; it only checks the number of non-zeros.
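For illustration, a rough sketch of the kind of element-wise check meant here (the helper name is made up, not from the PR):

import numpy as np

def assert_binary(values):
  """Illustrative check: every entry must be exactly 0. or 1."""
  values = np.asarray(values)
  assert np.all(np.logical_or(values == 0., values == 1.))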

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testBernouilliMatrixFraction]
    counts = []
    for ratio in self.ratios:
      ratio_list = []
      for _ in range(100):
        output = self.get_bernouilli(ratio)(shape, seed=seed)
        ratio_list.append(np.count_nonzero(output))
        self.assertAllEqual(output.shape, shape)
        seed += 1
      counts.append(ratio_list)
    mean_counts = np.mean(counts, axis=-1)

Inline these checks inside of the loops for ratios, rather than trying to simultaneously compute over all ratios. (It makes it easier to check that you're doing things correctly)
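For example, a sketch of that restructuring inside testBernouilliMatrixFraction (the loop variable, the tolerance, and the seed argument already used by these tests are illustrative, not confirmed PR API):

    # Assert per ratio, inside the loop, instead of collecting `counts`
    # across all ratios and reducing at the end.
    for ratio in self.ratios:
      ratio_counts = []
      for trial in range(100):
        output = sparse_utils.Bernouilli(ratio)(shape, seed=trial)
        self.assertAllEqual(output.shape, shape)
        ratio_counts.append(np.count_nonzero(output))
      self.assertAllClose(np.mean(ratio_counts), ratio * np.prod(shape), atol=1.0)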

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: sparse_utils.py — PermuteOnes.__init__]
  def __init__(self, ratio=None):

If you need to randomly choose a ratio for uniform sparsity, it makes more sense to put that logic in the pruning config than in the initializer itself.

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testBernouilliMatrixFraction]
    seed = 0
    expected_counts = [16, 1.6, 4, 8, 2, 1.7, 0]

These seem like they don't match the ratios in self.ratios, and would depend on the shapes too?

Replace w/ a simple, easily-verifiable computation.
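For example (a simple sketch; it just scales each ratio by the number of elements in the shape):

import numpy as np

shape = (4, 4)
ratios = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]
num_elements = int(np.prod(shape))
# Expected number of ones is ratio * number of elements, in expectation.
expected_counts = [ratio * num_elements for ratio in ratios]
print(expected_counts)  # [0.0, 1.6, 4.0, 8.0, 12.0, 14.4, 16.0]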

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testScatteredInitialOnes]
    # check that 1/0s are scattered
    # mean of entire matrix and mean of entire nonzeros
    self.assertAllClose(ones_indices[:len(ones_indices)//2], ones_indices[len(ones_indices)//2:])
    self.assertAllClose(ones_counts[:len(ones_counts)//2], ones_counts[len(ones_counts)//2:])

Ah I see you sort of have that check in a separate test below.

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: the same assertAllClose checks in SparseUtilsTest.testScatteredInitialOnes quoted above]

Same as above, this check seems wrong.

You want to explicitly check that the mean of ones_count is approximately ratio * num_elements_in_shape

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testScatteredInitialOnes]
    # check that 1/0s are scattered
    # mean of entire matrix and mean of entire nonzeros
    self.assertAllClose(ones_indices[:len(ones_indices)//2], ones_indices[len(ones_indices)//2:])

This check is really confusing to interpret, and seems pretty wrong?

Isn't this checking that the first half of the trials match the second half of the trials?

Just check that the mean matrix (if you reduce the initialized values across all trials but leave it as a matrix) approximately matches a tensor of the corresponding input shape where all the values are ratio.
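Roughly something like this (standalone sketch, not PR code; `trials` would be the list of initialized tensors collected over the seed loop, and the tolerance is illustrative):

import numpy as np

def check_mean_matrix(trials, ratio, shape):
  # Reduce over the trials axis only, keeping the per-position means.
  stacked = np.stack([np.asarray(t) for t in trials])  # (num_trials,) + shape
  mean_matrix = stacked.mean(axis=0)
  np.testing.assert_allclose(mean_matrix, np.full(shape, ratio), atol=0.2)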

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest helper methods]
  def matrix_init(self, _shape):
    return tf.ones(_shape)

Inline this helper.

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest helper methods]
  def matrix_init_with_type(self, _shape, _type):
    tf.ones(_shape, dtype=_type)

Inline this helper.

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testScatteredInitialOnes]
    def _check_scatter(sparse_matrix):
      ones_indices = np.where(sparse_matrix == 1.)
      mean_nonzeros = np.mean(ones_indices)

Return the actual sparse matrix, not the mean here.

You want to reduce on the axis that is the # of trials, whereas this reduces on the axes within a single trial. (That mean would always end up being ratio)

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest helper methods]
  def get_permuteones(self, ratio):
    return sparse_utils.PermuteOnes(ratio)

Same as above, inline this helper

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest helper methods]
  def get_bernouilli(self, p):
    return sparse_utils.Bernouilli(p)

Inline this helper. It doesn't buy much syntactic sugar and makes the tests harder to interpret. E.g. if you added extra logic to this helper it would no longer match the Bernoulli API and your test would be testing the helper, not your Bernoulli method.
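In other words, construct the initializer directly at the point of use, e.g. (sketch):

    # Inside the test body, instead of going through self.get_bernouilli(ratio):
    output = sparse_utils.Bernouilli(ratio)(shape)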

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testScatteredInitialOnes]
  @parameterized.parameters(
    (i,) for i in self.ratios

This isn't a valid way to specify the parameters, because self is not reference-able here and setUp() won't have been called. You can make the variables static, or inline their values here.
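For example (sketch; the module-level constant just copies the values from setUp so the decorator can see them at class-definition time):

from absl.testing import parameterized
import tensorflow as tf

_RATIOS = [0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0]

class SparseUtilsTest(tf.test.TestCase, parameterized.TestCase):

  @parameterized.parameters(*_RATIOS)
  def testScatteredInitialOnes(self, ratio):
    self.assertGreaterEqual(ratio, 0.0)  # body elided in this sketch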

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: sparse_utils.py — PermuteOnes.__init__]
    if ratio is not None and not (ratio >= 0. and ratio <= 1.):
      raise ValueError('ratio parameter must be a valid percentage, i.e. in [0, 1].')
    self.ratio = ratio if ratio else tf.random.uniform(())

This comment won't matter when you get rid of the ratio=None option, but:

This check would trigger tf.random.uniform(()) both when ratio is None, and when it's 0. So, you would probably want to use if ratio is None rather than if ratio.
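To spell out the difference (standalone sketch; resolve_ratio is just for illustration, not PR code):

import tensorflow as tf

def resolve_ratio(ratio=None):
  if ratio is None:          # `if ratio:` would also treat 0.0 as "unset"
    return tf.random.uniform(())
  return ratio

print(resolve_ratio(0.0))  # 0.0, rather than a random draw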

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: sparse_utils.py — PermuteOnes class docstring]
class PermuteOnes(tf.keras.initializers.Initializer):
  """
  Initialization of a deterministically sparse matrix.
  This initializer takes in an input ratio and sets exactly
  that ratio of the mask entries as ones  leaving the rest as zeros.
  The ones are detministically, randomly permmuted across the tensor.
  """

I would get rid of "deterministic" because I'm not sure this actually guarantees determinism if you don't specify a seed in __call__. (And passing this in directly as an initializer object won't use a seed.)

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: sparse_utils.py — PermuteOnes.__init__]
  def __init__(self, ratio=None):

Does it actually make sense to have ratio optional? It seems really confusing to randomly choose a ratio when you don't specify one.

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: BUILD — new py_library target]
py_library(
    name = "sparse_utils",
    srcs = ["sparse_utils.py"],
    srcs_version = "PY2AND3",
    visibility = ["//visibility:public"],
    deps = [
        # tensorflow dep1,
        # python:summary tensorflow dep2,

I don't think you need this specific dependency, since you're not using tf summaries via a direct import

xwinxu

comment created a day ago

Pull request review comment: tomerk/model-optimization

Implement sparsity utilities with tests

[diff context: SparseUtilsTest.testMaskDtype]
  def testMaskDtype(self):
    dtypes = [tf.int32, tf.float32, tf.int64, tf.float64]

Excellent job parameterizing on the dtypes!

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    seed = 0+    ratio = 0.5+    counts = []+    for _ in range(100):+      tmp = self.bernouilli(ratio)(mask.shape, seed=0)+      counts.append(np.count_nonzero(tmp))

Should add a check that the output shape is correct.
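For example, a minimal sketch inside the loop (assuming `tmp` is the sampled tensor, as in the quoted test):

self.assertAllEqual(shape, tmp.shape)  # sampled tensor should match the requested shape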

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    seed = 0+    ratio = 0.5+    counts = []+    for _ in range(100):+      tmp = self.bernouilli(ratio)(mask.shape, seed=0)+      counts.append(np.count_nonzero(tmp))+    mean_count = np.mean(counts)+    self.assertAllEqual(mean_count, 4)++  def testDeterministicMaskFraction(self):+    # PermuteOnes+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    ratio = 0.5+    mask_sparse = self.permuteones(ratio)(mask.shape)+    self.assertAllEqual(np.count_nonzero(mask_sparse), 4)++  def testMaskDeterminism(self):

For this test & the one below, also remove all references to masks.

It'd be good to use multiple seeds in these as well. (the test should pass for any seed)
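Roughly something like this, as a sketch (it reuses the absl parameterized decorator already imported in this file):

@parameterized.parameters(0, 1, 42)
def testDeterminism(self, seed):
  ratio = 0.5
  shape = (4, 4)
  first = sparse_utils.PermuteOnes(ratio)(shape, seed=seed)
  second = sparse_utils.PermuteOnes(ratio)(shape, seed=seed)
  self.assertAllEqual(first, second)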

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    seed = 0+    ratio = 0.5+    counts = []+    for _ in range(100):+      tmp = self.bernouilli(ratio)(mask.shape, seed=0)+      counts.append(np.count_nonzero(tmp))+    mean_count = np.mean(counts)+    self.assertAllEqual(mean_count, 4)++  def testDeterministicMaskFraction(self):

Similar comments for this test as above, except here it's correct to use an assertAllEqual instead of an assertAllClose.

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    seed = 0+    ratio = 0.5+    counts = []+    for _ in range(100):+      tmp = self.bernouilli(ratio)(mask.shape, seed=0)+      counts.append(np.count_nonzero(tmp))+    mean_count = np.mean(counts)+    self.assertAllEqual(mean_count, 4)

Also, this number seems wrong.

If your shape is (4, 4) there are 16 elements. With a probability of 0.5 I would expect a mean of 8 nonzeros, not 4.
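Concretely, the expectation is p * num_elements = 0.5 * 16 = 8, so the assertion could look something like this (a sketch; the tolerance is a judgment call for 100 samples):

expected_nonzeros = ratio * np.prod(shape)  # 0.5 * 16 = 8
self.assertAllClose(mean_count, expected_nonzeros, atol=1.0)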

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    seed = 0+    ratio = 0.5+    counts = []+    for _ in range(100):+      tmp = self.bernouilli(ratio)(mask.shape, seed=0)+      counts.append(np.count_nonzero(tmp))+    mean_count = np.mean(counts)+    self.assertAllEqual(mean_count, 4)

When you're using a different seed each time, it should be an assertAllClose, not an assertAllEqual. I would be very concerned if they're actually all equal.

It might be good to also check the standard deviation against the expected standard deviation of a Bernoulli distribution.
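The count of nonzeros across 16 independent Bernoulli(p) entries is binomially distributed, so its standard deviation is sqrt(16 * p * (1 - p)) = 2 for p = 0.5. A sketch of the check (the tolerance is a loose assumption for 100 samples):

expected_std = np.sqrt(16 * ratio * (1 - ratio))  # 2.0 for ratio = 0.5
self.assertAllClose(np.std(counts), expected_std, atol=0.5)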

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)+    self.assertAllEqual(shape, mask.shape)++    seed = 0+    ratio = 0.5+    counts = []+    for _ in range(100):+      tmp = self.bernouilli(ratio)(mask.shape, seed=0)

Pass shape directly, not mask.shape.

Also pass a different seed each time (tests should never rely on the exact behavior of a given seed)
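i.e. roughly (a sketch; it assumes Bernouilli's __call__ accepts the seed argument the current test is already passing):

for seed in range(100):
  tmp = sparse_utils.Bernouilli(ratio)(shape, seed=seed)
  counts.append(np.count_nonzero(tmp))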

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)+    self.mask_init_w_type = lambda x, y: tf.ones(x, dtype=y)+    self.bernouilli = lambda p: sparse_utils.Bernouilli(p)+    self.permuteones = lambda ratio: sparse_utils.PermuteOnes(ratio)++  def testBernouilliMaskFraction(self):+    shape = (4, 4)+    mask = self.mask_init(shape)

Remove all references to mask from all of these tests. These utilities should be entirely independent from the concept of a mask.

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Convenience functions for sparse training."""++class Bernouilli(tf.keras.initializers.Initializer):+  """+  Initialization distributio following a Bernouilli process..+  """++  def __init__(self, p):+    """+    p: probability parameter of success (i.e. 1).+    """+    self.p = p++  def get_config(self):+    return {'p': self.p}++  @classmethod+  def from_config(cls, config):+    return cls(**config)++  def __call__(self, shape, dytpe=tf.dtypes.float32):+    """Number of zeros = np.ceil(sparsity * size) in expectation."""

Don't reference sparsity, just reference probability

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++class SparseUtilsTest(test.TestCase, parameterized.TestCase):++  def setUp(self):+    super(SparseUtilsTest, self).setUp()+    self.mask_init = lambda x: tf.ones(x)

Remove this setup and inline all of these methods. The extra indirection from these helper lambdas isn't buying anything.

xwinxu

comment created time in 4 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test+++def sample_noise(x, mu=0, sigma=1.):

This isn't used anywhere in your tests; get rid of it.

xwinxu

comment created time in 4 days

issue comment tensorflow/tensorflow

tf.image.ssim_multiscale broke in tensorflow 2.1.0-rc2

Glad to hear it, thanks @isaacgerg! Please don't hesitate to report any other such issues you run into.

isaacgerg

comment created time in 5 days

issue closed tensorflow/tensorflow

Keras predict is slow on first call when using variable input shape

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Colab
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.2.0
  • Python version: 3.6.9
  • Bazel version (if compiling from source): -
  • GCC/Compiler version (if compiling from source): -
  • CUDA/cuDNN version: 10.1
  • GPU model and memory: Tesla K80 11441MiB

Describe the current behavior

When using a variable input shape the first prediction for a new shape is slow. Following predictions for the same shape are fast.

Describe the expected behavior

I would expect to see the same speed when the change in shape is small.

Standalone code to reproduce the issue

This simple Colab notebook shows the problem using a ResNet50.

https://colab.research.google.com/drive/1s2-O_cRtwL_kkk2Mm-rq-xiDHdZDs5h4?usp=sharing

I notice that a warning about retracing is shown, but I don't know how to use that information to solve the problem

closed time in 6 days

ironbar

issue comment tensorflow/tensorflow

Keras predict is slow on first call when using variable input shape

Hi @ironbar, experimental_relax_shapes lazily relaxes shapes as it sees them, so it may still take a few examples of different shapes to fully relax.

If you manually wrap your model.call with a tf.function and call your model directly instead of predicting, you can directly specify the input_signature arg to match your data: tensorflow.org/api_docs/python/tf/function

This will allow your tf.function to trace only once at the start, instead of lazily figuring out the input shapes from your data.
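A rough sketch of what I mean, assuming `model` is the ResNet50 from your Colab and `images` is a float32 batch with variable spatial dimensions:

import tensorflow as tf

# Trace one concrete function up front that accepts any spatial size,
# instead of letting predict() retrace lazily for each new shape it sees.
inference_fn = tf.function(
    model.call,
    input_signature=[tf.TensorSpec(shape=[None, None, None, 3], dtype=tf.float32)])

preds = inference_fn(images)  # first call traces once; later shapes reuse the same trace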

Closing this issue for now, but +@rohan100jain because this is related to some past discussions about whether tf.function should trace more generally at the start before tracing more specific shapes.

ironbar

comment created time in 6 days

issue closed tensorflow/tensorflow

tf.image.ssim_multiscale broke in tensorflow 2.1.0-rc2

System information

Python 3.7.6 on Windows 10, x64.

Using tensorflow 2.1.0-rc2.

GPU Hardware: pciBusID: 0000:01:00.0 name: TITAN X (Pascal) computeCapability: 6.1 coreClock: 1.531GHz coreCount: 28 deviceMemorySize: 12.00GiB deviceMemoryBandwidth: 447.48GiB/s

Describe the current behavior

tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a tf.Tensor as a Python bool is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

Describe the expected behavior

Code should print the word 'done'

Standalone code to reproduce the issue

import tensorflow as tf
tf.test.gpu_device_name()
print(tf.__version__)

# Build model
img_input = tf.keras.layers.Input(shape=(128, 128, 1))
img_output = tf.keras.layers.Convolution2D(1, 1)(img_input) 
model = tf.keras.models.Model(img_input, img_output)

# Add reconstruction loss 
# Toggle between the next 2 lines of code to see that ssim_multiscale does not work but simple MSE does.
loss = tf.reduce_mean(tf.image.ssim_multiscale(img_input, img_output, 1.0))  # This loss does not work
#loss = tf.reduce_mean((img_input - img_output)**2)  # This loss works
model.add_loss(loss)

model.compile(optimizer = tf.keras.optimizers.RMSprop(lr=1e-4), loss = None)  
model.summary()

# The error I get when using the ssim_multiscale loss is:
#tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.
    
print('done')

Other info / logs

This problem is present in 1.15.0 and 2.1.0. This bug is not present in 1.13.1.

I have tried several image metrics in tf.image including ssim and psnr and they all result in the same error.

closed time in 6 days

isaacgerg

issue comment tensorflow/tensorflow

tf.image.ssim_multiscale broke in tensorflow 2.1.0-rc2

This issue should now be fixed in the nightlies, as we've enabled the Functional API refactoring I mentioned above.

isaacgerg

comment created time in 6 days

pull request comment tensorflow/tensorflow

Don't check for attribute `is_tensor_like` in `is_tensor`

To clarify, there are a number of downstream users of Keras who currently rely on this behavior in their unit tests. I think if tensorflow_probability is updated we can try to update the various client code, though I'm not sure what the timelines for any of that would look like.

lithuak

comment created time in 6 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Tests for the utilities in sparse training pipelines."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++from absl.testing import parameterized+import numpy as np+import tensorflow as tf++from tensorflow.python.keras import keras_parameterized+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import lthpruner as pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils++dtypes = tf.dtypes+test = tf.test++def make_pruning_schedule(target_sparsity, begin, end, freq):+  return pruning_schedule.ConstantSparsity(target_sparsity, begin, end, freq)++def sample_noise(x, mu=0, sigma=1.):+  sample = tf.random.normal((), mean=mu,  stddev=sigma, dtype=tf.float64)+  return sample++def _dummy_gradient(x, dtype=tf.float32):+  try:+    base_type = x.dtype+  except:+    base_type = dtype+  grad = tf.ones_like(x, dtype=base_type)+  return grad++class SparseUtilsTest(test.TestCase, parameterized.TestCase):

Don't forget tests!

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+class Bernouilli(tf.keras.initializers.Initializer):+  """+  Initialization distributio following a Bernouilli process..+  """++  def __init__(self, p):+    """+    p: probability parameter of success (i.e. 1).+    """+    self.p = p++  def get_config(self):+    return {'p': self.p}++  @classmethod+  def from_config(cls, config):+    return cls(**config)++  def __call__(self, shape, dytpe=tf.dtypes.float32):+    """Number of zeros = np.ceil(sparsity * size) in expectation."""+    probs = tf.zeros(shape=list(shape)) + self.p+    uniform = tf.random.uniform(shape)+    initial = tf.less(uniform, probs)++    return tf.cast(initial, dtype=dtype)++class PermuteOnes(tf.keras.initializers.Initializer):+  """+  Initialization of a deterministically sparse matrix.+  """+  def __init__(self, p=None):+    """+    p: probability parameter of success (i.e. 1).+    If p is None, will sample randomly from uniform distribution for sparsity.+    """+    self.p = p if p else tf.random.uniform(())++  def get_n_ones(self, shape, dtype=tf.dtypes.float32):+    sparsity = self.p if self.p else 0.0+    return tf.math.ceil(sparsity * tf.cast(tf.math.reduce_sum(shape)), dtype=dtype)++  def __call__(self, shape, dtype=tf.dtypes.float32, seed=None):+    flat_mask = tf.reshape(tf.ones(shape), (-1,))+    num_ones = self.get_n_ones(shape, dtype)+    _indices = tf.cast(tf.reshape(tf.linspace(0, num_ones - 1, int(num_ones)), (-1,)), tf.int32)+    indices = tf.reshape(_indices, (-1, 1))+    updates = tf.ones_like(_indices)+    flat_shape = flat_mask.shape+    unshuffled_mask = tf.scatter_nd(indices, udpates, flat_shape)+    shuffled_mask = tf.random.shuffle(unshuffled_mas, seed=seed)++    return tf.reshape(shuffled_mask, shape)+++class ErdosRenyi(tf.keras.Initializers.Initializer):

After seeing how your code elsewhere is written (using the distributions to generate a sparsity float from the shape), I would say:

  1. If you decide to pass a sparsity float to the pruner directly, your ErdosRenyi* distributions should be callables/util methods that output a sparsity float but do not extend keras.initializers.Initializer

  2. If you decide to pass an initializer directly to the RIGLPruner instead of a sparsity float, it makes sense to have these two ErdosRenyi* distributions be initializers as well.

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+class Bernouilli(tf.keras.initializers.Initializer):+  """+  Initialization distributio following a Bernouilli process..+  """++  def __init__(self, p):+    """+    p: probability parameter of success (i.e. 1).+    """+    self.p = p++  def get_config(self):+    return {'p': self.p}++  @classmethod+  def from_config(cls, config):+    return cls(**config)++  def __call__(self, shape, dytpe=tf.dtypes.float32):+    """Number of zeros = np.ceil(sparsity * size) in expectation."""+    probs = tf.zeros(shape=list(shape)) + self.p+    uniform = tf.random.uniform(shape)+    initial = tf.less(uniform, probs)++    return tf.cast(initial, dtype=dtype)++class PermuteOnes(tf.keras.initializers.Initializer):+  """+  Initialization of a deterministically sparse matrix.+  """+  def __init__(self, p=None):+    """+    p: probability parameter of success (i.e. 1).

Probably should be ratio instead of p, because it's not the probability of sampling a 1; it exactly controls the number of ones.
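i.e. roughly (a sketch; note the `is not None` check so that an explicit ratio of 0.0 isn't silently replaced by a random value):

def __init__(self, ratio=None):
  # Exact fraction of entries that will be set to one.
  self.ratio = ratio if ratio is not None else tf.random.uniform(())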

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+class Bernouilli(tf.keras.initializers.Initializer):+  """+  Initialization distributio following a Bernouilli process..+  """++  def __init__(self, p):+    """+    p: probability parameter of success (i.e. 1).+    """+    self.p = p++  def get_config(self):+    return {'p': self.p}++  @classmethod+  def from_config(cls, config):+    return cls(**config)++  def __call__(self, shape, dytpe=tf.dtypes.float32):+    """Number of zeros = np.ceil(sparsity * size) in expectation."""+    probs = tf.zeros(shape=list(shape)) + self.p+    uniform = tf.random.uniform(shape)+    initial = tf.less(uniform, probs)++    return tf.cast(initial, dtype=dtype)++class PermuteOnes(tf.keras.initializers.Initializer):+  """+  Initialization of a deterministically sparse matrix.

Flesh out this docstring further. This initializer takes an input ratio, and sets exactly that ratio of the entries to one and the rest to zero. The positions of the ones are randomly permuted across the tensor, deterministically for a fixed seed.
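Something along these lines, as a sketch:

"""Initializer that produces a tensor with an exact fraction of ones.

Given `ratio`, exactly ceil(ratio * num_elements) entries are set to one and the
rest to zero; the positions of the ones are randomly permuted across the tensor
(deterministic for a fixed seed).
"""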

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement sparsity utilities with tests

+class Bernouilli(tf.keras.initializers.Initializer):+  """+  Initialization distributio following a Bernouilli process..+  """++  def __init__(self, p):+    """+    p: probability parameter of success (i.e. 1).+    """+    self.p = p++  def get_config(self):+    return {'p': self.p}++  @classmethod+  def from_config(cls, config):+    return cls(**config)++  def __call__(self, shape, dytpe=tf.dtypes.float32):+    """Number of zeros = np.ceil(sparsity * size) in expectation."""+    probs = tf.zeros(shape=list(shape)) + self.p+    uniform = tf.random.uniform(shape)+    initial = tf.less(uniform, probs)++    return tf.cast(initial, dtype=dtype)++class PermuteOnes(tf.keras.initializers.Initializer):+  """+  Initialization of a deterministically sparse matrix.+  """+  def __init__(self, p=None):+    """+    p: probability parameter of success (i.e. 1).+    If p is None, will sample randomly from uniform distribution for sparsity.+    """+    self.p = p if p else tf.random.uniform(())++  def get_n_ones(self, shape, dtype=tf.dtypes.float32):+    sparsity = self.p if self.p else 0.0+    return tf.math.ceil(sparsity * tf.cast(tf.math.reduce_sum(shape)), dtype=dtype)++  def __call__(self, shape, dtype=tf.dtypes.float32, seed=None):+    flat_mask = tf.reshape(tf.ones(shape), (-1,))+    num_ones = self.get_n_ones(shape, dtype)+    _indices = tf.cast(tf.reshape(tf.linspace(0, num_ones - 1, int(num_ones)), (-1,)), tf.int32)+    indices = tf.reshape(_indices, (-1, 1))+    updates = tf.ones_like(_indices)+    flat_shape = flat_mask.shape+    unshuffled_mask = tf.scatter_nd(indices, udpates, flat_shape)+    shuffled_mask = tf.random.shuffle(unshuffled_mas, seed=seed)

Is there any stateless random shuffle op? Or a way to deterministically shuffle?
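If not, one option is to sort stateless random keys instead (a sketch, assuming `seed` here is the shape-[2] integer tensor the stateless ops expect):

# Deterministic shuffle for a fixed seed.
keys = tf.random.stateless_uniform(tf.shape(flat_mask), seed=seed)
perm = tf.argsort(keys)
shuffled_mask = tf.gather(unshuffled_mask, perm)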

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

++# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Helper functions to add support for iterative magnitude/gradient pruning as seen in the RiGL experiments."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++import tensorflow as tf+from absl import logging++from tensorflow.python.ops import summary_ops_v2+from tensorflow.python.summary import summary as summary_ops_v1+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_utils+from tensorflow_model_optimization.python.core.sparsity_tf2 import schedule as update_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils+++class RiGLPruner(pruner.Pruner):+  """+  Implementation of the RiGL dynamic sparse training algorithm.+  """++  def __init__(self,+      update_schedule=update_schedule.ConstantSchedule(0.5, 0),+      sparsity=0.5,+      block_size=(1,1),+      block_pooling_type='AVG',+      initializer=sparse_utils.PermuteOnes,+      stateless=False,+      seed=0,+      seed_offset=0,+      noise_std=0,+      reinit=False+    ):+    """The logic for magnitude-based pruning weight tensors.++    Args:+      update_schedule: A `PruningSchedule` object that controls pruning rate+        throughout training.+      sparsity: the sparsity at which the dynamic sparse training method uses+      block_size: The dimensions (height, weight) for the block sparse pattern+        in rank-2 weight tensors.+      block_pooling_type: (optional) The function to use to pool weights in the+        block. 
Must be 'AVG' or 'MAX'.+      initializer: the initial sparsity distribution Callable for each layer of the network.+      stateless: whether or not being run on TPU/multi-GPU workers.+      seed: assigned by PruningConfig for multiworker consistency in random processes.+      seed_offset: added to seed to run different experiments.+      reinit: boolean for whether to reinitialize a connection that was drop and regorwn to +        its original value, or to set it to 0+    """+    super(RiGLPruner, self).__init__(+        update_schedule, block_size, block_pooling_type)+    self.initializer = initializer+    self.target_sparsity = sparsity+    self.update_schedule = update_schedule+    self._block_size = block_size+    self.block_pooling_type = block_pooling_type+    self._stateless = stateless+    self._seed = seed+    self._seed_offset = seed_offset+    self._noise_std = noise_std+    self._reinit_when_same = reinit+  +  def create_slots(self, optimizer, var):+    base_dtype = var.dtype+    optimizer.add_slot(var, 'mask', initializer=self.initializer(self.target_sparsity))+++  def _validate_block(self, var):+    if self._block_size != [1, 1]:+      if var.get_shape().ndims != 2:+        raise ValueError('Block Sparsity can only be used for layers which '+                          'have 2-dimensional weights.')++  def _random_uniform(self, step, *args, **kwargs):+    """Uniform noise distribution"""+    if self._stateless:+      offset_seed = self._seed_offset + kwargs.get('seed', 0)+      kwargs['seed'] = tf.cast(+        tf.stack([offset_seed, step], tf.int32)+      )+      return tf.random.stateless_uniform(*args, **kwargs)+    return tf.random.uniform(*args, **kwargs)++  def _random_normal(self, step, *args, **kwargs):+    """Gaussian noise distribution"""+    if self._stateless:+      offset_seed = self._seed_offset + kwargs.get('seed', 0)+      kwargs['seed'] = tf.cast(+        tf.stack([offset_seed, step], tf.int32)+      )+      return tf.random.stateless_normal(*args, **kwargs)+    return tf.random.normal(*args, **kwargs)++  def _get_grow_grads(self, optimizer, step, mask, grad):+    """Grow connections baased on gradient information.+    """+    grad_scores = tf.math.abs(grad)+    return grad_scores++  def _get_drop_weights(self, optimizer, step, mask, weight, noise_std=0):+    """+    Drop connections based on weight magnitude.+    """+    masked_weight = mask * weight+    weight_scores = tf.math.abs(masked_weight)+    # add noise to break ties, although the possibility is very low+    if noise_std != 0:+      weight_scores.assign_add(self._random_normal(+        weight_scores.shape, stddev=noise_std, dtype=weight_scores.dtype,+        seed=(hash(weight.name + 'drop'))))+    return weight_scores++  def _reset_momentum(self, optimizer, weight, new_connections):+    """Zeros out optimizer slots whose connections have been recovered."""+    for slot_name in optimizer._optimizer.get_slot_names():+      # reset aggregated momentum variables to 0+      opt_var = optimizer._optimizer.get_slot(weight, slot_name)+      new_values = tf.where(new_connections,+                            tf.zeros_like(opt_var), opt_var)+      opt_var.assign(new_values)++  def _generic_top_k(self, scores, mask, n_new, n_total):+     # sort the entire array so that it can be constant for TPUs+    _, sorted_idx = tf.math.top_k(+      tf.reshape(scores, (-1), k=n_total)+    )+    expanded_sorted_idx = tf.expand_dims(sorted_idx, 1)+    new_values = tf.where(+      tf.range(n_total) < n_new,+      
tf.ones_like(sorted_idx, dtype=mask.dtype),+      tf.zeros_like(sorted_idx, dtype=mask.dtype)+    )+    updated_mask = tf.scatter_nd(expanded_sorted_idx, new_values, new_values.shape)++    return updated_mask++  def _get_new_connections(self, reinit_when_same, grown_mask_reshaped, mask):+    if reinit_when_same:+      new_connections = tf.math.equal(grown_mask_reshaped, 1)+    else:+      new_connections = tf.math.logical_and(+        tf.math.equal(grown_mask_reshaped, 1), tf.math.equal(mask, 0)+      )+    return new_connections++  def _update_mask(self, step, update_fraction, mask, weight, grad):+    """Called by _maybe_update_block_mask.++    Updates mask based on weight and grad information.+    """+    # compute the top k magnitudes then update the current mask+    drop_scores = self._get_drop_weights(optimizer, step, mask, weight, noise_std=self._noise_std)+    # need access to exactly which entries are growing to zero out optimizer slot+    grow_scores = self._get_grow_grads(optimizer, step, mask, grad)+    n_total = tf.size(drop_scores)+    n_ones = tf.cast(tf.reduce_sum(mask), dtype=tf.int32)

It's probably way more efficient to compute this from a sparsity float at the start rather than trying to reduce the mask every iteration.

This could be an argument for taking a sparsity float at the start instead of an initializer. (Though there are valid arguments for taking an initializer and not a sparsity float too.)
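For instance (a sketch with hypothetical placement; the per-variable count could be cached when the slot is created, since the variable shape is known there, and numpy would need to be imported in this module):

n_total = int(np.prod(var.shape))  # static for a given variable
n_ones = int(np.ceil((1.0 - self.target_sparsity) * n_total))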

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

++# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Helper functions to add support for iterative magnitude/gradient pruning as seen in the RiGL experiments."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++import tensorflow as tf+from absl import logging++from tensorflow.python.ops import summary_ops_v2+from tensorflow.python.summary import summary as summary_ops_v1+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_utils+from tensorflow_model_optimization.python.core.sparsity_tf2 import schedule as update_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils+++class RiGLPruner(pruner.Pruner):+  """+  Implementation of the RiGL dynamic sparse training algorithm.+  """++  def __init__(self,+      update_schedule=update_schedule.ConstantSchedule(0.5, 0),+      sparsity=0.5,+      block_size=(1,1),+      block_pooling_type='AVG',+      initializer=sparse_utils.PermuteOnes,+      stateless=False,+      seed=0,+      seed_offset=0,+      noise_std=0,+      reinit=False+    ):+    """The logic for magnitude-based pruning weight tensors.++    Args:+      update_schedule: A `PruningSchedule` object that controls pruning rate+        throughout training.+      sparsity: the sparsity at which the dynamic sparse training method uses+      block_size: The dimensions (height, weight) for the block sparse pattern+        in rank-2 weight tensors.+      block_pooling_type: (optional) The function to use to pool weights in the+        block. 
Must be 'AVG' or 'MAX'.+      initializer: the initial sparsity distribution Callable for each layer of the network.+      stateless: whether or not being run on TPU/multi-GPU workers.+      seed: assigned by PruningConfig for multiworker consistency in random processes.+      seed_offset: added to seed to run different experiments.+      reinit: boolean for whether to reinitialize a connection that was drop and regorwn to +        its original value, or to set it to 0+    """+    super(RiGLPruner, self).__init__(+        update_schedule, block_size, block_pooling_type)+    self.initializer = initializer+    self.target_sparsity = sparsity+    self.update_schedule = update_schedule+    self._block_size = block_size+    self.block_pooling_type = block_pooling_type+    self._stateless = stateless+    self._seed = seed+    self._seed_offset = seed_offset+    self._noise_std = noise_std+    self._reinit_when_same = reinit+  +  def create_slots(self, optimizer, var):+    base_dtype = var.dtype+    optimizer.add_slot(var, 'mask', initializer=self.initializer(self.target_sparsity))+++  def _validate_block(self, var):+    if self._block_size != [1, 1]:+      if var.get_shape().ndims != 2:+        raise ValueError('Block Sparsity can only be used for layers which '+                          'have 2-dimensional weights.')++  def _random_uniform(self, step, *args, **kwargs):+    """Uniform noise distribution"""+    if self._stateless:+      offset_seed = self._seed_offset + kwargs.get('seed', 0)+      kwargs['seed'] = tf.cast(+        tf.stack([offset_seed, step], tf.int32)+      )+      return tf.random.stateless_uniform(*args, **kwargs)+    return tf.random.uniform(*args, **kwargs)++  def _random_normal(self, step, *args, **kwargs):+    """Gaussian noise distribution"""+    if self._stateless:+      offset_seed = self._seed_offset + kwargs.get('seed', 0)+      kwargs['seed'] = tf.cast(+        tf.stack([offset_seed, step], tf.int32)+      )+      return tf.random.stateless_normal(*args, **kwargs)+    return tf.random.normal(*args, **kwargs)++  def _get_grow_grads(self, optimizer, step, mask, grad):+    """Grow connections baased on gradient information.+    """+    grad_scores = tf.math.abs(grad)+    return grad_scores++  def _get_drop_weights(self, optimizer, step, mask, weight, noise_std=0):+    """+    Drop connections based on weight magnitude.+    """+    masked_weight = mask * weight+    weight_scores = tf.math.abs(masked_weight)+    # add noise to break ties, although the possibility is very low+    if noise_std != 0:+      weight_scores.assign_add(self._random_normal(+        weight_scores.shape, stddev=noise_std, dtype=weight_scores.dtype,+        seed=(hash(weight.name + 'drop'))))+    return weight_scores++  def _reset_momentum(self, optimizer, weight, new_connections):+    """Zeros out optimizer slots whose connections have been recovered."""+    for slot_name in optimizer._optimizer.get_slot_names():+      # reset aggregated momentum variables to 0+      opt_var = optimizer._optimizer.get_slot(weight, slot_name)+      new_values = tf.where(new_connections,+                            tf.zeros_like(opt_var), opt_var)+      opt_var.assign(new_values)++  def _generic_top_k(self, scores, mask, n_new, n_total):+     # sort the entire array so that it can be constant for TPUs+    _, sorted_idx = tf.math.top_k(+      tf.reshape(scores, (-1), k=n_total)+    )+    expanded_sorted_idx = tf.expand_dims(sorted_idx, 1)+    new_values = tf.where(+      tf.range(n_total) < n_new,+      
tf.ones_like(sorted_idx, dtype=mask.dtype),+      tf.zeros_like(sorted_idx, dtype=mask.dtype)+    )+    updated_mask = tf.scatter_nd(expanded_sorted_idx, new_values, new_values.shape)++    return updated_mask++  def _get_new_connections(self, reinit_when_same, grown_mask_reshaped, mask):+    if reinit_when_same:+      new_connections = tf.math.equal(grown_mask_reshaped, 1)+    else:+      new_connections = tf.math.logical_and(+        tf.math.equal(grown_mask_reshaped, 1), tf.math.equal(mask, 0)+      )+    return new_connections++  def _update_mask(self, step, update_fraction, mask, weight, grad):+    """Called by _maybe_update_block_mask.++    Updates mask based on weight and grad information.+    """+    # compute the top k magnitudes then update the current mask+    drop_scores = self._get_drop_weights(optimizer, step, mask, weight, noise_std=self._noise_std)+    # need access to exactly which entries are growing to zero out optimizer slot+    grow_scores = self._get_grow_grads(optimizer, step, mask, grad)+    n_total = tf.size(drop_scores)+    n_ones = tf.cast(tf.reduce_sum(mask), dtype=tf.int32)+    n_prune = tf.cast(+      tf.cast(n_ones, dtype=tf.float32) * update_fraction, tf.int32+    )+    n_keep = n_ones - n_prune++    dropped_mask = self._generic_top_k(drop_scores, mask, n_keep, n_total)++    if grow_scores is not None:+      # flatten the scores+      grow_scores = tf.reshape(grow_scores, (-1))+      # set enabled connections (ones) to min(scores) - 1, i.e. they have the lowest scores+      grow_scores_lifted = tf.where(+        tf.math.equal(dropped_mask, 1),+        tf.ones_like(dropped_mask) * (tf.reduce_min(grow_scores) - 1), grow_scores+      )+      grown_mask = self._generic_top_k(grow_scores_lifted, mask, n_prune, n_total)+      # ensure that masks are disjoint

Is it by definition true that the grown & dropped masks will be disjoint? Is it necessary for a weight to have previously been masked out to be a candidate for growing?

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

++# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Helper functions to add support for iterative magnitude/gradient pruning as seen in the RiGL experiments."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++import tensorflow as tf+from absl import logging++from tensorflow.python.ops import summary_ops_v2+from tensorflow.python.summary import summary as summary_ops_v1+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_utils+from tensorflow_model_optimization.python.core.sparsity_tf2 import schedule as update_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils+++class RiGLPruner(pruner.Pruner):+  """+  Implementation of the RiGL dynamic sparse training algorithm.+  """++  def __init__(self,+      update_schedule=update_schedule.ConstantSchedule(0.5, 0),+      sparsity=0.5,+      block_size=(1,1),+      block_pooling_type='AVG',+      initializer=sparse_utils.PermuteOnes,+      stateless=False,+      seed=0,+      seed_offset=0,+      noise_std=0,+      reinit=False+    ):+    """The logic for magnitude-based pruning weight tensors.

Update this docstring: link to the RiGL paper and give a summary of what this pruner is doing.

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

++# Copyright 2019 The TensorFlow Authors. All Rights Reserved.+#+# Licensed under the Apache License, Version 2.0 (the "License");+# you may not use this file except in compliance with the License.+# You may obtain a copy of the License at+#+#     http://www.apache.org/licenses/LICENSE-2.0+#+# Unless required by applicable law or agreed to in writing, software+# distributed under the License is distributed on an "AS IS" BASIS,+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.+# See the License for the specific language governing permissions and+# limitations under the License.+# ==============================================================================+"""Helper functions to add support for iterative magnitude/gradient pruning as seen in the RiGL experiments."""++from __future__ import absolute_import+from __future__ import division+from __future__ import print_function++import tensorflow as tf+from absl import logging++from tensorflow.python.ops import summary_ops_v2+from tensorflow.python.summary import summary as summary_ops_v1+from tensorflow_model_optimization.python.core.sparsity.keras import pruning_utils+from tensorflow_model_optimization.python.core.sparsity_tf2 import schedule as update_schedule+from tensorflow_model_optimization.python.core.sparsity_tf2 import pruner+from tensorflow_model_optimization.python.core.sparsity_tf2 import sparse_utils+++class RiGLPruner(pruner.Pruner):+  """+  Implementation of the RiGL dynamic sparse training algorithm.+  """++  def __init__(self,+      update_schedule=update_schedule.ConstantSchedule(0.5, 0),+      sparsity=0.5,+      block_size=(1,1),+      block_pooling_type='AVG',+      initializer=sparse_utils.PermuteOnes,+      stateless=False,+      seed=0,+      seed_offset=0,+      noise_std=0,+      reinit=False+    ):+    """The logic for magnitude-based pruning weight tensors.++    Args:+      update_schedule: A `PruningSchedule` object that controls pruning rate+        throughout training.+      sparsity: the sparsity at which the dynamic sparse training method uses+      block_size: The dimensions (height, weight) for the block sparse pattern+        in rank-2 weight tensors.+      block_pooling_type: (optional) The function to use to pool weights in the+        block. 
Must be 'AVG' or 'MAX'.+      initializer: the initial sparsity distribution Callable for each layer of the network.+      stateless: whether or not being run on TPU/multi-GPU workers.+      seed: assigned by PruningConfig for multiworker consistency in random processes.+      seed_offset: added to seed to run different experiments.+      reinit: boolean for whether to reinitialize a connection that was drop and regorwn to +        its original value, or to set it to 0+    """+    super(RiGLPruner, self).__init__(+        update_schedule, block_size, block_pooling_type)+    self.initializer = initializer+    self.target_sparsity = sparsity+    self.update_schedule = update_schedule+    self._block_size = block_size+    self.block_pooling_type = block_pooling_type+    self._stateless = stateless+    self._seed = seed+    self._seed_offset = seed_offset+    self._noise_std = noise_std+    self._reinit_when_same = reinit+  +  def create_slots(self, optimizer, var):+    base_dtype = var.dtype+    optimizer.add_slot(var, 'mask', initializer=self.initializer(self.target_sparsity))+++  def _validate_block(self, var):+    if self._block_size != [1, 1]:+      if var.get_shape().ndims != 2:+        raise ValueError('Block Sparsity can only be used for layers which '+                          'have 2-dimensional weights.')++  def _random_uniform(self, step, *args, **kwargs):+    """Uniform noise distribution"""+    if self._stateless:+      offset_seed = self._seed_offset + kwargs.get('seed', 0)+      kwargs['seed'] = tf.cast(+        tf.stack([offset_seed, step], tf.int32)+      )+      return tf.random.stateless_uniform(*args, **kwargs)+    return tf.random.uniform(*args, **kwargs)++  def _random_normal(self, step, *args, **kwargs):+    """Gaussian noise distribution"""+    if self._stateless:+      offset_seed = self._seed_offset + kwargs.get('seed', 0)+      kwargs['seed'] = tf.cast(+        tf.stack([offset_seed, step], tf.int32)+      )+      return tf.random.stateless_normal(*args, **kwargs)+    return tf.random.normal(*args, **kwargs)++  def _get_grow_grads(self, optimizer, step, mask, grad):+    """Grow connections baased on gradient information.+    """+    grad_scores = tf.math.abs(grad)+    return grad_scores++  def _get_drop_weights(self, optimizer, step, mask, weight, noise_std=0):+    """+    Drop connections based on weight magnitude.+    """+    masked_weight = mask * weight+    weight_scores = tf.math.abs(masked_weight)+    # add noise to break ties, although the possibility is very low+    if noise_std != 0:+      weight_scores.assign_add(self._random_normal(+        weight_scores.shape, stddev=noise_std, dtype=weight_scores.dtype,+        seed=(hash(weight.name + 'drop'))))+    return weight_scores++  def _reset_momentum(self, optimizer, weight, new_connections):+    """Zeros out optimizer slots whose connections have been recovered."""

In the docstring add an explanation for why we are zeroing out optimizer slots in the nested optimizer.
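For illustration, a rough sketch of the kind of docstring being asked for here; the wording and the rationale (stale slot statistics should not carry over to a regrown connection) are assumptions based on the RiGL setup, not text from the PR:

  def _reset_momentum(self, optimizer, weight, new_connections):
    """Zeros out optimizer slots whose connections have been recovered.

    The nested optimizer's slot variables (momentum, Adam's m/v, etc.) still
    hold statistics that were accumulated before a connection was dropped.
    Keeping those stale values would give a freshly regrown weight a large,
    outdated first update, so the corresponding slot entries are reset to
    zero and the connection starts training from a clean state.
    """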

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

# Copyright 2019 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for the key functions in riglpruner library."""

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# import g3

from absl.testing import parameterized
import numpy as np
import tensorflow as tf

# TODO(b/139939526): move to public API.
from tensorflow.python.keras import keras_parameterized
from tensorflow_model_optimization.python.core.sparsity.keras import pruning_schedule
from tensorflow_model_optimization.python.core.sparsity_tf2 import riglpruner as pruner

dtypes = tf.dtypes
test = tf.test


def make_pruning_schedule(target_sparsity, begin, end, freq):
  return pruning_schedule.ConstantSparsity(target_sparsity, begin, end, freq)


def sample_noise(x, mu=0, sigma=1.):
  sample = tf.random.normal((), mean=mu, stddev=sigma, dtype=tf.float64)
  return sample


def _dummy_gradient(x, dtype=tf.float32):
  try:
    base_type = x.dtype
  except:
    base_type = dtype
  grad = tf.ones_like(x, dtype=base_type)
  return grad


class RiglPruningTest(test.TestCase, parameterized.TestCase):

My comment here is to add tests :)
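As a starting point, a minimal sketch of one such test; the constructor defaults and the _get_drop_weights behaviour are read off the diff above, and everything else is an assumption:

  def testDropScoresAreMaskedWeightMagnitudes(self):
    # _get_drop_weights scores connections by |mask * weight|, so masked-out
    # entries should score 0 and kept entries should score |w|.
    rigl = pruner.RiGLPruner(sparsity=0.5)
    weight = tf.constant([[1.0, -2.0], [3.0, -4.0]])
    mask = tf.constant([[1.0, 0.0], [0.0, 1.0]])
    scores = rigl._get_drop_weights(None, 0, mask, weight)
    self.assertAllClose([[1.0, 0.0], [0.0, 4.0]], scores)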

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the line: def _random_uniform(self, step, *args, **kwargs):]

You're not using _random_uniform anywhere? Go ahead and remove it

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the line in _random_normal: offset_seed = self._seed_offset + kwargs.get('seed', 0)]

As mentioned above/below, just use self._seed w/o using self._seed_offset

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above, continuing into the body of _reset_momentum; this comment is anchored at the tf.where line:]

    for slot_name in optimizer._optimizer.get_slot_names():
      # reset aggregated momentum variables to 0
      opt_var = optimizer._optimizer.get_slot(weight, slot_name)
      new_values = tf.where(new_connections,

Haha I guess tf.where has finally made an appearance :)

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the seed argument in _get_drop_weights: seed=(hash(weight.name + 'drop'))]

It's probably better to just use self._seed as the seed; we'd like to avoid having weight names be load-bearing in the random number generation.
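Concretely, the suggestion amounts to something like this inside _get_drop_weights (a sketch only, reusing the step argument and self._seed that already exist in the PR):

    if noise_std != 0:
      # Derive the tie-breaking noise from the pruner's own seed instead of
      # hash(weight.name), so renaming a variable never changes the stream.
      weight_scores = weight_scores + self._random_normal(
          step, weight_scores.shape, stddev=noise_std,
          dtype=weight_scores.dtype, seed=self._seed)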

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the line in _get_drop_weights: weight_scores.assign_add(self._random_normal(]

weight_scores isn't a variable, so you can't assign_add on it.

Do weight_scores = weight_scores + ...
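The distinction being pointed at, as a tiny standalone sketch: assign_add exists on tf.Variable but not on ordinary tensors, so a plain tensor has to be rebuilt rather than mutated in place.

import tensorflow as tf

scores = tf.math.abs(tf.constant([0.5, -1.5]))   # an ordinary (immutable) tensor
noise = tf.random.normal(scores.shape, stddev=1e-6)
scores = scores + noise                          # build a new tensor instead

var_scores = tf.Variable([0.5, 1.5])             # a tf.Variable is mutable,
var_scores.assign_add(noise)                     # so assign_add works here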

xwinxu

comment created time in 7 days

Pull request review commenttomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the docstring line for the reinit argument]

Super nit: typo in regrown

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the docstring line for the seed_offset argument]

Only take a seed in the pruner, don't take a seed_offset. You can pass a seed to the PruningConfig and just let it figure out a seed (+ offsets) to use for each pruner.

Reason: you only ever use seed and seed_offset summed together, so there's no need to leak the higher-level concept of a "separate experiment seed offset" into the pruner itself.
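A hypothetical sketch of that division of responsibility; PruningConfig's real interface isn't shown in this PR, so the names here are assumptions:

class ExamplePruningConfig(object):
  """Holds one experiment-level seed and derives a seed per pruner."""

  def __init__(self, seed=0):
    self._seed = seed
    self._pruners_created = 0

  def _next_seed(self):
    derived = self._seed + self._pruners_created
    self._pruners_created += 1
    return derived

  def make_pruner(self, **kwargs):
    # The pruner only ever sees a single, already-combined seed.
    return RiGLPruner(seed=self._next_seed(), **kwargs)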

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the docstring line for the stateless argument]

Always use stateless; it's strictly better to have well-seeded deterministic pseudo-random numbers than non-deterministic ones.
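For reference, a small standalone example of the stateless style: the same (seed, step) pair always produces the same values, which is what keeps replicas and reruns in sync.

import tensorflow as tf

seed = 42
step = 7
noise_a = tf.random.stateless_normal([3], seed=tf.constant([seed, step]))
noise_b = tf.random.stateless_normal([3], seed=tf.constant([seed, step]))
# noise_a and noise_b are identical; changing seed or step changes both.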

xwinxu

comment created time in 7 days

Pull request review comment tomerk/model-optimization

Implement RIGL Pruner and Tests

[diff context: same RiGLPruner code as above; this comment is anchored at the constructor default: initializer=sparse_utils.PermuteOnes,]

Now that I've had a chance to look at how the code is currently set up:

Either take sparsity and directly build a PermuteOnes initializer when you need it, or take an already-instantiated initializer directly but don't take a sparsity float.
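Sketched out, the two options look roughly like this (PermuteOnes taking the target sparsity as its constructor argument is an assumption based on how create_slots calls it above):

# Option A: keep the sparsity float and build the initializer internally.
rigl = RiGLPruner(sparsity=0.8)          # create_slots would do PermuteOnes(0.8)

# Option B: take an already-instantiated initializer and drop sparsity.
rigl = RiGLPruner(initializer=sparse_utils.PermuteOnes(0.8))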

xwinxu

comment created time in 7 days

issue closed tensorflow/tensorflow

Keras layer weights/sublayers getting deleted when creating a model with them. model.summary() / plot_model still shows those weights as part of graph though

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab environment

  • TensorFlow installed from (source or binary): Google colab default

  • Python version: Python 3, Google colab default

  • CUDA/cuDNN version: Google colab default

  • GPU model and memory: Tested on both Google colab p-100 GPU and CPU

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

2020-06-20 21:44:17.003371: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1 v2.2.0-0-g2b96f3662b 2.2.0

Describe the current behavior

I created a new model using two layers from an old model. However, the layers/weights from the old model are not showing up in the new model.

model.summary() and

tf.keras.utils.plot_model(
    model, to_file='model.png', show_shapes=False, show_layer_names=True,
    rankdir='TB', expand_nested=False, dpi=96
)

still show those weights, so I think they're part of the graph. But when I print them out, those weights/layers are missing altogether.

Describe the expected behavior

All weights from the component layers should be in the model.

Standalone code to reproduce the issue. Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to a Colab/Jupyter/any notebook.

Here is a Colab notebook with a minimal example that reproduces the issue.

https://colab.research.google.com/drive/1n3_XNhdgH6Qo7GT-M570lIKWAoU3TML5?usp=sharing

And here is the code

!pip install transformers --q
%tensorflow_version 2.x

from transformers import TFBertModel, AutoModel, TFRobertaModel, AutoTokenizer
import tensorflow as tf
import tensorflow_addons as tfa

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

from tensorflow import keras
from tensorflow.keras import layers
from copy import deepcopy

logger = tf.get_logger()
logger.info(tf.__version__)


def get_mini_models():
    tempModel = TFRobertaModel.from_pretrained('bert-base-uncased', from_pt=True)

    layer9 = deepcopy(tempModel.layers[0].encoder.layer[8])
    layer10 = deepcopy(tempModel.layers[0].encoder.layer[9])

    inputHiddenVals = tf.keras.Input(shape=[None, None], dtype=tf.float32, name='input_Q',
                                    batch_size=None) 

    hidden1 = layer9((inputHiddenVals, None, None))
    hidden2 = layer10((hidden1[0], None, None))
    modelNew = tf.keras.Model(inputs=inputHiddenVals, outputs=hidden2)

    del tempModel

    return modelNew

@tf.function
def loss_fn(_, probs):
    bs = tf.shape(probs)[0]
    labels = tf.eye(bs, bs)
    return tf.losses.categorical_crossentropy(labels,
                                              probs,
                                              from_logits=True)

model = get_mini_models()
model.compile(loss=loss_fn,
                optimizer=tfa.optimizers.AdamW(weight_decay=1e-4, learning_rate=1e-5, 
                                                epsilon=1e-06))

# Get model and layers directly to compare
tempModel = TFRobertaModel.from_pretrained('bert-base-uncased', from_pt=True)
layer9 = deepcopy(tempModel.layers[0].encoder.layer[8])
layer10 = deepcopy(tempModel.layers[0].encoder.layer[9])

# Only one layer, and that layer also has missing weights. 
for i, var in enumerate(model.weights):
    print(model.weights[i].name)

# Full weights for one layer 
for i, var in enumerate(layer9.weights):
    print(layer9.weights[i].name)

# Test what correct output should be 

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
inputt = tokenizer.encode('This is a sentence', return_tensors='tf')
outt = tempModel(inputt)[0]

# Test model output. Not the same. 

model(outt)

# Model summary somehow lists the weights 
model.summary()

# Model diagram shows the correct connections between all the layers. 

tf.keras.utils.plot_model(
    model, to_file='model.png', show_shapes=False, show_layer_names=True,
    rankdir='TB', expand_nested=False, dpi=96
)

Edit: I also tried making the layers from scratch and setting the weights directly, and got the same result. Here's a Colab notebook that does this. https://colab.research.google.com/drive/1EC_fObSp9lUsj_PFaYgFtRI93ErPYmU9?usp=sharing

And here's the code

!pip install transformers --q
%tensorflow_version 2.x

from transformers import TFBertModel, AutoModel, TFRobertaModel, AutoTokenizer

import tensorflow as tf
import tensorflow_addons as tfa

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers import (Dense,
                                     Dropout)
import numpy as np
import os

logger = tf.get_logger()
logger.info(tf.__version__)

class TFBertSelfAttention2(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        if config.hidden_size % config.num_attention_heads != 0:
            raise ValueError(
                "The hidden size (%d) is not a multiple of the number of attention "
                "heads (%d)" % (config.hidden_size, config.num_attention_heads)
            )

        self.num_attention_heads = config.num_attention_heads
        assert config.hidden_size % config.num_attention_heads == 0
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = tf.keras.layers.Dense(
            self.all_head_size, kernel_initializer=get_initializer(config.initializer_range), name="query_2"
        )
        self.key = tf.keras.layers.Dense(
            self.all_head_size, kernel_initializer=get_initializer(config.initializer_range), name="key_2"
        )
        self.value = tf.keras.layers.Dense(
            self.all_head_size, kernel_initializer=get_initializer(config.initializer_range), name="value_2"
        )

        self.dropout = tf.keras.layers.Dropout(config.attention_probs_dropout_prob)

    def transpose_for_scores(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_attention_heads, self.attention_head_size))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs, training=False):
        hidden_states, attention_mask, head_mask, output_attentions = inputs

        batch_size = shape_list(hidden_states)[0]
        mixed_query_layer = self.query(hidden_states)
        mixed_key_layer = self.key(hidden_states)
        mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer, batch_size)
        key_layer = self.transpose_for_scores(mixed_key_layer, batch_size)
        value_layer = self.transpose_for_scores(mixed_value_layer, batch_size)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = tf.matmul(
            query_layer, key_layer, transpose_b=True
        )  # (batch size, num_heads, seq_len_q, seq_len_k)
        dk = tf.cast(shape_list(key_layer)[-1], tf.float32)  # scale attention_scores
        attention_scores = attention_scores / tf.math.sqrt(dk)

        if attention_mask is not None:
            # Apply the attention mask is (precomputed for all layers in TFBertModel call() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = tf.nn.softmax(attention_scores, axis=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs, training=training)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = tf.matmul(attention_probs, value_layer)

        context_layer = tf.transpose(context_layer, perm=[0, 2, 1, 3])
        context_layer = tf.reshape(
            context_layer, (batch_size, -1, self.all_head_size)
        )  # (batch_size, seq_len_q, all_head_size)

        outputs = (
            (context_layer, attention_probs) if cast_bool_to_primitive(output_attentions) is True else (context_layer,)
        )

        return outputs


class TFBertSelfOutput2(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(
            config.hidden_size, kernel_initializer=get_initializer(config.initializer_range), name="dense2"
        )
        self.LayerNorm = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps, name="LayerNorm2")
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)

    def call(self, inputs, training=False):
        hidden_states, input_tensor = inputs

        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states, training=training)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class TFBertAttention2(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.self_attention = TFBertSelfAttention2(config, name="self2")
        self.dense_output = TFBertSelfOutput2(config, name="output2")

    def prune_heads(self, heads):
        raise NotImplementedError

    def call(self, inputs, training=False):
        input_tensor, attention_mask, head_mask, output_attentions = inputs

        self_outputs = self.self_attention(
            [input_tensor, attention_mask, head_mask, output_attentions], training=training
        )
        attention_output = self.dense_output([self_outputs[0], input_tensor], training=training)
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs


class TFBertIntermediate2(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(
            config.intermediate_size, kernel_initializer=get_initializer(config.initializer_range), name="dense2"
        )
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def call(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states


class TFBertOutput2(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(
            config.hidden_size, kernel_initializer=get_initializer(config.initializer_range), name="dense2"
        )
        self.LayerNorm = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps, name="LayerNorm2")
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)

    def call(self, inputs, training=False):
        hidden_states, input_tensor = inputs

        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states, training=training)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class TFBertLayer2(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.attention = TFBertAttention2(config, name="attention2")
        self.intermediate = TFBertIntermediate2(config, name="intermediate2")
        self.bert_output = TFBertOutput2(config, name="output2")

    def call(self, inputs, training=False):
        hidden_states, attention_mask, head_mask, output_attentions = inputs

        attention_outputs = self.attention(
            [hidden_states, attention_mask, head_mask, output_attentions], training=training
        )
        attention_output = attention_outputs[0]
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.bert_output([intermediate_output, attention_output], training=training)
        outputs = (layer_output,) + attention_outputs[1:]  # add attentions if we output them
        return outputs


class TFBertSelfAttention(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        if config.hidden_size % config.num_attention_heads != 0:
            raise ValueError(
                "The hidden size (%d) is not a multiple of the number of attention "
                "heads (%d)" % (config.hidden_size, config.num_attention_heads)
            )

        self.num_attention_heads = config.num_attention_heads
        assert config.hidden_size % config.num_attention_heads == 0
        self.attention_head_size = int(config.hidden_size / config.num_attention_heads)
        self.all_head_size = self.num_attention_heads * self.attention_head_size

        self.query = tf.keras.layers.Dense(
            self.all_head_size, kernel_initializer=get_initializer(config.initializer_range), name="query_"
        )
        self.key = tf.keras.layers.Dense(
            self.all_head_size, kernel_initializer=get_initializer(config.initializer_range), name="key_"
        )
        self.value = tf.keras.layers.Dense(
            self.all_head_size, kernel_initializer=get_initializer(config.initializer_range), name="value_"
        )

        self.dropout = tf.keras.layers.Dropout(config.attention_probs_dropout_prob)

    def transpose_for_scores(self, x, batch_size):
        x = tf.reshape(x, (batch_size, -1, self.num_attention_heads, self.attention_head_size))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, inputs, training=False):
        hidden_states, attention_mask, head_mask, output_attentions = inputs

        batch_size = shape_list(hidden_states)[0]
        mixed_query_layer = self.query(hidden_states)
        mixed_key_layer = self.key(hidden_states)
        mixed_value_layer = self.value(hidden_states)

        query_layer = self.transpose_for_scores(mixed_query_layer, batch_size)
        key_layer = self.transpose_for_scores(mixed_key_layer, batch_size)
        value_layer = self.transpose_for_scores(mixed_value_layer, batch_size)

        # Take the dot product between "query" and "key" to get the raw attention scores.
        attention_scores = tf.matmul(
            query_layer, key_layer, transpose_b=True
        )  # (batch size, num_heads, seq_len_q, seq_len_k)
        dk = tf.cast(shape_list(key_layer)[-1], tf.float32)  # scale attention_scores
        attention_scores = attention_scores / tf.math.sqrt(dk)

        if attention_mask is not None:
            # Apply the attention mask is (precomputed for all layers in TFBertModel call() function)
            attention_scores = attention_scores + attention_mask

        # Normalize the attention scores to probabilities.
        attention_probs = tf.nn.softmax(attention_scores, axis=-1)

        # This is actually dropping out entire tokens to attend to, which might
        # seem a bit unusual, but is taken from the original Transformer paper.
        attention_probs = self.dropout(attention_probs, training=training)

        # Mask heads if we want to
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

        context_layer = tf.matmul(attention_probs, value_layer)

        context_layer = tf.transpose(context_layer, perm=[0, 2, 1, 3])
        context_layer = tf.reshape(
            context_layer, (batch_size, -1, self.all_head_size)
        )  # (batch_size, seq_len_q, all_head_size)

        outputs = (
            (context_layer, attention_probs) if cast_bool_to_primitive(output_attentions) is True else (context_layer,)
        )

        return outputs


class TFBertSelfOutput(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(
            config.hidden_size, kernel_initializer=get_initializer(config.initializer_range), name="dense"
        )
        self.LayerNorm = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps, name="LayerNorm")
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)

    def call(self, inputs, training=False):
        hidden_states, input_tensor = inputs

        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states, training=training)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class TFBertAttention(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.self_attention = TFBertSelfAttention(config, name="self")
        self.dense_output = TFBertSelfOutput(config, name="output")

    def prune_heads(self, heads):
        raise NotImplementedError

    def call(self, inputs, training=False):
        input_tensor, attention_mask, head_mask, output_attentions = inputs

        self_outputs = self.self_attention(
            [input_tensor, attention_mask, head_mask, output_attentions], training=training
        )
        attention_output = self.dense_output([self_outputs[0], input_tensor], training=training)
        outputs = (attention_output,) + self_outputs[1:]  # add attentions if we output them
        return outputs


class TFBertIntermediate(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(
            config.intermediate_size, kernel_initializer=get_initializer(config.initializer_range), name="dense"
        )
        if isinstance(config.hidden_act, str):
            self.intermediate_act_fn = ACT2FN[config.hidden_act]
        else:
            self.intermediate_act_fn = config.hidden_act

    def call(self, hidden_states):
        hidden_states = self.dense(hidden_states)
        hidden_states = self.intermediate_act_fn(hidden_states)
        return hidden_states


class TFBertOutput(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(
            config.hidden_size, kernel_initializer=get_initializer(config.initializer_range), name="dense"
        )
        self.LayerNorm = tf.keras.layers.LayerNormalization(epsilon=config.layer_norm_eps, name="LayerNorm")
        self.dropout = tf.keras.layers.Dropout(config.hidden_dropout_prob)

    def call(self, inputs, training=False):
        hidden_states, input_tensor = inputs

        hidden_states = self.dense(hidden_states)
        hidden_states = self.dropout(hidden_states, training=training)
        hidden_states = self.LayerNorm(hidden_states + input_tensor)
        return hidden_states


class TFBertLayer(tf.keras.layers.Layer):
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.attention = TFBertAttention(config, name="attention")
        self.intermediate = TFBertIntermediate(config, name="intermediate")
        self.bert_output = TFBertOutput(config, name="output")

    def call(self, inputs, training=False):
        hidden_states, attention_mask, head_mask, output_attentions = inputs

        attention_outputs = self.attention(
            [hidden_states, attention_mask, head_mask, output_attentions], training=training
        )
        attention_output = attention_outputs[0]
        intermediate_output = self.intermediate(attention_output)
        layer_output = self.bert_output([intermediate_output, attention_output], training=training)
        outputs = (layer_output,) + attention_outputs[1:]  # add attentions if we output them
        return outputs

configBase = {
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-05,
  "max_position_embeddings": 514,
  "model_type": "roberta",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 1,
  "type_vocab_size": 1,
  "vocab_size": 50265
}

class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

config = AttrDict(configBase)

def get_initializer(initializer_range=0.02):
    """Creates a `tf.initializers.truncated_normal` with the given range.
    Args:
        initializer_range: float, initializer range for stddev.
    Returns:
        TruncatedNormal initializer with stddev = `initializer_range`.
    """
    return tf.keras.initializers.TruncatedNormal(stddev=initializer_range)


def gelu(x):
    """ Gaussian Error Linear Unit.
    Original Implementation of the gelu activation function in Google Bert repo when initially created.
        For information: OpenAI GPT's gelu is slightly different (and gives slightly different results):
        0.5 * x * (1 + torch.tanh(math.sqrt(2 / math.pi) * (x + 0.044715 * torch.pow(x, 3))))
        Also see https://arxiv.org/abs/1606.08415
    """
    cdf = 0.5 * (1.0 + tf.math.erf(x / tf.math.sqrt(2.0)))
    return x * cdf

ACT2FN = {
    "gelu": tf.keras.layers.Activation(gelu),
}

def shape_list(x):
    """Deal with dynamic shape in tensorflow cleanly."""
    static = x.shape.as_list()
    dynamic = tf.shape(x)
    return [dynamic[i] if s is None else s for i, s in enumerate(static)]

def cast_bool_to_primitive(bool_variable, default_tensor_to_true=False):
    """Function arguments can be inserted as boolean tensor
        and bool variables to cope with keras serialization
        we need to cast `output_attentions` to correct bool
        if it is a tensor
    Args:
        default_tensor_to_true: bool, if tensor should default to True
        in case tensor has no numpy attribute
    """
    # if bool variable is tensor and has numpy value
    if tf.is_tensor(bool_variable):
        if hasattr(bool_variable, "numpy"):
            return bool(bool_variable.numpy())
        elif default_tensor_to_true:
            return True

    # else variable is bool
    return bool_variable

def get_2_transformerLayerP(numb):
    tokenizer = AutoTokenizer.from_pretrained('allenai/biomed_roberta_base')
    inputt = tokenizer.encode('This is a sentence', return_tensors='tf')
    tempModel = TFRobertaModel.from_pretrained('allenai/biomed_roberta_base', from_pt=True)
    outt = tempModel(inputt)[0]

    t_layer11 = TFBertLayer(config, name="layer_._{}".format(11+numb))
    t_layer12 = TFBertLayer2(config, name="layer_._{}".format(12+numb))

    t_layer11((outt, None, None, None))
    t_layer12((outt, None, None, None))

    t_layer11.set_weights( tempModel.layers[0].encoder.layer[10].get_weights() )
    t_layer12.set_weights( tempModel.layers[0].encoder.layer[11].get_weights() )

    t_layer12.intermediate.intermediate_act_fn = tf.keras.activations.tanh

    del tokenizer
    del tempModel

    return t_layer11, t_layer12

def get_mini_models():
    P_trans11, P_trans12 = get_2_transformerLayerP(6)

    inputHiddenVals = tf.keras.Input(shape=[None, None], dtype=tf.float32, name='input_Q',
                                    batch_size=None) 

    P_outputs = P_trans11((inputHiddenVals, None, None, None))[0]
    P_outputsFinal = P_trans12((P_outputs, None, None, None))[0]
    modelNew = tf.keras.Model(inputs=inputHiddenVals, outputs=P_outputsFinal)

    return modelNew

@tf.function
def loss_fn(_, probs):

    bs = tf.shape(probs)[0]
    labels = tf.eye(bs, bs)
    return tf.losses.categorical_crossentropy(labels,
                                              probs,
                                              from_logits=True)

model = get_mini_models()
model.compile(loss=loss_fn,
                optimizer=tfa.optimizers.AdamW(weight_decay=1e-4, learning_rate=1e-5, 
                                                epsilon=1e-06))

for i, var in enumerate(model.trainable_weights):
    print(model.trainable_weights[i].name)

closed time in 7 days

Santosh-Gupta

issue commenttensorflow/tensorflow

Keras layer weights/sublayers getting deleted when creating a model with them. model.summary() / plot_model still shows those weights as part of graph though

The functional API refactoring has landed in the nightlies, so this should now be fixed there.

Santosh-Gupta

comment created time in 7 days

issue closedtensorflow/tensorflow

Memory Leak with latest Tensorflow

System information

  • Have I written custom code: Yes, but minimal
  • OS Platform and Distribution: Windows 10
  • TensorFlow installed from: conda
  • TensorFlow version: 2.3.0
  • Python version: 3.7.4

Current behavior: Memory leaks.

Expected behavior: Memory does not leak.

Code to reproduce the issue:

import numpy as np
import tensorflow
from tensorflow import keras
from tensorflow.keras import layers
import gc
import tracemalloc    
if __name__ == "__main__":
    tracemalloc.start()
    while True:
        inputs = keras.Input(shape=(10,))
        out = layers.Dense(1)(inputs)
        model = keras.Model(inputs=inputs, outputs=out)
        model.compile(optimizer="adam", loss="mse")
        train = np.random.rand(1000,10)
        label = np.random.rand(1000)
        model.fit(train, label)
        gc.collect()
        current, peak = tracemalloc.get_traced_memory()
        print(f"Current memory usage is {current / 10**6}MB; Peak was {peak / 10**6}MB")

closed time in 8 days

fabianvdW

issue commenttensorflow/tensorflow

Memory Leak with latest Tensorflow

Hi @fabianvdW, in TF 2.3 and earlier Keras puts all model construction in the same global background graph workspace, which leads to a memory leak unless you explicitly call keras.backend.clear_session.
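If you need to stay on 2.3 or earlier for now, a minimal sketch of that workaround applied to your repro (same shapes and names as your snippet) looks like this:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

while True:
    # Drop the accumulated global background graph before building the next model.
    keras.backend.clear_session()

    inputs = keras.Input(shape=(10,))
    out = layers.Dense(1)(inputs)
    model = keras.Model(inputs=inputs, outputs=out)
    model.compile(optimizer="adam", loss="mse")
    model.fit(np.random.rand(1000, 10), np.random.rand(1000))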

A few days ago, though, we enabled a refactoring of the Functional API implementation internals in the nightlies which gets rid of this background global graph, so you should no longer see this memory leak there. Please do re-open this issue if you're still seeing one in the nightlies, or if you spot memory leaks elsewhere!

Best, Tomer

fabianvdW

comment created time in 8 days

Pull request review commenttensorflow/community

RFC: TensorFlow Extension Types

[Quoted diff context from the Extension Types RFC, abridged. The review comment below is attached to the section "TensorFlow APIs supported by user-defined types", which ends with:]

* **Keras**: User-defined types can be used as inputs and outputs for Keras `Models` and `Layers`.

There's also examples of this object-oriented dispatch question in the core TF APIs themselves (e.g. for the various tf.linalg.LinearOperator APIs)

The tricky thing is these may not provide any sort of canonical way to serialize/deserialize traced dispatched operations, because they can't be mapped to/from strings directly at the top-level api.

edloper

comment created time in 8 days

issue commenttensorflow/tensorflow

Possible bug(?): tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor X>]

Hi @hoangcuong2011, it's this way because models contain layer objects that they reuse, rather than re-creating the layers each time you call the model. These layers may get called in different settings (individually, as part of the larger model, in a tf.function, totally eagerly, etc.).

We can handle Functional model definition via layer calls because we define __call__ in Keras and have users just override call. So, we can track a variety of metadata under the hood. We use this metadata whenever you call the constructed model to get the model working in the above settings. (e.g. we keep track of what partially-created values need to be forwarded to what)

But, arbitrary layer constructors for custom layers don't give us the same ability to track model structure metadata. Arguably there's maybe stuff we could do with python's __new__ in certain circumstances, but it would be very unreliable and would make for a poor user experience. So, it's the most straightforward to disallow it altogether.

It's generally fairly easy to make sure a layer takes all tensor inputs directly in call as opposed to in the constructor.
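For example, a rough sketch of a layer written that way (a toy layer, not taken from your code):

import tensorflow as tf

class MaskedScale(tf.keras.layers.Layer):
    # Toy layer: the tensor it operates on and its mask both arrive through call().
    def call(self, inputs):
        x, mask = inputs  # all symbolic tensors are call arguments, not constructor arguments
        return x * tf.cast(mask, x.dtype)

word_ids = tf.keras.Input(shape=(None,), dtype="int32")
embeddings = tf.keras.layers.Embedding(100, 8)(word_ids)
mask = tf.cast(tf.not_equal(word_ids, 0), "float32")[..., tf.newaxis]
outputs = MaskedScale()([embeddings, mask])
model = tf.keras.Model(word_ids, outputs)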

hoangcuong2011

comment created time in 11 days

issue commenttensorflow/tensorflow

model.predict is much slower on TF 2.1+

@ectg Did you try tf.function-ing your model.call method before calling your model? As such:

model.call = tf.function(model.call, experimental_relax_shapes=True)
model(..., training=False)

Also, what are the use cases where you all are finding that building a tf.data.Dataset at the start and batch predicting is impractical? Knowing this would help our prioritization.

lihanchen

comment created time in 11 days

issue commenttensorflow/tensorflow

tf.image.ssim_multiscale broke in tensorflow 2.1.0-rc2

@Sajal-1 You can make & use custom layers in Keras, e.g. following this guide: https://www.tensorflow.org/guide/keras/custom_layers_and_models

Or use an arbitrary python lambda as a layer in Keras using this API: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Lambda?hl=en
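For instance, a tiny generic sketch of the Lambda route (hypothetical op, not specific to your metric):

import tensorflow as tf

inputs = tf.keras.Input(shape=(32, 32, 3))
# Any plain TF op can be wrapped as a layer via Lambda.
pooled = tf.keras.layers.Lambda(
    lambda x: tf.nn.avg_pool2d(x, ksize=2, strides=1, padding="SAME"))(inputs)
model = tf.keras.Model(inputs, pooled)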

isaacgerg

comment created time in 12 days

issue closedtensorflow/tensorflow

tf.debugging is not compatible with symbolic Keras tensors

System information

  • Have I written custom code: Yes
  • OS Platform and Distribution: Ubuntu 16.04
  • TensorFlow installed from: binary
  • TensorFlow version: 2.2.0
  • Python version: 3.6.9

Describe the current behavior

Functions from tf.debugging (such as tf.debugging.assert_equal) raise an exception when passing a Keras tensor as argument.

It appears that the functions run the eager code path even when one of the input is a symbolic tensor.

Describe the expected behavior

tf.debugging functions should work with Keras tensors.

Standalone code to reproduce the issue

import tensorflow as tf
x = tf.keras.Input(shape=[5], batch_size=2)
batch_size = tf.shape(x)[0]
tf.debugging.assert_equal(batch_size, 2)

Other info / logs

Traceback (most recent call last):
  File "assert.py", line 4, in <module>
    tf.debugging.assert_equal(batch_size, 2)
  File "/lib/python3.6/site-packages/tensorflow/python/ops/check_ops.py", line 648, in assert_equal_v2
    return assert_equal(x=x, y=y, summarize=summarize, message=message, name=name)
  File "/lib/python3.6/site-packages/tensorflow/python/ops/check_ops.py", line 659, in assert_equal
    data, summarize, message, name)
  File "/lib/python3.6/site-packages/tensorflow/python/ops/check_ops.py", line 334, in _binary_assert
    if condition:
  File "/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 778, in __bool__
    self._disallow_bool_casting()
  File "/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 548, in _disallow_bool_casting
    self._disallow_in_graph_mode("using a `tf.Tensor` as a Python `bool`")
  File "/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 537, in _disallow_in_graph_mode
    " this function with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

closed time in 12 days

guillaumekln

issue commenttensorflow/tensorflow

tf.debugging is not compatible with symbolic Keras tensors

Hi @guillaumekln,

By design you can only use tf APIs as layers in Keras Functional models when the APIs take tensors as input and output tensors. The various tf.debugging APIs return None (or an op that needs to be used as a control dependency), so they do not support Keras inputs.

If you're just looking to do a static shape check here you can statically get the shape instead of getting a symbolic tensor representing the shape by doing:

batch_size = int(x.shape[0])
tf.debugging.assert_equal(batch_size, 2)

On the other hand, if you want to encode a tf.debugging.assert as part of your actual model you can:

  1. Put it in a tf.keras.Lambda layer that returns its inputs (with the debugging assert run as a side effect inside the lambda; see the sketch below), or
  2. Put the tf.debugging call in a custom layer / custom model.
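To make option 1 concrete, here's a rough sketch (hypothetical layer names, reusing the Input from the snippet above):

import tensorflow as tf

def check_batch_size(t):
    # The assert runs as a side effect whenever the model runs; the lambda passes its input through.
    tf.debugging.assert_equal(tf.shape(t)[0], 2, message="expected batch size 2")
    return t

x = tf.keras.Input(shape=[5], batch_size=2)
checked = tf.keras.layers.Lambda(check_batch_size, name="batch_size_check")(x)
outputs = tf.keras.layers.Dense(3)(checked)
model = tf.keras.Model(x, outputs)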
guillaumekln

comment created time in 12 days

issue commenttensorflow/tensorflow

Tensorflow 2.2 takes much more time than 2.1/2.0 to start training with "keras.fit"

Adding @sanjoy who may have more familiarity with the cuda setup than I do.

It also seems related to https://github.com/tensorflow/tensorflow/issues/33002

edwardyehuang

comment created time in 18 days

issue closedtensorflow/tensorflow

Possible bug(?): tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor X>]

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): MacOS High Sierra
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): >= 2.0
  • Python version: 3.6
  • Running on: CPUs (But I guess it happens on GPUs as well)

The following issue is very similar to this one I posted before. The difference here is that I use eager execution in the code and it produces an error of Keras symbolic tensors. To be fair I am not sure this is a bug or the error is on purpose. Here is the code to produce the error:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

class MyWordEmbedding(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(MyWordEmbedding, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(300, 512), dtype='float32')
        super(MyWordEmbedding, self).build(input_shape)  # Be sure to call this at the end
    
    def call(self, inputs):
        return tf.nn.embedding_lookup(params=self.kernel, ids=inputs[0])

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, mask_para, **kwargs):
        self.mask_para = mask_para
        super(EncoderLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.Qdense = self.add_weight(name='Qdense', shape=(512, 512))
        super(EncoderLayer, self).build(input_shape)

    def call(self, x):
        Qoutput = tf.einsum('aij,jk->aik', x[0], self.Qdense)
        Koutput =  tf.einsum('aij,jk->aik', x[0], self.Qdense)
        Voutput =  tf.einsum('aij,jk->aik', x[0], self.Qdense)
        a = tf.einsum('ajk,afk->ajf', Qoutput, Koutput) * tf.tile(K.expand_dims(self.mask_para, axis=1), [1, 64, 1])
        a = tf.matmul(a, Voutput)
        print(a)
        return a

    def compute_mask(self, inputs, mask):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0]

def create_encoder_model():
    word_ids_fr = tf.keras.layers.Input(dtype='int32', shape=(None,))
    a = MyWordEmbedding()([word_ids_fr])
    a = EncoderLayer(K.cast(K.not_equal(0, word_ids_fr), dtype='float32'))([a])
    model = tf.keras.models.Model(inputs=[word_ids_fr], outputs=a)
    return model

def create_model():
    word_ids_en = tf.keras.layers.Input(dtype='int32', shape=(None,))
    a = tf.keras.layers.Input(shape=(None, 512,))
    b = MyWordEmbedding()([word_ids_en])
    b = b + a
    model = tf.keras.models.Model(inputs=[word_ids_en, a], outputs=b)
    return model
    
def evaluate():
    source_sequence_ids = pad_sequences(np.random.randint(5, size=(3, 64)), maxlen=64, padding='pre')
    output = decoder_model.predict([pad_sequences(np.random.randint(5, size=(3, 64)), maxlen=64, padding='post'), encoder_model(source_sequence_ids, training=False)], steps=1, verbose=1, batch_size=256)

decoder_model = create_model()
encoder_model = create_encoder_model()
evaluate()

Error:

tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor 'MatMul:0' shape=(3, 64, 512) dtype=float32>]

Meanwhile, I also provide a way to fix this as follows (just a simple modification):

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras.preprocessing.sequence import pad_sequences
import numpy as np

class MyWordEmbedding(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(MyWordEmbedding, self).__init__(**kwargs)

    def build(self, input_shape):
        self.kernel = self.add_weight(shape=(300, 512), dtype='float32')
        super(MyWordEmbedding, self).build(input_shape)  # Be sure to call this at the end
    
    def call(self, inputs):
        return tf.nn.embedding_lookup(params=self.kernel, ids=inputs[0])

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(EncoderLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.Qdense = self.add_weight(name='Qdense', shape=(512, 512))
        super(EncoderLayer, self).build(input_shape)

    def call(self, x):
        mask_para = x[1]
        Qoutput = tf.einsum('aij,jk->aik', x[0], self.Qdense)
        Koutput =  tf.einsum('aij,jk->aik', x[0], self.Qdense)
        Voutput =  tf.einsum('aij,jk->aik', x[0], self.Qdense)
        a = tf.einsum('ajk,afk->ajf', Qoutput, Koutput) * tf.tile(K.expand_dims(mask_para, axis=1), [1, 64, 1])
        a = tf.matmul(a, Voutput)
        print(a)
        return a

    def compute_mask(self, inputs, mask):
        return mask

    def compute_output_shape(self, input_shape):
        return input_shape[0]

def create_encoder_model():
    word_ids_fr = tf.keras.layers.Input(dtype='int32', shape=(None,))
    a = MyWordEmbedding()([word_ids_fr])
    a = EncoderLayer()([a, K.cast(K.not_equal(0, word_ids_fr), dtype='float32')])
    model = tf.keras.models.Model(inputs=[word_ids_fr], outputs=a)
    return model

def create_model():
    word_ids_en = tf.keras.layers.Input(dtype='int32', shape=(None,))
    a = tf.keras.layers.Input(shape=(None, 512,))
    b = MyWordEmbedding()([word_ids_en])
    b = b + a
    model = tf.keras.models.Model(inputs=[word_ids_en, a], outputs=b)
    return model
    
def evaluate():
    source_sequence_ids = pad_sequences(np.random.randint(5, size=(3, 64)), maxlen=64, padding='pre')
    output = decoder_model.predict([pad_sequences(np.random.randint(5, size=(3, 64)), maxlen=64, padding='post'), encoder_model(source_sequence_ids, training=False)], steps=1, verbose=1, batch_size=256)

decoder_model = create_model()
encoder_model = create_encoder_model()
evaluate()

closed time in 19 days

hoangcuong2011

issue commenttensorflow/tensorflow

Possible bug(?): tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors, but found [<tf.Tensor X>]

Hi @hoangcuong2011 This unfortunately isn't the most helpful of error messages (we can look into trying to provide a more meaningful error message for this in the future once an upcoming refactoring of the internals lands), but it is behaving as expected.

Keras symbolic tensors cannot be passed to a layer's constructor, it is only valid to pass them into a layer's call. Your first example passes the symbolic tensors to the constructor and tries to use them, raising the error message. Your second example makes sure to pass all symbolic tensors as arguments to the layer calls, so it works.

hoangcuong2011

comment created time in 19 days

issue closedtensorflow/tensorflow

Error in TF 2.3.0rc0/1 when mixing eager and non-eager Keras models

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: n/a
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.3.0rc1
  • Python version: 3.6
  • Bazel version (if compiling from source): n/a
  • GCC/Compiler version (if compiling from source): n/a
  • CUDA/cuDNN version: n/a
  • GPU model and memory: n/a

Describe the current behavior

Mixing eager and non-eager models in TensorFlow 2.3.0 results in an error.

Describe the expected behavior

There should be no error, as in TensorFlow<2.3.

Standalone code to reproduce the issue

import tensorflow as tf
import numpy as np

DO_BUG = True

inputs = tf.keras.Input((1,))
outputs = tf.keras.layers.Dense(10)(inputs)
model0 = tf.keras.Model(inputs=inputs, outputs=outputs)

if DO_BUG:
    with tf.Graph().as_default():
        inputs = tf.keras.Input((1,))
        outputs = tf.keras.layers.Dense(10)(inputs)
        model1 = tf.keras.Model(inputs=inputs, outputs=outputs)

model0.compile(optimizer=tf.optimizers.SGD(0.1), loss=tf.losses.mse)
model0.fit(np.zeros((4, 1)), np.zeros((4, 10)))

Other info / logs

Traceback (most recent call last):
  File ".../tmp.py", line 15, in <module>
    model0.fit(np.zeros((4, 1)), np.zeros((4, 10)))
  File "...\tensorflow\python\keras\engine\training_v1.py", line 807, in fit
    use_multiprocessing=use_multiprocessing)
  File "...\tensorflow\python\keras\engine\training_arrays.py", line 666, in fit
    steps_name='steps_per_epoch')
  File "...\tensorflow\python\keras\engine\training_arrays.py", line 189, in model_iteration
    f = _make_execution_function(model, mode)
  File "...\tensorflow\python\keras\engine\training_arrays.py", line 557, in _make_execution_function
    return model._make_execution_function(mode)
  File "...\tensorflow\python\keras\engine\training_v1.py", line 2072, in _make_execution_function
    self._make_train_function()
  File "...\tensorflow\python\keras\engine\training_v1.py", line 2021, in _make_train_function
    **self._function_kwargs)
  File "...\tensorflow\python\keras\backend.py", line 3933, in function
    'eager execution. You passed: %s' % (updates,))
ValueError: `updates` argument is not supported during eager execution. You passed: [<tf.Operation 'training/SGD/SGD/AssignAddVariableOp' type=AssignAddVariableOp>]

closed time in 21 days

drasmuss

issue commenttensorflow/tensorflow

Error in TF 2.3.0rc0/1 when mixing eager and non-eager Keras models

We've added a clearer error message to the new RC. Mixing eager and non-eager models like this risks putting your models/layers in an invalid state. Sometimes this invalid state would happen to still silently run, and sometimes it would raise unclear errors like this one.

It will now raise an explicit error saying that switching between graph vs eager models invalidates all pre-created models.

drasmuss

comment created time in 21 days

issue commenttensorflow/tensorflow

Keras layer weights/sublayers getting deleted when creating a model with them. model.summary() / plot_model still shows those weights as part of graph though

Update on this issue: We weren't able to get a fix into 2.3 for a variety of reasons, but we were able to make 2.3 raise a meaningful error message in this setting rather than silently missing some of the weights.

(The proper fix is still in the nightlies guarded by the Functional API refactoring, as mentioned above.)

Santosh-Gupta

comment created time in 22 days

PR opened tensorflow/tensorflow

Raise an error when some but not all values passed to the first layer…

Raise an error when some but not all values passed to the first call arg of a custom layer are symbolic. This setting can cause functional models to be constructed incorrectly.

This means the setting described in GitHub Issue #40638 will raise an error with an actionable message instead of silently missing weights.

Support for this functionality will be added when we enable the KerasTensors refactoring.

+85 -4

0 comment

2 changed files

pr created time in a month

create branchtomerk/tensorflow

branch : cherrypicks_DC0YA

created branch time in a month

issue commenttensorflow/tensorflow

Keras layer weights/sublayers getting deleted when creating a model with them. model.summary() / plot_model still shows those weights as part of graph though

Layers in a functional API can only output tensors/data structures of tensors.

As a workaround in the same vein though if you don't want to redesign the underlying layers: You can have a layer contain a nested transformer layer, and pass the input args + Nones to the nested layer. You can then use this layer in a functional model w/o issues.
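A rough sketch of that wrapper idea, reusing the TFBertLayer and config names from your snippet (illustrative only, not tested against your exact setup):

class BertLayerWrapper(tf.keras.layers.Layer):
    # Only the hidden states cross the Functional-API boundary; the None
    # placeholders for attention_mask/head_mask/output_attentions stay inside call().
    def __init__(self, config, **kwargs):
        super().__init__(**kwargs)
        self.inner = TFBertLayer(config, name="inner")

    def call(self, hidden_states, training=False):
        return self.inner([hidden_states, None, None, None], training=training)[0]

inputs = tf.keras.Input(shape=(None, config.hidden_size), dtype=tf.float32)
outputs = BertLayerWrapper(config)(inputs)
wrapped_model = tf.keras.Model(inputs, outputs)

Because the wrapper sees only a symbolic tensor in its first argument, functional API construction triggers on the wrapper itself, so its weights (including the nested transformer layer's) are tracked.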

Alternatively if you're okay trying the tf nightlies, you can experiment with the refactoring I mentioned earlier by directly flipping our internal experimental flag:

from tensorflow.python.keras.engine import keras_tensor
keras_tensor.enable_keras_tensors()

I believe it should fix your issue and it should make the functional api generally much more reliable, but like I said it will only be landing in 2.4.

Santosh-Gupta

comment created time in a month

issue commenttensorflow/tensorflow

Keras layer weights/sublayers getting deleted when creating a model with them. model.summary() / plot_model still shows those weights as part of graph though

Ah sorry by outer functional model I just meant the functional model you are building (as opposed to the layers inside of the model).

Essentially rather than constructing the functional model as: inputs -> layer 1 -> layer 2 -> outputs (which is what should be expected),

it ends up inlining all the contents of layer 1 & layer 2: inputs -> first op in layer 1 -> nested sublayer in layer 1 -> etc. -> first op in layer 2 ... -> outputs. And so the model lacks a reference to layer 1 & layer 2 themselves and misses any weights that weren't contained in their subweights.

"For clarity, should sending tuples/lists be avoided when sending to layers, or just Nones?" Arbitrary data structures should be fine (as should data structures that may or may not contain symbolic values in positional/keyword args outside of the first arg).

What you're experiencing is specifically an issue when whatever data structure is passed to the first positional arg happens to contain any item that is not a symbolic functional input/output.

This specific behavior is a historical edge case dating back to when Keras layers only ever accepted a single positional argument that could not be an arbitrary data structure, and all of the inputs had to be symbolic keras inputs/outputs. Unfortunately it's caused this surprising behavior when combined w/ other functionality that has been added since (automatically turning tf op layers into keras layers). So, historically trying to pass in Nones like you're doing would have triggered a (hard to interpret) error message, because TF/Keras wouldn't be able to inline the tf ops inside the functional model when it calls the layer. Now it silently behaves in a way you didn't expect because tf ops can be used during functional API construction.

Santosh-Gupta

comment created time in a month

issue commenttensorflow/tensorflow

Keras layer weights/sublayers getting deleted when creating a model with them. model.summary() / plot_model still shows those weights as part of graph though

Hi @Santosh-Gupta:

Will reply w/ a longer explanation shortly, but it looks like what's going on is: The layers currently enter a 'functional api construction' mode only if all of the inputs in the first argument come from other Keras layers. However, you have None included in the inputs in the first positional arg, so it's not triggering functional api construction.

That causes the layer to get 'inlined' in the outer functional model rather than correctly included. You should be able to work around this by changing the layer API so that Nones do not get passed in.

We have a major cleanup/refactoring of the Functional API mostly done that will fix this & a number of other issues w/ it. But, that will only land in 2.4. It's not immediately obvious if we can squeeze a fix into tf 2.3 as the RC is already out.

Santosh-Gupta

comment created time in a month

create branchtomerk/tensorflow

branch : cherrypicks_ORV3K

created branch time in a month

PR opened tensorflow/tensorflow

Explicitly raise a (clearer) error message when models end up in inva…

…lid states due to interleaving graph and eager.

In rare cases code may have run w/o crashing when in these invalid states, but it's safer to error with an explanation rather than risk silent failures/fragile behavior.

PiperOrigin-RevId: 321192744 Change-Id: I9e97ac3b7cea27c9b389e5202de9f1c09a4aa2b8

+23 -0

0 comment

2 changed files

pr created time in a month

issue commenttensorflow/tensorflow

Keras predict is slow on first call when using variable input shape

Hi @ironbar, are you seeing this with the 2.3 RC?

Keras uses a tf.function to speed up model.predict calls. By default tf.function runs w/ experimental_relax_shapes=False:

https://www.tensorflow.org/api_docs/python/tf/function?version=nightly

This means that the tf.function will retrace every time it sees a new input shape. If experimental_relax_shapes is set to True, tf.function will attempt to make more general traces if it ends up seeing similar but slightly different shapes.

However, in 2.3 Keras should be running with experimental_relax_shapes=True when it wraps the internals in a tf.function.
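If you're on an older version (or calling the model directly rather than through predict), a hedged sketch of the same idea on a toy model:

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Conv2D(4, 3, padding="same")])

# Re-wrap call so varying spatial sizes relax into a shared trace
# instead of retracing for every new input shape.
model.call = tf.function(model.call, experimental_relax_shapes=True)

for size in (64, 96, 128):
    _ = model(tf.zeros((1, size, size, 3)), training=False)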

ironbar

comment created time in a month

push eventtomerk/model-optimization

Tomer Kaftan

commit sha c1690299954dc72d198fa340ec0ff4d4087062cc

Update build files further, split out a pruning_config base class, make tests run in py2 for now while I get my python install fixed

view details

push time in a month

push eventtomerk/model-optimization

Tomer Kaftan

commit sha 3d4c90c298ce36f0d10ff681640da8a26a202c3b

Fix split a base `Pruner` class out of the low magnitude pruner, get the pruner_test running for the low magnitude pruner, start making PrunableModel build

view details

push time in a month

push eventtomerk/model-optimization

Winnie Xu

commit sha f807c7f4f65112017bec1352752c87c2acda1fca

Merge pull request #1 from tensorflow/master Merge upstream

view details

Winnie Xu

commit sha daa03762fc241b89916a7ba7926a3032b8d28220

pruner tests passing

view details

Winnie Xu

commit sha bad40fb905df54d4e7d2e0ab727f1da4a51e169c

Merge pull request #2 from tomerk/tf2-optimizer-prototype merge Tf2 optimizer prototype

view details

Winnie Xu

commit sha f0ae80d195bdc48d3e84dcf8cb13ac95f00b5e99

implement lth and refactor names

view details

Winnie Xu

commit sha e0631d5b41321eb63fb8257cd4ae0c44ee89ccb3

Pruner test WIP

view details

Winnie Xu

commit sha da74c14848f1ff8c7da801f2ae4bc3cf57b32ab1

Add deleted files from pruner commit

view details

Winnie Xu

commit sha c4e6fbc4c2ff2cc57702ee58ce230efa50296c0d

test save weights iter k

view details

Winnie Xu

commit sha b58145cc62cf52ba899bdf528e45ae73f914c7ce

save testReloadWeightsIterationK

view details

Winnie Xu

commit sha 8b9f70d236d7489bd675bb3356274a815207d94a

save WIP

view details

Winnie Xu

commit sha a27225c72aa93e2a23e826b5ecf6b605884f4a8d

tf2.2 error float32 and 64

view details

Winnie Xu

commit sha c07fe58ef1e4c9351bfe9a456d005ae1d7893f63

pruner test save

view details

Winnie Xu

commit sha 58335822a2be23889a603296384bd76efe993f11

updated all tests passing, add tf.function eager

view details

Winnie Xu

commit sha 4e698b3c91333e6c1151f6b6bc1e2072d8e95d17

Resolve all tests and update optimizer

view details

Winnie Xu

commit sha 3cba759fa4599c3bbeba834d3eb69ddc776a5256

fix tests passing

view details

Tomer Kaftan

commit sha 18d3355a9202510dfb67e99b7b3c1da10cadb271

Merge pull request #2 from xwinxu/tf2first LTHPruner implementation and tests

view details

push time in a month

Pull request review commenttomerk/model-optimization

LTHPruner implementation and tests

   def _create_slots(self, var_list):
         pruner.create_slots(self, var)

   def _resource_apply_dense(self, grad, var, apply_state):
+    self.preprocess_weights(self, var, grad)

(since we may modify the gradient)

xwinxu

comment created time in a month

Pull request review commenttomerk/model-optimization

LTHPruner implementation and tests

[Quoted diff context from the new lthpruner_test.py, abridged. The hunk covers the LTH pruner tests (testUpdateSingleMask through testReloadTwoTimes); the review comment below is attached to the end of testReloadTwoTimes:]

    initialization_slot = optimizer.get_slot(weight, "original_initialization")
    self.assertAllEqual(initialization_slot, expected_first_saved_initialization)
Still need to add this check for the weights right after reloading like how you do above

xwinxu

comment created time in a month

Pull request review commenttomerk/model-optimization

LTHPruner implementation and tests

   def _create_slots(self, var_list):
         pruner.create_slots(self, var)

   def _resource_apply_dense(self, grad, var, apply_state):
+    self.preprocess_weights(self, var, grad)
     self._optimizer._resource_apply_dense(grad, var, apply_state)
-    self.prune(var, grad)
+    self.postprocess_weights(self, var, grad)

   def _resource_apply_sparse(self, grad, var, indices, **kwargs):
+    self.preprocess_weights(self, var, grad)

grad = ...

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

   def _create_slots(self, var_list):
         pruner.create_slots(self, var)
   def _resource_apply_dense(self, grad, var, apply_state):
+    self.preprocess_weights(self, var, grad)

grad = self.preprocess_weights(self, var, grad)
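i.e. the wrapper would end up roughly like this (a sketch only; it assumes preprocess_weights is changed to return the possibly rewritten gradient, and the same pattern would apply to _resource_apply_sparse):

  def _resource_apply_dense(self, grad, var, apply_state):
    # Let the pruner rewrite the gradient before the wrapped optimizer's update.
    grad = self.preprocess_weights(self, var, grad)
    self._optimizer._resource_apply_dense(grad, var, apply_state)
    self.postprocess_weights(self, var, grad)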

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

   def testReloadWeightsatInitialization(self):
         block_size=self.block_size,
         block_pooling_type=self.block_pooling_type)
-    optimizer = self.dummy_optimizer
+    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
     optimizer.iterations.assign(0)
     expected_saved_initialization = None
-    def _train(weight):
+    def _train(optimizer, weight):
       expected_saved_initialization = tf.ones_like(weight, dtype=weight.dtype.base_dtype) * -1
+      expected_reload = tf.ones_like(weight, dtype=weight.dtype.base_dtype) * -1
       p.create_slots(optimizer, weight)
       for i in tf.range(0, 2):
         p.preprocess_weights(optimizer, weight, self.grad(weight))
         if i == save_round:
           expected_saved_initialization = weight.read_value()
         weight.assign(tf.math.add(weight, sample_noise(i)))
         p.postprocess_weights(optimizer, weight, self.grad(weight))
+        should_reload = p._reload_schedule._should_prune_in_step(optimizer.iterations,

As written this is checking implementation, not behavior. (It's checking a bunch of private attributes of the pruner and the pruning schedule)

Can we make this test specify a reload_step at the start alongside save_step, and directly compare it to i, like we do for the save round?
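For example (a sketch; reload_round is an assumed constant chosen to match the pruning schedule, and the check only looks at the weight and the public slots):

    save_round = 0
    reload_round = 5  # assumed: the step at which the pruner is expected to reload

    def _train(weight):
      reloaded_weight = None
      expected_reload = None
      p.create_slots(optimizer, weight)
      for i in range(0, reload_round + 1):
        p.preprocess_weights(optimizer, weight, self.grad(weight))
        weight.assign(tf.math.add(weight, sample_noise(i)))
        p.postprocess_weights(optimizer, weight, self.grad(weight))
        if i == reload_round:
          # Behavior, not implementation: compare the weight itself at this step,
          # the same way the save-round check compares against i.
          reloaded_weight = weight.read_value()
          expected_reload = tf.math.multiply(
              optimizer.get_slot(weight, "original_initialization"),
              optimizer.get_slot(weight, "mask"))
        optimizer.iterations.assign_add(1)
      return reloaded_weight, expected_reload

    reloaded_weight, expected_reload = _train(weight)
    self.assertAllEqual(reloaded_weight, expected_reload)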

xwinxu

comment created time in a month


Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

  def testReloadWeightsatInitialization(self):
    ...
    expected_saved_initialization = _train(weight1)
    initialization_slot = optimizer.get_slot(weight1, "original_initialization")
    self.assertAllEqual(initialization_slot, expected_saved_initialization)

    mask_after_pruning = optimizer.get_slot(weight1, "mask").read_value()
    masked_weight_expected = tf.math.multiply(initialization_slot, mask_after_pruning)
    self.assertAllEqual(np.count_nonzero(masked_weight_expected), 97)
    self.assertAllEqual(np.count_nonzero(mask_after_pruning), 97)

    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")
    optimizer.iterations.assign(0)

Also apply this to the other tests I didn't put this comment on.

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

  def testReloadTwoTimes(self):
    ...
    expected_first_saved_initialization, expected_second_saved_initialization = _train(weight)

    self.assertAllEqual(expected_first_saved_initialization, expected_second_saved_initialization)

    initialization_slot = optimizer.get_slot(weight, "original_initialization")
    self.assertAllEqual(initialization_slot, expected_first_saved_initialization)

Let's compare against what the masked values were at the time, instead of just against the initializations.
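i.e. snapshot the masked value inside the loop (a sketch, reusing the loop above; expected_first_masked / expected_second_masked are hypothetical names):

        if i == save_round - 1:
          # get_slot() returns the slot Variable itself, so holding two such
          # references and comparing them later is vacuous; the multiply below
          # materializes a point-in-time snapshot of the masked value.
          expected_first_masked = tf.math.multiply(
              optimizer.get_slot(weight, "original_initialization"),
              optimizer.get_slot(weight, "mask"))
        if i == (save_round - 1) * 2:
          expected_second_masked = tf.math.multiply(
              optimizer.get_slot(weight, "original_initialization"),
              optimizer.get_slot(weight, "mask"))

and then the asserts at the end compare against those snapshots instead.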

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

  def testReloadTwoTimes(self):
    ...
    def _train(weight):
      expected_first_saved_initialization = None
      expected_second_saved_initialization = None
      p.create_slots(optimizer, weight)
      for i in range(0, 13):  # this should save, reload, and update exactly twice
        p.preprocess_weights(optimizer, weight, self.grad(weight))
        if i == save_round - 1:

Let's capture the weight after postprocess_weights (when it will have been reloaded and masked).

(assuming this is the correct step to check against? it might not be?)
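i.e. move the capture below postprocess_weights and read the weight itself (a sketch; whether save_round - 1 is the right step is the open question above):

        p.postprocess_weights(optimizer, weight, self.grad(weight))
        if i == save_round - 1:
          # The reload + mask has just been applied here, so read the weight directly.
          expected_first_saved_initialization = weight.read_value()
        optimizer.iterations.assign_add(1)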

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

Reviewed excerpt (testReloadWeightsIterationK):

  def testReloadWeightsIterationK(self):
    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")

    save_round = 5
    n_rounds = 24
    end_iter = 100
    frequency, prune_ratio_per_round = get_lth_sparsity(
        save_round, n_rounds, self.target_sparsity, end_iter)
    pruning_schedule = make_pruning_schedule(
        1 - prune_ratio_per_round, save_round + 1, end_iter, frequency)

    p = pruner.LTHPruner(
        pruning_schedule=pruning_schedule,
        save_iteration=save_round,
        block_size=self.block_size,
        block_pooling_type=self.block_pooling_type)

    optimizer = self.dummy_optimizer
    optimizer.iterations.assign(0)

    def _train(weight):

Let's iterate for > 7 rounds, but make sure to capture the step right after reloading.
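Something along these lines might work; this is only a sketch, and it assumes the reload performed by preprocess_weights lands on step save_round + 1 (the loop bound and the reloaded_weight name are illustrative, not part of the PR):

    def _train(weight):
      reloaded_weight = tf.zeros_like(weight)
      p.create_slots(optimizer, weight)
      for i in range(10):  # > 7 rounds, so at least one reload is covered
        p.preprocess_weights(optimizer, weight, self.grad(weight))
        if optimizer.iterations == save_round + 1:
          # Capture the weights immediately after the pruner has reloaded them.
          reloaded_weight = weight.read_value()
        weight.assign(tf.math.add(weight, sample_noise(i)))
        p.postprocess_weights(optimizer, weight, self.grad(weight))
        optimizer.iterations.assign_add(1)
      return reloaded_weight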

xwinxu

comment created time in a month

Pull request review comment on tomerk/model-optimization

LTHPruner implementation and tests

Reviewed excerpt (testReloadTwoTimes):

  def testReloadTwoTimes(self):
    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")
    weight_dtype = weight.dtype.base_dtype

    save_round = 5
    n_rounds = 24
    end_iter = 100
    frequency, prune_ratio_per_round = get_lth_sparsity(
        save_round, n_rounds, self.target_sparsity, end_iter)
    pruning_schedule = make_pruning_schedule(
        1 - prune_ratio_per_round, save_round + 1, end_iter, frequency)

    p = pruner.LTHPruner(
        pruning_schedule=pruning_schedule,
        save_iteration=save_round,
        block_size=self.block_size,
        block_pooling_type=self.block_pooling_type)

    optimizer = self.dummy_optimizer
    optimizer.iterations.assign(0)

    def _train(weight):
      expected_first_saved_initialization = None
      expected_second_saved_initialization = None
      p.create_slots(optimizer, weight)
      for i in range(0, 13):  # this should save, reload, and update exactly twice

Iterate for more rounds here as well; just make sure to capture the weights right after each reload.
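One possible shape for that, as a sketch only: it assumes the two reloads land at steps save_round + 1 and save_round + 1 + frequency, and the first_reload/second_reload names are illustrative.

    def _train(weight):
      first_reload = tf.zeros_like(weight)
      second_reload = tf.zeros_like(weight)
      p.create_slots(optimizer, weight)
      for i in range(20):  # more rounds than needed to hit both reload points
        p.preprocess_weights(optimizer, weight, self.grad(weight))
        if optimizer.iterations == save_round + 1:
          first_reload = weight.read_value()   # right after the first reload
        elif optimizer.iterations == save_round + 1 + frequency:
          second_reload = weight.read_value()  # right after the second reload
        weight.assign(tf.math.add(weight, sample_noise(i)))
        p.postprocess_weights(optimizer, weight, self.grad(weight))
        optimizer.iterations.assign_add(1)
      return first_reload, second_reload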

xwinxu

comment created time in a month

Pull request review comment on tomerk/model-optimization

LTHPruner implementation and tests

Reviewed excerpt:

  def testReloadWeightsatInitialization(self):

Nit: capitalize the "at" in the test name.
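That is, the suggested spelling would be:

  def testReloadWeightsAtInitialization(self):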

xwinxu

comment created time in a month

Pull request review comment on tomerk/model-optimization

LTHPruner implementation and tests

Reviewed excerpt (testSaveWeightsIterK):

    mask_after_pruning = optimizer.get_slot(weight, "mask").read_value()
    self.assertAllEqual(np.count_nonzero(mask_after_pruning), 97)

    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")
    optimizer.iterations.assign(0)
    expected_saved_initialization = tf.function(_train)(weight)

Let's make sure to create a brand-new optimizer (w/o any slot variables yet) instead of just setting the step count back to zero.

Make _train take the optimizer as an argument.
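
For reference, a rough sketch of what that would look like in testSaveWeightsIterK (illustrative only, not the actual commit; make_optimizer is a hypothetical helper, and p, sample_noise, and self.grad come from the test file above):

def make_optimizer():
  # Brand-new SGD instance for each training pass, so no slot variables carry over.
  return tf.keras.optimizers.SGD(learning_rate=0.01)

def _train(optimizer, weight):
  # The optimizer is now a parameter instead of a closed-over test fixture.
  p.create_slots(optimizer, weight)
  for i in range(7):
    p.preprocess_weights(optimizer, weight, self.grad(weight))
    weight.assign(tf.math.add(weight, sample_noise(i)))
    p.postprocess_weights(optimizer, weight, self.grad(weight))
    optimizer.iterations.assign_add(1)
  return optimizer.get_slot(weight, "original_initialization")

# The eager pass and the tf.function pass each get their own fresh optimizer.
eager_init = _train(make_optimizer(), tf.Variable(np.linspace(1.0, 100.0, 100)))
graph_init = tf.function(_train)(make_optimizer(), tf.Variable(np.linspace(1.0, 100.0, 100)))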

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

"""Tests for the key functions in lthpruner library."""

# ... Apache 2.0 license header, imports, and helper functions (get_lth_sparsity,
# make_pruning_schedule, sample_noise, _dummy_gradient) elided ...

class PruningTest(test.TestCase, parameterized.TestCase):

  # ... setUp and the test cases preceding testReloadWeightsatInitialization elided ...

  def testReloadWeightsatInitialization(self):
    weight1 = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")

    save_round = 0
    n_rounds = 20
    end_iter = 100
    frequency, prune_ratio_per_round = get_lth_sparsity(save_round, n_rounds, self.target_sparsity, end_iter)
    pruning_schedule = make_pruning_schedule(1 - prune_ratio_per_round, save_round, end_iter, frequency)

    p = pruner.LTHPruner(
        pruning_schedule=pruning_schedule,
        save_iteration=self.save_init,
        block_size=self.block_size,
        block_pooling_type=self.block_pooling_type)

    optimizer = self.dummy_optimizer
    optimizer.iterations.assign(0)
    expected_saved_initialization = None

    def _train(weight):
      expected_saved_initialization = tf.ones_like(weight, dtype=weight.dtype.base_dtype) * -1
      p.create_slots(optimizer, weight)
      for i in tf.range(0, 2):
        p.preprocess_weights(optimizer, weight, self.grad(weight))
        if i == save_round:
          expected_saved_initialization = weight.read_value()
        weight.assign(tf.math.add(weight, sample_noise(i)))
        p.postprocess_weights(optimizer, weight, self.grad(weight))
        optimizer.iterations.assign_add(1)  # save weights right before iteration
      return expected_saved_initialization

    expected_saved_initialization = _train(weight1)
    initialization_slot = optimizer.get_slot(weight1, "original_initialization")
    self.assertAllEqual(initialization_slot, expected_saved_initialization)

    mask_after_pruning = optimizer.get_slot(weight1, "mask").read_value()
    masked_weight_expected = tf.math.multiply(initialization_slot, mask_after_pruning)
    self.assertAllEqual(np.count_nonzero(masked_weight_expected), 97)
    self.assertAllEqual(np.count_nonzero(mask_after_pruning), 97)

    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")
    optimizer.iterations.assign(0)
    expected_saved_initialization = tf.function(_train)(weight)
    initialization_slot = optimizer.get_slot(weight, "original_initialization")
    self.assertAllEqual(initialization_slot, expected_saved_initialization)

    mask_after_pruning = optimizer.get_slot(weight, "mask").read_value()
    masked_weight_expected = tf.math.multiply(initialization_slot, mask_after_pruning)
    self.assertAllEqual(np.count_nonzero(masked_weight_expected), 97)
    self.assertAllEqual(np.count_nonzero(mask_after_pruning), 97)

  def testReloadWeightsIterationK(self):
    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")

    save_round = 5
    n_rounds = 24
    end_iter = 100
    frequency, prune_ratio_per_round = get_lth_sparsity(save_round, n_rounds, self.target_sparsity, end_iter)
    pruning_schedule = make_pruning_schedule(1 - prune_ratio_per_round, save_round + 1, end_iter, frequency)

Let's add a comment explicitly specifying what step # this will prune & reload at. I'm guessing it's the second arg, but it's not immediately obvious from this pruning schedule definition.
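
Something along these lines would cover it (a sketch; whether the reload happens exactly at save_round + 1 is the thing the comment should pin down):

# ConstantSparsity(target_sparsity, begin_step, end_step, frequency): the second
# argument is begin_step, so with save_round = 5 this schedule is expected to
# first prune (and reload the weights saved at iteration 5) at step 6, i.e.
# save_round + 1.
pruning_schedule = make_pruning_schedule(
    1 - prune_ratio_per_round, save_round + 1, end_iter, frequency)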

xwinxu

comment created time in a month

Pull request review comment tomerk/model-optimization

LTHPruner implementation and tests

# ... file context identical to the previous review excerpt, up through the body
# of testReloadWeightsatInitialization, elided ...

    expected_saved_initialization = _train(weight1)
    initialization_slot = optimizer.get_slot(weight1, "original_initialization")
    self.assertAllEqual(initialization_slot, expected_saved_initialization)

    mask_after_pruning = optimizer.get_slot(weight1, "mask").read_value()
    masked_weight_expected = tf.math.multiply(initialization_slot, mask_after_pruning)
    self.assertAllEqual(np.count_nonzero(masked_weight_expected), 97)
    self.assertAllEqual(np.count_nonzero(mask_after_pruning), 97)

    weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")
    optimizer.iterations.assign(0)

Same deal here, make a new optimizer
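
As in the earlier comment, the suggested shape is roughly this (an illustrative sketch, not the actual change; it assumes _train is updated to take the optimizer as an argument):

# Brand-new optimizer with no slot variables, rather than reusing
# self.dummy_optimizer and resetting its iteration counter to zero.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
weight = tf.Variable(np.linspace(1.0, 100.0, 100), name="weights")
expected_saved_initialization = tf.function(_train)(optimizer, weight)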

xwinxu

comment created time in a month
