
keras-team/keras 48920

Deep Learning for humans

keras-team/keras-tuner 1989

Hyperparameter tuning for humans

omalleyt12/speech_commands 1

kaggle speech commands competition

omalleyt12/tensorflow 1

An Open Source Machine Learning Framework for Everyone

omalleyt12/community 0

Stores documents used by the TensorFlow developer community

omalleyt12/elpy 0

Emacs Python Development Environment

omalleyt12/hdi_pip_scripts 0

Scripts for installing Python packages on HDInsight

omalleyt12/keras 0

Deep Learning for humans

omalleyt12/keras-tuner 0

Hyperparameter tuning for humans

omalleyt12/Mask_RCNN 0

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Pull request review comment keras-team/keras-tuner

stop server after finish

 def start_server(oracle):
     port = os.environ['KERASTUNER_ORACLE_PORT']
     server = grpc.server(
         futures.ThreadPoolExecutor(max_workers=1))
+    oracle_servicer = OracleServicer(oracle)
     service_pb2_grpc.add_OracleServicer_to_server(
-        OracleServicer(oracle), server)
+        oracle_servicer, server)
     server.add_insecure_port('{}:{}'.format(ip_addr, port))
     server.start()
     while True:
         # The server does not block otherwise.
-        time.sleep(10)
+        time.sleep(30)
+
+        if oracle_servicer.stop_triggered:
+            while oracle.ongoing_trials:
+                print(f'Stop is triggered. Remaining open trials: {oracle.ongoing_trials}.')

Maybe we should remove this to avoid printing a lot of lines

yixingfu

comment created time in 16 days

push event keras-team/keras-tuner

yixingfu

commit sha 798b6da010ac8b9e2b07aae24ffb47a2f5fe55ea

show only active parameters in summary (#322)

* add test for merge inactive hp
* add condition to Fixed.to_config
* check hp (not scope) cond. on reg.
* add examples to use conditional s.t. only relevant hps are in summary

view details

push time in 16 days

PR merged keras-team/keras-tuner

show only active parameters in summary

Trying to fix #321.

Inactive parameters show up in the summary because in _register(), hp.conditions is not checked before the name-value pair is added to the hps.values dictionary. Hence all hyperparameters, active or not, end up in hps.values. Usually _register() is called by _retrieve(), which already tests the conditions, but when calling merge() the conditions are not tested.
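For intuition, the check described above can be illustrated with a runnable toy sketch. All names here (Condition, register, is_active) are illustrative stand-ins, not the actual kerastuner internals:

```python
class Condition:
    """Toy stand-in for a kerastuner condition: active when the parent
    hyperparameter's registered value is in an allowed set."""
    def __init__(self, name, values):
        self.name, self.values = name, values

    def is_active(self, values):
        return values.get(self.name) in self.values


class HyperParameter:
    """Toy hyperparameter that carries its conditions."""
    def __init__(self, name, conditions=()):
        self.name, self.conditions = name, list(conditions)


def register(values, hp, value):
    # The gist of the fix: only record the value when every condition on
    # the hyperparameter is satisfied by the values registered so far.
    if all(c.is_active(values) for c in hp.conditions):
        values[hp.name] = value
    return values


values = {'num_layers': 2}
active = HyperParameter('units_0', [Condition('num_layers', [2, 3, 4, 5])])
inactive = HyperParameter('units_4', [Condition('num_layers', [5])])
register(values, active, 64)
register(values, inactive, 96)
# 'units_0' is recorded; the inactive 'units_4' stays out of the summary
```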

A test on merge() with an inactive hyperparameter has been added to the unit tests.

While fixing the above problem, it emerged that Fixed values do not carry their conditions in get_config() the way other subclasses of HyperParameter do, which causes a Fixed to lose its conditions when hp is deep-copied in _register(). This has been added to the get_config() method of Fixed.

I also added the code from #321 to the helloworld example as cases 8 and 9 to demonstrate how to get a summary that displays correctly, since this hyperparameter mismatch has been asked about by many people.

+105 -2

0 comment

3 changed files

yixingfu

pr closed time in 16 days

issue closed keras-team/keras-tuner

Summary is showing inactive hyperparameters

Consider the following example, modified from the base case in helloworld.py. By specifying the conditional scope explicitly so that each hyperparameter is used iff it is active, I would expect the summary to print only the relevant units hyperparameters.

def build_model(hp):
    model = keras.Sequential()
    model.add(layers.Flatten(input_shape=(28, 28)))
    min_layers = 2
    max_layers = 5
    for i in range(hp.Int('num_layers', min_layers, max_layers)):
        with hp.conditional_scope('num_layers', list(range(i + 1, max_layers + 1))):
            model.add(layers.Dense(units=hp.Int('units_' + str(i), 32, 256, 32),
                                   activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(
        optimizer=keras.optimizers.Adam(1e-4),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    return model


tuner = RandomSearch(
    build_model,
    objective='val_accuracy',
    max_trials=10,
    executions_per_trial=3,
    directory='test_dir')

tuner.search_space_summary()

tuner.search(x=x,
             y=y,
             epochs=3,
             validation_data=(val_x, val_y))

tuner.results_summary()

However, the summary still shows all hyperparameters created during the search.

closed time in 16 days

yixingfu

Pull request review comment keras-team/keras-tuner

Allow save_freq keyword for search

     def on_epoch_end(self, trial, model, epoch, logs=None):
         pass

     def run_trial(self, trial, *fit_args, **fit_kwargs):
+        save_freq = 'epoch'
+        if 'save_freq' in fit_kwargs:
+            save_freq = int(fit_kwargs.pop('save_freq'))
+
         model_checkpoint = keras.callbacks.ModelCheckpoint(

Hm, maybe instead we could check whether the user passed their own ModelCheckpoint object? If they did, instead of creating our own, we could patch the save directory on that object so that it saves to a different directory each time, and use that?

We do something similar for TensorBoard here: https://github.com/keras-team/keras-tuner/blob/master/kerastuner/engine/tuner.py#L272
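A minimal sketch of that idea, using a stand-in class instead of the real keras.callbacks.ModelCheckpoint (patch_checkpoints is a made-up helper name, not tuner code):

```python
import os


class ModelCheckpoint:
    """Minimal stand-in for keras.callbacks.ModelCheckpoint (filepath only)."""
    def __init__(self, filepath):
        self.filepath = filepath


def patch_checkpoints(callbacks, trial_dir):
    # If the user supplied their own ModelCheckpoint, redirect its save
    # path into a per-trial directory instead of creating a second one.
    for cb in callbacks:
        if isinstance(cb, ModelCheckpoint):
            cb.filepath = os.path.join(trial_dir, os.path.basename(cb.filepath))
    return callbacks


cbs = patch_checkpoints([ModelCheckpoint('ckpt/best.h5')], 'results/trial_0')
# cbs[0].filepath now points inside 'results/trial_0'
```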

yixingfu

comment created time in 17 days

Pull request review comment keras-team/keras-tuner

stop server after finish

 def start_server(oracle):
     port = os.environ['KERASTUNER_ORACLE_PORT']
     server = grpc.server(
         futures.ThreadPoolExecutor(max_workers=1))
+    oracle_servicer = OracleServicer(oracle)
     service_pb2_grpc.add_OracleServicer_to_server(
-        OracleServicer(oracle), server)
+        oracle_servicer, server)
     server.add_insecure_port('{}:{}'.format(ip_addr, port))
     server.start()
     while True:
         # The server does not block otherwise.
-        time.sleep(10)
+        time.sleep(30)
+
+        if oracle_servicer.stop_triggered:
+            while oracle.ongoing_trials:
+                print(f'Stop is triggered. Remaining open trials: {oracle.ongoing_trials}.')
+                time.sleep(10)
+
+            print('Exiting in 10s.')
+            server.stop(10)
+            break

Should we raise an error after stopping the Oracle server?

The reason is, we expect users to write their code like:

tuner = Tuner(...). # Runs an Oracle server if KERASTUNER_TUNER_ID='chief' environment variable is set
tuner.search(...)

Because of this, if the Oracle server stops, the tuner will still try to run tuner.search, but it won't work.

Maybe we should raise a StopIterationError to prevent this? It will have the unfortunate side effect of marking the chief oracle job as failed though, so I'm not sure. Wdyt?
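For intuition, here is a runnable toy model of the shutdown loop under discussion, with stub classes standing in for the real gRPC server, servicer, and Oracle (everything except the loop shape is a stand-in, and the timings are shrunk):

```python
import threading
import time


class StubServicer:
    stop_triggered = False


class StubOracle:
    ongoing_trials = {'tuner0': 'trial_1'}


class StubServer:
    stopped = False

    def stop(self, grace):
        self.stopped = True


def serve_until_stopped(servicer, oracle, server, poll=0.01):
    # Same shape as the loop in the diff: poll for the stop signal,
    # drain any still-open trials, then shut the server down.
    while True:
        time.sleep(poll)
        if servicer.stop_triggered:
            while oracle.ongoing_trials:
                time.sleep(poll)
            server.stop(poll)
            break


servicer, oracle, server = StubServicer(), StubOracle(), StubServer()


def finish():
    oracle.ongoing_trials = {}
    servicer.stop_triggered = True


threading.Timer(0.05, finish).start()
serve_until_stopped(servicer, oracle, server)  # returns once the server stops
```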

yixingfu

comment created time in 17 days

push event keras-team/keras-tuner

yixingfu

commit sha 2f44f35ab1f8c4923600e8f7bd94cab24cede689

minor fix on readme and example (#327)

* use EPOCH const for epoch
* fix examples: num_classes to classes

view details

push time in 17 days

PR merged keras-team/keras-tuner

minor fix on readme and example

In the readme and examples, the hypermodels in applications are called with num_classes, while the actual keyword used is classes. To be consistent, I changed all occurrences of num_classes in the readme to classes.

Also, in the example, an EPOCH constant is created but never used.

+13 -13

0 comment

3 changed files

yixingfu

pr closed time in 17 days

push event keras-team/keras-tuner

Blake

commit sha c23537d4bddd6cefb73b53d88dacaad1b48fc06f

Fixed hyperparameter handles boolean value (#306)

view details

push time in 17 days

PR merged keras-team/keras-tuner

Fixed hyperparameter handles boolean value

resolves #305

bool is a subclass of int, so it should be checked before integer types.

>>> int.__subclasses__()
[<class 'bool'>]

+23 -1
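A minimal illustration of the ordering issue (value_type is a made-up helper, not the actual kerastuner code): because isinstance(True, int) is True, the bool branch has to come before the int branch.

```python
def value_type(value):
    # bool must be checked before int: isinstance(True, int) is True,
    # so checking int first would classify Fixed(True) as the integer 1.
    if isinstance(value, bool):
        return 'boolean'
    if isinstance(value, int):
        return 'int'
    if isinstance(value, float):
        return 'float'
    return 'string'


value_type(True)   # 'boolean', not 'int'
value_type(1)      # 'int'
```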

0 comment

2 changed files

blakey22

pr closed time in 17 days

issue closed keras-team/keras-tuner

Fixed hyperparameter treats True/False as 0/1

I found that the Fixed hyperparameter always converts a boolean value to an integer. It also happens when accessing the .values field.

import kerastuner as kt

hp = kt.HyperParameters()
fixed = hp.Fixed('fixed', True)

assert fixed is True
assert hp.values['fixed'] is True

closed time in 17 days

blakey22

PR merged keras-team/keras-tuner

Add metric tracking and logging

Improves usability by tracking and displaying metrics such as the best score and elapsed time during a hyperparameter search.

+114 -511

0 comment

9 changed files

Aviously

pr closed time in 17 days

issue comment tensorflow/tensorflow

[Regression] on_train_batch_begin callbacks with no batch number and size

@KTTrev Thanks for the issue!

on_train_batch_begin no longer receives any keys in the logs. Could you explain your use case? I can advise on workarounds.

KTTrev

comment created time in 18 days

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def on_epoch_end(self, epoch, logs=None):

 # TODO: Add more extensive display.
 class Display(object):

-    def __init__(self, verbose=1):
+    def __init__(self, oracle, verbose=1):
         self.verbose = verbose
+        self.oracle = oracle

     def on_trial_begin(self, trial):
         if self.verbose >= 1:
-            display.section('Starting new trial')
+            print()
+            trial_number = self.oracle.get_trial_number(trial)
+            total_trials = self.oracle.max_trials or '?'
+            print('Search: Running Trial {}/{}'.format(trial_number, total_trials))
+            print()
+
+            self.trial_start = time.time()
+
+            template = "{0:20}|{1:10}|{2:20}"
+            best_trials = self.oracle.get_best_trials()
+            if len(best_trials) > 0:
+                best_trial = best_trials[0]
+            else:
+                best_trial = None
+            print(template.format('Hyperparameter', 'Value', 'Best Value So Far'))
+            if trial.hyperparameters.values:
+                for hp, value in trial.hyperparameters.values.items():
+                    best_value = str(best_trial.hyperparameters.values.get(hp)) if best_trial else '?'
+                    print(template.format(hp, str(value), best_value))
+            else:
+                print('default configuration')
+            print()

     def on_trial_end(self, trial):
         if self.verbose >= 1:
-            display.section('Trial complete')
-            trial.summary()
+            if IS_NOTEBOOK:
+                display.clear_output()
+            else:
+                print()  # Separate with a newline
+
+            trial_number = self.oracle.get_trial_number(trial)
+            total_trials = self.oracle.max_trials or '?'
+
+            time_taken_str = self.format_time(time.time() - self.trial_start)
+            print('Trial {}/{} Complete [{}]'.format(trial_number, total_trials, time_taken_str))
+
+            if trial.score is not None:
+                print('{}: {}'.format(self.oracle.objective.name, trial.score))
+
+            best_trials = self.oracle.get_best_trials()
+            if len(best_trials) > 0:
+                best_score = best_trials[0].score
+            else:
+                best_score = None
+            print('Best {} So Far: {}'.format(self.oracle.objective.name, best_score))
+
+            time_remaining = self.oracle.get_time_remaining()

Let's just show the time the last trial took, and the total time since starting. Getting the total number of trials remaining from the chief Oracle adds too much complexity IMO
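A minimal sketch of the kind of format_time helper the display above relies on (illustrative, not the PR's actual implementation; good for durations under 24 hours):

```python
import time


def format_time(seconds):
    # Render an elapsed duration like '01h 02m 05s' for the trial display.
    # gmtime wraps at 24h, so this sketch only covers sub-day durations.
    return time.strftime('%Hh %Mm %Ss', time.gmtime(int(seconds)))


format_time(3725)  # '01h 02m 05s'
```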

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def _random_values(self):
             break
         return values

+    def get_trial_number(self, trial):

Same here

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def on_epoch_end(self, epoch, logs=None):

 # TODO: Add more extensive display.
 class Display(object):

-    def __init__(self, verbose=1):
+    def __init__(self, oracle, verbose=1):
         self.verbose = verbose
+        self.oracle = oracle

     def on_trial_begin(self, trial):
         if self.verbose >= 1:
-            display.section('Starting new trial')
+            print()
+            trial_number = self.oracle.get_trial_number(trial)
+            total_trials = self.oracle.max_trials or '?'
+            print('Search: Running Trial {}/{}'.format(trial_number, total_trials))
+            print()
+
+            self.trial_start = time.time()
+
+            template = "{0:20}|{1:10}|{2:20}"
+            best_trials = self.oracle.get_best_trials()
+            if len(best_trials) > 0:
+                best_trial = best_trials[0]
+            else:
+                best_trial = None
+            print(template.format('Hyperparameter', 'Value', 'Best Value So Far'))
+            if trial.hyperparameters.values:
+                for hp, value in trial.hyperparameters.values.items():
+                    best_value = str(best_trial.hyperparameters.values.get(hp)) if best_trial else '?'
+                    print(template.format(hp, str(value), best_value))
+            else:
+                print('default configuration')
+            print()

     def on_trial_end(self, trial):
         if self.verbose >= 1:
-            display.section('Trial complete')
-            trial.summary()
+            if IS_NOTEBOOK:
+                display.clear_output()
+            else:
+                print()  # Separate with a newline
+
+            trial_number = self.oracle.get_trial_number(trial)
+            total_trials = self.oracle.max_trials or '?'
+
+            time_taken_str = self.format_time(time.time() - self.trial_start)
+            print('Trial {}/{} Complete [{}]'.format(trial_number, total_trials, time_taken_str))
+
+            if trial.score is not None:

Let's move this below the "Best so far" part, and maybe call it something like "Last val_accuracy:"

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def on_epoch_end(self, epoch, logs=None):

 # TODO: Add more extensive display.
 class Display(object):

-    def __init__(self, verbose=1):
+    def __init__(self, oracle, verbose=1):
         self.verbose = verbose
+        self.oracle = oracle

     def on_trial_begin(self, trial):
         if self.verbose >= 1:
-            display.section('Starting new trial')
+            print()
+            trial_number = self.oracle.get_trial_number(trial)
+            total_trials = self.oracle.max_trials or '?'
+            print('Search: Running Trial {}/{}'.format(trial_number, total_trials))

Let's just show the number of Trials this worker has completed, by keeping a running tally in tuner_utils.Display

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def results_summary(self, num_trials=10):
         Args:
             num_trials (int, optional): Number of trials to display.
                 Defaults to 10.
-            sort_metric (str, optional): Sorting metric, when not specified
-                sort models by objective value. Defaults to None.
         """
         display.section('Results summary')

Discussed offline, let's delete display and replace with print statements

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def __getattr__(self, name):
             'objective',
             'max_trials',
             'allow_new_entries',
-            'tune_new_entries'}
+            'tune_new_entries',
+            'get_trial_number',

Discussed offline, let's not attempt to track trials remaining right now given that this will involve more distributed work

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def __init__(self,
         # trial_id -> Trial
         self.trials = {}
+        # trial_id -> Trial Number
+        self.trial_number = {}

Let's remove these changes as we can hold off on the ETA display for now

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 # Check if we are in a ipython/colab environement
-try:

Let's try and delete this whole file and replace usages of display with simple print statements

We can move the IPython checking logic to kerastuner/utils.py

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 import math
 import numpy as np
 import six
+import time

 import tensorflow as tf
 from tensorflow import keras

 from ..abstractions import display

+IS_NOTEBOOK = display.is_notebook()
+if IS_NOTEBOOK:
+    from IPython import display

This seems to be overriding the same display from above

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def __getattr__(self, name):
             'objective',
             'max_trials',
             'allow_new_entries',
-            'tune_new_entries'}
+            'tune_new_entries',
+            'get_trial_number',

If we end up allowing access here, we'll have to add Protobuf methods here:

https://github.com/keras-team/keras-tuner/blob/master/kerastuner/protos/service.proto

Let's discuss offline

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 # Check if we are in a ipython/colab environement
-try:
-    class_name = get_ipython().__class__.__name__
-    if "Terminal" in class_name:
-        IS_NOTEBOOK = False
-    else:
-        IS_NOTEBOOK = True
-
-except NameError:
-    IS_NOTEBOOK = False
+def is_notebook():
+    try:
+        class_name = get_ipython().__class__.__name__
+        if "Terminal" in class_name:
+            return False
+        else:
+            return True
+
+    except NameError:
+        return False
+IS_NOTEBOOK = is_notebook()

Should add two blank lines before IS_NOTEBOOK (two blank lines between top-level definitions is standard)

Aviously

comment created time in a month

Pull request review comment keras-team/keras-tuner

Add metric tracking and logging

 def results_summary(self, num_trials=10):
         Args:
             num_trials (int, optional): Number of trials to display.
                 Defaults to 10.
-            sort_metric (str, optional): Sorting metric, when not specified
-                sort models by objective value. Defaults to None.
         """
         display.section('Results summary')

Does this still work?

Aviously

comment created time in a month

push event keras-team/keras-tuner

Makoto Uchida

commit sha 8fc1694a95b9ed21926e52e557c5e71813b406cf

Pass down tuner_id from Tuner class constructor (#269)

view details

push time in a month

PR merged keras-team/keras-tuner

Pass down tuner_id from Tuner class constructor

There was an unused, undocumented tuner_id argument to the Tuner class constructor. I believe this was meant to override tuner_id, if given.

+3 -0

1 comment

1 changed file

ucdmkt

pr closed time in a month

push event keras-team/keras-tuner

Haifeng Jin

commit sha 405ecb79c42c2bc25de1864763da2cb8c0577fed

lock sphinx version (#319)

view details

push time in a month

PR merged keras-team/keras-tuner

lock sphinx version

Lock the sphinx version to 3.0.4. Version 3.1.0 crashes during the tests.

+1 -0

0 comment

1 changed file

haifeng-jin

pr closed time in a month

push event keras-team/keras-tuner

Haifeng Jin

commit sha 2fc8d8829a70d723fe3a3ff7e05242fafed986fa

do not delete best epoch checkpoint (#318)

view details

push time in a month

pull request comment keras-team/keras-tuner

do not delete best epoch checkpoint

Thanks for the PR! Merging now

haifeng-jin

comment created time in a month

issue comment tensorflow/tensorflow

AttributeError: 'Tensor' object has no attribute '_datatype_enum'

@tomerk this is likely something KerasTensors fixes

nbro

comment created time in a month

pull request comment keras-team/keras-tuner

Fix #284. Parent condition should not be checked.

@yixingfu Thanks for the PR!

Could you provide some more context on what issue this is solving?

yixingfu

comment created time in a month

Pull request review comment keras-team/keras-io

metric learning eg

+"""
+Title: Metric learning using crossentropy
+Author: [Mat Kelcey](https://twitter.com/mat_kelcey)
+Date created: 2020/06/05
+Last modified: 2020/06/05
+Description: Example of using metric learning using crossentropy on synthetic data.
+"""
+"""
+## Overview
+
+Metric learning aims to train models that can embed inputs into a high-dimensional space
+such that "similar" inputs, as defined by the training scheme, are located close to each
+other. These models once trained can produce embeddings for downstream systems where such
+similarity is useful; examples include as a ranking signal for search or as a form of
+pretrained embedding model for another supervised problem.
+
+For a more detailed overview of metric learning see:
+
+* [What is metric learning?](http://contrib.scikit-learn.org/metric-learn/introduction.html)
+* ["Using crossentropy for metric learning" tutorial](https://www.youtube.com/watch?v=Jb4Ewl5RzkI)
+"""
+
+"""
+## Setup
+"""
+
+import random
+import matplotlib.pyplot as plt
+import numpy as np
+import tensorflow as tf
+from collections import defaultdict
+from PIL import Image
+from sklearn.metrics import ConfusionMatrixDisplay
+from tensorflow import keras
+from tensorflow.keras import layers
+
+"""
+## Dataset
+
+For this example we will be using the
+[CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset.
+"""
+
+from tensorflow.keras.datasets import cifar10
+
+(x_train, y_train), (x_test, y_test) = cifar10.load_data()
+
+x_train = x_train.astype("float32") / 255.0
+y_train = np.squeeze(y_train)
+x_test = x_test.astype("float32") / 255.0
+y_test = np.squeeze(y_test)
+
+x_train.shape, y_train.shape, x_test.shape, y_test.shape
+
+"""
+To get a sense of the dataset we can visualise a grid of 25 random examples.
+"""
+
+height_width = 32
+
+
+def show_collage(examples):
+    box_size = height_width + 2
+    num_rows, num_cols = examples.shape[:2]
+
+    collage = Image.new(
+        mode="RGB",
+        size=(num_cols * box_size, num_rows * box_size),
+        color=(250, 250, 250),
+    )
+    for row_idx in range(num_rows):
+        for col_idx in range(num_cols):
+            array = (np.array(examples[row_idx, col_idx]) * 255).astype(np.uint8)
+            collage.paste(
+                Image.fromarray(array), (col_idx * box_size, row_idx * box_size)
+            )
+
+    # Double size for visualisation.
+    collage = collage.resize((2 * num_cols * box_size, 2 * num_rows * box_size))
+    return collage
+
+
+# Show a collage of 5x5 random images.
+sample_idxs = np.random.randint(0, 50000, size=(5, 5))
+examples = x_train[sample_idxs]
+show_collage(examples)
+
+"""
+Metric learning provides training data not as explicit `(X, y)` pairs but instead uses
+multiple instances that are related in the way we want to express similarity. In our
+example we will use instances of the same class to represent similarity; a single
+training instance will not be one image, but a pair of images of the same class. When
+referring to the images in this pair we'll use the common metric learning names of the
+`anchor` (a randomly chosen image) and the `positive` (another randomly chosen image of
+the same class).
+"""
+
+"""
+To facilitate this we need to build a form of lookup that maps from classes to the
+instances of that class. When generating data for training we will sample from this
+lookup.
+"""
+
+class_idx_to_train_idxs = defaultdict(list)
+for y_train_idx, y in enumerate(y_train):
+    class_idx_to_train_idxs[y].append(y_train_idx)
+
+class_idx_to_test_idxs = defaultdict(list)
+for y_test_idx, y in enumerate(y_test):
+    class_idx_to_test_idxs[y].append(y_test_idx)
+
+"""
+For this example we are using the simplest approach to training; a batch will consist of
+`(anchor, positive)` pairs spread across the classes. The goal of learning will be to
+move the anchor and positive pairs closer together and further away from other instances
+in the batch. In this case the batch size will be dictated by the number of classes; for
+CIFAR-10 this is 10.
+"""
+
+num_classes = 10
+
+
+class AnchorPositivePairs(keras.utils.Sequence):
+    def __init__(self, num_batchs):
+        self.num_batchs = num_batchs
+
+    def __len__(self):
+        return self.num_batchs
+
+    def __getitem__(self, _idx):
+        x = np.empty((2, num_classes, height_width, height_width, 3), dtype=np.float32)
+        for class_idx in range(num_classes):
+            examples_for_class = class_idx_to_train_idxs[class_idx]
+            anchor_idx = random.choice(examples_for_class)
+            positive_idx = random.choice(examples_for_class)
+            while positive_idx == anchor_idx:
+                positive_idx = random.choice(examples_for_class)
+            x[0, class_idx] = x_train[anchor_idx]
+            x[1, class_idx] = x_train[positive_idx]
+        return x
+
+
+"""
+We can visualise a batch in another collage. The top row shows randomly chosen anchors
+from the 10 classes, the bottom row shows the corresponding 10 positives.
+"""
+
+for examples in AnchorPositivePairs(num_batchs=1):
+    pass
+
+show_collage(examples)
+
+"""
+## Embedding model
+
+We define a custom model with a `train_step` that first embeds both anchors and positives
+and then uses their pairwise dot products as logits for a softmax.
+"""
+
+
+class EmbeddingModel(keras.Model):
+    def train_step(self, data):
+        data = tf.reshape(data, (2, num_classes, height_width, height_width, 3))
+        anchors, positives = data[0], data[1]

I think using the recently exposed (in tf-nightly) tf.keras.utils.unpack_x_y_sample_weight is the best solution here:

def train_step(self, data):
  x, _, _ = tf.keras.utils.unpack_x_y_sample_weight(data)

This will always work for all data formats Model.fit accepts
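For readers without tf-nightly handy, here is a pure-Python sketch of the contract that tf.keras.utils.unpack_x_y_sample_weight provides (a simplified mimic, not the TensorFlow implementation): it always yields an (x, y, sample_weight) triple, padding missing entries with None.

```python
def unpack_x_y_sample_weight(data):
    # Simplified model of the tf.keras.utils helper's contract:
    # accept x, (x,), (x, y), or (x, y, sample_weight) and always
    # return an (x, y, sample_weight) triple.
    if isinstance(data, tuple):
        if len(data) == 1:
            return data[0], None, None
        if len(data) == 2:
            return data[0], data[1], None
        if len(data) == 3:
            return data
    return data, None, None


x, y, sw = unpack_x_y_sample_weight(('inputs', 'targets'))
# x == 'inputs', y == 'targets', sw is None
```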

matpalm

comment created time in a month

pull request comment keras-team/keras-tuner

Fix #284. Parent condition should not be checked.

@qlzh727 Yep I'll take a look now, thanks!

yixingfu

comment created time in a month

issue closed tensorflow/tensorflow

Multiple step predict seems to be wrong

The original data of x_test has a non-linear random walk property,

but the newly predicted 20 values have a linear shape.

Clearly something is off. Did I make the prediction incorrectly?

You can see the full source here: https://colab.research.google.com/drive/1kk24KjpZQEZpdlBxr4D4DO-IGHJ0439v?usp=sharing

My tf version is 2.1.0 and Python is 3.7.7.

closed time in a month

Lay4U

issue comment tensorflow/tensorflow

Multiple step predict seems to be wrong

@Lay4U Thanks for the issue!

This just seems like a model quality issue. Here is a simple example showing that predict works correctly:

import numpy as np
import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
  def build(self, _):
    self.v = tf.Variable(2.)

  def call(self, x):
    return self.v * x

model = tf.keras.Sequential([MyLayer()])
model.compile('sgd', 'mse')
model.predict(x=np.arange(10).astype(np.float32))

Lay4U

comment created time in a month

push eventkeras-team/keras-tuner

Avichal Goel

commit sha d3597610946ce8f4fa7ab1949cc4444590dfdb4f

Control trial output verbosity (#312) * Control trial output verbosity * Self-contained logging logic * Add test for logging

view details

push time in a month

PR merged keras-team/keras-tuner

Control trial output verbosity

Allows individual trial outputs to be suppressed during a search by specifying verbose=0.

+30 -4

1 comment

3 changed files

Aviously

pr closed time in a month

pull request commentkeras-team/keras-tuner

Control trial output verbosity

Failures are unrelated, merging now

Aviously

comment created time in a month

push eventkeras-team/keras-tuner

Haifeng Jin

commit sha a820f78e8c2f51b28e4fc5d5e94ff1f1f7104eb3

Update README.md (#313)

view details

push time in a month

Pull request review commentkeras-team/keras-tuner

Control trial output verbosity

 def search(self, *fit_args, **fit_kwargs):
                 # Oracle is calculating, resend request.
                 continue
-            self.on_trial_begin(trial)
+            self.on_trial_begin(trial, verbose)

I think rather than adding arguments to on_trial_begin and on_trial_end, it might make more sense to do self._display.verbose = verbose and then have the display contain the logic for what it should/shouldn't print

That way the logging logic is more self-contained in the display object. WDYT?

Aviously

comment created time in a month

issue commentkeras-team/keras-tuner

What is the purpose of the ordered keyword for Choice params?

@ben-arnao Thanks for the issue!

The ordered keyword is used to distinguish between categorical and discrete hyperparameters

As you mentioned, none of our current Oracles treat these differently. However, there is an advanced Oracle coming that calls into Google Cloud for very sophisticated optimization algorithms, and this Oracle uses this info (also it's possible for users to write their own Oracle that uses this)

As far as the BayesianOptimizationOracle goes, agreed it's possible that the ordering of a categorical param can affect the results. I think the only way around this, though, would be to run the search multiple times with different orderings. Drawing random values in prob->value seems dangerous to me, but it may work for your use case
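To illustrate the prob->value mapping mentioned above, here is a minimal sketch (the function name `prob_to_value` is hypothetical, not the Keras Tuner internal API):

```python
def prob_to_value(prob, values):
    """Map a uniform sample in [0, 1) to one of `values` via equal-width bins.

    Illustrative sketch: when `values` is an *ordered* (discrete) Choice,
    nearby probabilities map to nearby values, which is what would let an
    optimizer exploit the ordering; for an unordered Choice the neighborhood
    structure is arbitrary.
    """
    index = min(int(prob * len(values)), len(values) - 1)
    return values[index]
```

With an ordered Choice like `[16, 32, 64]`, a small perturbation of the probability moves to an adjacent value, which is the property an optimizer can exploit.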

ben-arnao

comment created time in a month

issue commentkeras-team/keras-tuner

FR: TensorBoard HP visualization

@rmgogogo Thanks for the FR!

Agreed, we have some plans for this, we're hoping to land this integration around the time TF2.3 is released

rmgogogo

comment created time in a month

issue closedkeras-team/keras-tuner

keras tuner for keras functional models

Is it possible to use keras tuner in keras functional models?

closed time in a month

amirzlq5

issue commentkeras-team/keras-tuner

keras tuner for keras functional models

@amirzlq5 Thanks for the issue!

Yes it's possible to use Keras Tuner with any Keras model type, please see the linked example in the comment above

amirzlq5

comment created time in a month

issue commenttensorflow/models

Evaluation/Finetuning of Resnet 50 in TF 2.X

@peri044 Thanks for the issue!

The handle for tensorflow.python.distribute.input_lib.DistributedDatasetsFromFunction is added inside TF-nightly I know for sure. @omalleyt12 Is it added for TF 2.2?

Unfortunately it looks like that support didn't make it into 2.2. At head we handle distributed datasets: code

In TF2.2, we expect model.fit to be passed a non-distributed dataset, and then we call tf.distribute.Strategy.distribute_dataset on it

peri044

comment created time in 2 months

issue closedtensorflow/tensorflow

Dataset iterating different behavior in TF 2.1 and 2.2

System information

  • OS Platform and Distribution: Windows 10 Home
  • TensorFlow versions:
    • 2.1: v2.1.0-rc2-17-ge5bf8de410 2.1.0
    • 2.2: v2.2.0-rc4-8-g2b96f3662b 2.2.0
  • Python version: 3.7.6

I was not sure how to report this issue as it might be bug or just expected behavior. There are difference in TF 2.1 and 2.2. This is a code snippet to reproduce my issue:

import math
import numpy as np
import tensorflow as tf

# simple dataset with zeros
batch_size = 32
features = np.zeros((10000, 60, 2))
labels = np.zeros((10000, 1))
train_data = tf.data.Dataset.from_tensor_slices((features, labels)).batch(batch_size)
train_steps = int(math.ceil(features.shape[0] / batch_size))

# simple model with Dense layers
inputs = tf.keras.Input(shape=(features[0].shape[0], features[0].shape[1]))
x = tf.keras.layers.Dense(32, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(1, activation="relu")(x)
model = tf.keras.Model(inputs, outputs, name="example_model")

# model fitting
model.compile(loss="mse", optimizer="adam", metrics=["mse"])
model.fit(train_data, epochs=100, steps_per_epoch=train_steps)

When I run this code in TF2.1 it produces this error: https://pastebin.com/4M43SE44 After the first epoch, there are warnings about end of sequence, that my input ran out of data. And finally, as you can see in the pasted output, it raises ValueError: Empty training data. When I change the dataset creation line to

train_data = tf.data.Dataset.from_tensor_slices((features, labels)).batch(batch_size).repeat()

then everything works as expected.

This is behavior I would expect. (Note the steps_per_epoch attribute, as I want to control this myself; of course when I do not use repeat and steps_per_epoch is set to None, it works under TF2.1 as it iterates the whole dataset every epoch.)

When I run the same code with TF2.2 (no repeat, train_steps are specified) it works without any issue. Is this behavior intentional? Why does it work in TF2.2 and not 2.1? Could anyone elaborate on this issue?

closed time in 2 months

sondracek

issue commenttensorflow/tensorflow

Dataset iterating different behavior in TF 2.1 and 2.2

@sondracek Thanks for the issue!

Yep this is something we added support for in 2.2

We can sometimes know the exact size of the Dataset you pass in. If we can know this size, and you pass the exact size in steps_per_epoch, then we will assume that you meant for us to recreate the Iterator each epoch.

This seems to me to be the most intuitive behavior, since what we do if you don't pass steps_per_epoch is we infer it to be the entire size of the Dataset
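In code, that inference matches the `train_steps` the snippet above computes by hand:

```python
import math

def infer_steps_per_epoch(dataset_size, batch_size):
    # Sketch of the inference described above: with a batched dataset of
    # known size, the number of steps per epoch is the number of batches,
    # including the trailing partial batch.
    return math.ceil(dataset_size / batch_size)
```

For the 10000-sample dataset batched by 32 in this issue, that gives 313 steps, which is exactly the value being passed as steps_per_epoch.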

Closing as intended behavior, but please re-open if you think this behavior is confusing and should be changed, I don't have a strong opinion either way

sondracek

comment created time in 2 months

issue closedtensorflow/tensorflow

keras training parameter value incorrect

tensorflow ver: 2.1.0

import tensorflow as tf
import numpy as np

class MyModel(tf.keras.Model):

    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
        self.dropout = tf.keras.layers.Dropout(0.5)

    def call(self, inputs, training=None):
        x = self.dense1(inputs)
        if training is True:
            print("in training")
            x = self.dropout(x, training=training)
        elif training is None:
            print("training None")
        else:
            print("not in training")
            
        return self.dense2(x)

model = MyModel()

optimizer = tf.keras.optimizers.Adam(1e-4)
loss = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer, loss)
x = tf.random.normal((5,))
y = tf.ones((5,))
model.fit(x, y, epochs=1)

The system reports "not in training" first, then "in training".

closed time in 2 months

w19787

issue commenttensorflow/tensorflow

keras training parameter value incorrect

@w19787 Thanks for the issue!

I fixed the example to pass batched tensors for x and y (IIUC these were just dummy values to show the error, not the root cause of the error).

It is now passing in TF2.2 for me:

import tensorflow as tf
import numpy as np

class MyModel(tf.keras.Model):

    def __init__(self):
        super(MyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(4, activation=tf.nn.relu)
        self.dense2 = tf.keras.layers.Dense(5, activation=tf.nn.softmax)
        self.dropout = tf.keras.layers.Dropout(0.5)

    def call(self, inputs, training=None):
        x = self.dense1(inputs)
        if training is True:
            print("in training")
            x = self.dropout(x, training=training)
        elif training is None:
            print("training None")
        else:
            print("not in training")
            
        return self.dense2(x)

model = MyModel()

optimizer = tf.keras.optimizers.Adam(1e-4)
loss = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer, loss)
x = tf.random.normal((1, 5))
y = tf.ones((1, 5))
model.fit(x, y, epochs=1)

Closing bc I can't repro, but if you are still seeing this issue with TF2.2 please reopen!

w19787

comment created time in 2 months

issue closedtensorflow/tensorflow

Customized loss function requires eager tensor, but symbolic tensor is passed

<em>Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tensorflow-gpu==2.2.0
  • Python version:3.6
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10.1/7.6.5
  • GPU model and memory: Quadro RTX 8000 / 48GB

You can collect some of this information using our environment capture script You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)" v2.2.0-rc4-8-g2b96f3662b 2.2.0

Describe the current behavior I need to apply a binary mask to the model output for computing loss. My current implementation uses a model that takes two inputs (the data and the mask), and use function closure to implement the customized loss.

However, this raises the error "tensorflow.python.eager.core._SymbolicException: Inputs to eager execution function cannot be Keras symbolic tensors".

Apparently, the mask input is treated as symbolic tensor.

Describe the expected behavior

This only happens in the eager mode. Apply disable_eager_execution() will eliminate the problem. However, I want to know if there is any way to make this work in the eager mode.

Standalone code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem. If possible, please share a link to Colab/Jupyter/any notebook.

This is the gist

import tensorflow as tf
import numpy as np

import tensorflow.keras.backend as K

from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras import Model

x_data = np.zeros((32, 28, 28))
x_mask = np.zeros((32, 10))
y = np.zeros((32, 10))

input_data = Input(shape=(28, 28))
input_mask = Input(shape=(10,))

output= Flatten()(input_data)
output = Dense(64, activation='relu')(output)
output = Dense(10)(output)
model = Model(inputs=[input_data, input_mask], outputs=output)


def custom_loss():
    def loss(y_true, y_pred):
        # This line causes the error
        return K.mean(K.square(y_true - y_pred * model.inputs[1]), axis=-1)   

        # This line doesn't cause the error
        # return K.mean(K.square(y_true - y_pred), axis=-1)   
    return loss


model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss=custom_loss(),
    metrics=['accuracy'])

for i in range (2):
    print(i)
    model.train_on_batch([x_data, x_mask], y)

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

closed time in 2 months

chuanli11

issue commenttensorflow/tensorflow

Customized loss function requires eager tensor, but symbolic tensor is passed

@chuanli11 Thanks for the issue!

Yes, the model.inputs are symbolic Tensors. For Functional Models, these Tensors are used to build the Model with a static graph, but in eager mode the Model is then executed with a tf.function. This means that model.inputs can't be used in losses, metrics, etc.

Instead, here's how I'd recommend achieving your use case. Essentially, sample_weight will handle this, rather than trying to access the mask as a keras.Input:

import tensorflow as tf
import numpy as np

from tensorflow.keras.layers import Input, Flatten, Dense
from tensorflow.keras import Model

x_data = np.zeros((32, 28, 28))
x_mask = np.zeros((32, 10))
y = np.zeros((32, 10))

input_data = Input(shape=(28, 28))
output= Flatten()(input_data)
output = Dense(64, activation='relu')(output)
output = Dense(10)(output)
model = Model(inputs=input_data, outputs=output)


class MyLoss(tf.keras.losses.Loss):
  def call(self, y_true, y_pred):
      return (y_true - y_pred) ** 2  


model.compile(
    optimizer=tf.keras.optimizers.SGD(),
    loss=MyLoss(name='loss'),
    metrics=['accuracy'])

for i in range (2):
    print(i)
    loss = model.train_on_batch(x_data, y, sample_weight=x_mask)

Hope that helps!

Closing out as this is intended behavior

chuanli11

comment created time in 2 months

pull request commenttensorflow/tensorflow

Add __reduce_ex__ to Keras Model to enable copy.deepcopy and pickle

@adriangb Thanks for the PR!

I really like the overall idea in this PR, and IMO pickle and copy.deepcopy is something we should definitely support. We actually had some tests for this but they don't have enough coverage

I have a few worries with the implementation here:

(1) This will only work with Functional API models. IMO we need a solution that works for user-subclassed Models as well
(2) We're locking ourselves into an implicit serialization format for Model that turns it into a 3-tuple of (Model.get_config(), compiled_state, weights)

What I'd really like to see is an approach to pickle that uses the best of SavedModel and the best of pickle. That is, an implementation that delegates saving of Checkpoints, tf.function traces, etc to SavedModel, but is able to faithfully recreate the Python state using pickle.

Ideally I think it would be something like this:

When pickling:

(1) Extract all the things SavedModel can handle well (tf.Variables, tf.functions, etc) (2) Save those things using SavedModel format (3) Remove those things from the Model (4) Pickle the stripped Model

When unpickling:

(1) Unpickle the stripped Model (2) Load the SavedModel saved file corresponding to the pickle (3) Restore those objects into the stripped Model

This would make sure that you could pickle an object in one DistributionStrategy context and load it in another. It would also work with user-subclassed Models
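The round trip above can be sketched with the standard pickle reduce protocol. This is a toy stand-in, not the real Keras implementation: anything SavedModel handles well (variables, traced tf.functions) would be saved separately, and only plain-Python state goes through pickle. The class and attribute names here are hypothetical:

```python
import pickle

class StrippedModel:
    # Toy stand-in for the "stripped Model" described above. Only plain
    # Python state is pickled; heavyweight TF objects would live in a
    # SavedModel referenced by `saved_model_path`.
    def __init__(self, config, saved_model_path):
        self.config = config
        self.saved_model_path = saved_model_path

    def __reduce__(self):
        # On unpickling, recreate the object from its picklable state;
        # restoring weights from `saved_model_path` would happen afterwards.
        return (self.__class__, (self.config, self.saved_model_path))

restored = pickle.loads(pickle.dumps(StrippedModel({'units': 64}, '/tmp/model')))
```

The real work, of course, is in steps (1)-(3): deciding which attributes SavedModel owns and stripping/restoring them reliably.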

I'm not sure how feasible this approach is though, wdyt?

@k-w-w for SavedModel and serialization knowledge

adriangb

comment created time in 2 months

issue commenttensorflow/models

i think it is because the eval process the source language and it concat target language output errors .

@paulrich1234 Different batch sizes are ok, but different sizes for the second dimension (36 and 56 in this example) won't work because Model.predict concatenates batches into one Tensor along the batch dimension.

To support outputs from Model.predict that have different sizes in non-batch dimensions, the Model outputs would have to be RaggedTensors
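The constraint can be demonstrated with plain NumPy, using batch shapes like the ones reported in this thread (a padding workaround is shown as one possible fix; whether padding is acceptable depends on the model):

```python
import numpy as np

batch_a = np.zeros((32, 178))
batch_b = np.zeros((32, 175))

# Concatenating along the batch axis requires every other axis to match,
# which is exactly the constraint Model.predict's output stacking imposes.
try:
    np.concatenate([batch_a, batch_b], axis=0)
    mismatched_ok = True
except ValueError:
    mismatched_ok = False

# Padding both batches to a common length makes stacking possible again.
target = max(batch_a.shape[1], batch_b.shape[1])
padded_b = np.pad(batch_b, ((0, 0), (0, target - batch_b.shape[1])))
stacked = np.concatenate([batch_a, padded_b], axis=0)
```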

paulrich1234

comment created time in 2 months

issue commenttensorflow/models

i think it is because the eval process the source language and it concat target language output errors .

This could be due to the Model.predict rewrite

Q: are the expected return values here Tensors or RaggedTensors? It seems to be trying to concat two Tensors of shape [32, 178] and [32, 175] along the first dimension, which will fail because the second dimensions are not equal

Are these shapes expected?

paulrich1234

comment created time in 2 months

push eventkeras-team/keras-tuner

Haifeng Jin

commit sha deef71222908e17685714c5994bf9d6fab036957

add on_fit_begin for AutoKeras to override (#299) * add on_fit_begin * Update tuner.py * add test and change func to private * update tests

view details

push time in 2 months

PR merged keras-team/keras-tuner

add on_fit_begin for AutoKeras to override

Overriding this function would support preprocessing layers in AutoKeras.

Not sure why the tests fail. I don't think this would break any test.

+62 -0

2 comments

3 changed files

haifeng-jin

pr closed time in 2 months

pull request commentkeras-team/keras-tuner

add on_fit_begin for AutoKeras to override

Failures are unrelated, merging

haifeng-jin

comment created time in 2 months

Pull request review commentkeras-team/keras-tuner

add on_fit_begin for AutoKeras to override

 def on_train_begin(self, logs):
                  callbacks=[logging_callback])
     assert len(callback_instances) == 6
+
+
+def test_on_train_begin_existence(tmp_dir):

I think we should test that this method is actually called in run_trial, i.e. something like:

class MyTuner(Tuner):
  def _on_train_begin(self, ...):
    self.was_called = True

tuner = MyTuner()
tuner.run_trial(...)
assert tuner.was_called
haifeng-jin

comment created time in 2 months

Pull request review commentkeras-team/keras-tuner

add on_fit_begin for AutoKeras to override

 def run_trial(self, trial, *fit_args, **fit_kwargs):
         copied_fit_kwargs['callbacks'] = callbacks

         model = self.hypermodel.build(trial.hyperparameters)
+        self.on_fit_begin(model, trial.hyperparameters, *fit_args, **copied_fit_kwargs)
         model.fit(*fit_args, **copied_fit_kwargs)

+    def on_fit_begin(model, hp, *fit_args, **fit_kwargs):

Discussed offline, a private hook is good for now

haifeng-jin

comment created time in 2 months

pull request commentkeras-team/keras-tuner

add on_fit_begin for AutoKeras to override

Thanks for the PR!

haifeng-jin

comment created time in 2 months

Pull request review commentkeras-team/keras-tuner

add on_fit_begin for AutoKeras to override

 def run_trial(self, trial, *fit_args, **fit_kwargs):
         copied_fit_kwargs['callbacks'] = callbacks

         model = self.hypermodel.build(trial.hyperparameters)
+        self.on_fit_begin(model, trial.hyperparameters, *fit_args, **copied_fit_kwargs)
         model.fit(*fit_args, **copied_fit_kwargs)

+    def on_fit_begin(model, hp, *fit_args, **fit_kwargs):

Do we want to commit to supporting a hook here, or should we ask autokeras to override run_trial to support this?

I'm worried that it will be somewhat confusing when to supply a Callback with on_train_begin versus when to subclass Tuner, and we will probably have a native solution for preprocessing layers soon

haifeng-jin

comment created time in 2 months

issue closedtensorflow/tensorflow

tf.keras.models.Model.fit strange behaviour after upgrading from 2.1 to 2.2-rc1

I was experimenting with tf.keras.applications.inception_v3.InceptionV3 for classifying skin cancer lesions. It was going smooth since when Colaboratory decided to upgrade its VM's TF version from 2.1 to 2.2-rc1.

Now when loading the model from disk it says:

WARNING:tensorflow:Error in loading the saved optimizer state. As a result, your model is starting with a freshly initialized optimizer.

But most importantly, Model.fit is no longer able to properly process a keras.utils.Sequence! Indeed, during training the steps per epoch are no longer inferred from the Sequence object, showing 1/Unknown; it also does not actually terminate the epoch!

Snippet to reproduce the issue: Colab code: https://colab.research.google.com/drive/1wdlWES83ibvLCwzpHrhJsQBNep-Aycqj HAM10000 dataset: https://www.kaggle.com/kmader/skin-cancer-mnist-ham10000 ISIC dataset: https://www.isic-archive.com/#!/topWithHeader/wideContentTop/main

The code stopped working overnight after Colab updated their VM images.

In the code the HAM dataset is automatically downloaded and rearranged, provided you have a kaggle.json file with the API Token. The ISIC dataset needs to be downloaded manually from their official website or from my GDrive here

NB: Even if I create a new network and start training it with a Sequence, the steps per epoch are not inferred either.

closed time in 2 months

lamba92

issue commenttensorflow/tensorflow

tf.keras.models.Model.fit strange behaviour after upgrading from 2.1 to 2.2-rc1

Thanks for the issue! This should be fixed in 2.2

lamba92

comment created time in 2 months

issue commenttensorflow/tensorflow

[Regression] batch_begin/end callbacks no longer get batch number and size

2 questions on the code: Why is the flatten required? Isn't y_true a Tensor with the first dim being the batch size already?

It's to handle the case where a multi-output Model is used (just to be robust). If your Model only has 1 output it's not needed. tf.nest.flatten will turn any nested Python structure (tuple, list, dict, etc) into a flat list

What is the identity for? It's a no-op isn't it?

Just to make sure we're not returning the tf.Variable directly. It's probably ok to do this, but this protects from accidental modification

Flamefire

comment created time in 2 months

issue commenttensorflow/tensorflow

[Regression] batch_begin/end callbacks no longer get batch number and size

Yep this should work in TF2.1 as well, here's a full example (had to fix the metric code a bit):

import tensorflow as tf

class Count(tf.keras.metrics.Metric):
  def __init__(self, name=None, dtype=None, **kwargs):
    super(Count, self).__init__(name, dtype, **kwargs)
    self.count = tf.Variable(0)

  def update_state(self, y_true, y_pred, sample_weight=None):
    first_tensor = tf.nest.flatten(y_true)[0]
    batch_size = tf.shape(first_tensor)[0]
    self.count.assign_add(batch_size)

  def result(self):
    return tf.identity(self.count)


class PrintInfo(tf.keras.callbacks.Callback):
  def on_train_batch_end(self, batch, logs):
    print('Batch number: {}'.format(batch))
    print('Samples seen this epoch: {}'.format(logs['counter']))

model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
model.compile(optimizer='sgd', loss='mse', metrics=[Count(name='counter')])
x, y = tf.ones((10, 10)), tf.ones((10, 1))
model.fit(x, y, batch_size=2, callbacks=[PrintInfo()], verbose=2)
Flamefire

comment created time in 2 months

issue commenttensorflow/tensorflow

[Regression] batch_begin/end callbacks no longer get batch number and size

I have a callback counting the number of examples processed (for further statistics later). In general the batch size may not be constant (e.g. the trailing batch), so passing the batch size to the callback and only counting batches duplicates the batch size and is hence error-prone.

I think this is best handled by a custom Metric:

class Counter(tf.keras.metrics.Metric):
  def __init__(self, name=None, dtype=None, **kwargs):
    super(Counter, self).__init__(name, dtype, **kwargs)
    self.count = tf.Variable(0)

  def update_state(self, y_true, y_pred, sample_weight=None):
    batch_size = tf.shape(tf.nest.flatten(y_true)[0])[0]
    self.count.assign_add(batch_size)

  def result(self):
    return tf.identity(self.count)

This is surprising as it is a breaking change not mentioned in the release notes and contradicts the above linked documentation and even the current docstring

Good point, we need to update the docstring

Flamefire

comment created time in 2 months

issue commenttensorflow/tensorflow

[Regression] batch_begin/end callbacks no longer get batch number and size

@Flamefire Thanks for the issue! For batch number, you can use the batch argument to those methods: on_train_batch_begin(batch, logs)

The batch size is no longer passed to the logs, this is expected due to a rewrite of Model.fit. Could you explain your use case? I can advise on how this can be achieved with the new implementation

Flamefire

comment created time in 2 months

issue closedtensorflow/tensorflow

Unwanted tf.function retracing when using variable-length inputs

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): 2.2.0rc2
  • Python version: 3.6.8

Describe the current behavior

A lot of warnings saying that there is a tf.function retracing are happening when using a keras model in a loop with variable length inputs.

Describe the expected behavior

I would like not to have retracing if there is no need (for example a fully convolutionnal model).

Standalone code to reproduce the issue

from random import randint

import tensorflow as tf
from tensorflow.keras.layers import Conv1D
from tensorflow.keras.models import Sequential

model = Sequential()
model.add(Conv1D(8, 3))
model.build([None, 12, 1])

predict_tensors = [
    tf.random.normal([randint(1, 8), randint(4, 40), 1])
    for _ in range(10)
]
for t in predict_tensors:
    _ = model.predict(t)

Other info / logs

Logs:

WARNING: Logging before flag parsing goes to stderr.
W0406 09:22:52.525994 139643050075904 def_function.py:598] 5 out of the last 6 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f00a7fc1268> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
W0406 09:22:52.615050 139643050075904 def_function.py:598] 6 out of the last 7 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f00a7fc1268> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
W0406 09:22:52.653312 139643050075904 def_function.py:598] 7 out of the last 8 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f00a7fc1268> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.
W0406 09:22:52.706550 139643050075904 def_function.py:598] 8 out of the last 10 calls to <function Model.make_predict_function.<locals>.predict_function at 0x7f00a7fc1268> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings is likely due to passing python objects instead of tensors. Also, tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. Please refer to https://www.tensorflow.org/tutorials/customization/performance#python_or_tensor_args and https://www.tensorflow.org/api_docs/python/tf/function for more details.

This issue was originally described here, and some other people have had trouble with training as well.

When switching back to 2.1, the problem is gone.

closed time in 2 months

zaccharieramzi

issue commenttensorflow/tensorflow

Unwanted tf.function retracing when using variable-length inputs

@zaccharieramzi Thanks for the issue! This should be fixed in the latest nightly

zaccharieramzi

comment created time in 2 months

issue closedtensorflow/tensorflow

overriding `make_train_function` does not work.

System information

  • TensorFlow version (you are using): 2.2.X
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state. Reference: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py

Overriding the train_step function of the Model class works well. But if you want to go a bit deeper down and override make_train_function then it does not work.

In the document, it says that it is possible

This method can be overridden to support custom training logic.

But when you do that, it gives some errors. I tried to import what it asks for in the errors but it still wants more to import.

Here is a sample code based on the recent François Chollet's notebook:

https://colab.research.google.com/drive/1pQX2pjXSU1AU182LuQge7xNeBJcFA3yb

Here you see that when you override it, it gives an error.

Will this change the current api? How? It can.

Who will benefit with this feature? Everybody who wants to write a customized training loop while taking advantage of the Keras fit.

As François Chollet says:

"what if you need a custom training algorithm, but you still want to benefit from the convenient features of fit(), such as callbacks, built-in distribution support, or step fusing?"

closed time in 2 months

mmalekzadeh

issue commenttensorflow/tensorflow

overriding `make_train_function` does not work.

@mmalekzadeh Thanks for the issue!

Looks like there's two things going on in the example provided:

(1) The code is using code copy & pasted from Model.make_train_function. The default Model.make_train_function uses internal private TF APIs. To override, you will have to use public APIs

(2) The code is using the version at HEAD, which contains code that doesn't exist in 2.2

In general, I'd recommend only overriding Model.make_train_function if you want very low-level control over the tf.function that gets created and used in Model.fit. This should only be used for very advanced use cases. Overriding Model.train_step should be enough for most use cases. If Model.make_train_function is overridden, you will have to return a tf.function that accepts a tf.data.Iterator and returns a dict of logs, and you should construct this tf.function using public APIs available in TF2.2
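The contract described above (a callable that consumes an iterator and returns a dict of logs) can be sketched without any TensorFlow at all. `TinyModel` and its averaging "train step" are purely illustrative:

```python
class TinyModel:
    # Minimal no-TensorFlow sketch of the make_train_function contract:
    # return a callable that takes a data iterator and returns a dict of logs.
    def make_train_function(self):
        def train_function(iterator):
            batch = next(iterator)
            loss = sum(batch) / len(batch)  # stand-in for a real train step
            return {'loss': loss}
        return train_function

train_fn = TinyModel().make_train_function()
logs = train_fn(iter([[1.0, 2.0, 3.0]]))
```

In the real override, `train_function` would be a tf.function built from public TF2.2 APIs, but the shape of the contract is the same.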

mmalekzadeh

comment created time in 2 months

issue closedtensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): "18.04.1 LTS (Bionic Beaver)"
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: no
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tf.VERSION = 1.12.0
  • Python version: python3.6
  • Bazel version (if compiling from source): no
  • GCC/Compiler version (if compiling from source): no
  • CUDA/cuDNN version: cuda9.0 with cuDNN 7.4.1
  • GPU model and memory: GTX 1080 with 8 GB

You can collect some of this information using our environment capture script You can also obtain the TensorFlow version with python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the current behavior I am trying to pass tfrecords read via the tf.data.Dataset API into model.fit. Since the images can be of different sizes, I store the image shapes in the tfrecords themselves; they are later read back and applied to the image data using tf.reshape. But tensorflow.keras is unable to determine the shape of the image data at this stage and throws the error below.

def _parse_function(proto):
    keys_to_features = {"im_path": tf.FixedLenSequenceFeature([], tf.string, allow_missing=True),
                        "im_shape": tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                        "im_arr": tf.FixedLenSequenceFeature([], tf.string, allow_missing=True),
                        "label": tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                        }

    parsed_features = tf.parse_single_example(serialized=proto, features=keys_to_features)
    parsed_features['im_arr'] = parsed_features['im_arr'][0]
    parsed_features['label'] = parsed_features['label'][0]
    parsed_features['im_arr'] = tf.decode_raw(parsed_features['im_arr'], tf.uint8)
    parsed_features['im_arr'] = tf.reshape(parsed_features['im_arr'], parsed_features['im_shape'])

    return parsed_features['im_arr'], parsed_features['label']

The error thrown is as follows:

Traceback (most recent call last):
  File "issue/IssueScript.py", line 75, in <module>
    verbose=1)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1117, in _standardize_weights
    exception_prefix='input')
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 284, in standardize_input_data
    data = [standardize_single_array(x) for x in data]
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 284, in <listcomp>
    data = [standardize_single_array(x) for x in data]
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 218, in standardize_single_array
    if x.shape is not None and len(x.shape) == 1:
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 579, in __len__
    raise ValueError("Cannot take the length of Shape with unknown rank.")
ValueError: Cannot take the length of Shape with unknown rank.

So as a debugging step, I removed the length check in the standardize_single_array function by changing the condition as follows (note the `False and` part, which bypasses the length check):

  if x is None:
    return None
  if False and (x.shape is not None and len(x.shape) == 1):
    if tensor_util.is_tensor(x):
      return array_ops.expand_dims(x, axis=1)
    else:
      return np.expand_dims(x, 1)
  return x

Then I get the following error

Traceback (most recent call last):
  File "issue/IssueScript.py", line 75, in <module>
    verbose=1)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1536, in fit
    validation_split=validation_split)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 992, in _standardize_user_data
    class_weight, batch_size)
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1154, in _standardize_weights
    exception_prefix='target')
  File "opt/github/example-classification/env/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 323, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected activation_4 to have 2 dimensions, but got array with shape (None,)

I did the same with this error: I commented out the length check at line 323, as follows.

        """
        if len(data_shape) != len(shape):
          raise ValueError('Error when checking ' + exception_prefix +
                           ': expected ' + names[i] + ' to have ' +
                           str(len(shape)) + ' dimensions, but got array '
                           'with shape ' + str(data_shape))
        """

Now the training proceeds smoothly without error. I believe there is an issue with tf.reshape when tensors are supplied as the shape argument.

Code to reproduce the issue: https://github.com/dineshdharme/tensorflow-issue1. Just run: python3 issue/IssueScript.py

I have also added a tfrecords-generating script, tfrecords_utils.py. To generate a tfrecords file from the image data in the data folder, run: python3 issue/tfrecords_utils.py

closed time in 2 months

dineshdharme

issue comment tensorflow/tensorflow

"ValueError: Cannot take the length of Shape with unknown rank". error when passing tf.data.Dataset tensors to model.fit

@dineshdharme Thanks for the issue!

Apologies for the delay; tf.keras's built-in training loops just went through a major rewrite in order to support custom training steps out of the box, and many fixes were blocked on this rewrite.

This is now fixed at head, here's an example of passing Tensors of unknown rank to Model.fit:

import tensorflow as tf
import numpy as np

def my_numpy_fn(x, y):
  return -x, -y

features = np.arange(10).astype(np.float32)
labels = 2 * features
ds = tf.data.Dataset.from_tensor_slices((features, labels))
# Do a transformation that loses rank information.
ds = ds.map(
    lambda x, y: tf.numpy_function(
        my_numpy_fn, inp=[x, y], Tout=[tf.float32, tf.float32]
        )
    ).batch(2)

assert ds.element_spec[0].shape == tf.TensorShape(None)

# Model works with Tensors of unknown rank.
# Note that if your Model uses layers like `Dense`, etc. that
# only work with certain ranks, you should still use `x.set_shape`
# before passing the data to that layer, to give the Model a hint
# about the rank.
class MyModel(tf.keras.Model):
  def call(self, x):
    return 2 * x

model = MyModel()
model.compile('sgd', 'mse')
model.fit(ds)

For more info on the rewrite, please check out the 2.2 release notes
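To make the `x.set_shape` hint in the comment above concrete, here is a minimal sketch (the `RankHintModel` name is made up for illustration):

```python
import tensorflow as tf

class RankHintModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        # Layers like Dense need rank information, so restore it here
        # before the data reaches the layer.
        x.set_shape([None, 1])
        return self.dense(x)

model = RankHintModel()
y = model(tf.ones([2, 1]))  # Dense now sees a rank-2 input
```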

dineshdharme

comment created time in 2 months

issue comment tensorflow/tensorflow

TF 2.2 - Train_step output control

@ManiadisG Thanks for the update!

I think we should definitely fix ProgbarLogger, History, and TensorBoard so that they only try to handle log items that are scalars.

Re adding another output to Model.train_step, I'm wary of adding another return value to the function. One way I've seen people handle this kind of use case is to smuggle the value out by assigning it to a tf.Variable, something like:

import numpy as np
import tensorflow as tf

BATCH_SIZE = 2

class MyModel(tf.keras.Model):

  def __init__(self):
    super(MyModel, self).__init__()
    self.outputs = tf.Variable(tf.zeros((BATCH_SIZE, 10)),
                               trainable=False)
    self.layer = tf.keras.layers.Dense(10)

  def call(self, inputs):
    return self.layer(inputs)

  def train_step(self, data):
    x, y = data
    with tf.GradientTape() as tape:
      y_pred = self(x)
      self.outputs.assign(y_pred)
      loss = self.compiled_loss(
          y, y_pred, regularization_losses=self.losses)
    trainable_variables = self.trainable_variables
    gradients = tape.gradient(loss, trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, trainable_variables))
    self.compiled_metrics.update_state(y, y_pred)
    return {m.name: m.result() for m in self.metrics}

class MyCallback(tf.keras.callbacks.Callback):
  def on_train_batch_end(self, batch, logs=None):
    print(tf.reduce_sum(self.model.outputs))

model = MyModel()
model.compile('sgd', 'mse')
x, y = np.ones((10, 100)), 10 * np.ones((10, 10))
model.fit(x, y, batch_size=BATCH_SIZE, callbacks=[MyCallback()], verbose=2)

Do you think that pattern would work for you? I still think medium-term we should fix the built-in Callbacks to handle this use case

ManiadisG

comment created time in 2 months

Pull request review comment google/TensorNetwork

Adding MPO layer

-  def call(self, inputs: tf.Tensor) -> tf.Tensor:
+  def call(self, inputs: tf.Tensor, **kwargs) -> tf.Tensor:  # pylint: disable=unused-argument

That's unfortunate, I'd recommend suppressing the original linter warning for now though, and removing the **kwargs here. We'll look into why this changed in 2.2

bpenchas

comment created time in 2 months

push event omalleyt12/community

omalleyt12

commit sha dbe788e038181e1e90e8bf86273a290085c571a4

Update 2020-04-20-Optimizer-minimize.md

view details

push time in 2 months

issue comment tensorflow/tensorflow

TF 2.2 - Train_step output control

@ManiadisG Thanks for the issue!

but adding that output to the logs creates a domino of problems when using callbacks.

Could you provide an example of the errors you've seen? It's probably best to fix the problem at this level, IMO

ManiadisG

comment created time in 2 months

pull request comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

@omalleyt12 gradient transformations may have unintended consequences when paired with moment based optimizers such as Adam. These optimizers usually take the first and second moment of the gradient to compute adaptive learning rates for each parameter. Scaling a gradient by a factor X will not scale the parameter update by a factor of X, because the moments of the gradient do not scale linearly with the gradient.

@hyang0129 Agreed, the loss scaling optimizer example doesn't actually scale the gradients. What it does is temporarily scale up the loss so that gradients are computed in a numerically stable way, and then unscale those gradients before applying Variable updates
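A toy numeric sketch of that idea (plain Python, not the actual LossScaleOptimizer implementation): scaling the loss by a factor S scales every gradient by S, so dividing by S afterwards recovers the true gradient exactly.

```python
# Loss L(x) = x**2 has gradient dL/dx = 2*x.
# Scaling the loss by S scales the gradient by S; unscaling recovers it.
S = 1024.0  # example loss scale
x = 3.0

def grad_of_scaled_loss(scale):
    # Gradient of scale * x**2 with respect to x.
    return scale * 2.0 * x

scaled_grad = grad_of_scaled_loss(S)   # computed under the scaled loss
true_grad = scaled_grad / S            # unscale before the variable update
```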

For other, user-written gradient transformations such as gradient clipping, ensuring that the math does what they want is up to them
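For instance, a user-written clipping transformation using public TF APIs might look like this (a standalone sketch; the specific numbers are just for illustration):

```python
import tensorflow as tf

opt = tf.keras.optimizers.SGD(learning_rate=0.1)
v = tf.Variable([3.0, 4.0])

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(v ** 2)

grads = tape.gradient(loss, [v])  # [2*v] = [[6., 8.]], global norm 10
# User-controlled transformation: rescale so the global norm is at most 1.
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=1.0)
opt.apply_gradients(zip(clipped, [v]))  # v becomes [2.94, 3.92]
```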

omalleyt12

comment created time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

  def apply_gradients(self, grads_and_vars, aggregate=True):
    if aggregate:
      grads_and_vars = self._aggregate_gradients(grads_and_vars)
      grads_and_vars = self._transform_gradients(grads_and_vars)  # No-op by default

Since transform_gradients is a method that applies only to aggregated gradients, in this case it can't be performed w/o aggregation

omalleyt12

comment created time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

  def compute_gradients(
      self,
      loss,
      variables,
      tape=None,
      all_reduce_sum_gradients=False):

Fixed the names, meant to have the same name as our existing param, which shipped in 2.2 as experimental_aggregate_gradients

omalleyt12

comment created time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

# Easily Customizable `Optimizer.minimize`

| Status        | Proposed |
| :------------ | :------- |
| **RFC #**     | [234](https://github.com/tensorflow/community/pull/234) |
| **Author(s)** | [omalleyt12@](https://github.com/omalleyt12) |
| **Sponsor**   | apassos@, fchollet@, karmel@ |
| **Updated**   | 2020-04-20 |

## Objective

Create an `Optimizer` API that gives `Optimizer` subclasses full control of gradient updates. The API should ensure `Optimizer`s can be accessed via a unified API, and will not leak abstractions. Training loops should not be required to know the internal details of how the `Optimizer` chooses to:

* Scale losses and gradients
* Aggregate gradients
* Clip gradients
* etc

We also need to ensure we maintain endpoints with maximum flexibility for those users who do want control over these items.

By creating this API, it will enable users to write training loops that are interoperable with a wide range of Optimizers.

Specific use cases considered:

* Gradient clipping
* Mixed precision
* `Horovod`

## Background

During backpropagation, there are 6 possible actions that can be taken when starting from a loss Tensor and ending with a Variable update:

(1) Transform the loss

(2) Calculate the gradients

(3) Transform the unaggregated (per-device) gradients

(4) Aggregate the gradients (across devices)

(5) Transform the aggregated gradients

(6) Apply a variable update based on the gradients

We currently have three Optimizer endpoints that start at different points in this process:

* `Optimizer.minimize` - handles 1-6
* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=True)` - handles 4-6
* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=False)` - handles 6

However, there is no easy way for Optimizer subclasses to support custom logic in these steps. This proposal suggests a refactoring of the Optimizer class to achieve these goals.

## Motivation

This section discusses the experience of supporting mixed-precision and Horovod in Keras's built-in training logic (hereafter called Model.fit).

Keras now allows users to write custom training logic for their `Model`s via overriding `Model.train_step`: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L538). The default implementation of this method is 8 lines long, and fully supports all types of `Model`s, `loss`es, `metric`s, etc that Keras supports. It attempts to serve as a reference that users can copy / paste to start writing their own training logic.

The only remaining pain point is the call to `_minimize` here: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1873). This logic is necessary because details of whether an `Optimizer` needs to transform the loss, clip the gradients, perform custom aggregation, etc have leaked into the main training loop code.

Despite the complexity of `_minimize`, it covers only a small subset of possible optimization logic. Keras continues to receive valid requests to support more custom optimization logic (including adding hooks for different aggregation methods, different methods of loss reduction, etc). To continue expanding support for these items, Keras needs to rely on a unified API that keeps `Optimizer` implementation details from leaking into the main training loop code.

The proposal below shows how this can be accomplished, and the examples section shows how this can be applied to 3 use cases: gradient clipping, mixed precision, and `Horovod`.

### Custom training loops:

The logic above also applies to custom training loops. The design should allow custom training loops to be written so that they work with any `Optimizer`.

## User Benefit

This design will allow users to write full-featured training loops that work for all `Optimizer`s. This design will allow users to easily perform custom gradient clipping and other transformations.

## Design Proposal

`Optimizer` class:

```python
class Optimizer(object):
  def __init__(self,
               transform_gradients=None,
               aggregate_gradients=all_reduce_sum):
    self.aggregate_gradients_fn = aggregate_gradients
    self.transform_gradients_fns = transform_gradients

  def _transform_loss(self, loss):
    # Can be overridden in subclasses
    return loss

  def _get_gradients(self, loss, variables, tape):
    # Can be overridden to use jacobian, etc.
    return tape.gradient(loss, variables)

  def _transform_unaggregated_gradients(self, grads_and_vars):
    # Can be overridden in subclasses
    return grads_and_vars

  def _aggregate_gradients(self, grads_and_vars):
    # Can still be overridden in subclasses if needed
    if self.aggregate_gradients_fn:
      grads_and_vars = self.aggregate_gradients_fn(grads_and_vars)
    return grads_and_vars

  def _transform_gradients(self, grads_and_vars):
    # Can still be overridden in subclasses if needed
    if self.transform_gradients_fns:
      for fn in self.transform_gradients_fns:
        grads_and_vars = fn(grads_and_vars)
    return grads_and_vars

  def _apply_updates(self, distribution, grads_and_vars, ...):
    # Calls _resource_apply_{dense | sparse}
    # Variable updating math is still in _resource_apply_{dense | sparse}

  def minimize(self, loss, variables, tape=None):
    grads_and_vars = self.compute_gradients(loss, variables, tape)
    self.apply_gradients(grads_and_vars)

  def compute_gradients(
      self,
      loss,
      variables,
      tape=None,
      all_reduce_sum_gradients=False):
    if is_tensor(loss) and not tape:
      raise ValueError('Must provide tape with tensor loss.')
    tape = tape or GradientTape()
    with tape:
      if callable(loss):
        loss = loss()
      loss = self._transform_loss(loss)  # A no-op in our built-in optimizers
    gradients = self._get_gradients(loss, variables, tape)
    grads_and_vars = zip(gradients, variables)
    grads_and_vars = self._transform_unaggregated_gradients(grads_and_vars)
    if all_reduce_sum_gradients:
      grads_and_vars = self._aggregate_gradients(grads_and_vars)
      grads_and_vars = self._transform_gradients(grads_and_vars)
    return grads_and_vars

  def apply_gradients(self, grads_and_vars, aggregate=True):
    if aggregate:
      grads_and_vars = self._aggregate_gradients(grads_and_vars)
      grads_and_vars = self._transform_gradients(grads_and_vars)  # No-op by default
    # By passing aggregate=False, only the Variable updates are run.
    # This gives users complete control, in the case that they don't want to use
    # the hooks provided.
    self._apply_updates(grads_and_vars)
```

Use of Optimizer.minimize in Model.train_step:

```python
class Model:

  def train_step(self, data):
    data = expand_1d(data)
    x, y, sample_weight = unpack_x_y_sample_weight(data)
    with tf.GradientTape() as tape:
      y_pred = self(x, training=True)
      loss = self.compiled_loss(y, y_pred, sample_weight, self.losses)
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    self.compiled_metrics.update_state(y, y_pred, sample_weight)
    return {m.name: m.result() for m in self.metrics}
```

Details of proposal:

* Adds the ability to accept a loss Tensor and a GradientTape to Optimizer.minimize.

* Maintains full backwards compatibility. When a callable loss is passed, simply create a GradientTape and call the loss inside it like currently done.

* Add public Optimizer methods that can be overridden to support custom functionality for the steps outlined in the Background section:

(1) `Optimizer._transform_loss`

(2) `Optimizer._get_gradients`

(3) `Optimizer._transform_unaggregated_gradients`

(4) `Optimizer._aggregate_gradients`

(5) `Optimizer._transform_gradients` (aggregated gradients)

(6) `Optimizer._apply_updates` (calls existing _resource_apply_{dense|sparse})

(a) Item (6) mirrors `Sonnet`'s apply method (i.e. is "just the math")

* Use Optimizer.minimize API in Model.fit

* Optimizer.apply_gradients method is kept. For users who want to control all loss and gradient manipulation, and want the Optimizer to simply apply the Variable updates, they can call `Optimizer.apply_gradients(..., aggregate=False)`

## Examples

(1) Custom gradient clipping

```python
optimizer = tf.keras.optimizers.Adam(0.1, transform_gradients=my_gradient_clipping)
```

(2) Mixed precision (most complicated example):

```python
class LossScaleOptimizer(Optimizer):
  def __init__(self, optimizer):
    self.optimizer = optimizer

  def _get_hyper(self, name):
    # Optional. Allows access to the wrapped Optimizer's
    # hyperparameters (e.g. learning_rate)
    self.optimizer._get_hyper(name)

  def _transform_loss(self, loss):
    loss = self.optimizer._transform_loss(loss)
    # Mixed precision needs to scale loss before calculating gradients
    return self.scale_loss(loss)

  def _transform_unaggregated_gradients(self, grads_and_vars):
    # Note: For performance, we could add a check here to see if
    # self.optimizer._transform_unaggregated_gradients is not implemented, and
    # if so skip these scalings / unscalings. Or Grappler could optimize it out.
    gradients, variables = unpack(grads_and_vars)
    gradients = self.unscale_gradients(gradients)
    gradients = self.optimizer._transform_unaggregated_gradients(gradients)
    # Mixed precision needs to all-reduce on scaled gradients.
    gradients = self.scale_gradients(gradients)
    return zip(gradients, variables)

  def _aggregate_gradients(self, grads_and_vars):
    return aggregate_in_fp16(grads_and_vars)

  def _transform_gradients(self, grads_and_vars):
    gradients, variables = unpack(grads_and_vars)
    gradients = self.unscale_gradients(gradients)
    gradients = self.optimizer._transform_gradients(gradients)
    return zip(gradients, variables)

  def _apply_updates(self, grads_and_vars):
    return self.optimizer._apply_updates(grads_and_vars)
```

(3) Horovod (only needs custom aggregation):

To support backwards compatibility for Horovod:

```python
class HorovodOptimizer(Optimizer):
  def __init__(self, optimizer):
    self.optimizer = optimizer

  def _get_hyper(self, name):
    # See previous example
    self.optimizer._get_hyper(name)

  def _aggregate_gradients(self, grads_and_vars):
    return horovod_aggregate_gradients(grads_and_vars)

  # All other methods described in this proposal simply delegate to `self.optimizer`
```

Or, if backwards compatibility is not needed, simply:

```python
optimizer = tf.keras.optimizers.Adam(1e-3, aggregate_gradients=horovod.aggregate)
```

## Alternatives considered

#### Handle this only in Model.fit, via custom hooks exposed on the Model class

Why rejected:

Shifts the responsibility for implementing and calling these hooks onto each user rather than the writer of the Optimizer subclass (many users will write custom training logic; many fewer will write Optimizer subclasses).

Solution is too Keras-specific, doesn't solve the general problem.

## Questions and Discussion Topics

Should we create a utility class to help with wrapping an `Optimizer`? I.e. `OptimizerWrapper`?

Sounds good, removed this from the discussion section and gave it a section on the main proposal
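A minimal sketch of what such a hypothetical `OptimizerWrapper` could look like; all names below are illustrative, and it simply delegates each hook from the proposal to a wrapped optimizer so that subclasses override only what they need:

```python
class OptimizerWrapper:
    """Hypothetical helper: delegates every optimizer hook to a wrapped optimizer.

    A subclass (e.g. a loss-scaling wrapper) overrides only the hooks it
    actually needs to customize.
    """

    def __init__(self, optimizer):
        self.optimizer = optimizer

    def _transform_loss(self, loss):
        return self.optimizer._transform_loss(loss)

    def _transform_unaggregated_gradients(self, grads_and_vars):
        return self.optimizer._transform_unaggregated_gradients(grads_and_vars)

    def _aggregate_gradients(self, grads_and_vars):
        return self.optimizer._aggregate_gradients(grads_and_vars)

    def _transform_gradients(self, grads_and_vars):
        return self.optimizer._transform_gradients(grads_and_vars)

    def _apply_updates(self, grads_and_vars):
        return self.optimizer._apply_updates(grads_and_vars)

    def __getattr__(self, name):
        # Anything not overridden (e.g. learning_rate) falls through to
        # the wrapped optimizer.
        return getattr(self.optimizer, name)


class _ToyOptimizer:
    # Stand-in inner optimizer, just to exercise the delegation.
    learning_rate = 0.1

    def _transform_loss(self, loss):
        return loss

    def _transform_unaggregated_gradients(self, grads_and_vars):
        return grads_and_vars

    def _aggregate_gradients(self, grads_and_vars):
        return grads_and_vars

    def _transform_gradients(self, grads_and_vars):
        return grads_and_vars

    def _apply_updates(self, grads_and_vars):
        return list(grads_and_vars)


wrapped = OptimizerWrapper(_ToyOptimizer())
```

The `__getattr__` fallthrough gives the same transparent hyperparameter access that the `_get_hyper` overrides in the RFC's wrapper examples aim for.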

omalleyt12

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def apply_gradients(self, grads_and_vars, aggregate=True):
    if aggregate:
```

Fixed the names. The name now is the same as the backwards compat name

omalleyt12

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
    if is_tensor(loss) and not tape:
      raise ValueError('Must provide tape with tensor loss.')
```

Done

omalleyt12

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def compute_gradients(
      self,
      loss,
      variables,
      tape=None,
      all_reduce_sum_gradients=False):
```

Changed the name of this param. The idea is that the aggregation function is passed in the constructor, while whether aggregation is performed in `Optimizer.apply_gradients` is controlled by a bool on the method. We need to keep the bool for backwards compatibility.
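A toy sketch of that split (illustrative names only, not the real TF code): the aggregation function is fixed in the constructor, while the bool on `apply_gradients` only decides whether it runs:

```python
def fake_all_reduce(grads_and_vars):
    # Stand-in for a cross-replica all-reduce sum: pretend there are 2 replicas.
    return [(g * 2, v) for g, v in grads_and_vars]

class ToyOptimizer:
    def __init__(self, aggregate_gradients=fake_all_reduce):
        # The aggregation *function* is chosen at construction time.
        self.aggregate_gradients_fn = aggregate_gradients
        self.applied = None

    def apply_gradients(self, grads_and_vars, aggregate=True):
        # The *bool* on the method (kept for backwards compat) decides
        # whether the constructor-supplied aggregation runs at all.
        if aggregate and self.aggregate_gradients_fn:
            grads_and_vars = self.aggregate_gradients_fn(grads_and_vars)
        self.applied = list(grads_and_vars)
        return self.applied

opt = ToyOptimizer()
aggregated = opt.apply_gradients([(0.5, "w")])
raw = opt.apply_gradients([(0.5, "w")], aggregate=False)
```

Passing `aggregate=False` lets a caller that has already aggregated (e.g. Horovod) skip straight to the variable updates.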

omalleyt12

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

+# Easily Customizable `Optimizer.minimize`+++| Status        | Proposed       |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [234](https://github.com/tensorflow/community/pull/234) |+| **Author(s)** | [omalleyt12@](https://github.com/omalleyt12) |+| **Sponsor**   | apassos@, fchollet@, karmel@                 |+| **Updated**   | 2020-04-20                                           |++## Objective++Create an `Optimizer` API that gives `Optimizer` subclasses full control of gradient updates. The API should ensure `Optimizer`s can be accessed via a unified API, and will not leak abstractions. Training loops should not be required to know the internal details of how the `Optimizer` chooses to:++* Scale losses and gradients++* Aggregate gradients++* Clip gradients++* etc++We also need to ensure we maintain endpoints with maximum flexibility for those users who do want control over these items.++By creating this API, it will enable users to write training loops that are interoperable with a wide range of Optimizers.++Specific use cases considered:++* Gradient clipping++* Mixed precision++* `Horovod`++## Background++During backpropagation, there are 6 possible actions that can be taken when starting from a loss Tensor and ending with a Variable update:++(1) Transform the loss++(2) Calculate the gradients++(3) Transform the unaggregated (per-device) gradients++(4) Aggregate the gradients (across devices)++(5) Transform the aggregated gradients++(6) Apply a variable update based on the gradients++We currently have three Optimizer endpoints that start at different points in this process:++* `Optimizer.minimize` - handles 1-6++* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=True)` - handles 4-6++* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=False)` - handles 6++However, there is no easy way for Optimizer subclasses to support custom logic in these steps. 
This proposal suggests a refactoring of the Optimizer class to achieve these goals.+++## Motivation++This section discusses the experience of supporting mixed-precision and Horovod in Keras’s built-in training logic (hereafter called Model.fit).++Keras now allows users to write custom training logic for their `Model`s via overriding `Model.train_step`: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L538). The default implementation of this method is 8 lines long, and fully supports all types of `Model`s, `loss`es, `metric`s, etc that Keras supports. It attempts to serve as a reference that users can copy / paste to start writing their own training logic.++The only remaining pain point is the call to `_minimize` here: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1873). This logic is necessary because details of whether an `Optimizer` needs to transform the loss, clip the gradients, perform custom aggregation, etc have leaked into the main training loop code.++Despite the complexity of `_minimize`, it covers only a small subset of possible optimization logic. Keras continues to receive valid requests to support more custom optimization logic (including adding hooks for different aggregation methods, different methods of loss reduction, etc). To continue expanding support for these items, Keras needs to rely on a unified API that keeps `Optimizer` implementation details from leaking into the main training loop code.++The proposal below shows how this can be accomplished, and the examples section shows how this can be applied to 3 use cases: gradient clipping, mixed precision, and `Horovod`.++### Custom training loops:++The logic above also applies to custom training loops. 
The design should allow custom training loops to be written so that they work with any `Optimizer`.+++## User Benefit++This design will allow users to write full-featured training loops that work for all `Optimizer`s. This design will allow users to easily perform custom gradient clipping and other transformations.++## Design Proposal++`Optimizer` class:++```python+class Optimizer(object):+  def __init__(self,+               transform_gradients=None,+               aggregate_gradients=all_reduce_sum):+     self.aggregate_gradients_fn = aggregate_gradients+     self.transform_gradients_fns = transform_gradients_fns

Yes, I was intending to make it so users could pass either a single function or a list of functions

Added a more fleshed-out example using these functions. They accept and return grads_and_vars
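A small sketch of that contract, under the assumption stated here that each transform accepts and returns a `grads_and_vars` list, and that either a single function or a list of functions may be passed (names are illustrative):

```python
def clip_grads(grads_and_vars, limit=1.0):
    # Example transform: clip plain-float "gradients" to [-limit, limit].
    return [(max(-limit, min(limit, g)), v) for g, v in grads_and_vars]

def as_transform_list(transform_gradients):
    # Normalize the constructor argument: None, one function, or a list.
    if transform_gradients is None:
        return []
    if callable(transform_gradients):
        return [transform_gradients]
    return list(transform_gradients)

def apply_transforms(transform_gradients, grads_and_vars):
    # Run each transform in order, threading grads_and_vars through.
    for fn in as_transform_list(transform_gradients):
        grads_and_vars = fn(grads_and_vars)
    return grads_and_vars
```

Normalizing to a list up front keeps the optimizer's `_transform_gradients` loop identical whether the user passed one function or several.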

omalleyt12

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
class Optimizer(object):
  def __init__(self,
               transform_gradients=None,
               aggregate_gradients=all_reduce_sum):
```

Done

omalleyt12

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

# Easily Customizable `Optimizer.minimize`

| Status        | Proposed |
| :------------ | :------- |
| **RFC #**     | [234](https://github.com/tensorflow/community/pull/234) |
| **Author(s)** | [omalleyt12@](https://github.com/omalleyt12) |
| **Sponsor**   | apassos@, fchollet@, karmel@ |
| **Updated**   | 2020-04-20 |

## Objective

Create an `Optimizer` API that gives `Optimizer` subclasses full control of gradient updates. The API should ensure `Optimizer`s can be accessed via a unified API, and will not leak abstractions. Training loops should not be required to know the internal details of how the `Optimizer` chooses to:

* Scale losses and gradients
* Aggregate gradients
* Clip gradients
* etc.

We also need to ensure we maintain endpoints with maximum flexibility for those users who do want control over these items.

This API will enable users to write training loops that are interoperable with a wide range of `Optimizer`s.

Specific use cases considered:

* Gradient clipping
* Mixed precision
* `Horovod`

## Background

During backpropagation, there are 6 possible actions that can be taken when starting from a loss Tensor and ending with a Variable update:

(1) Transform the loss

(2) Calculate the gradients

(3) Transform the unaggregated (per-device) gradients

(4) Aggregate the gradients (across devices)

(5) Transform the aggregated gradients

(6) Apply a variable update based on the gradients

We currently have three `Optimizer` endpoints that start at different points in this process:

* `Optimizer.minimize` - handles 1-6
* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=True)` - handles 4-6
* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=False)` - handles 6

However, there is no easy way for `Optimizer` subclasses to support custom logic in these steps. This proposal suggests a refactoring of the `Optimizer` class to achieve these goals.

## Motivation

This section discusses the experience of supporting mixed precision and Horovod in Keras's built-in training logic (hereafter called `Model.fit`).

Keras now allows users to write custom training logic for their `Model`s by overriding `Model.train_step`: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L538). The default implementation of this method is 8 lines long, and fully supports all types of `Model`s, `loss`es, `metric`s, etc. that Keras supports. It attempts to serve as a reference that users can copy/paste to start writing their own training logic.

The only remaining pain point is the call to `_minimize` here: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1873). This logic is necessary because details of whether an `Optimizer` needs to transform the loss, clip the gradients, perform custom aggregation, etc. have leaked into the main training loop code.

Despite the complexity of `_minimize`, it covers only a small subset of possible optimization logic. Keras continues to receive valid requests to support more custom optimization logic (including adding hooks for different aggregation methods, different methods of loss reduction, etc.). To continue expanding support for these items, Keras needs to rely on a unified API that keeps `Optimizer` implementation details from leaking into the main training loop code.

The proposal below shows how this can be accomplished, and the examples section shows how this can be applied to 3 use cases: gradient clipping, mixed precision, and `Horovod`.

### Custom training loops

The logic above also applies to custom training loops. The design should allow custom training loops to be written so that they work with any `Optimizer`.

## User Benefit

This design will allow users to write full-featured training loops that work for all `Optimizer`s, and to easily perform custom gradient clipping and other transformations.

## Design Proposal

`Optimizer` class:

```python
class Optimizer(object):

  def __init__(self,
               transform_gradients=None,
               aggregate_gradients=all_reduce_sum):
    self.aggregate_gradients_fn = aggregate_gradients
    self.transform_gradients_fns = transform_gradients

  def _transform_loss(self, loss):
    # Can be overridden in subclasses.
    return loss

  def _get_gradients(self, loss, variables, tape):
    # Can be overridden to use jacobian, etc.
    return tape.gradient(loss, variables)

  def _transform_unaggregated_gradients(self, grads_and_vars):
    # Can be overridden in subclasses.
    return grads_and_vars

  def _aggregate_gradients(self, grads_and_vars):
    # Can still be overridden in subclasses if needed.
    if self.aggregate_gradients_fn:
      grads_and_vars = self.aggregate_gradients_fn(grads_and_vars)
    return grads_and_vars

  def _transform_gradients(self, grads_and_vars):
    # Can still be overridden in subclasses if needed.
    if self.transform_gradients_fns:
      for fn in self.transform_gradients_fns:
        grads_and_vars = fn(grads_and_vars)
    return grads_and_vars

  def _apply_updates(self, distribution, grads_and_vars, ...):
    # Calls _resource_apply_{dense | sparse}.
    # Variable updating math is still in _resource_apply_{dense | sparse}.

  def minimize(self, loss, variables, tape=None):
    grads_and_vars = self.compute_gradients(loss, variables, tape)
    self.apply_gradients(grads_and_vars)

  def compute_gradients(self,
                        loss,
                        variables,
                        tape=None,
                        all_reduce_sum_gradients=False):
    if is_tensor(loss) and not tape:
      raise ValueError('Must provide tape with tensor loss.')
    tape = tape or GradientTape()
    with tape:
      if callable(loss):
        loss = loss()
      loss = self._transform_loss(loss)  # A no-op in our built-in optimizers.
    gradients = self._get_gradients(loss, variables, tape)
    grads_and_vars = zip(gradients, variables)
    grads_and_vars = self._transform_unaggregated_gradients(grads_and_vars)
    if all_reduce_sum_gradients:
      grads_and_vars = self._aggregate_gradients(grads_and_vars)
      grads_and_vars = self._transform_gradients(grads_and_vars)
    return grads_and_vars
```
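The ordering of the six hooks can be illustrated with a framework-free sketch. `ToyOptimizer`, `clip_by_value`, and the dict-based variables below are hypothetical stand-ins for illustration only; the real proposal operates on TensorFlow tensors and `tf.Variable`s.

```python
# Framework-free sketch of the proposed hook chain. Gradients are plain
# floats here, so only the ordering of the hooks is shown.

class ToyOptimizer:
    def __init__(self, lr=0.1, transform_gradients=None, aggregate_gradients=None):
        self.lr = lr
        self.transform_gradients_fns = transform_gradients or []
        self.aggregate_gradients_fn = aggregate_gradients

    def _transform_unaggregated_gradients(self, grads_and_vars):
        # Step (3): per-device transform; a no-op by default.
        return grads_and_vars

    def _aggregate_gradients(self, grads_and_vars):
        # Step (4): cross-device aggregation; identity in this single-device toy.
        if self.aggregate_gradients_fn:
            grads_and_vars = self.aggregate_gradients_fn(grads_and_vars)
        return grads_and_vars

    def _transform_gradients(self, grads_and_vars):
        # Step (5): post-aggregation transforms such as clipping.
        for fn in self.transform_gradients_fns:
            grads_and_vars = fn(grads_and_vars)
        return grads_and_vars

    def apply_gradients(self, grads_and_vars, aggregate=True):
        if aggregate:
            grads_and_vars = self._aggregate_gradients(grads_and_vars)
            grads_and_vars = self._transform_gradients(grads_and_vars)
        # Step (6): the bare variable update ("just the math").
        for g, v in grads_and_vars:
            v["value"] -= self.lr * g


def clip_by_value(grads_and_vars, limit=1.0):
    # A post-aggregation transform with the grads_and_vars -> grads_and_vars
    # contract the proposal gives to `transform_gradients` entries.
    return [(max(-limit, min(limit, g)), v) for g, v in grads_and_vars]
```

With `lr=0.1` and a raw gradient of 10.0 clipped to 1.0, a variable at 5.0 steps to 4.9, showing that the clipping hook runs before the update.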

Done

omalleyt12

comment created time in 2 months

push eventomalleyt12/community

omalleyt12

commit sha 488d3710dab17c83edc843efd9d5fd517da7acc1

Update 2020-04-20-Optimizer-minimize.md

view details

push time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def __init__(self,
               transform_gradients=None,
```

tagged you in related comment for discussion

omalleyt12

comment created time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def _transform_loss(self, loss):
    # Can be overridden in subclasses
```

@reedwm re related comment

omalleyt12

comment created time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def apply_gradients(self, grads_and_vars, aggregate=True):
    if aggregate:
      grads_and_vars = self._aggregate_gradients(grads_and_vars)
      grads_and_vars = self._transform_gradients(grads_and_vars)  # No-op by default.
    # By passing aggregate=False, only the Variable updates are run.
    # This gives users complete control, in the case that they don't
    # want to use the hooks provided.
    self._apply_updates(grads_and_vars)
```

Use of `Optimizer.minimize` in `Model.train_step`:

```python
class Model:

  def train_step(self, data):
    data = expand_1d(data)
    x, y, sample_weight = unpack_x_y_sample_weight(data)
    with tf.GradientTape() as tape:
      y_pred = self(x, training=True)
      loss = self.compiled_loss(y, y_pred, sample_weight, self.losses)
    self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    self.compiled_metrics.update_state(y, y_pred, sample_weight)
    return {m.name: m.result() for m in self.metrics}
```

Details of proposal:

* Adds the ability to accept a loss Tensor and a GradientTape to `Optimizer.minimize`.
* Maintains full backwards compatibility. When a callable loss is passed, simply create a GradientTape and call the loss inside it, as is currently done.
* Adds public `Optimizer` methods that can be overridden to support custom functionality for the steps outlined in the Background section:

  (1) `Optimizer._transform_loss`

  (2) `Optimizer._get_gradients`

  (3) `Optimizer._transform_unaggregated_gradients`

  (4) `Optimizer._aggregate_gradients`

  (5) `Optimizer._transform_gradients` (aggregated gradients)

  (6) `Optimizer._apply_updates` (calls the existing `_resource_apply_{dense|sparse}`)

  (a) Item (6) mirrors `Sonnet`'s apply method (i.e. is "just the math")

* Uses the `Optimizer.minimize` API in `Model.fit`.
* The `Optimizer.apply_gradients` method is kept. Users who want to control all loss and gradient manipulation themselves, and want the `Optimizer` to simply apply the Variable updates, can call `Optimizer.apply_gradients(..., aggregate=False)`.

## Examples

(1) Custom gradient clipping

```python
optimizer = tf.keras.optimizers.Adam(0.1, transform_gradients=my_gradient_clipping)
```
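One possible shape for `my_gradient_clipping`, written with plain Python numbers so the transform's contract is visible; a real implementation would call `tf.clip_by_global_norm` on the gradient tensors. Only the function name comes from the example above; the body and the `clip_norm` parameter are assumptions.

```python
def my_gradient_clipping(grads_and_vars, clip_norm=1.0):
    # Global-norm clipping over (gradient, variable) pairs. Plain floats
    # are used for clarity; a TF version would operate on tensors.
    grads = [g for g, _ in grads_and_vars]
    global_norm = sum(g * g for g in grads) ** 0.5
    if global_norm > clip_norm:
        scale = clip_norm / global_norm
        grads_and_vars = [(g * scale, v) for g, v in grads_and_vars]
    return grads_and_vars
```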

Done, fleshed out example

omalleyt12

comment created time in 2 months

push eventomalleyt12/community

omalleyt12

commit sha 02a04d8fd830a59a8f1ee9c9f37d60bb82f240ef

Update 2020-04-20-Optimizer-minimize.md

view details

push time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def __init__(self,
               transform_gradients=None,
```

Fixed, actually I meant for this and below to both be transform_gradients

omalleyt12

comment created time in 2 months

push eventomalleyt12/community

omalleyt12

commit sha 642626e4eb52d7cd4119c2c2d85918367a740dcb

Update 2020-04-20-Optimizer-minimize.md

view details

push time in 2 months

Pull request review comment tensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
  def _transform_loss(self, loss):
    # Can be overridden in subclasses
```

I changed the methods to public for now and added it as a discussion item for whether they should be private or public.

Re methods vs init arguments, my opinion is this: init arguments are great for the simple use cases we expect most users to have (e.g. gradient clipping, custom weight decay, aggregating by mean rather than by sum, etc). In these examples, each init argument is self-contained.

However, for more advanced use cases like LossScaling and differential privacy optimizers, the transformations needed at each step of the process (loss transform, unaggregated gradient transform, gradient aggregation, aggregated gradient transform) are tightly coupled. In these cases, having a subclass that contains all of this tightly coupled logic seems to make the most sense to me.

I think the current design, where the two most common transforms (aggregation and post-aggregation transformation) can be passed as __init__ arguments, and every discrete step of the minimize process has its own overridable method, achieves the best of both worlds: simple use cases don't require subclassing, and advanced users have maximum flexibility.
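The subclassing style for tightly coupled transforms can be sketched as follows. `LossScalingSketch` is a hypothetical illustration with plain Python numbers, not the real loss-scale optimizer; it only shows how coupled steps (scale the loss up, scale the gradients back down) land on two of the proposed hooks and must share state.

```python
class LossScalingSketch:
    # Hypothetical subclass-style optimizer: the loss transform and the
    # unaggregated-gradient transform must agree on `scale`, which is why
    # this logic fits a subclass better than independent __init__ arguments.

    def __init__(self, scale=1024.0):
        self.scale = scale

    def _transform_loss(self, loss):
        # Step (1): scale the loss up before differentiation so small
        # gradients stay representable in reduced precision.
        return loss * self.scale

    def _transform_unaggregated_gradients(self, grads_and_vars):
        # Step (3): undo the scaling on each gradient before aggregation.
        return [(g / self.scale, v) for g, v in grads_and_vars]
```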

omalleyt12

comment created time in 2 months

push eventomalleyt12/community

omalleyt12

commit sha 4aac7f39604bcf3ebe05f8d563ee08fff832db9c

Update 2020-04-20-Optimizer-minimize.md

view details

push time in 2 months
