
keras-team/keras 46840

Deep Learning for humans

keras-team/keras-preprocessing 688

Utilities for working with image data, text data, and sequence data.

pavithrasv/build 0

Build-related tools for TensorFlow

pavithrasv/examples 0

TensorFlow examples

pavithrasv/fairing 0

Python SDK for building, training, and deploying ML models

pavithrasv/keras 0

Deep Learning for humans

pavithrasv/keras-tuner 0

Hyperparameter tuning for humans

issue comment on tensorflow/tensorflow

tf.keras.backend.set_floatx() causing ValueError (dtype conversion error) while computing tf.keras.metrics.*

Thank you @MarkDaoust. It should be cast to the predictions' dtype. If anyone would like to work on the fix, please feel free to send me a PR.

Hemal-Mamtora

comment created time in 4 hours
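The proposed fix, casting the labels to the predictions' dtype before the metric update, can be sketched with NumPy. Note that `update_metric_state` is an illustrative stand-in, not the actual Keras metric code:

```python
import numpy as np

def update_metric_state(y_true, y_pred):
    """Illustrative sketch: cast y_true to y_pred's dtype before comparing,
    so a global floatx of float16/float64 cannot cause a dtype mismatch."""
    y_true = np.asarray(y_true).astype(y_pred.dtype)  # the proposed cast
    # A binary-accuracy-style computation on the now-consistent dtypes.
    matches = (np.round(y_pred) == y_true).astype(y_pred.dtype)
    return matches.mean()

# Predictions in float16 (e.g. after set_floatx('float16')), labels as
# plain Python floats that would otherwise become float64.
y_pred = np.array([0.9, 0.2, 0.7], dtype=np.float16)
y_true = [1.0, 0.0, 0.0]
acc = update_metric_state(y_true, y_pred)
```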

push event on pavithrasv/cloud

Pavithra Vijay

commit sha 15305cbf044ec6994a442094230b5581a0fb4765

Remove region from the API as it is GCP specific.

view details

push time in 4 days

issue comment on keras-team/keras

invalid literal for int() with base 10: 'Participant1'

Sorry about the delay. Can you share the code that you ran to get the above error? If possible, a standalone code snippet would help us reproduce the issue.

sekti92

comment created time in 4 days

issue closed in tensorflow/tensorflow

Why K.mean is used in tf.keras.losses.binary_crossentropy ?

In line 993 of the code of tf.keras.losses.binary_crossentropy, K.mean is called on axis -1 of K.binary_crossentropy(y_true, y_pred, from_logits=from_logits).

I wonder why there is this K.mean call and why tf.keras.losses.binary_crossentropy doesn't simply return K.binary_crossentropy(y_true, y_pred, from_logits=from_logits).

In contrast, tf.keras.losses.categorical_crossentropy and tf.keras.losses.sparse_categorical_crossentropy simply return the call to their tf.keras.backend equivalents.

I think this may be inconsistent and misleading, especially because tf.keras.losses.categorical_crossentropy and tf.keras.backend.categorical_crossentropy behave similarly, while tf.keras.losses.binary_crossentropy and tf.keras.backend.binary_crossentropy do not, so higher-level objects like tf.keras.metrics.CategoricalCrossentropy may not work as one would expect.

closed time in 4 days

durandg12
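The shape difference the issue describes can be sketched with NumPy (a simplified stand-in for the Keras functions, not the actual implementation): the backend function returns per-element values, while the losses wrapper averages over the last axis.

```python
import numpy as np

def elementwise_bce(y_true, y_pred, eps=1e-7):
    # Per-element binary cross-entropy, analogous to
    # tf.keras.backend.binary_crossentropy: no reduction is applied.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def loss_bce(y_true, y_pred):
    # Analogous to tf.keras.losses.binary_crossentropy: the K.mean call
    # reduces over the last axis, giving one value per sample.
    return elementwise_bce(y_true, y_pred).mean(axis=-1)

y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.8, 0.1], [0.3, 0.9]])

elem = elementwise_bce(y_true, y_pred)   # shape (2, 2): per element
per_sample = loss_bce(y_true, y_pred)    # shape (2,): per sample
```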

pull request comment on tensorflow/cloud

Initial commit.

Thank you for the quick review, Francois. Updated the docs and addressed the comments. PTAL!

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+# Copyright 2020 Google LLC. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import google.auth
+
+
+def get_project_name():
+    # https://google-auth.readthedocs.io/en/latest/reference/google.auth.html
+    _, project_id = google.auth.default()
+    if project_id is None:
+        raise Exception('Could not determine the GCP project id.')
+
+    return project_id
+
+
+def validate_machine_configuration(
+        cpu_cores, memory, accelerator_type, accelerator_count):
+    valid_configurations = _get_valid_machine_configurations()
+    current_config = (
+        cpu_cores, memory, accelerator_type.value, accelerator_count)
+    if current_config not in valid_configurations:
+        raise ValueError(
+            'Invalid machine configuration: cpu_cores:{}, memory:{}, '
+            'accelerator_type:{}, accelerator_count:{}. Please see the '
+            'following AI platform comptibility table for all valid '
+            'configurations: '
+            'https://cloud.google.com/ml-engine/docs/using-gpus#'
+            'compute-engine-machine-types-with-gpu'.format(
+                cpu_cores, memory, str(accelerator_type), accelerator_count))
+
+
+def get_region():
+    return 'us-central1'
+
+
+def get_accelerator_type(accl_type):
+    if accl_type == 'CPU':
+        return 'ACCELERATOR_TYPE_UNSPECIFIED'
+    if accl_type == 'K80':
+        return 'NVIDIA_TESLA_K80'
+    if accl_type == 'P100':
+        return 'NVIDIA_TESLA_P100'
+    if accl_type == 'V100':
+        return 'NVIDIA_TESLA_V100'
+    if accl_type == 'P4':
+        return 'NVIDIA_TESLA_P4'
+    if accl_type == 'T4':
+        return 'NVIDIA_TESLA_T4'
+    else:
+        raise ValueError('Invalid accelerator type.')
+
+
+def get_machine_type(cpu_cores, memory):
+    config = (cpu_cores, memory)
+    if config == (4, 15):
+        return 'n1-standard-4'
+    if config == (8, 30):
+        return 'n1-standard-8'
+    if config == (16, 60):
+        return 'n1-standard-16'
+    if config == (32, 120):
+        return 'n1-standard-32'
+    if config == (64, 240):
+        return 'n1-standard-64'
+    if config == (96, 360):
+        return 'n1-standard-96'
+    if config == (2, 13):
+        return 'n1-highmem-2'
+    if config == (4, 26):
+        return 'n1-highmem-4'
+    if config == (8, 52):
+        return 'n1-highmem-8'
+    if config == (16, 104):
+        return 'n1-highmem-16'
+    if config == (32, 208):
+        return 'n1-highmem-32'
+    if config == (64, 416):
+        return 'n1-highmem-64'
+    if config == (96, 624):
+        return 'n1-highmem-96'
+    if config == (16, 14.4):
+        return 'n1-highcpu-16'
+    if config == (32, 28.8):
+        return 'n1-highcpu-32'
+    if config == (64, 57.6):
+        return 'n1-highcpu-64'
+    if config == (96, 86.4):
+        return 'n1-highcpu-96'
+    else:
+        raise ValueError('Invalid machine type.')

This actually should never happen; this is not a public API. The `validate_machine_configuration` API above validates the machine type at creation time and raises an error if the input is invalid. It also provides a link to the list of valid machine types.

pavithrasv

comment created time in 4 days
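For illustration, the long if/elif chain in `get_machine_type` could also be expressed as a table lookup. This is a hypothetical sketch with only a few entries from the AI Platform machine table, not the actual tensorflow/cloud code:

```python
# Hypothetical dict-based alternative to the if/elif chain. The
# (cpu_cores, memory) keys mirror a subset of the quoted machine table.
_MACHINE_TYPES = {
    (4, 15): 'n1-standard-4',
    (8, 30): 'n1-standard-8',
    (16, 60): 'n1-standard-16',
    (2, 13): 'n1-highmem-2',
    (16, 14.4): 'n1-highcpu-16',
}

def get_machine_type(cpu_cores, memory):
    # Same behavior as the quoted chain: known configs map to a machine
    # type string, anything else raises ValueError.
    try:
        return _MACHINE_TYPES[(cpu_cores, memory)]
    except KeyError:
        raise ValueError('Invalid machine type.')
```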

Pull request review comment on tensorflow/cloud

Initial commit.

+def get_accelerator_type(accl_type):
+    if accl_type == 'CPU':
+        return 'ACCELERATOR_TYPE_UNSPECIFIED'
+    if accl_type == 'K80':
+        return 'NVIDIA_TESLA_K80'
+    if accl_type == 'P100':
+        return 'NVIDIA_TESLA_P100'
+    if accl_type == 'V100':
+        return 'NVIDIA_TESLA_V100'
+    if accl_type == 'P4':
+        return 'NVIDIA_TESLA_P4'
+    if accl_type == 'T4':
+        return 'NVIDIA_TESLA_T4'
+    else:
+        raise ValueError('Invalid accelerator type.')

This actually should never happen; this is not a public API. The accelerator type class validates the accelerator type at creation time and raises an error if the input is invalid.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+def get_machine_type(cpu_cores, memory):

Done.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+def get_accelerator_type(accl_type):

Done.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+# Copyright 2020 Google LLC. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import sys
+
+from . import containerize
+from . import deploy
+from . import gcp
+from . import machine_config
+from . import package
+from . import preprocess
+from . import validate
+
+
+# Flag which indicates whether current process is running in a cloud
+# environment created by the `cloud.run` API.
+_IS_RUNNING_REMOTELY = False
+
+
+def _is_running_remotely():
+    return _IS_RUNNING_REMOTELY
+
+
+def _set_running_remotely(value):
+    global _IS_RUNNING_REMOTELY
+    _IS_RUNNING_REMOTELY = value
+
+
+def run(
+    entry_point,
+    requirements_txt=None,
+    distribution_strategy='auto',
+    docker_base_image=None,
+    chief_config='auto',
+    worker_config='auto',
+    worker_count=0,
+    region=None,
+    entry_point_args=None,
+    stream_logs=False,
+):
+    """Runs your Tensorflow code in Google Cloud Platform.
+
+    # Arguments:
+        entry_point: String. Python file path to the file that contains the
+            TensorFlow code.
+            Note: This path must be in the current working directory tree.
+            Example: 'train.py', 'training/mnist.py'
+        requirements_txt: Optional string. File path to requirements.txt file
+            containing aditionally pip dependencies if any.
+            Note: This path must be in the current working directory tree.
+            Example: 'requirements.txt', 'deps/reqs.txt'
+        distribution_strategy: 'auto' or None. Defaults to 'auto'.
+            'auto' means we will take care of creating a Tensorflow
+            distribution strategy instance based on the machine configurations
+            you have provided using the `chief_config`, `worker_config` and
+            `worker_count` params.
+            - If the number of workers > 0, we will use
+                `tf.distribute.experimental.MultiWorkerMirroredStrategy`.
+            - If number of GPUs > 0, we will use
+                `tf.distribute.MirroredStrategy`
+            If you have created a distribution strategy instance in your script
+            already, please set `distribution_stratgey` as None here.
+            For example, if you are using `tf.keras` custom training loops,
+            you will need to create a strategy in the script for distributing
+            the dataset.
+        docker_base_image: Optional base docker image to use. Defaults to None.
+            Example: 'gcr.io/my_gcp_project/deep_learning:v2'
+            If a base docker image is not provided here, we will use a
+            Tensorflow docker image (https://www.tensorflow.org/install/docker)
+            as the base image. The version of TensorFlow and Python in that
+            case will match your local environment.
+        chief_config: Optional `MachineConfig` that represents the
+            configuration for the chief worker in a distribution cluster.
+            Defaults to 'auto'. 'auto' maps to a standard gpu config such as
+            `COMMON_MACHINE_CONFIGS.P100_1X` (8 cpu cores, 30GB memory,
+            1 Nvidia Tesla P100).
+        worker_config: Optional `MachineConfig` that represents the
+            configuration for the general workers in a distribution cluster.
+            Defaults to 'auto'. 'auto' maps to a standard gpu config such as
+            `COMMON_MACHINE_CONFIGS.P100_1X` (8 cpu cores, 30GB memory,
+            1 Nvidia Tesla P100).
+        worker_count: Optional integer that represents the number of general
+            workers in a distribution cluster. Defaults to 0. This count does
+            not include the chief worker.
+        region: Optional string. Cloud region in which to submit the
+            job. Defaults to 'us-central1' for GCP.
+        entry_point_args: Optional list of strings. Defaults to None.
+            Command line arguments to pass to the `entry_point` program.
+        stream_logs: Boolean flag which when enabled streams logs back from
+            the cloud job.
+    """
+    # If code is triggered in a cloud environment, do nothing.
+    if _is_running_remotely():
+        return
+    _set_running_remotely(True)
+
+    # Get defaults.
+    if chief_config == 'auto':
+        chief_config = machine_config.COMMON_MACHINE_CONFIGS['P100_1X']
+    if worker_config == 'auto':
+        worker_config = machine_config.COMMON_MACHINE_CONFIGS['P100_1X']
+    region = region or gcp.get_region()
+    dst_path_prefix = '/app/'
+    docker_registry = 'gcr.io/{}'.format(gcp.get_project_name())
+
+    # Run validations.
+    validate.validate(
+        entry_point, distribution_strategy, requirements_txt,
+        chief_config, worker_config, worker_count, region,
+        entry_point_args, stream_logs)
+
+    # Create the script to run (starter_script).
+    # Make the `entry_point` cloud and distribution ready.
+    startup_script = preprocess.get_startup_script(
+        entry_point, chief_config, worker_count, distribution_strategy)
+
+    # Get all the files, that we need to package, mapped to the dst location.
+    # This will include the startup script, requirements_txt, dockerfile,
+    # files in the entry_point dir.
+    dockerfile, file_map = containerize.get_file_map(
+        entry_point, startup_script, chief_config, requirements_txt,
+        dst_path_prefix, docker_base_image)
+
+    # Create a tarball with the files.
+    tarball = package.get_tarball(file_map)
+
+    # Create docker image.
+    docker_img = containerize.get_docker_image(docker_registry, tarball)
+
+    # Delete all the temporary files we created.

Made the 'wrapped_entry_point' script creation optional - one less deletion to worry about. I am writing some examples and a detailed usage doc and will mention these there.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+    """Runs your Tensorflow code in Google Cloud Platform.
+
+    # Arguments:

Fixed, using Args: throughout.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+def run(
+    entry_point,
+    requirements_txt=None,
+    distribution_strategy='auto',
+    docker_base_image=None,
+    chief_config='auto',
+    worker_config='auto',
+    worker_count=0,
+    region=None,
+    entry_point_args=None,
+    stream_logs=False,
+):

Done.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+# Copyright 2020 Google LLC. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import os
+import tempfile
+
+from .machine_config import AcceleratorType
+
+
+def get_startup_script(entry_point,

Done.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+# Copyright 2020 Google LLC. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import tarfile
+import tempfile
+
+
+def get_tarball(file_location_map):

Done.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+# Copyright 2020 Google LLC. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+from __future__ import absolute_import
+from __future__ import division
+from __future__ import print_function
+
+import random
+import string
+import subprocess
+
+from googleapiclient import discovery
+from googleapiclient import errors
+
+from . import gcp
+
+
+def deploy_job(region, image_uri, chief_config, worker_count, worker_config,
+               entry_point_args, enable_stream_logs):
+    job_name = _get_name()
+    project_id = gcp.get_project_name()
+    ml_apis = discovery.build('ml', 'v1')
+
+    request_dict = _create_request_dict(
+        job_name, region, image_uri, chief_config, worker_count, worker_config,
+        entry_point_args)
+    try:
+        response = ml_apis.projects().jobs().create(
+            parent='projects/{}'.format(project_id),
+            body=request_dict
+        ).execute()
+        print('Job submitted successfully.')
+        _print_logs_info(job_name, project_id)
+        # TODO(psv): Add support for streaming logs.
+    except errors.HttpError as err:
+        print('There was an error submitting the job.')

Raising a RuntimeError with the error reason.

pavithrasv

comment created time in 4 days
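The change described (raising a RuntimeError with the error reason instead of printing it) might look roughly like the sketch below. `FakeHttpError` and `submit` are stand-ins for `googleapiclient.errors.HttpError` and the job-submission call so the sketch is self-contained:

```python
class FakeHttpError(Exception):
    """Stand-in for googleapiclient.errors.HttpError, which exposes the
    failure reason via a _get_reason() method."""
    def __init__(self, reason):
        super().__init__(reason)
        self._reason = reason

    def _get_reason(self):
        return self._reason

def submit(execute):
    # Instead of printing and returning, surface the failure to the caller
    # as a RuntimeError carrying the API's error reason.
    try:
        return execute()
    except FakeHttpError as err:
        raise RuntimeError(
            'There was an error submitting the job: {}'.format(
                err._get_reason()))
```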

Pull request review comment on tensorflow/cloud

Initial commit.

+def _create_request_dict(job_name, region, image_uri, chief_config,
+                         worker_count, worker_config, entry_point_args):
+    trainingInput = {}

Done.

pavithrasv

comment created time in 4 days

Pull request review comment on tensorflow/cloud

Initial commit.

+def _create_request_dict(job_name, region, image_uri, chief_config,
+                         worker_count, worker_config, entry_point_args):
+    trainingInput = {}
+    trainingInput['region'] = region
+    trainingInput['scaleTier'] = 'custom'
+    trainingInput['masterType'] = gcp.get_machine_type(
+        chief_config.cpu_cores, chief_config.memory)
+
+    # Set master config
+    masterConfig = {}
+    masterConfig['imageUri'] = image_uri
+    masterConfig['acceleratorConfig'] = {}
+    masterConfig['acceleratorConfig']['count'] = str(
+        chief_config.accelerator_count)
+    masterConfig['acceleratorConfig']['type'] = gcp.get_accelerator_type(
+        chief_config.accelerator_type.value)
+
+    trainingInput['masterConfig'] = masterConfig
+    trainingInput['workerCount'] = str(worker_count)
+
+    if worker_count > 0:
+        trainingInput['workerType'] = gcp.get_machine_type(
+            worker_config.cpu_cores, worker_config.memory)
+
+        workerConfig = {}
+        workerConfig['imageUri'] = image_uri
+        workerConfig['acceleratorConfig'] = {}
+        workerConfig['acceleratorConfig']['count'] = str(
+            worker_config.accelerator_count)
+        workerConfig['acceleratorConfig']['type'] = gcp.get_accelerator_type(
+            worker_config.accelerator_type.value)
+        trainingInput['workerConfig'] = workerConfig
+
+    if entry_point_args is not None:
+        trainingInput['args'] = entry_point_args
+    trainingInput['use_chief_in_tf_config'] = True
+    request_dict = {}
+    request_dict['jobId'] = job_name
+    request_dict['trainingInput'] = trainingInput
+    return request_dict
+
+
+def _print_logs_info(job_name, project_id):
+    print('Your job ID is: ', job_name)
+    print('Please access your job logs at the following URL:')
+    print('https://console.cloud.google.com/mlengine/jobs/{}?project={}'
+          .format(job_name, project_id))
+
+
+def _get_name():
+    unique_tag = ''.join(random.choice(

Replaced with uuid.uuid4().

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.

# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import google.auth


def get_project_name():
    # https://google-auth.readthedocs.io/en/latest/reference/google.auth.html
    _, project_id = google.auth.default()
    if project_id is None:
        raise Exception('Could not determine the GCP project id.')

Done.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


Done.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.

# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import json
import logging
import os
import random
import string
import sys
import tempfile

from . import machine_config

from docker import APIClient
from tensorflow.python.framework.versions import VERSION


logger = logging.getLogger(__name__)
logging.basicConfig(level=logging.INFO)


def get_file_map(entry_point, startup_script, chief_config,
                 requirements_txt=None, dst_dir='/app/',
                 docker_base_image=None):
    location_map = {}
    # Map entry_point directory to the dst directory.
    entry_point_dir, _ = os.path.split(entry_point)
    if entry_point_dir == '':  # Current directory
        entry_point_dir = '.'
    location_map[entry_point_dir] = dst_dir

    # Place startup_script in the dst directory.
    _, startup_file_name = os.path.split(startup_script)
    location_map[startup_script] = os.path.join(dst_dir, startup_file_name)

    # Place requirements_txt in the dst directory.
    if requirements_txt is not None:
        _, requirements_txt_name = os.path.split(
            requirements_txt)
        location_map[requirements_txt] = os.path.join(
            dst_dir, requirements_txt_name)

    # Place docker file in the root directory.
    docker_file = _create_docker_file(
        startup_script, chief_config, requirements_txt,
        dst_dir, docker_base_image)
    location_map[docker_file] = 'Dockerfile'
    return docker_file, location_map


def get_docker_image(docker_registry, tar_file):
    docker_client = APIClient(version='auto')
    # create docker image from tarball
    image_tag = _build_docker_image(docker_registry, tar_file, docker_client)
    # push to the registry
    _publish_docker_image(image_tag, docker_client)
    return image_tag


def _build_docker_image(docker_registry, tar_file, docker_client):
    image_tag = _generate_name(docker_registry)
    logger.info(' Building docker image: {}'.format(image_tag))
    with open(tar_file, 'rb') as fileobj:
        bld_logs_generator = docker_client.build(
            path='.',
            custom_context=True,
            fileobj=fileobj,
            tag=image_tag,
            encoding='utf-8')
    _get_logs(bld_logs_generator, 'build')
    return image_tag


def _publish_docker_image(image_tag, docker_client):
    logger.info(' Publishing docker image: {}'.format(image_tag))
    pb_logs_generator = docker_client.push(image_tag, stream=True)
    _get_logs(pb_logs_generator, 'publish')


def _create_docker_file(startup_script, chief_config, requirements_txt,
                        dst_dir, docker_base_image):
    # Create a Dockerfile.
    _, output_file = tempfile.mkstemp()

    if docker_base_image is None:
        # Get the TF docker base image to use based on the current TF
        # and python version.
        docker_base_image = 'tensorflow/tensorflow:{}'.format(VERSION)
        if (chief_config.accelerator_type !=
                machine_config.AcceleratorType.NO_ACCELERATOR):
            docker_base_image += '-gpu'

        if sys.version_info[0] == 3:
            docker_base_image += '-py3'

    lines = ['FROM {}'.format(docker_base_image), 'WORKDIR {}'.format(dst_dir)]
    lines.append('COPY {} {}'.format(dst_dir, dst_dir))

    if requirements_txt is not None:
        _, requirements_txt_name = os.path.split(requirements_txt)
        dst_requirements_txt = os.path.join(requirements_txt_name)
        # install pip requirements from requirements_txt if it exists.
        lines.append('RUN if [ -e {} ]; '
                     'then pip install --no-cache -r {}; '
                     'fi'.format(dst_requirements_txt, dst_requirements_txt))

    _, startup_file_name = os.path.split(startup_script)
    # Using `ENTRYPOINT` here instead of `CMD` specifically because we want to
    # support passing user code flags.
    lines.extend([
        'ENTRYPOINT ["python", "{}"]'.format(startup_file_name)
    ])

    content = '\n'.join(lines)
    with open(output_file, 'w') as f:
        f.write(content)
    return output_file


def _generate_name(docker_registry):
    unique_tag = ''.join(random.choice(
        string.ascii_lowercase + string.digits) for _ in range(32))

Replaced with uuid.uuid4().

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


Good point, removed the python version check.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


Done.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


'run', 'MachineConfig', 'AcceleratorType', and 'COMMON_MACHINE_CONFIGS' are public; everything else is utilities structured into modules. I have used the prefix '_' for module-local functions and no prefix for the ones that are accessed across modules.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


Updated to tar_file_path.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


Done.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.

<h2>What is this repo?</h2>

-This repository provides APIs that will allow you to easily go from debugging and training your TensorFlow code in a local environment to distributed training in the cloud.
+This repository provides APIs that will allow to easily go from debugging and training your TensorFlow code in a local environment to distributed training in the cloud.

Done.

pavithrasv

comment created time in 4 days

Pull request review comment tensorflow/cloud

Initial commit.


Done.

pavithrasv

comment created time in 4 days

push event pavithrasv/cloud

Pavithra Vijay

commit sha 63ba5971e78062e1ff0c8401e612bdd62f89fca4

Update deply_job docs.

view details

push time in 4 days

push event pavithrasv/cloud

Pavithra Vijay

commit sha c4e769e004b4b41860186f355adf6337f8ab0d45

Add doc strings for all methods, optionally create wrapped_entry_point_script, update unique id generation.

view details

push time in 4 days

issue comment tensorflow/tensorflow

Why K.mean is used in tf.keras.losses.binary_crossentropy ?

Thank you @durandg12. Historically, the loss functions have expected inputs that are at least 2D, and they compute the mean over the last axis in order to support sample weighting correctly:

y_true: Ground truth values. shape = [batch_size, d0, .. dN].
y_pred: The predicted values. shape = [batch_size, d0, .. dN].
sample_weight: Optional sample_weight acts as a coefficient for the metric. If a scalar is provided, then the metric is simply scaled by the given value. If sample_weight is a tensor of size [batch_size], then the metric for each sample of the batch is rescaled by the corresponding element in the sample_weight vector. If the shape of sample_weight is [batch_size, d0, .. dN-1] (or can be broadcasted to this shape), then each metric element of y_pred is scaled by the corresponding value of sample_weight. (Note on dN-1: all metric functions reduce by 1 dimension, usually the last axis (-1)).

If your input labels have shape [batch_size, d0], the result from these functions will have shape [batch_size], i.e. one loss value per sample. This applies to the binary, categorical, and sparse categorical crossentropy functions.

>>> tf.keras.losses.binary_crossentropy([[0], [1]], [[0.3], [0.8]])
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.3566748 , 0.22314338], dtype=float32)>

>>> tf.keras.losses.categorical_crossentropy([[0, 1, 0], [0, 0, 1]], [[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.05129331, 2.3025851 ], dtype=float32)>

>>> tf.keras.losses.sparse_categorical_crossentropy([1, 2], [[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
<tf.Tensor: shape=(2,), dtype=float32, numpy=array([0.05129344, 2.3025851 ], dtype=float32)>

In all three cases above, the number of samples is 2.
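To see why the per-sample reduction matters, here is a minimal NumPy sketch of the behavior described above (an illustration only, not the actual Keras implementation): the elementwise crossentropy is averaged over the last axis so that each sample gets one loss value, which a [batch_size] sample_weight can then rescale.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred):
    """Elementwise binary crossentropy, then mean over the last axis,
    yielding one loss value per sample."""
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    per_element = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return per_element.mean(axis=-1)  # shape: [batch_size]

per_sample = binary_crossentropy([[0], [1]], [[0.3], [0.8]])
# One value per sample, so a [batch_size] sample_weight rescales each sample.
sample_weight = np.array([1.0, 2.0])
weighted = per_sample * sample_weight
overall = weighted.sum() / sample_weight.sum()
```

Here `per_sample` matches the binary crossentropy output shown above; the weighting step is only possible because the function has already reduced to one value per sample.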

Hope this clears up the confusion between the crossentropy functions.

durandg12

comment created time in 4 days

issue comment tensorflow/tensorflow

Dense does not flatten inputs with rank >2 and behaves exactly like TimeDistributed(Dense)

The change to the TimeDistributed docs has also been submitted. Thank you!

durandg12

comment created time in 6 days

fork pavithrasv/keras-tuner

Hyperparameter tuning for humans

fork in 6 days

issue comment tensorflow/tensorflow

Dense does not flatten inputs with rank >2 and behaves exactly like TimeDistributed(Dense)

They are the same. TimeDistributed doesn't just apply to Dense layers, but I see that the main example in the TimeDistributed docs uses a Dense layer; I'll update that.

durandg12

comment created time in 7 days

issue comment tensorflow/tensorflow

Dense does not flatten inputs with rank >2 and behaves exactly like TimeDistributed(Dense)

Commit https://github.com/tensorflow/tensorflow/commit/2e6a3c58e4b96cac864f244e4886ef00b3184986#diff-5fb1fa5fa46d0ec9a01d5a60b7d8acc8

durandg12

comment created time in 8 days

issue closed tensorflow/tensorflow

Dense does not flatten inputs with rank >2 and behaves exactly like TimeDistributed(Dense)

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.6
  • TensorFlow installed from (source or binary): from pip install
  • TensorFlow version (use command below): v2.0.0-beta0-16-g1d91213fe7 2.0.0-beta1
  • Python version: v3.6.7:6ec5cf24b7, Oct 20 2018, 03:02:14

Describe the current behavior A note in Dense documentation says that

Note: If the input to the layer has a rank greater than 2, then it is flattened prior to the initial dot product with kernel.

I don't see this happening in real life. Instead, Dense behaves on a 3-rank tensor as it would behave if it was wrapped in a TimeDistributed layer, making me question the utility of TimeDistributed at all.

Describe the expected behavior Dense should flatten its input like the documentation says. In the first example below, the shape of the kernel weights of dense should be (5 * 3, 2) = (15, 2) instead of (3, 2), which is the shape of dense2's kernel (as expected in the case of dense2).

Code to reproduce the issue

First example:

import tensorflow as tf
import numpy as np

print('Using Tensorflow version {} (git version {})'.format(tf.version.VERSION, tf.version.GIT_VERSION))

tf.random.set_seed(12)
np.random.seed(12)

init = tf.keras.initializers.GlorotUniform(seed=12)

inp = tf.constant(np.random.normal(0, 1, (1, 5, 6)))
inp = tf.cast(inp, dtype=tf.float32)

gru = tf.keras.layers.GRU(3, return_sequences=True)(inp)
print(gru.shape)
#(1, 5, 3)

dense = tf.keras.layers.Dense(2, kernel_initializer=init, bias_initializer=init)
print(dense(gru))
#tf.Tensor(
#[[[ 1.5456871  -0.5280464 ]
#  [ 0.11647969 -0.20553198]
#  [ 0.58126366 -0.16031623]
#  [-0.22882831 -0.22649539]
#  [ 0.62777793 -0.32470667]]], shape=(1, 5, 2), dtype=float32)

for w in dense.weights:
    print(w.shape)
#(3, 2) instead of (5 * 3, 2) if Dense indeed flattened its input
#(2,)

tddense = tf.keras.layers.TimeDistributed(dense)
print(tddense(gru))
#tf.Tensor(
#[[[ 1.5456871  -0.5280464 ]
#  [ 0.11647969 -0.20553198]
#  [ 0.58126366 -0.16031623]
#  [-0.22882831 -0.22649539]
#  [ 0.62777793 -0.32470667]]], shape=(1, 5, 2), dtype=float32)
# if Dense kernel had shape (15, 2), this should result in the following error:
# InvalidArgumentError: Matrix size-incompatible: In[0]: [5,3], In[1]: [15,2] [Op:MatMul]
# but instead what we get is the same output
# than without TimeDistributed, without error

dense2 = tf.keras.layers.Dense(2, kernel_initializer=init, bias_initializer=init)
tddense = tf.keras.layers.TimeDistributed(dense2)
print(tddense(gru))
#tf.Tensor(
#[[[ 1.5456871  -0.5280464 ]
#  [ 0.11647969 -0.20553198]
#  [ 0.58126366 -0.16031623]
#  [-0.22882831 -0.22649539]
#  [ 0.62777793 -0.32470667]]], shape=(1, 5, 2), dtype=float32)

for w in dense2.weights:
    print(w.shape)
#(3, 2) as expected
#(2,)

Second example, with a rank even larger than 3:

import tensorflow as tf

print('Using Tensorflow version {} (git version {})'.format(tf.version.VERSION, tf.version.GIT_VERSION))

inp = tf.keras.Input(shape=(10, 25, 25, 3))
dense_layer1 = tf.keras.layers.Dense(78)
x = dense_layer1(inp)
print('Output shape without TimeDistributed:')
print(x.shape)

dense_layer2 = tf.keras.layers.Dense(78)
y=tf.keras.layers.TimeDistributed(dense_layer2)(inp)
print('Output shape with TimeDistributed:')
print(y.shape)

print('Weight shapes without TimeDistributed:')
for weight in dense_layer1.trainable_weights:
    if len(weight.shape) == 2:
        print('    kernel shape:')
    else:
        print('    bias shape:')
    print(weight.shape)
    
print('Weight shapes with TimeDistributed:')
for weight in dense_layer2.trainable_weights:
    if len(weight.shape) == 2:
        print('    kernel shape:')
    else:
        print('    bias shape:')
    print(weight.shape)

which outputs is:

Using Tensorflow version 2.0.0-beta1 (git version v2.0.0-beta0-16-g1d91213fe7)
Output shape without TimeDistributed:
(None, 10, 25, 25, 78)
Output shape with TimeDistributed:
(None, 10, 25, 25, 78)
Weight shapes without TimeDistributed:
    kernel shape:
(3, 78)
    bias shape:
(78,)
Weight shapes with TimeDistributed:
    kernel shape:
(3, 78)
    bias shape:
(78,)

We see, in this example, that Dense and TimeDistributed(Dense) behave the same in that they only touch the last dimension of the input.
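This equivalence can be checked outside of Keras with plain NumPy (a sketch with arbitrary weights; shapes match the first example above): a matmul against the kernel touches only the last axis, which is exactly what applying the same layer to each timestep separately produces.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 5, 3))      # (batch, timesteps, features)
kernel = rng.normal(size=(3, 2))    # a Dense(2) kernel for 3 input features
bias = rng.normal(size=(2,))

# What Dense does on a rank-3 input: matmul over the last axis only.
dense_out = x @ kernel + bias       # shape (1, 5, 2)

# What TimeDistributed(Dense) does: apply the layer to each timestep.
td_out = np.stack(
    [x[:, t, :] @ kernel + bias for t in range(x.shape[1])], axis=1)

# The two results are identical.
```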

closed time in 8 days

durandg12

issue comment tensorflow/tensorflow

Dense does not flatten inputs with rank >2 and behaves exactly like TimeDistributed(Dense)

The note is incorrect; we will update it to reflect the code behavior.

durandg12

comment created time in 8 days

MemberEvent

PR opened tensorflow/cloud

Initial commit.

Adds the run API. Adds support for a workflow where a Python program can be passed as an entry_point to run.

+2435 -1

0 comments

24 changed files

pr created time in 8 days

push event pavithrasv/cloud

Pavithra Vijay

commit sha 90b6bdca3de7df385884bbaa1aaae5fe0fcb8972

Initial commit.

view details

push time in 8 days

create branch tensorflow/cloud

branch : master

created branch time in 8 days

issue comment tensorflow/addons

make F1-score usable with keras

Thank you @PhilipMay for working on this. Please feel free to send a PR to the tensorflow repo directly and skip the migration step since this is a metric we want in the main repo.

tillmo

comment created time in 10 days

issue comment tensorflow/tensorflow

Training fails when a multi-output Keras model has one output without a loss function

@mmilosav you are right, this is a duplicate of #36044

tomwphillips

comment created time in 10 days

issue comment tensorflow/tensorflow

Training fails when a multi-output Keras model has one output without a loss function

@tomwphillips this is by design. You do not need to feed target data for the output for which there is no loss function during training. You can pass a dictionary with just the other output like {'output_b': ...}.
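For illustration, a minimal sketch of that pattern (hypothetical layer names output_a/output_b, built from the comment above rather than taken from the issue): output_a appears in neither the loss dict nor the target dict, so no target data is needed for it.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(8, activation='relu')(inputs)
output_a = tf.keras.layers.Dense(1, name='output_a')(x)
output_b = tf.keras.layers.Dense(1, name='output_b')(x)
model = tf.keras.Model(inputs, [output_a, output_b])

# Compile with a loss for output_b only; output_a contributes no loss.
model.compile(optimizer='sgd', loss={'output_b': 'mse'})

# Targets are only needed for the outputs that have a loss.
history = model.fit(
    np.random.rand(16, 4).astype('float32'),
    {'output_b': np.random.rand(16, 1).astype('float32')},
    epochs=1, verbose=0)
```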

tomwphillips

comment created time in 11 days

Pull request review comment tensorflow/community

RFC: Standalone Keras Repository

# Standalone Keras Repository

| Status | Proposed |
|:--|:--|
| **RFC #** | [202](https://github.com/tensorflow/community/pull/202) |
| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |
| **Sponsor** | Karmel Allison (karmel@google.com) |
| **Updated** | 2020-02-05 |

## Objective

Move the Keras code from the TensorFlow main GitHub repository to its own repository, with TensorFlow as a dependency.

## Motivation

### Build times

Building the open-source TensorFlow project end-to-end is an extensive exercise. With a standard GCP instance, it might take more than one hour to finish the whole build process (it might take longer with a Mac laptop). Although the local build cache might help speed up the follow-up builds, the initial time cost is too high for regular software development workflows. Internally, Google has a distributed build and caching service, which Googlers heavily rely on, that can build TensorFlow and run all Keras tests within 5 minutes. Sadly, we can't expose this to external contributors.

Currently, any contribution to Keras code requires building all of TensorFlow, which is quite expensive for average users. Having a separate repository will allow the Keras package to be built without building TensorFlow. This should greatly improve the velocity of open-source developers when they contribute to Keras code.

### Community Benefit

The difficulty of building TensorFlow from scratch in order to make a PR to Keras code has been a significant source of issues:

* It discouraged contributions, since many external developers couldn't test their changes and make sure they were correct.
* External developers would send unverified PRs, and Google reviewers spend time going back and forth fixing the PR. Sometimes a PR simply stalls because of the lengthy feedback loop.

With the new standalone Keras repository, external contributors should experience a much shorter turnaround time when building/testing Keras, since they don't need to build TensorFlow anymore. This should have a positive impact on building a vibrant open-source developer community.

In addition, by getting the Keras team at Google to start developing Keras using the same public tools and infrastructure as third-party developers, we make the development process more transparent and more community-oriented.

### TensorFlow API modularity

There are other side benefits if we split the repository. Currently, Keras has to rely on a number of private TensorFlow APIs. However, a litmus test of the quality of the public TensorFlow low-level APIs is that they should be strictly sufficient for a higher-level API like Keras. After splitting the repository, Keras will have to import TensorFlow and rely exclusively on public APIs. If Keras still ends up using TensorFlow private features, it might be an indication of tight coupling of implementation details. If certain private features are extensively used, we might want to consider exposing them as public low-level APIs.

This design is also aligned with the design for [Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), which splits the TensorFlow project into smaller components that are not tightly coupled together.

## Design Proposal

### New location of the code

GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), joining the other Keras SIG projects and replacing the current external Keras codebase. `tf.Keras` will also replace Keras on PyPI.

Also considered: `tensorflow/keras`.

| keras-team/keras | tensorflow/keras |
|:--|:--|
| Under the umbrella of the Keras SIG, which hosts all other Keras-related projects like keras-applications, KerasTuner, etc. | Under the umbrella of tensorflow, which also hosts other TF-related projects. |
| Lots of existing followers on keras-team, who may not be easily migrated to the TF project. | No cross-org repo management cost on GitHub. Could rely on a lot of existing setup in TensorFlow. |
| Can't easily delete the keras project, which already has tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?") | Issues/PRs under the same org can be transferred easily, but not across different orgs. See here |

### Source of Truth

TensorFlow uses a Google-internal code repository as its source of truth. Every PR submitted through GitHub is converted to a Google-internal change first, submitted through the internal system, and then copied to GitHub as commits. At the same time, the PR is marked as merged with the corresponding commit hash.

Likewise, issue tracking and code review take place through Google-internal tools.

For Keras, since we are trying to promote community engagement, we hope to use GitHub as the source of truth. This will have the following implications:

* We expect the majority of code development/contribution to come from GitHub, and the dev tools / tests / scripts should focus on the GitHub development use case. See below for more details.
* The Keras CI/presubmit build for the GitHub repo should target the `tf-nightly` pip package as a dependency. This means any change to TF will take at most 24 hours to be reflected on the Keras side.
* The Keras code will be mirrored to a Google-internal code repository via Google-internal tools within a very short time window after each change. The Google-internal CI tests will run on HEAD for both Keras and TF code.
* The CI build for the repository on GitHub might break when it sees a new version of `tf-nightly`, if certain behavior has been changed and wasn't caught by unit tests. We have observed a few similar cases with [tf/addons](https://github.com/tensorflow/addons). We hope this can be reduced by stronger unit test coverage by Google-internal systems, when both TF and Keras code are tested at HEAD.
* Pip package management: Keras will now follow the `tf-estimator` approach. "pip install tensorflow" should also install Keras (from PyPI) as well. There are more details on the pip package in the [Improved pip package structure](https://github.com/tensorflow/community/pull/182) RFC.

### Dependency Cleanup

As the high-level API of TensorFlow, Keras should have a direct dependency on TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse logic in the TF code that relies on Keras, which we should update/remove when we split the repository.

The current usages of Keras from TensorFlow are:
* Unit tests, which should be converted to integration tests, or ported to the Keras repository.
* `feature_column`.
* Legacy `tf.layers` in the v1 API.
* Legacy RNN cells.
* TPU support code for `optimizer_v2`.
* SavedModel.
* TF Lite.

All Keras imports in integration tests can be changed to use a dynamic import like below:

```python
try:
   from tensorflow.python.keras.engine import base_layer
except ImportError:
   tf.logging.error('keras is not installed, please pip install keras')
   base_layer = None
```

### Update Keras to only use public TF APIs

The current Keras code will still work if we do e.g.:
```python
from tensorflow.python.ops import array_ops

ones = array_ops.ones([2, 3])
```

However, since Keras is a separate repository, having it only use TF public APIs will heavily reduce the chance of breakage caused by relying on private methods or implementation details. We think this point is critical to the health of the project. This also allows TF to change internal implementation details without worrying about breaking Keras.

The converted code should look like e.g.:

```python
import tensorflow as tf

ones = tf.ones([2, 3])
```

During this conversion, we might notice that certain TF features used in Keras are not public. A decision should be made on a case-by-case basis:

* Copy the functionality from TF to Keras.
* Replace the usage with another alternative TF public API.
* Make the functionality a new TF public API.

**Note that the open-source community is encouraged to contribute to this effort.**

### Two-stage change process

For any change that affects both TensorFlow and Keras, the change will need to be split into two: one as a PR to the TF repo, and the other as a PR to the Keras repo. Here are some common scenarios:

1. Adding a new feature to TensorFlow, and having Keras rely on it. Note that the TF change needs to be submitted first, and the Keras PR needs to wait for the new TF nightly to become available on PyPI.

Also note that any rollback of the TF PR will cause Keras to break; the rollback sequence should be PR 33333 and then PR 22222 (see example below). The Google-internal test for TF should catch the error if the rollback sequence is not correct.

```python
# Existing scenario.
# PR 11111 (2 files updated)
# +++ tensorflow/python/ops/array_ops.py
def some_new_function(inputs):
   ...

# +++ tensorflow/python/keras/layers/core.py

class new_layer(Layer):

  def call(inputs):
     array_ops.some_new_function(inputs)
     ...
```

```python
# New scenario.
# PR 22222 (1 file updated)
# +++ tensorflow/python/ops/array_ops.py
@tf.export('some_new_function')
def some_new_function(inputs):
   ...

==================================
# PR 33333 (1 file updated)
# +++ tensorflow/python/keras/layers/core.py

class new_layer(Layer):

  def call(inputs):
     tf.some_new_function(inputs)
     ...
```

2. Changing the behavior of an existing TF API.

Note that PR 22222 needs to be submitted with both the new and old functions, since Google-internal CI is still testing from HEAD. The previous function can be deleted after PR 33333 is submitted. Also note that this issue is caused by Keras not using exclusively public TF APIs, but relying on TF implementation details. Moving towards only using public APIs should reduce the likelihood of this kind of issue.

```python
# Existing scenario.
# PR 11111 (2 files updated)
# tensorflow/python/ops/array_ops.py
<<<
def existing_function(inputs):
    ...
>>>
def new_function(inputs, knob1=False, knob2=1):
    ...
# tensorflow/python/keras/layers/core.py

class existing_layer(Layer):

  def call(inputs):
<<<
    array_ops.existing_function(inputs)
>>>
    array_ops.new_function(
        inputs,
        knob1=True,
        knob2=3)
```

```python
# New scenario.
# PR 22222 (1 file updated)
# tensorflow/python/ops/array_ops.py
<<<
def existing_function(inputs):
   ...
>>>
def existing_function(inputs):
  return new_function(
    inputs,
    knob1=False,
    knob2=1)

def new_function(inputs, knob1, knob2=1):
    ...

==================================
# PR 33333 (1 file updated)
# tensorflow/python/keras/layers/core.py
class existing_layer(Layer):

  def call(inputs):
<<<
    array_ops.existing_function(inputs)
     ...
>>>
    array_ops.new_function(
        inputs,
        knob1=True,
        knob2=3)
```

### Performance Implications

There may be some performance implications as we move towards only using public TF APIs. We need to maintain a benchmark to ensure that there is no performance regression.

### Dependencies

Can you explain how namespacing will work without a circular dep? Keras depends on TF, will Keras continue to be under tf namespace as tf.keras?

qlzh727

comment created time in 12 days

Pull request review comment tensorflow/community

RFC: Standalone Keras Repository

### Dependency Cleanup

As the high-level API of TensorFlow, Keras should have a direct dependency on TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse logic in the TF code that relies on Keras, which we should update/remove when we split the repository.

The current usages of Keras from TensorFlow are:
* Unit tests, which should be converted to integration tests, or ported to the Keras repository.
* `feature_column`.
* Legacy `tf.layers` in the v1 API.
* Legacy RNN cells.
* TPU support code for `optimizer_v2`.
* SavedModel.
* TF Lite.

I think we should add the aliased modules to this list? losses, metrics, initializers, optimizers ...

qlzh727

comment created time in 12 days

issue comment tensorflow/tensorflow

Simple keras model, Model.fit() does not learn unless experimental_run_tf_function=False at compile

Thank you @fcarsten for taking a look. We are doing a bunch of code refactoring internally, and I think this use case in eager will be fixed as part of that. Will update this thread once I know when it is fixed in a tf-nightly release.

fcarsten

comment created time in 13 days

issue comment tensorflow/tensorflow

Need for more flexible Loss Function

Please feel free to send a PR to the Tensorflow Addons repository for this Loss.

ashutosh1919

comment created time in 15 days

issue comment tensorflow/tensorflow

Poor performance of model.fit and/or model.predict in TF 2.1.0-rc1

Thank you for taking a look Taylor!

gdudziuk

comment created time in 15 days

issue comment tensorflow/tensorflow

Tensorflow keras metrics cannot be used straight into the keras compile method

> It is hard to get aggregated metrics on the whole dataset instead of batchwise

With the stateful metrics you get the aggregated results across the entire dataset, not batchwise.

> It is hard to isolate the metrics on training set and validation set

Can you call `evaluate` separately for this use case?

gm-spacagna

comment created time in 15 days

issue comment tensorflow/tensorflow

tf.metrics.Mean* metrics miscalculated

@gsimko Thank you for the question. For metrics and losses, we expect that labels and predictions are at least 2D.

https://www.tensorflow.org/api_docs/python/tf/keras/metrics/MeanAbsoluteError?version=nightly#update_state

y_true: Ground truth values. shape = `[batch_size, d0, .. dN]`, except
        sparse loss functions such as sparse categorical crossentropy where
        shape = `[batch_size, d0, .. dN-1]`
y_pred: The predicted values. shape = `[batch_size, d0, .. dN]`

For example, let's say y_pred and y_true are of the shape [batch_size, d0], we compute the mean across the last axis in a sample -> this will give us one value per-sample -> [batch_size]. The second mean is computed across all samples across all batches.
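The two-stage reduction described above can be sketched in plain Python (an illustration of the idea only, not the actual Keras implementation):

```python
# Illustrative sketch of the two-stage mean (not the actual Keras code):
# the absolute error is first averaged over the last axis of each sample,
# giving one value per sample, then averaged over all samples seen so far.
y_true = [[0.0, 1.0], [0.0, 0.0]]
y_pred = [[1.0, 1.0], [1.0, 0.0]]

per_sample = [
    sum(abs(t - p) for t, p in zip(t_row, p_row)) / len(t_row)
    for t_row, p_row in zip(y_true, y_pred)
]  # one MAE value per sample, shape [batch_size]
result = sum(per_sample) / len(per_sample)  # scalar, aggregated over samples
```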

gsimko

comment created time in 17 days

issue comment tensorflow/tensorflow

Feature request: RecallAtPrecision Metric

@tangbyron Please feel free to send us a PR. You can implement it the same way PrecisionAtRecall has been implemented. I'm not sure what you mean by saying you cannot use SensitivitySpecificityBase; you should be able to use it the same way PrecisionAtRecall does.

Thank you!
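For reference, the threshold-scanning idea behind such a metric could be sketched as follows (a hypothetical plain-Python illustration, not the Keras implementation; the function name is made up for this example):

```python
# Hypothetical sketch of a RecallAtPrecision computation: scan candidate
# thresholds and report the best recall among thresholds whose precision
# meets the target.
def recall_at_precision(y_true, y_scores, target_precision):
    best_recall = 0.0
    for thr in sorted(set(y_scores)):
        pred = [s >= thr for s in y_scores]
        tp = sum(1 for p, t in zip(pred, y_true) if p and t == 1)
        fp = sum(1 for p, t in zip(pred, y_true) if p and t == 0)
        fn = sum(1 for p, t in zip(pred, y_true) if not p and t == 1)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision >= target_precision:
            best_recall = max(best_recall, recall)
    return best_recall
```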

tangbyron

comment created time in 18 days

issue closed tensorflow/tensorflow

Tensorflow keras metrics cannot be used straight into the keras compile method

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu
  • TensorFlow installed from (source or binary): using pip
  • TensorFlow version (use command below): 2.1.0
  • Keras version: 2.3.1
  • Python version: 3.6.4

tensorflow.version.GIT_VERSION, tensorflow.version.VERSION ('v2.1.0-rc2-17-ge5bf8de', '2.1.0')

Describe the current behavior

I found anomalous behavior when passing tensorflow.keras.metrics objects directly to the Keras compile API:

from tensorflow.keras.metrics import Recall, Precision
model.compile(..., metrics=[Recall(), Precision()])

When looking at the history, tracking the precision and recall at each epoch (using keras.callbacks.History), I observe very similar performance on both the training set and the validation set. The weirdest thing is that both recall and precision increase at each epoch while the loss is clearly no longer improving.

I found the issue to be related to the statefulness of the TensorFlow metrics objects. Every time you call the metric object, it appends a new batch of data that gets mixed with both training and validation data and accumulates at each epoch.

Describe the expected behavior

The expected behavior is that the metrics object should be stateless and not depend on previous calls. Each time we calculate the metric (precision, recall or anything else), the function should only depend on the specified y_true and y_pred.

To work around the issue, we either need Keras to be smart enough to re-instantiate the metric object on every call, or a stateless TensorFlow wrapper. Maybe a decorator?
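The stateless wrapper proposed above could look something like this sketch (all names are hypothetical; a plain-Python toy metric stands in for the real TensorFlow metric classes):

```python
# Hypothetical sketch of the proposed "stateless wrapper" (not a real
# TensorFlow API): re-instantiate the metric on every call so that no
# state accumulates between calls.
def stateless(metric_factory):
    def wrapped(y_true, y_pred):
        metric = metric_factory()  # fresh instance, fresh state
        metric.update_state(y_true, y_pred)
        return metric.result()
    return wrapped

# Toy stateful metric standing in for e.g. tf.keras.metrics.Recall.
class CountPositives:
    def __init__(self):
        self.count = 0
    def update_state(self, y_true, y_pred):
        self.count += sum(y_true)
    def result(self):
        return self.count

count = stateless(CountPositives)
```

With this wrapper, repeated calls return the same value for the same inputs instead of accumulating across calls.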

Code to reproduce the issue

recall = Recall()

y_train = [[0, 1, 0, 1],
           [1, 0, 0, 0]]

y_train_pred = [[0.1, 0.50001, 0.4, 0.7],
                [0.5, 0.51, 1, 0]]

y_test = [[1, 1, 0, 0],
          [0, 0, 0, 1]]

y_test_pred = [[0.1, 0.80, 0.8, 0.9],
               [0.1, 0.4, 0.99, 0]]

print(recall(y_train, y_train_pred))
print(recall(y_test, y_test_pred))

recall = Recall()
print(recall(y_test, y_test_pred))

recall = Recall()
print(recall(y_test, y_test_pred))
print(recall(y_train, y_train_pred))

Other info / logs: The code above will print:

tf.Tensor(0.6666667, shape=(), dtype=float32)
tf.Tensor(0.5, shape=(), dtype=float32)
tf.Tensor(0.33333334, shape=(), dtype=float32)
tf.Tensor(0.33333334, shape=(), dtype=float32)
tf.Tensor(0.5, shape=(), dtype=float32)

As you can see, the behavior is not stateless; each result reflects the accumulation of all calls since the object was instantiated.

closed time in 20 days

gm-spacagna

issue comment tensorflow/tensorflow

Tensorflow keras metrics cannot be used straight into the keras compile method

@gm-spacagna Thank you for the issue.

For some of the metrics, such as MSE, we have both stateful and stateless versions: the stateful ones are listed as classes (https://www.tensorflow.org/api_docs/python/tf/keras/metrics) and the stateless ones as functions (https://www.tensorflow.org/api_docs/python/tf/keras/metrics#functions).

Usage with the compile/fit API is always stateful. If you want batchwise values, you can write a custom training loop using the train_on_batch API.

For metrics such as Precision/Recall there isn't really a stateless version. For standalone usage of these metrics, please use the reset_state API to clear the state between batches.
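The stateful accumulation at play here can be illustrated with a plain-Python sketch (the class below is a simplified stand-in, not the actual tf.keras.metrics.Recall implementation):

```python
# Simplified stand-in for a stateful recall metric (illustrative only):
# counts accumulate across every update_state() call until reset_state().
class StatefulRecall:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.true_positives = 0
        self.possible_positives = 0

    def update_state(self, y_true, y_pred):
        for t_row, p_row in zip(y_true, y_pred):
            for t, p in zip(t_row, p_row):
                predicted_positive = p > self.threshold
                self.true_positives += int(t == 1 and predicted_positive)
                self.possible_positives += int(t == 1)

    def result(self):
        return self.true_positives / max(self.possible_positives, 1)

    def reset_state(self):
        self.true_positives = 0
        self.possible_positives = 0

recall = StatefulRecall()
recall.update_state([[0, 1, 0, 1], [1, 0, 0, 0]],
                    [[0.1, 0.50001, 0.4, 0.7], [0.5, 0.51, 1, 0]])
first = recall.result()       # ~0.6667 on the first batch alone
recall.update_state([[1, 1, 0, 0], [0, 0, 0, 1]],
                    [[0.1, 0.80, 0.8, 0.9], [0.1, 0.4, 0.99, 0]])
cumulative = recall.result()  # 0.5: both batches mixed together
recall.reset_state()
recall.update_state([[1, 1, 0, 0], [0, 0, 0, 1]],
                    [[0.1, 0.80, 0.8, 0.9], [0.1, 0.4, 0.99, 0]])
fresh = recall.result()       # ~0.3333: second batch alone
```

Calling reset_state between batches gives the per-batch values; skipping it gives the cumulative value, which is what the report in this thread observed.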

gm-spacagna

comment created time in 20 days

pull request comment tensorflow/tensorflow

added 3 macro multilabel metrics

For new metrics we recommend that you add them to the TensorFlow addons repository first. Based on usage/demand we can later move them to the core repo.

SumNeuron

comment created time in 20 days

issue comment tensorflow/tensorflow

Use keras .fit() method to train adversarial models

Adding @omalleyt12 who is working on refactoring code to make it easier for users to write custom training loops.

kristofgiber

comment created time in 21 days

issue comment tensorflow/tensorflow

MultiLabel Metric Support

Adding this to contributions welcome list. Please feel free to add the multi-label option to the APIs and send me a PR.

SumNeuron

comment created time in 21 days

issue comment tensorflow/tensorflow

Metrics for multi-label classification for using with tf.keras

Thank you, I see you have opened another issue for the pending features; will reply on that.

Abhijit-2592

comment created time in 21 days

issue closed tensorflow/tensorflow

TF2 Keras functional API error

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04.6 LTS
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v2.0.0-rc2-26-g64c3d38 2.0.0 (most recent from pip)
  • Python version: Python 3.6.9
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CPU version
  • GPU model and memory:

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with: 1. TF 1.0: `python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"` 2. TF 2.0: `python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"`

Describe the current behavior

Show OperatorNotAllowedInGraphError error.

Describe the expected behavior

Show model summary.

Code to reproduce the issue. Provide a reproducible test case that is the bare minimum necessary to generate the problem.

import tensorflow as tf

tf.keras.backend.clear_session()

class Manager(tf.keras.Model):
    def __init__(self):
        super(Manager, self).__init__()

        self.inputs = tf.keras.Input(shape=(None,))

model = Manager()
model.build(input_shape=(16,16)).summary()

Other info / logs: Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/network.py", line 630, in build
    if input_shape and not self.inputs:
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 765, in __bool__
    self._disallow_bool_casting()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 534, in _disallow_bool_casting
    self._disallow_in_graph_mode("using a `tf.Tensor` as a Python `bool`")
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 523, in _disallow_in_graph_mode
    " this function with @tf.function.".format(task))
tensorflow.python.framework.errors_impl.OperatorNotAllowedInGraphError: using a `tf.Tensor` as a Python `bool` is not allowed in Graph execution. Use Eager execution or decorate this function with @tf.function.

closed time in 21 days

lfuszara1

issue comment tensorflow/tensorflow

TF2 Keras functional API error

`inputs` is a property on the `tf.keras.Model` class. Also, you don't need to create an `Input` layer in a subclassed model; it is meant for the Keras functional API.

Please checkout the following guide on sub-classed models: https://www.tensorflow.org/guide/keras/custom_layers_and_models#building_models

lfuszara1

comment created time in 21 days

issue comment tensorflow/tensorflow

keras model fit() crash with large batch_size and 0 validation_split

@linhx13 Thank you for the issue. From the gist it looks like the issue is independent of validation_split=0 and batch_size. I removed your custom loss and metric to make sure that the base model works first, and there are errors in the model. It would be great if you could provide a minimal repro if there is still an issue here. Closing for now; please feel free to reopen if required.

linhx13

comment created time in 21 days

issue closed tensorflow/tensorflow

keras model fit() crash with large batch_size and 0 validation_split


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tf-gpu 1.13.1
  • Python version: 3.6.2
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: 10/7.4.1
  • GPU model and memory: GTX 1080Ti, 10G


Describe the current behavior: Raises an error when fitting a tf.keras model when the training dataset size is less than batch_size and validation_split is 0.0. When using a batch_size smaller than the dataset size, or when setting validation_split, the fit succeeds. Using original Keras, fitting succeeds in every case. The error is:

Traceback (most recent call last):
  File "crf_tf.py", line 123, in <module>
    model.fit(x, y, batch_size=16, epochs=50, validation_split=0.0)
  File "/opt/userhome/ichongxiang/.conda/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 880, in fit
    validation_steps=validation_steps)
  File "/opt/userhome/ichongxiang/.conda/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 329, in model_iteration
    batch_outs = f(ins_batch)
  File "/opt/userhome/ichongxiang/.conda/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 3076, in __call__
    run_metadata=self.run_metadata)
  File "/opt/userhome/ichongxiang/.conda/envs/py36/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1439, in __call__
    run_metadata_ptr)
  File "/opt/userhome/ichongxiang/.conda/envs/py36/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 2
         [[{{node metrics/crf_accuracy/ArithmeticOptimizer/ReorderCastLikeAndValuePreserving_bool_Squeeze}}]]
         [[{{node crf/cond/Maximum}}]]

Describe the expected behavior: The fit completes successfully.

Code to reproduce the issue. Provide a reproducible test case that is the bare minimum necessary to generate the problem.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
# import keras
# from keras import backend as K
import numpy as np


class CRF(keras.layers.Layer):

    def __init__(self, num_tags, **kwargs):
        super(CRF, self).__init__(**kwargs)
        self.num_tags = num_tags
        self.input_spec = keras.layers.InputSpec(min_ndim=3)
        self.supports_masking = True

    def get_config(self):
        config = {
            'num_tags': self.num_tags,
        }
        base_config = super(CRF, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def build(self, input_shape):
        assert len(input_shape) == 3
        if input_shape[-1] is None:
            raise ValueError('The last dimension of the inputs to `CRF` '
                             'should be defined. Found `None`.')
        if input_shape[-1] != self.num_tags:
            raise ValueError('The last dimension of the input shape must be equal to output'
                             ' shape. Use a linear layer if needed.')
        self.transitions = self.add_weight(name='transitions',
                                           shape=[self.num_tags,
                                                  self.num_tags],
                                           initializer="glorot_uniform",
                                           trainable=True)
        self.built = True

    def call(self, inputs, mask=None):
        seq_lens = get_seq_lens(inputs, mask)
        viterbi_sequence, _ = tf.contrib.crf.crf_decode(inputs,
                                                        self.transitions,
                                                        seq_lens)
        outputs = K.one_hot(viterbi_sequence, self.num_tags)
        return K.in_train_phase(inputs, outputs)

    def compute_output_shape(self, input_shape):
        return input_shape[:2] + (self.num_tags,)

    def compute_mask(self, inputs, mask=None):
        if mask is not None:
            return K.any(mask, axis=1)
        return mask


def get_seq_lens(inputs, mask=None):
    if mask is not None:
        return K.sum(K.cast(mask, dtype='int32'), axis=-1)
    else:
        shape = K.int_shape(inputs)
        return K.ones(shape[:-1], dtype='int32') * shape[-1]


def crf_loss(y_true, y_pred):
    crf, idx = y_pred._keras_history[:2]
    inputs = crf.get_input_at(idx)
    mask = crf.get_input_mask_at(idx)
    seq_lens = get_seq_lens(inputs, mask)
    y_true = K.cast(K.argmax(y_true, axis=-1), dtype='int32')
    log_likelihood, crf.transitions = \
        tf.contrib.crf.crf_log_likelihood(y_pred,
                                          y_true,
                                          seq_lens,
                                          transition_params=crf.transitions)
    return K.mean(-log_likelihood)


def crf_accuracy(y_true, y_pred):
    crf, idx = y_pred._keras_history[:2]
    inputs = crf.get_input_at(idx)
    mask = crf.get_input_mask_at(idx)
    seq_lens = get_seq_lens(inputs, mask)
    viterbi_sequence, _ = tf.contrib.crf.crf_decode(inputs,
                                                    crf.transitions,
                                                    seq_lens)
    y_true = K.cast(K.argmax(y_true, -1), dtype='int32')
    judge = K.cast(K.equal(viterbi_sequence, y_true), K.floatx())
    if mask is None:
        return K.mean(judge)
    else:
        mask = K.cast(mask, K.floatx())
        return K.sum(judge * mask) / K.sum(mask)


num_words = 20
num_features = 100
num_tags = 5

inputs = keras.layers.Input(shape=(None,))
embedding = keras.layers.Embedding(10, num_features, mask_zero=True)(inputs)
scores = keras.layers.TimeDistributed(keras.layers.Dense(num_tags))(embedding)
crf = CRF(num_tags)
outputs = crf(scores)
model = keras.models.Model(inputs, outputs)

model.summary()

x = np.array([[1, 2, 3, 4, 0, 0], [4, 5, 6, 0, 0, 0]])
y = np.array([[1, 3, 4, 2, 0, 0], [2, 1, 3, 0, 0, 0]])
y = np.eye(num_tags)[y]

print(x)
print(x.shape)
print(y)
print(y.shape)

model.compile(optimizer="adam",
              loss=crf_loss,
              metrics=[crf_accuracy])

model.fit(x, y, batch_size=16, epochs=50, validation_split=0.0)

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

closed time in 21 days

linhx13

issue closedtensorflow/tensorflow

tf.keras: predict, fit, predict = old result

Windows 10 / Python 3.7 / TF 1.13

m = model.predict(inputs)
print(model.fit(inputs, outputs, batch_size=inputs.shape[0], verbose=1)) # OK: loss: 16.0302
print(model.predict(inputs) - m) # all = 0

Synchronization of the weights for the last call predict() does not work.

closed time in 21 days

Roffild

issue commenttensorflow/tensorflow

tf.keras: predict, fit, predict = old result

Closing the issue as there is not enough information to repro the issue, please feel free to re-open if required.

Roffild

comment created time in 21 days

issue closedtensorflow/tensorflow

Very unhelpful error msg building keras models with while loops

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution: Linux Ubuntu 16.04:
  • TensorFlow installed from (source or binary): pip
  • TensorFlow version (use command below): v1.12.1-9365-gff401a6 1.15.0-dev20190821
  • Python version: 3.6.8
  • CUDA/cuDNN version: 10.0 / ??
  • GPU model and memory: Quadro 2gb

Describe the current behavior _create_keras_history_helper is making building networks much more pleasant in general by forgoing the need to wrap everything in Lambda layers. It's failing for while loops without Lambda wrapping, giving a very unhelpful error message:

tensorflow.python.framework.errors_impl.InvalidArgumentError: A cross-device loop must have a pivot predicate: while/while_context

Describe the expected behavior Indicate the source of the problem/possible resolution.

Code to reproduce the issue

import tensorflow as tf

def cond(i, x):
    return tf.reduce_all(x < 10)

def body(i, x):
    return i + 1, x + i

x = tf.keras.layers.Input(shape=(), dtype=tf.float32)
inc = tf.while_loop(cond, body, [tf.constant(0, dtype=tf.float32), x])
# the following fixes things
# inc = tf.keras.layers.Lambda(lambda x: tf.while_loop(
#     cond, body, [tf.constant(0, dtype=tf.float32), x]))(x)

model = tf.keras.Model(inputs=x, outputs=inc)  # <- error occurs here

Other info / logs Traceback:

Traceback (most recent call last):
  File "loop.py", line 23, in <module>
    model = tf.keras.Model(inputs=x, outputs=inc)  # <- error occurs here
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 147, in __init__
    super(Model, self).__init__(*args, **kwargs)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 164, in __init__
    self._init_graph_network(*args, **kwargs)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/training/tracking/base.py", line 457, in _method_wrapper
    result = method(self, *args, **kwargs)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 267, in _init_graph_network
    base_layer_utils.create_keras_history(self._nested_outputs)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 184, in create_keras_history
    _, created_layers = _create_keras_history_helper(tensors, set(), [])
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 231, in _create_keras_history_helper
    layer_inputs, processed_ops, created_layers)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/base_layer_utils.py", line 229, in _create_keras_history_helper
    constants[i] = backend.function([], op_input)([])
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3473, in __call__
    self._make_callable(feed_arrays, feed_symbols, symbol_vals, session)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/keras/backend.py", line 3410, in _make_callable
    callable_fn = session._make_callable_from_options(callable_opts)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1505, in _make_callable_from_options
    return BaseSession._Callable(self, callable_options)
  File ".../.anaconda2/envs/tf-nightly-gpu/lib/python3.6/site-packages/tensorflow_core/python/client/session.py", line 1460, in __init__
    session._session, options_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: A cross-device loop must have a pivot predicate: while/while_context

closed time in 21 days

jackd

issue commenttensorflow/tensorflow

Very unhelpful error msg building keras models with while loops

Can you wrap the while loop in a lambda layer?

inc = tf.keras.layers.Lambda(lambda i: tf.while_loop(cond, body, [tf.constant(0, dtype=tf.float32), i]))(x)

By default, if TF ops are used in a tf.keras model without having been wrapped in a tf.keras layer, we try to wrap them ourselves. This will work only for use cases where we can backtrack to the inputs of the model. If your custom op has a control dependency, for example, this automatic wrapping will not work.

jackd

comment created time in 21 days

issue closedtensorflow/tensorflow

tf keras base layer issue for input_tensors/output_tensors in 1.14.0

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10.0.17763
  • TensorFlow installed from (source or binary): pip install
  • TensorFlow version (use command below): 1.14.0
  • Python version: 3.6.8

Describe the current behavior In tensorflow keras, the input_tensors, output_tensors, and output_shapes of class Node were lists in tensorflow 1.13.1, even when they contained only one tensor. The behavior changed in 1.14.0: these variables are now a single tensor (not a list any more) when there is a single element. We are developing based on tf.keras, so this behavior is not backward compatible.

Describe the expected behavior Can we change them back to list for the single tensor case?
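Until the API settles, a small compatibility shim (a hypothetical helper, not part of TensorFlow) can normalize both behaviors so downstream code always sees a list:

```python
def as_list(x):
    # Normalize Node attributes that may be a single tensor (TF 1.14)
    # or a list of tensors (TF 1.13) to always be a list.
    return x if isinstance(x, list) else [x]
```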

Code to reproduce the issue

from tensorflow.python import keras            
model = keras.Sequential()
model.add(keras.layers.Dense(5, input_shape=(4,), activation='sigmoid'))
model.add(keras.layers.Dense(3, input_shape=(5,), use_bias=True))
model.compile('sgd', 'mse')

def extract_inbound_nodes(layer):
     return layer.inbound_nodes if hasattr(layer, 'inbound_nodes') else layer._inbound_nodes

for l_ in model.layers:
   for node_ in extract_inbound_nodes(l_):
       assert isinstance(node_.output_tensors, list)
       assert isinstance(node_.input_tensors, list)
       assert isinstance(node_.output_shapes, list)

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

closed time in 21 days

jiafatom

issue commenttensorflow/tensorflow

tf.keras: predict, fit, predict = old result

Is this still an issue? I do not see any errors or predict calls in the colab.

Roffild

comment created time in 21 days

pull request commenttensorflow/models

Update keras-cifar-main to use sparse data and loss function.

Submitted this change internally.

pavithrasv

comment created time in 25 days

PR closed tensorflow/models

Reviewers
Update keras-cifar-main to use sparse data and loss function. cla: yes

keras-cifar-main synthetic data generates sparse data, whereas the real data generates one-hot labels. The real data and model loss/metric are correct, but the synthetic data is wrong with respect to this model's parameters. We can convert the synthetic data to return one-hot labels, but this synthetic data is shared across other models where sparse data is used. I figured it is best to convert keras-cifar-main to use sparse data as well.

+4 -26

11 comments

1 changed file

pavithrasv

pr closed time in 25 days

issue commenttensorflow/tensorflow

tf.losses.mean_squared_error returns a list in tensorflow '2.0.0-rc1'

In TF 1.x we had different loss functions in tf.losses and tf.keras.losses, we unified them in TF 2.0. We went the tf.keras.losses way for the functions where the inputs are expected to be at least 2D and we mean across the last dimension.
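The reduction described in the comment can be sketched in NumPy (illustrative only, not the actual TF implementation): tf.keras.losses-style functions mean over the last axis and return one loss value per sample.

```python
import numpy as np

def mse_per_sample(y_true, y_pred):
    # tf.keras.losses-style functions reduce over the last axis,
    # returning one loss value per sample rather than a scalar.
    return np.mean(np.square(np.asarray(y_true) - np.asarray(y_pred)), axis=-1)

y_true = np.zeros((4, 3))
y_pred = np.ones((4, 3))
losses = mse_per_sample(y_true, y_pred)  # shape (4,), one value per sample
```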

alibeyram

comment created time in a month

issue closedtensorflow/tensorflow

tf.keras.losses.MeanSquaredError doesnt support scalar inputs

I am using tensorflow version 2.2.0-dev20200119. tf.keras.losses.MeanSquaredError can't handle input scalar values properly. The following code can reproduce the problem.

#!/usr/bin/python3
import tensorflow as tf;
tf.keras.losses.MeanSquaredError()(1,0);

closed time in a month

breadbread1984

issue commenttensorflow/tensorflow

tf.keras.losses.MeanSquaredError doesnt support scalar inputs

Hi @breadbread1984, the labels and predictions are expected to be at least 2D for the metric: https://www.tensorflow.org/api_docs/python/tf/keras/metrics/MeanSquaredError?version=nightly#update_state

y_true: Ground truth values. shape = [batch_size, d0, .. dN]. y_pred: The predicted values. shape = [batch_size, d0, .. dN].
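The shape contract above can be illustrated with a NumPy sketch (not the TF code): scalar inputs should be wrapped into the documented [batch_size, d0] layout before computing the loss.

```python
import numpy as np

def ensure_2d(x):
    # Wrap scalars / 1-D arrays into the documented [batch_size, d0] shape.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return x.reshape(len(x), -1) if x.ndim == 1 else x

y_true = ensure_2d(1)   # shape (1, 1)
y_pred = ensure_2d(0)   # shape (1, 1)
mse = np.mean(np.square(y_true - y_pred), axis=-1)  # per-sample loss, shape (1,)
```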

breadbread1984

comment created time in a month

fork pavithrasv/build

Build-related tools for TensorFlow

fork in a month

issue closedtensorflow/tensorflow

Why doesn't ```tf.keras.losses.binary_crossentropy``` raise error

x = np.arange(10,dtype=np.float64).reshape(10,1)
#x.shape = (10,1)

y = np.arange(10,dtype=np.float64)
#y.shape = (10,)

tf.keras.losses.binary_crossentropy(y_true=y, y_pred=x)
#this line does't raise error

tf.keras.metrics.BinaryAccuracy()(y_true=y, y_pred=x)
#this line neither

tf.keras.metrics.Precision()(y_true=y, y_pred=x)
#this line raise an error

I think binary_crossentropy and BinaryAccuracy should raise an ValueError like tf.keras.metrics.Precision:

ValueError: Shapes (128, 1) and (128,) are incompatible

closed time in a month

DachuanZhao

issue commenttensorflow/tensorflow

Why doesn't ```tf.keras.losses.binary_crossentropy``` raise error

This is fixed now in : https://github.com/tensorflow/tensorflow/commit/ba8a0c934147fcf2a879f349677fc11676c73835#diff-1d3c0e76cc08b7d6e2e3a6ab89965a5c
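The fix aligns ranks before comparing shapes; the idea can be sketched in NumPy (a simplified stand-in, not the actual Keras shape-handling code):

```python
import numpy as np

def match_rank(y_true, y_pred):
    # If one tensor carries a trailing dimension of size 1 that the
    # other lacks, squeeze it so shapes like (10,) and (10, 1) agree.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    if y_pred.ndim == y_true.ndim + 1 and y_pred.shape[-1] == 1:
        y_pred = np.squeeze(y_pred, axis=-1)
    elif y_true.ndim == y_pred.ndim + 1 and y_true.shape[-1] == 1:
        y_true = np.squeeze(y_true, axis=-1)
    return y_true, y_pred

a, b = match_rank(np.arange(10.0), np.arange(10.0).reshape(10, 1))
```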

DachuanZhao

comment created time in a month

pull request commenttensorflow/tensorflow

Print kernel sizes too

@vmiheer So your test will need to go in vis_utils_test.py. What you can do is

  1. pip install tensorflow
  2. create a separate python test module and put your test in that and make sure it passes
  3. move this test into vis_utils_test.py with required updates
  4. commit and let the PR process run the test now in TF source code

If you want to directly put your test in vis_utils_test.py which is part of TF source code, you will need to build TF from source.

vmiheer

comment created time in a month

issue commenttensorflow/tensorflow

Why doesn't ```tf.keras.losses.binary_crossentropy``` raise error

Precision metric should actually not be raising an error. Have a change out to fix this.

DachuanZhao

comment created time in a month

PR closed tensorflow/tensorflow

Reviewers
Update BinaryAccuracy for assert cla: yes comp:keras size:XS

Fixes #35490. Raises a ValueError if y_true and y_pred have different shapes when computing the BinaryAccuracy metric.

+2 -0

1 comment

1 changed file

ymodak

pr closed time in a month

pull request commenttensorflow/tensorflow

Update BinaryAccuracy for assert

Thank you! This fix is incorrect as there are use cases where the shapes can be different but we can squeeze or expand dimensions to match the shapes. The right fix should be in precision/recall. I have that fix out already, will be submitted in the next day or two.

ymodak

comment created time in a month

pull request commenttensorflow/models

Update keras-cifar-main to use sparse data and loss function.

This is blocking a bug fix change, would like to submit that by end of this week if possible. Thank you!

pavithrasv

comment created time in a month

issue closedtensorflow/tensorflow

Tensorflow 2.0 : Combining model.add_loss and keras losses function in training doesn't work

I read the TensorFlow 2.0 tutorial's VAE section. I followed the tutorial, but the model doesn't work as expected despite running the notebook directly from the given Google Colab. The result is actually the same as in the tutorial (i.e. the loss value is very similar), but if you look at the output you'll see that the model can't reconstruct the input at all (i.e. it outputs the same image for all inputs). This seems to be a mistake in the tutorial itself when combining model.add_loss() and keras.losses.

Original code

I changed MSE loss to BinaryCrossentropy but the result is still the same.

Later I tried computing the BinaryCrossentropy loss explicitly in my forward pass and then using model.add_loss() together with the KL-divergence loss.

Use only model.add_loss() to calculate the loss

This way the model can actually learn the data and the output seems good enough.

So I have a question about model.add_loss() and losses expressed as a function that takes (y_true, y_pred) (i.e. keras.losses). The updated code works only if the losses can be calculated in the forward pass (e.g. KL-divergence or reconstruction loss). How can I combine model.add_loss() and keras.losses correctly in the case where the model needs the ground truth of the output (e.g. a denoising VAE)?

closed time in a month

51616

issue commenttensorflow/tensorflow

Tensorflow 2.0 : Combining model.add_loss and keras losses function in training doesn't work

Closing due to lack of activity, please re-open if you see this issue again.

51616

comment created time in a month

pull request commenttensorflow/models

Update keras-cifar-main to use sparse data and loss function.

did the convergence tests pass? can we merge this change?

pavithrasv

comment created time in a month

issue closedtensorflow/tensorflow

tf.keras.backend.zeros implementation ends up tracking tensors as well in graph mode

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): OSX
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.12
  • Python version: 3.6.6

Describe the current behavior

Here is my custom layer:

class ReshapeLayer(Layer):
    def __init__(self, **kwargs):
        super(ReshapeLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        assert len(input_shape) >= 2
        super(ReshapeLayer, self).build(input_shape)

    def call(self, x):
        s = K.shape(x)
        # zeros_w = K.zeros((s[0], 1, s[2], s[3]), tf.float32) # does not work
        zeros_w = tf.zeros((s[0], 1, s[2], s[3]), tf.float32)
        r = K.concatenate([x, zeros_w], 1)

        s = K.shape(r)
       #  zeros_h = K.zeros((s[0], s[1], 1, s[3]), tf.float32)  # does not work
        zeros_h = tf.zeros((s[0], s[1], 1, s[3]), tf.float32)
        r = K.concatenate([r, zeros_h], 2)
        return r

    def compute_output_shape(self, input_shape):
        shape = tf.TensorShape(input_shape).as_list()
        shape[1] = shape[1] + 1
        shape[2] = shape[2] + 1
        return tf.TensorShape(shape)

Please note the commented lines i.e. K.zeros vs tf.zeros

In graph mode, if I use K.zeros, even though the graph gets built, later on I get an exception with a long stack trace (probably because this layer gets used many times in my network) saying that the Tensor object does not have an is_initialized property:

AttributeError: 'Tensor' object has no attribute 'is_initialized'

K.zeros works in eager mode.

Usage of tf.zeros work fine in both graph and eager mode.

After debugging the tensorflow code I figured that towards the very end when keras tries to initialize the variables it sees some entries that are of type Tensor

Those entries are the ones generated by K.zeros.

I then looked at the implementation of K.zeros https://github.com/tensorflow/tensorflow/blob/a6d8ffae097d0132989ae4688d224121ec6d8f35/tensorflow/python/keras/backend.py#L1010

It clearly says that it can return either a variable or a tensor based on the input (i.e. the shape). This is correct; however, irrespective of the return value being a variable or a tensor, it still ends up tracking it (via track_variable) in graph mode.

# code of zeros in tf.keras.backend.py
with ops.init_scope():
    if dtype is None:
      dtype = floatx()
    tf_dtype = dtypes_module.as_dtype(dtype)
    v = array_ops.zeros(shape=shape, dtype=tf_dtype, name=name)
    if py_all(v.shape.as_list()):
      return variable(v, dtype=dtype, name=name)
    track_variable(v)
    return v
# code of track_variable in tf.keras.backend.py
def track_variable(v):
  """Tracks the given variable for initialization."""
  if context.executing_eagerly():
    return
  graph = v.graph if hasattr(v, 'graph') else ops.get_default_graph()
  if graph not in _GRAPH_VARIABLES:
    _GRAPH_VARIABLES[graph] = weakref.WeakSet()
  _GRAPH_VARIABLES[graph].add(v)

During debugging I can see that, since the tensor is part of the collection, invoking is_initialized on it results in an error.

Based on the code flow I would think that if K.zeros is going to return a tensor then it should not track it (i.e add it to the variables collection).
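A toy simulation of the proposed guard (names are illustrative stand-ins, not TF internals): only objects that expose an initialization state get tracked, so the later is_variable_initialized sweep cannot hit a plain tensor.

```python
class FakeVariable:
    """Stands in for a tf.Variable, which reports initialization state."""
    is_initialized = False

class FakeTensor:
    """Stands in for a plain Tensor, which has no is_initialized."""

tracked = []

def track_variable(v):
    # Proposed guard: skip plain tensors so later
    # is_variable_initialized() calls cannot fail.
    if hasattr(v, "is_initialized"):
        tracked.append(v)

track_variable(FakeVariable())
track_variable(FakeTensor())  # silently skipped
```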

Part of the stack trace

    return get_session().run(tensors)
  File "/Users/ksachdeva/mlenv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 469, in get_session
    _initialize_variables(session)
  File "/Users/ksachdeva/mlenv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 731, in _initialize_variables
    [variables_module.is_variable_initialized(v) for v in candidate_vars])
  File "/Users/ksachdeva/mlenv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 731, in <listcomp>
    [variables_module.is_variable_initialized(v) for v in candidate_vars])
  File "/Users/ksachdeva/mlenv/lib/python3.6/site-packages/tensorflow/python/util/tf_should_use.py", line 189, in wrapped
    return _add_should_use_warning(fn(*args, **kwargs))

closed time in 2 months

ksachdeva

issue commenttensorflow/tensorflow

tf.keras.backend.zeros implementation ends up tracking tensors as well in graph mode

Yes, the fix for this is in and should be available in the next nightly. Thank you!

ksachdeva

comment created time in 2 months

PR closed tensorflow/tensorflow

Modified documentation for SparseCategoricalCrossentropy cla: yes comp:keras size:XS stalled stat:awaiting response

The documentation example for SparseCategoricalCrossentropy is mathematically inconsistent with a parallel example given for CategoricalCrossentropy, and it is also confusing. In particular, the second element in the original SparseCategoricalCrossentropy example function call has a component given by [.5, .89, .6], which is not normalized to be a probability (i.e. it doesn't sum to 1.0). I modified it to be [.05, .89, .06], which does sum to 1.0, and also recomputed the loss to be 0.09458992, preserving all the precision from the function call. I hope this helps clarify things for users. Thanks.
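For reference, sparse categorical crossentropy on normalized probabilities is just the mean negative log-probability of the true class, as this NumPy sketch (with made-up numbers, not the docs example) shows:

```python
import numpy as np

def sparse_cce(y_true, probs):
    # y_true holds integer class indices; probs holds per-class
    # probabilities that each sum to 1 along the last axis.
    probs = np.asarray(probs)
    return float(np.mean(-np.log(probs[np.arange(len(y_true)), y_true])))

loss = sparse_cce([0, 1], [[0.9, 0.1], [0.2, 0.8]])
```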

+2 -2

3 comments

1 changed file

td2014

pr closed time in 2 months

pull request commenttensorflow/tensorflow

Modified documentation for SparseCategoricalCrossentropy

Closing this PR as the docs have been updated.

td2014

comment created time in 2 months

issue openedkeras-team/keras

loss functions converting predictions to tensor should infer dtype from predictions


System information

  • Have I written custom code (as opposed to using example directory):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • TensorFlow backend (yes / no):
  • TensorFlow version:
  • Keras version:
  • Python version:
  • CUDA/cuDNN version:
  • GPU model and memory:

You can obtain the TensorFlow version with:
python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
You can obtain the Keras version with:
python -c 'import keras as k; print(k.__version__)'

Describe the current behavior
def mean_squared_error(y_true, y_pred):
    if not K.is_tensor(y_pred):
        y_pred = K.constant(y_pred)
    y_true = K.cast(y_true, y_pred.dtype)
    return K.mean(K.square(y_pred - y_true), axis=-1)

y_pred is always converted to float32 because of K.constant call.

Describe the expected behavior
pass dtype to the K.constant call
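A sketch of the requested behavior, using NumPy as a stand-in for the backend calls (illustrative, not the actual Keras code): the constant's dtype is inferred from y_pred instead of defaulting to float32.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    # Infer the constant's dtype from y_pred instead of defaulting
    # to float32, so float64 inputs keep float64 precision.
    y_pred = np.asarray(y_pred)                      # analogue of K.constant(y_pred, dtype=y_pred.dtype)
    y_true = np.asarray(y_true, dtype=y_pred.dtype)  # analogue of K.cast(y_true, y_pred.dtype)
    return np.mean(np.square(y_pred - y_true), axis=-1)

out = mean_squared_error([1.0, 2.0], np.array([1.5, 2.5], dtype=np.float64))
```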

Code to reproduce the issue
Provide a reproducible test case that is the bare minimum necessary to generate the problem.

Other info / logs
Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

created time in 2 months

issue closedtensorflow/tensorflow

Custom loss function fails with sample_weight and batch_size > 1

System information

  • Have I written custom code: Yes
  • OS Platform and Distribution: Debian 9.9
  • TensorFlow installed from: conda (-c anaconda)
  • TensorFlow version: 1.14.0
  • Python version: 3.7.3
  • GPU model and memory: n/a - tested in CPU mode

Describe the current behavior

An error occurs when training an LSTM with a custom loss function, using sample_weight and batch_size > 1. The error does not occur if batch_size = 1, or if sample_weight = None.

Describe the expected behavior

I would expect custom loss functions to work irrespective of batch size and sample weights.

Code to reproduce the issue

Here’s a minimal example:

import numpy as np
import tensorflow as tf

batch_size = 32  # no problem if this is 1
sequence_len = 1
embedding_size = 100

x_train = np.random.randn(batch_size, sequence_len, embedding_size)
y_train = np.random.randn(batch_size, embedding_size)
sample_weight = np.random.randn(batch_size)  # no problem if this is None

train_input = tf.keras.Input(shape=(sequence_len, embedding_size),
                             batch_size=batch_size)

lstm_layer = tf.keras.layers.LSTM(200,
                                  return_sequences=False,
                                  )(train_input)

dense_layer = tf.keras.layers.Dense(embedding_size,
                                    )(lstm_layer)

model = tf.keras.models.Model(inputs=train_input, outputs=dense_layer)

model.summary()

# Custom loss function. This function could of course be replaced with
# tf.keras.losses.mean_squared_error, but I have a use case where I need a
# custom loss function.
class customLoss(tf.keras.losses.Loss):
    def call(self, y_true, y_pred):
        return tf.reduce_mean(tf.math.squared_difference(y_true, y_pred))

model.compile(optimizer=tf.keras.optimizers.RMSprop(lr=0.001),
              loss=customLoss())

loss = model.train_on_batch(x_train,
                            y=y_train,
                            sample_weight=sample_weight)

Other info / logs

In #29026, @pavithrasv pointed out that loss functions from tf.losses do not work with keras, and suggested using loss functions from tf.keras.losses instead (thanks again!). Consequently, I thought that defining a custom loss function using the tf.keras.losses.Loss base class should be possible. (Please note that in my actual use case I have a more complex custom loss function for which I need some math operations from tf.math.)

Traceback:

Traceback (most recent call last):
  File "/home/john/PhD/GitLab/literary_lstm/bug_minimal_example_03.py", line 38, in <module>
    sample_weight=sample_weight)
  File "/home/john/miniconda3/envs/py_tf/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1175, in train_on_batch
    outputs = self.train_function(ins)  # pylint: disable=not-callable
  File "/home/john/miniconda3/envs/py_tf/lib/python3.7/site-packages/tensorflow/python/keras/backend.py", line 3292, in __call__
    run_metadata=self.run_metadata)
  File "/home/john/miniconda3/envs/py_tf/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1458, in __call__
    run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[0], expected a dimension of 1, got 32
	 [[{{node loss_1/dense_loss/weighted_loss/Squeeze}}]]

closed time in 2 months

ingo-m

issue commenttensorflow/tensorflow

Custom loss function fails with sample_weight and batch_size > 1

Thank you for reporting this @ingo-m. The issue is because of how the custom loss is expected to be implemented. The call method in the Loss class is expected to return per-sample loss values.

Please take a look at the documentation for y_true, y_pred, sample_weight here: https://www.tensorflow.org/api_docs/python/tf/keras/losses/Loss?version=nightly#call

call basically reduces y_true and y_pred with shape [batch_size, d0, .. dN] to loss values of shape [batch_size, d0, .. dN-1]; the reduce_mean is along the last dimension. A sample_weight that is broadcastable to shape [batch_size, d0, .. dN-1] is then applied to this result.

In your code updating the loss function like: return tf.reduce_mean(tf.math.squared_difference(y_true, y_pred), axis=-1) will fix the issue.

Hope this helps :) Closing out the issue, please feel free to add comments if you have more questions.
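In NumPy terms (an illustrative sketch, not TF code), the corrected per-sample loss and the weighting it enables look like this:

```python
import numpy as np

def per_sample_mse(y_true, y_pred):
    # Reduce only the last axis: [batch_size, dN] -> [batch_size],
    # leaving one loss value per sample for sample_weight to scale.
    return np.mean(np.square(y_true - y_pred), axis=-1)

batch_size, dim = 32, 100
y_true = np.zeros((batch_size, dim))
y_pred = np.ones((batch_size, dim))
sample_weight = np.full(batch_size, 0.5)

losses = per_sample_mse(y_true, y_pred)  # shape (32,)
weighted = losses * sample_weight        # broadcasts cleanly: (32,) * (32,)
```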

ingo-m

comment created time in 2 months

pull request commenttensorflow/tensorflow

Print kernel sizes too

Will you be able to add a test case for the use case described? The test is also to make sure this change is compatible with existing use case.

vmiheer

comment created time in 2 months

issue commenttensorflow/tensorflow

Cannot use dict base datasets with keras.Model.fit.

Assigning to Tom who will be working on this as part of ideal fit/compile change.

npuichigo

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Load saved custom Loss class

 def should_overwrite(filepath, overwrite):   return True  +def convert_output_metrics(metrics_config, custom_objects):+  from google3.third_party.tensorflow.python.keras import metrics as metrics_module  # pylint:disable=g-import-not-at-top

This can be removed. from tensorflow.python.keras import metrics as metrics_module # pylint:disable=g-import-not-at-top

thierryherrmann

comment created time in 2 months

issue commenttensorflow/tensorflow

'accuracy' and tf.metrics.get('accuracy') produce different results

I have updated the compile API docs to address this, will be in the next nightly. Thank you!
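The difference comes down to which accuracy function ends up being used; a NumPy sketch (illustrative only) of the two behaviors:

```python
import numpy as np

y_true = np.array([[0., 1.], [1., 0.]])
y_pred = np.array([[0.2, 0.8], [0.6, 0.4]])

# The 'accuracy' string is resolved by compile based on the loss and
# output shape -- here to categorical accuracy: compare row argmaxes.
categorical_acc = np.mean(np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

# A bare accuracy function compares values element-wise, which is
# almost never what you want with probability outputs.
raw_acc = np.mean(y_true == y_pred)
```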

bersbersbers

comment created time in 2 months

issue closedtensorflow/tensorflow

'accuracy' and tf.metrics.get('accuracy') produce different results

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, see below
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): OpenSUSE
  • TensorFlow installed from (source or binary): pip binary within pyenv
  • TensorFlow version (use command below): v2.0.0-rc2-26-g64c3d38 2.0.0
  • Python version: 3.7.5

Describe the current behavior The same model behaves differently depending on whether one uses 'accuracy' or tf.keras.metrics.get('accuracy') (see below).

Describe the expected behavior They should behave identically.

Code to reproduce the issue

"""Bug."""
# import keras
import numpy as np
import tensorflow.keras as keras

X = np.empty([10, 224, 224, 3])
Y = np.empty([10, 2])

MODEL = keras.applications.vgg16.VGG16(weights=None, classes=2)

MODEL.compile(optimizer=keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
MODEL.fit(X, Y, epochs=10)

MODEL.compile(optimizer=keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=[keras.metrics.get('accuracy')])
MODEL.fit(X, Y, epochs=10)

Example output:

Train on 10 samples
Epoch 1/10

10/10 [==============================] - 4s 389ms/sample - loss: inf - accuracy: 0.9000
Epoch 2/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 3/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 4/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 5/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 6/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 7/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 8/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 9/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Epoch 10/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.9000
Train on 10 samples
Epoch 1/10

10/10 [==============================] - 1s 131ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 2/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 3/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 4/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 5/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 6/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 7/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 8/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 9/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00
Epoch 10/10

10/10 [==============================] - 0s 8ms/sample - loss: nan - accuracy: 0.0000e+00

Other info / logs Closely related to #34088
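The two results diverge because the string 'accuracy' passed to compile() is resolved against the loss and output shape (here, categorical_crossentropy with one-hot targets resolves to categorical accuracy, an argmax comparison), while tf.keras.metrics.get('accuracy') returns the generic accuracy metric, which checks element-wise equality between labels and raw float predictions. A minimal NumPy sketch of that semantic difference (the exact resolution logic inside compile() is assumed from the behavior above, not quoted from the source):

```python
import numpy as np

# One-hot labels and softmax-style predictions.
y_true = np.array([[0., 1.], [1., 0.]])
y_pred = np.array([[0.1, 0.9], [0.6, 0.4]])

# What metrics=['accuracy'] effectively becomes for a
# categorical_crossentropy loss: CategoricalAccuracy, i.e. do the
# argmaxes of labels and predictions match?
categorical_accuracy = np.mean(
    np.argmax(y_true, axis=-1) == np.argmax(y_pred, axis=-1))

# What tf.keras.metrics.get('accuracy') gives you: the generic
# accuracy function, i.e. element-wise equality of labels and raw
# predictions -- floats like 0.9 never equal 0. or 1. exactly.
generic_accuracy = np.mean(y_true == y_pred)

print(categorical_accuracy)  # 1.0
print(generic_accuracy)      # 0.0
```

This is why the second compile() above reports accuracy: 0.0000e+00 even though the model's argmax predictions are unchanged.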

closed time in 2 months

bersbersbers

issue closedtensorflow/tensorflow

BinaryCrossentropy incorrect partial reduction of loss when reduction='none'

System information custom code

  • OS Platform and Distribution ubuntu 18.04
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu
  • TensorFlow version (use command below): v2.0.0-rc2-26-g64c3d38 2.0.0
  • Python version: 3.6.9

Describe the current behavior BinaryCrossentropy reduces over the last dimension even when constructed with reduction=Reduction.NONE

Describe the expected behavior It should not do any reduction

Workaround use tf.nn.sigmoid_cross_entropy_with_logits()

Code to reproduce the issue

import numpy as np

import tensorflow as tf

def main():
    y = np.array([0., 0., 1., 1.]).reshape((1, 1, 1, -1))
    x = np.array([1., 1., 1., 0.]).reshape((1, 1, 1, -1))
    print(y.shape)
    print(x.shape)

    bce_reduce = tf.keras.losses.BinaryCrossentropy()
    loss = bce_reduce(y, x)
    print('correct: fully reduced loss (reduction=default)', loss.numpy())

    bce = tf.keras.losses.BinaryCrossentropy(reduction=tf.keras.losses.Reduction.NONE)
    loss = bce(y, x)
    print('incorrect: should not be reduced (reduction=none): ', loss.numpy())  # Loss: 11.568711
    print('incorrect: reduced along last dimension:', loss.shape)

    # should be the same as:
    correct_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=x)
    print('correct: unreduced_loss (using sigmoid_cross_entropy_with_logits)', correct_loss)
    print('correct: unreduced_loss_shape', correct_loss.shape)


main()

output:

correct: fully reduced loss (reduction=default) 11.568711280822754
incorrect: should be not be reduced (reduction=none):  [[[11.56871128]]]
incorrect: reduced along last dimension: (1, 1, 1)
correct: unreduced_loss (using sigmoid_cross_entropy_with_logits) tf.Tensor([[[[1.31326169 1.31326169 0.31326169 0.69314718]]]], shape=(1, 1, 1, 4), dtype=float64)
correct: unreduced_loss_shape (1, 1, 1, 4)
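The shape collapse happens because the loss function averages over the last axis before the Reduction setting is ever applied, so Reduction.NONE only skips the batch-level reduction. A minimal NumPy sketch of that behavior, using the from_logits formulation so the element-wise values match the sigmoid_cross_entropy_with_logits numbers above (the wrapper's internal mean-over-last-axis step is assumed from the observed shapes, not quoted from the TF source):

```python
import numpy as np

y = np.array([0., 0., 1., 1.]).reshape((1, 1, 1, -1))
x = np.array([1., 1., 1., 0.]).reshape((1, 1, 1, -1))

# Numerically stable element-wise sigmoid cross-entropy from logits,
# matching tf.nn.sigmoid_cross_entropy_with_logits:
#   max(x, 0) - x*y + log(1 + exp(-|x|))
elementwise = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
print(elementwise.shape)  # (1, 1, 1, 4) -- truly unreduced

# What the loss-class wrapper effectively does before applying the
# Reduction setting: a mean over the last axis.
wrapped = elementwise.mean(axis=-1)
print(wrapped.shape)  # (1, 1, 1) -- why Reduction.NONE still loses an axis
```

The elementwise values here come out to approximately [1.3133, 1.3133, 0.3133, 0.6931], matching the unreduced tensor printed above, which is why tf.nn.sigmoid_cross_entropy_with_logits works as the per-element workaround.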

closed time in 2 months

chahld