profile
viewpoint

alextp/pylda 30

An implementation of gibbs sampling for Latent Dirichlet Allocation

jamsjr/hard-versus-soft 4

Um artigo sobre a diferença entre hard reservation e soft reserrvation em escalonadores de tempo real

duckworthd/Topics 3

Implementations of Inference algorithms for Topic Models like Latent Dirichlet Allocation, Hierarchical Dirichlet Processes, and more!

alextp/scikit-learn 2

scikit-learn main repo

alextp/groupcache 1

groupcache is a caching and cache-filling library, intended as a replacement for memcached in many cases.

alextp/autograd 0

Efficiently computes derivatives of numpy code.

alextp/community 0

Stores documents used by the TensorFlow developer community

pull request commenttensorflow/tensorflow

[WIP] DLPack functions

@jermainewang this is a PR to the TF repo and we're not cutting any new releases of TF 1.x from this repo, so no need to support 1.x.

VoVAllen

comment created time in 7 days

Pull request review commenttensorflow/tensorflow

[WIP] DLPack functions

+#include "tensorflow/c/eager/dlpack.h"+#include "include/dlpack/dlpack.h"  // TF:dlpack+#include "tensorflow/c/eager/c_api_internal.h"+#include "tensorflow/c/tf_status_helper.h"+#include "tensorflow/core/framework/tensor.h"+#include "tensorflow/core/platform/casts.h"++#include "tensorflow/core/framework/tensor_reference.h"+#include "tensorflow/core/platform/logging.h"++namespace tensorflow {++using tensorflow::Tensor;+using tensorflow::TensorHandleInterface;++namespace {++struct TFDLMTensor {+  TensorReference* handle;+  DLManagedTensor tensor;+};++TensorHandle* GetTensorHandleFromTFEHandle(TFE_TensorHandle* h,+                                           TF_Status* status) {+  if (h == nullptr || !h->handle->IsValid(&status->status)) {+    status->status = tensorflow::errors::InvalidArgument(+        "The passed in handle is a nullptr");+    return nullptr;+  }+  tensorflow::TensorHandle* handle =+      tensorflow::down_cast<tensorflow::TensorHandleInterface*>(h->handle.get())+          ->Handle();++  if (handle->IsRemote()) {+    status->status = tensorflow::errors::InvalidArgument(+        "TFE_TensorHandleDevicePointer may not be called on a remote tensor "+        "handle.");+    return nullptr;+  }+  return handle;+}++const Tensor* GetTensorFromHandle(TFE_TensorHandle* h, TF_Status* status) {+  TensorHandle* handle = GetTensorHandleFromTFEHandle(h, status);++  if (handle->IsRemote()) {+    status->status = tensorflow::errors::InvalidArgument(+        "TFE_TensorHandleDevicePointer may not be called on a remote tensor "+        "handle.");+    return nullptr;+  }+  tensorflow::Device* device(absl::get<tensorflow::Device*>(handle->device()));+  if (device != nullptr) {+    status->status = device->Sync();+    if (!status->status.ok()) {+      return nullptr;+    }+  }+  const tensorflow::Tensor* tensor;+  status->status = handle->Tensor(&tensor);+  if (!status->status.ok()) {+    return nullptr;+  }+  return tensor;+};++void deleter(DLManagedTensor* arg) {+  TFDLMTensor* owner = static_cast<TFDLMTensor*>(arg->manager_ctx);+  owner->handle->Unref();+  delete owner;+}++DLDataType getDLDataType(TF_DataType data_type, TF_Status* status) {+  DLDataType dtype;+  dtype.lanes = 1;+  dtype.bits = TF_DataTypeSize(data_type) * 8;+  switch (data_type) {+    case TF_DataType::TF_FLOAT:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_DOUBLE:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_INT32:+      dtype.code = DLDataTypeCode::kDLInt;+      break;+    case TF_DataType::TF_UINT8:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    case TF_DataType::TF_INT16:+      dtype.code = DLDataTypeCode::kDLInt;+      break;+    case TF_DataType::TF_STRING:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_COMPLEX64:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_COMPLEX64 is not supported by dlpack");+      break;+    case TF_DataType::TF_INT64:+      dtype.code = DLDataTypeCode::kDLInt;+      break;+    case TF_DataType::TF_BOOL:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    case TF_DataType::TF_QINT8:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QINT8 is not supported by dlpack");+      break;+    case TF_DataType::TF_QUINT8:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QUINT8 is not supported by dlpack");+      break;+    case TF_DataType::TF_QINT32:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QINT32 is not supported by dlpack");+      break;+    case TF_DataType::TF_BFLOAT16:+      dtype.code = DLDataTypeCode::kDLBfloat;+      break;+    case TF_DataType::TF_QINT16:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QINT16 is not supported by dlpack");+      break;+    case TF_DataType::TF_QUINT16:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QUINT16 is not supported by dlpack");+      break;+    case TF_DataType::TF_COMPLEX128:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_COMPLEX128 is not supported by dlpack");+      break;+    case TF_DataType::TF_HALF:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_RESOURCE:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_RESOURCE is not supported by dlpack");+      break;+    case TF_DataType::TF_VARIANT:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_VARIANT is not supported by dlpack");+      break;+    case TF_DataType::TF_UINT32:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    case TF_DataType::TF_UINT64:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    default:+      status->status = tensorflow::errors::InvalidArgument(

I meant something like status->status = errors::InvalidArgument(DataTypeString(dtype), " cannot be used with dlpack") or something like that.

VoVAllen

comment created time in 7 days

Pull request review commenttensorflow/tensorflow

[WIP] DLPack functions

+#include "tensorflow/c/eager/dlpack.h"+#include "include/dlpack/dlpack.h"  // TF:dlpack+#include "tensorflow/c/eager/c_api_internal.h"+#include "tensorflow/c/tf_status_helper.h"+#include "tensorflow/core/framework/tensor.h"+#include "tensorflow/core/platform/casts.h"++#include "tensorflow/core/framework/tensor_reference.h"+#include "tensorflow/core/platform/logging.h"++namespace tensorflow {++using tensorflow::Tensor;+using tensorflow::TensorHandleInterface;++namespace {++struct TFDLMTensor {+  TensorReference* handle;+  DLManagedTensor tensor;+};++TensorHandle* GetTensorHandleFromTFEHandle(TFE_TensorHandle* h,+                                           TF_Status* status) {+  if (h == nullptr || !h->handle->IsValid(&status->status)) {+    status->status = tensorflow::errors::InvalidArgument(+        "The passed in handle is a nullptr");+    return nullptr;+  }+  tensorflow::TensorHandle* handle =+      tensorflow::down_cast<tensorflow::TensorHandleInterface*>(h->handle.get())+          ->Handle();++  if (handle->IsRemote()) {+    status->status = tensorflow::errors::InvalidArgument(+        "TFE_TensorHandleDevicePointer may not be called on a remote tensor "

Probably different error message? (as in, I think we hit here without calling TFE_TensorHandleDevicePointer).

Same below.

VoVAllen

comment created time in 8 days

Pull request review commenttensorflow/tensorflow

[WIP] DLPack functions

+#include "tensorflow/c/eager/dlpack.h"+#include "include/dlpack/dlpack.h"  // TF:dlpack+#include "tensorflow/c/eager/c_api_internal.h"+#include "tensorflow/c/tf_status_helper.h"+#include "tensorflow/core/framework/tensor.h"+#include "tensorflow/core/platform/casts.h"++#include "tensorflow/core/framework/tensor_reference.h"+#include "tensorflow/core/platform/logging.h"++namespace tensorflow {++using tensorflow::Tensor;+using tensorflow::TensorHandleInterface;++namespace {++struct TFDLMTensor {+  TensorReference* handle;+  DLManagedTensor tensor;+};++TensorHandle* GetTensorHandleFromTFEHandle(TFE_TensorHandle* h,+                                           TF_Status* status) {+  if (h == nullptr || !h->handle->IsValid(&status->status)) {+    status->status = tensorflow::errors::InvalidArgument(+        "The passed in handle is a nullptr");+    return nullptr;+  }+  tensorflow::TensorHandle* handle =+      tensorflow::down_cast<tensorflow::TensorHandleInterface*>(h->handle.get())+          ->Handle();++  if (handle->IsRemote()) {+    status->status = tensorflow::errors::InvalidArgument(+        "TFE_TensorHandleDevicePointer may not be called on a remote tensor "+        "handle.");+    return nullptr;+  }+  return handle;+}++const Tensor* GetTensorFromHandle(TFE_TensorHandle* h, TF_Status* status) {+  TensorHandle* handle = GetTensorHandleFromTFEHandle(h, status);++  if (handle->IsRemote()) {+    status->status = tensorflow::errors::InvalidArgument(+        "TFE_TensorHandleDevicePointer may not be called on a remote tensor "+        "handle.");+    return nullptr;+  }+  tensorflow::Device* device(absl::get<tensorflow::Device*>(handle->device()));+  if (device != nullptr) {+    status->status = device->Sync();+    if (!status->status.ok()) {+      return nullptr;+    }+  }+  const tensorflow::Tensor* tensor;+  status->status = handle->Tensor(&tensor);+  if (!status->status.ok()) {+    return nullptr;+  }+  return tensor;+};++void deleter(DLManagedTensor* arg) {+  TFDLMTensor* owner = static_cast<TFDLMTensor*>(arg->manager_ctx);+  owner->handle->Unref();+  delete owner;+}++DLDataType getDLDataType(TF_DataType data_type, TF_Status* status) {+  DLDataType dtype;+  dtype.lanes = 1;+  dtype.bits = TF_DataTypeSize(data_type) * 8;+  switch (data_type) {+    case TF_DataType::TF_FLOAT:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_DOUBLE:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_INT32:+      dtype.code = DLDataTypeCode::kDLInt;+      break;+    case TF_DataType::TF_UINT8:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    case TF_DataType::TF_INT16:+      dtype.code = DLDataTypeCode::kDLInt;+      break;+    case TF_DataType::TF_STRING:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_COMPLEX64:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_COMPLEX64 is not supported by dlpack");+      break;+    case TF_DataType::TF_INT64:+      dtype.code = DLDataTypeCode::kDLInt;+      break;+    case TF_DataType::TF_BOOL:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    case TF_DataType::TF_QINT8:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QINT8 is not supported by dlpack");+      break;+    case TF_DataType::TF_QUINT8:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QUINT8 is not supported by dlpack");+      break;+    case TF_DataType::TF_QINT32:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QINT32 is not supported by dlpack");+      break;+    case TF_DataType::TF_BFLOAT16:+      dtype.code = DLDataTypeCode::kDLBfloat;+      break;+    case TF_DataType::TF_QINT16:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QINT16 is not supported by dlpack");+      break;+    case TF_DataType::TF_QUINT16:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_QUINT16 is not supported by dlpack");+      break;+    case TF_DataType::TF_COMPLEX128:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_COMPLEX128 is not supported by dlpack");+      break;+    case TF_DataType::TF_HALF:+      dtype.code = DLDataTypeCode::kDLFloat;+      break;+    case TF_DataType::TF_RESOURCE:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_RESOURCE is not supported by dlpack");+      break;+    case TF_DataType::TF_VARIANT:+      status->status = tensorflow::errors::InvalidArgument(+          "TF_VARIANT is not supported by dlpack");+      break;+    case TF_DataType::TF_UINT32:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    case TF_DataType::TF_UINT64:+      dtype.code = DLDataTypeCode::kDLUInt;+      break;+    default:+      status->status = tensorflow::errors::InvalidArgument(

You could probably fold a lot of the unsupported cases into default by using DataTypeString to get the dtype name

VoVAllen

comment created time in 8 days

Pull request review commenttensorflow/tensorflow

[WIP] DLPack functions

+#include "tensorflow/c/eager/dlpack.h"+#include "include/dlpack/dlpack.h"  // TF:dlpack+#include "tensorflow/c/eager/c_api_internal.h"+#include "tensorflow/c/tf_status_helper.h"+#include "tensorflow/core/framework/tensor.h"+#include "tensorflow/core/platform/casts.h"++#include "tensorflow/core/framework/tensor_reference.h"+#include "tensorflow/core/platform/logging.h"++namespace tensorflow {++using tensorflow::Tensor;

We don't need these using directives inside the namespace we're in.

VoVAllen

comment created time in 8 days

Pull request review commenttensorflow/tensorflow

[WIP] DLPack functions

 filegroup(     srcs = [         "c_api.h",         "c_api_experimental.h",+        "dlpack.h",     ],     visibility = ["//tensorflow:__subpackages__"], ) ++cc_library(+    name = "dlpack",

These files don't seem to be in the PR?

VoVAllen

comment created time in 8 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with thee design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.+++## Design Proposal++### New location of the code++GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +joining the other Keras SIG projects and replacing the current external Keras +codebase. `tf.Keras` will also replace Keras on PyPI.++Also considered: `tensorflow/keras`.++| keras-team/keras   | tensorflow/keras |+:------------------- |:------------------------------------------- |+|Under the umbrella of Keras SIG, which hosts all other Keras related projects like keras-application, KerasTuner etc.|Under the umbrella of tensorflow, which also hosts other TF related projects.|+|Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on GitHub. Could rely on a lot of existing setup in TensorFlow.|+|Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See here|++### Source of Truth++TensorFlow uses a Google-internal code repository as its source of truth. Every PR+submitted though GitHub is converted to a Google-internal change first,+submitted through the internal system, and then copied to GitHub as commits.+At the same time, PR is marked as merged with the corresponding commit hash.++Likewise, issue tracking and code review takes place through Google-internal tools.++For Keras, since we are trying to promote community engagement, we hope to use +GitHub as source of truth. This will have the following implications:++* We expect the majority of the code development/contribution from GitHub+and the dev tools / tests / scripts should focus on the GitHub development use+case. See below for more details.+* Keras CI/presubmit build for the GitHub repo should target the `tf-nightly` pip+package as dependency. This means any change to TF will take at most 24+hours to be reflected on the Keras side.+* The Keras code will be mirrored to a Google-internal code repository via Google-internal +tools within a very short time window after each change.+The Google-internal CI tests will run on HEAD for both Keras and TF code.+* The CI build for the repository on GitHub might break when it sees a+new version of `tf-nightly`, if  certain behavior has been changed and wasn't+caught by unit tests. We have  observed a few similar cases with+[tf/addons](https://github.com/tensorflow/addons).+We hope this can be reduced by stronger+unit test coverage by Google internel systems, when both TF and Keras code are +tested at HEAD.+* pip package management. Keras will now follow the `tf-estimator` approach. +"pip install tensorflow" should also install Keras (from PyPI) as well.+There are more details for the pip package in the+[Improved pip package structure](https://github.com/tensorflow/community/pull/182) RFC.++### Dependency Cleanup++As the high-level API of TensorFlow, Keras should have a direct dependency on+TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse +logic in the TF code that relies on Keras, which we should update/remove +when we split the repository.++The current usage of Keras from TensorFlow are:+* Unit tests, which should be converted to integration tests, or port the tests+to Keras repository.+* `feature_column`.+* Legacy `tf.layers` in v1 API.+* legacy RNN cells.+* TPU support code for `optimizer_v2`.+* SavedModel.+* TF Lite.++All Keras imports in integration tests can be changed to use dynamic import like below:

I have strong reservations about this type of cross-package circular dependency.

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with the design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.+++## Design Proposal++### New location of the code++GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +joining the other Keras SIG projects and replacing the current external Keras +codebase. `tf.Keras` will also replace Keras on PyPI.++Also considered: `tensorflow/keras`.++Pros:+1. Under the umbrella of Keras SIG, which hosts all other Keras related projects+like keras-application, KerasTuner etc.+1. Lots of existing followers on keras-team, who may not be easily migrated to +TF project.+1. Can't easily delete keras project, which already have tons of stars and +incoming reference links. Continued existence of external Keras code will create+confusion ("why is there tensorflow/keras AND keras-team/keras?").++Cons:+1. The repo isn't under the same organization as tensorflow, which makes it hard+to manage issues/PRs and references across the organization.+1. Existing issue/PR under the same org can be transferred easily, but not cross the different org. See [here](https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository).++### Source of Truth++TensorFlow uses a Google-internal code repository as its source of truth. Every PR+submitted though GitHub is converted to a Google-internal change first,+submitted through the internal system, and then copied to GitHub as commits.+At the same time, PR is marked as merged with the corresponding commit hash.++Likewise, issue tracking and code review takes place through Google-internal tools.++For Keras, since we are trying to promote community engagement, we hope to use +GitHub as source of truth. This will have the following implications:++* We expect the majority of the code development/contribution from GitHub+and the dev tools / tests / scripts should focus on the GitHub development use+case. See below for more details.+* Keras CI/presubmit build for the GitHub repo should target a stable PIP 

This has the disadvantage of slowing down keras's adoption of new TF features and APIs.

This document should clarify when would CI against tf stable be expected to break, etc.

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with thee design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.+++## Design Proposal++### New location of the code++GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +joining the other Keras SIG projects and replacing the current external Keras +codebase. `tf.Keras` will also replace Keras on PyPI.++Also considered: `tensorflow/keras`.++| keras-team/keras   | tensorflow/keras |+:------------------- |:------------------------------------------- |+|Under the umbrella of Keras SIG, which hosts all other Keras related projects like keras-application, KerasTuner etc.|Under the umbrella of tensorflow, which also hosts other TF related projects.|+|Lots of existing followers on keras-team, who may not be easily migrated to TF project.|No cross org repo management cost on GitHub. Could rely on a lot of existing setup in TensorFlow.|+|Can't easily delete keras project, which already have tons of stars and incoming reference links. Continued existence of external Keras code will create confusion ("why is there tensorflow/keras AND keras-team/keras?")|Issue/PR under the same org can be transferred easily, but not cross the different org. See here|++### Source of Truth++TensorFlow uses a Google-internal code repository as its source of truth. Every PR+submitted though GitHub is converted to a Google-internal change first,+submitted through the internal system, and then copied to GitHub as commits.+At the same time, PR is marked as merged with the corresponding commit hash.++Likewise, issue tracking and code review takes place through Google-internal tools.++For Keras, since we are trying to promote community engagement, we hope to use +GitHub as source of truth. This will have the following implications:++* We expect the majority of the code development/contribution from GitHub+and the dev tools / tests / scripts should focus on the GitHub development use+case. See below for more details.+* Keras CI/presubmit build for the GitHub repo should target the `tf-nightly` pip+package as dependency. This means any change to TF will take at most 24+hours to be reflected on the Keras side.+* The Keras code will be mirrored to a Google-internal code repository via Google-internal +tools within a very short time window after each change.+The Google-internal CI tests will run on HEAD for both Keras and TF code.+* The CI build for the repository on GitHub might break when it sees a+new version of `tf-nightly`, if  certain behavior has been changed and wasn't+caught by unit tests. We have  observed a few similar cases with+[tf/addons](https://github.com/tensorflow/addons).+We hope this can be reduced by stronger+unit test coverage by Google internel systems, when both TF and Keras code are +tested at HEAD.+* pip package management. Keras will now follow the `tf-estimator` approach. +"pip install tensorflow" should also install Keras (from PyPI) as well.+There are more details for the pip package in the+[Improved pip package structure](https://github.com/tensorflow/community/pull/182) RFC.++### Dependency Cleanup++As the high-level API of TensorFlow, Keras should have a direct dependency on+TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse +logic in the TF code that relies on Keras, which we should update/remove +when we split the repository.++The current usage of Keras from TensorFlow are:

+1.

This document needs a plan to break this dependency before it can be seriously considered.

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with the design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.+++## Design Proposal++### New location of the code++GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +joining the other Keras SIG projects and replacing the current external Keras +codebase. `tf.Keras` will also replace Keras on PyPI.++Also considered: `tensorflow/keras`.++Pros:+1. Under the umbrella of Keras SIG, which hosts all other Keras related projects+like keras-application, KerasTuner etc.+1. Lots of existing followers on keras-team, who may not be easily migrated to +TF project.+1. Can't easily delete keras project, which already have tons of stars and +incoming reference links. Continued existence of external Keras code will create+confusion ("why is there tensorflow/keras AND keras-team/keras?").++Cons:+1. The repo isn't under the same organization as tensorflow, which makes it hard+to manage issues/PRs and references across the organization.+1. Existing issue/PR under the same org can be transferred easily, but not cross the different org. See [here](https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository).++### Source of Truth++TensorFlow uses a Google-internal code repository as its source of truth. Every PR+submitted though GitHub is converted to a Google-internal change first,+submitted through the internal system, and then copied to GitHub as commits.+At the same time, PR is marked as merged with the corresponding commit hash.++Likewise, issue tracking and code review takes place through Google-internal tools.++For Keras, since we are trying to promote community engagement, we hope to use +GitHub as source of truth. This will have the following implications:++* We expect the majority of the code development/contribution from GitHub+and the dev tools / tests / scripts should focus on the GitHub development use+case. See below for more details.+* Keras CI/presubmit build for the GitHub repo should target a stable PIP +version of tensorflow package as dependency. It could either be (preferably in+this order):+  * a stable version+  * a release candidate version+  * a `tf-nightly` with explicit version.+Using a nightly version for testing should be motivated by the usage of a API+feature not present in the stable or pre-release version.+Depend on a floating `tf-nightly` could cause CI build to be instable, which has+been observed in other repository +[like tf-addons](https://github.com/tensorflow/addons/pull/912).+* The Keras code will be mirrored to a Google-internal code repository via+Google-internal tools within a very short time window after each change.+The Google-internal CI tests will run on HEAD for both Keras and TF code.+* The CI build for the repository on GitHub might break when it points to a+new version of `tf-nightly`, if certain behavior has been changed and wasn't+caught by unit tests. We have observed a few similar cases with+[tf/addons](https://github.com/tensorflow/addons).+We hope this can be reduced by stronger unit test coverage by Google internel+systems, when both TF and Keras code are tested at HEAD.+* pip package management. Keras will now follow the `tf-estimator` approach. +"pip install tensorflow" should also install Keras (from PyPI) as well.+There are more details for the pip package in the+[Improved pip package structure](https://github.com/tensorflow/community/pull/182) RFC.++### Dependency Cleanup++As the high-level API of TensorFlow, Keras should have a direct dependency on+TF low-level APIs, but not the other way around. Unfortunately, there is some existing reverse +logic in the TF code that relies on Keras, which we should update/remove +when we split the repository.++The current usage of Keras from TensorFlow are:+* Unit tests, which should be converted to integration tests, or port the tests+to Keras repository.+* `feature_column`, which uses Keras base layer and model.

These dependencies are very concerning.

Does this mean we need a third package (tf, tf-keras, and tf-stuff-that-depends-on-keras)? Where would the code for this third package live? How can we test its features?

Or are splitting all the things which depend on keras despite not being historically a part of the keras API into the keras repo?

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with the design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.+++## Design Proposal++### New location of the code++GitHub: the code will live at [keras-team/keras](https://github.com/keras-team/keras), +joining the other Keras SIG projects and replacing the current external Keras +codebase. `tf.Keras` will also replace Keras on PyPI.++Also considered: `tensorflow/keras`.++Pros:+1. Under the umbrella of Keras SIG, which hosts all other Keras related projects+like keras-application, KerasTuner etc.+1. Lots of existing followers on keras-team, who may not be easily migrated to +TF project.+1. Can't easily delete keras project, which already have tons of stars and +incoming reference links. Continued existence of external Keras code will create+confusion ("why is there tensorflow/keras AND keras-team/keras?").++Cons:+1. The repo isn't under the same organization as tensorflow, which makes it hard+to manage issues/PRs and references across the organization.+1. Existing issue/PR under the same org can be transferred easily, but not cross the different org. See [here](https://help.github.com/en/github/managing-your-work-on-github/transferring-an-issue-to-another-repository).++### Source of Truth++TensorFlow uses a Google-internal code repository as its source of truth. Every PR+submitted though GitHub is converted to a Google-internal change first,+submitted through the internal system, and then copied to GitHub as commits.+At the same time, PR is marked as merged with the corresponding commit hash.++Likewise, issue tracking and code review takes place through Google-internal tools.++For Keras, since we are trying to promote community engagement, we hope to use +GitHub as source of truth. This will have the following implications:++* We expect the majority of the code development/contribution from GitHub+and the dev tools / tests / scripts should focus on the GitHub development use+case. See below for more details.+* Keras CI/presubmit build for the GitHub repo should target a stable PIP 

(also important to highlight the release process for TF)

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with the design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.++

Specifically, I'd like to see this proposal cover alternative approaches to the problems we've outlined, including a single repository with independent continuous integration, a tf-python repository separate from a tf-core repository, etc.

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity

That to me is the main motivation which could justify splitting the repositories (the build time one just suggests we should split the continuous integration tooling, not the repositories).

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit

Isn't this the same as the build time section above?

qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of+TensorFlow, which is quite expensive to do for average users.+Having a separate repository will allow the Keras package to be built+without building TensorFlow. This should greatly improve the +velocity of open-source developers when they contribute to Keras code.++### Community Benefit++The difficulty of building TensorFlow from scratch in order to make a PR+to Keras code has been a significant source of issues:++* It discouraged contributions, since many external developers couldn't test+their changes and make sure they were correct.+* External developers would send unverified PRs, and Google reviewers spend time back +and forth, fixing the PR. Sometimes PR is just not moving forward because of the+lengthy feedback loop.++With the new standalone Keras repository, external contributors+should experience much shorter turn-around time when +building/testing Keras, since they don't need to build TensorFlow anymore.+This should  have a positive impact on building a vibrant open-source+developer community.++In addition, by getting the Keras team at Google to start developing Keras+using the same public tools and infrastructure as third-party developers,+we make the development process more transparent and more community-oriented.++### TensorFlow API modularity++There are other side-benefits if we split the repository. Currently, Keras+has to rely on a number of private TensorFlow APIs. However, a litmus test+of the quality of the public TensorFlow low-level APIs is that they should+be strictly sufficient to a higher-level API like Keras.+After splitting the repository, Keras will have to import TensorFlow and +rely exclusively on public APIs. If Keras still ends up using TensorFlow+private features, it  might be an indication of tight coupling of+implementation details. If certain private features are extensively used,+we might want to consider exposing them  as public low level API.++This design is also aligned with the design for+[Modular TensorFlow](https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md), +which splits the TensorFlow project into smaller components that are not+tightly coupled together.++

There are also other advantaged worth mentioning. For example a keras repository makes it easier to share some of the management with the keras SIG than the single shared TF repository.

We need to also consider drawbacks of this decision here.

Specifically:

  1. cross-repository changes will be more expensive to submit (and now most keras changes would cross the repository boundary)
  2. CI will be more complicated (we'll need to test a product of TF and keras versions as opposed to a single shared version). This is complicated further by pulling these two repositories inside google's version control system, for some of us.
  3. Previous experience splitting tf.estimator with the same type of argument justifying it has been considered a failure. How are we sure keras won't fall prey to the same issues?
qlzh727

comment created time in 13 days

Pull request review commenttensorflow/community

RFC: Standalone Keras Repository

+# Standalone Keras Repository++| Status        | Proposed |+:-------------- |:---------------------------------------------------- |+| **RFC #**     | [202](https://github.com/tensorflow/community/pull/202) |+| **Author(s)** | Qianli Zhu (scottzhu@google.com), Francois Chollet (fchollet@google.com) |+| **Sponsor**   | Karmel Allison (karmel@google.com) |+| **Updated**   | 2020-02-05                         |++## Objective++Move the Keras code from the TensorFlow main GitHub repository to its own+repository, with TensorFlow as a dependency.++## Motivation++### Build times++Building the open-source TensorFlow project end-to-end is an extensive exercise. +With a standard GCP instance, it might take more than one hour to finish the whole+build process (it might take longer with a Mac laptop). Although the local build +cache might help speed up the follow-up builds, the initial time cost is too +high for regular software development workflows. Internally, Google has a+distributed build and caching service, which Googlers heavily rely on,+that can build TensorFlow and run all Keras tests within 5 mins. Sadly,+we can't expose this to external contributors.++Currently, any contribution to Keras code will require building all of

These requirements around build time apply to all of TF's python APIs, not just Keras.

So to me this suggests we want a more aggressive separation between python TF and C++ TF's build and test systems, not anything specifically to do with keras.

For example, if we required that every TF python test passed with the previous nightly's pywrap_tensorflow.so across the TF python surface we could dramatically simplify the development for much more than keras.

I don't mean to delay / disrupt the keras migration process, which I think is a good idea for other reasons, I just don't want us to waste the opportunity to improve the development experience.

qlzh727

comment created time in 13 days

issue commenttensorflow/tensorflow

tf.range + for x,y in dataset issue

The lowered ops are currently placed on the wrong device. If we had device assignments for the ops before/after each lowered op when we added the lowered op we could add them on a device which would minimize communication.

On Tue, Feb 11, 2020 at 10:39 AM Saurabh Saxena notifications@github.com wrote:

@alextp https://github.com/alextp I don't understand your suggestion. IIUC we must place the lowered control flow ops so we will still need update the placement logic?

@ezhulenev https://github.com/ezhulenev probably understands the placement logic better. Looking at the example here it seems we need to recognize patterns like Input(CPU) -> Enter -> Merge -> Switch -> Identity -> Consumer (CPU) and if so place the entire chain(-s in case of nesting) on CPU?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/issues/34519?email_source=notifications&email_token=AAABHROLCIIYZQXBITFQNRLRCLWF5A5CNFSM4JQPHXJ2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELNSMCI#issuecomment-584787465, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRLRLIXZULXO45B4XMTRCLWF5ANCNFSM4JQPHXJQ .

--

  • Alex
SSSxCCC

comment created time in 15 days

issue commenttensorflow/tensorflow

tf.range + for x,y in dataset issue

@sanjoy I agree; sadly TF doesn't let you specify a kernel which doesn't look at its tensor content (and I don't know how to fold that information in to the current placer algorithm).

@lindong28 @jsimsa should we add a registry of variant-ops-whose-outputs-cannot-be-copied and seed that with the tf.data ops then?

@saxenasaurabh maybe we can avoid this placing business altogether if we can delay inlining the control flow functions until after placement?

SSSxCCC

comment created time in 15 days

pull request commenttensorflow/tensorflow

Add loss scale optimizer for v1 optimizers

This really shouldn't happen. Can you find a colab example to reproduce or something like that?

MattConley

comment created time in 15 days

PR closed tensorflow/tensorflow

Reviewers
Bug in LSTMBlockCell gpu implementation cla: no size:XL

Fixed bug in cuda kernel of LSTMBlockCell, also implemented test which checks LSTMCell and LSTMBlockCell compatibility on cpu and gpu.

+36 -2

2 comments

3 changed files

vladbataev

pr closed time in 16 days

pull request commenttensorflow/tensorflow

Bug in LSTMBlockCell gpu implementation

This is not a backwards-compatible change. Also we're not releasing new 1.15 versions, just patch releases with security fixes, so please rebase this PR onto master.

vladbataev

comment created time in 16 days

pull request commenttensorflow/tensorflow

Add loss scale optimizer for v1 optimizers

Is this with eager or with graph? Could it be the persistent gradient tape?

On Sun, Feb 9, 2020 at 9:40 PM x10000year notifications@github.com wrote:

@alextp https://github.com/alextp another problem is that, when i use dynamic loss scale, the model training consumes significantly more gpu memory, which leads to OOM when the batch size is big. do you have any idea? thanks.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/27655?email_source=notifications&email_token=AAABHRIKBZLL566OO4G4RF3RCDSFRA5CNFSM4HEM4LS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOELHJNQA#issuecomment-583964352, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRLCB7GKZ4DDWEMQL7DRCDSFRANCNFSM4HEM4LSQ .

--

  • Alex
MattConley

comment created time in 16 days

issue commenttensorflow/tensorflow

Support a generic API for modifying loss / gradients in Keras

I'm not saying a single layer, just trying to orthogonalize things a bit more.

For example, stuff related to gradient computation can be done in the gradient tape; this includes the LossScalingGradientTape and Reed's proposal of an AllReduceGradientTape (which might be necessary for loss scaling).

I can't tell if gradient reduction is better handled inside the gradient tape or outside in the body of the training loop (since it's just a call to reduce).

On Mon, Feb 3, 2020 at 1:32 PM omalleyt12 notifications@github.com wrote:

apply_gradients will unscale gradients, but not scale loss

@reedwm https://github.com/reedwm If you look at the Optimizer.apply_gradients method I proposed above, it does not modify the user-supplied gradients. The Optimizer.apply_gradients method I am proposing is status quo.

Because they're orthogonal they belong in a separate layer where they can be changed without risking a change to the other concerns. So I think we'd be better off if we had separate things in the code for separate concerns.

@alextp https://github.com/alextp What would this separate layer look like? It seems like we'd need an object that wraps the Optimizer, does some preprocessing, and then passes the gradients to the Optimizer. But this object would need to have the same API as Optimizer.minimize and/or Optimizer.apply_gradients, so why not just have it be an optimizer?

And if we don't wrap the Optimizer, then how do we support Optimizer.minimize?

Implementing DistributedOptimizer, LossScalingOptimizer, etc via composition seems like a good way to separate the parameter updates from the other logic

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/issues/36398?email_source=notifications&email_token=AAABHROLUS34NFXZK2NKOLTRBCEPJA5CNFSM4KOQZZF2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEKVPFHA#issuecomment-581628572, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRIFPQNMNNEQ4CKPVDTRBCEPJANCNFSM4KOQZZFQ .

--

  • Alex
tgaddair

comment created time in 23 days

issue commenttensorflow/tensorflow

Support a generic API for modifying loss / gradients in Keras

I'm thinking more of "what code has to change when users switch optimizers". And for today's definition of sgd-style optimizers we often use in deep learning, what needs to change is how the gradients are applied. This is the difference between SGD and Adam, etc.

Whether gradients are reduced or not is a policy decision which belongs at a layer other than the optimizer. For example, in some distributed systems it makes sense to do many local steps before reducing the gradients (if RPCs are slow/expensive); in other cases it makes sense to never reduce gradients at all and instead just occasionally average the weights of different replicas (like in federated learning).

These concerns (of what distributed training setup precisely we want to implement) are completely orthogonal to the concerns of how to actually perform the parameter update (if there's momentum, if there's some adaptation, maybe a second-order method like in kfac, etc). Because they're orthogonal they belong in a separate layer where they can be changed without risking a change to the other concerns.

Pushing extra complexity in the optimizer is unfortunate because it makes it harder to implement new optimizers or modify existing ones (and we're still regularly getting papers which make important changes to how optimizers do their gradient updates).

So while I agree that pushing this complexity into the optimizer class it self makes the training loop super simple, I think this complexity really belongs in the training loop, as it's unrelated to how exactly the weights are being modified given the gradients. Pushing this complexity into the optimizers also means users have to write optimizers when they want to do things which don't feel like changes to the optimizer, like normalizing gradients.

So I think we'd be better off if we had separate things in the code for separate concerns. The optimizer class would deal with apply_gradients, and other classes would deal with loss scaling, gradient reduction, etc. This way a full solution can be composed out of many small individually simple pieces as opposed to looking simple in the surface but relying on complicated hard-to-integrate pieces, which have to be modified by inheritance, and which have many orthogonal unrelated methods.

tgaddair

comment created time in 23 days

issue commenttensorflow/tensorflow

Support a generic API for modifying loss / gradients in Keras

Sorry I'm so late to this thread.

Going back to the original proposal, I agree there is a need to customize what happens in the training loop in the steps listed (before computing gradients, after computing gradients, before applying gradients, and maybe the conditional should apply).

I just wish this extension was done outside the Optimizer class. Ideally the optimizer classes should just concern themselves with applying the actual SGD update; the only reason things like gradient reduction, loss scaling, and others have been pushed into the optimizers is that optimizers were a tempting extension point (specially with TF's optimizer.minimize API and keras's get_updates API being a little too broad).

Can we add these four methods (or small variations thereof) to the keras callback class and allow users to use callbacks to drive fp16 loss scaling, distributed gradient aggregation, and friends, while pushing complexity out of optimizers so they can look as clean as the math in the research papers?

tgaddair

comment created time in 23 days

issue commenttensorflow/tensorflow

keras freezing at last step in the first epoch

@omalleyt12 do you know what's going on here?

divyag11

comment created time in a month

pull request commenttensorflow/tensorflow

fix the issue of py_func

Yes the user would need to do this conversion.

On Tue, Jan 28, 2020 at 6:18 PM Leslie-Fang notifications@github.com wrote:

@Leslie-Fang commented on this pull request.

In tensorflow/python/ops/script_ops.py https://github.com/tensorflow/tensorflow/pull/36283#discussion_r372159272 :

@@ -116,6 +119,10 @@ def _convert(self, value, dtype): # TODO(akshayka): Make it possible to return a list of both Tensors and # Nones from an EagerPyFunc. return constant_op.constant(0.0, dtype=dtype)

  • if sparse_tensor.is_sparse(value):
  •  value = sparse_ops.sparse_tensor_to_dense(value)
    

Thanks @alextp https://github.com/alextp If we return the sparse_tensor as a list of 3 eager_tensors. Does the user needs convert the 3 eager_tensors back into the sparse_tensor? Do you have any suggestions?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/36283?email_source=notifications&email_token=AAABHRLEACXXUCI7RL32SZTRADROJA5CNFSM4KMQ4UE2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCTM7ETY#discussion_r372159272, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRNLNMVKCAA3ZYRYHH3RADROJANCNFSM4KMQ4UEQ .

--

  • Alex
Leslie-Fang

comment created time in a month

Pull request review commenttensorflow/tensorflow

fix the issue of py_func

 def _convert(self, value, dtype):       # TODO(akshayka): Make it possible to return a list of both Tensors and       # Nones from an EagerPyFunc.       return constant_op.constant(0.0, dtype=dtype)+    if sparse_tensor.is_sparse(value):+      value = sparse_ops.sparse_tensor_to_dense(value)

The EagerPyFunc has to return eager tensors; sparsetensor is not a tensor, it's a python type with 3 tensors under it. If you return the 3 underlying tensors things should be fine

Leslie-Fang

comment created time in a month

issue commenttensorflow/tensorflow

Expand registered kernels for variable ops on GPU

Replacing the registration to include TF_CALL_GPU_ALL_TYPES SGTM. @sanjoy for FYI.

dirktheeng

comment created time in a month

issue commenttensorflow/tensorflow

Very bad performance using Gradient Tape

Please file a separate bug for your problem.

On Sun, Jan 26, 2020 at 10:33 AM Xiaokang Wang notifications@github.com wrote:

Sorry for late reply, @robieta https://github.com/robieta thanks, it worked. And as of now I'm getting better performance using tf.function(55.secs) as compared to tf.keras .fit(64.8 secs), probably because .fit does some initialization before training starts. But unfortunately in eager execution (without tf.function) its taking 270 secs. And if we compare this result with MXNet and Pytorch, it turns out Tensorflow eager execution is over 8x slower than MXNet and 7.5x slower than Pytorch. I've no problem using tf.function but there are some tricky parts over using it, that I don't like, for instance my version of train function definition using tf.function doesn't work, while your's works like champ. As of now I've got the working solution so I'm closing the issue after next reply, but it'd be great if eager execution is a bit faster than what it is.

Thanks for help.

Hi, I was also training a tf.keras model. The loss reduced very quickly when calling model.fit but it plateaued very quickly when I updated the weights manually using tf.GradientTape(). Could I know how you figured out the reason?

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/issues/30596?email_source=notifications&email_token=AAABHRKRVI5GO3CNNJL57QLQ7XJQHA5CNFSM4IBANHP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJ52S6Q#issuecomment-578529658, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRLIJW7NCL3LPFGQW2TQ7XJQHANCNFSM4IBANHPQ .

--

  • Alex
braindotai

comment created time in a month

pull request commenttensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple and export it

@pooyadavoodi you might be running it on the wrong directory (it assumes it'll be run from within the root directory)

pooyadavoodi

comment created time in a month

pull request commenttensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple and export it

Backward compatibility doesn't require that we cannot change the default value, just that the new default value has the same meaning as the old default value. So Sanjoy's suggestion is what needs to happen her.e

On Thu, Jan 23, 2020 at 5:32 PM Sanjoy Das notifications@github.com wrote:

@sanjoy commented on this pull request.

In tensorflow/python/compiler/tensorrt/trt_convert.py https://github.com/tensorflow/tensorflow/pull/35198#discussion_r370439586 :

@@ -951,7 +971,7 @@ def init(self, input_saved_model_dir=None, input_saved_model_tags=None, input_saved_model_signature_key=None,

  •           conversion_params=DEFAULT_TRT_CONVERSION_PARAMS):
    
  •           conversion_params=TrtConversionParams()):
    

We can't change the default value to None due to backward compatibility.

I may be missing some Python subtleties, but can you default it to None and then within the function do

if conversion_params is None: conversion_params = TrtConversionParams()

?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35198?email_source=notifications&email_token=AAABHRJEHMBJ22ZATNO3TTLQ7JAKRA5CNFSM4J37OG22YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCS42WVI#discussion_r370439586, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRNF3KVVAWGFLFWP7LLQ7JAKRANCNFSM4J37OG2Q .

--

  • Alex
pooyadavoodi

comment created time in a month

Pull request review commenttensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple and export it

 def __init__(self,                input_saved_model_dir=None,                input_saved_model_tags=None,                input_saved_model_signature_key=None,-               conversion_params=DEFAULT_TRT_CONVERSION_PARAMS):+               conversion_params=TrtConversionParams()):

This still uses a mutable object as the default value of the function. That's quite dangerous as this object can be inadvertently overwritten.

tf-api-owners prefer keeping TrtConversionParams immutable and using a replace method or, if you do want it mutable, pass None here and create a default instance in the body of the function

See https://docs.quantifiedcode.com/python-anti-patterns/correctness/mutable_default_value_as_argument.html for a reference on why mutable default arguments are dangerous.

pooyadavoodi

comment created time in a month

Pull request review commenttensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple and export it

 def supported_precision_modes(): # so it can produce reasonable performance results with the default. DEFAULT_TRT_MAX_WORKSPACE_SIZE_BYTES = 1 << 30 -# TrtConversionParams encapsulates the parameters that are used for TF-TRT-# conversion.-TrtConversionParams = collections.namedtuple(-    "TrtConversionParams",-    [--        # A template RewriterConfig proto used to create a TRT-enabled-        # RewriterConfig. If None, it will use a default one.-        "rewriter_config_template",--        # The maximum GPU temporary memory which the TRT engine can use at-        # execution time. This corresponds to the 'workspaceSize' parameter of-        # nvinfer1::IBuilder::setMaxWorkspaceSize().-        "max_workspace_size_bytes",--        # One of TrtPrecisionMode.supported_precision_modes().-        "precision_mode",--        # The minimum number of nodes required for a subgraph to be replaced by-        # TRTEngineOp.-        "minimum_segment_size",--        # Whether to generate dynamic TRT ops which will build the TRT network-        # and engine at run time.-        # i.e. Since TensorRT version < 6.0 does not support dynamic dimensions-        # other than the batch dimension, when the TensorFlow graph has a-        # non-batch dimension of dynamic size, we would need to enable this-        # option. This option should be set to True in TF 2.0.-        "is_dynamic_op",--        # Max number of cached TRT engines for dynamic TRT ops.-        # Created TRT engines for a dynamic dimension are cached.-        # This is the maximum number of engines that can be cached.-        # If the number of cached engines is already at max but none of them-        # supports the input shapes, the TRTEngineOp will fall back to run the-        # original TF subgraph that corresponds to the TRTEngineOp.-        "maximum_cached_engines",--        # This argument is ignored if precision_mode is not INT8. If set to-        # True, a calibration graph will be created to calibrate the missing-        # ranges. The calibration graph must be converted to an inference graph-        # by running calibration with calibrate(). If set to False, quantization-        # nodes will be expected for every tensor in the graph (exlcuding those-        # which will be fused). If a range is missing, an error will occur.-        # Please note that accuracy may be negatively affected if there is a-        # mismatch between which tensors TRT quantizes and which tensors were-        # trained with fake quantization.-        "use_calibration",--        # Max size for the input batch.-        # This parameter is only effective when is_dynamic_op=False which-        # is not supported in TF 2.0.-        "max_batch_size",-    ])--DEFAULT_TRT_CONVERSION_PARAMS = TrtConversionParams(-    rewriter_config_template=None,-    max_workspace_size_bytes=DEFAULT_TRT_MAX_WORKSPACE_SIZE_BYTES,-    precision_mode=TrtPrecisionMode.FP32,-    minimum_segment_size=3,-    is_dynamic_op=True,-    maximum_cached_engines=1,-    use_calibration=True,-    max_batch_size=1)++@tf_export("experimental.tensorrt.ConversionParams", v1=[])+class TrtConversionParams(object):+  """ A class to encapsulate parameters that are used for TF-TRT conversion."""++  def __init__(self,+               rewriter_config_template=None,+               max_workspace_size_bytes=DEFAULT_TRT_MAX_WORKSPACE_SIZE_BYTES,+               precision_mode=TrtPrecisionMode.FP32,+               minimum_segment_size=3,+               is_dynamic_op=True,+               maximum_cached_engines=1,+               use_calibration=True,+               max_batch_size=1):+    """Initialize TrtConversionParams.++    Args:+      rewriter_config_template: a template RewriterConfig proto used to create a+        TRT-enabled RewriterConfig. If None, it will use a default one.+      max_workspace_size_bytes: the maximum GPU temporary memory which the TRT+        engine can use at execution time. This corresponds to the+        'workspaceSize' parameter of nvinfer1::IBuilder::setMaxWorkspaceSize().+      precision_mode: one of TrtPrecisionMode.supported_precision_modes().+      minimum_segment_size: the minimum number of nodes required for a subgraph+        to be replaced by TRTEngineOp.+      is_dynamic_op: whether to generate dynamic TRT ops which will build the+        TRT network and engine at run time. i.e. Since TensorRT version < 6.0+        does not support dynamic dimensions other than the batch dimension,+        when the TensorFlow graph has a non-batch dimension of dynamic size,+        we would need to enable this option. This option should be set to True+        in TF 2.0.+      maximum_cached_engines: max number of cached TRT engines for dynamic TRT+        ops. Created TRT engines for a dynamic dimension are cached. This is+        the maximum number of engines that can be cached. If the number of+        cached engines is already at max but none of them supports the input+        shapes, the TRTEngineOp will fall back to run the original TF subgraph+        that corresponds to the TRTEngineOp.+      use_calibration: this argument is ignored if precision_mode is not INT8.+        If set to True, a calibration graph will be created to calibrate the+        missing ranges. The calibration graph must be converted to an inference+        graph by running calibration with calibrate(). If set to False,+        quantization nodes will be expected for every tensor in the graph+        (exlcuding those which will be fused). If a range is missing, an error+        will occur. Please note that accuracy may be negatively affected if+        there is a mismatch between which tensors TRT quantizes and which+        tensors were trained with fake quantization.+      max_batch_size: max size for the input batch. This parameter is only+        effective when is_dynamic_op=False which is not supported in TF 2.0.+    """+    self.rewriter_config_template = rewriter_config_template+    self.max_workspace_size_bytes = max_workspace_size_bytes+    self.precision_mode = precision_mode+    self.minimum_segment_size = minimum_segment_size+    self.is_dynamic_op = is_dynamic_op+    self.maximum_cached_engines = maximum_cached_engines+    self.use_calibration = use_calibration+    self.max_batch_size = max_batch_size++  def _replace(self,+               rewriter_config_template=None,+               max_workspace_size_bytes=None,+               precision_mode=None,+               minimum_segment_size=None,+               is_dynamic_op=None,+               maximum_cached_engines=None,+               use_calibration=None,+               max_batch_size=None,+    """Set value of class data members only if they are passed as arguments.++    We need this function for backward compatibility with NamedTuple+    TrtConversionParams which forced users to use the following syntax to+    set parameters:+      DEFAULT_TRT_CONVERSION_PARAMS._replace(...)+    """+    trt_conversion_params = TrtConversionParams()+    for k, v in vars().items():+      if v and (k != "self"):+        setattr(trt_conversion_params, k, v)+    return trt_conversion_params+

Can this class have a str or repr method so the API pbtxt file doesn't end up with python IDs in it?

pooyadavoodi

comment created time in a month

Pull request review commenttensorflow/tensorflow

Change TrtConversionParams to class from NamedTuple and export it

 class TrtGraphConverterV2(object):   1. FP32/FP16 precision       ```python-     params = DEFAULT_TRT_CONVERSION_PARAMS._replace(+     params = tf.experimental.tensorrt.ConversionParams(

this module doesn't "import tensorflow as tf" so it can't use tf.experimental as a symbol (I don't think this code even runs?)

pooyadavoodi

comment created time in a month

pull request commenttensorflow/tensorflow

Fix tf.range failure when `limit` is type of `tf.int32` and `dtype` is `tf.int64`

Right, but this case should be an error; tf ops are not supposed to do silent type promotion.

On Mon, Jan 13, 2020 at 12:57 PM Yong Tang notifications@github.com wrote:

@yongtang commented on this pull request.

In tensorflow/python/ops/math_ops.py https://github.com/tensorflow/tensorflow/pull/35821#discussion_r366023997 :

@@ -1487,9 +1487,12 @@ def range(start, limit=None, delta=1, dtype=None, name="range"): # pylint: disa start, limit = 0, start

with ops.name_scope(name, "Range", [start, limit, delta]) as name:

  • start = ops.convert_to_tensor(start, dtype=dtype, name="start")
  • limit = ops.convert_to_tensor(limit, dtype=dtype, name="limit")
  • delta = ops.convert_to_tensor(delta, dtype=dtype, name="delta")
  • if not isinstance(start, ops.Tensor):
  •  start = ops.convert_to_tensor(start, dtype=dtype, name="start")
    
  • if not isinstance(limit, ops.Tensor):
  •  limit = ops.convert_to_tensor(limit, dtype=dtype, name="limit")
    
  • if not isinstance(delta, ops.Tensor):
  •  delta = ops.convert_to_tensor(delta, dtype=dtype, name="delta")
    

Thanks @alextp https://github.com/alextp for the review. In #35710 https://github.com/tensorflow/tensorflow/issues/35710 the case was that the dtype passed along with tf.range(start, limit, delta, dtype) is different from the tensor of the start/limit/delta. In other words, the following:

tf.range(tf.constant(4, dtype=tf.int32), dtype=tf.int64)

In that case, convert_to_tensor will try to convert tf.constant(4, dtype=tf.int32) to dtype=tf.int64 and that is returning an error. That is the issue from #35710 https://github.com/tensorflow/tensorflow/issues/35710 .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35821?email_source=notifications&email_token=AAABHRO42RDZFWIHLQJVZZTQ5TIUBA5CNFSM4KGEOJB2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCRSIVXI#discussion_r366023997, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRITXW4FHYM7UMJ5LCDQ5TIUBANCNFSM4KGEOJBQ .

--

  • Alex
yongtang

comment created time in a month

Pull request review commenttensorflow/tensorflow

Fix tf.range failure when `limit` is type of `tf.int32` and `dtype` is `tf.int64`

 def range(start, limit=None, delta=1, dtype=None, name="range"):  # pylint: disa       assert all(arg.dtype in dtype_hierarchy for arg in [start, limit, delta])       inferred_dtype = max([arg.dtype for arg in [start, limit, delta]],                            key=dtype_hierarchy.index)--      start = cast(start, inferred_dtype)-      limit = cast(limit, inferred_dtype)-      delta = cast(delta, inferred_dtype)+    else:+      inferred_dtype = dtype+    # Always try perform a cast even start/limit/delta are already tensors.

We don't want to do this; TF likes to fail loudly when mixing dtypes as it's not obvious what is the user intent (and casts can always be inserted at the call site)

yongtang

comment created time in a month

Pull request review commenttensorflow/tensorflow

Fix tf.range failure when `limit` is type of `tf.int32` and `dtype` is `tf.int64`

 def range(start, limit=None, delta=1, dtype=None, name="range"):  # pylint: disa     start, limit = 0, start    with ops.name_scope(name, "Range", [start, limit, delta]) as name:-    start = ops.convert_to_tensor(start, dtype=dtype, name="start")-    limit = ops.convert_to_tensor(limit, dtype=dtype, name="limit")-    delta = ops.convert_to_tensor(delta, dtype=dtype, name="delta")+    if not isinstance(start, ops.Tensor):+      start = ops.convert_to_tensor(start, dtype=dtype, name="start")+    if not isinstance(limit, ops.Tensor):+      limit = ops.convert_to_tensor(limit, dtype=dtype, name="limit")+    if not isinstance(delta, ops.Tensor):+      delta = ops.convert_to_tensor(delta, dtype=dtype, name="delta")

convert_to_tensor already is a no-op if the input is a tensor, so what is this doing?

yongtang

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,+    `group_by_window`, `ignore_errors`, `interleave`, `list_files`, `map`,+    `range`, `repeat`, `padded_batch`, `prefetch`, `shuffle`, `SSTableDataset`,+    `TextLineDataset`, `TFRecordDataset`, `unbatch`, `window`.++### Master and worker services++This section discusses the design for the master and worker services. These+services are used by the Python API to provide distributed dataset processing,+and these services use the splitting API as a part of their implementation.++#### Master API++The master is responsible for registering datasets, generating and tracking+iteration and worker ids, and generating dataset splits for processing on+workers.++Below is a sketch of the Master API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Registers a dataset and returns an id for the dataset. If the dataset is+// already registered, its dataset id is returned.+int GetOrRegisterDataset(GraphDef dataset);++// Creates and returns `num_consumers` iterator ids which partition the+// specified dataset. This also creates an internal `iteration_id` used to+// track the overall dataset iteration. `num_tasks` defines how many tasks to+// create. If `num_tasks` is -1, it is up to the master to determine how many+// tasks to create.+list<int> CreateIterators(int dataset_id, int num_consumers,+                          int num_tasks);++// Returns the list of tasks processing data for `iterator_id`. Consumers query+// this to find which worker addresses to read data from.+list<TaskInfo> GetWorkersForiterator(int iterator_id);++///---- Methods called by input workers ----++// Registers a worker and returns its worker id.+int RegisterWorker(WorkerInfo worker_info);++// Requests the next splits to process on the given worker for the given+// iteration_id.+List<Split> GetSplits(int worker_id, int iteration_id);+```++#### Worker API++The worker is responsible for processing datasets and providing dataset elements+to consumers.++Below is a sketch of the Worker API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Gets the next element for the specified iterator_id.+list<Tensors> GetElement(iterator_id);++/// ---- Methods called by master ----++// Requests that the worker process the specified dataset. This will trigger the+// worker to start requesting splits from the master using the `iteration_id`.+void ProcessDataset(int dataset_id, int iteration_id, list<int> iterator_ids);+```++#### Visitation Guarantees++When iterating over a dataset, the tf.data service will process all input data+at least once, even in the presence of master or worker failures. If there are+no failures, all input data will be processed exactly once.++With determinstic execution enabled, the tf.data service provides an+exactly-once visitation guarantee even in the face of master or worker failures.++#### Determinism++Deterministic processing is a cornerstone of tf.data. Determinism is valuable+for debugging and experimentation. This section discusses how the tf.data+service will provide determinism.++To get deterministic behavior, the tf.data service will require three things:++1.  The dataset being distributed has deterministic output.+1.  The user sets `deterministic=True` when calling+    `tf.data.experimental.service.create_iteration`.+1.  The user specifies how many input tasks to use when calling+    `tf.data.experimental.service.create_iteration`.+1.  The consumers do not fail.++In the absence of failures, determinism is achieved by distributing splits+round-robin among `N` input workers and having input workers earmark every `ith`+element for consumer `i`.++To provide determinism even when servers fail, consumers can keep track of which+element index they have processed up to for each task. Input workers would+attach per-task element indices when they produce elements, so consumers can+ignore duplicate elements caused by worker restarts. We will use an analogous+mechanism to avoid re-processing the same split in case of master falure. Input+workers will track the split index of splits as they receive them, and ignore+duplicate splits.++#### Failure Recovery++The tf.data service can recover from master and worker failures while preserving+determinism and its at-least-once visitation guarantee. The master achieves this+by writing its unrecoverable state to a persistent journal, and taking+checkpoints of its recoverable state to improve recovery time. When workers+reconnect to a restarted master, they update the master with their state so that+the master can recover its knowledge of its workers.++The unrecoverable state includes++*   **Registered datasets**+*   **ID generators** for iteration ids, iterator ids, dataset ids, and worker+    ids.+*   **In-progress iteration state**:+    *   **dataset id** for the iterated dataset so that we can recover the+        iteration's split generator+    *   **iteration id**+    *   **participating worker ids**, so that we can send splits to the correct+        workers.++Recoverable state includes++*   **Split generators**: Recoverable from our information about in-progress+    iterations.+*   **Worker addresses**: Recoverable when workers reconnect.+*   **Worker loads**: Recoverable when workers reconnect.+*   **Assignment from splits to workers**: Recoverable when workers reconnect.+*   **Outstanding splits**: Recoverable by re-running split generators from+    their checkpoint state.++To improve recovery time, the master will periodically write checkpoints of its+split generators and outstanding splits, so that split generators don't need to+be run from the beginning during master recovery.++A concern with the above recovery strategy is that a master could transmit a+split before crashing, then restart and transmit the same split again. To avoid+this duplication, the master attaches a split index to every split it sends to a+worker. When workers reconnect, they inform the master of their latest split+index.++Workers have no unrecoverable state. If a worker crashes, a new worker can take+its place. It is up to the master to reassign splits from the crashed worker to+the new worker.++To improve worker recovery time, workers will periodically write checkpoints of+their iterators to directories named using their worker ids. When the restarted+worker connects, the master will tell it which iterator checkpoints to recover+from.++We will read and write this state through a MasterState interface which can be+implemented using various storage backends. For use cases that require fault+tolerance, the user must configure a fault-tolerant MasterState, e.g. Spanner+internally, Cloud Spanner in GCP, or etcd externally. If fault tolerance isn't+required, the user could configure state to be held in memory only.++#### Leadership Transfer++The master writes state to journal files so that the state can be recovered on+restart. It is possible that a new master could be brought up while the old+master is still running. If we aren't careful, this could result in corruption+of the journal as both masters try to write to it.++Ideally we could rely on a distributed coordination service such as ZooKeeper.+However, this would add a significant burden to users who don't have access to a+ZooKeeper cluster, and it would also require adding a new dependency on a+ZooKeeper client.++What TensorFlow does have is a FileSystem API. We will leverage this API to+perform leadership transfer as follows:++1.  The first master will create a file named "master_seqno_0". If it+    successfully creates the file, it will consider itself the leader.+1.  The leader master will check every N milliseconds that the "master_seqno"+    file it created still exists. If the file no longer exists, the master will+    cease operation immediately.+1.  When a master thinks it should be leader, it attempts to atomically rename+    the master_seqno_n file to master_seqno_n+1. If this succeeds, the master+    will wait (N + M) milliseconds, verify that its renamed file still exists,+    and begin acting as leader. This gives the previous leader time to notice+    the rename.++The above scheme relies on rename being atomic so that two masters don't both+succeed at renaming the same file. Users may opt to use a filesystem that+doesn't support atomic rename, but they do so at the (unlikely) risk of two+concurrently running masters thinking they are leader. Common filesystems such+as Posix and HDFS support atomic rename.

And does GCS support it too?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,+    `group_by_window`, `ignore_errors`, `interleave`, `list_files`, `map`,+    `range`, `repeat`, `padded_batch`, `prefetch`, `shuffle`, `SSTableDataset`,+    `TextLineDataset`, `TFRecordDataset`, `unbatch`, `window`.++### Master and worker services++This section discusses the design for the master and worker services. These+services are used by the Python API to provide distributed dataset processing,+and these services use the splitting API as a part of their implementation.++#### Master API++The master is responsible for registering datasets, generating and tracking+iteration and worker ids, and generating dataset splits for processing on+workers.++Below is a sketch of the Master API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Registers a dataset and returns an id for the dataset. If the dataset is+// already registered, its dataset id is returned.+int GetOrRegisterDataset(GraphDef dataset);++// Creates and returns `num_consumers` iterator ids which partition the+// specified dataset. This also creates an internal `iteration_id` used to+// track the overall dataset iteration. `num_tasks` defines how many tasks to+// create. If `num_tasks` is -1, it is up to the master to determine how many+// tasks to create.+list<int> CreateIterators(int dataset_id, int num_consumers,+                          int num_tasks);++// Returns the list of tasks processing data for `iterator_id`. Consumers query+// this to find which worker addresses to read data from.+list<TaskInfo> GetWorkersForiterator(int iterator_id);++///---- Methods called by input workers ----++// Registers a worker and returns its worker id.+int RegisterWorker(WorkerInfo worker_info);++// Requests the next splits to process on the given worker for the given+// iteration_id.+List<Split> GetSplits(int worker_id, int iteration_id);+```++#### Worker API++The worker is responsible for processing datasets and providing dataset elements+to consumers.++Below is a sketch of the Worker API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Gets the next element for the specified iterator_id.+list<Tensors> GetElement(iterator_id);++/// ---- Methods called by master ----++// Requests that the worker process the specified dataset. This will trigger the+// worker to start requesting splits from the master using the `iteration_id`.+void ProcessDataset(int dataset_id, int iteration_id, list<int> iterator_ids);+```++#### Visitation Guarantees++When iterating over a dataset, the tf.data service will process all input data+at least once, even in the presence of master or worker failures. If there are+no failures, all input data will be processed exactly once.++With determinstic execution enabled, the tf.data service provides an+exactly-once visitation guarantee even in the face of master or worker failures.++#### Determinism++Deterministic processing is a cornerstone of tf.data. Determinism is valuable+for debugging and experimentation. This section discusses how the tf.data+service will provide determinism.++To get deterministic behavior, the tf.data service will require three things:++1.  The dataset being distributed has deterministic output.+1.  The user sets `deterministic=True` when calling+    `tf.data.experimental.service.create_iteration`.+1.  The user specifies how many input tasks to use when calling+    `tf.data.experimental.service.create_iteration`.+1.  The consumers do not fail.++In the absence of failures, determinism is achieved by distributing splits+round-robin among `N` input workers and having input workers earmark every `ith`+element for consumer `i`.++To provide determinism even when servers fail, consumers can keep track of which+element index they have processed up to for each task. Input workers would+attach per-task element indices when they produce elements, so consumers can+ignore duplicate elements caused by worker restarts. We will use an analogous+mechanism to avoid re-processing the same split in case of master falure. Input+workers will track the split index of splits as they receive them, and ignore+duplicate splits.++#### Failure Recovery++The tf.data service can recover from master and worker failures while preserving+determinism and its at-least-once visitation guarantee. The master achieves this+by writing its unrecoverable state to a persistent journal, and taking+checkpoints of its recoverable state to improve recovery time. When workers+reconnect to a restarted master, they update the master with their state so that+the master can recover its knowledge of its workers.++The unrecoverable state includes++*   **Registered datasets**+*   **ID generators** for iteration ids, iterator ids, dataset ids, and worker+    ids.+*   **In-progress iteration state**:+    *   **dataset id** for the iterated dataset so that we can recover the+        iteration's split generator+    *   **iteration id**+    *   **participating worker ids**, so that we can send splits to the correct+        workers.++Recoverable state includes++*   **Split generators**: Recoverable from our information about in-progress+    iterations.+*   **Worker addresses**: Recoverable when workers reconnect.+*   **Worker loads**: Recoverable when workers reconnect.+*   **Assignment from splits to workers**: Recoverable when workers reconnect.+*   **Outstanding splits**: Recoverable by re-running split generators from+    their checkpoint state.++To improve recovery time, the master will periodically write checkpoints of its+split generators and outstanding splits, so that split generators don't need to+be run from the beginning during master recovery.++A concern with the above recovery strategy is that a master could transmit a+split before crashing, then restart and transmit the same split again. To avoid+this duplication, the master attaches a split index to every split it sends to a+worker. When workers reconnect, they inform the master of their latest split+index.++Workers have no unrecoverable state. If a worker crashes, a new worker can take+its place. It is up to the master to reassign splits from the crashed worker to+the new worker.++To improve worker recovery time, workers will periodically write checkpoints of+their iterators to directories named using their worker ids. When the restarted+worker connects, the master will tell it which iterator checkpoints to recover+from.++We will read and write this state through a MasterState interface which can be+implemented using various storage backends. For use cases that require fault+tolerance, the user must configure a fault-tolerant MasterState, e.g. Spanner+internally, Cloud Spanner in GCP, or etcd externally. If fault tolerance isn't+required, the user could configure state to be held in memory only.++#### Leadership Transfer++The master writes state to journal files so that the state can be recovered on+restart. It is possible that a new master could be brought up while the old+master is still running. If we aren't careful, this could result in corruption+of the journal as both masters try to write to it.++Ideally we could rely on a distributed coordination service such as ZooKeeper.+However, this would add a significant burden to users who don't have access to a+ZooKeeper cluster, and it would also require adding a new dependency on a+ZooKeeper client.++What TensorFlow does have is a FileSystem API. We will leverage this API to+perform leadership transfer as follows:++1.  The first master will create a file named "master_seqno_0". If it+    successfully creates the file, it will consider itself the leader.+1.  The leader master will check every N milliseconds that the "master_seqno"+    file it created still exists. If the file no longer exists, the master will+    cease operation immediately.+1.  When a master thinks it should be leader, it attempts to atomically rename+    the master_seqno_n file to master_seqno_n+1. If this succeeds, the master+    will wait (N + M) milliseconds, verify that its renamed file still exists,+    and begin acting as leader. This gives the previous leader time to notice+    the rename.++The above scheme relies on rename being atomic so that two masters don't both+succeed at renaming the same file. Users may opt to use a filesystem that+doesn't support atomic rename, but they do so at the (unlikely) risk of two+concurrently running masters thinking they are leader. Common filesystems such+as Posix and HDFS support atomic rename.++#### Caveats++This section calls out caveats that users will need to be aware of when using+the tf.data service.++-   Due to the nature of dataset splitting, elements will not be processed in+    the same order as they were in the pre-distributed dataset. If a dataset+    relies on the order of the input files, the user's assumptions will be+    violated when splitting causes each input worker to process only a subset of+    the input files.+-   If a dataset doesn't support splitting, it must be moved after the part of

I assume "dataset doesn't support splitting" means "if a particular dataset operation doesn't support splitting" here

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,+    `group_by_window`, `ignore_errors`, `interleave`, `list_files`, `map`,+    `range`, `repeat`, `padded_batch`, `prefetch`, `shuffle`, `SSTableDataset`,+    `TextLineDataset`, `TFRecordDataset`, `unbatch`, `window`.++### Master and worker services++This section discusses the design for the master and worker services. These+services are used by the Python API to provide distributed dataset processing,+and these services use the splitting API as a part of their implementation.++#### Master API++The master is responsible for registering datasets, generating and tracking+iteration and worker ids, and generating dataset splits for processing on+workers.++Below is a sketch of the Master API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Registers a dataset and returns an id for the dataset. If the dataset is+// already registered, its dataset id is returned.+int GetOrRegisterDataset(GraphDef dataset);++// Creates and returns `num_consumers` iterator ids which partition the+// specified dataset. This also creates an internal `iteration_id` used to+// track the overall dataset iteration. `num_tasks` defines how many tasks to+// create. If `num_tasks` is -1, it is up to the master to determine how many+// tasks to create.+list<int> CreateIterators(int dataset_id, int num_consumers,+                          int num_tasks);++// Returns the list of tasks processing data for `iterator_id`. Consumers query+// this to find which worker addresses to read data from.+list<TaskInfo> GetWorkersForiterator(int iterator_id);++///---- Methods called by input workers ----++// Registers a worker and returns its worker id.+int RegisterWorker(WorkerInfo worker_info);++// Requests the next splits to process on the given worker for the given+// iteration_id.+List<Split> GetSplits(int worker_id, int iteration_id);+```++#### Worker API++The worker is responsible for processing datasets and providing dataset elements+to consumers.++Below is a sketch of the Worker API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Gets the next element for the specified iterator_id.+list<Tensors> GetElement(iterator_id);++/// ---- Methods called by master ----++// Requests that the worker process the specified dataset. This will trigger the+// worker to start requesting splits from the master using the `iteration_id`.+void ProcessDataset(int dataset_id, int iteration_id, list<int> iterator_ids);+```++#### Visitation Guarantees++When iterating over a dataset, the tf.data service will process all input data+at least once, even in the presence of master or worker failures. If there are

Can you expand a bit on the tradeoffs of not enforcing "exactly once" when failures are present?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,+    `group_by_window`, `ignore_errors`, `interleave`, `list_files`, `map`,+    `range`, `repeat`, `padded_batch`, `prefetch`, `shuffle`, `SSTableDataset`,+    `TextLineDataset`, `TFRecordDataset`, `unbatch`, `window`.++### Master and worker services++This section discusses the design for the master and worker services. These+services are used by the Python API to provide distributed dataset processing,+and these services use the splitting API as a part of their implementation.++#### Master API++The master is responsible for registering datasets, generating and tracking+iteration and worker ids, and generating dataset splits for processing on+workers.++Below is a sketch of the Master API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Registers a dataset and returns an id for the dataset. If the dataset is+// already registered, its dataset id is returned.+int GetOrRegisterDataset(GraphDef dataset);++// Creates and returns `num_consumers` iterator ids which partition the+// specified dataset. This also creates an internal `iteration_id` used to+// track the overall dataset iteration. `num_tasks` defines how many tasks to+// create. If `num_tasks` is -1, it is up to the master to determine how many+// tasks to create.+list<int> CreateIterators(int dataset_id, int num_consumers,+                          int num_tasks);++// Returns the list of tasks processing data for `iterator_id`. Consumers query+// this to find which worker addresses to read data from.+list<TaskInfo> GetWorkersForiterator(int iterator_id);++///---- Methods called by input workers ----++// Registers a worker and returns its worker id.+int RegisterWorker(WorkerInfo worker_info);++// Requests the next splits to process on the given worker for the given+// iteration_id.+List<Split> GetSplits(int worker_id, int iteration_id);+```++#### Worker API++The worker is responsible for processing datasets and providing dataset elements+to consumers.++Below is a sketch of the Worker API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Gets the next element for the specified iterator_id.+list<Tensors> GetElement(iterator_id);++/// ---- Methods called by master ----++// Requests that the worker process the specified dataset. This will trigger the+// worker to start requesting splits from the master using the `iteration_id`.+void ProcessDataset(int dataset_id, int iteration_id, list<int> iterator_ids);+```++#### Visitation Guarantees++When iterating over a dataset, the tf.data service will process all input data+at least once, even in the presence of master or worker failures. If there are+no failures, all input data will be processed exactly once.++With determinstic execution enabled, the tf.data service provides an+exactly-once visitation guarantee even in the face of master or worker failures.++#### Determinism++Deterministic processing is a cornerstone of tf.data. Determinism is valuable+for debugging and experimentation. This section discusses how the tf.data+service will provide determinism.++To get deterministic behavior, the tf.data service will require three things:++1.  The dataset being distributed has deterministic output.+1.  The user sets `deterministic=True` when calling+    `tf.data.experimental.service.create_iteration`.+1.  The user specifies how many input tasks to use when calling+    `tf.data.experimental.service.create_iteration`.+1.  The consumers do not fail.++In the absence of failures, determinism is achieved by distributing splits+round-robin among `N` input workers and having input workers earmark every `ith`+element for consumer `i`.++To provide determinism even when servers fail, consumers can keep track of which+element index they have processed up to for each task. Input workers would+attach per-task element indices when they produce elements, so consumers can+ignore duplicate elements caused by worker restarts. We will use an analogous+mechanism to avoid re-processing the same split in case of master falure. Input+workers will track the split index of splits as they receive them, and ignore+duplicate splits.++#### Failure Recovery++The tf.data service can recover from master and worker failures while preserving+determinism and its at-least-once visitation guarantee. The master achieves this+by writing its unrecoverable state to a persistent journal, and taking+checkpoints of its recoverable state to improve recovery time. When workers+reconnect to a restarted master, they update the master with their state so that+the master can recover its knowledge of its workers.++The unrecoverable state includes++*   **Registered datasets**+*   **ID generators** for iteration ids, iterator ids, dataset ids, and worker+    ids.+*   **In-progress iteration state**:+    *   **dataset id** for the iterated dataset so that we can recover the+        iteration's split generator+    *   **iteration id**+    *   **participating worker ids**, so that we can send splits to the correct+        workers.++Recoverable state includes++*   **Split generators**: Recoverable from our information about in-progress+    iterations.+*   **Worker addresses**: Recoverable when workers reconnect.+*   **Worker loads**: Recoverable when workers reconnect.+*   **Assignment from splits to workers**: Recoverable when workers reconnect.+*   **Outstanding splits**: Recoverable by re-running split generators from+    their checkpoint state.++To improve recovery time, the master will periodically write checkpoints of its+split generators and outstanding splits, so that split generators don't need to+be run from the beginning during master recovery.++A concern with the above recovery strategy is that a master could transmit a+split before crashing, then restart and transmit the same split again. To avoid+this duplication, the master attaches a split index to every split it sends to a+worker. When workers reconnect, they inform the master of their latest split+index.++Workers have no unrecoverable state. If a worker crashes, a new worker can take+its place. It is up to the master to reassign splits from the crashed worker to+the new worker.++To improve worker recovery time, workers will periodically write checkpoints of+their iterators to directories named using their worker ids. When the restarted+worker connects, the master will tell it which iterator checkpoints to recover+from.++We will read and write this state through a MasterState interface which can be+implemented using various storage backends. For use cases that require fault+tolerance, the user must configure a fault-tolerant MasterState, e.g. Spanner+internally, Cloud Spanner in GCP, or etcd externally. If fault tolerance isn't+required, the user could configure state to be held in memory only.++#### Leadership Transfer++The master writes state to journal files so that the state can be recovered on+restart. It is possible that a new master could be brought up while the old+master is still running. If we aren't careful, this could result in corruption+of the journal as both masters try to write to it.++Ideally we could rely on a distributed coordination service such as ZooKeeper.+However, this would add a significant burden to users who don't have access to a+ZooKeeper cluster, and it would also require adding a new dependency on a+ZooKeeper client.++What TensorFlow does have is a FileSystem API. We will leverage this API to+perform leadership transfer as follows:++1.  The first master will create a file named "master_seqno_0". If it+    successfully creates the file, it will consider itself the leader.+1.  The leader master will check every N milliseconds that the "master_seqno"+    file it created still exists. If the file no longer exists, the master will+    cease operation immediately.+1.  When a master thinks it should be leader, it attempts to atomically rename+    the master_seqno_n file to master_seqno_n+1. If this succeeds, the master+    will wait (N + M) milliseconds, verify that its renamed file still exists,+    and begin acting as leader. This gives the previous leader time to notice+    the rename.++The above scheme relies on rename being atomic so that two masters don't both+succeed at renaming the same file. Users may opt to use a filesystem that+doesn't support atomic rename, but they do so at the (unlikely) risk of two+concurrently running masters thinking they are leader. Common filesystems such+as Posix and HDFS support atomic rename.++#### Caveats++This section calls out caveats that users will need to be aware of when using+the tf.data service.++-   Due to the nature of dataset splitting, elements will not be processed in

I think it's better to document an actual guarantee, such that the order within each split must be consistent with the original global order of the source dataset, but no promises are made around ordering across splits.

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,+    `group_by_window`, `ignore_errors`, `interleave`, `list_files`, `map`,+    `range`, `repeat`, `padded_batch`, `prefetch`, `shuffle`, `SSTableDataset`,+    `TextLineDataset`, `TFRecordDataset`, `unbatch`, `window`.++### Master and worker services++This section discusses the design for the master and worker services. These+services are used by the Python API to provide distributed dataset processing,+and these services use the splitting API as a part of their implementation.++#### Master API++The master is responsible for registering datasets, generating and tracking+iteration and worker ids, and generating dataset splits for processing on+workers.++Below is a sketch of the Master API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Registers a dataset and returns an id for the dataset. If the dataset is+// already registered, its dataset id is returned.+int GetOrRegisterDataset(GraphDef dataset);++// Creates and returns `num_consumers` iterator ids which partition the+// specified dataset. This also creates an internal `iteration_id` used to+// track the overall dataset iteration. `num_tasks` defines how many tasks to+// create. If `num_tasks` is -1, it is up to the master to determine how many+// tasks to create.+list<int> CreateIterators(int dataset_id, int num_consumers,+                          int num_tasks);++// Returns the list of tasks processing data for `iterator_id`. Consumers query+// this to find which worker addresses to read data from.+list<TaskInfo> GetWorkersForiterator(int iterator_id);++///---- Methods called by input workers ----++// Registers a worker and returns its worker id.+int RegisterWorker(WorkerInfo worker_info);++// Requests the next splits to process on the given worker for the given+// iteration_id.+List<Split> GetSplits(int worker_id, int iteration_id);+```++#### Worker API++The worker is responsible for processing datasets and providing dataset elements+to consumers.++Below is a sketch of the Worker API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Gets the next element for the specified iterator_id.+list<Tensors> GetElement(iterator_id);++/// ---- Methods called by master ----++// Requests that the worker process the specified dataset. This will trigger the+// worker to start requesting splits from the master using the `iteration_id`.+void ProcessDataset(int dataset_id, int iteration_id, list<int> iterator_ids);+```++#### Visitation Guarantees++When iterating over a dataset, the tf.data service will process all input data+at least once, even in the presence of master or worker failures. If there are+no failures, all input data will be processed exactly once.++With determinstic execution enabled, the tf.data service provides an+exactly-once visitation guarantee even in the face of master or worker failures.++#### Determinism++Deterministic processing is a cornerstone of tf.data. Determinism is valuable+for debugging and experimentation. This section discusses how the tf.data+service will provide determinism.++To get deterministic behavior, the tf.data service will require three things:++1.  The dataset being distributed has deterministic output.+1.  The user sets `deterministic=True` when calling

Why requiring determinism to be opt-in instead of making non-determinism opt-in like the rest of the tf.data API?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,

from_tensor_slices seems like a weird fit for the service model, as that would require bouncing the tensors to remote workers and back AFAICT

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce+correct results. Datasets which require multiple input datasets such as `zip`+are also difficult to support, since we don't have a good way of aligning the+splits of multiple input datasets. Users who rely on these unsupported datasets+will need to move those datasets to come after the distributed part of their+pipeline.++Initially, we will support splitting for the following dataset sources and+transformations:++*   `batch`, `CsvDataset`, `dense_to_sparse_batch`, `filter`,+    `FixedLengthRecordDataset`, `flat_map`, `from_tensor_slices`,+    `group_by_window`, `ignore_errors`, `interleave`, `list_files`, `map`,+    `range`, `repeat`, `padded_batch`, `prefetch`, `shuffle`, `SSTableDataset`,+    `TextLineDataset`, `TFRecordDataset`, `unbatch`, `window`.++### Master and worker services++This section discusses the design for the master and worker services. These+services are used by the Python API to provide distributed dataset processing,+and these services use the splitting API as a part of their implementation.++#### Master API++The master is responsible for registering datasets, generating and tracking+iteration and worker ids, and generating dataset splits for processing on+workers.++Below is a sketch of the Master API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Registers a dataset and returns an id for the dataset. If the dataset is+// already registered, its dataset id is returned.+int GetOrRegisterDataset(GraphDef dataset);++// Creates and returns `num_consumers` iterator ids which partition the+// specified dataset. This also creates an internal `iteration_id` used to+// track the overall dataset iteration. `num_tasks` defines how many tasks to+// create. If `num_tasks` is -1, it is up to the master to determine how many+// tasks to create.+list<int> CreateIterators(int dataset_id, int num_consumers,+                          int num_tasks);++// Returns the list of tasks processing data for `iterator_id`. Consumers query+// this to find which worker addresses to read data from.+list<TaskInfo> GetWorkersForiterator(int iterator_id);++///---- Methods called by input workers ----++// Registers a worker and returns its worker id.+int RegisterWorker(WorkerInfo worker_info);++// Requests the next splits to process on the given worker for the given+// iteration_id.+List<Split> GetSplits(int worker_id, int iteration_id);+```++#### Worker API++The worker is responsible for processing datasets and providing dataset elements+to consumers.++Below is a sketch of the Worker API. This API is not public and is subject to+change.++```cpp+/// ---- Methods called by consumers ----++// Gets the next element for the specified iterator_id.+list<Tensors> GetElement(iterator_id);++/// ---- Methods called by master ----++// Requests that the worker process the specified dataset. This will trigger the+// worker to start requesting splits from the master using the `iteration_id`.+void ProcessDataset(int dataset_id, int iteration_id, list<int> iterator_ids);+```++#### Visitation Guarantees++When iterating over a dataset, the tf.data service will process all input data+at least once, even in the presence of master or worker failures. If there are+no failures, all input data will be processed exactly once.++With determinstic execution enabled, the tf.data service provides an+exactly-once visitation guarantee even in the face of master or worker failures.++#### Determinism++Deterministic processing is a cornerstone of tf.data. Determinism is valuable+for debugging and experimentation. This section discusses how the tf.data+service will provide determinism.++To get deterministic behavior, the tf.data service will require three things:++1.  The dataset being distributed has deterministic output.+1.  The user sets `deterministic=True` when calling

I ask because I can think of some not-too-expensive ways of enforcing exactly-once, like if consumers request a <split_id, batch_within_split> instead of just "next batch please" in the RPC, and things like that.

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {+ public:+  virtual std::string DebugString() const = 0;+  // Methods to support being used as a Variant tensor.+  virtual std::string TypeName() const = 0;+  virtual void Encode(VariantTensorData* data) const = 0;+  virtual bool Decode(const VariantTensorData& data) = 0;+};+```++To iterate over splits for a dataset, we will use a new+`DatasetBase::MakeSplitGenerator()` method. This method creates a+`SplitGenerator`, which is responsible for generating all of the splits for the+dataset. We use an intermediate `SplitGenerator` object instead of generating+splits directly because there could be a large number of splits, and the+`SplitGenerator` gives us as way to tune split size in response to pipeline+performance.++```cpp+class SplitGenerator {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+  // Instructs the SplitGenerator to adjust the size of future splits by the+  // specified percent. 100% means no change, 50% means half-sized splits, and+  // 200% means double-sized splits. The SplitGenerator will make a best effort+  // to incorporate the feedback when creating splits.+  virtual void AdjustSplitSize(int percent) = 0;+};+```++It is tempting to process each split independently, but this would cause issues+when splits are small. tf.data pipelines need to populate internal buffers for+shuffling, prefetching, and batching. If we use a separate pipeline to process+each split, our shuffling will be lower quality, we will have performance jitter+as we keep needing to refill prefetch buffers from scratching, and we will+produce many more partial batches (each split might not even have enough data to+fill a full batch). To avoid these issues, we use a small number of tasks, where+each task processes many splits as a single pipeline.++To enable processing of multiple splits in a dataset, we will add an optional+`SplitProvider` field to the `IteratorContext` passed to+`IteratorBase::Initialize`. The `SplitProvider` produces splits which tell the+iterator what source data to iterate over. For example, if splits are+represented by filenames, and a SplitProvider produces `["file1", "file6",+"file11"]`, an iterator initialized by that `SplitProvider` should process those+three files only.++```cpp+class SplitProvider {+ public:+  virtual Status GetNext(std::unique_ptr<Split>* split,+                         bool* end_of_splits) = 0;+};+```++When processing datasets, tf.data service workers will use `SplitProvider`s+which provide splits by querying the tf.data service master for which splits to+process. A few splits will be prefetched to hide the latency of needing to+request a new split from the master.++#### Supported Datasets++Not all dataset sources and transformations are easily splittable. For example,+`take`, `skip`, and `scan` require a global view of the dataset to produce

How do take and skip require global views?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```+

An earlier discussion about this involved a public primitive to create a dataset from an iterator, which would also be useful internally when implementing create_iteration. Is that still in scope?

I ask because while mostly unrelated to the serve issue this would be very useful when dealing with the types of pipelines which are non-goals of the service project (stateful pipelines, pipelines with py_func, etc)

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))+  if consumer_index == 0:+    # The iteration object is a byte array which needs to be shared among all+    # consumers. Here we suppose there are broadcast_send and broadcast_recv+    # method available.+    iteration_id = tf.data.experimental.service.create_iteration(ds, address, 3)+    broadcast_send(iteration_id)+  else:+    iteration_id = broadcast_recv()+  it = tf.data.experimental.service.make_iterator(+      ds, iteration_id, consumer_index)+  for element in it:+    # process element++  Args:+    dataset: The dataset to begin iteration over.+    num_consumers: The number of consumers to divide the dataset between. Set+      this if you require determinism. If None, a single iterator id is returned,+      and any number of consumers can read from that iterator id. The data+      produced by the dataset will be fed to consumers on a first-come+      first-served basis.+    num_tasks: The number of tasks to use for processing. Tasks run for+      the duration of an epoch, and each worker should typically process a single+      task. Normally it is best to leave this as None so that the master can+      choose a reasonable number of tasks. Setting `num_tasks` is useful for+      producing deterministic results.+    deterministic: Whether the iteration should be performed+      deterministically. Fully deterministic output also requires setting+      `num_tasks` to a fixed number, and that the input dataset is itself+      deterministic.++  Returns:+    An iteration_id which can be used to created iterators via+      `tf.data.experimental.service.make_iterator`+  """++def tf.data.experimental.service.make_iterator(+    dataset, iteration, consumer_index):+  """Creates an iterator for reading from the specified dataset.++  Args:+    dataset: The dataset to read from.+    iteration: An iteration_id object generated by+      `tf.data.experimental.service.create_iteration`.+    consumer_index: The consumer index within the iteration to read from. If+      the iteration was created with `n` consumers, `consumers_index` must be+      less than `n`.++  Returns:+    A Python iterator which iterates over the dataset elements.+  """+```++### Dataset splitting API++To parallelize dataset processing, the tf.data service needs a way to split up+datasets. We will achieve this by adding a splitting API that allows source+datasets to express how they can be split.++Our goals for the API are++*   Performance: The splitting API can be used to performantly split and process+    datasets.+*   Extensibility: User-defined datasets can be split as long as they implement+    the splitting API.+*   Minimize Surprises: Users write their datasets as though they will not be+    split, so introducing splitting can easily lead to unexpected outcomes. To+    mitigate this, we will be conservative about which dataset transformations+    support splitting.++The API will be used internally by the tf.data service to distribute datasets.+It will be entirely in C++, and we don't currently have any plans to expose+splitting through Python.++The API focuses on producing and consuming `Split`s. A `Split` is a variant+Tensor that can be subclassed to represent arbitrary types of splitting.++```cpp+class Split {

This is very vague. What is stored in the variant?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service

How does create_iteration find the service?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture++Users could run tf.data workers embedded in their TensorFlow cluster, and also+run additional tf.data workers (and potentially the tf.data master) outside the+cluster. This allows for horizontal worker scaling, while still leveraging the+compute resources of the TensorFlow cluster for input processing.++### User-facing Python API++This API is how users will interact with the tf.data service from their Python+code.++```python+def tf.data.experimental.service.distribute(address):+  """Marks that a dataset should be processed by the tf.data service.++  ds = ... # dataset to distribute+  ds = ds.apply(tf.data.experimental.service.distribute(address))++  Args:+    address: The address of the tf.data service master.++  Returns:+    A function that can be passed to `dataset.apply()`.+  """++def tf.data.experimental.service.create_iteration(+    dataset, num_consumers=1, num_tasks=None, deterministic=False):+  """Begins distributed iteration over a dataset.++  It is expected that the dataset contains at least one `.distribute(address)`+  transformation, otherwise this method will print a warning and do nothing.++  `create_iteration` will first register the dataset with the tf.data service+  if it isn't already registered. It will then request the creation of+  `num_consumers` dataset iterators which divide the dataset `num_consumers`+  ways. The returned object can be used to read from one of the+  iterators using+  `tf.data.experimental.service.make_iterator(ds, obj, consumer_index)`.

Why not the usual python control flow we use in regular tf.data?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the

Weird sentence structure; was this an editing mistake?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing++Today, the tf.distribute API statically shards data across accelerators. This+can lead to suboptimal utilization because some shards may contain more data+than others. The tf.data service provides a mechanism for dynamically sharding,+reducing the data imbalance across accelerators.++### Visitation guarantees++Model accuracy can often be improved when each training sample is trained on+exactly once per epoch. The tf.data service can coordinate across workers to+provide this guarantee.++## Design Proposal++The tf.data service is a master-worker system which iterates through datasets,+producing outputs to be consumed by accelerators. The service is comprised of a+few components:++*   User-facing Python API for interacting with the tf.data service.+*   Dataset splitting API for determining how to split up datasets for parallel+    processing.+*   Master and worker gRPC services.++### Architecture++The tf.data service is comprised of master and worker gRPC services which could+be run in a couple of different configurations:++#### Glossary++**Master**: The single master coordinating the tf.data service.++**Worker**: A tf.data service worker which performs dataset processing and+provides dataset elements to consumers over RPC.++**Consumer**: A machine which consumes data from the tf.data service. The+consumer may be attached to a GPU or TPU, or use data for on-CPU training.++#### Separate Cluster Architecture++Each server is run on a separate host from the TensorFlow cluster. This+configuration gives users a way to provide horizontally scaling CPU for+processing their input pipelines and quickly feeding data to accelerators.++#### Embedded Cluster Architecture++without needing to provision additional compute resources. and gives all the+benefits of the tf.data service except for horizontal scaling.++#### Hybrid Architecture

I don't understand the conjunction of this and the "separate cluster architecture" above

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use+instead. This will also allow us to support feature requests that require+cross-worker coordination, such as dynamic sharding.++## User Benefit++### Input-bound models++Users with input-bound models can leverage the tf.data service to distribute+input processing across horizontally-scaling compute resources. This can improve+utilization for valuable accelerator resources, reducing total cost.++### Dynamic load balancing

How do we preserve determinism when we're doing dynamic load balancing?

aaudiber

comment created time in a month

Pull request review commenttensorflow/community

RFC: tf.data Service

+# Distributed tf.data service++| Status        | Proposed          |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [195](https://github.com/tensorflow/community/pull/195) |+| **Author(s)** | Andrew Audibert (aaudibert@google.com) Rohan Jain (rohanj@google.com) |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2019-01-09                                              |++## Objective++Provide an API and implementation of a tf.data service which can process tf.data+datasets in a distributed manner. The service can be run outside the TensorFlow+cluster or be exported as a gRPC service by TensorFlow servers.++Goals:++-   Enable horizontal scaling of dataset computation to improve performance of+    input-bound dataset pipelines.+-   Improve tf.data integration with the tf.distribute API. In particular,+    support dynamic sharding of data across multiple processes.+-   Provide visitation guarantees for distributed training jobs.++Non-goals:++-   Process non-dataset data.+-   Distribute datasets that rely on external / non-serializable state.+-   Support non-graph computation (e.g. py_function).++## Motivation++### Host machine input pipelines can't always keep up with accelerators.++Some input pipelines require significant resources to produce their data, e.g.+due to image transformations. When the host machine isn't powerful enough to+generate input data at the rate the attached accelerator(s) consume the data,+the accelerator(s) will idle. This slows down training time, and also wastes+valuable accelerator resources. The tf.data service solves this problem by using+N input workers to feed M accelerators. The number of input workers can be+scaled up or down as needed to keep up with the accelerators.++### Distributed training requires a distribution-aware input pipeline.++Today tf.data supports the tf.distribute API by providing mechanisms for+sharding, cloning, and re-batching. The tf.distribute API uses these primitives+to implement their own version of a distributed dataset. If distributed datasets+become a core feature of tf.data, tf.data can provide a public API for+tf.distribute (and users who wish to implement their own distribution) to use

This is really important, and I'd like to review the distributed data APIs independently from the service model.

aaudiber

comment created time in a month

issue commenttensorflow/tensorflow

Incorrect name_scope with tf.function decoration

This is working as intended. The name_scope in the function is rooted at the function graph, and not at the call site (otherwise we'd have to retrace the function every time it's called to make sure it's using the proper name_scope). At runtime you can tell the correct scoping by the nesting of the name of the call op and the ops inside the function graph.

emailweixu

comment created time in 2 months

issue closedtensorflow/tensorflow

Incorrect name_scope with tf.function decoration

<em>Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v2.0.0-rc2-26-g64c3d38 2.0.0
  • Python version: 3.6
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:

Describe the current behavior If a function is decorated with tf.function, the name_scope is lost

Describe the expected behavior The name_scope should be same with or without tf.function decoration

Code to reproduce the issue

import tensorflow as tf

@tf.function
def f():
    with tf.name_scope("f") as scope:
        tf.print(scope)

def g():
    with tf.name_scope("g") as scope:
        tf.print(scope)

def main():
    with tf.name_scope("main"):
        f()   # expect to print "main/f/", actually get "f/"
        g()

if __name__ == "__main__":
    main()

The output is

f/
main/g/

closed time in 2 months

emailweixu

Pull request review commenttensorflow/community

RFC: tf.data Snapshot

+# tf.data Snapshot++| Status        | Proposed                                                |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [193](https://github.com/tensorflow/community/pull/193) |+| **Author(s)** | Frank Chen (frankchn@google.com), Rohan Jain            |+|               | (rohanj@google.com)                                     |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2020-01-07                                              |++## Objective++With ever faster accelerators available in Cloud and hyperparameter tuning+consuming larger chunks of accelerator time, TensorFlow users are increasingly+finding that they don’t have enough CPU resources to keep up with these+accelerators, leaving valuable accelerator resources idle.++To alleviate this problem, we are proposing a `snapshot` API within `tf.data`,+to allow users to transparently persist the output of their preprocessing+pipeline to disk, and materialize the pre-processed data on a different training+run.++This API enables repeated preprocessing steps to be consolidated, and allowing+re-use of already processed data, trading off disk storage and network bandwidth+for freeing up more valuable CPU resources and accelerator compute time.++## Motivation++Large TensorFlow users have indicated that they have complicated input+processing pipelines which saturate their CPUs before saturating their+accelerators (TPUs in particular). Since they often experiment with+hyperparameter tuning or tweaks to existing model without affecting their input+pipeline, they are asking for ways to avoid similar repeated preprocessing of+data by either saving a dataset or caching it to disk.++## User Benefit++Users will be able to transparently persist partially or fully processed data+from `tf.data` input pipelines to disk or Cloud storage systems, and materialize+the pre-processed data during subsequent runs from the same pipeline. This will+cut down on the input pipeline processing overheads during second and subsequent+runs.++## Design Proposal++We propose that we add a new `snapshot` transformation to tf.data. To illustrate+the usage of the transformation, we can start with some sample code:++```python+dataset = Dataset.list_files("/raw/data/*").shard(num_workers, i)+dataset = dataset.parallel_interleave(TFRecordDataset)+dataset = dataset.map(my_preprocessing_fn)+dataset = dataset.apply(tf.data.snapshot("/saved/data", options...))+dataset = dataset.repeat()++model = ...+model.fit(dataset)+```++As we can see, the end user simply has to add this transformation in order to+use this functionality. In essence, the transformation is similar to the+existing `tf.data.Dataset.cache`, with the key difference is being that, unlike+`cache`, `snapshot` is intended to re-used across different executions of the+same input pipelines.++### Proposed API++We are proposing the following API for the snapshot transformation.++```python+def snapshot(path,+             compression=None,+             reader_path_prefix=None,+             writer_path_prefix=None,+             shard_size_bytes=None,+             pending_snapshot_expiry_seconds=None,+             num_reader_threads=None,+             reader_buffer_size=None,+             num_writer_threads=None,+             writer_buffer_size=None,+             shuffle_on_read=None,+             shuffle_seed=None,+             mode=None,+             snapshot_name=None):+  pass  # Implementation goes here.+```++1.  `path`: Required. A directory where we want to save our snapshots and/or+    read from a previously saved snapshot.++2.  `compression`: Optional. The type of compression to apply to the snapshot+    written to disk. This will support `GZIP`, `SNAPPY` or None. Defaults to+    None.++3.  `reader_path_prefix`: Optional. A prefix to add to the path when reading+    from snapshots. This is useful for filesystems where configuration is passed+    in through the path. Defaults to None.++4.  `writer_path_prefix`: Optional. A prefix to add to the path when writing to+    snapshots. This is useful for filesystems where configuration is passed in+    through the path. Defaults to None.++5.  `shard_size_bytes`: Optional. The maximum size of each data file to be+    written by the snapshot dataset op. Defaults to 10 GiB.++6.  `pending_snapshot_expiry_seconds`: Optional. How long to wait (in seconds)+    before the snapshot op considers a previously unfinished snapshot to be+    stale and starts writing a snapshot from scratch again. Defaults to 86400+    seconds (1 day).++7.  `num_reader_threads`: Optional. Number of threads to parallelize reading+    from snapshot. Especially useful if compression is turned on since the+    decompression operation tends to be intensive. Defaults to 1. If > 1, then+    this might introduce non-determinism i.e. the order in which the elements+    are read from the snapshot are different from the order they're written.++8.  `reader_buffer_size`: Optional. Maximum number of elements we can prefetch+    reading from the snapshot. Defaults to 1. Increasing this might improve+    performance but will increase memory consumption.++9.  `num_writer_threads`: Optional. Number of threads to parallelize writing+    from snapshot. We'll open up `num_writer_threads` files and write to them in+    parallel. Especially useful if compression is turned on since the+    compression operation tends to be intensive. Defaults to 1. If > 1, then+    this might introduce non-determinism i.e. the order in which the elements+    are read from the upstream iterator are different from the order they're+    written.++10. `writer_buffer_size`: Optional. Maximum number of pipeline elements to fill+    up the buffer before writing them out using `num_writer_threads`.++11. `shuffle_on_read`: Optional. If this is True, then the order in which+    examples are produced when reading from a snapshot will be random. Defaults+    to False.++12. `shuffle_seed`: Optional. If shuffle_seed is set, the random number+    generator used for shuffling (when `shuffle_on_read` is turned on) is seeded+    by the given seed. Otherwise, it is seeded by a random seed that differs for+    every run.++13. `mode`: Optional. The mode at which snapshot should operate. Valid options+    are `auto`, `read`, `write`, and `passthrough`. The default mode is `auto`,+    where the snapshot op will automatically determine what mode to operate in.++    1.  `write` mode forces the snapshot transformation to write a new+        materialization to disk, regardless of whether a complete and valid+        materialization currently exists. In other words, we enter the **WRITE**+        state immediately.++    2.  `read` mode forces the snapshot transformation to read from the latest+        version of the materialization on disk, regardless of whether the data+        stored on disk is complete and valid. In other words, we enter the+        **READ** state immediately.++    3.  `passthrough` mode turns the snapshot transformation into a no-op. In+        other words, we enter the **PASSTHROUGH** state immediately.++    4.  `auto` retains the default behavior of snapshot. See the "Standard+        Kernel Workflow" section for the default behavior.++14. `snapshot_name`: Optional. If set, use the supplied string as a named+    snapshot name instead of introspecting the data pipeline and automatically+    generating a unique identifier for the specific data pipeline.++    1.  Instead of generating a new fingerprint of the input processing graph or+        and `run_id` (see the _Detailed Design_ section for details), we will+        use the `snapshot_name` to uniquely identify the snapshot.++### External API Guarantees++Externally, we guarantee that snapshots written by a particular version of+TensorFlow will be readable by that specific version of TensorFlow. Eventually,+we can also guarantee that snapshots written will be readable by all future+versions of TensorFlow.++We are not currently handling the case where workers do not go through the+entire training set at least once.++### Alternatives Considered++An alternative proposal for an API would be `save()` and `load()`, where the+saving and loading of the input pipeline would be made more explicit, avoiding+some of the logic needed in determining whether to snapshot or read from a+snapshot of a model.++The downside here would be that the user would have to split the preprocessing+and training into potentially different files, and users would be forced to+select whether to train or preprocess on their own, which is not good.+

We should consider that the semantics of save/load are much easier to explain to people specially when you consider the interaction of shuffle_on_read with other options earlier in the input pipeline.

frankchn

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: tf.data Snapshot

+# tf.data Snapshot++| Status        | Proposed                                                |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [193](https://github.com/tensorflow/community/pull/193) |+| **Author(s)** | Frank Chen (frankchn@google.com), Rohan Jain            |+|               | (rohanj@google.com)                                     |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2020-01-07                                              |++## Objective++With ever faster accelerators available in Cloud and hyperparameter tuning+consuming larger chunks of accelerator time, TensorFlow users are increasingly+finding that they don’t have enough CPU resources to keep up with these+accelerators, leaving valuable accelerator resources idle.++To alleviate this problem, we are proposing a `snapshot` API within `tf.data`,+to allow users to transparently persist the output of their preprocessing+pipeline to disk, and materialize the pre-processed data on a different training+run.++This API enables repeated preprocessing steps to be consolidated, and allowing+re-use of already processed data, trading off disk storage and network bandwidth+for freeing up more valuable CPU resources and accelerator compute time.++## Motivation++Large TensorFlow users have indicated that they have complicated input+processing pipelines which saturate their CPUs before saturating their+accelerators (TPUs in particular). Since they often experiment with+hyperparameter tuning or tweaks to existing model without affecting their input+pipeline, they are asking for ways to avoid similar repeated preprocessing of+data by either saving a dataset or caching it to disk.++## User Benefit++Users will be able to transparently persist partially or fully processed data+from `tf.data` input pipelines to disk or Cloud storage systems, and materialize+the pre-processed data during subsequent runs from the same pipeline. This will+cut down on the input pipeline processing overheads during second and subsequent+runs.++## Design Proposal++We propose that we add a new `snapshot` transformation to tf.data. To illustrate+the usage of the transformation, we can start with some sample code:++```python+dataset = Dataset.list_files("/raw/data/*").shard(num_workers, i)+dataset = dataset.parallel_interleave(TFRecordDataset)+dataset = dataset.map(my_preprocessing_fn)+dataset = dataset.apply(tf.data.snapshot("/saved/data", options...))+dataset = dataset.repeat()++model = ...+model.fit(dataset)+```++As we can see, the end user simply has to add this transformation in order to+use this functionality. In essence, the transformation is similar to the+existing `tf.data.Dataset.cache`, with the key difference is being that, unlike+`cache`, `snapshot` is intended to re-used across different executions of the+same input pipelines.++### Proposed API++We are proposing the following API for the snapshot transformation.++```python+def snapshot(path,+             compression=None,+             reader_path_prefix=None,+             writer_path_prefix=None,+             shard_size_bytes=None,+             pending_snapshot_expiry_seconds=None,+             num_reader_threads=None,+             reader_buffer_size=None,+             num_writer_threads=None,+             writer_buffer_size=None,+             shuffle_on_read=None,+             shuffle_seed=None,+             mode=None,+             snapshot_name=None):+  pass  # Implementation goes here.+```++1.  `path`: Required. A directory where we want to save our snapshots and/or+    read from a previously saved snapshot.++2.  `compression`: Optional. The type of compression to apply to the snapshot+    written to disk. This will support `GZIP`, `SNAPPY` or None. Defaults to+    AUTO.++3.  `reader_path_prefix`: Optional. A prefix to add to the path when reading+    from snapshots. This is useful for filesystems where configuration is passed+    in through the path. Defaults to None.++4.  `writer_path_prefix`: Optional. A prefix to add to the path when writing to+    snapshots. This is useful for filesystems where configuration is passed in+    through the path. Defaults to None.++5.  `shard_size_bytes`: Optional. The maximum size of each data file to be+    written by the snapshot dataset op. Defaults to AUTO.++6.  `pending_snapshot_expiry_seconds`: Optional. How long to wait (in seconds)+    before the snapshot op considers a previously unfinished snapshot to be+    stale and starts writing a snapshot from scratch again. Defaults to 86400+    seconds (1 day).++7.  `num_reader_threads`: Optional. Number of threads to parallelize reading+    from snapshot. Especially useful if compression is turned on since the+    decompression operation tends to be intensive. If > 1, then+    this might introduce non-determinism i.e. the order in which the elements+    are read from the snapshot are different from the order they're written.+    Defaults to AUTO. ++8.  `reader_buffer_size`: Optional. Maximum number of elements we can prefetch+    reading from the snapshot. Increasing this might improve+    performance but will increase memory consumption. Defaults to AUTO.++9.  `num_writer_threads`: Optional. Number of threads to parallelize writing+    from snapshot. We'll open up `num_writer_threads` files and write to them in+    parallel. Especially useful if compression is turned on since the+    compression operation tends to be intensive. If > 1, then+    this might introduce non-determinism i.e. the order in which the elements+    are read from the upstream iterator are different from the order they're+    written. Defaults to AUTO. ++10. `writer_buffer_size`: Optional. Maximum number of pipeline elements to fill+    up the buffer before writing them out using `num_writer_threads`. Defaults+    to AUTO.++11. `shuffle_on_read`: Optional. If this is True, then snapshot randomizes the+    order in which the snapshot files are read back. This emulates shuffling+    of the input files during a training run (e.g. when `Dataset.list_files` +    is called with `shuffle` turned on). Defaults to False.++12. `shuffle_seed`: Optional. If shuffle_seed is set, the random number+    generator used for shuffling (when `shuffle_on_read` is turned on) is seeded+    by the given seed. Otherwise, it is seeded by a random seed that differs for+    every run.++13. `mode`: Optional. The mode at which snapshot should operate. Valid options+    are `auto`, `read`, `write`, and `passthrough`. The default mode is `auto`,+    where the snapshot op will automatically determine what mode to operate in.++    1.  `write` mode forces the snapshot transformation to write a new+        materialization to disk, regardless of whether a complete and valid+        materialization currently exists. In other words, we enter the **WRITE**+        state immediately.++    2.  `read` mode forces the snapshot transformation to read from the latest+        version of the materialization on disk, regardless of whether the data+        stored on disk is complete and valid. In other words, we enter the+        **READ** state immediately.++    3.  `passthrough` mode turns the snapshot transformation into a no-op. In+        other words, we enter the **PASSTHROUGH** state immediately.++    4.  `auto` retains the default behavior of snapshot. See the "Standard+        Kernel Workflow" section for the default behavior.++14. `snapshot_name`: Optional. If set, use the supplied string as a named

This needs to clarify what happens if two snapshots are named the same. Can they collide?

frankchn

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: tf.data Snapshot

+# tf.data Snapshot++| Status        | Proposed                                                |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [193](https://github.com/tensorflow/community/pull/193) |+| **Author(s)** | Frank Chen (frankchn@google.com), Rohan Jain            |+|               | (rohanj@google.com)                                     |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2020-01-07                                              |++## Objective++With ever faster accelerators available in Cloud and hyperparameter tuning+consuming larger chunks of accelerator time, TensorFlow users are increasingly+finding that they don’t have enough CPU resources to keep up with these+accelerators, leaving valuable accelerator resources idle.++To alleviate this problem, we are proposing a `snapshot` API within `tf.data`,+to allow users to transparently persist the output of their preprocessing+pipeline to disk, and materialize the pre-processed data on a different training+run.++This API enables repeated preprocessing steps to be consolidated, and allowing+re-use of already processed data, trading off disk storage and network bandwidth+for freeing up more valuable CPU resources and accelerator compute time.++## Motivation++Large TensorFlow users have indicated that they have complicated input+processing pipelines which saturate their CPUs before saturating their+accelerators (TPUs in particular). Since they often experiment with+hyperparameter tuning or tweaks to existing model without affecting their input+pipeline, they are asking for ways to avoid similar repeated preprocessing of+data by either saving a dataset or caching it to disk.++## User Benefit++Users will be able to transparently persist partially or fully processed data+from `tf.data` input pipelines to disk or Cloud storage systems, and materialize+the pre-processed data during subsequent runs from the same pipeline. This will+cut down on the input pipeline processing overheads during second and subsequent+runs.++## Design Proposal++We propose that we add a new `snapshot` transformation to tf.data. To illustrate+the usage of the transformation, we can start with some sample code:++```python+dataset = Dataset.list_files("/raw/data/*").shard(num_workers, i)+dataset = dataset.parallel_interleave(TFRecordDataset)+dataset = dataset.map(my_preprocessing_fn)+dataset = dataset.apply(tf.data.snapshot("/saved/data", options...))+dataset = dataset.repeat()++model = ...+model.fit(dataset)+```++As we can see, the end user simply has to add this transformation in order to+use this functionality. In essence, the transformation is similar to the+existing `tf.data.Dataset.cache`, with the key difference is being that, unlike+`cache`, `snapshot` is intended to re-used across different executions of the+same input pipelines.++### Proposed API++We are proposing the following API for the snapshot transformation.++```python+def snapshot(path,+             compression=None,+             reader_path_prefix=None,+             writer_path_prefix=None,+             shard_size_bytes=None,+             pending_snapshot_expiry_seconds=None,+             num_reader_threads=None,+             reader_buffer_size=None,+             num_writer_threads=None,+             writer_buffer_size=None,+             shuffle_on_read=None,+             shuffle_seed=None,+             mode=None,+             snapshot_name=None):+  pass  # Implementation goes here.+```++1.  `path`: Required. A directory where we want to save our snapshots and/or+    read from a previously saved snapshot.++2.  `compression`: Optional. The type of compression to apply to the snapshot+    written to disk. This will support `GZIP`, `SNAPPY` or None. Defaults to+    AUTO.++3.  `reader_path_prefix`: Optional. A prefix to add to the path when reading+    from snapshots. This is useful for filesystems where configuration is passed+    in through the path. Defaults to None.++4.  `writer_path_prefix`: Optional. A prefix to add to the path when writing to+    snapshots. This is useful for filesystems where configuration is passed in+    through the path. Defaults to None.++5.  `shard_size_bytes`: Optional. The maximum size of each data file to be+    written by the snapshot dataset op. Defaults to AUTO.++6.  `pending_snapshot_expiry_seconds`: Optional. How long to wait (in seconds)

Why do we need this?

frankchn

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: tf.data Snapshot

+# tf.data Snapshot++| Status        | Proposed                                                |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [193](https://github.com/tensorflow/community/pull/193) |+| **Author(s)** | Frank Chen (frankchn@google.com), Rohan Jain            |+|               | (rohanj@google.com)                                     |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2020-01-07                                              |++## Objective++With ever faster accelerators available in Cloud and hyperparameter tuning+consuming larger chunks of accelerator time, TensorFlow users are increasingly+finding that they don’t have enough CPU resources to keep up with these+accelerators, leaving valuable accelerator resources idle.++To alleviate this problem, we are proposing a `snapshot` API within `tf.data`,+to allow users to transparently persist the output of their preprocessing+pipeline to disk, and materialize the pre-processed data on a different training+run.++This API enables repeated preprocessing steps to be consolidated, and allowing+re-use of already processed data, trading off disk storage and network bandwidth+for freeing up more valuable CPU resources and accelerator compute time.++## Motivation++Large TensorFlow users have indicated that they have complicated input+processing pipelines which saturate their CPUs before saturating their+accelerators (TPUs in particular). Since they often experiment with+hyperparameter tuning or tweaks to existing model without affecting their input+pipeline, they are asking for ways to avoid similar repeated preprocessing of+data by either saving a dataset or caching it to disk.++## User Benefit++Users will be able to transparently persist partially or fully processed data+from `tf.data` input pipelines to disk or Cloud storage systems, and materialize+the pre-processed data during subsequent runs from the same pipeline. This will+cut down on the input pipeline processing overheads during second and subsequent+runs.++## Design Proposal++We propose that we add a new `snapshot` transformation to tf.data. To illustrate+the usage of the transformation, we can start with some sample code:++```python+dataset = Dataset.list_files("/raw/data/*").shard(num_workers, i)+dataset = dataset.parallel_interleave(TFRecordDataset)+dataset = dataset.map(my_preprocessing_fn)+dataset = dataset.apply(tf.data.snapshot("/saved/data", options...))+dataset = dataset.repeat()++model = ...+model.fit(dataset)+```++As we can see, the end user simply has to add this transformation in order to+use this functionality. In essence, the transformation is similar to the+existing `tf.data.Dataset.cache`, with the key difference is being that, unlike+`cache`, `snapshot` is intended to re-used across different executions of the+same input pipelines.++### Proposed API++We are proposing the following API for the snapshot transformation.++```python+def snapshot(path,+             compression=None,+             reader_path_prefix=None,+             writer_path_prefix=None,+             shard_size_bytes=None,+             pending_snapshot_expiry_seconds=None,+             num_reader_threads=None,+             reader_buffer_size=None,+             num_writer_threads=None,+             writer_buffer_size=None,+             shuffle_on_read=None,+             shuffle_seed=None,+             mode=None,+             snapshot_name=None):+  pass  # Implementation goes here.+```++1.  `path`: Required. A directory where we want to save our snapshots and/or+    read from a previously saved snapshot.++2.  `compression`: Optional. The type of compression to apply to the snapshot+    written to disk. This will support `GZIP`, `SNAPPY` or None. Defaults to+    AUTO.++3.  `reader_path_prefix`: Optional. A prefix to add to the path when reading

This and writer_path_prefix feel like very specific hacks I would not like to see persisted throughout the API.

frankchn

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: tf.data Snapshot

+# tf.data Snapshot++| Status        | Proposed                                                |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [193](https://github.com/tensorflow/community/pull/193) |+| **Author(s)** | Frank Chen (frankchn@google.com), Rohan Jain            |+|               | (rohanj@google.com)                                     |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2020-01-07                                              |++## Objective++With ever faster accelerators available in Cloud and hyperparameter tuning+consuming larger chunks of accelerator time, TensorFlow users are increasingly+finding that they don’t have enough CPU resources to keep up with these+accelerators, leaving valuable accelerator resources idle.++To alleviate this problem, we are proposing a `snapshot` API within `tf.data`,+to allow users to transparently persist the output of their preprocessing+pipeline to disk, and materialize the pre-processed data on a different training+run.++This API enables repeated preprocessing steps to be consolidated, and allowing+re-use of already processed data, trading off disk storage and network bandwidth+for freeing up more valuable CPU resources and accelerator compute time.++## Motivation++Large TensorFlow users have indicated that they have complicated input+processing pipelines which saturate their CPUs before saturating their+accelerators (TPUs in particular). Since they often experiment with+hyperparameter tuning or tweaks to existing model without affecting their input+pipeline, they are asking for ways to avoid similar repeated preprocessing of+data by either saving a dataset or caching it to disk.++## User Benefit++Users will be able to transparently persist partially or fully processed data+from `tf.data` input pipelines to disk or Cloud storage systems, and materialize+the pre-processed data during subsequent runs from the same pipeline. This will+cut down on the input pipeline processing overheads during second and subsequent+runs.++## Design Proposal++We propose that we add a new `snapshot` transformation to tf.data. To illustrate+the usage of the transformation, we can start with some sample code:++```python+dataset = Dataset.list_files("/raw/data/*").shard(num_workers, i)+dataset = dataset.parallel_interleave(TFRecordDataset)+dataset = dataset.map(my_preprocessing_fn)+dataset = dataset.apply(tf.data.snapshot("/saved/data", options...))+dataset = dataset.repeat()++model = ...+model.fit(dataset)+```++As we can see, the end user simply has to add this transformation in order to+use this functionality. In essence, the transformation is similar to the+existing `tf.data.Dataset.cache`, with the key difference is being that, unlike+`cache`, `snapshot` is intended to re-used across different executions of the

Does cache provide any advantages over snapshot?

frankchn

comment created time in 2 months

Pull request review commenttensorflow/community

RFC: tf.data Snapshot

+# tf.data Snapshot++| Status        | Proposed                                                |+| :------------ | :------------------------------------------------------ |+| **RFC #**     | [193](https://github.com/tensorflow/community/pull/193) |+| **Author(s)** | Frank Chen (frankchn@google.com), Rohan Jain            |+|               | (rohanj@google.com)                                     |+| **Sponsor**   | Jiri Simsa (jsimsa@google.com)                          |+| **Updated**   | 2020-01-07                                              |++## Objective++With ever faster accelerators available in Cloud and hyperparameter tuning+consuming larger chunks of accelerator time, TensorFlow users are increasingly+finding that they don’t have enough CPU resources to keep up with these+accelerators, leaving valuable accelerator resources idle.++To alleviate this problem, we are proposing a `snapshot` API within `tf.data`,+to allow users to transparently persist the output of their preprocessing+pipeline to disk, and materialize the pre-processed data on a different training+run.++This API enables repeated preprocessing steps to be consolidated, and allowing+re-use of already processed data, trading off disk storage and network bandwidth+for freeing up more valuable CPU resources and accelerator compute time.++## Motivation++Large TensorFlow users have indicated that they have complicated input+processing pipelines which saturate their CPUs before saturating their+accelerators (TPUs in particular). Since they often experiment with+hyperparameter tuning or tweaks to existing model without affecting their input+pipeline, they are asking for ways to avoid similar repeated preprocessing of+data by either saving a dataset or caching it to disk.++## User Benefit++Users will be able to transparently persist partially or fully processed data+from `tf.data` input pipelines to disk or Cloud storage systems, and materialize+the pre-processed data during subsequent runs from the same pipeline. This will+cut down on the input pipeline processing overheads during second and subsequent+runs.++## Design Proposal++We propose that we add a new `snapshot` transformation to tf.data. To illustrate+the usage of the transformation, we can start with some sample code:++```python+dataset = Dataset.list_files("/raw/data/*").shard(num_workers, i)+dataset = dataset.parallel_interleave(TFRecordDataset)+dataset = dataset.map(my_preprocessing_fn)+dataset = dataset.apply(tf.data.snapshot("/saved/data", options...))+dataset = dataset.repeat()++model = ...+model.fit(dataset)+```++As we can see, the end user simply has to add this transformation in order to+use this functionality. In essence, the transformation is similar to the+existing `tf.data.Dataset.cache`, with the key difference is being that, unlike+`cache`, `snapshot` is intended to re-used across different executions of the+same input pipelines.++### Proposed API++We are proposing the following API for the snapshot transformation.++```python+def snapshot(path,+             compression=None,+             reader_path_prefix=None,+             writer_path_prefix=None,+             shard_size_bytes=None,+             pending_snapshot_expiry_seconds=None,+             num_reader_threads=None,+             reader_buffer_size=None,+             num_writer_threads=None,+             writer_buffer_size=None,+             shuffle_on_read=None,

Why does snapshotting include its own shuffling?

How does the snapshot shuffling interact with the shuffling performed in existing tf.data transformations like shuffle?

The way these options are presented now mean you have different behavior when loading from a snapshot than you had when not loading from a snapshot, which means the existing shuffling behavior is not guaranteed even if there is no shuffle_on_read (as reading from a snapshot cannot simulate shuffling files).

frankchn

comment created time in 2 months

issue closedtensorflow/tensorflow

A puzzling & fatal error occurred in the tf.matmal()

<em>Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template</em>

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04 and win 10
  • TensorFlow installed from (source or binary): Install in the conda integration environment conda create -n tf2-gpu tensorflow-gpu=2.0
  • TensorFlow version (use command below): tf-gpu 2.0 stable & tf-gpu 2.0 beta
  • Python version: 3.6
  • CUDA/cuDNN version: CUDA: 10.0.0130-0 cuDNN: 7.6.5
  • GPU model and memory: GeForce GTX 850M GeForce RTX 2070S

You can collect some of this information using our environment capture script You can also obtain the TensorFlow version with: 1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)" 2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior & expected behavior

The following error occurs when using tf.matmal() to compute the product of multidimensional tensors on the gpu.

import tensorflow as tf
import numpy as np

j = np.random.rand(10, 6, 1130, 16, 8)
k = np.random.rand(10, 6, 1130, 8, 1)
# with tf.device("CPU:0"):
j = tf.cast(j, dtype=tf.float32)
k = tf.cast(k, dtype=tf.float32)

a = tf.matmul(j, k)[9, 3]
b = tf.matmul(j[9], k[9])[3]
c = tf.matmul(j[9, 3], k[9, 3])

print(tf.reduce_all(tf.equal(a, b)))
print(tf.reduce_all(tf.equal(b, c)))

'''
tf.Tensor(False, shape=(), dtype=bool)  # The correct output would be True
tf.Tensor(True, shape=(), dtype=bool)
'''

This error does not occur while using the CPU.

...

with tf.device("CPU:0"):
    j = tf.cast(j, dtype=tf.float32)
    k = tf.cast(k, dtype=tf.float32)

    a = tf.matmul(j, k)[9, 3]
    b = tf.matmul(j[9], k[9])[3]
    c = tf.matmul(j[9, 3], k[9, 3])

    print(tf.reduce_all(tf.equal(a, b)))
    print(tf.reduce_all(tf.equal(b, c)))

'''
tf.Tensor(True, shape=(), dtype=bool)
tf.Tensor(True, shape=(), dtype=bool)
'''

This error will not occur even if you reduce the size of some dimension a bit.

We make the following changes:

# j = np.random.rand(10, 6, 1130, 16, 8)
# k = np.random.rand(10, 6, 1130, 8, 1)
j = np.random.rand(10, 6, 1129, 16, 8)  # 1130 --> 1129
k = np.random.rand(10, 6, 1129, 8, 1)

also use the gpu:

j = tf.cast(j, dtype=tf.float32)
k = tf.cast(k, dtype=tf.float32)

a = tf.matmul(j, k)[9, 3]
b = tf.matmul(j[9], k[9])[3]
c = tf.matmul(j[9, 3], k[9, 3])

print(tf.reduce_all(tf.equal(a, b)))
print(tf.reduce_all(tf.equal(b, c)))

'''
tf.Tensor(True, shape=(), dtype=bool)
tf.Tensor(True, shape=(), dtype=bool)
'''

I tested it on different GPU and OS , but got the same error.

When errors occur, I have compared the specific differences between the two methods ( a and b), and found that it is not the slight differences that cause the error.

print(tf.reduce_sum(a-b))
'''
tf.Tensor(-1.3804454e+38, shape=(), dtype=float32)
'''

closed time in 2 months

cantbeblank96

issue commenttensorflow/tensorflow

A puzzling & fatal error occurred in the tf.matmal()

I just ran against tf-nightly and it's fixed there.

cantbeblank96

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add support to cuDNN CTC loss

 tf_module {     name: "cross"

Now you can revert this file to make the API tests pass again

houtoms

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add support to cuDNN CTC loss

 tf_module {     name: "cosh"

Now you can revert this file to make the API tests pass again

houtoms

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add support to cuDNN CTC loss

 from tensorflow.python.util import nest from tensorflow.python.util.tf_export import tf_export +import os

The linter will complain; standard python imports need to be above all others

houtoms

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add support to cuDNN CTC loss

+op {+  graph_op_name: "CTCLossV2"

Can you add a line here saying "visibility: HIDDEN"; this will prevent the generation of a tf.ctc_loss_v2 API

houtoms

comment created time in 2 months

pull request commenttensorflow/tensorflow

Implement Hessian for sparse softmax cross entropy

Sorry, my mistake! Approving now.

MichaelKonobeev

comment created time in 2 months

pull request commenttensorflow/tensorflow

Implement Hessian for sparse softmax cross entropy

But what about third-order derivatives? Can you add a compute_gradient_error on the result of calling tf.gradients on tf.hessians?

On Wed, Jan 8, 2020 at 2:05 PM MichaelKonobeev notifications@github.com wrote:

@MichaelKonobeev commented on this pull request.

In tensorflow/python/ops/nn_grad.py https://github.com/tensorflow/tensorflow/pull/31700#discussion_r364466301 :

  • Second derivative is just softmax derivative w.r.t. logits.

  • softmax_grad = op.outputs[1]
  • grad = _BroadcastMul(grad_loss, softmax_grad)
  • logits = op.inputs[0]
  • if (grad_grad is not None
  •  and not getattr(grad_grad, "_is_zeros_tensor", False)):
    
  • softmax = nn_ops.softmax(logits)
  • grad += ((grad_grad - array_ops.squeeze(
  •    math_ops.matmul(
    
  •        array_ops.expand_dims(grad_grad, 1),
    
  •        array_ops.expand_dims(softmax, 2)),
    
  •    axis=1)) * softmax)
    
  • return grad, None

This None refers to the gradient wrt labels passed as the second input into the operation, isn't it?

Because we break out of fused implementation when Hessian is computed, I thought this should work properly when higher order derivatives are requested. I tested it locally now by adding compute_gradient_error for the result of tf.hessians similar to the testSecondGradient test case to be added with this PR and got error around 2.12e-8.

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/31700?email_source=notifications&email_token=AAABHRLBQAJ7B65ELCJYLZTQ4ZEZRA5CNFSM4IMLTVRKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCRDKTUA#discussion_r364466301, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRNJCW55SWOVHCDMOJDQ4ZEZRANCNFSM4IMLTVRA .

--

  • Alex
MichaelKonobeev

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add doc summary for tf.strings.upper and tf.strings.lower.

 op {   graph_op_name: "StringUpper"+  description: "Converts each string in the input Tensor to lowercase."

This should read uppercase I think

pbaranay

comment created time in 2 months

issue commenttensorflow/tensorflow

why is tensorflow.map_fn slow, what is wrong with following code?

Does tf.vectorized_map work here instead of map_fn?

minhhg

comment created time in 2 months

pull request commenttensorflow/tensorflow

support convert eager tensor to graph tensor

The code still doesn't do the right thing after this change. If you want to be building graphs to use with sessions in tf2 you need to use with tf.compat.v1.Graph().as_default()

fsx950223

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Implement Hessian for sparse softmax cross entropy

 def IsZero(g):   @ops.RegisterGradient("SparseSoftmaxCrossEntropyWithLogits")-def _SparseSoftmaxCrossEntropyWithLogitsGrad(op, grad_0, _):+def _SparseSoftmaxCrossEntropyWithLogitsGrad(op, grad_loss, grad_grad):   """Gradient function for SparseSoftmaxCrossEntropyWithLogits."""-  # grad_0 is the backprop for cost, and we multiply it with the gradients+  # grad_loss is the backprop for cost, and we multiply it with the gradients   # (which is output[1])+  # grad_grad is the backprop for softmax gradient.   # There is no gradient for the labels   #-  # Currently there is no way to take the second derivative of this op-  # due to the fused implementation's interaction with tf.gradients(),-  # so we make sure we prevent silently incorrect results by raising-  # an error if the second derivative is requested via prevent_gradient.-  sparse_softmax_grad_without_gradient = array_ops.prevent_gradient(-      op.outputs[1],-      message="Currently there is no way to take the second "-      "derivative of sparse_softmax_cross_entropy_with_logits due to the fused "-      "implementation's interaction with tf.gradients()")-  return _BroadcastMul(grad_0, sparse_softmax_grad_without_gradient), None+  # Second derivative is just softmax derivative w.r.t. logits.+  softmax_grad = op.outputs[1]+  grad = _BroadcastMul(grad_loss, softmax_grad)++  logits = op.inputs[0]+  if (grad_grad is not None+      and not getattr(grad_grad, "_is_zeros_tensor", False)):+    softmax = nn_ops.softmax(logits)++    grad += ((grad_grad - array_ops.squeeze(+        math_ops.matmul(+            array_ops.expand_dims(grad_grad, 1),+            array_ops.expand_dims(softmax, 2)),+        axis=1)) * softmax)++  return grad, None

I think the None wrt grad_grad here just made this code silently return the wrong answer when taking third-order derivatives.

Either add a prevent_gradient or implement the third-order derivative.

MichaelKonobeev

comment created time in 2 months

issue commenttensorflow/tensorflow

[TF 2.0 API Docs] tf.custom_gradient

Sure, please send a PR!

On Fri, Jan 3, 2020 at 4:05 AM Saumil-Agarwal notifications@github.com wrote:

@tsbertalan https://github.com/tsbertalan @dynamicwebpaige https://github.com/dynamicwebpaige @alextp https://github.com/alextp Since the issue is still open, can I work on it and do the required changes in documentation?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/issues/26270?email_source=notifications&email_token=AAABHRO2AEAY7EZOOOUAH7DQ34SZVA5CNFSM4G3ICUN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEIA7TRY#issuecomment-570554823, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHROYGTKNGE2RREF4AGDQ34SZVANCNFSM4G3ICUNQ .

--

  • Alex
tsbertalan

comment created time in 2 months

pull request commenttensorflow/tensorflow

Add usage example for tf.math.polyval

It means pylint doesn't like your code. The error message should give you details about what is wrong.

On Thu, Jan 2, 2020 at 1:11 PM Qwerty71 notifications@github.com wrote:

Hi, I just checked the build details, and it looked like it has failed the following tests:

  1. do_pylint PYTHON2: Python 2 pylint FAIL
  2. do_pylint PYTHON3: Python 3 pylint FAIL

There seems to be a problem with pylint - how can this be fixed?

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35522?email_source=notifications&email_token=AAABHROABI4QTE3XBRKISPLQ3ZKBFA5CNFSM4KBZK5IKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEH7N4RQ#issuecomment-570351174, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRLGOLJK5LRIVUPTPGTQ3ZKBFANCNFSM4KBZK5IA .

--

  • Alex
Qwerty71

comment created time in 2 months

issue commenttensorflow/tensorflow

ImageDataGenerator does not work with tpu

Pinging @jsimsa for tf.data expertise.

I think cases like this show we do need to handle sources we cannot move in tf.data; it should still be possible to do prefetching / buffering / etc on the TPU controller to hide most of the latency, assuming the generator produces data quickly enough.

Shiro-LK

comment created time in 2 months

issue commenttensorflow/tensorflow

Support Sparse Tensors in py_function

That doesn't make sense to me because SparseTensor is a composite: https://github.com/tensorflow/tensorflow/blob/bb45024ae9d3df0127d1c1056b08f25e60ba601c/tensorflow/python/framework/sparse_tensor.py#L48

So it's likely something else that is going on here.

novog

comment created time in 2 months

pull request commenttensorflow/tensorflow

Added the ability to multiply a SparseTensor with a Dense Matrix on the left side

Just the docstring.

On Thu, Jan 2, 2020 at 8:58 AM Archis Joglekar notifications@github.com wrote:

Thanks for the feedback, makes sense. I will choose the former.

Can you tell me exactly where to document the new meaning? At least the docstring, but is there anywhere else this should go?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/34624?email_source=notifications&email_token=AAABHRNRAMAKFLH2IOX73PLQ3YMMNA5CNFSM4JR5UYS2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEH6Z6TQ#issuecomment-570269518, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRMYW4UTEY2DSKETKY3Q3YMMNANCNFSM4JR5UYSQ .

--

  • Alex
joglekara

comment created time in 2 months

pull request commenttensorflow/tensorflow

Add more documentation and a usage example for tf.math.sigmoid

Please make both changes; if you want to show the values approach 0 and 1 maybe add a line like

abs(sigmoid(10) - 1) < 0.001

etc

On Thu, Jan 2, 2020 at 8:57 AM Qwerty71 notifications@github.com wrote:

@Qwerty71 commented on this pull request.

In tensorflow/python/ops/math_ops.py https://github.com/tensorflow/tensorflow/pull/35541#discussion_r362547474 :

@@ -3212,6 +3218,15 @@ def sigmoid(x, name=None):

Returns: A Tensor with the same type as x. +

  • Usage Example:
  • y = tf.math.sigmoid(10.0)

Would you like me to make both changes you requested (change inputs, use fewer digits of precision on outputs) or only one? I was trying to demonstrate how the values approach 0 and 1.

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35541?email_source=notifications&email_token=AAABHRNF2YHLHEWSSXBBKD3Q3YMGXA5CNFSM4KCCGFEKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCQQ3LEA#discussion_r362547474, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRLS3YALCZ6B4XSPOCLQ3YMGXANCNFSM4KCCGFEA .

--

  • Alex
Qwerty71

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add more documentation and a usage example for tf.math.sigmoid

 def sigmoid(x, name=None):    Returns:     A Tensor with the same type as `x`.+    +  Usage Example:++  >>> y = tf.math.sigmoid(10.0)

Use smaller numbers like 1 and -1 that let you use fewer digits of precision (so replace 0.99999546 with 0.9... etc)

Qwerty71

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Switch from where to where_v2 in matrix_exponential

 def matrix_exponential(input, name=None):  # pylint: disable=redefined-builtin         math_ops.reduce_sum(             math_ops.abs(matrix),             axis=array_ops.size(array_ops.shape(matrix)) - 2),-        axis=-1)+        axis=-1)[..., array_ops.newaxis, array_ops.newaxis]

I think expand_dims will read a little better than newaxis here, but up to you.

charmasaur

comment created time in 2 months

PR closed tensorflow/tensorflow

Reviewers
Fix tf.while_loop() first example awaiting review cla: yes comp:ops size:XS

Loop_vars currently is a scalar tensor

+1 -1

1 comment

1 changed file

LongOnly

pr closed time in 2 months

pull request commenttensorflow/tensorflow

Fix tf.while_loop() first example

After your change the loop will exit immediately instead of running for 10 iterations. We want an actual loop.

LongOnly

comment created time in 2 months

pull request commenttensorflow/tensorflow

Linspace for multi-dimensional tensors

Yes, remove that deletion, thanks.

hristo-vrigazov

comment created time in 2 months

pull request commenttensorflow/tensorflow

Added usage example for tf.image.rgb_to_yiq

Maybe the number of spaces is different? +Yash Katariya yashkatariya@google.com do you know?

On Fri, Dec 27, 2019 at 10:25 AM HotPotatoC notifications@github.com wrote:

The doctest seems to have failed

Expected: <tf.Tensor: shape=(1, 1, 3), dtype=float32, numpy=array([[[ 1.815 , -0.9..., 0.09...]]], dtype=float32)> ``` Got: <tf.Tensor: shape=(1, 1, 3), dtype=float32, numpy=array([[[ 1.815 , -0.91724455, 0.09962624]]], dtype=float32)>

Why is the '...' not working?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35293?email_source=notifications&email_token=AAABHRLFNMUGMBMATQIWCNTQ2ZCA7A5CNFSM4J5VZERKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEHXSOIA#issuecomment-569321248, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRPZ7BVEBGIUXFOIBS3Q2ZCA7ANCNFSM4J5VZERA .

--

  • Alex
HotPotatoC

comment created time in 2 months

pull request commenttensorflow/tensorflow

Document undocumented tf.strings methods

Yeah let's not do that

On Fri, Dec 27, 2019 at 10:24 AM Marçal Comajoan Cara < notifications@github.com> wrote:

@salcc commented on this pull request.

In tensorflow/core/api_def/base_api/api_def_StringLower.pbtxt https://github.com/tensorflow/tensorflow/pull/35437#discussion_r361720185 :

@@ -1,3 +1,13 @@ op { graph_op_name: "StringLower"

  • summary: "Lowercases strings in input"
  • description: <<END +Converts each uppercase character of each string in the input Tensor to +lowercase.

+>>> strings = ['Hello', 'TensorFlow'] +>>> tf.strings.lower(strings).numpy()

By the way, I used .numpy() because in other places is used to do the same thing, for example in https://github.com/tensorflow/tensorflow/blob/1e65730120aafc413e8c3dcddcf19cd8d184fe1b/tensorflow/core/api_def/base_api/api_def_StringLength.pbtxt#L30 ( https://www.tensorflow.org/api_docs/python/tf/strings/length?version=nightly ).

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35437?email_source=notifications&email_token=AAABHRL7KTWXGSTMJJJ2PCLQ2ZB53A5CNFSM4J7SMAZKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCQJ4KMQ#discussion_r361720185, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRPBF6E3LF67VKJJEPLQ2ZB53ANCNFSM4J7SMAZA .

--

  • Alex
salcc

comment created time in 2 months

pull request commenttensorflow/tensorflow

Added usage example for tf.image.rgb_to_yiq

No need to provide a print, just remove the "y = " and the doctest will work.

On Fri, Dec 27, 2019 at 9:38 AM HotPotatoC notifications@github.com wrote:

@HotPotatoC commented on this pull request.

In tensorflow/python/ops/image_ops_impl.py https://github.com/tensorflow/tensorflow/pull/35293#discussion_r361712205 :

@@ -2931,13 +2931,21 @@ def rgb_to_yiq(images): Outputs a tensor of the same shape as the images tensor, containing the YIQ value of the pixels. The output is only well defined if the value in images are in [0,1].

  • Usage Example:
  • x = tf.constant([[[1.0, 2.0, 3.0]]])

  • y = tf.image.rgb_to_yiq(x)

  • [[[ 1.815 -0.9... 0.09...]]]

should I provide a print(y.numpy()) ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35293?email_source=notifications&email_token=AAABHRPX54BEJG7UFWTSZ73Q2Y4RXA5CNFSM4J5VZERKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCQJ2A4A#discussion_r361712205, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRMOT6A7YUZYGD6LT3DQ2Y4RXANCNFSM4J5VZERA .

--

  • Alex
HotPotatoC

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Added usage example for tf.image.rgb_to_yiq

 def rgb_to_yiq(images):   Outputs a tensor of the same shape as the `images` tensor, containing the YIQ   value of the pixels.   The output is only well defined if the value in images are in [0,1].+  +  Usage Example:+    ```python+    >>> x = tf.constant([[[1.0, 2.0, 3.0]]])+    >>> y = tf.image.rgb_to_yiq(x)+    [[[ 1.815      -0.9...  0.09...]]]

This probably needs to read <tf.Tensor(..., numpy=[[[ 1.815 -0.9... 0.09...]]])>

HotPotatoC

comment created time in 2 months

pull request commenttensorflow/tensorflow

Update math_ops.py

Yes

On Fri, Dec 27, 2019 at 8:47 AM Ananya Gangavarapu notifications@github.com wrote:

@anigasan commented on this pull request.

In tensorflow/python/ops/math_ops.py https://github.com/tensorflow/tensorflow/pull/35229#discussion_r361703216 :

@@ -354,10 +353,6 @@ def multiply(x, y, name=None): # pylint: disable=missing-docstring A 'Tensor'. Has the same type as 'x' """

  • return gen_math_ops.mul(x, y, name)

When you say remove, do you mean lines 338 and 341?

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35229?email_source=notifications&email_token=AAABHRJ4BUJGMBAY7UPBEE3Q2YWTBA5CNFSM4J4QFP7KYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCQJXK7A#discussion_r361703216, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRKXL7E4VKMUVKSEYC3Q2YWTBANCNFSM4J4QFP7A .

--

  • Alex
anigasan

comment created time in 2 months

issue commenttensorflow/tensorflow

[Eager] Fix for determining input / output shape of the model prior to Model.fit()

@yourtheron please file a separate issue for your problem which looks unrelated to this one

titu1994

comment created time in 2 months

pull request commenttensorflow/tensorflow

Added usage example for tf.image.random_hue and tf.image.rgb_to_yiq

No you haven't. I still don't see the results of the operations in the doctest and I still don't see a seed.

HotPotatoC

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Document undocumented tf.strings methods

 op {   graph_op_name: "StringLower"+  summary: "Lowercases strings in `input`"+  description: <<END+Converts each uppercase character of each string in the input Tensor to+lowercase.++>>> strings = ['Hello', 'TensorFlow']+>>> tf.strings.lower(strings).numpy()

No need to have .numpy() in the documentation, we can print tensors (same below)

salcc

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Add usage example to tf.image.pad_to_bounding_box

 def pad_to_bounding_box(image, offset_height, offset_width, target_height,     `[target_height, target_width, channels]`  Usage Example:-    '''python-    import tensorflow as tf-    x = tf.random.normal(shape=(256, 256, 3))-    x = tf.image.pad_to_bounding_box(x, 2, 2, 260, 260)-    '''+>>> x = tf.random.normal(shape=(256, 256, 3))+>>> x = tf.image.pad_to_bounding_box(x, 2, 2, 260, 260)+<x.shape = TensorShape([Dimension(260), Dimension(260), Dimension(3)])>

It should read TensorShape([260, 260, 3]); did you run the example with tf1?

RichardXiao13

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Update math_ops.py

 def multiply(x, y, name=None):  # pylint: disable=missing-docstring     A 'Tensor'. Has the same type as 'x'   """   -  return gen_math_ops.mul(x, y, name)

It looks like this PR is completely removing the implementation of the function

anigasan

comment created time in 2 months

pull request commenttensorflow/tensorflow

Added Usage Example for Image.Transpose()

Yes, please. I feel like I'm reviewing 100 versions of the same thing all with the same problem.

Maybe tell the following to all submitters:

  1. Read https://www.tensorflow.org/community/contribute/docs_ref
  2. make sure there are no floating point numbers with too much precision (replace the trailing digits with ...)
  3. make sure there are no random numbers in the doctest (as otherwise it's flaky)

On Thu, Dec 26, 2019 at 7:54 PM Kilaru Yasaswi Sri Chandra Gandhi < notifications@github.com> wrote:

@kyscg commented on this pull request.

In tensorflow/python/ops/image_ops_impl.py https://github.com/tensorflow/tensorflow/pull/35373#discussion_r361575246 :

@@ -638,10 +638,10 @@ def transpose(image, name=None): If image was 3-D, a 3-D float Tensor of shape [width, height, channels]

  • Usage Example: ```python >>
  • import tensorflow as tf >>
  • x = tf.random.normal(shape=(100, 200, 3)) >>
  • tf.image.transpose(x) `
  • Usage Example: ```python
  • import tensorflow as tf >>

About the PR's, these are all student submissions from GCI-2019. Would you like me to check for duplicates and inform those students to redirect elsewhere?

— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/tensorflow/tensorflow/pull/35373?email_source=notifications&email_token=AAABHRJGJYNAL5H2L3BKXA3Q2V355A5CNFSM4J62OCSKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCQIQ6UQ#discussion_r361575246, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABHRN6JHRJ3BHFSUBDHMLQ2V355ANCNFSM4J62OCSA .

--

  • Alex
Vansh-Sethi

comment created time in 2 months

issue commenttensorflow/tensorflow

Bijectors crash tf2.1 autograph in distributed mirrored multi-gpu mode

This is an internal TF bug.

Can you give me a self-contained example to reproduce? Like, paste the code you need to trigger the error in a colab.research.google.com ?

olegmyrk

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Update math_ops.py with additional usage examples

 def multiply_no_nan(x, y, name=None):   For example:   >>> x = 0.0   >>> y = 34.0-  >>> tf.math.multiply_no_nan(0.0, 34.0)+  >>> tf.math.multiply_no_nan(x, y)

I think you should put a NaN as input so people understand that 0 * NaN = 0 for mul_no_nan

anigasan

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

Update math_ops.py with additional usage examples

 def log_sigmoid(x, name=None):      For example:   >>> tf.math.log_sigmoid(tf.constant(1.0, tf.float32))-  <tf.Tensor: shape=(), dtype=float32, numpy=-0.31326166>+  <tf.Tensor: shape=(), dtype=float32, numpy=-0.31>

If you don't add the ... after 0.31 the doctest will fail. Ellipsis are interpreted as wildcard matches.

anigasan

comment created time in 2 months

Pull request review commenttensorflow/tensorflow

[Intel MKL] Klocwork fix

 REGISTER_REWRITE(EagerOpRewriteRegistry::PRE_EXECUTION, MklEagerOpRewrite); // Constructor MklEagerOpRewrite::MklEagerOpRewrite(string name, string file, string line)     : EagerOpRewrite(name, file, line) {+  registered_kernels_map_ = std::unordered_map<std::string, bool>();

Then isn't it better to initialize it in the : line? Line, MklEagerOpRewrite::MklEagerOpRewrite(string name, string file, string line) : EagerOpRewrite(name, file, line), registered_kernel_map_() { ...

?

mahmoud-abuzaina

comment created time in 2 months

more