
alextp/pylda 30

An implementation of Gibbs sampling for Latent Dirichlet Allocation

jamsjr/hard-versus-soft 4

An article on the difference between hard reservation and soft reservation in real-time schedulers

duckworthd/Topics 3

Implementations of Inference algorithms for Topic Models like Latent Dirichlet Allocation, Hierarchical Dirichlet Processes, and more!

alextp/scikit-learn 2

scikit-learn main repo

alextp/groupcache 1

groupcache is a caching and cache-filling library, intended as a replacement for memcached in many cases.

alextp/autograd 0

Efficiently computes derivatives of numpy code.

alextp/community 0

Stores documents used by the TensorFlow developer community

pull request comment on tensorflow/tensorflow

Fix deprecation message from GatherV2Grad

Because if you don't have the CPU placement there's a risk we'll bounce the int64 shape tensor to a GPU and back, which is pretty slow.
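As a rough illustration of that concern (a hypothetical sketch, not the PR's actual code), pinning the small shape tensor to the CPU keeps the slice from triggering a host-to-device-to-host round trip:

```python
import tensorflow as tf

# Hypothetical sketch: an int64 "shape" tensor is tiny, so we want it to stay
# in host memory. Without an explicit CPU placement, eager execution may place
# the identity/slice on the GPU, copying the tensor there and back.
with tf.device("/CPU:0"):
    params_shape = tf.constant([8, 3, 4], dtype=tf.int64)
    params_tail_shape = tf.identity(params_shape)[1:]  # computed on the CPU

print(params_tail_shape)  # tf.Tensor([3 4], shape=(2,), dtype=int64)
```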

On Sun, May 24, 2020 at 11:39 PM Gaurav Jain notifications@github.com wrote:

@jaingaurav commented on this pull request.

In tensorflow/python/ops/array_grad.py https://github.com/tensorflow/tensorflow/pull/39731#discussion_r429755494 :

    @@ -641,7 +641,7 @@ def _GatherV2Grad(op, grad):
        # For axis 0 gathers, build an appropriately shaped IndexedSlices.
        if axis_static == 0:
          if context.executing_eagerly():
    -       params_tail_shape = params_shape.cpu()[1:]
    +       params_tail_shape = array_ops.identity(params_shape)[1:]

@alextp https://github.com/alextp: Is this CPU placement necessary? I looked at the history of this code and couldn't quite figure out why the CPU placement is needed.


--
Alex
yongtang

comment created time in 20 hours

Pull request review comment on tensorflow/community

RFC: TensorFloat-32 support in TensorFlow

# TensorFloat-32 in TensorFlow

| Status        | Proposed                                                |
| :------------ | :------------------------------------------------------ |
| **RFC #**     | [247](https://github.com/tensorflow/community/pull/247) |
| **Author(s)** | Reed Wanderman-Milne (reedwm@google.com)                 |
| **Sponsor**   | Sanjoy Das (sanjoy@google.com)                           |
| **Updated**   | 2020-05-20                                               |

## Objective

Allow [TensorFloat-32](https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format) to be used in TensorFlow to improve performance.

## Motivation

[NVIDIA Ampere](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/), an upcoming generation of NVIDIA GPUs announced at GTC 2020, introduces a new numeric format called TensorFloat-32, or TF32 for short. TF32 has the range of float32/bfloat16 (i.e. 8 bits of exponent) and the precision of fp16 (i.e. 10 bits of mantissa). It is not an in-memory format, but tensor cores natively support it as a computation format. TF32 should not be thought of as an in-memory dtype but instead a computation mode that increases performance and decreases numeric precision for certain float32 operations. NVIDIA has not found any cases where TF32 reduces the convergence of deep learning models.

Upcoming versions of cuDNN, cuBLAS, and other CUDA libraries will expose a mode of execution that has float32 inputs and outputs, but internally truncates float32 to TF32 and uses tensor cores. This is expected to be sufficiently accurate to reach the same convergence as the "full" float32 mode of execution but significantly faster. Each element still takes four bytes, so there is still a memory and performance penalty compared to using float16 or bfloat16.

As TF32 is only usable by tensor cores, it can only be used for matrix multiplications and other ops implemented in terms of matrix multiplications, such as convolutions. It is not used for pointwise ops or reductions.

TF32 will benefit users who run float32 models on Ampere GPUs, so we need an API to allow these users to enable TF32.

## Design Proposal

In TensorFlow, TF32 can be enabled for supported ops on Ampere GPUs with the following call:

```python
tf.config.allow_tensor_float_32_execution(True)
```

Also, the mixed precision API mostly changes the dtype of tensors, while TF32 doesn't affect tensor dtypes (AFAICT), just the dtype of the accumulators inside ops.
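A hedged sketch of that contrast (the mixed precision calls use the Keras API as it appears in recent TF releases; the TF32 call is the name proposed in this RFC, not a shipped function):

```python
import tensorflow as tf

# Mixed precision changes tensor dtypes themselves: the layer computes and
# returns float16 tensors.
tf.keras.mixed_precision.set_global_policy("mixed_float16")
layer = tf.keras.layers.Dense(4)
out = layer(tf.zeros([2, 3]))
assert out.dtype == tf.float16  # the tensor dtype itself changed

# TF32, as proposed in this RFC, would instead be a global execution mode:
#   tf.config.allow_tensor_float_32_execution(True)   # proposed, not shipped
# Inputs and outputs stay float32; only the internal precision of matmuls and
# convolutions changes, so no tensor dtype is affected.
x = tf.random.normal([4, 4])
y = tf.matmul(x, x)
assert x.dtype == y.dtype == tf.float32
```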

reedwm

comment created time in 6 days

Pull request review comment on tensorflow/community

RFC: TensorFloat-32 support in TensorFlow

[Quotes the same RFC text as above, continuing:]

The word "allow" emphasizes only certain devices (Ampere GPUs) and ops (such as matmuls and convolutions) will be affected. Once enabled, all local and remote Ampere GPUs use TF32 for supported float32 ops.

Passing `False` to `allow_tensor_float_32_execution` will disable TF32 if already enabled.

We call the function "allow_tensor_float_32_execution" instead of the more concise "allow_tf32_execution" because people may mistakenly interpret the phrase "tf32" to refer to TensorFlow instead of TensorFloat.

The following can be used to query whether TF32 is enabled. The function returns a bool.

```python
tf.config.tensor_float_32_execution_allowed()
```

Since TF32 only affects Ampere GPUs, moving an op to a GPU can affect numerics. Grappler and other graph optimizations will not consider this, and will freely move ops between devices without regard to numeric stability. As a result, explicitly putting an op on the CPU does not ensure it will use the full float32 precision instead of TF32.

Since TensorFlow 2.3 will not support CUDA 11, which is required for TF32, this API will first be exposed in TensorFlow 2.4. However, Google Cloud will likely cherrypick CUDA 11 and this API into their version of 2.3, so they can offer TF32 support to their customers who use TensorFlow 2.3.

### Turning TF32 on by default

Numerical studies by NVIDIA covering many common models suggest that TF32 is numerically robust for deep learning applications. In order to take advantage of these new accelerations in Ampere hardware for float32 models, we would like to enable TF32 by default. However, since the TensorFlow 2.4 release is still months away and we intend to use that time to further test and evaluate TF32, it is too early to decide in this RFC whether TF32 execution will be enabled or disabled by default. Here we begin a discussion by listing the most likely scenarios. Comments are also welcome. The scenarios are:

1. Turn it on by default in 2.4, the first release with the TF32 API.
2. Turn it on by default in 2.5, the second release with the TF32 API.
3. Do not turn it on by default.

The advantage of (1) is that all Ampere float32 users get the performance benefit unless they opt out. Additionally, Ampere numerics will not be loosened in a new release: TensorFlow 2.4 will be the first release with Ampere support, and it will immediately default to TF32 being enabled. The disadvantage is that we cannot collect as much feedback from users before defaulting to TF32, because no stable version of TensorFlow will support TF32 but not have it enabled by default.

The advantage of (2) is that it allows users to test and give feedback on TF32 with a stable version of TensorFlow before we decide whether it should be default. The disadvantage is it's possible we break Ampere users who relied on the full float32 precision in 2.4 when they upgrade to 2.5.

The advantage of (3) is that a user's model will never break due to using reduced precision, even if they upgrade from an earlier GPU to Ampere. The disadvantage is that many Ampere users would not get the performance benefit from TF32 as they would not know about the API to enable it.

Another advantage of turning on TF32 by default is that it makes TensorFlow's behavior with GPUs more consistent with TPUs. TPUs internally use lower precision for float32 matmuls and convolutions, similar to how Ampere GPUs will use lower precision for float32 matmuls and convolutions if TF32 is enabled.

**If you know of any models whose accuracy may be impacted by TF32, please comment on this RFC.** Note that TF32 is equivalent to float32 except it has 10 bits of mantissa instead of 23 bits. It will initially be used only for matmuls and convolutions, but may be used for other ops in the future if they are implemented in terms of a matmul. Once TensorFlow 2.4 is released, you will be able to test the impact of TF32 on your models if you have Ampere GPUs. You will be able to test earlier if you use TensorFlow nightly packages, and even earlier if you build from source with CUDA 11 support.

You might want to indicate a way to receive private feedback about this too.

reedwm

comment created time in 6 days

Pull request review comment on tensorflow/community

RFC: TensorFloat-32 support in TensorFlow

[Quotes an earlier draft of the same RFC (with placeholder PR number "NNN"), ending:]

Since TensorFlow 2.3 will not support CUDA 11, which is required for TF32, this API will first be exposed in TensorFlow 2.4. However, Google Cloud will likely cherrypick CUDA 11 and this API into their version of 2.3, so they can offer TF32 support to their customers who use TensorFlow 2.3.

Replacing "google cloud will likely" with "downstream repackagers of tensorflow (such as google cloud) are encouraged to" will make this read better

reedwm

comment created time in 6 days

Pull request review comment on tensorflow/tensorflow

Update examples in docstring to use TF 2.x code

     def histogram_fixed_width_bins(values,

       Examples:

    -  ```python
    -  # Bins will be:  (-inf, 1), [1, 2), [2, 3), [3, 4), [4, inf)
    -  nbins = 5
    -  value_range = [0.0, 5.0]
    -  new_values = [-1.0, 0.0, 1.5, 2.0, 5.0, 15]
    -
    -  with tf.compat.v1.get_default_session() as sess:
    -    indices = tf.histogram_fixed_width_bins(new_values, value_range, nbins=5)
    -    variables.global_variables_initializer().run()
    -    sess.run(indices) # [0, 0, 1, 2, 4, 4]
    -  ```
    +  >>> # Bins will be:  (-inf, 1), [1, 2), [2, 3), [3, 4), [4, inf)
    +  ...
    +  >>> nbins = 5
    +  >>> value_range = [0.0, 5.0]
    +  >>> new_values = [-1.0, 0.0, 1.5, 2.0, 5.0, 15]
    +  >>> indices = tf.histogram_fixed_width_bins(new_values, value_range, nbins=5)
    +  >>> print(indices)

Apparently our internal linter is stupid and doesn't like print in comments (it warns about debug prints). Thankfully you can just replace this with indices and it'll work just the same.

Can you fix here and in the other print statements?
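For example, the doctest could end with the bare expression instead (a sketch of the suggested change; the exact expected-output line is an assumption about the tensor repr):

```python
>>> nbins = 5
>>> value_range = [0.0, 5.0]
>>> new_values = [-1.0, 0.0, 1.5, 2.0, 5.0, 15]
>>> indices = tf.histogram_fixed_width_bins(new_values, value_range, nbins=5)
>>> indices
<tf.Tensor: shape=(6,), dtype=int32, numpy=array([0, 0, 1, 2, 4, 4], dtype=int32)>
```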

yongtang

comment created time in 6 days

pull request comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

That sounds good to me. Can you update the proposal?

On Wed, May 20, 2020 at 1:39 PM Sami Kama notifications@github.com wrote:

@samikama commented on this pull request.

In rfcs/20200505-transactional-fs.md https://github.com/tensorflow/community/pull/245#discussion_r428293780:

[Quotes the RFC's Design Proposal section (the same text is reproduced in full in the review comments below), ending at:]

    struct TransactionToken{

Something like

    // Replace the Env::Default()->... filesystem methods with FileSystem methods:
    // Env::Default()->CreateDir(dirname);
    FileSystem* FS = Env::Default()->StartTransactionForURI(fname);
    FS->CreateDir(dirname);
    // ...
    // Env::Default()->RenameFile(...)
    FS->RenameFile(...);
    // ...
    // Env::Default()->DeleteDir(...)
    FS->DeleteDir(dirname);
    FS->EndTransaction();

Since, as @mihaimaruseac https://github.com/mihaimaruseac said, the C layer is always accessed through a wrapper, the same logic can be implemented with plugins. But then the API should be extended a little bit so that users can continue an existing transaction.

This does not necessarily reduce the total lines of code that need to be changed but simplifies token tracing.


--
Alex
samikama

comment created time in 7 days

Pull request review comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

# Transactional File Systems Support

| Status        | (Proposed / Accepted / Implemented / Obsolete) |
| :------------ | :--------------------------------------------- |
| **RFC #**     | [NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #) |
| **Author(s)** | Sami Kama (kamsami@amazon.com) |
| **Sponsor**   | Mihai Maruseac (mihaimaruseac@google.com) |
| **Updated**   | 2020-05-05 |

## Objective

The aim of this RFC is to extend filesystem access support to persistent storage that provides transactional access and eventual consistency.

## Motivation

The current persistent storage implementation in TensorFlow relies on certain guarantees that existing local file systems provide. But in the age of big data, local filesystems are not always sufficient, and/or prefetching the data to the local system and then uploading it after processing can be error-prone and harder to implement for end users. Direct access to persistent storage that provides different guarantees is desirable. Cloud storage solutions offered by Amazon, Google, and others, as well as databases, are examples of such persistent storage. Moreover, even though local file systems provide certain atomic-like transactions, they operate at the file level. For use cases like checkpointing, transactions are emulated by creating a temp directory, adding files there, and then renaming/moving the temp directory to the final location. Such operations would also benefit from the enhancements proposed in this RFC.

Transactions can also help with some filesystem access inconsistencies that can happen while reading and writing checkpoints. For example, while one thread is reading files from a directory, another may be modifying the underlying files. This could lead to the reader seeing an inconsistent or corrupt set of files. With transactions, each thread can have a different transaction token, and the underlying file system can choose to postpone modification of files by redirecting them to a temporary location and then moving them in place when the transaction ends.

## User Benefit

With this extension proposal, users will have more stable access to cloud storage systems, and checkpointing redundancy can improve.

## Design Proposal

This RFC proposes to extend the [filesystem plugin rfc][filesystem_plugin] API with transaction markers. There can be different levels of transactions. The first level is global transactions: the user starts a transaction scope, and any operations done within this scope are grouped into this transaction. This is the easiest to implement but has drawbacks, such as allowing only one transaction at a time, and different types of transactions may not always be reorderable. An alternative to this approach is having multiple transaction scopes: the user creates a transaction token and passes it to filesystem operations so the plugin can differentiate among independent transactions. This token can have per-file or per-directory granularity. Even though per-file would give the most flexibility, per-directory transaction granularity is most likely sufficient.

Filesystem plugins may choose to ignore the transaction scopes or can delay the operations until the termination of the transaction scope.

### Extension to existing filesystem implementation

The existing filesystem C++ API can easily be expanded by the addition of three methods, an opaque structure, and possibly a helper class to support transactions.

```cpp
struct TransactionToken{
```

I don't follow, how so?

samikama

comment created time in 7 days

issue comment on tensorflow/tensorflow

[Feature Request] Dynamic RPC Address Resolution

@bramandia FYI

make1980

comment created time in 7 days

Pull request review comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

[Quotes the same RFC text as above, ending at `struct TransactionToken{`.]

That's a good point; the naive, less intrusive approach I suggested won't work.

I don't have a rejoinder now, but I'll try to think of other non-intrusive or less intrusive approaches.

samikama

comment created time in 8 days

Pull request review comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

[Quotes the same RFC text as above, continuing:]

```cpp
struct TransactionToken{
  FileSystem* owner;
  void* id;
};

// C++ helper class for transaction scoping.
template <typename T>
class TokenScope {
 public:
  // Transaction name can be a filename or directory name.
  TokenScope(T* fs_, const string& transaction_name) : fs(fs_) {
    auto status = fs->StartTransaction(transaction_name, &token);
  }
  ~TokenScope() { fs->EndTransaction(token); }
  TokenScope(const TokenScope&) = delete;
  const std::unique_ptr<TransactionToken>* GetToken() const { return &token; }

 private:
  std::unique_ptr<TransactionToken> token;
  T* fs;
};
```

For coarse granularity, adding `StartTransaction`, `EndTransaction`, and `GetTransactionTokenForFile` methods will be sufficient. However, this prevents having multiple simultaneous transactions per file system and limits flexibility. Thus we propose extending the signature of each method with a unique pointer to a `TransactionToken` structure, defaulting to `nullptr`, to minimize the impact on existing code and to allow incremental migration to an implementation of transactions.

```cpp
class Filesystem {
  // Transaction Token API extensions
  virtual Status GetTransactionTokenForFile(const string& file_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status StartTransaction(const string& transaction_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status EndTransaction(std::unique_ptr<TransactionToken>* token) = 0;

  // File creation
  virtual Status NewWritableFile(const string& fname, std::unique_ptr<WritableFile>* result, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  // ... NewRandomAccessFile, NewAppendableFile, NewReadOnlyMemoryRegionFromFile,
  // CreateDir, RecursivelyCreateDir, DeleteFile, DeleteDir, DeleteRecursively,
  // RenameFile, CopyFile, FileExists, GetChildren, Stat, IsDirectory,
  // GetFileSize, GetMatchingPaths, etc., each gaining the same optional
  // std::unique_ptr<TransactionToken>* token = nullptr parameter.
};
```

The transaction token will be owned by the `Filesystem`, and use of it after `EndTransaction` will be an invalid operation.

File classes can be modified to keep a `TransactionToken`, assigned by the filesystem on their construction using the given scope, or the default scope if none is given. Filesystems may ignore it if a transaction at that level doesn't make sense.

If a file system is guaranteed to have no other users, then transactions are meaningless; otherwise, transactions are meaningful.

Like, even on my local disk, if I write a model checkpoint with N files and another process deletes file 1 while I'm writing file N, I have an invalid checkpoint. Similarly, if I start writing a checkpoint with N files and another process reads from my local FS and sees the half-written checkpoint, it'll fail. Creating a no-op transaction token for my local filesystem would let these issues go through, as opposed to actually doing the atomic writes we would do in networked filesystems if we do have transactions.

I think my question was more about what the intent of the token is: is it "please try to do a transaction but don't bother if you can't" or is it "I want to be able to rely on these things being atomic and please tell me if something failed".
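One way to make that intent explicit (purely a hypothetical sketch; nothing like this is in the RFC as written) would be to encode it in the token itself:

```cpp
// Hypothetical sketch: let callers state whether a transaction is best-effort
// or required, so filesystems without transaction support can fail loudly
// instead of silently handing back a no-op token.
enum class TransactionMode {
  kBestEffort,  // "try to do a transaction, but proceed if you can't"
  kRequired,    // "I rely on atomicity; return an error if unsupported"
};

struct TransactionToken {
  void* id = nullptr;
  TransactionMode mode = TransactionMode::kBestEffort;
};
```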

samikama

comment created time in 8 days

Pull request review comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

[Quotes the same RFC text as above, ending with:]

```cpp
virtual Status StartTransaction(const string& transaction_name, std::unique_ptr<TransactionToken>* token) = 0;
```
`std::unique_ptr<TransactionToken>*` is a weird type (a pointer to a unique pointer to `TransactionToken`).

I think the signature should just be `TransactionToken*`, and we should clearly separate the methods which can create/destroy a `TransactionToken` from the ones that just need a reference to it.
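A minimal sketch of that separation (hypothetical signatures, with stub types standing in for TensorFlow's real ones to keep the sketch self-contained):

```cpp
#include <memory>
#include <string>

// Stubs standing in for TensorFlow's real types.
struct Status {};
struct TransactionToken {};
struct WritableFile {};

class Filesystem {
 public:
  virtual ~Filesystem() = default;

  // Lifetime methods: the only places a token is created or destroyed.
  virtual Status StartTransaction(const std::string& name,
                                  std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status EndTransaction(std::unique_ptr<TransactionToken> token) = 0;

  // Everything else borrows the token by raw pointer; nullptr means
  // "no transaction".
  virtual Status NewWritableFile(const std::string& fname,
                                 std::unique_ptr<WritableFile>* result,
                                 TransactionToken* token = nullptr) = 0;
  virtual Status DeleteFile(const std::string& fname,
                            TransactionToken* token = nullptr) = 0;
};
```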

samikama

comment created time in 12 days

Pull request review comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

[Quotes the same RFC text as above, through the "## User Benefit" heading.]

One question I have is how we expect FSs with transactions to coexist with FSs without transactions.

Do we expect only user-written python code to be made aware of transactions, or do we also expect opkernel C++ code to be transaction-aware? I ask because we do a lot of IO inside opkernels today (summary writing, checkpointing, data reading, etc) and I'd like to understand whether we plan to modify all this to use transactions and how.
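For concreteness, a hedged sketch of what transaction-aware checkpoint writing inside a kernel might look like, using the `Env` signatures proposed in this RFC (the helper function and error-handling details are assumptions, not existing code):

```cpp
// Hypothetical sketch built on the RFC's proposed Env API: group all
// checkpoint shards into one transaction so readers never observe a
// half-written checkpoint.
Status WriteCheckpointShards(Env* env, const std::vector<string>& shard_names) {
  std::unique_ptr<TransactionToken> token;
  TF_RETURN_IF_ERROR(env->StartTransaction("ckpt_dir", &token));
  for (const string& shard : shard_names) {
    std::unique_ptr<WritableFile> file;
    TF_RETURN_IF_ERROR(env->NewWritableFile(shard, &file, &token));
    // ... append shard contents, flush, close ...
  }
  // All shards become visible together (on filesystems that honor tokens).
  return env->EndTransaction(&token);
}
```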

samikama

comment created time in 12 days

Pull request review comment on tensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

[Quotes the same RFC text as above, continuing:]

```cpp
class RandomAccessFile {
  virtual Status Name(StringPiece* result) const;
  virtual Status Read(uint64 offset, size_t n, StringPiece* result, char* scratch) const = 0;
 private:
  TransactionToken token;
};

class WritableFile {
  virtual Status Name(StringPiece* result) const;
  virtual Status Append(StringPiece data) = 0;
  virtual Status Append(const absl::Cord& cord);
  virtual Status Tell(int64* position);
  virtual Status Close() = 0;
  virtual Status Flush() = 0;
  virtual Status Sync() = 0;
 private:
  TransactionToken token;
};

class ReadOnlyMemoryRegion {
  virtual const void* data() = 0;
  virtual uint64 length() = 0;
 private:
  TransactionToken token;
};
```

Then the respective `Env` class methods need to receive transaction tokens to relay to the file system. Arguments default to `nullptr`, indicating use of the default transaction. Transaction tokens should be taken from the respective filesystems. Alternatively, they can be constructed with an `UNINITIALIZED` token, which the respective filesystem can then populate.

```cpp
class Env {
  // Transaction Token related
  virtual Status GetTransactionTokenForFile(const string& file_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status StartTransaction(const string& transaction_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status EndTransaction(std::unique_ptr<TransactionToken>* token) = 0;
  // ... the existing Env file and directory methods (NewWritableFile, CreateDir,
  // DeleteFile, RenameFile, GetChildren, Stat, GetMatchingPaths, etc.) each gain
  // the same optional std::unique_ptr<TransactionToken>* token = nullptr
  // parameter, mirroring the Filesystem class above.
};
```

Since `Env` resolves the underlying filesystem from the URI, `StartTransaction` requires its argument to be similar to a URI that can be parsed to identify the underlying file system.

For the newly proposed filesystem plugin mechanism, two possible approaches exist. For the `TF_RandomAccessFile`, `TF_WritableFile`, and `TF_ReadOnlyMemoryRegion` structures:

- Opaque pointers stay as is, so no changes are needed in the structures. Each filesystem then attaches tokens to its own internal structures pointed to by `void*`.
I prefer keeping the opaque pointers.
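To illustrate the opaque-pointer option, here is a minimal sketch; the struct, its field layout, and `AppendImpl` are hypothetical names for illustration, not part of the proposal:

```cpp
#include <cstddef>
#include <string>

struct TransactionToken;  // opaque token type from this RFC

// Hypothetical plugin-side state. The core filesystem API only ever sees
// this object through a `void*`, so adding a token field changes nothing
// in the shared structures.
struct PluginWritableFile {
  std::string path;          // destination path for the writes
  TransactionToken* token;   // transaction this file belongs to, or nullptr
};

// The plugin recovers its state (and the token) by casting the opaque
// pointer back to its own type inside each operation.
void AppendImpl(void* plugin_file, const char* data, size_t n) {
  auto* file = static_cast<PluginWritableFile*>(plugin_file);
  // ... tag the write with file->token before forwarding it to storage ...
  (void)file;
  (void)data;
  (void)n;
}
```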

samikama

comment created time in 12 days

Pull request review commenttensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

# Transactional File Systems Support

| Status        | (Proposed / Accepted / Implemented / Obsolete) |
| :------------ | :--------------------------------------------- |
| **RFC #**     | [NNN](https://github.com/tensorflow/community/pull/NNN) (update when you have community PR #) |
| **Author(s)** | Sami Kama (kamsami@amazon.com)                 |
| **Sponsor**   | Mihai Maruseac (mihaimaruseac@google.com)      |
| **Updated**   | 2020-05-05                                     |

## Objective

The aim of this RFC is to extend filesystem access support to persistent storage that provides transactional access and eventual consistency.

## Motivation

The current persistent storage implementation in TensorFlow relies on certain guarantees that existing local file systems provide. But in the age of big data, local filesystems are not always sufficient, and prefetching the data to the local system and then uploading it after processing can be error-prone and hard to implement for end users. Direct access to persistent storage that provides different guarantees is desirable. Cloud storage solutions offered by Amazon, Google, and others, as well as databases, are examples of such persistent storage. Moreover, even though local file systems provide certain atomic-like transactions, they operate at the file level. For use cases like checkpointing, transactions are emulated by creating a temp directory, adding files there, and then renaming/moving the temp directory to the final location. Such operations would also benefit from the enhancements proposed in this RFC.

Transactions can also help with some filesystem access inconsistencies that can happen while reading and writing checkpoints. For example, while one thread is reading files from a directory, another may be modifying the underlying files. This could lead to the reader seeing an inconsistent or corrupt set of files. With transactions, each thread can have a different transaction token, and the underlying file system can choose to postpone modification of the files by redirecting them to a temporary location and then moving them into place when the transaction ends.

## User Benefit

With this extension, users will have more stable access to cloud storage systems, and checkpointing redundancy can improve.

## Design Proposal

This RFC proposes to extend the [filesystem plugin RFC][filesystem_plugin] API with transaction markers. There can be different levels of transactions. The first level is global transactions: the user starts a transaction scope, and any operations done within this scope are grouped into this transaction. This is the easiest to implement but has drawbacks, such as only one transaction being live at a time, and different types of transactions not always being reorderable. The alternative is multiple transaction scopes: the user creates a transaction token and passes it to filesystem operations so that the plugin can distinguish independent transactions. This token can have per-file or per-directory granularity. Even though per-file would give the most flexibility, per-directory transaction granularity is most likely sufficient.

Filesystem plugins may choose to ignore the transaction scopes, or can delay the operations until the termination of the transaction scope.

### Extension to existing filesystem implementation

The existing filesystem C++ API can easily be extended with three methods, an opaque structure, and possibly a helper class to support transactions.

```cpp
struct TransactionToken {
```

One alternative, less intrusive API modification: instead of optionally passing transaction tokens to all FS methods, we could have a variant of GetFileSystemForFile, something like StartTransactionForFile, which returns a FileSystem that commits the transaction when it is destroyed. All operations made on that file system would then implicitly be part of the transaction until it is committed.

WDYT?
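For concreteness, a rough sketch of that alternative; `StartTransactionForFile` and `TransactionalFileSystem` are hypothetical names, the commit-on-destruction semantics is an assumption, and the `FileSystem`/`Status`/`TransactionToken` types are the ones from the RFC above:

```cpp
#include <memory>
#include <string>

// Hypothetical wrapper: every operation is forwarded with the transaction
// token attached, and the transaction is committed on destruction.
class TransactionalFileSystem {
 public:
  TransactionalFileSystem(FileSystem* base,
                          std::unique_ptr<TransactionToken> token)
      : base_(base), token_(std::move(token)) {}

  ~TransactionalFileSystem() {
    base_->EndTransaction(&token_);  // implicit commit
  }

  Status NewWritableFile(const string& fname,
                         std::unique_ptr<WritableFile>* result) {
    return base_->NewWritableFile(fname, result, &token_);
  }
  // ... the remaining FileSystem methods forward the same way ...

 private:
  FileSystem* base_;
  std::unique_ptr<TransactionToken> token_;
};

// Hypothetical Env entry point:
// Status Env::StartTransactionForFile(
//     const string& fname, std::unique_ptr<TransactionalFileSystem>* fs);
```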

samikama

comment created time in 12 days

Pull request review commenttensorflow/community

[RFC] Introducing Transactions extension to Modular Filesystems RFC

```cpp
struct TransactionToken {
  FileSystem* owner;
  void* id;
};

// C++ helper class for transaction scoping.
template <typename T>
class TokenScope {
 public:
  // Transaction name can be a file name or a directory name.
  TokenScope(T* fs_, const string& transaction_name) : fs(fs_) {
    auto status = fs->StartTransaction(transaction_name, &token);
  }
  ~TokenScope() { fs->EndTransaction(token); }
  TokenScope(const TokenScope&) = delete;
  const std::unique_ptr<TransactionToken>* GetToken() const { return &token; }

 private:
  std::unique_ptr<TransactionToken> token;
  T* fs;
};
```

For coarse granularity, adding `StartTransaction`, `EndTransaction`, and `GetTransactionTokenForFile` methods would be sufficient. However, this prevents having multiple simultaneous transactions per file system and limits flexibility. Thus we propose extending the signature of each method with a unique pointer to a `TransactionToken` structure, defaulting to `nullptr`, to minimize the impact on existing code and allow incremental migration to an implementation of transactions.

```cpp
class Filesystem {
  // Transaction token API extensions
  virtual Status GetTransactionTokenForFile(const string& file_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status StartTransaction(const string& transaction_name, std::unique_ptr<TransactionToken>* token) = 0;
  virtual Status EndTransaction(std::unique_ptr<TransactionToken>* token) = 0;

  // File creation
  virtual Status NewRandomAccessFile(const string& fname, std::unique_ptr<RandomAccessFile>* result, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status NewWritableFile(const string& fname, std::unique_ptr<WritableFile>* result, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status NewAppendableFile(const string& fname, std::unique_ptr<WritableFile>* result, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status NewReadOnlyMemoryRegionFromFile(const string& fname, std::unique_ptr<ReadOnlyMemoryRegionFile>* result, std::unique_ptr<TransactionToken>* token = nullptr) = 0;

  // Creating directories
  virtual Status CreateDir(const string& dirname, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status RecursivelyCreateDir(const string& dirname, std::unique_ptr<TransactionToken>* token = nullptr);

  // Deleting
  virtual Status DeleteFile(const string& fname, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status DeleteDir(const string& dirname, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status DeleteRecursively(const string& dirname, int64* undeleted_files, int64* undeleted_dirs, std::unique_ptr<TransactionToken>* token = nullptr);

  // Changing directory contents
  virtual Status RenameFile(const string& src, const string& target, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status CopyFile(const string& src, const string& target, std::unique_ptr<TransactionToken>* token = nullptr);

  // Filesystem information
  virtual Status FileExists(const string& fname, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual bool FilesExist(const std::vector<string>& files, std::vector<Status>* status, std::unique_ptr<TransactionToken>* token = nullptr);
  virtual Status GetChildren(const string& dir, std::vector<string>* result, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status Stat(const string& fname, FileStatistics* stat, std::unique_ptr<TransactionToken>* token = nullptr) = 0;
  virtual Status IsDirectory(const string& fname, std::unique_ptr<TransactionToken>* token = nullptr);
  virtual Status GetFileSize(const string& fname, uint64* file_size, std::unique_ptr<TransactionToken>* token = nullptr) = 0;

  // Globbing
  virtual Status GetMatchingPaths(const string& pattern, std::vector<string>* results, std::unique_ptr<TransactionToken>* token = nullptr) = 0;

  // Misc
  virtual void FlushCaches();
  virtual string TranslateName(const string& name) const;
};
```

The transaction token will be owned by the filesystem, and use of it after `EndTransaction` is an invalid operation.

File classes can be modified to keep the TransactionToken, assigned by the filesystem at construction using the given scope (or the default scope if none is given). Filesystems may ignore it if a transaction at that level doesn't make sense.

I don't think saying that filesystems may ignore transaction tokens is very useful, as this makes it impossible to write code that works correctly on filesystems both with and without transaction token support.

samikama

comment created time in 12 days

issue commenttensorflow/tensorflow

Tensorflow 2.0 Preview - tf.function error when wrapping a function that works

+Dan Moldovan mdan@google.com have you seen this before?

On Wed, May 13, 2020 at 4:50 PM bhack notifications@github.com wrote:

@alextp https://github.com/alextp Have you tested whether using @tf.function on a function that internally uses a partial also traces the function inside the partial? It seems that it is not working. See tensorflow/addons#1830 (comment) https://github.com/tensorflow/addons/pull/1830#issuecomment-628284713

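For reference, a minimal sketch of the pattern under discussion, assuming TF 2.x eager mode (`scale` and the wrapper names are made up for illustration):

```python
import functools

import tensorflow as tf

def scale(x, factor):
  return x * factor

# Wrapping a partial directly: the case reported not to trace properly.
double = tf.function(functools.partial(scale, factor=2.0))

# Calling the partial from inside a traced function, for comparison.
@tf.function
def double_explicit(x):
  return functools.partial(scale, factor=2.0)(x)

print(double(tf.constant(3.0)))
print(double_explicit(tf.constant(3.0)))
```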
botev

comment created time in 13 days

pull request commenttensorflow/tensorflow

Add gelu

That is used for the implementation selector. cc @qlzh727

WindQAQ

comment created time in 15 days

Pull request review commenttensorflow/community

Procedure and RFC template for Addons migrations

# Migrate XXXXX from TensorFlow Addons to TensorFlow Core

| Status      | Proposed (Waiting for Sponsor) |
| :---------- | :----------------------------- |
| **RFC #**   | TBD after PR                   |
| **Authors** | XXXXXXXXXX                     |
| **Sponsor** | XXXXXXXXXX                     |
| **Updated** | YYYY-MM-DD                     |
| **Sponsorship Deadline** | YYYY-MM-DD (45 days after submission) |

## Rationale for Migration
* What are the use cases for the addon?
* OSS usage, H5 index, etc.

## Historical Information
* Have there been significant issues reported to Addons that need to be addressed?
* When was it implemented in Addons?

## Implementation Details
* Link to implementation in Addons:
* Does this include custom-op kernels?
    * Are they CPU/GPU/TPU compatible?
* What is the pytest coverage of the addon?

## Changes to Implementation (If Needed)
```
Code snippet(s) showing proposed code changes
```
* Discussion on the rationale for changes

## Transition Plan
* Proposed landing place in tf-core
* Deprecation plan for Addons
    * Will we be able to alias the core implementation? (parameters must be an exact match)

I'd like to leave the core implementation the freedom to drop parameters that are not frequently used, and rename other parameters which might have better names.

seanpmorgan

comment created time in 19 days

pull request commenttensorflow/tensorflow

Fixed recompute grad issue - no memory savings

@allenlavoie the comment by @pidajay makes me think we should delete that test. WDYT?

pidajay

comment created time in 19 days

Pull request review commenttensorflow/tensorflow

Fixed recompute grad issue - no memory savings

```python
# Copyright 2020 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from tensorflow.keras import layers, optimizers


def _get_big_cnn_model(img_dim, n_channels, num_partitions,
                       blocks_per_partition):
  """Creates a test model whose activations are significantly larger than model size."""
  model = tf.keras.Sequential()
  model.add(layers.Input(shape=(img_dim, img_dim, n_channels)))
  for _ in range(num_partitions):
    for _ in range(blocks_per_partition):
      model.add(layers.Conv2D(10, 5, padding='same', activation=tf.nn.relu))
      model.add(layers.MaxPooling2D((1, 1), padding='same'))
      model.add(layers.Conv2D(40, 5, padding='same', activation=tf.nn.relu))
      model.add(layers.MaxPooling2D((1, 1), padding='same'))
      model.add(layers.Conv2D(20, 5, padding='same', activation=tf.nn.relu))
      model.add(layers.MaxPooling2D((1, 1), padding='same'))
  model.add(layers.Flatten())
  model.add(layers.Dense(32, activation=tf.nn.relu))
  model.add(layers.Dense(10))
  return model


def _get_split_cnn_model(img_dim, n_channels, num_partitions,
                         blocks_per_partition):
  """Creates a test model that is split into `num_partitions` smaller models."""
  models = [tf.keras.Sequential() for _ in range(num_partitions)]
  models[0].add(layers.Input(shape=(img_dim, img_dim, n_channels)))
  for i in range(num_partitions):
    model = models[i]
    if i > 0:
      last_shape = models[i - 1].layers[-1].output_shape
      model.add(layers.Input(shape=last_shape[1:]))
    for _ in range(blocks_per_partition):
      model.add(layers.Conv2D(10, 5, padding='same', activation=tf.nn.relu))
      model.add(layers.MaxPooling2D((1, 1), padding='same'))
      model.add(layers.Conv2D(40, 5, padding='same', activation=tf.nn.relu))
      model.add(layers.MaxPooling2D((1, 1), padding='same'))
      model.add(layers.Conv2D(20, 5, padding='same', activation=tf.nn.relu))
      model.add(layers.MaxPooling2D((1, 1), padding='same'))
  models[-1].add(layers.Flatten())
  models[-1].add(layers.Dense(32, activation=tf.nn.relu))
  models[-1].add(layers.Dense(10))
  return models


def _compute_loss(logits, labels):
  return tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                     labels=labels))


def _limit_gpu_memory():
  """Helper function to limit GPU memory for testing."""
  gpus = tf.config.experimental.list_physical_devices('GPU')
  if gpus:
    try:
      tf.config.experimental.set_virtual_device_configuration(
          gpus[0], [
              tf.config.experimental.VirtualDeviceConfiguration(
                  memory_limit=1024)
          ])
    except RuntimeError as e:
      print(e)


def _get_dummy_data(img_dim, n_channels, batch_size):
  inputs = tf.ones([batch_size, img_dim, img_dim, n_channels])
  labels = tf.ones([batch_size], dtype=tf.int64)
  return inputs, labels


def _train_no_recompute(n_steps):
  """Trains a single large model without gradient checkpointing."""
  _limit_gpu_memory()
  img_dim, n_channels, batch_size = 256, 1, 4
  x, y = _get_dummy_data(img_dim, n_channels, batch_size)
  model = _get_big_cnn_model(img_dim,
                             n_channels,
                             num_partitions=3,
                             blocks_per_partition=2)
  optimizer = optimizers.SGD()
  losses = []
  tr_vars = model.trainable_variables
  for _ in range(n_steps):
    with tf.GradientTape() as tape:
      logits = model(x)
      loss = _compute_loss(logits, y)
      losses.append(loss)
    grads = tape.gradient(loss, tr_vars)
    optimizer.apply_gradients(zip(grads, tr_vars))
    del grads
  return losses


def _train_with_recompute(n_steps):
  """Trains a single large model with gradient checkpointing using tf.recompute_grad."""
  _limit_gpu_memory()
  img_dim, n_channels, batch_size = 256, 1, 4
  x, y = _get_dummy_data(img_dim, n_channels, batch_size)
  # This model is the same model as _get_big_cnn_model but split into 3 parts.
  models = _get_split_cnn_model(img_dim,
                                n_channels,
                                num_partitions=3,
                                blocks_per_partition=2)
  model1, model2, model3 = models
  # Apply gradient checkpointing to the submodels using tf.recompute_grad.
  model1_re = tf.recompute_grad(model1)
  model2_re = tf.recompute_grad(model2)
  model3_re = tf.recompute_grad(model3)
  optimizer = optimizers.SGD()
  tr_vars = (model1.trainable_variables + model2.trainable_variables +
             model3.trainable_variables)
  losses = []
  for _ in range(n_steps):
    with tf.GradientTape() as tape:
      logits1 = model1_re(x)
      logits2 = model2_re(logits1)
      logits3 = model3_re(logits2)
      loss = _compute_loss(logits3, y)
      losses.append(loss)
      grads = tape.gradient(loss, tr_vars)
      optimizer.apply_gradients(zip(grads, tr_vars))
      del grads
  return losses


class GradientCheckpointTest(tf.test.TestCase):

  def test_raises_oom_exception(self):
    with self.assertRaises(Exception) as context:
      _train_no_recompute(1)
    self.assertTrue(
        context.exception.__class__.__name__ == 'ResourceExhaustedError')
```

Ah, I missed it. This is good then. Thanks!

pidajay

comment created time in 20 days

Pull request review commenttensorflow/tensorflow

Fixed recompute grad issue - no memory savings

```python
class GradientCheckpointTest(tf.test.TestCase):

  def test_raises_oom_exception(self):
    with self.assertRaises(Exception) as context:
      _train_no_recompute(1)
    self.assertTrue(
        context.exception.__class__.__name__ == 'ResourceExhaustedError')
```

I am scared of tests that check for actual OOMs, as they are sensitive to things like which specific device they run on.

I see two options here:

  1. Use a virtual GPU device with fixed, limited memory.
  2. (My preferred alternative) Use side effects to detect whether things are recomputed. For example, if the forward pass does variable.assign_add(1), we can count how many times it ran (see the sketch below).
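A minimal sketch of option 2, assuming eager execution (the names here are illustrative, not from the PR):

```python
import tensorflow as tf

forward_count = tf.Variable(0, trainable=False)

def forward(x):
  forward_count.assign_add(1)  # side effect: counts forward executions
  return tf.nn.relu(x * 2.0)

recomputed = tf.recompute_grad(forward)

x = tf.constant([1.0, -1.0])
with tf.GradientTape() as tape:
  tape.watch(x)
  y = recomputed(x)
_ = tape.gradient(y, x)

# If recomputation works, the forward function ran twice: once on the
# forward pass and once while computing gradients.
print(int(forward_count.numpy()))  # expected: 2
```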
pidajay

comment created time in 20 days

Pull request review commenttensorflow/community

Procedure and RFC template for Addons migrations

# Migration From TF-Addons To TensorFlow Core

### In-Progress & Previous Migrations:
https://github.com/tensorflow/addons/projects/2/

### Process
1. Create an issue in TensorFlow Addons for a candidate that you think should be migrated.
2. The SIG will evaluate the request and add it to the `Potential Candidates` section of our GitHub project.
3. If it's agreed that a migration makes sense, an RFC needs to be written to discuss the move with a larger community audience.
    * If the transition will impact tf-core and Keras, then submit the RFC to [TensorFlow Community](https://github.com/tensorflow/community)
    * Additions which only subclass Keras APIs should submit their migration proposals to [Keras Governance](https://github.com/keras-team/governance)
4. A sponsor from the TF/Keras team must agree to shepherd the transition.
   * If no sponsor is obtained after 45 days, the RFC will be rejected and the addition will remain part of Addons.
5. If a sponsor is obtained and the RFC is approved, a pull request must move the addon along with proper tests.
6. After merging, the addition will be replaced with an alias to the core function if possible. If an alias is not possible (e.g. large parameter changes), then a deprecation warning will be added and the addon will be removed from TFA after 2 releases.

### Criteria for Migration

...and someone on the TensorFlow team agrees to take over maintenance of the API

seanpmorgan

comment created time in 21 days

Pull request review commenttensorflow/community

Procedure and RFC template for Addons migrations

4. A sponsor from the TF/Keras team must agree to shepard the transition.

shepherd the transition and take over maintenance of the new API

seanpmorgan

comment created time in 21 days

pull request commenttensorflow/tensorflow

Provide NVIDIA CUDA build data in metadata and API

Oh, never mind; it is a public namespace.

On Tue, May 5, 2020 at 11:27 AM Alexandre Passos apassos@google.com wrote:

I don't think tf.sysconfig is a public namespace.

angerson

comment created time in 22 days

pull request commenttensorflow/tensorflow

Provide NVIDIA CUDA build data in metadata and API

I don't think tf.sysconfig is a public namespace.

On Tue, May 5, 2020 at 11:06 AM Jason Zaman notifications@github.com wrote:

@angerson https://github.com/angerson @alextp https://github.com/alextp @gunan https://github.com/gunan These new API calls should be under tf.sysconfig.foo, not under tf.config.foo.

The other build-time things are already under tf.sysconfig, e.g. tf.sysconfig.get_include() or tf.sysconfig.get_compile_flags().

There's also tf.test.is_built_with_cuda(); I think those should move under tf.sysconfig too, but they are already released, so they'd have to stay as aliases.

angerson

comment created time in 22 days

pull request commenttensorflow/tensorflow

Fix invalid shape issue in random.uniform

It should be safe to disable this test internally.

yongtang

comment created time in 22 days

issue commenttensorflow/tensorflow

C API Release

@gunan do you know what happened to libtensorflow_framework.so?

eaplatanios

comment created time in 22 days

pull request commenttensorflow/tensorflow

Fixed recompute grad issue - no memory savings

Thanks! Please keep me updated on your progress, and post here if you get stuck.

On Mon, May 4, 2020 at 2:39 PM pidajay notifications@github.com wrote:

@alextp https://github.com/alextp. Thanks for your comments, and I appreciate your patience, especially since this is my first PR to TF core :). I understand an API change is a hassle. The hatchet was definitely required in TF 2.1, where there was no variable_watcher and tape.watch stored all the intermediate outputs while tape.stop_recording would not give me the trainable variables. It is quite possible this is no longer the case with variable_watcher. I will try to explore ways to avoid this API change and build a more solid case. Thanks!

pidajay

comment created time in 23 days

pull request commenttensorflow/community

RFC: Sparse Domain Isolation for Supporting large-scale Sparse Weights Training.

+1 to Yuefeng's suggestion.

Can this proposal be enhanced with a section discussing such an extension?

On Mon, May 4, 2020 at 12:14 PM Yuefeng Zhou notifications@github.com wrote:

I think TensorFlow can provide a way to extend optimizers so that you can extend existing optimizers to handle your sparse weights.

rhdong

comment created time in 23 days

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

# Easily Customizable `Optimizer.minimize`

| Status        | Proposed                                                |
| :------------ | :------------------------------------------------------ |
| **RFC #**     | [234](https://github.com/tensorflow/community/pull/234) |
| **Author(s)** | [omalleyt12@](https://github.com/omalleyt12)            |
| **Sponsor**   | apassos@, fchollet@, karmel@                            |
| **Updated**   | 2020-04-20                                              |

## Objective

Create an `Optimizer` API that gives `Optimizer` subclasses full control of gradient updates. The API should ensure `Optimizer`s can be accessed via a unified API and will not leak abstractions. Training loops should not be required to know the internal details of how the `Optimizer` chooses to:

* Scale losses and gradients
* Aggregate gradients
* Clip gradients
* etc.

We also need to ensure we maintain endpoints with maximum flexibility for those users who do want control over these items.

This API will enable users to write training loops that are interoperable with a wide range of `Optimizer`s.

Specific use cases considered:

* Gradient clipping
* Mixed precision
* `Horovod`

## Background

During backpropagation, there are 6 possible actions that can be taken when starting from a loss Tensor and ending with a Variable update:

(1) Transform the loss

(2) Calculate the gradients

(3) Transform the unaggregated (per-device) gradients

(4) Aggregate the gradients (across devices)

(5) Transform the aggregated gradients

(6) Apply a variable update based on the gradients

We currently have three Optimizer endpoints that start at different points in this process:

* `Optimizer.minimize` - handles 1-6
* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=True)` - handles 4-6
* `Optimizer.apply_gradients(..., experimental_aggregate_gradients=False)` - handles 6

However, there is no easy way for Optimizer subclasses to support custom logic in these steps. This proposal suggests a refactoring of the Optimizer class to achieve these goals.

## Motivation

This section discusses the experience of supporting mixed precision and Horovod in Keras's built-in training logic (hereafter called `Model.fit`).

Keras now allows users to write custom training logic for their `Model`s by overriding `Model.train_step`: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L538). The default implementation of this method is 8 lines long and fully supports all types of `Model`s, `loss`es, `metric`s, etc. that Keras supports. It attempts to serve as a reference that users can copy/paste to start writing their own training logic.

The only remaining pain point is the call to `_minimize` here: [code](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training.py#L1873). This logic is necessary because details of whether an `Optimizer` needs to transform the loss, clip the gradients, perform custom aggregation, etc. have leaked into the main training loop code.

Despite the complexity of `_minimize`, it covers only a small subset of possible optimization logic. Keras continues to receive valid requests to support more custom optimization logic (including adding hooks for different aggregation methods, different methods of loss reduction, etc.). To continue expanding support for these items, Keras needs to rely on a unified API that keeps `Optimizer` implementation details from leaking into the main training loop code.

The proposal below shows how this can be accomplished, and the examples section shows how it can be applied to 3 use cases: gradient clipping, mixed precision, and `Horovod`.

### Custom training loops

The logic above also applies to custom training loops. The design should allow custom training loops to be written so that they work with any `Optimizer`.

## User Benefit

This design will allow users to write full-featured training loops that work for all `Optimizer`s, and to easily perform custom gradient clipping and other transformations.

## Design Proposal

`Optimizer` class:

```python
class Optimizer(object):
  def __init__(self,
               transform_gradients=None,
               aggregate_gradients=all_reduce_sum):
    self.aggregate_gradients_fn = aggregate_gradients
    self.transform_gradients_fns = transform_gradients

  def _transform_loss(self, loss):
    # Can be overridden in subclasses
    return loss

  def _get_gradients(self, loss, variables, tape):
    # Can be overridden to use jacobian, etc.
    return tape.gradient(loss, variables)

  def _transform_unaggregated_gradients(self, grads_and_vars):
    # Can be overridden in subclasses
    return grads_and_vars

  def _aggregate_gradients(self, grads_and_vars):
    # Can still be overridden in subclasses if needed
    if self.aggregate_gradients_fn:
      grads_and_vars = self.aggregate_gradients_fn(grads_and_vars)
    return grads_and_vars

  def _transform_gradients(self, grads_and_vars):
    # Can still be overridden in subclasses if needed
    if self.transform_gradients_fns:
      for fn in self.transform_gradients_fns:
        grads_and_vars = fn(grads_and_vars)
    return grads_and_vars

  def _apply_updates(self, distribution, grads_and_vars, ...):
    # Calls _resource_apply_{dense | sparse}
    # Variable updating math is still in _resource_apply_{dense | sparse}

  def minimize(self, loss, variables, tape=None):
    grads_and_vars = self.compute_gradients(loss, variables, tape)
    self.apply_gradients(grads_and_vars)

  def compute_gradients(
      self,
      loss,
      variables,
      tape=None,
      all_reduce_sum_gradients=False):
```

Backwards compatibility, I think.

omalleyt12

comment created time in 23 days

Pull request review commenttensorflow/community

RFC: Easily Customizable Optimizer.minimize

```python
class Optimizer(object):
  ...

  def _transform_loss(self, loss):
    # Can be overridden in subclasses
```

That's why I prefer that these things are constructor arguments instead of methods to be overridden.
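As an illustration of the constructor-argument style, a sketch of per-step gradient clipping under the proposed signature (the `Optimizer` constructor here is the one proposed in this RFC, not the current TF API):

```python
import tensorflow as tf

def clip_by_global_norm(grads_and_vars):
  grads, variables = zip(*grads_and_vars)
  clipped, _ = tf.clip_by_global_norm(list(grads), clip_norm=1.0)
  return list(zip(clipped, variables))

# Clipping is configured once at construction time; no subclass and no
# overridden method is needed.
opt = Optimizer(transform_gradients=[clip_by_global_norm])
```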

omalleyt12

comment created time in 23 days

pull request commenttensorflow/community

RFC: Sparse Domain Isolation for Supporting large-scale Sparse Weights Training.

The change to the existing SparseApply* kernels, which removes Ref(T) from the signature, is backwards-incompatible and can't be done.

Adding new kernels for the hash apply is fine, though.

I do wonder if we need the Optimizer method _apply_dense_hash or whether we can use a separate optimizer-like class which knows about the hash application. This has the advantage that it naturally covers the use cases where people want different optimizers for the really sparse embedding layers (which I think is relatively common).
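To sketch what such an optimizer-like class might look like (everything below is hypothetical; the table is assumed to be a `tf.lookup`-style mutable hashtable with `lookup`/`insert`):

```python
class HashEmbeddingSGD(object):
  """Hypothetical optimizer-like class for hash-backed embeddings only.

  Dense variables keep using a regular optimizer; this class only knows
  how to apply SGD updates to values stored in a mutable hashtable.
  """

  def __init__(self, learning_rate=0.01):
    self.learning_rate = learning_rate

  def apply_gradients(self, grads_keys_and_tables):
    for grad, keys, table in grads_keys_and_tables:
      current = table.lookup(keys)  # gather the current embedding rows
      table.insert(keys, current - self.learning_rate * grad)  # scatter back
```

Users could then pair a stock optimizer (say, Adam) for the dense variables with a class like this for the embedding tables.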

On Mon, May 4, 2020 at 10:17 AM rhdong notifications@github.com wrote:

That's a very interesting proposal.

From a high level view (and I'm probably wrong) it looks like it proposes a new type of variable and a new type of optimizer which can update that variable. Given that this is the case I think we can implement this in addons or some other SIG package as long as there are APIs in core TF to ensure that this variable can declare itself checkpointable, be tracked by something like tf.Module / keras.Model (so you can do model.trainable_sparse_variables), and maybe be automatically watched via the gradient tape.

Can you expand the document to clarify the details of these changes to existing parts of TF as opposed to most of the content which is on the new types?

Thanks!

Thanks Alex. In fact, my initial idea was to encapsulate some kind of ResourceVariable-backed hashtable, since as we know TF is not good at training anything that is not a tf.Variable. I reuse lookup.MutableHashTable because I don't want to write a new hash library in TF; in particular, lookup.XX is checkpointable and deployable on tf.distribute.Server. Here is a comparison based on v1.15.2 showing the range of core code affected by the RFC:

https://github.com/tensorflow/tensorflow/compare/v1.15.2...rhdong:rfc?expand=1

The main changes:

  1. Supporting a random initializer on lookup.MutableHashTable.Find.
  2. Adapting four stateful optimizers (Adagrad, Adam, FTRL, Momentum). (Maybe cancelled in the new schema.)

Thanks!

rhdong

comment created time in 23 days

pull request commenttensorflow/community

Add C++ style guide.

@martinwicke let's merge this PR

aaudiber

comment created time in 23 days

issue commenttensorflow/tensorflow

Rank-k cholesky up/downdates

FYI @rmlarsen

stratisMarkou

comment created time in 23 days

issue commenttensorflow/tensorflow

Error while trying to use tf.broadcast_weights

I think we need to remove the reference to tf.broadcast_weights from the documentation.

MeghnaNatraj

comment created time in 23 days

Pull request review commenttensorflow/tensorflow

Fix BatchNormalization issue with virtual_batch_size when shape has None

```diff
 def call(self, inputs, training=None):
   if self.virtual_batch_size is not None:
     # Virtual batches (aka ghost batches) can be simulated by reshaping the
     # Tensor and reusing the existing batch norm implementation
-    original_shape = [-1] + inputs.shape.as_list()[1:]
+    original_shape = [
+        d if d is not None else -1 for d in inputs.shape.as_list()]
```

This is not super safe as it can generate more than one -1.

Isn't it better to use tf.shape instead?
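A sketch of the `tf.shape`-based alternative (illustrative only; the helper name is made up, and the variable names follow the snippet above):

```python
import tensorflow as tf

def _dynamic_original_shape(inputs):
  # The dynamic shape resolves unknown dimensions at runtime, so the
  # reshape target contains exactly one -1 (for the batch dimension).
  return tf.concat([[-1], tf.shape(inputs)[1:]], axis=0)
```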

yongtang

comment created time in 23 days

pull request commenttensorflow/tensorflow

Fixed recompute grad issue - no memory savings

You're proposing a change to TF's public API, and for us to review it you should write documentation specifying how this change is to be used, along with a unit test that exercises it.

I will say though that I do not like the API change, and I also don't understand why replacing the variable watcher with an explicit list of variables solves the problem. Can't we fix the problem by fixing the variable watcher without any API changes?

pidajay

comment created time in 23 days

issue closedtensorflow/tensorflow

Creating multiple stacked variable within tf.map_fn


System information

  • TensorFlow version (you are using): 1.10
  • Are you willing to contribute it (Yes/No): No

Describe the feature and the current behavior/state. Currently, I am trying to create multiple variables within tf.map_fn, but it just creates a single variable copied multiple times.

Will this change the current API? How? Maybe.

Who will benefit from this feature? Projects related to multi-object tracking.

Any Other info.

closed time in 23 days

ammarabbasali

issue commenttensorflow/tensorflow

Creating multiple stacked variable within tf.map_fn

Please ask this type of question on StackOverflow as it's not a TF bug.

ammarabbasali

comment created time in 23 days

issue closedtensorflow/tensorflow

No Upsampling2D layer in Tensorflow C++

I am Rahul, an M.Sc. in Digital Engineering student at the University of Magdeburg, Germany. I am currently pursuing my master's thesis, which involves developing a U-Net (a neural network designed for biomedical image segmentation; you can read more about it here: https://arxiv.org/pdf/1505.04597.pdf) and deploying/loading it in a C++ application. The U-Net has a couple of layers, namely Concatenate and UpSampling2D, which are not common in image processing.

To deploy it, I am guessing I need all the layers used in the U-Net to be available in TensorFlow C++ here: https://www.tensorflow.org/api_docs/cc/group/nn-ops

I can't find Upsampling2D there.

Am I looking at the wrong location?

I would like to know whether loading the protobuf and predicting with it is independent of the structure and layers used in the model I developed, or whether it only supports standard layers such as flatten, convolution, maxpool, etc.

closed time in 23 days

mortalrahu

issue commenttensorflow/tensorflow

No Upsampling2D layer in Tensorflow C++

Upsampling2d is just tf.image.resize; it's implemented using a bunch of different ops depending on what type of resizing you want to do (see the code in https://github.com/tensorflow/tensorflow/blob/fadfcdf27ba3973525b7ef24f5d2cb10a13231bd/tensorflow/python/ops/image_ops_impl.py#L1316 ); those individual ops are all available through the TF C++ API.

We do recommend though that you use python to build your model and C++ to execute it, as taking gradients and doing optimization in the TF C++ API directly doesn't work very well right now.
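A small sketch of the equivalence, assuming TF 2.x (the exact underlying op depends on the interpolation method):

```python
import tensorflow as tf

x = tf.random.normal([1, 8, 8, 3])  # NHWC input

up = tf.keras.layers.UpSampling2D(size=2)(x)  # default interpolation: nearest
resized = tf.image.resize(x, [16, 16], method='nearest')

# The two results should match exactly for nearest-neighbor upsampling.
print(float(tf.reduce_max(tf.abs(up - resized))))  # expected: 0.0
```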

mortalrahu

comment created time in 23 days

issue closedtensorflow/tensorflow

Why there are serveral topkv2 ops


System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):ubuntu18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary):binary
  • TensorFlow version (use command below):tensorflow2.1
  • Python version:3.6.8
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version:10.1
  • GPU model and memory:

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with:

  1. TF 1.0: python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
  2. TF 2.0: python -c "import tensorflow as tf; print(tf.version.GIT_VERSION, tf.version.VERSION)"

Describe the current behavior
Screenshot from 2020-04-17 18-27-35

Describe the expected behavior
Only one topkV2

Standalone code to reproduce the issue

https://github.com/google/automl/blob/028789605f1f140b00c045f77be2c4e13638d17c/efficientdet/det_model_fn.py#L313

Other info / logs

trace.json

closed time in 23 days

fsx950223

issue commenttensorflow/tensorflow

Why there are serveral topkv2 ops

Why do you expect to see only one operation in the timeline? Estimator will run the graph in a loop and each execution will show up on the timeline.

fsx950223

comment created time in 23 days

issue commenttensorflow/tensorflow

[TF 2.0] tf.hessians

That creates a new tape in every loop iteration, and so only computes the Hessian with respect to the last iteration. Instead, put the whole loop inside a single tf.GradientTape.
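Concretely, a sketch of "the whole loop inside a single tape" for the example quoted below (assuming eager execution; not tested against the original code):

```python
import tensorflow as tf

v = tf.Variable([1.0, 1.0])
ws = tf.constant([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])

# The outer tape records the whole accumulation, so its jacobian covers
# every batch rather than only the last one.
with tf.GradientTape() as outer:
  grad = tf.zeros_like(v)
  for w in ws:  # serialized forward passes
    with tf.GradientTape() as inner:
      y = tf.sin(tf.tensordot(w, v, axes=1))
    grad += inner.gradient(y, v)

hessian = outer.jacobian(grad, v)
```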

On Mon, May 4, 2020 at 3:46 AM Korbinian Kottmann notifications@github.com wrote:

If my forward pass is too big to be done in one pass but needs to be serialized in batches (as is the case for big training sets), can I put the GradientTape inside a (for) loop like this?

```python
import tensorflow as tf

v = tf.Variable([1, 1], dtype="float32")
ws = tf.constant([[0, 0], [1, 1], [2, 2]], dtype="float32")
grad = tf.cast(0, dtype="float32")

for w in ws:
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(v)
        y = tf.sin(tf.tensordot(ws, v, axes=1))
        grad += tape.gradient(y, v)

hessian = tape.jacobian(grad, v)
```

Or is the previous tape overwritten in each iteration of the loop? Reversing the for and with statements makes it very slow.

mukeshmithrakumar

comment created time in 23 days

pull request commenttensorflow/tensorflow

[Intel MKL] Automatic BFloat16 converter

I definitely think a whitelist is safer here. Can we just create the whitelist by listing all the ops for which we have kernels today?

On Fri, May 1, 2020 at 6:09 PM Reed notifications@github.com wrote:

I think using the IsMklOp API is a more extensible way of adding new MKL ops to the auto_mixed_precision pass than relying on white/black lists.

I implemented a prototype where we add bfloat16-supported ops to the graylist here https://github.com/reedwm/tensorflow/commit/cb6f4728f0338c68dc9628adc7e13c07742ce2a0. The change will add an op to the graylist if either IsMklOp returns true or if the op kernel supports bfloat16. I implemented this by copying code from this PR. The change can easily be modified to instead add the ops to the whitelist, which would unconditionally convert them to bfloat16 unless they are on the gray/black/clearlist.

However, I am nervous about this change as it seems dangerous. Someone in the future may add a bfloat16 kernel to an op, which would then change the grappler pass to change that op to bfloat16, potentially breaking models. For example, suppose L2Loss originally only supported float32, so we wouldn't bother adding it to the blacklist. If someone then added bfloat16 support, it would cause the grappler pass to convert every L2Loss to bfloat16, breaking many models. If we instead stick with the original plan of hardcoded white/gray/black/clearlists, this issue does not occur.

Also, there may be ops we haven't considered which should be on the blacklist/graylist/clearlist, so defaulting them to the whitelist is dangerous. For example, IdentityN is not on any list, but putting it on the whitelist is probably a mistake, as it should probably be in float32 if it follows an L2Loss op. I can add IdentityN to the graylist manually, but there will likely be other ops that should be in float32 that neither of us have considered. With the original plan of hardcoded lists, an op will be float32 if it's not on any list (which is basically equivalent to being on the blacklist).

Adding bfloat16-supported ops to the graylist instead of the whitelist is a lot better but still has these problems to an extent.

What are your thoughts? I think we should stick with the plan of hardcoded lists.

For the BatchNorm issue - I am looking into it. I've recently seen this issue internally as well.

Thanks for looking into this!

nhasabni

comment created time in 23 days

issue closedtensorflow/tensorflow

Passing tf.keras.Model as tf.function argument does not create concrete function

System information

  • Have I written custom code: Yes
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 2.1.0
  • Python version: 3.7.5

Describe the current behavior

Passing a tf.keras.Model or tf.keras.Optimizer as argument into tf.function does not create a concrete function. I expect that it would, since function tracing works as it should if the model/optimizer is a global variable.

Standalone code to reproduce the issue

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

    def call(self, inputs):
        return 2 * inputs

@tf.function
def step_model(model, inputs):
    return model(inputs)

@tf.function
def step(inputs):
    return model(inputs)

inputs = tf.convert_to_tensor(1, dtype=tf.float32)
model = MyModel()
# This works as expected
print(f"step() = {step(inputs)}") # 2.0
print(f"step() concrete functions: {step._list_all_concrete_functions_for_serialization()}") # [<tensorflow.python.eager.function.ConcreteFunction object at 0x13a2c0510>]
# This does not, no concrete function is saved
print(f"step_model() = {step_model(model, inputs)}") # 2.0
print(f"step_model() concrete functions: {step_model._list_all_concrete_functions_for_serialization()}") # []

Output:

step() = 2.0
step() concrete functions: [<tensorflow.python.eager.function.ConcreteFunction object at 0x133788410>]
step_model() = 2.0
step_model() concrete functions: []

It appears that passing a tf.keras.Model as an argument into tf.function is not supported, as tracing fails. In a different use case, this error appears:

INFO:tensorflow:Unsupported signature for serialization: ((<tensorflow.python.framework.func_graph.UnknownArgument object at 0x13a2c0510>))

My use case requires limiting usage of global variables, since there are several models running simultaneously and they need to be garbage collected efficiently. How can I pass a model as a function argument into a tf.function?

closed time in a month

jarednielsen

issue commenttensorflow/tensorflow

Passing tf.keras.Model as tf.function argument does not create concrete function

This is a known issue. The problem is that we cannot introspect inside a model and generate a single TF graph which is agnostic to the details of the model (which is what it would mean to serialize a concrete function that can take the model as an argument), mostly because we cannot represent function pointers in a TF graph right now.

Instead I recommend you do something like

def get_model_fn(model):
  @tf.function
  def fn(data):
    return model(data)
  return fn

and then call get_model_fn on all the models you want, to ensure the model gets properly captured.
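For example (model_a/model_b are illustrative names, using MyModel and inputs from the snippet above):

model_a = MyModel()
model_b = MyModel()
step_a = get_model_fn(model_a)  # each tf.function captures its own model
step_b = get_model_fn(model_b)
print(step_a(inputs), step_b(inputs))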

jarednielsen

comment created time in a month

issue closedtensorflow/tensorflow

tf.one_hot should support strings

System information

  • TensorFlow version: 2.2.0rc3

Describe the feature and the current behavior/state.

I'm sure this has been asked before, but I can't find any issue related. tf.one_hot should support string tensors for encoding string labels. Currently, passing a string tensor gives:

>>> tf.one_hot(tf.constant(["a", "b", "c"]), depth=3)
---------------------------------------------------------------------------
NotFoundError                             Traceback (most recent call last)
<ipython-input-19-f9e0ea7fe045> in <module>
----> 1 tf.one_hot(tf.constant(["a", "b", "c"]), depth=3)

~/anaconda3/envs/ml/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py in wrapper(*args, **kwargs)
    178     """Call target, and fall back on dispatchers if there is a TypeError."""
    179     try:
--> 180       return target(*args, **kwargs)
    181     except (TypeError, ValueError):
    182       # Note: convert_to_eager_tensor currently raises a ValueError, not a

~/anaconda3/envs/ml/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py in one_hot(indices, depth, on_value, off_value, axis, dtype, name)
   4008 
   4009     return gen_array_ops.one_hot(indices, depth, on_value, off_value, axis,
-> 4010                                  name)
   4011 
   4012 

~/anaconda3/envs/ml/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py in one_hot(indices, depth, on_value, off_value, axis, name)
   6191         pass  # Add nodes to the TensorFlow graph.
   6192     except _core._NotOkStatusException as e:
-> 6193       _ops.raise_from_not_ok_status(e, name)
   6194   # Add nodes to the TensorFlow graph.
   6195   if axis is None:

~/anaconda3/envs/ml/lib/python3.6/site-packages/tensorflow/python/framework/ops.py in raise_from_not_ok_status(e, name)
   6651   message = e.message + (" name: " + name if name is not None else "")
   6652   # pylint: disable=protected-access
-> 6653   six.raise_from(core._status_to_exception(e.code, message), None)
   6654   # pylint: enable=protected-access
   6655 

~/anaconda3/envs/ml/lib/python3.6/site-packages/six.py in raise_from(value, from_value)

NotFoundError: Could not find valid device for node.

This leads to clunky practices of encoding labels outside of TF before one_hot can be used, which is problematic in graph mode:

from sklearn.preprocessing import LabelEncoder

classes = tf.constant(["a", "b", "c"]).numpy()  # problematic with graph mode
le = LabelEncoder().fit(classes)

>>> tf.one_hot(le.transform(classes), depth=3)
<tf.Tensor: shape=(3, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]], dtype=float32)>

Can tf.one_hot be adapted to support string input with a vocabulary given a priori?

Will this change the current api? How? It will add type support, but shouldn't change the API.

Who will benefit with this feature? ML developers.

closed time in a month

tgsmith61591

issue commenttensorflow/tensorflow

tf.one_hot should support strings

Can't this already be done as a composition of a vocabulary (maybe implemented using tf.lookup) and one_hot?
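A hedged sketch of that composition, assuming a vocabulary known a priori (table construction details are illustrative):

import tensorflow as tf

vocab = ["a", "b", "c"]
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=tf.constant(vocab),
        values=tf.constant([0, 1, 2], dtype=tf.int64)),
    default_value=-1)

labels = tf.constant(["a", "b", "c"])
print(tf.one_hot(table.lookup(labels), depth=len(vocab)))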

tgsmith61591

comment created time in a month

issue commenttensorflow/tensorflow

Increase consistency of TensorShape

(tf.shape(r) is always a tensor, and then your code would work)
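A quick sketch against the snippet from the issue:

import tensorflow as tf

r = tf.random.uniform([2, 3, 4], 0, 4, dtype=tf.int32)
print(tf.where(tf.shape(r) == 3, 0, 1))  # tf.Tensor([1 0 1], ...), shape (3,)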

cmarlin

comment created time in a month

issue closedtensorflow/tensorflow

Increase consistency of TensorShape


System information

  • TensorFlow version : 2.2.0rc3 python API
  • Are you willing to contribute it : No (not enough experience of tensorflow architecture)

Describe the feature and the current behavior/state.
Using tf.where I ran into unexpected behaviour when testing tensor shapes: behaviour with TensorShape differs from behaviour with Tensor.

r = tf.random.uniform([2,3,4], 0, 4, dtype=tf.int32)
print(tf.where(r == 0, 0, 1).shape)        # (2, 3, 4) => ok
print(tf.where(r.shape == 3, 0, 1).shape)  # () => expecting (3,)

Will this change the current api? How?
Sure, but consistency will increase. TensorShape could be considered a kind of tensor itself.

Who will benefit with this feature?

  1. community: code will be clearer to read, the concept is the same for tensorshape & tensor
  2. optimisation?: TensorShape could be processed by the tensor graph instead of Python object evaluation

closed time in a month

cmarlin

issue commenttensorflow/tensorflow

Increase consistency of TensorShape

r.shape is not a tensor, it's a Python object, so when you do r.shape == 3 you get a Python boolean back, which in turn gets turned into a scalar by tf.where.

cmarlin

comment created time in a month

pull request commenttensorflow/community

RFC: Sparse Domain Isolation for Supporting large-scale Sparse Weights Training.

It requires changes to core that we should discuss now. From my point of view the most important feature tf core can offer here is allowing experimentation and development of this type of problem (for which there is very high demand at least in industry) to happen without needing to involve tf core.

Separately from that, I think the design of the actual components here has many interesting parts, and a version of these components fairly close to what is proposed here should be in core. But I think it's more important now that we make core properly extensible than that we debate the details of this component.

On Thu, Apr 30, 2020 at 10:56 AM Bairen Yi notifications@github.com wrote:

@byronyi https://github.com/byronyi If we are going to contribute to addon first, do we need a RFC here?

I guess the design was originally targeted to TF core.

As @alextp https://github.com/alextp said, if part of it still requires changes to TF core, then we still need a (probably smaller) RFC here.

rhdong

comment created time in a month

Pull request review commenttensorflow/tensorflow

Fixed recompute grad issue - no memory savings

def internal_grad_fn(unused_op, *result_grads):  # pylint: disable=unused-variable

def _eager_mode_decorator(f, args, kwargs):
  """Implement custom gradient decorator for eager mode."""
-  with tape_lib.VariableWatcher() as variable_watcher:
-    result, grad_fn = f(*args, **kwargs)
+
+  trainable_vars = []
+  if 'trainable_variables' in kwargs:

This is an implicit API change where you now ask users to explicitly pass trainable_variables to the decorated function.

Can you also edit the documentation? Can you add tests that show behavior which is broken without this explicit hatchet? Can you clarify what's the plan for code that creates variables when it's called?

pidajay

comment created time in a month

issue commenttensorflow/tensorflow

There are no control inputs between 'Assign' and 'read' nodes

No, not really. Control dependencies only apply to the scope of a single session.run, and if all reads depended on the initialization you'd reinitialize the variable on every call to session.run, not just the first one, so you'd never be able to update any variable.

On Wed, Apr 29, 2020 at 10:50 PM Lianshui Zhao notifications@github.com wrote:

@alextp https://github.com/alextp Thanks. But aren't they supposed to be ordered? Variables should be initialized by Assign before being read. Right?

zhao1157

comment created time in a month

pull request commenttensorflow/tensorflow

Add gelu

A PR with python ops SGTM!

On Wed, Apr 29, 2020 at 12:38 PM Tzu-Wei Sung notifications@github.com wrote:

I am also willing to contribute C++ ops along with XLA kernel if there are some complaints about performance/speed, but for now, maybe python ops is the final decision on both end of addons and core. Please correct me if I misread anything. Thanks again for everyone's suggestion.

WindQAQ

comment created time in a month

issue closedtensorflow/tensorflow

AttributeError: 'Tensor' object has no attribute '_in_graph_mode'

I am having an error: 'Tensor' object has no attribute '_in_graph_mode'. I've debugged the code, and I think it's in this GradientTape function, but I don't know why. If anyone knows, please help me! :)

System information

  • TensorFlow version: 2.0 - '2.2.0-dev20200407'
  • OS Platform and Distribution: Linux Mint
  • Python version: Python 3.7.4

Describe the current behavior I am trying to minimize a function using opt = tf.keras.optimizers.Adam() and I am getting a TypeError when I apply opt.apply_gradients.

Standalone code to reproduce the issue


def explain(
      self,
      validation_data,
      model,
      class_index,
      layer_name=None,
      colormap=cv2.COLORMAP_VIRIDIS,
      image_weight=0.7,
      _grid=True
  ):


# Returns: numpy.ndarray: Grid of all the inverted image or 4D array (batch_size, height, width, channels)
     
      tf.executing_eagerly()
      images, _ = validation_data

      if layer_name is None:
          layer_name = self.infer_target_layer(model)
      
      inverted_image = InvertedImage.get_optimize_image(
          images, model, class_index, layer_name
      )

      if _grid:
          return grid_display(inverted_image)
      else:
          return inverted_image

@staticmethod
def infer_target_layer(model):
 
     # Returns: str: Name of the target layer

      for layer in reversed(model.layers):
          # Select closest 4D layer to the end of the network.
          if len(layer.output_shape) == 4 and layer.name.count('conv') > 0:
              return layer.name

      raise ValueError(
          "Model does not seem to contain 4D layer. Inverted image cannot be applied."
      )

@tf.function
def get_optimize_image(images, model, class_index, layer_name):
 
      grad_model = tf.keras.models.Model(
          [model.inputs], [model.get_layer(layer_name).output]
      )

      opt = tf.keras.optimizers.SGD(learning_rate=1e-4, momentum=0.9)
      dtype = model.get_layer(layer_name).output.dtype
      tensor_image = tf.convert_to_tensor(images)

      opt_img = tf.Variable(1e-1 * tf.random.normal((tensor_image.shape[0], tensor_image.shape[1], tensor_image.shape[2], tensor_image.shape[3])), trainable=True)

      steps = 50
      for i in range(steps):
           with tf.GradientTape() as tape:
              
              inverted_feature = tf.cast(opt_img, dtype)
              content_feature = tf.cast(images, dtype)
                  
              conv_inverted_outputs = grad_model(inverted_feature)
              conv_content_outputs = grad_model(content_feature)
          
              loss = InvertedImage.get_loss(conv_content_outputs, conv_inverted_outputs, content_feature, inverted_feature)
              #print("Initial loss: {:.3f}".format(loss))

          grad = tape.gradient(loss, [conv_inverted_outputs, conv_content_outputs])
          print(grad)
          processed_grads = [g for g in grad]
          opt.apply_gradients(zip(processed_grads, [conv_inverted_outputs, conv_content_outputs]))

      return opt_img

Loss function

def get_loss(conv_content_outputs, conv_inverted_outputs, content_feature, inverted_feature):
        euclidian = tf.norm(conv_content_outputs - conv_inverted_outputs, ord='euclidean') / tf.norm(conv_content_outputs, ord='euclidean')
        reg_alpha = 1e-7 * tf.math.reduce_sum(tf.norm(inverted_feature, ord=6))
        total_variation = 1e-8 * tf.math.reduce_sum(tf.image.total_variation(content_feature+inverted_feature))

        return euclidian + reg_alpha + total_variation 

Traceback

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/helena/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/__main__.py", line 45, in <module>
    cli.main()
  File "/home/helena/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/../ptvsd/server/cli.py", line 361, in main
    run()
  File "/home/helena/.vscode/extensions/ms-python.python-2020.2.64397/pythonFiles/lib/python/new_ptvsd/wheels/ptvsd/../ptvsd/server/cli.py", line 203, in run_file
    runpy.run_path(options.target, run_name="__main__")
  File "/usr/local/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/helena/Documents/LAR_Celesc/lar-computer-vision/objdet-api/test_inverted_image.py", line 20, in <module>
    data, model, class_index=tabby_cat_class_index, layer_name="block5_conv3"
  File "/home/helena/Documents/LAR_Celesc/lar-computer-vision/objdet-api/tf_explain/core/inverted_image.py", line 54, in explain
    images, model, class_index, layer_name
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
    *args, **kwds))
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
AttributeError: in converted code:
    /home/helena/Documents/LAR_Celesc/lar-computer-vision/objdet-api/tf_explain/core/inverted_image.py:125 get_optimize_image  *
        opt.apply_gradients(grads_and_vars)
    /home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:434 apply_gradients
        self._create_slots(var_list)
    /home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/gradient_descent.py:100 _create_slots
        self.add_slot(var, "momentum")
    /home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:574 add_slot
        var_key = _var_key(var)
    /home/helena/Documents/LAR_Celesc/larenv/lib/python3.7/site-packages/tensorflow_core/python/keras/optimizer_v2/optimizer_v2.py:1065 _var_key
        if var._in_graph_mode:

    AttributeError: 'Tensor' object has no attribute '_in_graph_mode'

closed time in a month

helenabdr

issue commenttensorflow/tensorflow

AttributeError: 'Tensor' object has no attribute '_in_graph_mode'

This is a well-documented limitation where you're not allowed to unconditionally create new keras models (including new variables) inside a tf.function.
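A minimal sketch of the usual workaround, creating state once outside the function (names are illustrative):

import tensorflow as tf

v = tf.Variable(1.0)  # created once, eagerly, outside the tf.function

@tf.function
def f(x):
    return v * x  # the function only reads the captured variable

print(f(tf.constant(3.0)))  # 3.0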

helenabdr

comment created time in a month

pull request commenttensorflow/addons

Added gaussian_blur_op

Ah, sorry, I misread the thread.

@rmlarsen do you know who understands our depthwise conv kernels enough to answer this?

ghosalsattam

comment created time in a month

issue commenttensorflow/tensorflow

Tf.reshape: allow different reshape order (e.g. column-wise)

Once you decide what columns you want to reorder you can reorder those columns with transpose and then reshape to get the shape you want.

If you give names to your original and intended axes the sequence of reshapes and transposes will become clear.

For example if you start with an image as n c h w i and you want to get n iw c h you can do a reshape to coalesce i and w and then transpose them into place. If you want to go instead to n ih c w you can transpose to put i and h together, reshape, and then transpose again.
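A hedged sketch of the patch-concatenation case, going from (n, i, h, w, c) to (n, h, i*w, c); the axis labels are illustrative:

import tensorflow as tf

x = tf.reshape(tf.range(3 * 2 * 4 * 2), (3, 2, 4, 2, 1))  # (n, i, h, w, c)
x_t = tf.transpose(x, (0, 2, 1, 3, 4))                    # (n, h, i, w, c)
s = tf.shape(x_t)
out = tf.reshape(x_t, (s[0], s[1], s[2] * s[3], s[4]))    # (n, h, i*w, c)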

On Wed, Apr 29, 2020 at 10:14 AM Nikolai10 notifications@github.com wrote:

@alextp https://github.com/alextp thanks, however, this does not solve the issue - e.g. consider three (seq_len=3) image patches:

simulate batch of image patches

x = np.arange(3 * 3 * 8).reshape(3, 3, 4, 2, 1)

reshape/ concat image patches along axis 3

x_new = tf.reshape(x, (3, 4, 6, 1))

Visualize first batch sample (current behaviour):

tf.Tensor(
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]
 [12 13 14 15 16 17]
 [18 19 20 21 22 23]], shape=(4, 6), dtype=int64)

using tf.transpose either changes the output dimension (case 1) or rearranges the pixel values within the image patch:

Case 1:

x_new_t = tf.transpose(x_new)
print(x_new_t[0, :, :2, 0])
# array([[ 0,  6],
#        [ 1,  7],
#        [ 2,  8],
#        [ 3,  9],
#        [ 4, 10],
#        [ 5, 11]])

print(x_new_t.shape)
# TensorShape([1, 6, 4, 3])

Case 2:

x_new_t2 = tf.transpose(x_new, (0, 1, 2, 3))
print(x_new_t2.shape)
# (3, 4, 6, 1)

print(x_new_t2[0, :, :2, 0])
# tf.Tensor(
# [[ 0  1]
#  [ 6  7]
#  [12 13]
#  [18 19]], shape=(4, 2), dtype=int64)

Please reopen the issue. Thanks.

Nikolai10

comment created time in a month

pull request commenttensorflow/addons

Added gaussian_blur_op

@ghosalsattam and how do those compare with the hand-written kernel times? And are you excluding warm-up costs (measuring warm-up costs for both approaches independently would also be nice)

ghosalsattam

comment created time in a month

issue commenttensorflow/addons

Dropping the custom ops for activation functions

On Wed, Apr 29, 2020 at 9:37 AM bhack notifications@github.com wrote:

I wonder if this is easier to resolve in a SIG meeting?

Thanks Alex. I don't know if you or someone else in your team could join on the next SIG meeting it would be very useful.

I'm asking around internally to be invited to the SIG meeting.

It has drawbacks, such as retracing and recompiling on shape changes

In the current status, can you tell us something more about using experimental_relax_shapes=True? Could it be a good compromise for smaller functions?

experimental_relax_shapes=True will help remove overhead of function retracing, but not currently of xla compilation.

Note also that there are (still internal and not fully staffed, so there hasn't been a design review yet) plans to improve our code generation soon to be able to generate kernels which are truly shape-dynamic (which is comparatively easy for activation functions since they vectorize elementwise) avoiding the xla overheads.

I heard something about this in the MLIR channels traffic.

gabrieldemarmiesse

comment created time in a month

issue commenttensorflow/addons

Dropping the custom ops for activation functions

I wonder if this is easier to resolve in a SIG meeting?

From where I sit, having hand-written kernels in TF is turning into a maintenance problem. Hand-written kernels for ops need hand-written gradient registrations (and potentially hand-written gradient kernels, which in turn need gradients and the other things in this list), tf2xla bridges, tflite converter rules, pfor converters (for tf.vectorized_map), hand-written kernels for other platforms (say, AMD GPU, which maintains its own fork of TF with its own hand-written kernels), and much more.

So we on the TF team have been thinking fairly hard about how we can keep the performance of fused hand-written kernels for things like activation functions without having to do this ever-growing list of manual work, since there are so many consumers of the TF op set other than just the TF executor itself. One solution which I am partial to is tf.function with experimental_compile=True. It has drawbacks, such as retracing and recompiling on shape changes, but it also has the advantage of not requiring all this manual work. Given that the full set of batch sizes seen during the execution of a model is fairly bounded, I wondered if for activation functions this one-time (or few-times) tracing and compilation overhead would be an acceptable tradeoff over having to hand-write all these converters and consumers (especially for gelu, which will be used in BERT, which will get deployed in all sorts of ways, requiring all this manual work to happen).
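For instance, a sketch of the approach being discussed, assuming a plain composite activation wrapped in a compiled function:

import tensorflow as tf

@tf.function(experimental_compile=True)
def fused_gelu(x):
    # XLA can fuse this elementwise chain into a single kernel.
    return 0.5 * x * (1.0 + tf.math.erf(x / tf.sqrt(2.0)))

y = fused_gelu(tf.random.normal((1024,)))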

It might be that it's a bad idea, and that we should go with a hand-written kernel for now. I am perfectly fine with that.

Note also that there are (still internal and not fully staffed, so there hasn't been a design review yet) plans to improve our code generation soon to be able to generate kernels which are truly shape-dynamic (which is comparatively easy for activation functions since they vectorize elementwise) avoiding the xla overheads.

So all I want to know is what you think the tradeoff is between all the manual work of tflite converters, tfxla converters, gradient registrations, pfor converters, etc., vs the performance cost of retracing, for this particular activation function.

gabrieldemarmiesse

comment created time in a month

issue closedtensorflow/tensorflow

Tf.reshape: allow different reshape order (e.g. column-wise)

System information

  • TensorFlow version (you are using): '2.1.0'/ latest
  • Are you willing to contribute it (Yes/No): Yes

Describe the feature and the current behavior/state.

Currently, tf.reshape returns a new tf.Tensor(tensor, shape, name=None) that has the same values as tensor in the same order, except with a new shape given by shape (see tf.reshape).

However, in some cases it would be beneficial to adjust the ordering; e.g. np.reshape allows ‘C’, ‘F’, ‘A’ orders - I am in particular interested in column-wise order.

For example: Given a tf.Tensor with multiple image patches (denoted by seq_len) of shape h, w, c: (batch_size, seq_len, h, w, c). I would like to concat/ reshape these image patches horizontally along axis 3 that result in shape (batch_size, h, w * seq_len, c), without rearranging the pixel values within the image patches. In the fully convolutional setting, where seq_len varies, I can not iterate over unknown axis dimension seq_len in order to achieve this goal by e.g. using tf.split/ tf.concat (as far as I understand). Please see example below for more details.

Will this change the current api? How?

Yes - tf.reshape requires another optional parameter, e.g. 'order'.

Who will benefit with this feature?

Everyone.

Any Other info.

The following example shall demonstrate the desired behaviour of tf.reshape:

# simulate batch of image patches
x = np.arange(3 * 2 * 8).reshape(3, 2, 4, 2, 1)

# reshape/ concat image patches along axis 3
x_new = tf.reshape(x, (3, 4, 4, 1))

Visualize first batch sample (current behaviour)

print(x_new[0, :, :, 0])
<tf.Tensor: shape=(4, 4), dtype=int64, numpy=
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])>

Visualize first batch sample (desired behaviour)

<tf.Tensor: shape=(4, 4), dtype=int64, numpy=
array([[ 0, 4, 8, 12],
       [ 1, 5,  9, 13],
       [ 2, 6, 10, 14],
       [ 3, 7, 11, 15]])>

Please correct me if I missed something. Thanks.

closed time in a month

Nikolai10

issue commenttensorflow/tensorflow

Tf.reshape: allow different reshape order (e.g. column-wise)

It seems like you want tf.transpose+tf.reshape instead of purely tf.reshape. You can also use tf.einsum to have a saner syntax for writing tf.transpose.
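For example, a small sketch of einsum as a readable transpose (the axis labels are illustrative):

import tensorflow as tf

x = tf.zeros((3, 2, 4, 2, 1))        # (batch, seq, h, w, c)
x_t = tf.einsum('bshwc->bhswc', x)   # same as tf.transpose(x, (0, 2, 1, 3, 4))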

Nikolai10

comment created time in a month

issue closedtensorflow/tensorflow

Random uniform is inconsistent given same seed values

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): I was trying to use one of the example usages from the TensorFlow docs and wanted to make the results reproducible. After investigation it looks to me that gen_random_ops.random_uniform is inconsistent given the same seeds. That is why I have written a test (custom code)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Darwin-18.0.0-x86_64-i386-64bit
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: No
  • TensorFlow installed from : pip install tensorflow
  • TensorFlow version : v2.1.0-rc2-17-ge5bf8de410 2.1.0
  • Python version: python version: 3.7.6
  • Bazel version (if compiling from source): brew installation
  • GCC/Compiler version (if compiling from source): -
  • CUDA/cuDNN version: -
  • GPU model and memory: -

Describe the current behavior
gen_random_ops.random_uniform returns different values even with the same seeds.
Describe the expected behavior
Consistent behaviour is expected given the same seeds.

Standalone code to reproduce the issue

shape = (8,12)
dtype = 'float32'
seed = 5
seed2 = 1234
tf.random.set_seed(seed)
rnd = gen_random_ops.random_uniform(shape, dtype, seed=seed, seed2=seed2)
rnd2 = gen_random_ops.random_uniform(shape, dtype, seed=seed, seed2=seed2)
# rnd and rnd2 will be different

Other info / logs
The code does not throw exceptions; the expected behaviour just differs from the actual results.

From random_seed.py I can see that this is probably expected behaviour, but I don't understand why. Is this counter that keeps changing the seed by default a way to avoid generating the same values? But usually for tests we need reproducibility. Does it mean that it should be done through re-setting tf.random.set_seed(1234)?

closed time in a month

deil87

issue commenttensorflow/tensorflow

Random uniform is inconsistent given same seed values

Yes this is an undesirable property of the legacy stateful random uniform ops. Use either the generator API or the stateless random ops for sane behavior.
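A short sketch of both recommended alternatives (shape and seeds taken from the report):

import tensorflow as tf

# Stateless ops: the same seed pair always yields the same values.
a = tf.random.stateless_uniform((8, 12), seed=[5, 1234])
b = tf.random.stateless_uniform((8, 12), seed=[5, 1234])
assert bool(tf.reduce_all(a == b))

# Generator API (tf.random.experimental.Generator in older releases).
g = tf.random.Generator.from_seed(5)
c = g.uniform((8, 12))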

deil87

comment created time in a month

issue commenttensorflow/tensorflow

AttributeError: 'Tensor' object has no attribute '_in_graph_mode'

The problem is in the following line of code:

          opt.apply_gradients(zip(processed_grads, [conv_inverted_outputs, conv_content_outputs]))

Here conv_inverted_outputs and conv_content_outputs are not tf.Variables, and TF cannot apply optimizer updates to things which are not tf.Variables.
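A hedged sketch of the fix, reusing the names from the snippet above (only opt_img, the actual tf.Variable, is optimized):

with tf.GradientTape() as tape:
    inverted_feature = tf.cast(opt_img, dtype)
    conv_inverted_outputs = grad_model(inverted_feature)
    loss = InvertedImage.get_loss(
        conv_content_outputs, conv_inverted_outputs,
        content_feature, inverted_feature)
grads = tape.gradient(loss, [opt_img])      # opt_img is a tf.Variable
opt.apply_gradients(zip(grads, [opt_img]))  # updates the variable in place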

That said the error message could be better. @tanzhenyu can you work on the error message, by maybe checking for tensors earlier?

helenabdr

comment created time in a month

issue commenttensorflow/tensorflow

gradient of `einsum` is incorrect for complex numbers

@bloops are you aware of this issue?

Krastanov

comment created time in a month

issue commenttensorflow/tensorflow

There are no control inputs between 'Assign' and 'read' nodes

The default assign and the default read are unordered. If you want ordered reads and writes you have to add them yourself with the control dependencies you need.
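A minimal TF1-style sketch of adding that ordering yourself:

import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

a = tf.get_variable('a', shape=(2, 3))
with tf.control_dependencies([a.initializer]):  # force Assign before read
    ordered_read = a.read_value()

with tf.Session() as sess:
    print(sess.run(ordered_read))  # the control dep runs the Assign first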

On Tue, Apr 28, 2020 at 6:21 PM Lianshui Zhao notifications@github.com wrote:

@alextp https://github.com/alextp Can you clarify it a little more? When building a model, we usually don't specify the execution order of Assign and read since they are added by TF internally, and TF does not add control dependencies between them either. So how does TF decide which one should execute first? Hope you can comment on that a little. Thanks.

zhao1157

comment created time in a month

Pull request review commenttensorflow/tensorflow

Provide NVIDIA CUDA build data in metadata and API

 def disable_mlir_bridge(): def disable_mlir_graph_optimization():   """Disables experimental MLIR-Based TensorFlow Compiler Optimizations."""   context.context().enable_mlir_graph_optimization = False+++@tf_export('config.get_cuda_version_used_to_compile_tf')+def get_cuda_version_used_to_compile_tf():

Why not a config.get_build_info that returns a dictionary with string keys representing all relevant build info (including cuda and cudnn versions)?
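A purely hypothetical sketch of the suggested shape (get_build_info is the proposal here, not an existing function):

info = tf.config.get_build_info()  # hypothetical; not an existing API
print(info['cuda_version'], info['cudnn_version'])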

angerson

comment created time in a month

pull request commenttensorflow/tensorflow

Provide NVIDIA CUDA build data in metadata and API

@gunan I don't see any API changes

angerson

comment created time in a month

issue closedtensorflow/tensorflow

How to convert a TensorFlow model with functional ops to a graph definition with no TensorFlow version installed?

In TensorFlow v2, functional control ops such as if/while/for/case are introduced. If the model contains such functional control ops, how can they be converted to a graph definition? In TensorFlow, there are some methods to transmit the original node names correctly in a file: InstantiateFunction - AddDefaultAttrs - helper.BuildNodeOutputIndex - helper.InstantiateNode. But in other circumstances, if there is no TensorFlow installed, how can a TensorFlow model with functional ops be converted to a graph definition?

closed time in a month

gitgetcode

issue commenttensorflow/tensorflow

How to convert a TensorFlow model with functional ops to a graph definition with no TensorFlow version installed?

It's a fairly complicated graph lowering step, and we currently don't have a standalone tool to do that.

gitgetcode

comment created time in a month

issue closedtensorflow/tensorflow

There are no control inputs between 'Assign' and 'read' nodes

I thought there should be some dependencies between 'Assign' and 'read' nodes, so that 'read' only executes after 'Assign' is done. But through the following toy example, this seems to be not the case:

import tensorflow as tf
a = tf.get_variable('a', shape = (2,3))

print ('op_name', 'control_inputs', 'input_ops', 'output[0]_shape')
for op in tf.get_default_graph().get_operations():
  print (op.name, op.control_inputs, [inp.op.name for inp in op.inputs], op.outputs[0].shape)

output:

op_name control_inputs input_ops output[0]_shape
a/Initializer/random_uniform/shape [] [] (2,)
a/Initializer/random_uniform/min [] [] ()
a/Initializer/random_uniform/max [] [] ()
a/Initializer/random_uniform/RandomUniform [] ['a/Initializer/random_uniform/shape'] (2, 3)
a/Initializer/random_uniform/sub [] ['a/Initializer/random_uniform/max', 'a/Initializer/random_uniform/min'] ()
a/Initializer/random_uniform/mul [] ['a/Initializer/random_uniform/RandomUniform', 'a/Initializer/random_uniform/sub'] (2, 3)
a/Initializer/random_uniform [] ['a/Initializer/random_uniform/mul', 'a/Initializer/random_uniform/min'] (2, 3)
a [] [] (2, 3)
a/Assign [] ['a', 'a/Initializer/random_uniform'] (2, 3)
a/read [] ['a'] (2, 3)

So a/Assign and a/read nodes have no dependencies, and may execute in any order. Is there supposed to be any?

closed time in a month

zhao1157

issue commenttensorflow/tensorflow

There are no control inputs between 'Assign' and 'read' nodes

TF1 doesn't add any dependencies you didn't add yourself. It's up to you to decide what order you want stuff to run in.

zhao1157

comment created time in a month

issue commenttensorflow/tensorflow

Allow name argument of tf.name_scope to be a tf.string

Creating a graph that looks like loop(0), loop(1), ..., loop(n) as opposed to one that looks like for i in range(n): loop(i)

On Mon, Apr 27, 2020 at 7:48 PM Sumanth Ratna notifications@github.com wrote:

Can you explain what you mean by statically unrolling the graph?

sumanthratna

comment created time in a month

pull request commenttensorflow/tensorflow

[ROCm] Gelu op

To merge this we need the kernels for higher-order derivatives. Without running a profiler, though, I can't tell why the backward step is slow. Can you run the TPU profiler and post screenshots of a trace here?

ekuznetsov139

comment created time in a month

issue commenttensorflow/tensorflow

Allow name argument of tf.name_scope to be a tf.string

Use "for i in range", not "for i in tf.range" to use a python for loop and statically unroll the graph.

sumanthratna

comment created time in a month

issue closedtensorflow/tensorflow

Gradients for tf.py_function with mixed arguments

System information

  • Have I written custom code: yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Mint 19.3
  • TensorFlow installed from (source or binary): conda binary
  • TensorFlow version (use command below): 2.1
  • Python version: 3.7

Describe the current behavior gradient calculation throws an error if a py_function is used that has both integer and floating point inputs/outputs.

Describe the expected behavior Gradients with respect to all integers should be zero/None, and others should be correctly calculated.

Standalone code to reproduce the issue

import tensorflow as tf

def pf(x, y):
    return x ** 2, y ** 2

def pyf(x, y):
    return tf.py_function(pf, [x, y], [tf.int32, tf.float32])

x = tf.constant(5)
v = tf.Variable(0.5)
with tf.GradientTape() as tape:
    y, m = pyf(x, v)
    z = tf.cast(y, tf.float32) * m

print(tape.gradient(z, v))

When calling pf, gradient computation works, but for pyf we get first a warning and then an error.

The problematic code seems to be in script_ops.py:

@ops.RegisterGradient("EagerPyFunc")
def _EagerPyFuncGrad(op, *dy):
  """Computes the gradient of an EagerPyFunc."""

  token = op.get_attr("token")

  def eagerly_executed_grad(*dy):
    tape, eager_inputs, eager_outputs = tape_cache.pop(compat.as_bytes(token))
    return tape.gradient(eager_outputs, eager_inputs, output_gradients=dy)

  with ops.control_dependencies(op.outputs):
    return _internal_py_func(
        func=eagerly_executed_grad,
        inp=dy,
        Tout=[tensor.dtype for tensor in op.inputs],
        eager=True,
        is_grad_func=True)

closed time in a month

ngc92

issue commenttensorflow/tensorflow

Gradients for tf.py_function with mixed arguments

if it doesn't reproduce in nightly let's close the issue

ngc92

comment created time in a month

issue closedtensorflow/tensorflow

Allow name argument of tf.name_scope to be a tf.string

System information

  • TensorFlow version (you are using): 2.1.0
  • Are you willing to contribute it (Yes/No): Yes (if it's not too difficult/time-taking)

Describe the feature and the current behavior/state. Currently, something like the following doesn't work since name in tf.name_scope must be a Python str:

import tensorflow as tf


for i in tf.range(4):
    tag = tf.strings.format('tag{}', i + 1)
    with tf.name_scope(tag):
        tf.summary.scalar('value', tf.constant(i**4))

Will this change the current API? How? Not significantly.

Who will benefit with this feature? Users of autograph who want to use a Tensor's value in tf.name_scope.

Any Other info.

closed time in a month

sumanthratna

issue commenttensorflow/tensorflow

Allow name argument of tf.name_scope to be a tf.string

This is not possible. Names in name_scope have to be statically known at graph build time, but string tf tensors are dynamic and their value is only known at graph execution time.

sumanthratna

comment created time in a month

issue commenttensorflow/tensorflow

Gradient not registered in autograph mode with tf.data.Dataset loop

Nevermind, I think there's an actual bug here where autograph is generating loops which break autodiff sometimes.

AdrienCorenflos

comment created time in a month


issue commenttensorflow/tensorflow

Gradient not registered in autograph mode with tf.data.Dataset loop

Think of tf.data as a tool for data loading into tensorflow, not for data processing as part of TF.

So we recommend you don't use tf.data to transform a piece of data you already have loaded in memory, as it's not really designed for that. Instead use your own TF loops (which should be relatively straightforward to write with autograph) to do your mapping, filtering, batching (which is just tf.stack), etc, all of which are differentiable operations.
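A small sketch of what that looks like for in-memory data (names and the toy loss are illustrative):

import tensorflow as tf

xs = tf.random.normal((32, 3))  # data already in memory
w = tf.Variable(tf.ones((3,)))

with tf.GradientTape() as tape:
    mapped = [tf.tensordot(x, w, axes=1) for x in tf.unstack(xs)]  # "map"
    batch = tf.stack(mapped)                                       # "batch"
    loss = tf.reduce_sum(batch)
print(tape.gradient(loss, w))  # gradients flow through the whole pipeline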

AdrienCorenflos

comment created time in a month

pull request commenttensorflow/tensorflow

Corrected examples of tf.diag() and tf.diag_part()

Yes, make it python code.

On Thu, Apr 23, 2020 at 12:14 AM Manish Aradwad notifications@github.com wrote:

@ManishAradwad commented on this pull request.

In tensorflow/core/api_def/base_api/api_def_Diag.pbtxt https://github.com/tensorflow/tensorflow/pull/38573#discussion_r413567997 :

@@ -18,12 +18,14 @@ rank 2k with dimensions [D1,..., Dk, D1,..., Dk] where:

For example:

-```
-# 'diagonal' is [1, 2, 3, 4]
-tf.diag(diagonal) ==> [[1, 0, 0, 0]
-                       [0, 2, 0, 0]
-                       [0, 0, 3, 0]
-                       [0, 0, 0, 4]]
-```
+"""

Do you mean I should make it like this?

"""
>>> diagonal = [1, 2, 3, 4]

ManishAradwad

comment created time in a month

pull request commenttensorflow/tensorflow

[BugFix] - Prefetching on GPU is actually executed on CPU

Can't you do the fix I suggested if in tf2 graph mode? Something like if not executing_eagerly() and executing_eagerly_outside_functions()?

On Wed, Apr 22, 2020 at 2:05 PM Jonathan DEKHTIAR notifications@github.com wrote:

I can't fix it in this PR. It looks to be an issue in the Iterator in TF2 graph mode. A complete different issue than this PR tries to solve.

DEKHTIARJonathan

comment created time in a month

pull request commenttensorflow/tensorflow

[BugFix] - Prefetching on GPU is actually executed on CPU

It also needs to work in tf2 graph mode.

On Wed, Apr 22, 2020 at 2:02 PM Jonathan DEKHTIAR notifications@github.com wrote:

It just doesn't work in TF2 graph mode. The attribute does not exist. Hence I only check this for TF1 and TF2 eager. If that's good with you then we can merge

DEKHTIARJonathan

comment created time in a month

pull request commenttensorflow/tensorflow

[BugFix] - Prefetching on GPU is actually executed on CPU

Oh if it works in tf1 and tf2 then it's fine.

On Wed, Apr 22, 2020 at 1:53 PM Jonathan DEKHTIAR notifications@github.com wrote:

@alextp https://github.com/alextp are you sure replacing with iterator._iterator_resource.device is the proper way ? As stated above: host_iterator._device is working perfectly in TF1 or TF2 Eager.

DEKHTIARJonathan

comment created time in a month

Pull request review commenttensorflow/tensorflow

Corrected examples of tf.diag() and tf.diag_part()

rank 2k with dimensions [D1,..., Dk, D1,..., Dk] where:

For example:

-```
-# 'diagonal' is [1, 2, 3, 4]
-tf.diag(diagonal) ==> [[1, 0, 0, 0]
-                       [0, 2, 0, 0]
-                       [0, 0, 3, 0]
-                       [0, 0, 0, 4]]
-```
+"""

Don't put the triple quotes in here as this exits the docstring and it's not valid python syntax

ManishAradwad

comment created time in a month

pull request commenttensorflow/tensorflow

Add gelu

@cheshire and @sanjoy might be interested in the performance of XLA vs the custom op here

@joker-eph are there docs for how to make the moral equivalent of tfxla kernels in the new bridge yet?

WindQAQ

comment created time in a month

pull request commenttensorflow/tensorflow

Add gelu

We would merge a python composite op version.

We're in the process of revamping tfxla to make writing kernels unnecessary (you'll write MLIR legalization passes instead, with more syntactic sugar) so I'd rather wait a bit and expose that API to addons.
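For reference, a hedged sketch of what such a composite-op gelu could look like (not the exact implementation that was merged):

import math
import tensorflow as tf

def gelu(x, approximate=False):
    x = tf.convert_to_tensor(x)
    if approximate:
        # tanh approximation from the original GELU paper.
        coeff = tf.cast(math.sqrt(2.0 / math.pi), x.dtype)
        return 0.5 * x * (1.0 + tf.tanh(coeff * (x + 0.044715 * tf.pow(x, 3))))
    # Exact form: x * Phi(x) expressed via erf.
    return 0.5 * x * (1.0 + tf.math.erf(x / tf.cast(math.sqrt(2.0), x.dtype)))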

On Tue, Apr 21, 2020 at 6:52 PM Sean Morgan notifications@github.com wrote:

This also needs a tfxla kernel for Gelu or we cannot support gelu with XLA on TPU/GPU.

Hi @alextp https://github.com/alextp . So this PR and migration from TFA has gotten a bit stale. Wanted to give a quick update on where we are with this in Addons and see if you have some advice. As a side note we were requested to submit a PR to keras/governance as well, but haven't gotten around to it.

We've run into issues supporting XLA / TPUs in our repo (no way to load custom XLA kernels): tensorflow/custom-op#53 (comment) https://github.com/tensorflow/custom-op/issues/53#issuecomment-617100900

In order to support TPUs (as well as ABI incompatible installs of TF) we've implemented a python composite op for gelu:

https://github.com/tensorflow/addons/blob/master/tensorflow_addons/activations/gelu.py#L67

We benchmarked the CPU/GPU custom-op implementations vs. XLA compiled python version: https://colab.research.google.com/drive/1rLb4EuydbFg9PbhboXhCDqopcl6BmphG

There are some noticeable speed-ups (especially for the CUDA kernel and backward passes), but it's becoming quite a maintenance burden for us to support custom ops for simple activations. Would TF core be willing to merge in a python composite op version, or would it be preferable to migrate the custom op with a tfxla kernel?

Is there any documentation on writing tfxla kernels? I suppose it could mostly be inferred from say relu https://github.com/tensorflow/tensorflow/blob/master/tensorflow/compiler/tf2xla/kernels/relu_op.cc, but it's quite difficult for us to test any implementation.

WindQAQ

comment created time in a month

pull request commenttensorflow/tensorflow

_ConstantValue can now see through tf.identity ops

You can make changes to the tests to make them pass.

ngc92

comment created time in a month

pull request commenttensorflow/tensorflow

Fix issue in tf.image.extract_glimpse

I prefer option (3), coupled with a deprecation of tf.image.extract_glimpse with instructions in the message for how to get the "broken" results using extract_glimpse_v2.

Any other option can change the behavior of code which currently works and depends on the buggy behavior, and because TF promises backwards compatibility of its python API and graphdef we cannot do that.

On Tue, Apr 21, 2020 at 8:15 AM Yong Tang notifications@github.com wrote:

Thanks @karmel https://github.com/karmel @tensorflow/api-owners https://github.com/orgs/tensorflow/teams/api-owners for the comment. The previous behavior of tf.image.extract_glimpse was incorrectly implemented in two issues:

  1. both centered=True and centered=False will behave like centered=False (sort of, see combination effect of 2)
  2. there is also another issue where the centered=False will take positive value incorrectly.

Below is an example:

import tensorflow as tf
import numpy as np

img = tf.constant(np.arange(25).reshape((1, 5, 5, 1)), dtype=tf.float32)

Image:

[ 0. 1. 2. 3. 4.]

[ 5. 6. 7. 8. 9.]

[ 10. 11. 12. 13. 14.]

[ 15. 16. 17. 18. 19.]

[ 20. 21. 22. 23. 24.]

tf.image.extract_glimpse( img, [3, 3], [[-2, 2]], centered=False, normalized=False, noise='zero')

[ 0. 0. 0.]

[ 0. 0. 0.]

[ 0. 0. 0.]

In the above example, since size = [3, 3], offset = [[-2, 2]], and centered=False, the obtained region should take the upper-left corner as [0, 0], thus the correct one should be:

[ 0. 0. 0.]

[ 0. 0. 0.]

[ 2. 3. 4.]

Given that this implementation has been in place for a really long time, I don't know the best way to address it without breaking the API. There might be several considerations:

  1. We could consider this as a bug that must be fixed, so change the underlying implementation directly.
  2. We could implement the correct one in C++ with a new kernel (ExtractGlimpseV2), and leave the old C++ kernel alone. In this way, when people use the python API, they will be rerouted to the new kernel; however, for saved models people will get the old result.
  3. We could implement the correct one with a new python API extract_glimpse_v2, and co-exist with both extract_glimpse and extract_glimpse_v2. It might cause some confusion though, as users will see two nearly identical APIs with different behavior.

yongtang

comment created time in a month

issue commenttensorflow/tensorflow

GradientTape() unable to compute the gradient wrt Keras model inputs

ckyleda: please file a separate issue with instructions to reproduce against nightly.

On Tue, Apr 21, 2020 at 8:12 AM ckyleda notifications@github.com wrote:

Despite all the above comments, none appears to actually resolve the issue in the latest version of TF 2.0.

tape.watch on a numpy input converted to a tensor still causes tape.gradient(model_output, model_input) to return None.

This behaviour works as expected using tf.gradients without eager execution enabled.

lujq96

comment created time in a month

Pull request review commenttensorflow/tensorflow

Add gelu

def _BiasAddGradV1(unused_bias_op, received_grad):
                                             reduction_dim_tensor))

+@ops.RegisterGradient("Gelu")
+def _GeluGrad(op, grad):
+  return gen_nn_ops.gelu_grad(grad, op.inputs[0], op.get_attr("approximate"))

We need a registered gradient for GeluGrad as well.

WindQAQ

comment created time in a month

issue commentemeryberger/scalene

Possible to track code run via PyEval_CallObject?

Yes tf.function (graph mode) runs all the code inside the TF runtime (so behind a python C API call) instead of through the python interpreter, so there's nothing scalene can easily do here.

You can use the newly released tensorflow profiler to profile TF's graph mode, but it has fairly different characteristics than scalene.

I do wonder if it makes sense to integrate scalene and the tensorflow profiler a little, though. The current python tracer used optionally by the tensorflow profiler is fairly high overhead (it's not sampling-based, etc.) and it does not track memory usage. We could support people using scalene instead of it. We can also consider letting scalene know about the memory allocated through tensorflow APIs (since right now debugging GPU/TPU memory usage through tensorflow involves a lot more work than necessary). @ckluk @jbaiocchi would either of you be interested in following up on this? For reference, scalene is described in https://github.com/emeryberger/scalene and what I consider relevant for us is the line-by-line (as opposed to function-by-function) time and memory profiling from python with relatively low overhead.

guoshimin

comment created time in a month

pull request commenttensorflow/tensorflow

Added a warning note in tf.where documentation for 'NaN' gradient issue with workaround

I wish...

On Fri, Apr 17, 2020 at 8:54 AM Maadesh Sivakumar notifications@github.com wrote:

@alextp https://github.com/alextp is there a way to run all these checks locally before I push my change? That way it is much easier.

anorak-k

comment created time in a month
