Zilin Zhu (zhuzilin) · Tencent · Beijing · https://zhuzilin.github.io/ · Full-time SDE @Tencent, working on TensorFlow.

zhuzilin/NP_ML 159

A tool library of classical machine learning algorithms with only numpy.

zhuzilin/monkey 13

A C++ version of the Monkey language interpreter, from Write An Interpreter In Go.

zhuzilin/google-translate-desktop 9

Google Translate Desktop built with Electron

zhuzilin/simple-pandas 9

A much simpler pandas!!!

HaoguangYang/AgileVehicle 7

The AgileVehicle Project, an automated road vehicle that takes you wherever your destination is, in whatever attitude you want.

zhuzilin/Together-backend 5

Backend for a microservice platform (🚧 under construction 🚧)

zhuzilin/li 3

Another mini text editor in ~500 LOC, with simple interfaces for shortcuts.

DH-Diego/CapstoneAnormalyDetection 2

This is a repo for capstone

zhuzilin/electron-react-starter 2

Starter for desktop software development using Electron and React.

zhuzilin/Together-frontend 2

Frontend for course E6156

PR opened tensorflow/tensorflow

Remove repeated call in FindKernelDef

Thank you for your time reviewing this PR.

+1 -1

0 comment

1 changed file

pr created time in an hour

push event zhuzilin/tensorflow

TensorFlower Gardener

commit sha 908664eac03821257ccd17dd0414670b9cbf3692

Merge pull request #40675 from zhuzilin:tiled-layout-doc-fix PiperOrigin-RevId: 317946833 Change-Id: I4cd9de065fc47143d8ccdff3552cc8bf716fe0c0

view details

A. Unique TensorFlower

commit sha 8535dafb37ec4ce5c7272ffa4b8b4c491d44e999

Internal change PiperOrigin-RevId: 317950322 Change-Id: I83c81973a220b74c015a8571c4f1d50b4ede91db

view details

Dero Gharibian

commit sha ab05b8d7776e04e6e483c5b0bc7d7358df3ec967

Replaced extern inline with static inline to mitigate duplicate symbols in cgo PiperOrigin-RevId: 317950897 Change-Id: Ia3cb17d5946a969187d8f1a81ff4c77844dcde3a

view details

David Majnemer

commit sha 83b4d04ae2621456172aaf7fa0fa54aea6fb2e81

[XLA] Evaluate tf.sign for real arguments with fewer operations We can evaluate it as: (x > 0) - (x < 0) which should be cheaper than: x != x ? 0 : sign(x) PiperOrigin-RevId: 317952523 Change-Id: I7b848497c9ceedb8aba10185cdba8d9c3d3d6a3d

view details

Xiao Yu

commit sha a912655abd2f8b55441c2a8396c2580ceee07a29

Fix a heap-use-after-free issue. PiperOrigin-RevId: 317955770 Change-Id: I843e4bcd9b5cac3c22893d4e0e9aa6867e18a8c4

view details

abhichou4

commit sha 672b293cbdbfc5225120a3f46f4665a89fc62acf

use Regexp for right ValueError

view details

rahul-kamat

commit sha 7f0e00817fe3c5c090dfb748137120f5a1ae0261

PR review changes

view details

Rick Chao

commit sha a24767dcaeac10dd87b01ac4de27f0f7ff1e3c55

Skip testClusterResolverProperty for TPU cases and follow up with a fix. PiperOrigin-RevId: 317959686 Change-Id: I6ad671e2a5b03886e24d5db88d2cf57db35b3bd1

view details

Priya Gupta

commit sha dfe03768e01a3488e57b428e4c7f02ede66af555

Enhance the docstring for tf.distribute.Stategy.reduce API. PiperOrigin-RevId: 317965125 Change-Id: I46ce4c2e6a8d547d9d26c01ccb27b25394f1dc7d

view details

abhichou4

commit sha 30c8b4a5bf64f982a626fb1f5e3888e5c481dce5

reformat

view details

A. Unique TensorFlower

commit sha 7211f4c2b12fb0e4f4ce24e710900048c8a322a4

Add the "--define=no_tensorflow_py_deps=true" flag for the windows cpu release builds. PiperOrigin-RevId: 317968971 Change-Id: I7d4db21474d85620928f3a5ffb1e4cfebaa2be9f

view details

Rick Chao

commit sha 4dd7002d5697d729e281d3b05a140088361690e2

MultiProcessRunner: Add more information regarding UnexpectedSubprocessExitError. PiperOrigin-RevId: 317970123 Change-Id: Ie2aff422fc7eff2bd48b6a82fab34e4b0c0bb930

view details

rahul-kamat

commit sha b00c93e3c13db45b8d19121598f9a0b6eaf0b93f

Add test with all ops annotated, Move proto strings inside tests

view details

Mehdi Amini

commit sha 422825f1a904b0cf0b82ccf804af7c433ca6b56a

Fix Markdown table format to dispay correctly on GitHub GitHub requires a leading | for tables. PiperOrigin-RevId: 317971572 Change-Id: I0b0860e143d21fb8fa52a8421fa62b43fa9bfd04

view details

A. Unique TensorFlower

commit sha 5a5679c8aa3645aae5a47582f40f6697a04efa9a

when python is not initialized , do nothing in python hooks. PiperOrigin-RevId: 317971811 Change-Id: Ib73f11e1c2a88dee6f11105c2ae8ab20599703a6

view details

Nupur Garg

commit sha 3252c965ee399aa795522f9f383805dc4aaec68f

Add input array shape instructions to Keras model. PiperOrigin-RevId: 317972245 Change-Id: I9863d2e6beda85e4c0d016db541bb4341e739bc9

view details

Lluis-Miquel Munguia

commit sha 7db333e5545ccd6784b2e752a95b8119769e6696

Internal code refactoring. PiperOrigin-RevId: 317973409 Change-Id: Ic249b4e1380313b6c556022dc78826c3165f1d3f

view details

TensorFlower Gardener

commit sha 8c80b0433f6f09a720a4fc2fbe55b64b4948965c

Merge pull request #40512 from samholt:master PiperOrigin-RevId: 317976246 Change-Id: Ia28ecdce7acf20183ad4abd1057c0c0e8037dc68

view details

Ashwin Murthy

commit sha e213574acedc8810cf3eb753ff387d70c52b90a3

Add g3doc for TensorFlow composite operation fusion in the TensorFlow Lite converter PiperOrigin-RevId: 317976446 Change-Id: I5b9093f5290f14444cb1a64c1c17f3017996e5b5

view details

Tare Gaskin

commit sha dc704773c3fbe0cf3756d5e975871b695da59311

[-Wsign-compare] batch resolution 1

view details

push time in an hour

create branch zhuzilin/tensorflow

branch : fix-repeated-call

created branch time in an hour

pull request comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

@aaudiber I've changed SkipNext to Skip. Could you have another look? Thank you!

zhuzilin

comment created time in 19 hours

push event zhuzilin/tensorflow

zilinzhu

commit sha d8d044d7d811f5e951ae4bb90ba35090307f17ee

change SkipNext to Skip

view details

push time in 19 hours

PR opened tensorflow/tensorflow

Add SkipRecords to RecordReader

This is a PR from the JIZHI team & TaiJi AI platform at Tencent.

This PR adds SkipRecords(uint64* offset, int num_to_skip) to RecordReader, which skips num_to_skip records without reading them (using SkipNBytes). It will help implement the Skip interface mentioned in #40963, where we hope to avoid unnecessary data IO and transformation in ops like tf.data.Dataset.shard and tf.data.Dataset.skip.
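For intuition, here is a rough sketch of how such a skip can avoid the copy. It assumes the usual TFRecord framing (8-byte length, 4-byte length CRC, payload, 4-byte data CRC), and ReadRecordLength is a hypothetical helper for illustration, not the actual patch:

    // Illustrative sketch: skip a record by reading only its length header,
    // then advancing past the payload and its CRC with SkipNBytes, so the
    // record body is never materialized.
    Status RecordReader::SkipRecords(uint64* offset, int num_to_skip) {
      for (int i = 0; i < num_to_skip; ++i) {
        uint64 length;
        // Hypothetical helper: reads the 8-byte length and its 4-byte CRC.
        TF_RETURN_IF_ERROR(ReadRecordLength(offset, &length));
        // Skip the payload plus the trailing 4-byte data CRC without copying.
        TF_RETURN_IF_ERROR(input_stream_->SkipNBytes(length + sizeof(uint32)));
        *offset += length + sizeof(uint32);
      }
      return Status::OK();
    }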

Thank you for your time reviewing this PR.

FYI, @aaudiber

+74 -5

0 comment

3 changed files

pr created time in 21 hours

create branch zhuzilin/tensorflow

branch : record_reader-skip_records

created branch time in 21 hours

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber The tests have finally passed! 😂 Thank you so much for your help on this PR!

zhuzilin

comment created time in a day

pull request comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

@aaudiber Thank you for your nice questions. Here are some of my thoughts.

Is there a specific use case motivating this change?

The specific use case is using shard when the files of the dataset contain uneven amounts of data. For now, we encourage sharding by file name, so that each shard won't process unnecessary data. But if there are not enough files to allocate one to each worker, or the file sizes vary a lot, then the user has to shard the dataset after reading it. In that situation, they will see a performance loss, especially when CPU resources are limited.

Which datasets are you planning to implement skipping for?

I think probably most of them, except TFRecordDataset, since it uses a SequentialRecordReader (I wonder if we can just move the offset without actually reading from the file...). Therefore, this is likely to involve lots of additions. I hope we can settle the design in this PR in order to avoid a large-scale refactor.

What do you think of changing SkipNext(ctx, end_of_input) to Skip(ctx, num_to_skip, end_of_input)? In some cases skipping multiple could be cheaper than skipping one at a time.

This is one of the design issues I hoped to discuss 😄. I agree that Skip(num_to_skip) would potentially perform better than SkipNext, but I'm not sure how to deal with input_impl_.reset() when end_of_sequence is met. Are we going to put the reset inside Skip, or shall we return the number of successful skips back to the iterator?
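For illustration, one possible shape for that interface (a sketch of the signature under discussion, not an agreed design):

    // Sketch: Skip reports how many elements were actually skipped, so the
    // calling iterator can observe end_of_sequence and decide for itself
    // when to call input_impl_.reset().
    virtual Status Skip(IteratorContext* ctx, int num_to_skip,
                        bool* end_of_sequence, int* num_skipped);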

zhuzilin

comment created time in 2 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber It seems that random build errors keep happening 😂.

zhuzilin

comment created time in 2 days

pull request comment tensorflow/tensorflow

[XLA] Make postorder stack adds a channel once for all predecessors

@gbaned It seems that @joker-eph seldom reviews PRs unrelated to MLIR. Shall we add another reviewer for this? Thank you!

zhuzilin

comment created time in 2 days

pull request comment tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

@aaudiber Could you have a look at this PR? Thank you!

zhuzilin

comment created time in 2 days

pull request comment tensorflow/tensorflow

[tf.data] Use output_shapes from python for batch dataset

@jsimsa Could you have a look at this PR? Thank you!

zhuzilin

comment created time in 2 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber The build failure is in tensorflow/python/kernel_tests:cast_op_test, which should have nothing to do with tf.data. Is there anything else I can help with on this PR?

zhuzilin

comment created time in 2 days

push event zhuzilin/electron-fc

dependabot[bot]

commit sha b57ecba4c23cb290d32550a89278772d604b0b38

Bump electron from 5.0.6 to 7.2.4 Bumps [electron](https://github.com/electron/electron) from 5.0.6 to 7.2.4. - [Release notes](https://github.com/electron/electron/releases) - [Changelog](https://github.com/electron/electron/blob/master/docs/breaking-changes.md) - [Commits](https://github.com/electron/electron/compare/v5.0.6...v7.2.4) Signed-off-by: dependabot[bot] <support@github.com>

view details

Zilin Zhu

commit sha f9ab4f2eaa8e30723a321bead7dce0748400f312

Merge pull request #3 from zhuzilin/dependabot/npm_and_yarn/electron-7.2.4 Bump electron from 5.0.6 to 7.2.4

view details

push time in 3 days

PR merged zhuzilin/electron-fc

Bump electron from 5.0.6 to 7.2.4 (label: dependencies)

Bumps electron from 5.0.6 to 7.2.4.

Release notes (sourced from electron's releases):

v7.2.4 — Fixes: Promise timeout issue when running Electron as Node (#23324); a use-after-free that could happen if a Tray was destroyed while showing a custom context menu (#23182); windows without nativeWindowOpen: true could invoke the non-native-open path (#23224); a memory leak when using contextBridge with sandbox=true (#23232); macOS VoiceOver can now find its way back into web contents after navigating "out" of an application (#23174).

v7.2.3 — Fixes: Security: ensure proxy object is created in the correct context (a9bead22).

v7.2.2 — Fixes: a potential crash on invalid zoomFactor values when setting the zoom factor of a webpage (#22710); maximizable state persistence of BrowserWindows on macOS (#23019); possible creation of a messageBox that cannot be dismissed on macOS (#23089); an occasional crash when closing all BrowserWindows (#23024); backported security fixes for CVE-2020-6426 in V8 (#23043), crbug.com/1065094 (#23059), a potential buffer overrun in WebRTC audio encoding (#23037), a site isolation bypass in dedicated workers (#23040), and CVE-2020-6452 in MediaStream mojo (#23044). Other changes: backported security fixes for a buffer underflow in DWrite (#22979), use-after-free in file chooser (#22981), CVE-2020-6451 in WebAudio (#22945), use-after-free in VideoEncodeAccelerator (#22983), CVE-2019-20503 in usersctplib (#22986), CVE-2020-6422 in WebGL (#23017), CVE-2020-6423/6427/6428/6429/6449 in audio (#23048, #23015, #23013, #23011, #23009), and use-after-poison in WebAudio (#22869, #22943).

Commits: 0552e0d (Bump v7.2.4), c87b474 (refactor: port window-setup to use ctx bridge), 69683de (fix: use Node's microtasks policy in node_main.cc, #23154/#23324), 8148b76 (style: use build/include_directory for NOLINT, #23266/#23304), 7dfcb5e (fix: block custom window.open when nativeWindowOpen is true, #23188/#23224), 0b3bf1e (fix: do not mutate ipc instances across contexts, #23239), fd529ac (fix: do not allow child windows to specify their own preload script, #23229), 3909001 (fix: ensure functions are not retained beyond their context being released), 039be2e (build: improve patch filename remembering, #23070/#23184), fb6f604 (fix: heap-use-after-free in tray.popUpContextMenu, #22842/#23182); additional commits in the v5.0.6...v7.2.4 compare view.


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options:

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
  • @dependabot use these labels will set the current labels as the default for future PRs for this repo and language
  • @dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language
  • @dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language
  • @dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.


+2341 -1338

0 comment

3 changed files

dependabot[bot]

pr closed time in 3 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

Gently ping @aaudiber. I wonder if we are going to merge this pass? Thank you :).

zhuzilin

comment created time in 3 days

PR opened tensorflow/tensorflow

[XLA] Make postorder stack adds a channel once for all predecessors

This is a PR from JIZHI, the AI platform in Tencent.

This PR slightly changes ComputeInstructionPostOrder so that the stack for the post-order traversal does not add the whole channel for every predecessor. An example is the InstructionPostOrderWithAllReduce test case in hlo_computation_test.cc:

HloModule Module

add {
  lhs = f32[] parameter(0)
  rhs = f32[] parameter(1)
  ROOT add = f32[] add(lhs, rhs)
}

ENTRY entry {
  param = f32[128] parameter(0), sharding={maximal device=0}
  crs0 = f32[128] all-reduce(param),
    replica_groups={{0}}, channel_id=1, to_apply=add,
    sharding={maximal device=0}
  crs1 = f32[128] all-reduce(param),
    replica_groups={{0}}, channel_id=1, to_apply=add,
    sharding={maximal device=1}
  add = f32[128] add(crs0, crs0), sharding={maximal device=0}
  ROOT t = (f32[128], f32[128]) tuple(add, crs1)
}

In the old implementation, the add instruction would add crs0 and crs1 twice; this PR avoids that.
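For illustration, the guard could look roughly like this (a hypothetical sketch; channel_group is an assumed helper returning the instructions that share a channel id, and the actual patch differs):

    // Sketch: remember which channels were already expanded so a channel's
    // companion instructions are pushed onto the DFS stack at most once,
    // no matter how many predecessors reference them.
    absl::flat_hash_set<int64> pushed_channels;
    if (inst->channel_id().has_value() &&
        pushed_channels.insert(*inst->channel_id()).second) {
      for (HloInstruction* companion : channel_group(*inst->channel_id())) {
        dfs_stack.push_back(companion);
      }
    }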

Thank you for your time reviewing this PR.

+26 -10

0 comment

1 changed file

pr created time in 7 days

create branch zhuzilin/tensorflow

branch : hlo-post-order-calc

created branch time in 7 days

pull request comment tensorflow/tensorflow

[grappler] Convert identity ConjugateTranspose to Conj instead of removing it

@ezhulenev @gbaned Could you help merge this PR? Thank you! :)

zhuzilin

comment created time in 7 days

issue comment pytorch/pytorch

[ppc64le pytorch] Pytorch build failure on IBM power8, NVIDIA P100, CUDA 9.2, CUDNN 7.4, NCCL 2.4 & Python 3.6

@ghltshubh Did you find any solution to this problem? I've encountered the same error...

ghltshubh

comment created time in 9 days

PR opened tensorflow/tensorflow

[tf.data] Use output_shapes from python for batch dataset

This PR is related to #40938. It removes the output shape calculation in C++ for BatchDatasetOp and PaddedBatchDatasetOp and uses the shapes passed from Python instead.

Thank you for your time reviewing this PR.

+11 -38

0 comment

4 changed files

pr created time in 9 days

create branch zhuzilin/tensorflow

branch : batch-dataset-op-output-shape

created branch time in 9 days

PR opened tensorflow/tensorflow

[tf.data] Add SkipNext interface to iterator

This PR adds a SkipNext interface to IteratorBase and uses this method in SkipDatasetOp and ShardDatasetOp.

If this interface is added, we can gradually implement it for all dataset ops so that some unnecessary computation can be avoided.
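For illustration, a minimal default implementation could simply compute and discard the next element, letting individual datasets override it with something cheaper (a sketch, not necessarily the code in this PR):

    // Sketch of a fallback SkipNext: produce the next element and throw it
    // away. Datasets that can skip without computing (e.g. by advancing a
    // file offset) would override this.
    virtual Status SkipNext(IteratorContext* ctx, bool* end_of_sequence) {
      std::vector<Tensor> unused;
      return GetNext(ctx, &unused, end_of_sequence);
    }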

Thank you for your time reviewing this PR.

+44 -14

0 comment

4 changed files

pr created time in 9 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 62cb089a6d3e3180e38c7a75f97d9321da7c085b

add SkipNext interface to iterator

view details

push time in 9 days

create branch zhuzilin/tensorflow

branch : skip-next

created branch time in 9 days

pull request comment tensorflow/tensorflow

Change GetMatchingPaths to avoid traversing unnecessary paths

@mihaimaruseac Sorry... I didn't notice that //tensorflow/tools/api/tests:api_compatibility_test had a different error... The new build failure is caused by paths like /root/tensorflow/../* and I've fixed it.

zhuzilin

comment created time in 9 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 80bedbb8baf761400900e5876eb403cf789e7a12

fix build failure

view details

push time in 9 days

issue opened tensorflow/tensorflow

Calculating output_shapes and output_types of a dataset in Python or C++?

In the current TensorFlow implementation, almost all dataset ops have the attributes:

    .Attr("output_types: list(type) >= 1")
    .Attr("output_shapes: list(shape) >= 1")

However, most of these attributes are unused, since many ops (like cache, repeat, and prefetch) won't change them, and the two attributes are recalculated on the C++ side for batch and padded_batch. Are we going to remove those unnecessary attributes? And for the shapes that are calculated twice, shall we keep the C++ version or the Python version?
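For context, the C++ recomputation looks roughly like this (a paraphrase of BatchDatasetOp under the drop_remainder case, not an exact quote):

    // Paraphrase: the C++ side rebuilds each output shape by prepending the
    // batch dimension, even though Python already passed output_shapes.
    for (const PartialTensorShape& input_shape : input_->output_shapes()) {
      output_shapes_.emplace_back(
          PartialTensorShape({batch_size}).Concatenate(input_shape));
    }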

If any modification is needed, I'd love to contribute :).

created time in 10 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 6ada3ddee89fc76a007c39b9812f87bb5ffb99be

fix errors on paths with protocol

view details

push time in 10 days

pull request comment tensorflow/tensorflow

Change GetMatchingPaths to avoid traversing unnecessary paths

@mihaimaruseac I've fixed the build failures, which were related to paths with a protocol, like file://test. Could you have another look? Thank you!

zhuzilin

comment created time in 10 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 2bc1232af08160e9b2caa73ba94356c6ebd52d75

fix errors on paths with protocol

view details

push time in 10 days

Pull request review comment tensorflow/tensorflow

Change GetMatchingPaths to avoid traversing unnecessary paths

 void ForEach(int first, int last, const std::function<void(int)>& f) {
 Status GetMatchingPaths(FileSystem* fs, Env* env, const string& pattern,
                         std::vector<string>* results) {
   results->clear();
-  // Find the fixed prefix by looking for the first wildcard.
-  string fixed_prefix = pattern.substr(0, pattern.find_first_of("*?[\\"));
-  string eval_pattern = pattern;
-  std::vector<string> all_files;
-  string dir(io::Dirname(fixed_prefix));
-  // If dir is empty then we need to fix up fixed_prefix and eval_pattern to
-  // include . as the top level directory.
-  if (dir.empty()) {
-    dir = ".";
-    fixed_prefix = io::JoinPath(dir, fixed_prefix);
-    eval_pattern = io::JoinPath(dir, pattern);
+  if (pattern.empty()) {
+    return Status::OK();
   }

+  string eval_pattern = pattern;
+  bool is_directory = pattern[pattern.size() - 1] == '/';
+#ifdef PLATFORM_WINDOWS
+  is_directory = is_directory || pattern[pattern.size() - 1] == '\\';
+#endif

@mihaimaruseac Because fs->IsDirectory checks FileExists first, and here I'm checking a pattern, which is not an existing file.

zhuzilin

comment created time in 11 days

PR opened tensorflow/tensorflow

change GetMatchingPaths to speed it up

This PR updates the function GetMatchingPaths.

The old code collects all possible paths before applying any wildcard characters, which is really slow and memory-hungry when matching patterns like "/*". A similar issue was reported in #40553. The updated function matches against the pattern while gradually traversing the possible paths.
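For illustration, the core of the incremental walk might look like this (a sketch with assumed variable names, not the patch itself):

    // Sketch: match one pattern segment at a time and prune non-matching
    // children immediately, so a pattern like "/*" never enumerates the
    // whole filesystem before filtering.
    std::vector<string> next_dirs;
    for (const string& child : children) {
      const string child_path = io::JoinPath(current_dir, child);
      if (fs->Match(child_path, current_segment_pattern)) {
        next_dirs.push_back(child_path);
      }
    }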

Thank you for your time reviewing this PR.

+40 -30

0 comment

1 changed file

pr created time in 13 days

create branch zhuzilin/tensorflow

branch : fix_matching_files

created branch time in 13 days

delete branch zhuzilin/tensorflow

delete branch : fix_matching_files

delete time in 13 days

create branch zhuzilin/tensorflow

branch : fix_matching_files

created branch time in 13 days

delete branch zhuzilin/tensorflow

delete branch : fix_matching_files

delete time in 13 days

create branch zhuzilin/tensorflow

branch : fix_matching_files

created branch time in 13 days

issue comment tensorflow/tensorflow

tf.io.matching_files hangs given a certain pattern

The reason for this problem is that the algorithm in tf.io.matching_files searches all files and nested directories under / when /*name is used. I'd love to help fix this problem.

mjkim720

comment created time in 13 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber It is strange that the Linux CPU test failed... All I did in the last commit was add a newline in Python code to make a line shorter than 80 characters, but the Linux CPU test went from passing to failing... The error is about the vectorizer registry:

tensorflow/core/grappler/optimizers/data/vectorization/vectorizer_registry_test.cc:46
Expected equality of these values:
  ::tensorflow::Status::OK()
    Which is: OK
  (s)
    Which is: Not found: Op type not registered '' in binary running on localhost.
  Make sure the Op and Kernel are registered in the binary running in this process.
  Note that if you are loading a saved graph which used ops from tf.contrib, accessing
  (e.g.) `tf.contrib.resampler` should be done before importing the graph,
  as contrib ops are lazily registered when the module is first accessed.

Is there any reason this new optimization might influence it?

zhuzilin

comment created time in 13 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber There was a lint error (one line was longer than 80 characters after renaming to reorder_data_discarding_ops). I've fixed it. Could you help approve this PR again? Thank you!

zhuzilin

comment created time in 13 days

push event zhuzilin/tensorflow

zilinzhu

commit sha db2b7a6677f87bdfa37557b1872dd7ef2b6315f3

fix lint error

view details

push time in 13 days

started cloudwu/coroutine

started time in 14 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 6b01b02274a696a2c4c391e2aef9555f1fcdc0a8

modify docstring

view details

push time in 14 days

PR closed tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters (labels: cla: yes, size:S)

This is a PR from JIZHI, the AI platform in Tencent.

This PR removes the cross-device nodes (nodes that have an input or output from another device) from the compilation candidates in MarkForCompilation. The reason for this change is that if cross-device nodes are introduced into clusters, the following two situations may happen and significantly harm performance.

  device0:  op1 ---    op2        cluster(op1, op2) ---
                  |          ===>                     |
  device1:        ---> op3                            ---> op3

  or

  device0:  op1   ---> op2              ---> cluster(op1, op2)
                  |          ===>       |
  device1:  op3 ---               op3 ---

XLA will delay the execution of an op when one of its predecessors is merged into the middle of a cluster. This delay is acceptable when the op and the cluster are on the same device, because their kernels are executed sequentially. But when using multiple devices, this delay will make one device wait. For example, this is the timeline of a transformer model:

  • Before removing cross device nodes: [image]
  • After removing cross device nodes: [image]

The alternative way to solve this problem is to add more conditions in TryToContract. However, that would result in a larger modification, because we would need to save the device information of all nodes in a cluster. If you prefer that way, we are also glad to help.

Thank you for your time reviewing this PR.

+32 -134

26 comments

2 changed files

zhuzilin

pr closed time in 14 days

pull request comment tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters

OK, I'll close this PR for now and propose another design.

zhuzilin

comment created time in 14 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber I've removed cache and updated the docstring. Could you have another look? :).

zhuzilin

comment created time in 14 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 7b11290ba0927e63ea9002ecd907a2e63ce021e7

remove support for cache op

view details

push time in 14 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber Sure. I'll remove cache for now, given that memory and file cache ops may be separated later.

zhuzilin

comment created time in 14 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber Thank you for the information! I hadn't thought about this use case before. However, it's hard to distinguish a file cache from a memory cache, because filename is an input instead of an attribute, which means we can't know whether it will be empty in grappler... If we need to preserve the use case you mentioned, maybe we have to keep this optimization off by default?

zhuzilin

comment created time in 15 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber I've renamed the pass to reorder_data_discarding_ops and modified the doc. Could you have another look? As for the file-caching issue, I think we should keep the optimization applicable to it and tell users about its consequences, because in my opinion, using a file cache to save the whole dataset while discarding some of it is unlikely to be a common usage. @karmel Could you also check the updated doc to see if it has the information we need?

zhuzilin

comment created time in 16 days

push event zhuzilin/tensorflow

zilinzhu

commit sha bbf639859f43feec75081b9e9c9c739f07f65feb

add details to doc

view details

push time in 16 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 71b54d0c6a23fd1f5dff1f91c19eace16e93704e

add details to doc

view details

push time in 16 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 6acbd6b91236a9b914155a0473158f58fa8a39bd

rename to reorder_data_discarding_ops

view details

push time in 16 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@karmel Thank you for your questions :).

  1. What exactly is hoisting in this context? What are things hoisted into, and how does this affect the data pipeline in question? (Looking at the one-line example, this seems to be a reordering.)

I'm a little confused about the terms reordering and hoisting. Does hoisting have to involve a loop? In that case, we may rename this to reordering. I was just imitating the hoist_random_uniform pass.

  1. What is the full set of transformations that is hoisted? Do these eventually get run somewhere else?

The transformation moves skip, take, and shard in front of map, cache, and prefetch so that no unnecessary computation or caching takes place. I think it will only involve tf.data ops (see the example below).
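For example, the intended rewrite looks like this (informal dataset-graph notation, following the style used elsewhere in this thread):

    before: TFRecordDataset -> MapDataset -> SkipDataset(5)
    after:  TFRecordDataset -> SkipDataset(5) -> MapDataset

The rewrite is only safe because the ops being jumped over preserve cardinality element-for-element.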

  1. Can you add examples to the docstrings so that users will understand how to use this?

I'd love to. Could you tell me how specific the docstrings should be? And should the extra docstrings be added in optimization_options.py?

More broadly, should this be turned on by default? What would we have to test/ensure first?

I believe there will be no harm in turning it on by default. And I'll be glad if it is.

@aaudiber @jsimsa Could you share your opinions on these questions? Thank you.

zhuzilin

comment created time in 17 days

pull request comment tensorflow/tensorflow

[doc] Fix broken figures in tiled_layout.md

@lamberta This is a fix for the XLA docs. Could you have a look at it? Thank you!

zhuzilin

comment created time in 18 days

PR opened tensorflow/tensorflow

[doc] Fix broken figures in tiled_layout.md

This PR changes the figures to the style that https://stackoverflow.com/a/12118349/5163915 suggests, so that they are shown correctly on GitHub.

Thank you for your time reviewing this PR.

+9 -7

0 comment

2 changed files

pr created time in 18 days

pull request comment tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters

@gbaned Sorry for the delay... @tpopp Thank you for the detailed explanation. I also agree that allowing ops with cross-device edges inside a cluster, but stopping them from merging, may be a better way to work around these tests. In that case, we will need to add a new attribute to Cluster that records its input devices and output devices. And if we separate the input devices and output devices, we can allow clusters A and B to be merged when only B has a cross-device output and A is an input of B. Any suggestions on the design? Or maybe we need a new PR thread for that?

zhuzilin

comment created time in 18 days

create branch zhuzilin/tensorflow

branch : tiled-layout-doc-fix

created branch time in 18 days

push event zhuzilin/zhuzilin.github.io

zilinzhu

commit sha 5c91f10b791d0ca280260504b71c59737fd909b1

Updates

view details

push time in 19 days

push event zhuzilin/zhuzilin.github.io

zilinzhu

commit sha 88856a1e49dc5292a8e4cb9af5a75a71834eac93

source

view details

zilinzhu

commit sha 98396f7e9e62ed6b31a9d7a16e5caec51efa9d0a

add post

view details

push time in 19 days

push event zhuzilin/zhuzilin.github.io

zilinzhu

commit sha 35cfcc07e76fd0b98b69a6c8d8c93a5ed949f7bc

Updates

view details

push time in 19 days

started rxi/json.lua

started time in 19 days

pull request comment tensorflow/tensorflow

Fix: passing only RunOptions to Keras triggers a core dump

@tanzhenyu Could you have a look at this PR?

zhuzilin

comment created time in 21 days

pull request comment tensorflow/tensorflow

[grappler] Convert identity ConjugateTranspose to Conj instead of removing it

@gbaned This PR has been waiting for review for a while... Maybe we can add someone else to review it? I believe it is a pretty simple change and won't take too much time...

zhuzilin

comment created time in 21 days

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@aaudiber Thank you for your reviews. I've changed the code according to them. Could you have a second look?

zhuzilin

comment created time in 21 days

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+   [Apache License 2.0 header]
+==============================================================================*/
+
+#include "tensorflow/core/grappler/optimizers/data/hoist_discard.h"
+
+#include "absl/container/flat_hash_set.h"
+#include "tensorflow/core/framework/attr_value.pb.h"
+#include "tensorflow/core/framework/node_def.pb.h"
+#include "tensorflow/core/grappler/clusters/cluster.h"
+#include "tensorflow/core/grappler/grappler_item.h"
+#include "tensorflow/core/grappler/mutable_graph_view.h"
+#include "tensorflow/core/grappler/op_types.h"
+#include "tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.h"
+#include "tensorflow/core/grappler/optimizers/data/function_utils.h"
+#include "tensorflow/core/grappler/optimizers/data/graph_utils.h"
+#include "tensorflow/core/grappler/utils.h"
+#include "tensorflow/core/platform/protobuf.h"
+
+namespace tensorflow {
+namespace grappler {
+namespace {
+
+const std::unordered_set<string> kDataDiscarding = {

Changed.

zhuzilin

comment created time in 21 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 0fd550c5e6949402775381c551b93596770911f1

modification based on reviews

view details

push time in 21 days

pull request comment tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters

@tpopp Sorry, I'm not familiar with the "AlwaysCompile" semantics... However, I found that there are test errors for both GPU and CPU tests (like add_n_test_cpu), and for the CPU ones, I think there shouldn't be any cross-device edges. BTW, could you tell me how to run add_n_test? I tried to use bazel build and run the binary, but it exited with:

  File "/root/ttensorflow/bazel-bin/tensorflow/compiler/tests/add_n_test_gpu.runfiles/org_tensorflow/tensorflow/compiler/tests/xla_test.py", line 89, in __init__
    for name in FLAGS.types.split(',')
AttributeError: 'NoneType' object has no attribute 'split'

The error is caused by the flag in xla_test.py, but I couldn't pass the flag to the binary...

zhuzilin

comment created time in 22 days

issue closed zhuzilin/monkey

No need for dummy methods like `expressionNode()`

In GoLang, interfaces are implemented implicitly, meaning there is no way to say that IfExpression implements Expression; that is why you need to implement one of its methods, expressionNode(), to say that this struct implements the Expression interface. In C++ you don't need that; just saying class IfExpression : public Expression ... is enough, I believe.

closed time in 23 days

orkhan-huseyn

issue comment zhuzilin/monkey

No need for dummy methods like `expressionNode()`

@orkhan-huseyn You are absolutely right... I guess I was blindly copying the Go code without understanding it. Thank you for your advice; I've changed it in commit 4ce753770f80.

orkhan-huseyn

comment created time in 23 days

push event zhuzilin/monkey

zilinzhu

commit sha 4ce753770f802c2208540d25875774c32e122d64

remove the unnecessary expressionNode()

view details

push time in 23 days

pull request comment tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters

@gbaned I'd love to. @tpopp I've just checked the build errors. All of them seem to have nothing to do with the change made in this PR. Instead, they all trigger the following error:

2020-06-16 12:07:21.049677: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at xla_compile_on_demand_op.cc:209 : Invalid argument: Unsupported type in DataTypeToPrimitiveType: 'variant'
Fatal Python error: Segmentation fault

Do you have any idea what causes this problem? Thank you :).

zhuzilin

comment created time in 24 days

pull request comment tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters

@tpopp I've removed those tests. Could you have another look? :)

zhuzilin

comment created time in 24 days

push event zhuzilin/tensorflow

zilinzhu

commit sha 8cabd2c25593fa37fa99e8b212a9db6e5d90c157

remove unnecessary tests

view details

push time in 24 days

push event zhuzilin/talent-plan

zilinzhu

commit sha a56982fa82d5cd6277fb4e1b90424d6b6874ed9f

add dev-dependencies

view details

zilinzhu

commit sha 82c123b240b7777b8f40a2962d5bcb33c17719ca

finish project-1

view details

push time in a month

push event zhuzilin/talent-plan

zilinzhu

commit sha 84bf571fcbc00da4182c47b7389842d794290049

set project-1 to its initial status

view details

push time in a month

fork zhuzilin/talent-plan

Open source training courses about distributed databases and distributed systems

https://university.pingcap.com/talent-plan/

fork in a month

pull request comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

@jsimsa Thank you for your quick review! I've added the Python test and updated the code according to the review. Could you have a second look?

zhuzilin

comment created time in a month

push event zhuzilin/tensorflow

zilinzhu

commit sha 31827dbc7c3ec8c3a45f3c8813d1a7cfd35774dd

add python test

view details

push time in a month

push event zhuzilin/tensorflow

zilinzhu

commit sha eca4d50cadefa3e3621b82efe14d1f83708676d9

update misleading doc

view details

push time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "Whether to fuse filter dataset that predicts random_uniform < rate into "
       "a sampling dataset. If None, defaults to False.")

+  hoist_data_discarding_ops = options.create_option(
+      name="hoist_data_discarding_ops",
+      ty=bool,
+      docstring=
+      "Whether to hoist ops that will discard data (such as skip, take, shard)"
+      "out of map transformations. If None, defaults to False.")

modified.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

[diff context abridged: license header and includes of hoist_data_discarding_ops.cc as in the adjacent reviews]

+constexpr std::array<const char*, 3> kDataDiscarding = {
+    "ShardDataset", "SkipDataset", "TakeDataset",
+};
+
+constexpr std::array<const char*, 6> kCardinalityPreserving = {
+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",
+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",
+};
+
+bool IsDataDiscarding(const NodeDef& node) {
+  for (const auto& data_discarding_op : kDataDiscarding) {
+    if (node.op() == data_discarding_op) {
+      return true;
+    }
+  }
+  return false;
+}
+
+bool IsCardinalityPreserving(const NodeDef& node) {
+  for (const auto& cardinality_preserving_op : kCardinalityPreserving) {
+    if (node.op() == cardinality_preserving_op) {
+      return true;
+    }
+  }
+  return false;
+}

changed.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

[diff context abridged: license header and includes of hoist_data_discarding_ops.cc as in the adjacent reviews]

+constexpr std::array<const char*, 3> kDataDiscarding = {
+    "ShardDataset", "SkipDataset", "TakeDataset",
+};
+
+constexpr std::array<const char*, 6> kCardinalityPreserving = {
+    "CacheDataset", "CacheDatasetV2", "PrefetchDataset",
+    "MapDataset", "ParallelMapDataset", "ParallelMapDatasetV2",

fixed.

zhuzilin

comment created time in a month

push event zhuzilin/tensorflow

zilinzhu

commit sha f86b97ab3d3055772594053377d541562e1be95d

updates based on reviews

view details

push time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

[diff context abridged: license header, includes, and the IsDataDiscarding / IsCardinalityPreserving helpers as in the adjacent reviews]

+}  // namepsace
+
+Status HoistDataDiscardingOps::OptimizeAndCollectStats(Cluster* cluster,
+                                                       const GrapplerItem& item,
+                                                       GraphDef* output,
+                                                       OptimizationStats* stats) {
+  *output = item.graph;
+  MutableGraphView graph(output);
+  bool updated;
+  do {
+    updated = false;
+    for (NodeDef node : graph.graph()->node()) {

@jsimsa I think the current code behaves basically the same as your suggestion. The outer do..while loop is there in case a new op becomes hoistable after a round of optimization. For example, if we have

TFRecord -> ParallelMap -> Cache -> Skip -> Take -> Repeat

in the first round it may only hoist skip, and we need a second round to hoist take.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

[diff context abridged: license header, includes, helpers, and the start of OptimizeAndCollectStats as in the adjacent reviews]

+      if (IsDataDiscarding(node)) {
+        NodeDef* start = &node;
+        NodeDef* start_parent = graph_utils::GetInputNode(*start, graph);
+        while (IsCardinalityPreserving(*start_parent) &&
+               NumOutputs(*start_parent, graph.graph()) == 1) {
+          start = start_parent;
+          start_parent = graph_utils::GetInputNode(*start, graph);
+        }
+        // no cardinality preserving op with indegree 1.
+        if (start->name() == node.name()) {
+          continue;
+        }
+        NodeDef hoisted_node = node;
+        if (!absl::StartsWith(node.name(), "hoist_data_dsicarding_op/")) {

changed.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+/* Copyright 2020 The TensorFlow Authors. All Rights Reserved.
+   [Apache License 2.0 header]
+==============================================================================*/
+
+#ifndef TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+#define TENSORFLOW_CORE_GRAPPLER_OPTIMIZERS_DATA_HOIST_DATA_DISCARDING_OPS_H_
+
+#include "tensorflow/core/grappler/optimizers/data/optimizer_base.h"
+
+namespace tensorflow {
+namespace grappler {
+
+// This optimization hoists the data discarding ops (such as `skip`, `take` and
+//  `shard`) to avoid unnecessary computation.
+class HoistDataDiscardingOps : public TFDataOptimizerBase {

renamed.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+// This optimization hoists the data discarding ops (such as `skip`, `take` and
+// `shard`) to avoid unnecessary computation.
+class HoistDataDiscardingOps : public TFDataOptimizerBase {
+ public:
+  HoistDataDiscardingOps() = default;
+  ~HoistDataDiscardingOps() override = default;
+
+  string name() const override { return "hoist_data_discarding_ops"; };

renamed.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+Status HoistDataDiscardingOps::OptimizeAndCollectStats(Cluster* cluster,
+                                                       const GrapplerItem& item,
+                                                       GraphDef* output,
+                                                       OptimizationStats* stats) {
+  *output = item.graph;
+  MutableGraphView graph(output);
+  bool updated;
+  do {
+    updated = false;
+    for (NodeDef node : graph.graph()->node()) {
+      if (IsDataDiscarding(node)) {
+        NodeDef* start = &node;
+        NodeDef* start_parent = graph_utils::GetInputNode(*start, graph);
+        while (IsCardinalityPreserving(*start_parent) &&
+               NumOutputs(*start_parent, graph.graph()) == 1) {

@jsimsa I'm not sure if the following code will make the map dataset have 2 outputs:

ds = ...
ds = ds.map(lambda x: x+1)
ds1 = ds.repeat()
ds2 = ds.take(10)

If there is no such issue, I'll remove the check.
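To make the concern concrete, here is a minimal sketch of the fan-out situation (the pipeline is hypothetical and only meant to illustrate why the NumOutputs check exists):

import tensorflow as tf

ds = tf.data.Dataset.range(100)
ds = ds.map(lambda x: x + 1)  # shared Map node
ds1 = ds.repeat()             # consumer 1 of the Map node
ds2 = ds.take(10)             # consumer 2 of the Map node

# If both consumers end up in the same dataset graph, the Map node has
# two output edges. Hoisting take(10) above the shared map would then
# also cap ds1 at the first 10 mapped elements, changing its semantics,
# which is why the pass only hoists across nodes with a single output.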

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 class OptimizationOptions(options.OptionsBase):
       "Whether to fuse filter dataset that predicts random_uniform < rate into "
       "a sampling dataset. If None, defaults to False.")
 
+  hoist_data_discarding_ops = options.create_option(
+      name="hoist_data_discarding_ops",

renamed.

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

+namespace {
+
+constexpr std::array<const char*, 3> kDataDiscarding = {
+    "ShardDataset", "SkipDataset", "TakeDataset",
+};
+
+constexpr std::array<const char*, 6> kCardinalityPreserving = {

As for concatenate, I think it's hard to hoist those ops when we don't know the cardinality. And for shuffle, I think hoisting may change the output. I could add support for zip (enumerate), I guess; however, in that case it would need an IsZip condition and some extra implementation, because right now the supported ops are all unary. I wonder if we could first make the basic functionality work, and then I'll create another PR for zip?
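To illustrate the shuffle point with a minimal, hypothetical example (not from the PR): hoisting take across a shuffle changes which elements are produced, not just their order.

import tensorflow as tf

ds = tf.data.Dataset.range(10)

# take after shuffle: 5 elements sampled from all 10.
original = ds.shuffle(10, seed=42).take(5)

# take hoisted before shuffle: always a permutation of 0..4.
hoisted = ds.take(5).shuffle(10, seed=42)

print(list(original.as_numpy_iterator()))  # e.g. values drawn from 0..9
print(list(hoisted.as_numpy_iterator()))   # only values 0..4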

zhuzilin

comment created time in a month

Pull request review comment tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

 cc_library(
     ] + tf_protos_all(),
 )
 
+cc_library(
+    name = "hoist_data_discarding_ops",

Renamed.

zhuzilin

comment created time in a month

push event zhuzilin/tensorflow

zilinzhu

commit sha 0c296013f624e35232570b5ffd1575b77a6a0d93

rename the pass the hoist_discard

view details

push time in a month

PR opened tensorflow/tensorflow

[tf.data] Add grappler pass to hoist data-discarding ops

This PR adds a tf.data grappler pass that hoists data-discarding ops like shard, skip and take, so there is less unnecessary computation and caching. For example, the pass turns this code:

def parse_and_preprocessing(x):
  ...  # very slow

ds = tf.data.TFRecordDataset("example.tfrecord")
ds = ds.map(parse_and_preprocessing, num_parallel_calls=10)
ds = ds.cache()
ds = ds.skip(100)
ds = ds.take(1000)
ds = ds.repeat()

into

# ...

ds = tf.data.TFRecordDataset("example.tfrecord")
ds = ds.skip(100)
ds = ds.take(1000)
ds = ds.map(parse_and_preprocessing, num_parallel_calls=10)
ds = ds.cache()
ds = ds.repeat()

This transformation enables the cache and avoids some unnecessary background computation in map. The pass will also help make a continuously processing map possible (in other words, letting the MapDataset keep processing whenever there is an idle thread, instead of scheduling an element only after a GetNext call). Both of these problems are partially discussed in #39992.

The design of this pass is basically the same as noop_elimination or inject_prefetch, and its inner logic is:

  1. Find a node that discards data (right now: take, skip and shard). Using the above code as an example:
TFRecord -> ParallelMap -> Cache -> | Skip | -> Take -> Repeat
  2. Find the chain of ops that do not change the order of the input and are connected to the node from step 1 (right now: map, prefetch and cache).
TFRecord -> { ParallelMap -> Cache } -> | Skip | -> Take -> Repeat
  3. Move the node from the end of the chain to the start.
TFRecord -> | Skip | -> { ParallelMap -> Cache } -> Take -> Repeat
  4. If the graph was updated, repeat steps 1 - 3 (in this example, take will be moved in the second round). A usage sketch is given below.
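For reference, enabling the pass from user code might look like the sketch below; the hoist_data_discarding_ops option name comes from this PR's change to OptimizationOptions, while the pipeline itself is just the example from above.

import tensorflow as tf

def parse_and_preprocessing(x):
  ...  # stand-in for the slow preprocessing above

options = tf.data.Options()
options.experimental_optimization.hoist_data_discarding_ops = True

ds = tf.data.TFRecordDataset("example.tfrecord")
ds = ds.map(parse_and_preprocessing, num_parallel_calls=10)
ds = ds.cache().skip(100).take(1000).repeat()
ds = ds.with_options(options)  # grappler rewrites the graph at iteration time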

gently ping @jsimsa

Thank you for your time reviewing this PR.

+381 -1

0 comment

10 changed files

pr created time in a month

create branch zhuzilin/tensorflow

branch : hoist-data-discarding-op

created branch time in a month

issue comment tensorflow/tensorflow

Suboptimal execution order of parallel map calls for tf.data

@jsimsa Great! Thank you for your advice and I'll start working on that.

eriikj

comment created time in a month

pull request comment tensorflow/tensorflow

[XLA] Remove cross device nodes from clusters

@tpopp I've changed the test according to this PR. There should now be no build failure related to the new cluster rule.

> I personally would vote for still making CPUs special and allowing them through because there was good reason at the time, but I won't block you doing otherwise if I don't see a benchmark regression (like was currently the case).

For simplicity, I'm keeping the removal of all CPU ops, because allowing them through may require the same "hardcoded check between the names".

zhuzilin

comment created time in a month

push event zhuzilin/tensorflow

zilinzhu

commit sha 17e06bd3fc0b894eeeb5596d367ff90b8b60b8ae

change tests for cross device edges

view details

push time in a month
