
started tensorflow/lucid

started time in 3 minutes

issue comment tensorflow/addons

Adding sample weight parameter to metrics

@AndersonHappens Thanks for the comment. PR is in progress for both.

SSaishruthi

comment created time in 5 minutes

fork xuenbin/tensorflow

An Open Source Machine Learning Framework for Everyone

https://tensorflow.org

fork in 7 minutes

push event tensorflow/rust

Andrea Catania

commit sha b968cff28792c4b2256b82dffac23e6c045f59b9

Added set and get to the Tensor

Adam Crume

commit sha dc20fa5c99ee2a4fbae928d2e02f5adcb3de5014

Merge pull request #223 from AndreaCatania/setget Added set and get to the Tensor

push time in 7 minutes

PR merged tensorflow/rust

Added set and get to the Tensor

Added set and get to the Tensor, closes #222.

You should consider using usize instead of u64 for the Tensor dims, so we don't need to convert u64 to usize.

+54 -0

9 comments

1 changed file

AndreaCatania

pr closed time in 7 minutes

issue closed tensorflow/rust

Set data to a Tensor in Row/Column style

I can create a 2x2 Matrix in this way:

let mut matrix = Tensor::new(&[2, 2]);

I would like to set its data in row / column style, something like:

// Using a macro to generate the Index
matrix[get_index(1, 1)] = 1.0;

// Using a function
matrix.set(1, 1, 1.0);

I'm not able to find any function or macro to perform this kind of operation; is there a reason for this? (If not, I'm available to submit a PR to implement it.)
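For intuition, a set(row, col, value) helper over a flat buffer reduces to row-major index arithmetic. A minimal sketch in Python (illustrative only; the Matrix class and its names are hypothetical, not the tensorflow/rust API):

```python
# Illustration of row-major flat indexing behind a hypothetical set/get pair.
# Names here are made up for the sketch; this is not the tensorflow/rust API.

class Matrix:
    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.data = [0.0] * (rows * cols)  # flat buffer, like a tensor's backing store

    def _index(self, row, col):
        # Row-major (C-order) flat index.
        assert 0 <= row < self.rows and 0 <= col < self.cols
        return row * self.cols + col

    def set(self, row, col, value):
        self.data[self._index(row, col)] = value

    def get(self, row, col):
        return self.data[self._index(row, col)]

m = Matrix(2, 2)
m.set(1, 1, 1.0)       # mirrors matrix.set(1, 1, 1.0) from the issue
print(m.get(1, 1))     # 1.0
```

The same formula generalizes to N dimensions by multiplying out the trailing dimension sizes.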

closed time in 7 minutes

AndreaCatania

pull request comment tensorflow/rust

Added set and get to the Tensor

Thanks for the contribution!

AndreaCatania

comment created time in 8 minutes

pull request comment tensorflow/rust

Added set and get to the Tensor

We can't change the Tensor dimensions from u64 to usize without making a breaking change, unless we simply add separate methods. It would also mean a mismatch between Tensor dimension types and dimension types for tensors in the graph which are not fully materialized on the current machine. We also have to consider sparse tensors, which could easily have a u64 address space, even if only a couple of items are filled out.

This is something we'll need to revisit before moving to version 1.0.

AndreaCatania

comment created time in 9 minutes

issue comment tensorflow/tensorflow

Surprising random seed behavior when using @tf.function

Hi @wangershi ,

Thanks for your feedback. Just to be clear, are you planning to go for option 3 in my previous comment? In other words, will the new (simpler) behavior replace the old one or will they live side by side (if so, how)?

Also, I'd like to suggest providing a set_seed() method in the Generator class that calls reset_from_seed() if a seed is provided or reset_from_non_deterministic_state() if not. IMHO, it would be more convenient, more intuitive and less verbose.

Regarding the initial state, I see your point, it makes sense. Perhaps the initial state could be saved by default when creating the Generator (since most people won't create millions of Generators, RAM usage should be fine), with an option in the constructor such as save_initial_state=True so that people who need the extra RAM can set it to False if they want? Or perhaps set it to False by default and make the reset() method print out a clear error message if the user doesn't provide a state: "You need to set save_initial_state=True when constructing the Generator if you want to be able to call reset without any argument". Not sure about this.
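The suggested set_seed() dispatch could be sketched like this (hedged: a stand-in Generator class is used for illustration; only the method names reset_from_seed and reset_from_non_deterministic_state are taken from the discussion above, this is not the real tf.random.Generator):

```python
# Sketch of the proposed set_seed() convenience method, on a stand-in
# Generator class (not the real tf.random.Generator).
class Generator:
    def __init__(self):
        self.state = None

    def reset_from_seed(self, seed):
        self.state = ("seeded", seed)

    def reset_from_non_deterministic_state(self):
        self.state = ("non_deterministic", None)

    def set_seed(self, seed=None):
        # Dispatch as suggested: seed provided -> deterministic reset,
        # otherwise fall back to a non-deterministic state.
        if seed is not None:
            self.reset_from_seed(seed)
        else:
            self.reset_from_non_deterministic_state()

g = Generator()
g.set_seed(42)
print(g.state)      # ('seeded', 42)
g.set_seed()
print(g.state[0])   # 'non_deterministic'
```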

ageron

comment created time in 9 minutes

push event tensorflow/tfjs

Ann Yuan

commit sha 2f56dabdb4ff0c492f1be175ee40a105fd298819

webgpu: Improve benchmarks page readability. (#2035) DEV

Ann Yuan

commit sha 346c85d3c3381b3c0fb59a9ae915cdc89b8e0abd

core: Fuse relu6 activation (#2037) FEATURE PERF

Ann Yuan

commit sha d3d805ab3dff15c125678828bd9927baada9e3c5

webgpu: Upgrade tfjs-core dependency. (#2038) DEV

ted chang

commit sha 8183f1db0bf6483f7ba01df83e62da805e6ea5ab

fix incorrect Erf op output (#2027) BUG. The Erf output should be between -1 and 1, but I got [-1.0235416, -7.3770967, 80427.2890625, 1.0000005] when I ran the code below: `tf = require('@tensorflow/tfjs'); const x = tf.tensor1d([-1.4,-2.5,-3.1,-4.4]); x.erf().print();`
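As a sanity check on the expected range: the error function is strictly bounded in (-1, 1) for all real inputs, which is what makes the reported 80427.289 output clearly wrong. Python's standard-library math.erf illustrates the reference values:

```python
import math

# erf(x) is strictly within (-1, 1) for every real x; values saturate
# toward -1 / +1 as |x| grows, so erf(-4.4) should be very close to -1.
for x in (-1.4, -2.5, -3.1, -4.4):
    y = math.erf(x)
    assert -1.0 < y < 1.0
    print(x, y)
```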

esouthren

commit sha a3927b56d919ea45c89f5d8fb82cec20ef674b04

Lazily instantiate the TextEncoder in platform_browser (#2032) BUG This makes browsers that don't support TextEncoders fail later only when encoding happens (which typically is rare).

Daniel Smilkov

commit sha 2ad138c661048f4ad391b270d233f3fa9452301c

save

Daniel Smilkov

commit sha a29e07bbbc6ae33bc4693a070663f75e93026ad5

2ad138c661048f4ad391b270d233f3fa9452301c

Nikhil Thorat

commit sha 6d3a9a994bfd67fc721a6a4f699bedb060045a8f

Bring back the dist importing lint rule (#2039) DEV This was lost during the move to the monorepo. I removed dist importing where possible in packages. Now the custom lint rule runs for all packages (except data which doesn't share lint rules yet because it has a circular dependency).

Daniel Smilkov

commit sha dd5d9ed60e9cadf0426304d5a470504fd9b6d44a

Fix access to proto enum fields (#2040) BUG. Only the newest protobuf version allows enum constants to be accessed via the enum name. However, if users have already installed an older version of protobuf which satisfies the requirement constraints of `tensorflowjs`, they will get the following error: `File "/usr/local/lib/python3.6/dist-packages/tensorflowjs/converters/fuse_prelu.py", line 42, in register_prelu_op value.list.type.extend([types_pb2.DataType.DT_FLOAT]) AttributeError: 'EnumTypeWrapper' object has no attribute 'DT_FLOAT'`

Xu Xing

commit sha 0ffc2b53494aa234bc9b0398b6e03b192b79ce4b

Implement operator greater and greaterEqual (#2028) FEATURE Fix https://github.com/tensorflow/tfjs/issues/1700

Xu Xing

commit sha 9529b153f927d1e656fcf9d6fc92785f3680ddf3

Replace mul+add with fma (#2047) FEATURE

Kevin VanGelder

commit sha 3cc913494782b72d3d04662633fcc9046ce16c5c

[tfjs-react-native] Update setup instructions to reflect required async-storage dependency DOC

Daniel Smilkov

commit sha 84ae6e04fd7126e1cc6184735a5fb1b14029cfd3

Resolve to absolute paths before reading json files (#2052) BUG. The problem arises when building a par (Python Archive) internally that depends on our converter tool. This resulted in the following error: `.../third_party/py/tensorflowjs/converters/tf_saved_model_conversion_v2.py", line 91, in validate for filename in os.listdir(op_list_path): NotADirectoryError: [Errno 20] Not a directory: '.../third_party/py/tensorflowjs/converters/../op_list/'`

Daniel Smilkov

commit sha c4f3b775d869bacd388c50b3b1891ddfc4bc9172

Bump automl alpha version (#2053) INTERNAL Bump automl alpha version, which now includes the object detection API.

Ann Yuan

commit sha 3bcc71ab917bf227c67128fd2c152d85a79e2466

fix (#2051) BUG

Ann Yuan

commit sha 990eaee02dd0f7c1b6fe2790e6b45600980a5ae7

core: Fix tf.where on CPU to broadcast scalar condition. (#2056) BUG

Xu Xing

commit sha 035319aa06c7742ed5320a816dc759b4293a703a

Revert "Replace mul+add with fma" (#2060) BUG

Yannick Assogba

commit sha 7a657b8b4e1eb5596eae4b27ab9fee6341b0260d

Add chrome on android to CI tests (#2015) This adds Chrome on Android to our test matrix. It includes fixes to get isNaN and thus NaN propagation working, as well as a fix to a concat issue that was occurring on Pixel 3 (this was most likely due to a compiler/driver issue, not the original code). DEV

Orta

commit sha 4a498fe7502b8751138f780aa5987197518e8b17

[tfjs-react-native] Update README.md (#1995) DOC

Yannick Assogba

commit sha cf1d1fbd03dc076c2992054eee222f4a0d5eb78f

[tfjs-vis] Support custom tickLabels with duplicate values in heatmaps (#2012) BUG * Support custom tickLabels with duplicate values. Fixes #1201 * Improve tooltips on confusion matrix

push time in 9 minutes

started tensorflow/tfjs

started time in 9 minutes

fork ANGDL/models

Models and examples built with TensorFlow

fork in 10 minutes

issue comment tensorflow/mlir

GPU to NVVM test fails

@bondhugula if you don't mind trying this: https://github.com/tensorflow/mlir/pull/193

bondhugula

comment created time in 10 minutes

push event tensorflow/tfx

tfx-team

commit sha ce5ec087fa70ad4e433d6d3d206c650caad54d8d

Fix TFX CLI container builder bug PiperOrigin-RevId: 274527963

push time in 10 minutes

PR opened tensorflow/mlir

Use a SmallVector instead of an ArrayRef to materialize a temporary local array

This pattern is error prone, and unfortunately none of the sanitizers is catching it at the moment.

+1 -1

0 comments

1 changed file

pr created time in 11 minutes

started tensorflow/models

started time in 11 minutes

issue comment tensorflow/addons

Adding sample weight parameter to metrics

@SSaishruthi , MultiLabelConfusionMatrix and RSquare both still need the sample weight parameter added as well.

Thank you for your hard work!

SSaishruthi

comment created time in 14 minutes

PR closed tensorflow/tensorflow

Fix the issue for importing googletest.h (labels: cla: yes, ready to pull, size:S)

This PR removes #include "testing/base/public/googletest.h" in input_generator_test and util_test so that these tests can also run in a local environment. Otherwise, there will be an error like the one below:

ERROR: /home/abc/tensorflow/tensorflow/lite/testing/kernel_test/BUILD:33:1: C++ compilation of rule '//tensorflow/lite/testing/kernel_test:util_test' failed (Exit 1)
tensorflow/lite/testing/kernel_test/util_test.cc:21:10: fatal error: testing/base/public/googletest.h: No such file or directory
 #include "testing/base/public/googletest.h"
          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Target //tensorflow/lite/testing/kernel_test:util_test failed to build
Use --verbose_failures to see the command lines of failed build steps.

Fixes #26671

+23 -4

2 comments

3 changed files

feihugis

pr closed time in 15 minutes

pull request comment tensorflow/tensorflow

Fix the issue for importing googletest.h

As this issue has been fixed by @nairb774 via (https://github.com/tensorflow/tensorflow/commit/1c88fff548acb048f2fe64fd2564e885759def12, https://github.com/tensorflow/tensorflow/commit/c9c0e70d0b1d6d4fd618c5ebdebd859dbe59848f, https://github.com/tensorflow/tensorflow/commit/db5816707ab0fc4ae6d3959a313d67dd871a8d9d), I will close this PR. Thanks for your fix, @nairb774 !

feihugis

comment created time in 15 minutes

pull request comment tensorflow/models

[mnist] Use compat.v1 to support TF 2.0

Thanks. Just a nit for pylint error.

minoring

comment created time in 15 minutes

issue comment tensorflow/mlir

GPU to NVVM test fails

The error I see in your output:

/home/uday/llvm-project/llvm/projects/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm.mlir split at line #39:12:15: error: op "add" is invalid
    %result = "gpu.all_reduce"(%arg0) ({}) {op = "add"} : (f32) -> (f32)
              ^

Seems to be emitted here: https://github.com/tensorflow/mlir/blob/master/lib/Dialect/GPU/IR/GPUDialect.cpp#L147

I think that the issue is that the ArrayRef is invalid.

Can you try replacing it with a SmallVector? (I'll send a patch internally)

bondhugula

comment created time in 17 minutes

push event tensorflow/models

Tyler

commit sha 9317f3b4b91daa7e727bbb4a8e4de23e0004d8b9

Fixed tfe.Variable error (#7674) After Eager was moved to core TensorFlow, this notebook gives the error: AttributeError: module 'tensorflow.contrib.eager' has no attribute 'Variable'. I just fixed it.

push time in 17 minutes

PR merged tensorflow/models

Fixed tfe.Variable error (labels: cla: yes)

After Eager was moved to core TensorFlow, this notebook gives the error: AttributeError: module 'tensorflow.contrib.eager' has no attribute 'Variable'. I just fixed it.

+1 -2

4 comments

1 changed file

Tylersuard

pr closed time in 18 minutes

issue comment tensorflow/tfjs

Using the video frames as a texture in tf.fromPixels without canvas element

@dsmilkov Did you need any further clarification on the code samples etc?

chaosmail

comment created time in 18 minutes

issue comment tensorflow/tfjs

Cannot use relu6 with webgl

Thanks @pyu10055 for the explanation! I'm familiar with the "load_model with custom_objects" method from the Python side as I've had to deal with it there. I didn't see a corresponding custom objects parameter etc. for tensorflowjs_converter (nor am I quite sure how one would add that flexibility in the future), so I just tried my hack. I may have to give your method a shot in the future as it sounds cleaner for sure; I haven't played as much with TF's underlying formats yet vs. Keras .h5 format so thanks for the suggestion on a better supported path.

kingsharaman

comment created time in 20 minutes

fork condoleezza/tensorboard

TensorFlow's Visualization Toolkit

fork in 20 minutes

started tensorflow/tensorflow

started time in 21 minutes

started tensorflow/tensorflow

started time in 21 minutes

issue comment tensorflow/mlir

GPU to NVVM test fails

The commit you're pointing at is passing our CI on Ubuntu, Mac, and Windows: https://github.com/tensorflow/mlir/commits/master

Can you provide more info about your host environment (OS version, host compiler, and LLVM version) so I can try to repro?

bondhugula

comment created time in 22 minutes

issue closed tensorflow/benchmarks

benchmark cpu training has poor performance

Hello, I have two servers with no GPUs in them, and I want to know whether TensorFlow on CPU can run distributed training. So I ran tf_cnn_benchmark.py on the two servers to run distributed TensorFlow, but I found that distributed training with TensorFlow on CPU is too slow. I saved the screen output of the controller, worker1, and worker2 when training AlexNet with batch size 32. The benchmark trains 1000 steps and then exits, and it took 2 hours to complete warm-up and training.
Below is the screen output. Controller:

[gpu4] 2_node(cnn_tf_v1.12_zxy*) $ sh controller.sh 
+ . ./env.sh
+ batch_size=32
+ protocol=grpc
+ host0=10.0.22.3
+ host1=10.0.24.3
+ host2=10.0.26.3
+ host3=10.0.28.2
+ controller_host=10.0.22.3
+ all_reduce_alg=xring
+ benchmark_model=alexnet
+ variable_update_method=distributed_all_reduce
+ export PROTOCOL=grpc
+ export WORKER1=10.0.22.3:5000
+ export TF_PS1=10.0.22.3:6000
+ export WORKER2=10.0.24.3:5000
+ export TF_PS2=10.0.24.3:6000
+ export WORKER3=10.0.26.3:5000
+ export TF_PS3=10.0.26.3:6000
+ export WORKER4=10.0.28.2:5000
+ export TF_PS4=10.0.28.2:6000
+ export TRAIN_MODEL=alexnet
+ export BATCH_SIZE_PER_GPU=32
+ export ALL_REDUCE_ALG=xring
+ export CONTROLLER_HOST=10.0.22.3:6000
+ export VARIABLE_UPDATE=distributed_all_reduce
+ export PYTHONPATH=/home/zxy/models-master
+ export CUDA_VISIBLE_DEVICES=0,1,2,3
+ rm -rf /tmp/bench_log/benchmark_run.log /tmp/bench_log/metric.log
+ echo TRAIN BEGIN AT:
TRAIN BEGIN AT:
+ date
Tue Mar 19 05:40:17 UTC 2019
+ [ distributed_all_reduce = distributed_all_reduce ]
+ python /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --worker_hosts=10.0.22.3:5000,10.0.24.3:5000 --controller_host=10.0.22.3:6000 --job_name=controller --variable_update=distributed_all_reduce --local_parameter_device=cpu --use_fp16 --batch_size=32 --device cpu --num_gpus=1 --model=alexnet --data_format NHWC --task_index=0 --server_protocol=grpc --all_reduce_spec=xring --benchmark_log_dir=/tmp/bench_log
TensorFlow:  1.13
Model:       alexnet
Dataset:     imagenet (synthetic)
Mode:        BenchmarkMode.TRAIN
SingleSess:  True
Batch size:  64 global
             32 per device
Num batches: 100
Num epochs:  0.00
Devices:     ['job:worker/replica:0/task0/cpu:0', 'job:worker/replica:0/task1/cpu:0']
Data format: NHWC
Optimizer:   sgd
Variables:   distributed_all_reduce
AllReduce:   xring
Sync:        True
==========
2019-03-19 05:40:19.948366: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-03-19 05:40:19.952712: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x528def0 executing computations on platform Host. Devices:
2019-03-19 05:40:19.952773: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
Generating training model
W0319 05:40:20.001632 139817988011776 deprecation.py:317] From /home/zxy/.local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
W0319 05:40:20.015480 139817988011776 deprecation.py:317] From /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:126: conv2d (from tensorflow.python.layers.convolutional) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.conv2d instead.
W0319 05:40:20.034499 139817988011776 deprecation.py:317] From /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:250: max_pooling2d (from tensorflow.python.layers.pooling) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.max_pooling2d instead.
W0319 05:40:20.126064 139817988011776 deprecation.py:317] From /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py:386: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
W0319 05:40:20.126800 139817988011776 deprecation.py:500] From /home/zxy/.local/lib/python2.7/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
W0319 05:40:20.168297 139817988011776 deprecation.py:317] From /home/zxy/.local/lib/python2.7/site-packages/tensorflow/python/ops/losses/losses_impl.py:209: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Initializing graph
W0319 05:40:20.894763 139817988011776 deprecation.py:317] From /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py:2166: __init__ (from tensorflow.python.training.supervisor) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.MonitoredTrainingSession
check rpc_layer values
I0319 05:40:33.383174 139817988011776 session_manager.py:491] Running local_init_op.
I0319 05:40:33.837728 139817988011776 session_manager.py:493] Done running local_init_op.
Running warm up
Done warm up      
Step    Img/sec total_loss
1       images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
10      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
20      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
30      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
40      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
50      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
60      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
70      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
80      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
90      images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
100     images/sec: 1.0 +/- 0.0 (jitter = 0.0)  nan
----------------------------------------------------------------
total images/sec: 0.99
----------------------------------------------------------------
+ echo TRAIN END AT:
TRAIN END AT:
+ date
Tue Mar 19 07:39:09 UTC 2019

worker1:

[gpu4] 2_node(cnn_tf_v1.12_zxy*) $ sh worker1.sh 
+ . ./env.sh
+ batch_size=32
+ protocol=grpc
+ host0=10.0.22.3
+ host1=10.0.24.3
+ host2=10.0.26.3
+ host3=10.0.28.2
+ controller_host=10.0.22.3
+ all_reduce_alg=xring
+ benchmark_model=alexnet
+ variable_update_method=distributed_all_reduce
+ export PROTOCOL=grpc
+ export WORKER1=10.0.22.3:5000
+ export TF_PS1=10.0.22.3:6000
+ export WORKER2=10.0.24.3:5000
+ export TF_PS2=10.0.24.3:6000
+ export WORKER3=10.0.26.3:5000
+ export TF_PS3=10.0.26.3:6000
+ export WORKER4=10.0.28.2:5000
+ export TF_PS4=10.0.28.2:6000
+ export TRAIN_MODEL=alexnet
+ export BATCH_SIZE_PER_GPU=32
+ export ALL_REDUCE_ALG=xring
+ export CONTROLLER_HOST=10.0.22.3:6000
+ export VARIABLE_UPDATE=distributed_all_reduce
+ export PYTHONPATH=/home/zxy/models-master
+ export CUDA_VISIBLE_DEVICES=
+ rm -rf /tmp/bench_log/benchmark_run.log
+ [ distributed_all_reduce = distributed_all_reduce ]
+ python /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --worker_hosts=10.0.22.3:5000,10.0.24.3:5000 --controller_host=10.0.22.3:6000 --job_name=worker --variable_update=distributed_all_reduce --local_parameter_device=cpu --use_fp16 --batch_size=32 --device cpu --num_gpus=1 --model=alexnet --data_format NHWC --task_index=0 --server_protocol=grpc --all_reduce_spec=xring --benchmark_log_dir=/tmp/bench_log
2019-03-19 05:40:25.204429: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-03-19 05:40:25.209204: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x533d7d0 executing computations on platform Host. Devices:
2019-03-19 05:40:25.209263: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-19 05:40:25.213366: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:252] Initialize GrpcChannelCache for job worker -> {0 -> localhost:5000, 1 -> 10.0.24.3:5000}
2019-03-19 05:40:25.219641: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:391] Started server with target: grpc://localhost:5000
TensorFlow:  1.13
Model:       alexnet
Dataset:     imagenet (synthetic)
Mode:        BenchmarkMode.TRAIN
SingleSess:  True
Batch size:  64 global
             32 per device
Num batches: 100
Num epochs:  0.00
Devices:     ['job:worker/replica:0/task0/cpu:0', 'job:worker/replica:0/task1/cpu:0']
Data format: NHWC
Optimizer:   sgd
Variables:   distributed_all_reduce
AllReduce:   xring
Sync:        True
==========
Starting worker 0
2019-03-19 05:40:33.025624: I tensorflow/core/distributed_runtime/master_session.cc:1192] Start master session 87dd15b7a9ea7a05 with config: gpu_options { experimental { } } allow_soft_placement: true experimental { collective_group_leader: "/job:worker/replica:0/task:0" }

worker2:

[gpu5] 2_node(cnn_tf_v1.12_zxy*) $ sh worker2.sh 
+ . ./env.sh
+ batch_size=32
+ protocol=grpc
+ host0=10.0.22.3
+ host1=10.0.24.3
+ host2=10.0.26.3
+ host3=10.0.28.2
+ controller_host=10.0.22.3
+ all_reduce_alg=xring
+ benchmark_model=alexnet
+ variable_update_method=distributed_all_reduce
+ export PROTOCOL=grpc
+ export WORKER1=10.0.22.3:5000
+ export TF_PS1=10.0.22.3:6000
+ export WORKER2=10.0.24.3:5000
+ export TF_PS2=10.0.24.3:6000
+ export WORKER3=10.0.26.3:5000
+ export TF_PS3=10.0.26.3:6000
+ export WORKER4=10.0.28.2:5000
+ export TF_PS4=10.0.28.2:6000
+ export TRAIN_MODEL=alexnet
+ export BATCH_SIZE_PER_GPU=32
+ export ALL_REDUCE_ALG=xring
+ export CONTROLLER_HOST=10.0.22.3:6000
+ export VARIABLE_UPDATE=distributed_all_reduce
+ export PYTHONPATH=/home/zxy/models-master
+ export CUDA_VISIBLE_DEVICES=
+ rm -rf /tmp/bench_log/*
+ [ distributed_all_reduce = distributed_all_reduce ]
+ python /home/zxy/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --worker_hosts=10.0.22.3:5000,10.0.24.3:5000 --controller_host=10.0.22.3:6000 --job_name=worker --variable_update=distributed_all_reduce --local_parameter_device=cpu --use_fp16 --batch_size=32 --device cpu --num_gpus=1 --data_format NHWC --model=alexnet --task_index=1 --server_protocol=grpc --all_reduce_spec=xring --benchmark_log_dir=/tmp/bench_log
2019-03-19 05:40:32.655562: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2200000000 Hz
2019-03-19 05:40:32.660308: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x42ef750 executing computations on platform Host. Devices:
2019-03-19 05:40:32.660364: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
2019-03-19 05:40:32.664347: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:252] Initialize GrpcChannelCache for job worker -> {0 -> 10.0.22.3:5000, 1 -> localhost:5000}
2019-03-19 05:40:32.670954: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:391] Started server with target: grpc://localhost:5000
TensorFlow:  1.13
Model:       alexnet
Dataset:     imagenet (synthetic)
Mode:        BenchmarkMode.TRAIN
SingleSess:  True
Batch size:  64 global
             32 per device
Num batches: 100
Num epochs:  0.00
Devices:     ['job:worker/replica:0/task0/cpu:0', 'job:worker/replica:0/task1/cpu:0']
Data format: NHWC
Optimizer:   sgd
Variables:   distributed_all_reduce
AllReduce:   xring
Sync:        True
==========
Starting worker 1

I output the train start time and end time on the controller: training started at Tue Mar 19 05:40:17 UTC 2019 and ended at Tue Mar 19 07:39:09 UTC 2019, as can be seen in the controller's screen output. That is 2 hours to complete, which is far too long; the training speed is too slow.
Why is distributed training with TensorFlow on CPU so slow? Thanks.

closed time in 25 minutes

Keepmoving-ZXY

issue comment tensorflow/benchmarks

benchmark cpu training has poor performance

@wei-v-wang Yesterday I tried TensorFlow collective ops; train speed is also about 160 images/s with MKL, which looks good.

Keepmoving-ZXY

comment created time in 26 minutes

issue comment tensorflow/tensorflow

Are there any documents for TF Core

@MarkDaoust I found this when we moved our code to TF 2.0.

KANGRuipeng

comment created time in 30 minutes

issue comment tensorflow/tensorflow

Are there any documents for TF Core

@jvishnuvardhan I know this and I have read related docs.

KANGRuipeng

comment created time in 31 minutes

fork joseph-hurtado/models

Models and examples built with TensorFlow

fork in 33 minutes

started tensorflow/models

started time in 33 minutes

issue opened tensorflow/probability

A question about target_log_prob_fn

I'm doing MCMC and there are two RVs in my model; the batch shape of one is 1 and the batch shape of the other is 64. I'm wondering what I should return as the target_log_prob_fn. Should the return shape be (num_chain, 1) or (num_chain, 64)? Thanks a lot.
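One common pattern (hedged; the tfp.mcmc documentation is authoritative on the target_log_prob_fn contract) is that the per-RV log-probs broadcast together, and any trailing per-variable axis is then reduced so that each chain contributes a single scalar. The shape arithmetic, sketched in plain Python:

```python
# Shape arithmetic for combining per-RV log-probs. Illustration only;
# consult the tfp.mcmc docs for the exact target_log_prob_fn contract.
def broadcast_shape(a, b):
    # NumPy-style broadcasting of two same-rank shapes.
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"incompatible shapes {a} and {b}")
    return tuple(result)

num_chains = 8
lp1_shape = (num_chains, 1)    # log-prob batch shape of the first RV
lp2_shape = (num_chains, 64)   # log-prob batch shape of the second RV

joint = broadcast_shape(lp1_shape, lp2_shape)
print(joint)            # (8, 64)

# Summing over the trailing per-variable axis leaves one value per chain.
reduced = joint[:-1]
print(reduced)          # (8,)
```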

created time in 34 minutes

started tensorflow/graphics

started time in 34 minutes

started tensorflow/models

started time in 37 minutes

started tensorflow/model-analysis

started time in 38 minutes

issue comment tensorflow/tensorflow

Potential error in Codelab : Learning Tensorflow 2 : Computer Vision

@ravikyram I got to this codelab by searching for Tensorflow on this website : https://codelabs.developers.google.com/

Satwato

comment created time in 38 minutes

issue opened tensorflow/models

Object Detection API: Confidence score getting lower with increase in number of training steps

Please go to Stack Overflow for help and support:

http://stackoverflow.com/questions/tagged/tensorflow

Also, please understand that many of the models included in this repository are experimental and research-style code. If you open a GitHub issue, here is our policy:

  1. It must be a bug, a feature request, or a significant problem with documentation (for small docs fixes please send a PR instead).
  2. The form below must be filled out.

Here's why we have that policy: TensorFlow developers respond to issues. We want to focus on work that benefits the whole community, e.g., fixing bugs and adding features. Support only helps individuals. GitHub also notifies thousands of people when issues are filed. We want them to see you communicating an interesting problem, rather than being redirected to Stack Overflow.


System information

  • What is the top-level directory of the model you are using: models/research/object_detection
  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Unix
  • TensorFlow installed from (source or binary):source
  • TensorFlow version (use command below):1.4
  • Bazel version (if compiling from source):
  • CUDA/cuDNN version:
  • GPU model and memory:
  • Exact command to reproduce:

You can collect some of this information using our environment capture script:

https://github.com/tensorflow/tensorflow/tree/master/tools/tf_env_collect.sh

You can obtain the TensorFlow version with

python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"

Describe the problem

I am trying to use the object detection API by TensorFlow to detect a particular pattern in a 3190X3190 image using faster_rcnn_inception_resnet_v2_atrous_coco. All my training images are of size 1140X1140. The pattern itself is of width 380 pixels and height 430 pixels. The pattern is made up of basic shapes such as rectangles and circles. Every rectangle and circle is of the same dimensions in both the inference and training images. I have 4000 training images. I notice even though I am getting reasonable results at around 5k steps with a confidence score of around 98%, the maximum confidence score drops to around 5% when the number of training steps increase to 8k. I have noticed almost the same thing if I increase the size of the training image to 1290x1290 by padding a black ring around it. However, I get good results if I use training images of size 1500x1500 even if the number of steps is 25k. I have also tried with training images of size 3190x3190 (i.e. same size as the inference image) and got detections with really low confidence score.

Is there any particular reason for this behavior? I would be interested to know as to how the size of the training image affecting the detection results. I am using version 1.4 of the object-detection API.

Source code / logs

Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached. Try to provide a reproducible test case that is the bare minimum necessary to generate the problem.

created time in 41 minutes

issue opened tensorflow/mlir

GPU to NVVM test fails

This test currently fails on the trunk (705a743e69ebb3da93f494cb92b9dea7b852235c):

[240/241] Running the MLIR regression tests
FAIL: MLIR :: Conversion/GPUToNVVM/gpu-to-nvvm.mlir (103 of 320)
********************
TEST 'MLIR :: Conversion/GPUToNVVM/gpu-to-nvvm.mlir' FAILED
********************
Script:

: 'RUN: at line 1'; /home/uday/llvm-project/build.release/bin/mlir-opt /home/uday/llvm-project/llvm/projects/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm.mlir -lower-gpu-ops-to-nvvm-ops -split-input-file | /home/uday/llvm-project/build.release/bin/FileCheck /home/uday/llvm-project/llvm/projects/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm.mlir

Exit Code: 1

Command Output (stderr):

/home/uday/llvm-project/llvm/projects/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm.mlir split at line #39:12:15: error: op "add" is invalid
    %result = "gpu.all_reduce"(%arg0) ({}) {op = "add"} : (f32) -> (f32)
              ^
/home/uday/llvm-project/llvm/projects/mlir/test/Conversion/GPUToNVVM/gpu-to-nvvm.mlir:42:18: error: CHECK-LABEL: expected string not found in input
// CHECK-LABEL: func @gpu_all_reduce_op()
                ^
<stdin>:5:2: note: scanning from here
attributes {gpu.kernel} {
^
<stdin>:37:7: note: possible intended match here
llvm.func @gpu_all_reduce_region()
^

--


FAIL: MLIR :: Dialect/GPU/ops.mlir (119 of 320)
********************
TEST 'MLIR :: Dialect/GPU/ops.mlir' FAILED
********************
Script:

: 'RUN: at line 1'; /home/uday/llvm-project/build.release/bin/mlir-opt /home/uday/llvm-project/llvm/projects/mlir/test/Dialect/GPU/ops.mlir | /home/uday/llvm-project/build.release/bin/FileCheck /home/uday/llvm-project/llvm/projects/mlir/test/Dialect/GPU/ops.mlir

Exit Code: 2

Command Output (stderr):

/home/uday/llvm-project/llvm/projects/mlir/test/Dialect/GPU/ops.mlir:83:14: error: op "add" is invalid
  %sum = "gpu.all_reduce"(%one) ({}) {op = "add"} : (f32) -> (f32)
             ^
FileCheck error: '-' is empty.
FileCheck command line: /home/uday/llvm-project/build.release/bin/FileCheck /home/uday/llvm-project/llvm/projects/mlir/test/Dialect/GPU/ops.mlir

--


Testing Time: 1.44s


Failing Tests (2):
    MLIR :: Conversion/GPUToNVVM/gpu-to-nvvm.mlir
    MLIR :: Dialect/GPU/ops.mlir

Expected Passes    : 292
Unsupported Tests  : 26
Unexpected Failures: 2

created time in 41 minutes

fork yhzhangyu/hub

A library for transfer learning by reusing parts of TensorFlow models.

https://tensorflow.org/hub

fork in 44 minutes

started tensorflow/models

started time in 44 minutes

push event tensorflow/tfx

tfx-team

commit sha 0656dc1e4c0e2cb2861f88318475b622617f043e

Fix TFX CLI container builder bug

PiperOrigin-RevId: 274527963

view details

push time in an hour

fork WilliamPoch/models

Models and examples built with TensorFlow

fork in an hour

fork KennethChewx/tensorflow

An Open Source Machine Learning Framework for Everyone

https://tensorflow.org

fork in an hour

issue closed tensorflow/tensorflow

tf.function tracing when input tensor varies

Dear experts, in TF 2.0, when using the tf.function decorator, if the shape of the input tensors varies, it seems TF will create a new graph every single time. Is there a way to get around this?

import tensorflow as tf

@tf.function
def add(a, b):
    print('Addition')
    return a + b
add(tf.constant(1), tf.constant(2))
add(tf.constant([1, 2]), tf.constant([2, 4]))
add(tf.constant([333, 2]), tf.constant([2, 4444]))

Output:

Addition
Addition

closed time in an hour

jakezhaojb

issue comment tensorflow/tensorflow

tf.function tracing when input tensor varies

Closing.

Found input_signature to solve this.
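For illustration, here is a pure-Python sketch (not TensorFlow internals; names like `trace_cache` and `traced_add` are hypothetical) of why each new input shape triggers a retrace: by default, the trace cache is keyed on the inputs' shapes, so every new shape builds a new graph. A fixed `input_signature` (e.g. `tf.TensorSpec(shape=None, dtype=tf.int32)`) collapses the cache key so one trace serves all shapes.

```python
# Hypothetical sketch of shape-keyed trace caching, the behavior
# tf.function exhibits by default.
trace_cache = {}

def traced_add(a, b):
    key = (len(a), len(b))  # cache key derived from the input shapes
    if key not in trace_cache:
        print('Tracing for shapes', key)  # a new "graph" is built here
        trace_cache[key] = lambda x, y: [u + v for u, v in zip(x, y)]
    return trace_cache[key](a, b)

traced_add([1], [2])         # traces
traced_add([1, 2], [2, 4])   # new shape -> traces again
traced_add([3, 2], [2, 4])   # same shape -> cached trace is reused
```

With an `input_signature` of unspecified shape, the key no longer depends on the concrete shape, which is why it fixes the repeated tracing reported above.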

jakezhaojb

comment created time in an hour

issue comment tensorflow/probability

NotImplementedError: Eager execution currently not supported for SGLD optimizer.

I tried that too, but it gives another error: ValueError: tf.function-decorated function tried to create variables on non-first call.

shashankg7

comment created time in an hour

push event tensorflow/tensorflow

Renjie Liu

commit sha 02a954585bad7238431837ad3067f76708a100ff

Optimize scalar broadcast add.

PiperOrigin-RevId: 275168005
Change-Id: I6263afeadfbf968dec5f30d2bac12b2d08797d01

view details

push time in an hour

started tensorflow/datasets

started time in an hour

started tensorflow/tensorflow

started time in an hour

issue comment tensorflow/tensorflow

Duplicated Java outer classname

I believe this issue is very clear and is just a typo.

Look at these two files and check:

tensorflow/compiler/tf2xla/host_compute_metadata.proto tensorflow/compiler/tf2xla/tf2xla.proto

They both share the same java_outer_classname option value, which is why the error org/tensorflow/tf2xla/Tf2XlaProtos.java: Tried to write the same file twice. is encountered when compiling the proto files under tensorflow/compiler/tf2xla into Java.

I see there is a naming convention in this project for the java_outer_classname option value: a snake-to-camel conversion of the filename, with a Protos suffix appended. So the java_outer_classname option value in tensorflow/compiler/tf2xla/host_compute_metadata.proto should be HostComputeMetadataProtos, but it is currently Tf2XlaProtos.

Isn't this just a simple typo?
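The convention described above can be sketched as a tiny helper (hypothetical function, not part of the TensorFlow build):

```python
# Hypothetical helper: derive the conventional java_outer_classname
# from a proto filename (snake-to-camel conversion plus a "Protos" suffix).
def expected_outer_classname(proto_path):
    name = proto_path.rsplit('/', 1)[-1]
    if name.endswith('.proto'):
        name = name[:-len('.proto')]
    return ''.join(part.capitalize() for part in name.split('_')) + 'Protos'

print(expected_outer_classname('tensorflow/compiler/tf2xla/host_compute_metadata.proto'))
# → HostComputeMetadataProtos (what the option should be, per the convention)
```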

YuanWenqing

comment created time in an hour

issue comment tensorflow/tensorflow

tf.keras model.fit calls slow with TPU distribute strategy

@jvishnuvardhan Just to give you guys a heads up: one can pass a dataset directly to model.fit, so multiple calls to fit are not really necessary if you are using a pipeline with only TensorFlow functions for data augmentation.

capilano

comment created time in an hour

pull request comment tensorflow/ecosystem

Change spark-tensorflow-connector dependency to be spark 3.0.0 snapshot

@mengxr Not relevant to Spark 3.0. Created a new PR here with some explanation: https://github.com/tensorflow/ecosystem/pull/144

WeichenXu123

comment created time in an hour

started tensorflow/text

started time in an hour

PR opened tensorflow/ecosystem

Fix flaky test "LocalWriteSuite"

Fix flaky test "LocalWriteSuite"

The issue:

The test first creates a temporary path via java.nio.Files.createTempDirectory, then deletes it, then uses the allocated temp path as the saving path for the dataframe. This is risky: once the directory is deleted, the path is released and can be allocated as a new temp dir elsewhere, which causes the subsequent df.save (errorIfExisting mode) to fail.

So I updated the code: do not delete the created temp dir; instead, create a sub-directory inside it as the df saving destination.
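In Python terms (the actual test is on the JVM; names here are illustrative), the safe pattern looks like:

```python
import os
import tempfile

# Keep the created temp dir alive (do NOT delete it), so no other
# process can be allocated the same path as a new temp dir.
temp_root = tempfile.mkdtemp()

# Use a fresh sub-directory inside it as the dataframe save destination;
# it does not exist yet, so an errorIfExists-style writer succeeds, and
# nothing else can race for the path while temp_root is alive.
save_path = os.path.join(temp_root, 'output')
assert not os.path.exists(save_path)
```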

+5 -4

0 comment

1 changed file

pr created time in an hour

started tensorflow/tensorflow

started time in an hour

started tensorflow/ranking

started time in an hour

issue opened tensorflow/tensorflow

Could not satisfy explicit device specification '' because the node placed on device Device assignments active during op was colocated with a group of nodes that required incompatible device

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • I didn't write custom code
  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): pip install tensorflow-gpu==1.14.0
  • Python version: 3.6.9
  • CUDA/cuDNN version: 8.0, cuDNN 6.0
  • GPU model and memory: 2 x 1080 TI

Describe the current behavior
I try to train a simple convolutional neural network defined by tf.keras.layers.Conv2D with random normal initialization; the network should be trained on GPU. I use the following code for initialization:

config = tf.compat.v1.ConfigProto()
config.log_device_placement = False
sess = tf.compat.v1.InteractiveSession(config=config)
with tf.device('/gpu:0'):
    # set up the network and train

Describe the expected behavior
It should train without bugs.

I tried config.allow_soft_placement=True, and it no longer throws the error, but then the network is trained only on the CPU instead of the GPU. Please help figure out this issue; I've seen it reported in many places, but it has never been resolved with a clean solution.

Other info / logs
Could not satisfy explicit device specification '' because the node placed on device Device assignments active during op was colocated with a group of nodes that required incompatible device

created time in an hour

started tensorflow/tensorflow

started time in an hour

push event tensorflow/tensorboard

TensorBoard Gardener

commit sha e69add8ae2a79211cbece6752aee29b24b2c77b1

Integrate 60083f4a9e3541b263a530666b2fda9a49aecbc5

view details

push time in an hour

PR closed tensorflow/tensorboard

DO_NOT_SUBMIT: Testing the build... cla: yes

meow :cat:

+57 -17

0 comment

4 changed files

stephanwlee

pr closed time in an hour

push event tensorflow/tfx

tfx-team

commit sha 3f1ac311d576308a9325d038abd23d2e8e64631c

Basic visualization support. Render a markdown file of exec_properties for each component.

PiperOrigin-RevId: 275095000

view details

push time in an hour

started tensorflow/tensorflow

started time in an hour

push event tensorflow/tensorboard

Stephan Lee

commit sha 60083f4a9e3541b263a530666b2fda9a49aecbc5

csp: fix bugs and properly treat projector (#2775)

With enabling CSP on TensorBoard, we have broken projector in few ways. Because we enforce CSP for all text/html response, we need to have proper sha256s for all scripts including ones for the projector. Several fixes are:

1. strict-dynamic -> 'strict-dynamic'
2. 'frame-src' is now set for `iframe src="..."`
3. script hashes are piped in for projector
4. `unsafe-eval` because of numericjs [1].

[1]: https://github.com/sloisel/numeric/blob/656fa1254be540f428710738ca9c1539625777f1/src/numeric.js#L576-L585

view details

push time in an hour

PR merged tensorflow/tensorboard

csp: fix bugs and properly treat projector cla: yes

By enabling CSP on TensorBoard, we broke the projector in a few ways. Because we enforce CSP for every text/html response, we need proper sha256 hashes for all scripts, including the projector's. Several fixes are:

  1. strict-dynamic -> 'strict-dynamic'
  2. 'frame-src' is now set for iframe src="..."
  3. script hashes are piped in for projector
  4. unsafe-eval because of numericjs 1.

Confession: the projector had only been tested before the stricter rules were applied to all text/html responses. Tested on the projector now.
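For reference, the "script hashes" of item 3 are just the base64-encoded SHA-256 of each exact inline script body, emitted as 'sha256-<digest>' in the Content-Security-Policy header. A quick sketch (hypothetical helper name, not TensorBoard's actual code):

```python
import base64
import hashlib

def csp_script_hash(script_body):
    # CSP hash-source: base64 of the SHA-256 digest of the exact script text.
    digest = hashlib.sha256(script_body.encode('utf-8')).digest()
    return "'sha256-" + base64.b64encode(digest).decode('ascii') + "'"

print(csp_script_hash("console.log('hi');"))
```

Note that the hash must be computed over the byte-for-byte script body; any whitespace change invalidates it, which is why the hashes are piped in at build time.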

+58 -17

0 comment

5 changed files

stephanwlee

pr closed time in an hour

started tensorflow/docs

started time in an hour

started tensorflow/docs

started time in an hour

issue comment tensorflow/tensorflow

Cannot import tensorflow after installing via pip3

I have checked as many issues as possible (I used to use Python a lot in the past but have fallen away from it in recent years). I don't think my recent transition from Windows 10 to Ubuntu helped either. I will reply with my pip list tomorrow morning, around 8:45 EST, when I get my laptop back (I left it at school in my locker).

sykeben

comment created time in an hour

started tensorflow/models

started time in an hour

issue comment tensorflow/tensorflow

How to get cublas handle to run cublas function?

@timshen91 Very useful info! Is there any sample code that uses the stream executor to do BLAS ops with cuBLAS? If I modify the source code to export a getter for blas_ and use it in a custom op, will it affect resource release?

7oud

comment created time in an hour

started tensorflow/models

started time in an hour

started tensorflow/models

started time in an hour

issue comment tensorflow/tensor2tensor

[Question] ASR Transformer performance vs. Google Speech-to-Text

I found that most of the models available on GitHub are still far from Google/Microsoft/Apple's speech-to-text performance. What is missing? Training data, or the language model?

mabergerx

comment created time in an hour

started tensorflow/neural-structured-learning

started time in an hour

started tensorflow/adanet

started time in an hour

started tensorflow/models

started time in an hour

started tensorflow/tensorflow

started time in an hour

fork Dhanasekar-S/nmt

TensorFlow Neural Machine Translation Tutorial

fork in an hour

started tensorflow/transform

started time in an hour

fork ZhuBaohe/tensorflow

An Open Source Machine Learning Framework for Everyone

https://tensorflow.org

fork in an hour

started tensorflow/serving

started time in 2 hours

started tensorflow/nmt

started time in 2 hours

issue comment tensorflow/tensorboard

Add "Baseline score" from explanations into the UI

Oh I see, you mean within the What-If Tool plugin. Cool, that's enough for me to route the request :)

htappen

comment created time in 2 hours

push event tensorflow/tfx

tfx-team

commit sha 8fe2e9db58ae86ff2cde29d3d27b44b8d4488dec

Result of running the TF 2.0 upgrade script on the tfx.

PiperOrigin-RevId: 275091567

view details

pachristopher

commit sha ce61fd2d9b66bf668ba1757087b9d4c1ac90eea9

Replace DecodedExamplesToTable with a Python implementation. Removes arrow C++ dependency.

PiperOrigin-RevId: 275130487

view details

tfx-team

commit sha aa7e1eb6263b1622b790e05b0c332a63a49ba0a5

Fix TFX CLI container builder bug

PiperOrigin-RevId: 274527963

view details

push time in 2 hours

started tensorflow/docs

started time in 2 hours

started tensorflow/docs

started time in 2 hours

issue comment tensorflow/tensorboard

Add "Baseline score" from explanations into the UI

Can you be more specific? The only match I can find to baseline_score is in TFMA (not TensorBoard or TensorFlow generally). Did you mean to file this at https://github.com/tensorflow/model-analysis?

If you did mean TensorBoard: which explanation data provides baseline scores, and where in the TensorBoard UI would you expect it to appear?

htappen

comment created time in 2 hours

issue comment tensorflow/tensorflow

Image load error

Unable to reproduce it. @GaranceRichard

Can you share your pip list under the win10 version?

On win10 : just see above : you'll have my venv (tf gpu version) and my local (tf cpu version)

Specifically, what is the version number of tensorflow you use?

GaranceRichard

comment created time in 2 hours

started tensorflow/examples

started time in 2 hours

push event tensorflow/tensorflow

Brian Zhao

commit sha 1114a0364ef38d81d7a34e262994cf771e3c9460

Wiring tensorflow/core/platform:regexp to tensorflow/core/BUILD.

PiperOrigin-RevId: 275161498
Change-Id: Ia5ef371fe76427693f4a13f2b0c873f1f3f54606

view details

push time in 2 hours

push event tensorflow/addons

Sean Morgan

commit sha 40de2b942c833c6afccd96759f106640b983953b

Standardize lambda with other frameworks (#601)

* Standardize lambda with other frameworks
* Update test cases

view details

push time in 2 hours

PR merged tensorflow/addons

Standardize lambda with other frameworks activations cla: yes

Per this discussion: https://github.com/tensorflow/addons/pull/570#discussion_r332281330

Matching the default lambda values with other frameworks: https://github.com/pytorch/pytorch/blob/446a79b95992dee7b987d0f35364fbb4ed1372db/torch/csrc/api/include/torch/nn/options/activation.h#L166

https://github.com/PaddlePaddle/Paddle/blob/8fb569e5b913958619f07243decebc24e5a6aa48/paddle/fluid/operators/activation_op.cc#L393

https://github.com/pytorch/pytorch/blob/446a79b95992dee7b987d0f35364fbb4ed1372db/torch/csrc/api/include/torch/nn/options/activation.h#L33

+16 -16

1 comment

6 changed files

seanpmorgan

pr closed time in 2 hours

fork siddharthdivi/cleverhans

An adversarial example library for constructing attacks, building defenses, and benchmarking both

fork in 2 hours

fork sunchch/models

Models and examples built with TensorFlow

fork in 2 hours

fork kycglobal/docs

TensorFlow documentation

https://www.tensorflow.org

fork in 2 hours

fork kqingcan/models

Models and examples built with TensorFlow

fork in 2 hours

more