dubey/weaver 215

A scalable, fast, consistent graph store

dubey/research_trends 7

Visualization tool for CS research

tfboyd/models 1

Models built with TensorFlow

dubey/benchmarks 0

Benchmark code

dubey/crosstex 0

CrossTeX is a BibTeX replacement, with better citation and bibliographic database support.

dubey/cs6210-f16 0

Course repository for Cornell CS 6210, Fall 2016

dubey/models 0

Models and examples built with TensorFlow

dubey/neo4j-shell-tools 0

A bunch of import/export tools for the neo4j-shell

dubey/protobuf 0

Protocol Buffers - Google's data interchange format

dubey/tensorflow 0

Computation using data flow graphs for scalable machine learning

issue comment tensorflow/tensorflow

CUDA illegal error access error when running distributed mixed precision

No, I haven't seen this issue before.

lminer

comment created time in 2 months

issue comment tensorflow/tensorflow

nccl_ops.all_sum does not correctly reduce gradients

Hi @ppwwyyxx, can you confirm that you run collective_ops.all_reduce with NCCL and not the default ring implementation? collective_ops.all_reduce is not expected to be slower than nccl_ops.all_sum.

ppwwyyxx

comment created time in 2 months

issue comment tensorflow/tensorflow

SyncBatchNormalization layer segfaults on multi-worker with NCCL

Thanks. I was working on a similar change internally, this time with a unit test that can reproduce the issue. Yes, please do keep me posted.

MinasTyuru

comment created time in 2 months

issue comment tensorflow/tensorflow

SyncBatchNormalization layer segfaults on multi-worker with NCCL

Thanks for the update. Let me follow up internally.

MinasTyuru

comment created time in 2 months

issue comment tensorflow/tensorflow

SyncBatchNormalization layer segfaults on multi-worker with NCCL

@MinasTyuru, I just submitted a change that should help with this issue. Feel free to reopen if you encounter the segfault again.

MinasTyuru

comment created time in 2 months
