Eugene Zhulenev ezhulenev Google SFBA http://eugenezhulenev.com/ Software Engineer at Google Brain, working on @tensorflow.

ezhulenev/orderbook-dynamics 629

Modeling high-frequency limit order book dynamics with support vector machines

ezhulenev/ergodicity 78

Akka Actor based trading system for Moscow Stock Exchange (http://moex.com/en/)

ezhulenev/distributo 71

ECS Scheduler for Running Massive Parallel Computations

ezhulenev/marketdb 32

Market time series database

ezhulenev/scala-openbook 32

Scala library for parsing TAQ NYSE OpenBook Ultra

ezhulenev/akka-var-calculation 14

Akka Cluster for Value-at-Risk calculation

ezhulenev/hyperloglogplus 12

Haskell implementation of HyperLogLog++ & MinHash for efficient cardinality and intersection estimation

ezhulenev/scalafi 3

Financial data analysis and market models in Scala language

ezhulenev/finagled-movie-db 2

Seamless migration from monolithic application to Finagle services

ezhulenev/finmnv 1

Pignataro / Financial Modeling and Valuation

issue comment tensorflow/tensorflow

Why does the cpu bias_op support only up to five dims?

This was done originally to reduce the number of instantiated templates; however, this could always be computed with fixed-rank tensors:

  1. Rank 2 for NHWC -> fuse all dimensions before the last
  2. Rank 3 for NCHW -> fuse all dimensions after dim 1
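If it helps, here is a minimal sketch of the fusing idea with Eigen tensors. This is an illustration under assumptions: `BiasAddNHWC` is a hypothetical helper with a made-up signature, not the actual TensorFlow kernel, and row-major layout is assumed.

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

// Hypothetical helper, not the actual tensorflow BiasOp: a bias add over a
// tensor of any rank can be expressed as a fixed-rank Eigen expression by
// fusing dimensions. For NHWC, fuse everything before the channel dim into
// one "outer" dim; for NCHW the analogous rank-3 view would be
// {batch, channels, fused inner dims}.
template <typename T, int Rank>
void BiasAddNHWC(const Eigen::Tensor<T, Rank, Eigen::RowMajor>& input,
                 const Eigen::Tensor<T, 1, Eigen::RowMajor>& bias,
                 Eigen::Tensor<T, Rank, Eigen::RowMajor>* output) {
  const Eigen::Index channels = input.dimension(Rank - 1);
  const Eigen::Index outer = input.size() / channels;

  Eigen::array<Eigen::Index, 2> fused{{outer, channels}};  // rank-2 view
  Eigen::array<Eigen::Index, 2> bias_2d{{1, channels}};
  Eigen::array<Eigen::Index, 2> bcast{{outer, 1}};

  // The same rank-2 expression covers inputs of any rank.
  output->reshape(fused) =
      input.reshape(fused) + bias.reshape(bias_2d).broadcast(bcast);
}
```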

I'm not sure when I'll have cycles to add this fix myself, but it should be rather trivial. I would gladly review an external contribution.

idofr

comment created time in a day

issue comment tensorflow/tensorflow

~40% performance decrease since Tensorflow 1.9 when training large models

The problem is that the NASNet model has mostly 1x1 convolutions. In NHWC format they are computed as a matrix multiplication; in NCHW they go through a cuDNN convolution, and that's ~30-50% slower. I'll try to come up with a better strategy for layout swapping, though it might be problematic, because currently we can only have a single "target" data format for the graph.
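To illustrate the first point, here is a rough sketch with Eigen tensors; `Conv1x1AsMatMul` is a hypothetical helper written for this comment, not TensorFlow code:

```cpp
#include <unsupported/Eigen/CXX11/Tensor>

using Tensor4 = Eigen::Tensor<float, 4, Eigen::RowMajor>;  // [N, H, W, C]
using Tensor2 = Eigen::Tensor<float, 2, Eigen::RowMajor>;  // [Cin, Cout]

// A 1x1 convolution over an NHWC tensor is just a matrix multiplication:
// batch and spatial dims fuse into one "rows" dimension and the reduction
// runs over the input channels, so no cuDNN convolution is involved.
Tensor4 Conv1x1AsMatMul(const Tensor4& input, const Tensor2& filter) {
  const Eigen::Index N = input.dimension(0), H = input.dimension(1),
                     W = input.dimension(2), Cin = input.dimension(3);
  const Eigen::Index Cout = filter.dimension(1);

  // [N*H*W, Cin] x [Cin, Cout] -> [N*H*W, Cout], i.e. a plain GEMM.
  Eigen::array<Eigen::Index, 2> lhs_shape{{N * H * W, Cin}};
  Eigen::array<Eigen::Index, 2> out_shape{{N * H * W, Cout}};
  Eigen::array<Eigen::IndexPair<int>, 1> contract{{Eigen::IndexPair<int>(1, 0)}};

  Tensor4 output(N, H, W, Cout);
  output.reshape(out_shape) = input.reshape(lhs_shape).contract(filter, contract);
  return output;
}
```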

DavidWiesner

comment created time in 2 months

issue comment tensorflow/tensorflow

~40% performance decrease since Tensorflow 1.9 when training large models

I was able to measure a 10-15% performance regression with batch size 4 on a single V100 GPU (a larger batch OOMs). The V100 has faster (30-50%) convolutions (and conv gradients) in the NCHW data format, and the layout optimizer correctly swaps all convs from the default NHWC to NCHW in this model; however, it seems that all the extra transpose nodes are killing the conv gains.

Looking at the executed graph and profiles, it looks like the layout optimizer may have missed an opportunity to remove redundant transposes.
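For context, eliminating such a redundant pair boils down to checking that the two permutations compose to the identity. A minimal sketch of that check (a hypothetical helper, not the actual Grappler pass):

```cpp
#include <cstddef>
#include <vector>

// Two back-to-back Transpose nodes are redundant when their permutations
// compose to the identity, e.g. NHWC->NCHW ({0, 3, 1, 2}) followed by
// NCHW->NHWC ({0, 2, 3, 1}).
bool TransposesCancel(const std::vector<int>& first,
                      const std::vector<int>& second) {
  if (first.size() != second.size()) return false;
  for (std::size_t i = 0; i < second.size(); ++i) {
    // Applying `first` and then `second` must map every axis back to itself.
    if (first[static_cast<std::size_t>(second[i])] != static_cast<int>(i)) {
      return false;
    }
  }
  return true;
}

// TransposesCancel({0, 3, 1, 2}, {0, 2, 3, 1}) == true, so the optimizer
// could drop the pair instead of executing two transposes at runtime.
```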

DavidWiesner

comment created time in 2 months

issue comment tensorflow/tensorflow

Eigen version bump breaks nightly AVX512 build with gcc 6.3

I've just mirrored Eigen to http://mirror.tensorflow.org/bitbucket.org/eigen/eigen/get/a0d250e79c79.tar.gz; either we never mirrored it before, or we lost it.

byronyi

comment created time in 3 months

Pull request review comment tensorflow/tensorflow

[Intel MKL] code cleanup

 bool FindContractionWithBiasAndAddActivation(
   // Root of the pattern must be an activation node.
   const auto* node_def = node_view->node();
+  if (node_def == nullptr) return false;

Did this happen in practice? It's probably a bug somewhere else if node_def is null.

guizili0

comment created time in 3 months

issue comment tensorflow/tensorflow

TF 1.14.0 training crashes with unimplemented Conv2D errors (works fine in TF 1.13.2)

Based on my understanding, this is what happens:

  1. Conv2D without an explicit device string is placed on the GPU
  2. The layout optimizer changes the data format NHWC->NCHW because it's optimal for the GPU
  3. Then something puts Conv2D back on the CPU <<<---- after a discussion we have no idea what it could be

When Conv2D is explicitly placed on the CPU, it prevents the layout optimizer from swapping the data format.

@mschonwe what if you explicitly put this Conv2D on the GPU?

mschonwe

comment created time in 5 months

issue comment tensorflow/tensorflow

Performance regression in sparse_dense_matmul

@yzhuang I'm surprised that this commit fixed any regressions, because I know that it actually introduced a pretty large regression to TensorChipping, which is used a lot in sparse matmul; the fix was submitted in https://github.com/tensorflow/tensorflow/commit/79e6d267c299d2051ba35ce014379f3d120efce6#diff-455a4c7f8e22d7c514e8c2caa27506c5

pavanky

comment created time in 5 months

issue comment tensorflow/tensorflow

Performance regression in sparse_dense_matmul

I will take a look at this issue. Unfortunately, we have been making a lot of contributions to Eigen recently and follow its head pretty closely. /cc @rmlarsen

Can you try to build TensorFlow from source with XSMM support (https://github.com/tensorflow/tensorflow/blob/82bf3e6c20782727d683beae4be046335b6f3297/tensorflow/core/kernels/BUILD#L82-L90)? As far as I know, internally it's a LARGE performance improvement for apps that rely on sparse matmuls.

pavanky

comment created time in 5 months
