Adrian Kuegel (akuegel), Google Germany GmbH

Pull request review comment on tensorflow/tensorflow

[XLA] new version of PR 37260

 static void UnrollInnerTileLoop(
     bool check_x_tile_bounds, int64 x_num_steps, int64 step_x,
     int64 vector_size, const string& loop_name, KernelSupportLibrary* ksl,
     llvm::Value* start_offset_x, llvm::Value* y_loc, llvm::Value* tile_width,
-    IrArray::Index& source_idx, llvm::IRBuilder<>& b_,

Please replace b_ with b

nouiz

comment created 2 hours ago

Pull request review comment on tensorflow/tensorflow

[XLA] new version of PR 37260

 static void UnrollInnerTileLoop(
     bool check_x_tile_bounds, int64 x_num_steps, int64 step_x,
     int64 vector_size, const string& loop_name, KernelSupportLibrary* ksl,
     llvm::Value* start_offset_x, llvm::Value* y_loc, llvm::Value* tile_width,
-    IrArray::Index& source_idx, llvm::IRBuilder<>& b_,
+    const IrArray::Index& source_idx, llvm::IRBuilder<>* b_,
     const IrEmitterUnnested::EmitElementFunction* emit_elem_function) {
   llvm::Type* index_ty = tile_width->getType();
   auto constant = [&](int64 val) {
     return llvm::ConstantInt::get(index_ty, val);
   };
-  for (int j = 0; j < x_num_steps / vector_size; j++) {
-    for (int i = 0; i < vector_size; i++) {
+  for (int64 j = 0; j < x_num_steps / vector_size; j++) {
+    IrArray::Index source_idx_x_base =

I think this can be moved outside of the loop. Note that AddOffsetToDim creates a new index and doesn't modify the existing one.
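A minimal self-contained sketch of the suggestion (Index and AddOffset are hypothetical stand-ins for IrArray::Index and AddOffsetToDim, not the XLA types): since the offset call returns a new index rather than mutating its receiver, the loop-invariant part can be computed once before the loop.

#include <cstdint>
#include <iostream>
#include <vector>

// Hypothetical stand-in for IrArray::Index: AddOffset returns a new index
// and leaves the receiver unchanged, mirroring AddOffsetToDim's contract.
struct Index {
  std::vector<int64_t> dims;
  Index AddOffset(size_t dim, int64_t offset) const {
    Index result = *this;  // copy; *this is not modified
    result.dims[dim] += offset;
    return result;
  }
};

int main() {
  const Index source{{0, 0}};
  const int64_t y_loc = 7;
  // Hoisted out of the loop: the y offset does not depend on j.
  const Index base = source.AddOffset(/*dim=*/0, y_loc);
  for (int64_t j = 0; j < 4; ++j) {
    Index idx = base.AddOffset(/*dim=*/1, j);  // only the x offset varies
    std::cout << idx.dims[0] << "," << idx.dims[1] << "\n";
  }
  return 0;
}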

nouiz

comment created 2 hours ago

Pull request review comment on tensorflow/tensorflow

[XLA] vectorize row reduction for even row size

 static llvm::Value* GetStartOffsetX(const KernelMappingScheme& mapping_scheme,
                                     llvm::Value* thread_id_x,
                                     llvm::Type* index_ty,
                                     llvm::IRBuilder<>* b) {
-  if (mapping_scheme.DilatedX()) {
+  auto constant = [&](int64 val) {
+    return llvm::ConstantInt::get(index_ty, val);
+  };
+  if (mapping_scheme.GetIndexingOrder() == kStridedIndexingX) {
     return thread_id_x;
+  } else if (mapping_scheme.GetIndexingOrder() == kLinearStridedIndexingX) {
+    return b->CreateMul(thread_id_x, constant(mapping_scheme.GetVectorSize()));
   }
+  CHECK_EQ(mapping_scheme.GetIndexingOrder(), kLinearIndexingX);
   int64 x_num_steps =
       mapping_scheme.GetTileSizeX() / mapping_scheme.GetNumThreadsX();
-  return b->CreateMul(thread_id_x,
-                      llvm::ConstantInt::get(index_ty, x_num_steps));
+  return b->CreateMul(thread_id_x, constant(x_num_steps));
+}
+
+// Calls `emit_elem_function()` `x_num_steps` times.  If
+// `vector_size`==1, then each element index passed to
+// `emit_elem_function()` will be separated by `step_x`. If `vector_size`>1,
+// then it must be a multiple of `x_num_steps`.  In that case, it
+// triggers a different indexing order that is vectorizable by
+// LLVM. It generates many groups of calls to `emit_elem_function`. Each
+// group is separated by `step_x` elements.  Inside a group, elements
+// are consecutive. If `check_x_tile_bounds` is true, then it will check
+// if the element index is in bound compared to `tile_width` before
+// calling `emit_elem_function`.
+static void UnrollInnerTileLoop(
+    bool check_x_tile_bounds, int64 x_num_steps, int64 step_x,
+    int64 vector_size, const string& loop_name, KernelSupportLibrary* ksl,
+    llvm::Value* start_offset_x, llvm::Value* y_loc, llvm::Value* tile_width,
+    IrArray::Index& source_idx, llvm::IRBuilder<>& b_,
+    const IrEmitterUnnested::EmitElementFunction* emit_elem_function) {
+  llvm::Type* index_ty = tile_width->getType();
+  auto constant = [&](int64 val) {
+    return llvm::ConstantInt::get(index_ty, val);
+  };
+  for (int j = 0; j < x_num_steps / vector_size; j++) {

The ints should be int64s.
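For context, a standalone illustration (not the XLA code) of why the counter type matters: x_num_steps is an int64, and a 32-bit int counter cannot cover an int64 bound, since incrementing an int past INT_MAX is undefined behavior.

#include <climits>
#include <cstdint>
#include <iostream>

int main() {
  const int64_t bound = int64_t{INT_MAX} + 1;  // fits in int64, not in int
  // for (int j = 0; j < bound; ++j) {}  // j would overflow: undefined behavior
  int64_t j = 0;  // matching the bound's type is safe
  while (j < bound) j += bound / 4;
  std::cout << "final j = " << j << "\n";
  return 0;
}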

nouiz

comment created a day ago

Pull request review comment on tensorflow/tensorflow

[XLA] vectorize row reduction for even row size

 static llvm::Value* GetStartOffsetX(const KernelMappingScheme& mapping_scheme,
                                     llvm::Value* thread_id_x,
                                     llvm::Type* index_ty,
                                     llvm::IRBuilder<>* b) {
-  if (mapping_scheme.DilatedX()) {
+  auto constant = [&](int64 val) {
+    return llvm::ConstantInt::get(index_ty, val);
+  };
+  if (mapping_scheme.GetIndexingOrder() == kStridedIndexingX) {
     return thread_id_x;
+  } else if (mapping_scheme.GetIndexingOrder() == kLinearStridedIndexingX) {
+    return b->CreateMul(thread_id_x, constant(mapping_scheme.GetVectorSize()));
   }
+  CHECK_EQ(mapping_scheme.GetIndexingOrder(), kLinearIndexingX);
   int64 x_num_steps =
       mapping_scheme.GetTileSizeX() / mapping_scheme.GetNumThreadsX();
-  return b->CreateMul(thread_id_x,
-                      llvm::ConstantInt::get(index_ty, x_num_steps));
+  return b->CreateMul(thread_id_x, constant(x_num_steps));
+}
+
+// Calls `emit_elem_function()` `x_num_steps` times.  If
+// `vector_size`==1, then each element index passed to
+// `emit_elem_function()` will be separated by `step_x`. If `vector_size`>1,
+// then it must be a multiple of `x_num_steps`.  In that case, it
+// triggers a different indexing order that is vectorizable by
+// LLVM. It generates many groups of calls to `emit_elem_function`. Each
+// group is separated by `step_x` elements.  Inside a group, elements
+// are consecutive. If `check_x_tile_bounds` is true, then it will check
+// if the element index is in bound compared to `tile_width` before
+// calling `emit_elem_function`.
+static void UnrollInnerTileLoop(
+    bool check_x_tile_bounds, int64 x_num_steps, int64 step_x,
+    int64 vector_size, const string& loop_name, KernelSupportLibrary* ksl,
+    llvm::Value* start_offset_x, llvm::Value* y_loc, llvm::Value* tile_width,
+    IrArray::Index& source_idx, llvm::IRBuilder<>& b_,

Please pass source_idx as a const reference (it is not changed), and pass the IRBuilder as a pointer instead of a reference (per the style guide).
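The style-guide rule being cited, as a minimal self-contained illustration (toy types, not the XLA signatures): read-only parameters are passed by const reference, and parameters the callee mutates are passed by pointer so the mutation is visible at the call site.

#include <iostream>
#include <string>

struct Index { int x = 0; };          // toy stand-in for IrArray::Index
struct Builder { std::string log; };  // toy stand-in for llvm::IRBuilder<>

// source_idx is only read -> const reference; b is mutated -> pointer.
void EmitStep(const Index& source_idx, Builder* b) {
  b->log += "emit at x=" + std::to_string(source_idx.x) + "\n";
}

int main() {
  Index idx{3};
  Builder b;
  EmitStep(idx, &b);  // &b makes the mutation obvious at the call site
  std::cout << b.log;
  return 0;
}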

nouiz

comment created a day ago

Pull request review comment on tensorflow/tensorflow

[XLA] vectorize row reduction for even row size

 ReductionCodegenInfo IrEmitterUnnested::ComputeReductionCodegenInfo(
       !IsUnrollingColumnReductionBeneficial(unnested_hlo, input_shape,
                                             reduction_dimensions.dimensions[2]);
 
-  if (!dilated_x && !reduction_dimensions.is_row_reduction) {

This change makes the variable dilated_x unused, so it should be deleted.

nouiz

comment created a day ago

Pull request review comment on tensorflow/tensorflow

[XLA] vectorize row reduction for even row size

 static llvm::Value* GetStartOffsetX(const KernelMappingScheme& mapping_scheme,
                                     llvm::Value* thread_id_x,
                                     llvm::Type* index_ty,
                                     llvm::IRBuilder<>* b) {
-  if (mapping_scheme.DilatedX()) {
+  auto constant = [&](int64 val) {
+    return llvm::ConstantInt::get(index_ty, val);
+  };
+  if (mapping_scheme.GetIndexingOrder() == kStridedIndexingX) {
     return thread_id_x;
+  } else if (mapping_scheme.GetIndexingOrder() == kLinearStridedIndexingX) {
+    return b->CreateMul(thread_id_x, constant(mapping_scheme.GetVectorSize()));
   }
+  CHECK_EQ(mapping_scheme.GetIndexingOrder(), kLinearIndexingX);
   int64 x_num_steps =
       mapping_scheme.GetTileSizeX() / mapping_scheme.GetNumThreadsX();
-  return b->CreateMul(thread_id_x,
-                      llvm::ConstantInt::get(index_ty, x_num_steps));
+  return b->CreateMul(thread_id_x, constant(x_num_steps));
+}
+
+// Calls `emit_elem_function()` `x_num_steps` times.  If
+// `vector_size`==1, then each element index passed to
+// `emit_elem_function()` will be separated by `step_x`. If `vector_size`>1,
+// then it must be a multiple of `x_num_steps`.  In that case, it
+// triggers a different indexing order that is vectorizable by
+// LLVM. It generates many groups of calls to `emit_elem_function`. Each
+// group is separated by `step_x` elements.  Inside a group, elements
+// are consecutive. If `check_x_tile_bounds` is true, then it will check
+// if the element index is in bound compared to `tile_width` before
+// calling `emit_elem_function`.
+static void UnrollInnerTileLoop(
+    bool check_x_tile_bounds, int64 x_num_steps, int64 step_x,
+    int64 vector_size, const string& loop_name, KernelSupportLibrary* ksl,
+    llvm::Value* start_offset_x, llvm::Value* y_loc, llvm::Value* tile_width,
+    IrArray::Index& source_idx, llvm::IRBuilder<>& b_,
+    const IrEmitterUnnested::EmitElementFunction* emit_elem_function) {
+  llvm::Type* index_ty = tile_width->getType();
+  auto constant = [&](int64 val) {
+    return llvm::ConstantInt::get(index_ty, val);
+  };
+  for (int j = 0; j < x_num_steps / vector_size; j++) {
+    for (int i = 0; i < vector_size; i++) {
+      int linear_index = j * vector_size + i;
+      llvm::Value* x_loc = b_.CreateAdd(constant(j * step_x * vector_size + i),
+                                        start_offset_x, "x_loc");
+      IrArray::Index source_idx_x =
+          source_idx.AddOffsetToDim(y_loc, kDimY, &b_)

You could pull this line out of this function and pass a source_idx_y to the function instead. That means less generated code, although LLVM would certainly optimize it away anyway.

nouiz

comment created a day ago

Pull request review comment on tensorflow/tensorflow

[XLA] vectorize row reduction for even row size

 static IrArray::Index GetUnnormalizedIndex(
   // If the normalization only add a new dimensions of size 1,
   // generate simpler indexing. LLVM doesn't always simplify the more
   // complicated indexing and this prevents it from vectorizing some
-  // cases.
-  if (unnormalized_shape.rank() == 2) {
+  // cases. We do this only for major_to_minor memory layout.
+  if (unnormalized_shape.rank() == 2 && unnormalized_shape.has_layout() &&
+      unnormalized_shape.dimensions()[0] == normalized_shape_index.dims()[1] &&
+      unnormalized_shape.dimensions()[1] == normalized_shape_index.dims()[2] &&
+      unnormalized_shape.layout().minor_to_major(1) == 0) {
     DCHECK_EQ(normalized_shape_index.dims()[0], 0);

This DCHECK is wrong; it should check that the trivial dimension 0 has value 1: DCHECK_EQ(normalized_shape_index.dims()[0], 1)

nouiz

comment created a day ago

Pull request review comment on tensorflow/tensorflow

[XLA] vectorize row reduction for even row size

 void IrEmitterUnnested::EmitTile(
   //
   // TODO(cheshire): Once ptxas is fixed and TF switches to it, remove the
   // workaround.
-  ksl->For(loop_name + "_y_in_tile",
-           /*start=*/constant(0),
-           /*end=*/
-           ceil_of_ratio(b_.CreateSub(tile_height, thread_id_info.thread_id_y),
-                         num_threads_y),
-           /*step=*/constant(1), [&](llvm::Value* y_indvar) {
-             llvm::Value* y_loc =
-                 b_.CreateAdd(thread_id_info.thread_id_y,
-                              b_.CreateMul(y_indvar, num_threads_y));
-             for (int64 j = 0; j < x_num_steps; j++) {
-               llvm::Value* x_loc =
-                   b_.CreateAdd(constant(j * step_x), start_offset_x, "x_loc");
-               IrArray::Index source_idx_x =
-                   source_idx.AddOffsetToDim(y_loc, kDimY, &b_)
-                       .AddOffsetToDim(constant(j * step_x), kDimX, &b_);
-               auto emit_element = [&] {
-                 return emit_elem_function(source_idx_x, y_loc, x_loc, j);
-               };
-               if (!x_tile_fits) {
-                 ksl->If(loop_name + "_x_in_tile",
-                         b_.CreateICmpULT(x_loc, tile_width), emit_element);
-               } else {
-                 emit_element();
-               }
-             }
-           });
+  ksl->For(
+      loop_name + "_y_in_tile",
+      /*start=*/constant(0),
+      /*end=*/
+      ceil_of_ratio(b_.CreateSub(tile_height, thread_id_info.thread_id_y),
+                    num_threads_y),
+      /*step=*/constant(1), [&](llvm::Value* y_indvar) {
+        llvm::Value* y_loc = b_.CreateAdd(
+            thread_id_info.thread_id_y, b_.CreateMul(y_indvar, num_threads_y));
+        auto unrollInnerTileLoop = [&](bool check_x_tile_bounds) {

This should be named unroll_inner_tile_loop according to the style guide; it is still a variable.
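With the suggested name, the declaration would read:

auto unroll_inner_tile_loop = [&](bool check_x_tile_bounds) {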

nouiz

comment created a day ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha baa6f6a7828a46c37b96227282938717220f8b34

Revert "[TableGen][GlobalISel] Account for HwMode in RegisterBank register sizes" This reverts commit e9f22fd4293a65bcdcf1b18b91c72f63e5e9e45b. When building with -DLLVM_USE_SANITIZER="Thread", check-llvm has 70 failing tests with this revision, and 29 without this revision.


pushed 13 days ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha 4a7f2032a350bc7eefd26709563f65216df3e2ce

Revert "CFGDiff: Simplify/common the begin/end implementations to use a common range helper" This reverts commit 79a7ed92a9b135212a6a271dd8dbc625038c8f06. This breaks the asan buildbot.


pushed 14 days ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha 5156e38eb1d3d0ef5bce1fc8491a05f3cfca0f89

Fix memtag test.

Summary: Matching %x makes the test fail.

Subscribers: cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D76272


pushed 16 days ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha 86306df7dd2a8e60d88c6306956080b53ac95589

Extract common code to deal with multidimensional vectors.

Summary: Also replace dyn_cast_or_null with dyn_cast when possible.

Differential Revision: https://reviews.llvm.org/D75733


pushed a month ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha 91acb5b3e1c372895f7f6fa9f5cf95bf80c2ae0b

Add rsqrt op to Standard dialect and lower it to LLVM dialect.

Summary: This adds an rsqrt op to the standard dialect, and lowers it as 1 / sqrt to the LLVM dialect.

Differential Revision: https://reviews.llvm.org/D75353


pushed a month ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha 39e1c1fa9ee03e91751e505d747275e58069e6de

Add GPU lowerings for the different log ops.

Summary: This adds GPU lowerings for log, log10 and log2.

Reviewers: mravishankar, herhut

Subscribers: jholewinski, mehdi_amini, rriddle, jpienaar, burmako, shauheen, antiagainst, nicolasvasilache, csigg, arpith-jacob, mgester, lucyrfox, liufengdb, Joonsoo, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D75239


pushed a month ago

Pull request comment on tensorflow/tensorflow

Add multi-algorithm deterministic cuDNN convolutions

I will try to roll forward again.

duncanriach

comment created 2 months ago

Pull request comment on tensorflow/tensorflow

Add multi-algorithm deterministic cuDNN convolutions

This change had to be rolled back. It seems one of our test targets became flaky with this CL.

duncanriach

comment created 2 months ago

push event to llvm/llvm-project

Adrian

commit sha 5a6eae3dea2342c2a83e4502de43927808f8ca21

[mlir] Ran git-clang-format.

Summary: I forgot to run git-clang-format before committing.


pushed 3 months ago

push event to llvm/llvm-project

Adrian Kuegel

commit sha 018b042593f007456b0695421942ec84ec816a30

[mlir] Add loop.parallel, loop.reduce and loop.reduce.return operations.

Summary: These operations can be used to specify a loop nest with a body that can contain reductions. The iteration space can be iterated in any order.

RFC: https://groups.google.com/a/tensorflow.org/d/topic/mlir/pwtSgiKFPis/discussion

Differential Revision: https://reviews.llvm.org/D72394


pushed 3 months ago

Pull request review comment on tensorflow/tensorflow

Added hlo/lhlo emitters for Abs, Ceil, Convert, Cos, Negate, Remainder, Sign and Tanh ops.

 ENTRY %AddReduce (x: f32[100,10], c: f32[]) -> f32[100] {
       )");
 }
 
+TEST_F(LhloGenTest, Abs) {
+  CompileAndVerifyIr(R"(
+HloModule Abs
+ENTRY %Abs (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %abs = f32[2,2]{1,0} abs(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @abs(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.abs"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Ceil) {
+  CompileAndVerifyIr(R"(
+HloModule Ceil
+ENTRY %Ceil (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %ceil = f32[2,2]{1,0} ceil(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @ceil(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.ceil"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Convert) {
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> f32[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f32[2,2]) -> f64[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %convert = f64[2,2]{1,0} convert(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f64[2,2]) -> f32[2,2] {
+  %val = f64[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(f64[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> i8[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = i8[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i8[2,2]) -> i32[2,2] {
+  %val = i8[2,2]{1,0} parameter(0)
+  ROOT %convert = i32[2,2]{1,0} convert(i8[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Cos) {
+  CompileAndVerifyIr(R"(
+HloModule Cos
+ENTRY %Cos (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %cos = f32[2,2]{1,0} cos(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @cos(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {

When you fix the HLO above, you need to name this cosine as well.
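That is, once the HLO op is renamed, this check would presumably become:

;CHECK: func @cosine(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {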

dfki-jugr

comment created 3 months ago

Pull request review comment on tensorflow/tensorflow

Added hlo/lhlo emitters for Abs, Ceil, Convert, Cos, Negate, Remainder, Sign and Tanh ops.

 ENTRY %AddReduce (x: f32[100,10], c: f32[]) -> f32[100] {
       )");
 }
 
+TEST_F(LhloGenTest, Abs) {
+  CompileAndVerifyIr(R"(
+HloModule Abs
+ENTRY %Abs (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %abs = f32[2,2]{1,0} abs(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @abs(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.abs"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Ceil) {
+  CompileAndVerifyIr(R"(
+HloModule Ceil
+ENTRY %Ceil (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %ceil = f32[2,2]{1,0} ceil(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @ceil(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.ceil"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Convert) {
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> f32[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f32[2,2]) -> f64[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %convert = f64[2,2]{1,0} convert(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f64[2,2]) -> f32[2,2] {
+  %val = f64[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(f64[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> i8[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = i8[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i8[2,2]) -> i32[2,2] {
+  %val = i8[2,2]{1,0} parameter(0)
+  ROOT %convert = i32[2,2]{1,0} convert(i8[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Cos) {
+  CompileAndVerifyIr(R"(
+HloModule Cos
+ENTRY %Cos (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %cos = f32[2,2]{1,0} cos(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @cos(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.cos"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Neg) {
+  CompileAndVerifyIr(R"(
+HloModule Neg
+ENTRY %Neg (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %neg = f32[2,2]{1,0} neg(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @neg(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.neg"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Rem) {
+  CompileAndVerifyIr(R"(
+HloModule Rem
+ENTRY %Rem(x: f32[2,2], y: f32[2,2]) -> f32[2,2] {
+  %x = f32[2,2]{1,0} parameter(0)
+  %y = f32[2,2]{1,0} parameter(1)
+  ROOT %rem = f32[2,2]{1,0} remainder(f32[2,2]{1,0} %x, f32[2,2]{1,0} %y)
+})",
+                     R"(
+;CHECK: func @remainder(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]], %[[ARG2:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.remainder(%[[ARG0]], %[[ARG1]], %[[ARG2]]) : ([[TYPE]], [[TYPE]], [[TYPE]]) -> ()

missing " after remainder: "xla_lhlo.remainder"

dfki-jugr

comment created 3 months ago

Pull request review comment on tensorflow/tensorflow

Added hlo/lhlo emitters for Abs, Ceil, Convert, Cos, Negate, Remainder, Sign and Tanh ops.

 ENTRY %AddReduce (x: f32[100,10], c: f32[]) -> f32[100] {
       )");
 }
 
+TEST_F(LhloGenTest, Abs) {
+  CompileAndVerifyIr(R"(
+HloModule Abs
+ENTRY %Abs (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %abs = f32[2,2]{1,0} abs(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @abs(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.abs"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Ceil) {
+  CompileAndVerifyIr(R"(
+HloModule Ceil
+ENTRY %Ceil (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %ceil = f32[2,2]{1,0} ceil(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @ceil(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.ceil"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Convert) {
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> f32[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f32[2,2]) -> f64[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %convert = f64[2,2]{1,0} convert(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f64[2,2]) -> f32[2,2] {
+  %val = f64[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(f64[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> i8[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = i8[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i8[2,2]) -> i32[2,2] {
+  %val = i8[2,2]{1,0} parameter(0)
+  ROOT %convert = i32[2,2]{1,0} convert(i8[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Cos) {
+  CompileAndVerifyIr(R"(
+HloModule Cos
+ENTRY %Cos (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %cos = f32[2,2]{1,0} cos(f32[2,2]{1,0} %val)

The HLO op name for cos is "cosine", so this needs to be cosine(f32[2,2]{1,0} %val).
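That is, the corrected HLO line:

ROOT %cos = f32[2,2]{1,0} cosine(f32[2,2]{1,0} %val)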

dfki-jugr

comment created 3 months ago

Pull request review comment on tensorflow/tensorflow

Added hlo/lhlo emitters for Abs, Ceil, Convert, Cos, Negate, Remainder, Sign and Tanh ops.

 ENTRY %AddReduce (x: f32[100,10], c: f32[]) -> f32[100] {
       )");
 }
 
+TEST_F(LhloGenTest, Abs) {
+  CompileAndVerifyIr(R"(
+HloModule Abs
+ENTRY %Abs (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %abs = f32[2,2]{1,0} abs(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @abs(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.abs"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Ceil) {
+  CompileAndVerifyIr(R"(
+HloModule Ceil
+ENTRY %Ceil (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %ceil = f32[2,2]{1,0} ceil(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @ceil(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.ceil"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Convert) {
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> f32[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f32[2,2]) -> f64[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %convert = f64[2,2]{1,0} convert(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: f64[2,2]) -> f32[2,2] {
+  %val = f64[2,2]{1,0} parameter(0)
+  ROOT %convert = f32[2,2]{1,0} convert(f64[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i32[2,2]) -> i8[2,2] {
+  %val = i32[2,2]{1,0} parameter(0)
+  ROOT %convert = i8[2,2]{1,0} convert(i32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+  CompileAndVerifyIr(R"(
+HloModule Convert
+ENTRY %Convert (val: i8[2,2]) -> i32[2,2] {
+  %val = i8[2,2]{1,0} parameter(0)
+  ROOT %convert = i32[2,2]{1,0} convert(i8[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @convert(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.convert"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Cos) {
+  CompileAndVerifyIr(R"(
+HloModule Cos
+ENTRY %Cos (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %cos = f32[2,2]{1,0} cos(f32[2,2]{1,0} %val)
+})",
+                     R"(
+;CHECK: func @cos(%[[ARG0:.*]]: [[TYPE:.*]], %[[ARG1:.*]]: [[TYPE]]) {
+;CHECK:   "xla_lhlo.cos"(%[[ARG0]], %[[ARG1]]) : ([[TYPE]], [[TYPE]]) -> ()
+;CHECK: }
+      )");
+}
+
+TEST_F(LhloGenTest, Neg) {
+  CompileAndVerifyIr(R"(
+HloModule Neg
+ENTRY %Neg (val: f32[2,2]) -> f32[2,2] {
+  %val = f32[2,2]{1,0} parameter(0)
+  ROOT %neg = f32[2,2]{1,0} neg(f32[2,2]{1,0} %val)

neg -> negate for the HLO op name. As with the cosine fix, you will also have to adjust the func below.
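That is, the corrected HLO line:

ROOT %neg = f32[2,2]{1,0} negate(f32[2,2]{1,0} %val)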

dfki-jugr

comment created 3 months ago

Pull request comment on tensorflow/tensorflow

Added hlo/lhlo emitters for Abs, Ceil, Convert, Cos, Negate, Remainder, Sign and Tanh ops.

The convert op isn't specified correctly in the dialect: it currently expects input and output to have the same type, but the whole point of the convert op is to change the element type of a shape. The tests produced this error: FAILED: 'xla_lhlo.convert' op requires all operands to have the same type. We will probably merge this request with a few fixes, but without the convert op.

dfki-jugr

comment created 3 months ago

Pull request review comment on tensorflow/tensorflow

Add Label for XlaOp

 class XlaOpRegistry {
     // operands and not their values.
     bool is_metadata_op = false;
 
+    string label;

Please use std::string. I know that right now this file mostly uses string instead of std::string; at some point this will be fixed with a large-scale change. But I would prefer that we already start using std::string whenever we introduce new variables :)
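That is, the new member would read:

std::string label;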

Agoniii

comment created 4 months ago

Pull request review comment on tensorflow/tensorflow

Add Label for XlaOp

 XlaOpRegistrationBuilder& XlaOpRegistrationBuilder::IsMetadataOp() {
   return *this;
 }
 
+XlaOpRegistrationBuilder& XlaOpRegistrationBuilder::Label(
+    absl::string_view label) {
+  registration_->label = string(label);

std::string here too, please.
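That is:

registration_->label = std::string(label);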

Agoniii

comment created 4 months ago

issue comment on tensorflow/tensorflow

Failed to run the unit test of bonus_tests

The name of the tf_cc_tests rule is never used to create an executable. If you check the definition, it creates an executable test for each of the entries in srcs (with names derived from the names of their source files). So you can, for example, run: bazel --output_user_root=$build_dir test //tensorflow/core/kernels:adjust_contrast_op_test (this is the first of the tests created from the bonus_tests target).

Leslie-Fang

comment created 5 months ago
