profile
Uday Bondhugula (bondhugula) | Indian Institute of Science, PolyMage Labs | Bangalore, India | http://www.csa.iisc.ac.in/~uday

bondhugula/pluto 119

Pluto: An automatic polyhedral parallelizer and locality optimizer

periscop/cloog 20

The CLooG Code Generator in the Polytope Model

bondhugula/polymage-benchmarks 15

Base code and optimized code for the benchmarks used in the PolyMage paper published at ASPLOS 2015

periscop/clan 15

Chunky Loop Analyzer: A Polyhedral Representation Extraction Tool for High Level Programs

bondhugula/llvm-project 14

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

periscop/candl 8

Data Dependence Analyzer in the Polyhedral Model

periscop/openscop 8

A Specification and a Library for Data Exchange in Polyhedral Compilation Tools

periscop/piplib 6

Parametric Integer Programming Library

bondhugula/mlir 5

"Multi-Level Intermediate Representation" Compiler Infrastructure

bondhugula/smo 2

A storage optimization tool for regular loop nests

PR closed tensorflow/tensorflow

[MLIR] Fix hlo to lhlo conversion amid std.constant ops (labels: cla: yes, size:M)

Add a conversion pattern to convert std.constant to xla_lhlo.const, to run during HLO to LHLO conversion. Without such a pattern, xla_hlo to xla_lhlo conversion would fail when std.constant ops provide constant tensor operands to other xla_hlo ops. tf-mlir-translate -hlo-to-mlir-hlo, for example, generates such std.constant ops from HLO constant nodes.

std.constant tensor-generating ops are to be replaced by constant-memref-generating ops.

Fixes #39895.

Signed-off-by: Uday Bondhugula uday@polymagelabs.com
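For illustration only, a rough sketch of what such a dialect-conversion pattern can look like (this is not the PR's code; `materializeConstMemref` is a hypothetical helper standing in for whatever allocates a buffer and emits the constant-memref-generating op):

#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Hypothetical helper (not part of the PR): allocates a buffer, emits the op
// that materializes the constant into it, and returns the buffer.
Value materializeConstMemref(ConversionPatternRewriter &rewriter, ConstantOp op);

// Rewrites tensor-typed std.constant ops so that HLO-to-LHLO conversion does
// not fail when such constants feed other ops being bufferized.
struct TensorConstantOpConversion : public OpConversionPattern<ConstantOp> {
  using OpConversionPattern<ConstantOp>::OpConversionPattern;

  LogicalResult
  matchAndRewrite(ConstantOp op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const override {
    // Scalar constants are left alone; only tensor constants are rewritten.
    if (!op.getType().isa<TensorType>())
      return failure();
    Value buffer = materializeConstMemref(rewriter, op);
    rewriter.replaceOp(op, buffer);
    return success();
  }
};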

+58 -5

7 comments

3 changed files

bondhugula

pr closed time in 3 days

pull request comment tensorflow/tensorflow

[MLIR] Fix hlo to lhlo conversion amid std.constant ops

There isn't a clear solution to this issue, and it has become moot / low priority given that there are trivial ways to avoid it. I'm closing this. The bug it was trying to fix should remain open, however; the behavior/error isn't proper.

bondhugula

comment created time in 3 days

pull request comment tensorflow/tensorflow

[MLIR] Fix verifier for TF::Conv2DOp and TF::Conv3DOp.

@smit-hinsu for visibility - as the author of the surrounding code.

pr4tgpt

comment created time in 17 days

push event llvm/llvm-project

Tatiana Shpeisman

commit sha 9909ef292daddccbd3b1154cec173014b847880a

[mlir][scf] Fix a bug in scf::ForOp loop unroll with an epilogue Fixes a bug in formation and simplification of an epilogue loop generated during loop unroll of scf::ForOp (https://bugs.llvm.org/show_bug.cgi?id=46689) Differential Revision: https://reviews.llvm.org/D87583

view details

push time in 22 days

push event polymage-labs/mlirx

Uday Bondhugula

commit sha 2cb4b5fd9c060e5eeded8d9a2e7e5b4c4197f3d1

[NFC] MemrefShapeCastOp lowering - changes from upstream review Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

view details

Uday Bondhugula

commit sha d366ece2e3b49c2822bb882c087b86f79fa56da5

[NFC] AffineOps.cpp comment update

view details

Uday Bondhugula

commit sha fa0492fd6f9f9981f0f7d5a53f7f75fcf0a6b40d

[MLIR] [NFC] Update execute.region doc

view details

Uday Bondhugula

commit sha 3a183542d1ca9ad98b5340ebf23edf97ec9f208f

Missed execute region op changes Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

view details

Uday Bondhugula

commit sha 78ff995ed06ecc9d54f3d6acd1af9bd3c77294b3

[MLIR] Missed updates for memref shape cast op llvm lowering

view details

push time in a month

push event polymage-labs/mlirx

Dave Lee

commit sha 3b3b9ba1c7d89afe4909a42e2a795354bb79e062

[lldb/Commands] Fix outdated `breakpoint command add` help string Update the some examples in the help string for `breakpoint command add`. Python breakpoint commands have different output than what's shown in the help string. Notes: * Removed an example containing an inner function, as it seems more about a Python technique than about `command script add` * Updated `print x` to `print(x)` to be python 2/3 agnostic Differential Revision: https://reviews.llvm.org/D87807

view details

Dave Lee

commit sha b36bdfe5ca0c2b863248f327b03d41516b38dc11

[cmake] Centralize LLVM_ENABLE_WARNINGS option Configure default value of `LLVM_ENABLE_WARNINGS` in `HandleLLVMOptions.cmake`. `LLVM_ENABLE_WARNINGS` is documented as ON by default, but `HandleLLVMOptions` assumes the default has been set somewhere else. If it has not been explicitly set, then `HandleLLVMOptions` implicitly uses OFF as a default. This removes the various `option()` declarations in favor of a single declaration in `HandleLLVMOptions`. This will prevent the unwanted use of `-w` that is mentioned in a couple of the comments. Reviewed By: DavidTruby, #libunwind, JDevlieghere, compnerd Differential Revision: https://reviews.llvm.org/D87243

view details

Simon Pilgrim

commit sha 005f826a0546eb11890b7bd36fea6b8b1c5e3fc4

[SLP] Use for-range loops across ValueLists. NFCI. Also rename some existing loops that used a 'j' iterator to consistently use 'V'.

view details

Roman Lebedev

commit sha 83c2d10d3cae57f71e23193d62989725b9b9f2f2

[NFC][SCEV] Add tests for @llvm.abs intrinsic

view details

Roman Lebedev

commit sha 1bb7ab8c4a324aa380bddfc75069e24c19e2bdd0

[SCEV] Recognize @llvm.abs as smax(x, -x)

As per alive2 (ignoring undef):

----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 0
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!
----------------------------------------
define i32 @src(i32 %x, i1 %y) {
%0:
  %r = abs i32 %x, 1
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i1 %y) {
%0:
  %neg_x = mul nsw i32 %x, 4294967295
  %r = smax i32 %x, %neg_x
  ret i32 %r
}
Transformation seems to be correct!

view details

Roman Lebedev

commit sha 0592de550f5c9ca9de44ed2c5c549f6a3b1c32b7

[NFC][SCEV] Add tests for @llvm.*.sat intrinsics

view details

Roman Lebedev

commit sha fedc9549d50d80f74169ecce4d0d0648a62249f0

[SCEV] Recognize @llvm.usub.sat as `%x - (umin %x, %y)`

----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = usub_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = umin i32 %x, %y
  %r = sub nuw i32 %x, %t0
  ret i32 %r
}
Transformation seems to be correct!

view details

Roman Lebedev

commit sha 64e2cb7e9605995d2efb625203cbd96db1404812

[SCEV] Recognize @llvm.uadd.sat as `%y + umin(%x, (-1 - %y))`

----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = sub nsw nuw i32 4294967295, %y
  %t1 = umin i32 %x, %t0
  %r = add nuw i32 %t1, %y
  ret i32 %r
}
Transformation seems to be correct!

The alternative, naive, lowering could be the following, although i don't think it's better, thought it will likely be needed for sadd/ssub/*shl:

----------------------------------------
define i32 @src(i32 %x, i32 %y) {
%0:
  %r = uadd_sat i32 %x, %y
  ret i32 %r
}
=>
define i32 @tgt(i32 %x, i32 %y) {
%0:
  %t0 = zext i32 %x to i33
  %t1 = zext i32 %y to i33
  %t2 = add nuw i33 %t0, %t1
  %t3 = zext i32 4294967295 to i33
  %t4 = umin i33 %t2, %t3
  %r = trunc i33 %t4 to i32
  ret i32 %r
}
Transformation seems to be correct!

view details

Ye Luo

commit sha 03111e5e7a8690300966a39f0aa2e4f2b4ec919a

[OpenMP] Protect unrecogonized CUDA error code If an error code can not be recognized by cuGetErrorString, errStr remains null and causes crashing at DP() printing. Protect this case. Reviewed By: jhuber6, tianshilei1992 Differential Revision: https://reviews.llvm.org/D87980

view details

Sanjay Patel

commit sha 1e6b240d7d336a36856268db5349468560e28a0e

[IRBuilder][VectorCombine] make and use a convenience function for unary shuffle; NFC This reduces code duplication for common construct. Follow-ups can use this in SLP, LoopVectorizer, and other passes.

view details

Sanjay Patel

commit sha a44238cb443f13c1e9fd42f6269f019d505ff5dd

[SLP] use unary shuffle creator to reduce code duplication; NFC

view details

David Tenty

commit sha d8540427419ec0c4b9bc02f432ef39c01898e826

[AIX][Clang][Driver] Add handling of shared option Reviewed By: jasonliu Differential Revision: https://reviews.llvm.org/D87914

view details

Arthur Eubanks

commit sha 746a2c3775658c4485a8e71a7d46ee55c30615b8

[ObjCARC] Initialize return value Mistakenly removed initialization of `Changed` in https://reviews.llvm.org/D87806.

view details

jerryyin

commit sha f87ceb63eb011e5cd653218af619097b58bf568f

[AMDGPU] Adding mutex to guard lld::elf::link interface use check-mlir target run tests simultaneously with multiple threads. This caused multiple threads to invoke the `lld::elf::link()` interface at the same time. Since the interface does not have a thread-safe implementation, add a metex to prevent multi-threaded access. I discovered this by looking the the failure stack trace. lld/ELF/symbolTable.cpp, SymbolTable::insert() hit into an assert with related to Epoch Trackers. The root cause is to due to there is no protection around the symMap (update) which is implemented in non-thread safe data structure: denseMap. Differential Revision: https://reviews.llvm.org/D88038

view details

Reid Kleckner

commit sha 3b3a16548568f5b6c4146ca5129eb6af5000e4ff

[MS] On x86_32, pass overaligned, non-copyable arguments indirectly This updates the C++ ABI argument classification code to use the logic from D72114, fixing an ABI incompatibility with MSVC. Part of PR44395. Differential Revision: https://reviews.llvm.org/D87923

view details

Stanislav Mekhanoshin

commit sha e8951474b1940bd81bc3bac8d506e08880ee35ea

[AMDGPU] Fixed typo in intrinsic comment. NFC.

view details

Fangrui Song

commit sha 6d637fa560f0196b93e377b98489661ecd7a1af0

[ELF][test] Delete large temporary files and make some temporary files smaller with two text segments Large files are cumbersome on some filesystems and can more easily trigger ENOSPC. Some tests use two text sections with output section addresses to test branch ranges. Use two text segments to prevent LLD from filling the gap and unnecessarily increasing the output size. With this change, there is no test/ELF temporary file larger than 100MiB. Reviewed By: psmith Differential Revision: https://reviews.llvm.org/D88037

view details

Roman Lebedev

commit sha 0ab99bb314203d8f3b40e805ffea03857ca5c21e

[NFC][SCEV] Cleanup lowering of @llvm.uadd.sat, (-1 - V) is just ~V

view details

Arthur Eubanks

commit sha f4f7df037e71fa77b06a37d86f2596db47d583d0

[DIE] Remove DeadInstEliminationPass This pass is like DeadCodeEliminationPass, but only does one pass through a function instead of iterating on users of eliminated instructions. DeadCodeEliminationPass should be used in all cases. Reviewed By: asbirlea Differential Revision: https://reviews.llvm.org/D87933

view details

Louis Dionne

commit sha 43270c65cf48484d8b8cee5044480f6f1b00281d

[libc++] Verify base substitutions earlier in the testing format This allows diagnosing missing substitution issues even when doing availability feature detection in the DSL.

view details

push time in a month

issue comment tensorflow/mlir-hlo

lhlo-copy-removal pass crash

Thanks, Uday. This issue is going to be fixed in general mlir::CopyRemoval pass and it's going to be downstream to Tensorflow soon.

@dfki-ehna I happened to try out the upstream -copy-removal pass (which, by the way, is great to see working using just the copy op interface!) on an lmhlo dialect snippet, and I noticed this missed opportunity (it looks like a regression from the earlier pass). I didn't dig further to double-check, but it looks like the copy should have been eliminated here. I can file an issue if that's the case.

tf-opt -copy-removal test.mlir doesn't remove it.

func @must_be_removed_first(%arg0: memref<2x2xf32>, %arg1: memref<2x2xf32>, %arg2: memref<2x2xf32>) {
  %0 = alloc() {temp = true} : memref<2x2xf32>
  "lmhlo.exponential"(%arg1, %arg2) : (memref<2x2xf32>, memref<2x2xf32>) -> ()
  "lmhlo.exponential"(%arg0, %0) : (memref<2x2xf32>, memref<2x2xf32>) -> ()
  "lmhlo.copy"(%0, %arg2) : (memref<2x2xf32>, memref<2x2xf32>) -> ()
  dealloc %0 : memref<2x2xf32>
  "lmhlo.terminator"() : () -> ()
}
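For reference, a sketch of what the snippet should reduce to if the copy were forwarded, i.e., with %0 replaced by %arg2 and the alloc/copy/dealloc removed (this is the expected output, not what -copy-removal currently produces here):

func @must_be_removed_first(%arg0: memref<2x2xf32>, %arg1: memref<2x2xf32>, %arg2: memref<2x2xf32>) {
  "lmhlo.exponential"(%arg1, %arg2) : (memref<2x2xf32>, memref<2x2xf32>) -> ()
  "lmhlo.exponential"(%arg0, %arg2) : (memref<2x2xf32>, memref<2x2xf32>) -> ()
  "lmhlo.terminator"() : () -> ()
}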
bondhugula

comment created time in a month

push event llvm/llvm-project

Haruki Imai

commit sha c1f856803142a113fa094411fa4760512b919ef6

[MLIR] Fix for updating function signature in normalizing memrefs Normalizing memrefs failed when a caller of symbolic use in a function can not be casted to `CallOp`. This patch avoids the failure by checking the result of the casting. If the caller can not be casted to `CallOp`, it is skipped. Differential Revision: https://reviews.llvm.org/D87746

view details

push time in a month

push event llvm/llvm-project

Haruki Imai

commit sha ff00b58392527419ea32d0b97575ef973c1bd085

[MLIR] Normalize memrefs in LoadOp and StoreOp of Standard Ops Added a trait, `MemRefsNormalizable` in LoadOp and StoreOp of Standard Ops to normalize input memrefs in LoadOp and StoreOp. Related revision: https://reviews.llvm.org/D86236 Differential Revision: https://reviews.llvm.org/D88156

view details

push time in a month

push event llvm/llvm-project

Navdeep Kumar

commit sha 0602e8f77f8662c85155b8cf02937a2e71c01e12

[MLIR][Affine] Add parametric tile size support for affine.for tiling Add support to tile affine.for ops with parametric sizes (i.e., SSA values). Currently supports hyper-rectangular loop nests with constant lower bounds only. Move methods - moveLoopBody(*) - getTileableBands(*) - checkTilingLegality(*) - tilePerfectlyNested(*) - constructTiledIndexSetHyperRect(*) to allow reuse with constant tile size API. Add a test pass -test-affine -parametric-tile to test parametric tiling. Differential Revision: https://reviews.llvm.org/D87353

view details

push time in a month

push event llvm/llvm-project

Abhishek Varma

commit sha 296e97ae8f7183c2f8737b9e6e68df4904dbfadf

[MLIR] Support for return values in Affine.For yield Add support for return values in affine.for yield along the same lines as scf.for and affine.parallel. Signed-off-by: Abhishek Varma <abhishek.varma@polymagelabs.com> Differential Revision: https://reviews.llvm.org/D87437
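For illustration, a small hypothetical example of the feature this commit adds, mirroring scf.for's iter_args/yield form (the function and value names are made up):

func @sum(%A: memref<10xf32>, %init: f32) -> f32 {
  // %acc is a loop-carried value; affine.yield passes the updated value to
  // the next iteration, and the final value becomes the result of affine.for.
  %sum = affine.for %i = 0 to 10 iter_args(%acc = %init) -> (f32) {
    %v = affine.load %A[%i] : memref<10xf32>
    %new = addf %acc, %v : f32
    affine.yield %new : f32
  }
  return %sum : f32
}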

view details

push time in a month

push event llvm/llvm-project

Uday Bondhugula

commit sha 9c40495a35a2cac89dd72db54892d6bd7a2abf0d

[MLIR][NFC] Value print update for block arguments Emit some more information when printing/dumping `Value`s of `BlockArgument` kind. This is purely to help for debugging purposes. Differential Revision: https://reviews.llvm.org/D87670

view details

push time in 2 months

issue comment bondhugula/fw_fpga

cases having the same bits

This code is really from 14 years ago, and I haven't looked at it during this period! I wish I had added some comments (or better variable names), as I no longer immediately recall the details! (It perhaps just does all three, and having three different blocks was a way to logically group them.) Please do take a look at the accompanying paper - it should be straightforward to deduce. If you think there is an error in this version, please feel free to post or raise a PR. Note that this design was validated experimentally on a real FPGA-accelerated system (a Cray XD1 back then).

Sana-Guezguez

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[The full AffineLoopInterchange.cpp diff is shown as context for each inline review comment; it is elided here, keeping only the lines each comment refers to. Per its header comment, the pass optimizes for locality (spatial and temporal, both self and group) and for parallelism on multicores to minimize synchronization frequency; it handles perfectly and imperfectly nested rectangular loop nests and bails out on affine.if ops or non-rectangular iteration spaces.]

+/// Calculates the number of synchronizations needed in this loop permutation.
+/// Those permutations having dependence satisfied on inner loops require

having -> that have

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Same diff context elided; the comment below refers to these lines:]

+  // A helper map for the `forTree`. Since `AffineForOp` cannot act as a for
+  // a DenseMap, we've to use a map to convert to and from an affine.for to an
+  // Operation* and vice-versa.
+  DenseMap<Operation *, AffineForOp> forOperations;

This is unnecessary. You can dyn_cast the Operation * to its AffineForOp.
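A minimal sketch of the suggested simplification, assuming the map key is the parent's Operation* (variable names taken from the surrounding loop):

// Recover the AffineForOp directly from the stored Operation* rather than
// maintaining a separate Operation*-to-AffineForOp map; dyn_cast yields a
// null op if the operation is not an affine.for.
if (auto parentFor = dyn_cast<AffineForOp>(parentChildrenPair.first))
  separateSiblingLoops(parentFor, parentChildrenPair.second);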

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

+//===- AffineLoopInterchange.cpp - Pass to perform loop interchange-----===//+//+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.+// See https://llvm.org/LICENSE.txt for license information.+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception+//+//===----------------------------------------------------------------------===//+//+// This file implements a loop interchange pass that optimizes for locality+// (spatial and temporal - both self and group) and parallelism for multicores,+// to minimize the frequency of synchronization. The pass works for both+// perfectly nested and imperfectly nested loops (any level of nesting). However+// in the presence of affine.if statements and/or non-rectangular iteration+// space, the pass simply bails out - leaving the original loop nest unchanged.+// The pass is triggered by the command line flag -affine-loop-interchange.+//+//===----------------------------------------------------------------------===//++#include "PassDetail.h"+#include "mlir/Analysis/AffineAnalysis.h"+#include "mlir/Analysis/Utils.h"+#include "mlir/Dialect/Affine/IR/AffineOps.h"+#include "mlir/Dialect/Affine/Passes.h"+#include "mlir/Transforms/LoopUtils.h"+#include "llvm/ADT/SmallSet.h"+#include "llvm/ADT/SmallVector.h"+#include <algorithm>+#include <cmath>+#include <numeric>+using namespace mlir;+namespace {+struct LoopInterchange : public AffineLoopInterchangeBase<LoopInterchange> {+  void runOnFunction() override;+  void handleImperfectlyNestedAffineLoops(Operation &funcOp);+  void runOnAffineLoopNest();++private:+  /// Default cache line size(in bytes). Useful for getting a measure of the+  /// locality of each loop in a given loop nest.+  constexpr static unsigned kCacheLineSize = 64;++  /// Default element size to be used if a memref does not have a static shape.+  constexpr static unsigned kDefaultEltSize = 8;++  bool isRectangularAffineForLoopNest();++  void getLoopCarriedDependenceVector();++  void getAllLoadStores();++  void getCacheLineAccessCounts(+      DenseMap<Operation *, SmallVector<SmallVector<int64_t, 4>, 4>>+          &loopAccessMatrices,+      DenseMap<Operation *, unsigned> &elementSizes);++  uint64_t getNumCacheLinesSpatialReuse(ArrayRef<unsigned> perm);++  uint64_t getNumSyncs(ArrayRef<unsigned> perm);++  uint64_t getNumCacheLinesTemporalReuse(+      ArrayRef<unsigned> permutation,+      DenseMap<Operation *, SmallVector<SmallVector<int64_t, 4>, 4>>+          &loopAccessMatrices,+      uint64_t maxTemporalReuse);++  bool getBestPermutation(DenseMap<Value, unsigned> &loopIndexMap,+                          SmallVector<unsigned, 4> &bestPerm);++  // Loop Carried Dependence vector. 
A 'true' at index 'i' means that the loop+  // at depth 'i' carries a dependence.+  SmallVector<bool, 4> loopCarriedDV;++  // Iteration count of each loop in the loop nest.+  SmallVector<unsigned, 4> loopIterationCounts;++  // The loop nest.+  SmallVector<AffineForOp, 4> loopVector;++  /// Number of cache lines accessed by each loop in the loop nest.+  DenseMap<const AffineForOp *, uint64_t> cacheLinesAccessCounts;++  // List of all load/store ops in the loop nest body.+  SmallVector<Operation *, 8> loadAndStoreOps;+};+} // namespace++/// Returns true if any affine-if op found in the loop nest rooted at `forOp`+static bool hasAffineIfStatement(AffineForOp &forOp) {+  auto walkResult =+      forOp.walk([&](AffineIfOp op) { return WalkResult::interrupt(); });+  return walkResult.wasInterrupted();+}++/// Checks if the given loop nest has a rectangular-shaped iteration space.+bool LoopInterchange::isRectangularAffineForLoopNest() {+  for (AffineForOp forOp : loopVector) {+    if (!forOp.hasConstantUpperBound() || !forOp.hasConstantLowerBound())+      return false;+  }+  return true;+}++/// Fills `row` with the coefficients of loopIVs in `expr`. Every value in +/// `operands` should either be a loopIV or a terminal symbol.+static void prepareCoeffientRow(AffineExpr expr, ArrayRef<Value> operands,+                                DenseMap<Value, unsigned> &loopIndexMap,+                                SmallVector<int64_t, 4> &row) {+  // TODO: Implement support for terminal symbols.+  // The value at the last index of the `row` is an element of the constant+  // vector b.+  row.resize(loopIndexMap.size() + 1);+  switch (expr.getKind()) {+  case AffineExprKind::Add: {+    // Please note that in the case of an add operation, either both `lhs` and+    // `rhs` are dim exprs or the `lhs` is a dim expr and the `rhs` is a+    // constant expr.+    AffineBinaryOpExpr addExpr = expr.cast<AffineBinaryOpExpr>();+    AffineExpr lhs = addExpr.getLHS();+    AffineExpr rhs = addExpr.getRHS();+    unsigned lhsPosition = 0;+    unsigned rhsPosition = 0;+    if (lhs.isa<AffineDimExpr>()) {+      auto dimExpr = lhs.cast<AffineDimExpr>();+      lhsPosition = loopIndexMap[operands[dimExpr.getPosition()]];+    }+    // Update the loopIV only if it has not been encountered before. Please note+    // that it is possible that the same loopIV have been encountered before+    // while parsing other exprs. In that case, the appropriate coefficient is+    // already set.+    if (row[lhsPosition] == 0)+      row[lhsPosition] = 1;+    // The `rhs` may be a constant expr. 
In that case, no need to update the+    // `row`.+    bool isConstRhs = false;+    if (rhs.isa<AffineDimExpr>()) {+      auto dimExpr = rhs.cast<AffineDimExpr>();+      rhsPosition = loopIndexMap[operands[dimExpr.getPosition()]];+    } else if (rhs.isa<AffineConstantExpr>()) {+      row.back() += rhs.cast<AffineConstantExpr>().getValue();+      isConstRhs = true;+    }+    if (row[rhsPosition] == 0 && !isConstRhs)+      row[rhsPosition] = 1;+    break;+  }+  case AffineExprKind::Mul: {+    AffineBinaryOpExpr mulExpr = expr.cast<AffineBinaryOpExpr>();+    AffineExpr lhs = mulExpr.getLHS();+    AffineExpr rhs = mulExpr.getRHS();+    unsigned dimIdPos = 0;+    // In the case of a mul expr, the lhs can only be a dim expr and the rhs can+    // only be a constant expr.+    if (lhs.isa<AffineDimExpr>()) {+      auto dim = lhs.cast<AffineDimExpr>();+      dimIdPos = loopIndexMap[operands[dim.getPosition()]];+    }+    if (rhs.isa<AffineConstantExpr>()) {+      row[dimIdPos] = rhs.cast<AffineConstantExpr>().getValue();+    }+    break;+  }+  case AffineExprKind::DimId: {+    // This takes care of the cases like A[i] where i is a loopIV. Since it is+    // not a binary expr, there is no lhs/rhs.+    auto dimExpr = expr.cast<AffineDimExpr>();+    row[loopIndexMap[operands[dimExpr.getPosition()]]] = 1;+    break;+  }+  case AffineExprKind::CeilDiv:+  case AffineExprKind::FloorDiv:+  case AffineExprKind::Mod: {+    // Even though exprs like CeilDiv/FloorDiv and Mod can be considered as+    // binary exprs, the `rhs` in these exprs is always a constant as per the+    // rules of AffineExpr. These constant values do not make part of either+    // the vector-b or the matrix A. Thus, we don't need to care about `rhs`+    // in these cases.+    auto modExpr = expr.cast<AffineBinaryOpExpr>();+    AffineExpr lhs = modExpr.getLHS();+    if (lhs.isa<AffineDimExpr>()) {+      auto dimExpr = lhs.cast<AffineDimExpr>();+      row[loopIndexMap[operands[dimExpr.getPosition()]]] = 1;+    }+  }+  }+}++/// Populates `loopAccessMatrices` with the access matrices (A|b) of all load +/// and store ops in the loop body. Please note that each affine access can be +/// represented as a linear system Ax+b (A is the affine access matrix, x is the +/// vector of loopIVs and b is the constant-term vector). 
`loopIndexMap` holds +/// depth locations of each loopIV in the original loop order.+static void getAffineAccessMatrices(+    ArrayRef<Operation *> loadAndStoreOps,+    DenseMap<Value, unsigned> &loopIndexMap,+    DenseMap<Operation *, SmallVector<SmallVector<int64_t, 4>, 4>>+        &loopAccessMatrices) {++  for (unsigned i = 0; i < loadAndStoreOps.size(); ++i) {+    Operation *srcOp = loadAndStoreOps[i];+    MemRefAccess srcAccess(srcOp);+    AffineMap map;+    if (auto loadOp = dyn_cast<AffineLoadOp>(srcOp))+      map = loadOp.getAffineMap();+    else if (auto storeOp = dyn_cast<AffineStoreOp>(srcOp))+      map = storeOp.getAffineMap();+    SmallVector<Value, 8> operands(srcAccess.indices.begin(),+                                   srcAccess.indices.end());+    fullyComposeAffineMapAndOperands(&map, &operands);+    map = simplifyAffineMap(map);+    canonicalizeMapAndOperands(&map, &operands);+    ArrayRef<AffineExpr> mapResults = map.getResults();+    loopAccessMatrices[srcOp].resize(mapResults.size());+    for (unsigned l = 0; l < mapResults.size(); l++) {+      // Parse the l-th map result(access expr for the l-th dim of this memref)+      // to get the l-th row of this op's access matrix.+      AffineExpr mapResult = mapResults[l];+      // Check if the `mapResult` is a constant expr. If yes, there is no need+      // to walk it. Instead, add the value to the constant b-vector element and+      // leave the row unchanged. The last column of an access matrix stores the+      // b-vector.+      if (mapResult.isa<AffineConstantExpr>()) {+        auto constExpr = mapResult.cast<AffineConstantExpr>();+        loopAccessMatrices[srcOp][l].back() = constExpr.getValue();+      } else {+        mapResult.walk([&](AffineExpr expr) {+          // Each expr can in turn be a combination of many sub expressions.+          // Walk each of these sub-exprs to fully parse the `mapResult`.+          prepareCoeffientRow(expr, operands, loopIndexMap,+                              loopAccessMatrices[srcOp][l]);+        });+      }+    }+  }+}++/// Separates the last sibling loop from its fellow siblings. After separation,+/// it receives a copy of the common parent independent from its other siblings.+/// A loop nest like: \code+///     parent{forOpA, forOpB, lastSibling}+/// \endcode+/// becomes+/// \code+///     parent{lastSibling}, parent{forOpA, forOpB}+/// \endcode+static void separateSiblingLoops(AffineForOp &parentForOp,+                                 SmallVector<AffineForOp, 4> &siblings) {++  OpBuilder builder(parentForOp.getOperation()->getBlock(),+                    std::next(Block::iterator(parentForOp)));+  AffineForOp copyParentForOp = cast<AffineForOp>(builder.clone(*parentForOp));+  // We need `siblings` as a SmallVector. We cannot use an ArrayRef here because+  // that would make each element in `siblings` a 'const' and this would prevent+  // us from calling getOperation() method.++  // We always separate the last sibling from the group. For this we'll need the+  // order in which all the siblings are arranged. We need this order to compare+  // loops with their cloned copy in `copyParentForOp`. 
Comparision using the+  // AffineForOp.getOperation() method does not work in this case.+  AffineForOp lastSibling = siblings.back();+  unsigned lastSiblingPosition = 0;+  llvm::SmallSet<unsigned, 8> siblingsIndices;+  unsigned siblingIndex = 0;+  parentForOp.getOperation()->walk([&](AffineForOp op) {+    siblingIndex++;+    if (op.getOperation() == lastSibling.getOperation())+      lastSiblingPosition = siblingIndex;+    for (unsigned i = 0; i < siblings.size(); i++)+      if (op.getOperation() == siblings[i].getOperation())+        siblingsIndices.insert(siblingIndex);+  });+  // Walk the cloned copy to erase all the other siblings.+  siblingIndex = 0;+  copyParentForOp.getOperation()->walk([&](AffineForOp op) {+    siblingIndex++;+    if (siblingIndex != lastSiblingPosition &&+        siblingsIndices.count(siblingIndex))+      op.getOperation()->erase();+  });+  // Erase the `lastSibling` from the the original copy.+  lastSibling.getOperation()->erase();+}++/// Deals with imperfect loop nests where multiple loops appear as children+/// of some common parent loop. Converts all such imperfectly nested loops+/// in `funcOp` to perfectly nested ones by separating each sibling at a+/// time. That is, if two or more loops are present as siblings at some depth,+/// it will separate each of those siblings such that there is no common +/// parent left in the new structure. Each sibling receives a separate copy+/// of the common parent. This process is repeated until each parent has only +/// one child left.+void LoopInterchange::handleImperfectlyNestedAffineLoops(Operation &funcOp) {+  // TODO: Extend to other types of imperfectly nested loop nests.+  +  // Store the arrangement of all the for-loops in the `funcOp` body in a tree+  // structure. This makes storing the parent-child relationship an easy task. +  DenseMap<Operation *, SmallVector<AffineForOp, 4>> forTree;+  // A helper map for the `forTree`. Since `AffineForOp` cannot act as a for+  // a DenseMap, we've to use a map to convert to and from an affine.for to an+  // Operation* and vice-versa.+  DenseMap<Operation *, AffineForOp> forOperations;++  // Stop splitting when each parent has only one child left.+  bool oneChild = false;+  while (!oneChild) {+    oneChild = true;+    // Walk the function to create a tree of affine.for operations.+    funcOp.walk([&](AffineForOp op) {+      if (op.getParentOp()->getName().getStringRef() == "affine.for")+        forTree[op.getOperation()->getParentOp()].push_back(op);+      forOperations[op.getOperation()] = op;+    });+    // Separate one sibling at a time.+    for (auto &parentChildrenPair : forTree) {+      // This loop nest has no sibling problem. Check the next loop nest.+      if (parentChildrenPair.second.size() < 2)+        continue;+      oneChild = false;+      separateSiblingLoops(forOperations[parentChildrenPair.first],+                           parentChildrenPair.second);+      // We need to walk the function again to create a new `forTree` since the+      // structure of the loop nests within the `funcOp` body has changed after+      // the separation.+      break;+    }+    forTree.clear();+    forOperations.clear();+  }+  return;+}++/// Scans the loop nest to collect all the load and store ops. 
The list+/// of all such ops is maintained in the private member `loadAndStoreOps`.+void LoopInterchange::getAllLoadStores() {+  loopVector[0].getOperation()->walk([&](Operation *op) {+    if (isa<AffineLoadOp>(op) || isa<AffineStoreOp>(op)) {+      loadAndStoreOps.push_back(op);+    }+  });+}++/// Fills `elementSizes` with the size of the element types of all the memrefs+/// in the loop nest body. These are later used to check whether or not two+/// accesses are within a cacheLineSize/elementSize distance apart for a +/// successful reuse.+static void getElementSizes(ArrayRef<Operation *> loadAndStoreOps,+                            unsigned defaultElementSize,+                            DenseMap<Operation *, unsigned> &elementSizes) {++  MemRefType memRefType;+  for (Operation *op : loadAndStoreOps) {+    if (isa<AffineLoadOp>(op)) {+      memRefType = cast<AffineLoadOp>(*op).getMemRefType();+    } else if (isa<AffineStoreOp>(op)) {+      memRefType = cast<AffineStoreOp>(*op).getMemRefType();

dyn_cast.

HarshVardhanKumar

comment created time in 2 months
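
For readers of this thread, a minimal sketch of what the dyn_cast suggestion would look like in getElementSizes, assuming the surrounding pass code from the patch (an illustration, not the submitted revision): dyn_cast<> tests and converts in one step, so the separate isa<> checks disappear.

    // Hypothetical rewrite of the getElementSizes loop body from the diff.
    for (Operation *op : loadAndStoreOps) {
      MemRefType memRefType;
      if (auto loadOp = dyn_cast<AffineLoadOp>(op))
        memRefType = loadOp.getMemRefType();
      else if (auto storeOp = dyn_cast<AffineStoreOp>(op))
        memRefType = storeOp.getMemRefType();
      // ... element-size computation continues as in the original patch.
    }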

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on the declaration `void LoopInterchange::handleImperfectlyNestedAffineLoops(Operation &funcOp)`.]

Pass Operation by pointer for consistency.

HarshVardhanKumar

comment created time in 2 months
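
For context, a sketch of the signature change the comment asks for (hypothetical, not the author's final code): the enclosing function op is taken by pointer, matching how Operation handles are passed around elsewhere.

    // handleImperfectlyNestedAffineLoops with Operation passed by pointer;
    // the body stays as in the patch, using -> to reach the op's methods.
    void LoopInterchange::handleImperfectlyNestedAffineLoops(Operation *funcOp) {
      funcOp->walk([&](AffineForOp op) {
        // ... build the affine.for tree exactly as in the original patch.
      });
    }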

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on the line `lastSibling.getOperation()->erase();` in separateSiblingLoops.]

You don't need getOperation().

HarshVardhanKumar

comment created time in 2 months
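
A small sketch of the simplification being suggested, assuming MLIR's usual OpState helpers: an op wrapper such as AffineForOp already forwards walk() and erase() to its underlying Operation, so the explicit getOperation() calls can be dropped.

    // Hypothetical cleanup of the tail of separateSiblingLoops from the diff.
    copyParentForOp.walk([&](AffineForOp op) {
      // ... erase the non-selected siblings as in the original patch.
    });
    // Erase `lastSibling` from the original copy; OpState::erase() removes
    // the op from its block and deletes it.
    lastSibling.erase();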

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on the line `auto constExpr = mapResult.cast<AffineConstantExpr>();` in getAffineAccessMatrices.]

dyn_cast.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on getAffineAccessMatrices' doc comment, which states that each affine access can be represented as a linear system Ax+b.]

This isn't true when you have floordiv/mod in accesses. What's the strategy in those cases? The test cases are also missing anything with a mod/div.

HarshVardhanKumar

comment created time in 2 months
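
To make the concern concrete, a self-contained sketch (hypothetical example, assuming only an MLIRContext) of an access expression for which the Ax+b description breaks down: for d0 mod 8 the coefficient-row builder records the same row as for a plain d0, and the modulus is silently lost.

    #include "mlir/IR/AffineExpr.h"
    #include "mlir/IR/MLIRContext.h"
    using namespace mlir;

    int main() {
      MLIRContext ctx;
      // d0 mod 8, e.g. the subscript of an access like A[%i mod 8].
      AffineExpr d0 = getAffineDimExpr(0, &ctx);
      AffineExpr modExpr = d0 % 8;
      // prepareCoeffientRow only records a coefficient of 1 for the Mod
      // expr's LHS dim, so the row it builds is identical to the one for
      // A[%i], even though the two accesses stride through memory very
      // differently.
      (void)modExpr;
      return 0;
    }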

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on the line `auto dimExpr = lhs.cast<AffineDimExpr>();` in the CeilDiv/FloorDiv/Mod case of prepareCoeffientRow.]

dyn_cast.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on the line `row.back() += rhs.cast<AffineConstantExpr>().getValue();` in the Add case of prepareCoeffientRow.]

dyn_cast - otherwise, there would be an extra check.

HarshVardhanKumar

comment created time in 2 months
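
A minimal sketch of that suggestion applied to the Add case shown in the diff (illustration only): dyn_cast<> performs the kind check and the conversion together, so the isa<> followed by cast<> pair, which checks the kind twice, collapses into one check.

    // Hypothetical rewrite of the rhs handling in prepareCoeffientRow's Add case.
    bool isConstRhs = false;
    if (auto dimExpr = rhs.dyn_cast<AffineDimExpr>()) {
      rhsPosition = loopIndexMap[operands[dimExpr.getPosition()]];
    } else if (auto constExpr = rhs.dyn_cast<AffineConstantExpr>()) {
      row.back() += constExpr.getValue();
      isConstRhs = true;
    }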

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Duplicate diff context elided; the review comment below is on the line `auto dimExpr = rhs.cast<AffineDimExpr>();` in the Add case of prepareCoeffientRow.]

dyn_cast

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

    if (lhs.isa<AffineDimExpr>()) {
      auto dimExpr = lhs.cast<AffineDimExpr>();

Use dyn_cast. An isa followed by a cast almost always means a dyn_cast could have been used.

HarshVardhanKumar

comment created time in 2 months

PullRequestReviewEvent

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  /// Number of cache lines accessed by each loop in the loop nest.
  DenseMap<const AffineForOp *, uint64_t> cacheLinesAccessCounts;

  // List of all load/store ops in the loop nest body.

Triple comment please.
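For reference, the LLVM convention being asked for is a triple-slash doc comment on the declaration, e.g.:

  /// List of all load/store ops in the loop nest body.
  SmallVector<Operation *, 8> loadAndStoreOps;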

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  /// Number of cache lines accessed by each loop in the loop nest.
  DenseMap<const AffineForOp *, uint64_t> cacheLinesAccessCounts;

These would be wrong. The AffineForOp is just a value type wrapper around the actual op. The same Operation could thus potentially have different AffineForOp *. You need to hash on Operation *. This also means you aren't getting any caching here.
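A minimal sketch of the keying scheme being suggested, keeping the member name from the patch (getOperation() is the usual way to reach the underlying Operation* from the op wrapper; numLinesForThisLoop below is a hypothetical count):

  /// Number of cache lines accessed by each loop in the loop nest, keyed on
  /// the underlying Operation* rather than on the address of a transient
  /// AffineForOp value wrapper.
  DenseMap<Operation *, uint64_t> cacheLinesAccessCounts;

  // Example update/lookup through the wrapper:
  cacheLinesAccessCounts[forOp.getOperation()] += numLinesForThisLoop;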

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  // Loop Carried Dependence vector. A 'true' at index 'i' means that the loop

Carried -> carried, etc.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  // The loop nest.
  SmallVector<AffineForOp, 4> loopVector;

Unclear comment.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  uint64_t getNumCacheLinesTemporalReuse(
      ArrayRef<unsigned> permutation,
      DenseMap<Operation *, SmallVector<SmallVector<int64_t, 4>, 4>>
          &loopAccessMatrices,
      uint64_t maxTemporalReuse);

Inputs should be passed by const reference.
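For illustration, the declaration with the suggested const reference (note that a const DenseMap exposes lookup()/find() but not operator[], so the body would have to use those):

  uint64_t getNumCacheLinesTemporalReuse(
      ArrayRef<unsigned> permutation,
      const DenseMap<Operation *, SmallVector<SmallVector<int64_t, 4>, 4>>
          &loopAccessMatrices,
      uint64_t maxTemporalReuse);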

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  uint64_t getNumCacheLinesSpatialReuse(ArrayRef<unsigned> perm);

  uint64_t getNumSyncs(ArrayRef<unsigned> perm);

Likewise.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  bool isRectangularAffineForLoopNest();

  void getLoopCarriedDependenceVector();

  void getAllLoadStores();

Can drop the blank lines between these decls.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  /// Default element size to be used if a memref does not have a static shape.
  constexpr static unsigned kDefaultEltSize = 8;

Dynamic shape doesn't mean the element size is unknown. The elt size is always known as long as it's an int/float type. So this comment looks incorrect.
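A sketch of the point being made: the element size follows from the element type, not from the shape (memRefType below is a hypothetical MemRefType already in hand; kDefaultEltSize is the member from the patch):

  // Shape dynamism is irrelevant; an int/float element type has a fixed width.
  unsigned eltSizeInBytes = kDefaultEltSize;
  if (memRefType.getElementType().isIntOrFloat())
    eltSizeInBytes = memRefType.getElementTypeBitWidth() / 8;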

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from the pass's test file elided; the comment below refers to cases such as:]

// Test case with dynamic loop bounds. The pass should do nothing.
func @non_rectangular_loopnest(%A: memref<2048x2048xf64>) {
  affine.for %i = 0 to 2048 {
    affine.for %k = affine_map<(d0)->(d0)>(%i) to 2048 {
      %v = affine.load %A[%k, %i] : memref<2048x2048xf64>
      affine.store %v, %A[%k, %i] : memref<2048x2048xf64>
    }
  }
  return
}

I see no negative test cases for non-rectangular loop bounds.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from AffineLoopInterchange.cpp elided; the comment below refers to:]

  /// Default cache line size(in bytes). Useful for getting a measure of the

size(in -> size (in

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from the pass's test file elided; the comment below refers to:]

// Interchange is invalid due to dependences

Nit: terminate with period.

HarshVardhanKumar

comment created time in 2 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context from the pass's test file elided.]
affine.load %A[%j, %i] : memref<2048x2048xf64>+      affine.store %v, %A[%j, %i] : memref<2048x2048xf64>+    }+  }+  // CHECK:     affine.for %[[IV0:.*]] = 0 to 2048 {+  // CHECK-NEXT: affine.for %[[IV1:.*]] = 0 to 2048 {+  // CHECK-NEXT:    %{{.*}} = affine.load %{{.*}}[%[[IV0]], %[[IV1]]] : memref<2048x2048xf64>+  // CHECK-NEXT:    affine.store %{{.*}}, %{{.*}}[%[[IV0]], %[[IV1]]] : memref<2048x2048xf64>+  // CHECK-NEXT:  }+  // CHECK-NEXT:}++  affine.for %i = 0 to 2048 {+    affine.for %j = 0 to 2048 {+      affine.for %k = 0 to 2048 {+        %a = affine.load %A[%i, %k] : memref<2048x2048xf64>+        %b = affine.load %B[%k, %j] : memref<2048x2048xf64>+        %ci = affine.load %C[%i, %j] : memref<2048x2048xf64>

All test cases have a constant upper bound. Please change some to use symbolic upper bounds.

HarshVardhanKumar

comment created time in 2 months

push eventllvm/llvm-project

Uday Bondhugula

commit sha 430b47a17d2281bd566fc1aac19de80b99e6f0c6

[MLIR] Remove unused arg from affine tiling validity check Drop unused function arg from affine loop tiling validity check.

view details

push time in 2 months

issue commenttensorflow/mlir-hlo

lhlo-copy-removal pass crash

@bondhugula https://reviews.llvm.org/D87128.

Thanks!

bondhugula

comment created time in 2 months

Pull request review commentonnx/onnx-mlir

Rewrite shape and size OP (in progress)

 DenseElementsAttr createDenseElementsAttrFromFloatAttr(
   return mlir::DenseElementsAttr::get(tensorType, llvm::makeArrayRef(values));
 }
+DenseElementsAttr createDenseElementsAttrFromShape(
+    PatternRewriter &rewriter, Value value) {
+  auto inType = value.getType().dyn_cast<ShapedType>();
+  ;
+  if (!inType)
+    llvm_unreachable("Shaped type is execptd\n");
+  auto shape = inType.getShape();
+  SmallVector<int64_t, 1> dims(1, inType.getRank());
+  SmallVector<int64_t, 4> values;
+  for (auto s : shape) {
+    values.push_back(s);
+  }
+  auto tensorType =
+      mlir::RankedTensorType::get(dims, rewriter.getIntegerType(64));
+  return mlir::DenseElementsAttr::get(tensorType, llvm::makeArrayRef(values));
+}
+
+DenseElementsAttr createDenseElementsAttrFromSize(
+    PatternRewriter &rewriter, Value value) {
+  auto inType = value.getType().dyn_cast<ShapedType>();
+  ;
+  if (!inType)
+    llvm_unreachable("Shaped type is execptd\n");
+  auto shape = inType.getShape();
+  SmallVector<int64_t, 1> dims(1, 1);
+  int64_t size = 1;
+  for (auto s : shape) {
+    size *= s;
+  }

There is a getNumElements() accessor already available.
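For reference, this is roughly the simplification being suggested (a sketch based on the snippet above; like the existing loop, it assumes the shape is static):

  // ShapedType::getNumElements() already computes the product of all dimension sizes.
  int64_t size = inType.getNumElements();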

chentong319

comment created time in 2 months

Pull request review commentonnx/onnx-mlir

Rewrite shape and size OP (in progress)

 DenseElementsAttr createDenseElementsAttrFromFloatAttr(
   return mlir::DenseElementsAttr::get(tensorType, llvm::makeArrayRef(values));
 }
+DenseElementsAttr createDenseElementsAttrFromShape(
+    PatternRewriter &rewriter, Value value) {
+  auto inType = value.getType().dyn_cast<ShapedType>();
+  ;
+  if (!inType)
+    llvm_unreachable("Shaped type is execptd\n");
+  auto shape = inType.getShape();
+  SmallVector<int64_t, 1> dims(1, inType.getRank());

... dims = {inType.getRank()};

chentong319

comment created time in 2 months

Pull request review commentonnx/onnx-mlir

Rewrite shape and size OP (in progress)

 DenseElementsAttr createDenseElementsAttrFromFloatAttr(
   return mlir::DenseElementsAttr::get(tensorType, llvm::makeArrayRef(values));
 }
+DenseElementsAttr createDenseElementsAttrFromShape(
+    PatternRewriter &rewriter, Value value) {
+  auto inType = value.getType().dyn_cast<ShapedType>();
+  ;
+  if (!inType)
+    llvm_unreachable("Shaped type is execptd\n");
+  auto shape = inType.getShape();
+  SmallVector<int64_t, 1> dims(1, inType.getRank());
+  SmallVector<int64_t, 4> values;

You can compactly use the filling ctor.

SmallVector<int64_t, 4> values(shape.begin(), shape.end());
chentong319

comment created time in 2 months

Pull request review commentonnx/onnx-mlir

Rewrite shape and size OP (in progress)

 DenseElementsAttr createDenseElementsAttrFromFloatAttr(
   return mlir::DenseElementsAttr::get(tensorType, llvm::makeArrayRef(values));
 }
+DenseElementsAttr createDenseElementsAttrFromShape(
+    PatternRewriter &rewriter, Value value) {
+  auto inType = value.getType().dyn_cast<ShapedType>();
+  ;
+  if (!inType)
+    llvm_unreachable("Shaped type is execptd\n");

If you always expect a shaped type, you can use: value.getType().cast<ShapedType>().
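In other words, something along these lines (sketch):

  // cast<> asserts on a mismatch, so the dyn_cast<> + llvm_unreachable sequence isn't needed.
  auto inType = value.getType().cast<ShapedType>();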

chentong319

comment created time in 2 months

Pull request review commentonnx/onnx-mlir

Transpose optimization

 using namespace mlir;

 namespace {
+//===----------------------------------------------------------------------===//
+// Support for transpose patterns.
+//===----------------------------------------------------------------------===//
+
+/// Compute the combined permute pattern from a pair of permute patterns.
+ArrayAttr CombinedTransposePattern(PatternRewriter &rewriter,
+    ArrayAttr &firstPermAttr, ArrayAttr &secondPermAttr) {
+  // Read first permute vectors.
+  SmallVector<int64_t, 4> initialPerm;
+  for (auto firstPermVal : firstPermAttr.getValue())
+    initialPerm.emplace_back(firstPermVal.cast<IntegerAttr>().getInt());
+  // Read second permute vector. Use it as an index in the first permute
+  // vector.
+  SmallVector<int64_t, 4> resPerm;
+  for (auto secondPermVal : secondPermAttr.getValue()) {
+    auto index = secondPermVal.cast<IntegerAttr>().getInt();
+    resPerm.emplace_back(initialPerm[index]);
+  }
+  // Convert to Array of Attributes.
+  ArrayRef<int64_t> resPermRefs(resPerm);
+  return rewriter.getI64ArrayAttr(resPermRefs);
+}
+
+/// Test if the permute pattern correspond to an identity pattern.
+/// Identity patterns are {0, 1, 2, ... , rank -1}.
+bool IsIdentityPermuteVector(ArrayAttr &permAttr) {

All attributes in MLIR are immutable and are wrappers around storage, and can be passed as POD.
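For example, the helpers above could take the attributes by value instead of by reference (a sketch of the suggestion, not the final patch):

  ArrayAttr CombinedTransposePattern(PatternRewriter &rewriter,
      ArrayAttr firstPermAttr, ArrayAttr secondPermAttr);

  bool IsIdentityPermuteVector(ArrayAttr permAttr);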

AlexandreEichenberger

comment created time in 2 months

push eventllvm/llvm-project

Vincent Zhao

commit sha 28a7dfa33d979e5ff3ed2d975c71b08d611fe6b6

[MLIR] Fixed missing constraint append when adding an AffineIfOp domain The prior diff that introduced `addAffineIfOpDomain` missed appending constraints from the ifOp domain. This revision fixes this problem. Differential Revision: https://reviews.llvm.org/D86421

view details

push time in 2 months

issue openedtensorflow/mlir-hlo

lhlo-copy-removal pass crash

I'm not sure whether issues can be posted on this repo. If not, I can move it to tensorflow proper.

This can be reproduced with a recent commit (d4dcba1340f363762cc6003d4ed1f4db2df61858) and in all certainty with the trunk as well.

Input:

func @func_op_long(%arg0: memref<4xf32>, %arg1: memref<4xf32>, %arg2: memref<4xf32>) {
    %0 = alloc() : memref<4xf32>
    affine.for %arg3 = 0 to 4 {
      %5 = affine.load %arg0[%arg3] : memref<4xf32>
      %6 = affine.load %arg1[%arg3] : memref<4xf32>
      %7 = cmpf "ogt", %5, %6 : f32
      %8 = select %7, %5, %6 : f32
      affine.store %8, %0[%arg3] : memref<4xf32>
    }
    "lmhlo.copy"(%0, %arg2) : (memref<4xf32>, memref<4xf32>) -> ()
    return
  }
$ mlir-hlo-opt -lhlo-copy-removal   /tmp/crash.mlir 
mlir-hlo-opt: external/llvm-project/mlir/lib/IR/Operation.cpp:330: bool mlir::Operation::isBeforeInBlock(mlir::Operation*): Assertion `other && other->block == block && "Expected other operation to have the same parent block."' failed.
PLEASE submit a bug report to  and include the crash backtrace.
Stack dump:
0.	Program arguments: bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt -lhlo-copy-removal /tmp/crash.mlir 
 #0 0x00000000014c1d7d llvm::sys::PrintStackTrace(llvm::raw_ostream&) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14c1d7d)
 #1 0x00000000014bfaed llvm::sys::RunSignalHandlers() (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14bfaed)
 #2 0x00000000014c041d SignalHandler(int) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14c041d)
 #3 0x00007ff0c1bfcdd0 __restore_rt (/lib64/libpthread.so.0+0x12dd0)
 #4 0x00007ff0c164770f raise (/lib64/libc.so.6+0x3770f)
 #5 0x00007ff0c1631b25 abort (/lib64/libc.so.6+0x21b25)
 #6 0x00007ff0c16319f9 _nl_load_domain.cold.0 (/lib64/libc.so.6+0x219f9)
 #7 0x00007ff0c163fcc6 (/lib64/libc.so.6+0x2fcc6)
 #8 0x00000000014526f3 mlir::Operation::isBeforeInBlock(mlir::Operation*) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14526f3)
 #9 0x00000000009f0acf _ZN4llvm12function_refIFvPN4mlir9OperationEEE11callback_fnIZNS1_6detail14walkOperationsIZNS1_5lmhlo12_GLOBAL__N_119LhloCopyRemovalPass14runOnOperationEvEUlNS9_6CopyOpEE_SC_vEENSt9enable_ifIXaantsrSt7is_sameIT0_S3_E5valuesrSF_IT1_vE5valueESI_E4typeES3_OT_EUlS3_E_EEvlS3_ (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x9f0acf)
#10 0x00000000014821e7 mlir::detail::walkOperations(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14821e7)
#11 0x00000000014821e7 mlir::detail::walkOperations(mlir::Operation*, llvm::function_ref<void (mlir::Operation*)>) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x14821e7)
#12 0x00000000009f073f mlir::lmhlo::(anonymous namespace)::LhloCopyRemovalPass::runOnOperation() (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x9f073f)
#13 0x00000000013d37ce mlir::Pass::run(mlir::Operation*, mlir::AnalysisManager) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x13d37ce)
#14 0x00000000013d38ba mlir::OpPassManager::run(mlir::Operation*, mlir::AnalysisManager) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x13d38ba)
#15 0x00000000013da139 mlir::PassManager::run(mlir::ModuleOp) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x13da139)
#16 0x0000000000c40320 performActions(llvm::raw_ostream&, bool, bool, llvm::SourceMgr&, mlir::MLIRContext*, mlir::PassPipelineCLParser const&) (.constprop.101) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc40320)
#17 0x0000000000c40b89 processBuffer(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, bool, bool, bool, bool, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc40b89)
#18 0x0000000000c40cd0 mlir::MlirOptMain(llvm::raw_ostream&, std::unique_ptr<llvm::MemoryBuffer, std::default_delete<llvm::MemoryBuffer> >, mlir::PassPipelineCLParser const&, mlir::DialectRegistry&, bool, bool, bool, bool, bool) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc40cd0)
#19 0x0000000000c4149d mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&, bool) (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0xc4149d)
#20 0x000000000096a885 main (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x96a885)
#21 0x00007ff0c16336a3 __libc_start_main (/lib64/libc.so.6+0x236a3)
#22 0x000000000096402e _start (bazel-bin/tensorflow/compiler/mlir/hlo/mlir-hlo-opt+0x96402e)
Aborted (core dumped)

The input isn't really the expected one for this pass, but this is a bug apparently stemming from an assumption on the input. A check / bail out would have been fine.
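As a rough sketch of the kind of guard meant here (hypothetical; the exact accessor names on lmhlo.copy may differ), the walk could skip copies whose source isn't defined in the same block instead of relying on isBeforeInBlock:

  funcOp.walk([&](lmhlo::CopyOp copyOp) {
    Operation *srcDefOp = copyOp.operand().getDefiningOp();
    // Bail out instead of letting isBeforeInBlock assert across blocks.
    if (!srcDefOp || srcDefOp->getBlock() != copyOp.getOperation()->getBlock())
      return;
    // ... existing copy-removal logic ...
  });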

@dfki-ehna, @joker-eph

created time in 2 months

push eventllvm/llvm-project

Alexandre E. Eichenberger

commit sha a14a2805b04d49bfbbff6f79f141738c67ad14fd

[MLIR] MemRef Normalization for Dialects When dealing with dialects that will results in function calls to external libraries, it is important to be able to handle maps as some dialects may require mapped data. Before this patch, the detection of whether normalization can apply or not, operations are compared to an explicit list of operations (`alloc`, `dealloc`, `return`) or to the presence of specific operation interfaces (`AffineReadOpInterface`, `AffineWriteOpInterface`, `AffineDMAStartOp`, or `AffineDMAWaitOp`). This patch add a trait, `MemRefsNormalizable` to determine if an operation can have its `memrefs` normalized. This trait can be used in turn by dialects to assert that such operations are compatible with normalization of `memrefs` with nontrivial memory layout specification. An example is given in the literal tests. Differential Revision: https://reviews.llvm.org/D86236

view details

push time in 2 months

push eventpolymage-labs/mlirx

kuterd

commit sha 65fcc0ee3185c684f0a4b7a3bdf14bf3d206eeb7

[Attributor] Function seed allow list - Adds a command line option to seed only selected functions. - Makes seed allow listing exclusive to assertions enabled builds. Reviewed By: sstefan1 Differential Revision: https://reviews.llvm.org/D86129

view details

Nicolai Hähnle

commit sha b37db11d95d87aa53426ce753410677407974a85

MachineSSAUpdater: Allow initialization with just a register class The register class is required for inserting PHIs, but the "current virtual register" isn't actually used for anything, so let's remove it while we're at it. Differential Revision: https://reviews.llvm.org/D85602 Change-Id: I1e647f31570ef21a7ea8e20db3454178e98a6a8b

view details

Arthur Eubanks

commit sha b79889c2b143890463dca015432da29d3833572d

[opt][NewPM] Add basic-aa in legacy PM compatibility mode The legacy PM alias analysis pipeline by default includes basic-aa. When running `opt -foo-pass` under the NPM and -disable-basic-aa is not specified, use basic-aa. This decreases the number of check-llvm failures under NPM from 913 to 752. Reviewed By: ychen, asbirlea Differential Revision: https://reviews.llvm.org/D86167

view details

Paul C. Anagnostopoulos

commit sha e0c01e6cb07133f0bb155a168d967cf854f03ffa

New TableGen Programmer's Reference document This new TableGen Programmer's Reference document replaces the current Language Introduction and Language Reference documents. It brings all the TableGen reference information into one document. As an experiment, I numbered the sections in the document. See what you think about that. Reviewed By: lattner Differential Revision: https://reviews.llvm.org/D85838 (changes by Nicolai Hähnle <nicolai.haehnle@amd.com>: - fixed build error due to toctree in docs/LangRef/index.rst - fixed reference to ProgRef) Change-Id: Ifbdfa39768b8a460aae2873103d31c7b347aff00

view details

Nicolai Hähnle

commit sha 17cd34409a3ab1c46ff55960b7b89c11e1d5674d

Fix two bugs in TGParser::ParseValue TGParser::ParseValue contains two recursive calls, one to parse the RHS of a list paste operator and one to parse the RHS of a paste operator in a class/def name. Both of these calls neglect to check the return value to see if it is null (because of some error). This causes a crash in the next line of code, which uses the return value. The code now checks for null returns. Differential Revision: https://reviews.llvm.org/D85852

view details

Jonas Devlieghere

commit sha d3a49b03a57bb7448620c31f493932018e752c0d

[lldb] Remove --rerun-all-issues as its functionality no longer exists The logic behind --rerun-all-issues was removed when we switched to LIT as the test driver. This patch just removes the dotest option and corresponding entry in configuration.py.

view details

Christopher Tetreault

commit sha 5eff21c8ff2486dccb0c45a925b387eeec83282b

[NFC][documentation] clarify comment in test test referenced a relative path to a file, but the path was not correct relative to the project the test is in Differential Revision: https://reviews.llvm.org/D86368

view details

Roman Lebedev

commit sha 503deec2183d466dad64b763bab4e15fd8804239

Temporairly revert "[SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline" As disscussed in post-commit review starting with https://reviews.llvm.org/D84108#2227365 while this appears to be mostly a win overall, especially code-size-wise, this appears to shake //certain// code pattens in a way that is extremely unfavorable for performance (+30% runtime regression) on certain CPU's (i personally can't reproduce). So until the behaviour is better understood, and a path forward is mapped, let's back this out for now. This reverts commit 1d51dc38d89bd33fb8874e242ab87b265b4dec1c.

view details

Paul C. Anagnostopoulos

commit sha 196e6f9f18933ed33eee39a1c9350ccce6b18e2c

Replace TableGen range piece punctuator with '...' The TableGen range piece punctuator is currently '-' (e.g., {0-9}), which interacts oddly with the fact that an integer literal's sign is part of the literal. This patch replaces the '-' with the new punctuator '...'. The '-' punctuator is deprecated. Differential Revision: https://reviews.llvm.org/D85585 Change-Id: I3d53d14e23f878b142d8f84590dd465a0fb6c09c

view details

António Afonso

commit sha 02bf5632a94da6c3570df002804f8d3f79c11bfc

Fix swig scripts install target name LLVM install component targets needs to be in the form of: install-{target}[-stripped] I tested with: ``` cmake ... -DLLVM_ENABLE_PROJECTS="clang;lldb" -DLLVM_DISTRIBUTION_COMPONENTS="lldb;liblldb;lldb-python-scripts;" ... DESTDIR=... ninja install-distribution ``` @JDevlieghere `finish_swig_python_scripts` is a really weird name for a distribution component, any reason that it has to be this way? Differential Revision: https://reviews.llvm.org/D86235

view details

Fangrui Song

commit sha 72ddaedddafc26b5671d56d71b1bccf7f46f65b4

[Attributor][test] Add REQUIRES: asserts after D86129

view details

Alina Sbirlea

commit sha f55ad3973dec62b1dd6dbe9c4eb81c5e883e3628

[DomTree] Extend update API to allow a post CFG view. Extend the `applyUpdates` in DominatorTree to allow a post CFG view, different from the current CFG. This patch implements the functionality of updating an already up to date DT, to the desired PostCFGView. Combining a set of updates towards an up to date DT and a PostCFGView is not yet supported. Differential Revision: https://reviews.llvm.org/D85472

view details

Josh Stone

commit sha b26b32b5d3b85812a12f5e3bf011428612f78e19

lld: link libatomic if needed for Timer D80298 made Timer::total atomic, but this requires linking libatomic on some targets. Reviewed By: aaronpuchert Differential Revision: https://reviews.llvm.org/D85691

view details

Azharuddin Mohammed

commit sha 6a64079699e7b56badd292e39cad4b8bfe941aec

Fix llvm/test/tools/lto/hide-linkonce-odr.ll Remove unnecessary dependency on libSystem.

view details

Jonas Devlieghere

commit sha 86fc1933099d8818c7d7559ae41e5903a1daf9bd

[lldb] Don't pass --rerun-all-issues on Windows. The functionality has been removed for a while and now the dotest argument has been removed asll.

view details

Sourabh Singh Tomar

commit sha f91d18eaa946b2d2ea5a9334fb099c3e409ad2d1

[DebugInfo][flang]Added support for representing Fortran assumed length strings This patch adds support for representing Fortran `character(n)`. Primarily patch is based out of D54114 with appropriate modifications. Test case IR is generated using our downstream classic-flang. We're in process of upstreaming flang PR's but classic-flang has dependencies on llvm, so this has to get in first. Patch includes functional test case for both IR and corresponding dwarf, furthermore it has been manually tested as well using GDB. Source snippet: ``` program assumedLength call sub('Hello') call sub('Goodbye') contains subroutine sub(string) implicit none character(len=*), intent(in) :: string print *, string end subroutine sub end program assumedLength ``` GDB: ``` (gdb) ptype string type = character (5) (gdb) p string $1 = 'Hello' ``` Reviewed By: aprantl, schweitz Differential Revision: https://reviews.llvm.org/D86305

view details

Sourabh Singh Tomar

commit sha 12edd4b36475170d445ac93da34e4883f23a8361

Fix arm bot failure after f91d18eaa946b2 llc doesn't seem to automatically pick default `--triple`. using `%llc_dwarf` should fix this. Builder: http://lab.llvm.org:8011/builders/clang-cmake-armv7-quick/builds/20310 Error log: bin/llc: error: : error: unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.

view details

Uday Bondhugula

commit sha b8cc449b849e7954159d3b2588f20066b243e4af

[MLIR][NFC] Update MLIR vim syntax file - std ops + types Update vim syntax file to include more std ops, and for int types. Differential Revision: https://reviews.llvm.org/D86370

view details

Fangrui Song

commit sha 7646a67104d5981483c971719457e44bed764af3

[DebugInfo][test] Move distringtype.ll to X86/ subdir to fix failures when X86 target is not built

view details

George Mitenkov

commit sha b65ba70479986aba5e06126417ba483165031093

[MLIR][SPIRVToLLVM] Updated the documentation for the conversion This patch updates the SPIR-V to LLVM conversion manual. Particularly, the following sections are added: - `spv.EntryPoint`/`spv.ExecutionMode` handling - Mapping for `spv.AccessChain` - Change in allowed storage classes for `spv.globalVariable` - Change of the runner section name Reviewed By: mravishankar Differential Revision: https://reviews.llvm.org/D86288

view details

push time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha aaed01bdb99945d671ac28bb0d3203cc50028b87

PR #42508: [MLIR] Erase dead lmhlo.constant ops Imported from GitHub PR https://github.com/tensorflow/tensorflow/pull/42508 An lmhlo.constant op on an memref that is locally allocated and with no users other than dealloc's can be deleted. Add a canonicalization pattern for this. Copybara import of the project: -- 8758c409a15f567e7cb8e1077faa020f5705c85a by Uday Bondhugula <uday@polymagelabs.com>: [MLIR] Erase dead lmhlo.constant ops An lmhlo.constant op on an memref that is locally allocated and with no other users (other than dealloc's) can be deleted. Add a canonicalization patter for this. COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/tensorflow/pull/42508 from polymage-labs:lhlo_constant_erase 8758c409a15f567e7cb8e1077faa020f5705c85a PiperOrigin-RevId: 328042416 Change-Id: I27f9b5b5297bbf6fe81aff589f009197b75f49eb

view details

Yuanzhong Xu

commit sha 2e8dec076f49a3b05e4fa51616ab7ae30b98d984

[XLA:SPMD] Avoid unnecessary collective permutes 1. Try to reuse the original target tiled sharding when finding compatible target from partial sharding. 2. If the HLO is a broadcast, check if data is already the same between source/target pairs. PiperOrigin-RevId: 328043490 Change-Id: I69dec53c50cb6cedf586afafc5181cd1ee29cdc6

view details

Mehdi Amini

commit sha 01b030b77623c5fa00a43640f77af2a43572d02c

Integrate LLVM at llvm/llvm-project@f164534ca8e0 Updates LLVM usage to match [f164534ca8e0](https://github.com/llvm/llvm-project/commit/f164534ca8e0) PiperOrigin-RevId: 328046788 Change-Id: I714164211a50e0d273ec49046c66f7e484989428

view details

Akshay Modi

commit sha 10332bb88796092990f7f3a1e97553258f57e763

PFor inputs should be ndarrays. PiperOrigin-RevId: 328068227 Change-Id: Ia084d946f3a0e5d071d7e8fec4263d1da26d9671

view details

Chao Mei

commit sha b57f22382d97c61a644ba1c3a3d69f21d06504be

Use BuiltinOpResolverWithoutDefaultDelegates instead of BuiltinOpResolver for unit tests of xnnpack delegate itself to prepare for enabling xnnpack delegate by default across all platforms in the next 2.4.0 release. PiperOrigin-RevId: 328068258 Change-Id: I3459bc3e7f25d2925da65fba3e19ac2bad57fff1

view details

Mehdi Amini

commit sha 536d5658f5f0eb04067d1ed7cc084f62a6aa2932

Integrate LLVM at llvm/llvm-project@f6decfa36d89 Updates LLVM usage to match [f6decfa36d89](https://github.com/llvm/llvm-project/commit/f6decfa36d89) PiperOrigin-RevId: 328073633 Change-Id: I5cd74bcf36c453cf073766f910a0f8442b66cb93

view details

A. Unique TensorFlower

commit sha d4dcba1340f363762cc6003d4ed1f4db2df61858

Use BuiltinOpResolverWithoutDefaultDelegates instead of BuiltinOpResolver for unit tests of xnnpack delegate itself to prepare for enabling xnnpack delegate by default across all platforms in the next 2.4.0 release. PiperOrigin-RevId: 328076934 Change-Id: I69e21a6fbbe1b0e7146669ccd6481b774dcd9d2e

view details

Uday Bondhugula

commit sha 5994915525ec2e932125aa1f133ce2260ba100af

[MLIR] Add folder for mhlo get_dimension_size Add folder for mhlo GetDimensionSizeOp. get_dimension_size folds to a constant when the corresponding tensor dimension size is statically known / constant.

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Add folder for mhlo get_dimension_size

Rebased and fixed conflict.

bondhugula

comment created time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lmhlo.constant ops

Actually I think we will be able to integrate directly.

Anyway, rebased and pushed.

bondhugula

comment created time in 2 months

push eventpolymage-labs/tensorflow

Hanhan Wang

commit sha fb1ed49e98a71cfa55de32ba94089ea6f325600e

Enhance lowering reshape op to Linalg. Handle non-expansion and non-collapsion cases by rewriting it to two reshape ops. PiperOrigin-RevId: 327926863 Change-Id: I2b9f406d505ab69d9e25e892f75f38aa03467e1e

view details

A. Unique TensorFlower

commit sha 1f62e9104a7e78b215b5c0984bf8b902f7283e2b

compat: Update forward compatibility horizon to 2020-08-22 PiperOrigin-RevId: 327934745 Change-Id: I5b03985ca1fe0858e9a47becb5dd6615a600da90

view details

A. Unique TensorFlower

commit sha c352bafb9a66a30ea232329bb33e4bc1b151f699

Update GraphDef version to 501. PiperOrigin-RevId: 327934747 Change-Id: I80eca5e53d3c3d1f3bc9996418d3713812cebd92

view details

Fergus Henderson

commit sha 885a34acbf0e67dd19aa9ad1c446e952bf066c20

(lite) Change layout of hide_symbols_with_allowlist.sh to conform to the Google Shell Style guide <https://google.github.io/styleguide/shellguide.html#s5.4-loops> PiperOrigin-RevId: 327935590 Change-Id: I9046a7fbf51fdbcd510633513d982e27e471ceff

view details

Mehdi Amini

commit sha f6cb841c0fedf7408559b983a5340a42a45fe9d0

Explicitly load Mhlo dialect in HLO importer (NFC) MLIR is moving to require explicitly loading of Dialect before creating entities in a Dialect. PiperOrigin-RevId: 327996308 Change-Id: Iba1de332fbd2c7d4d6a336b54ef999decc520ed3

view details

Berkin Ilbeyi

commit sha f3cd4e4a4f9e9c169a1ed68d0c53f7b7c1050a1a

[XLA] Fix "lambda-expression in unevaluated context" error. PiperOrigin-RevId: 327999499 Change-Id: Iccd368a8784550a0a14f146924ee8845132e0b38

view details

Yanhui Liang

commit sha 0c2421920724c599164007cd306b77ffe439bd24

Internal change for Keras benchmarks. PiperOrigin-RevId: 327999616 Change-Id: Ie46cc103cad75561bd863c8d477eaa5c9f319452

view details

Priya Gupta

commit sha cb4cb2dd78afac46f5e32a337759ae21d88c6efd

Add MultiWorkerMirroredStratgy to custom training loop tests. PiperOrigin-RevId: 328002300 Change-Id: I5713bc15bb0d7a8647b1097fe81570ace30cb1c5

view details

A. Unique TensorFlower

commit sha 1b6ff7950025c18a40b462176742c9e63761580a

Update GraphDef version to 502. PiperOrigin-RevId: 328009562 Change-Id: I462f5a36c28b5e71ce562e67c3226feea5d2fb7d

view details

A. Unique TensorFlower

commit sha cd00fde218710400725237e684bf0a2a7d9f100b

compat: Update forward compatibility horizon to 2020-08-23 PiperOrigin-RevId: 328009565 Change-Id: If7bb6f781e9a5381e35e98408272a5cdb42f64c8

view details

Eugene Burmako

commit sha 88d4492d7537211583e12291591b14c638ebb742

Explicitly load standard dialect in HLO importer (NFC) MLIR is moving to require explicitly loading of Dialect before creating entities in a Dialect. PiperOrigin-RevId: 328037037 Change-Id: Ib46275b26e8f77aab0fbd0f70cd2a48844dc360c

view details

Uday Bondhugula

commit sha 7085d3473663bc26418ea7650b22bda3649c08d0

[MLIR] Erase dead lmhlo.constant ops An lmhlo.constant op on a memref that is locally allocated and with no other users (other than dealloc's) can be deleted. Add a canonicalization pattern for this.

view details

push time in 2 months

push eventpolymage-labs/tensorflow

Evgeniy Polyakov

commit sha 6e2c61a8374ea94a58d235055d4926679739cf81

Export TypeOf(), it is very useful to determine size of the underlying type when going to preallocate are for tf.ReadTensor()

view details

Evgeniy Polyakov

commit sha 8e369b870aec1a30777a3f54a9088ea08df85df4

Added Reshape() tensor method

view details

Eugene Kuznetsov

commit sha 6b249a8a5c00b3dcf2db0145e785d9b902908d10

HSACO cache Deleting temporary files after compilation

view details

Gianluca Baratti

commit sha 79a29bcbb742647506aa25e9630676b8e2945b28

Merge pull request #1 from tensorflow/master Alignment

view details

codeadmin_peritiae

commit sha d6c0858665de6036de24991b29d74b182cfcf5ae

Added a "note" in tf.where documentation suggesting a workaround for issue #38349

view details

Daniel Nguyen

commit sha c5ef52c5f0c698b76133eae0aa93d83fa7ab9f79

added draft of function

view details

Mahmoud Abuzaina

commit sha 7fbbbe1a9198a307485cac42960a70847c57bba7

Enabling native format in Conv fwd

view details

Mahmoud Abuzaina

commit sha 353935b9925c3dd0783cbf661119f799336d5718

Enabling native foramt in Conv bwd

view details

Kaixi Hou

commit sha bb315c52e06163beeb61400fb347536a71ce8710

Fix a conv3d dgrad type issue

view details

Eugene Kuznetsov

commit sha a190fee2a5d696065c618fe014445b244d07bde2

Reviewer requested changes

view details

Daniel Nguyen

commit sha 0a79e7111037c4bb793964708acc27f4e7cc12ee

finished implementation and passes tests

view details

Deven Desai

commit sha 4b058d62a60b53ce52304d0450cdbd334570b03e

[ROCm] Explicitly specifying dtype=np.float32 for *ExpandedBatch subtests in conv_ops_3d_test The following commit adds the *ExpandedBatch subtests in the unit test `conv_ops_3d_test` https://github.com/tensorflow/tensorflow/commit/549e69ca1316cd6bc54cbbe28dd9340fdd7b8e76 Those unit tests currently fail on the ROCm platform, because the dtype is not explicitly specified in the capp to `np.asarray` within the `_CreateNumpyTensor`. This defaults the datatype for the data/filter tensors to `double/float64` and ROCm does not have support for it, wich leads to those subtests failing. This PR/commit adds an explicit `dtype=np.float32` argument to above mentioned call to `np.asarray`, thus making the data/filter tensors to be of `float32` type, which makes those subtests pass on the ROCm platform. Changing the dtype from `float64` to `float32` does change what the subtests are testing, so this change should be ok.

view details

Mahmoud Abuzaina

commit sha 3d0dda22efd911b1f0c01b48f5eac4d639a1f473

Merge branch 'master' into mabuzain/native-fmt-conv-fwd

view details

Mahmoud Abuzaina

commit sha f2a7ebef0413312e13b2a7aa4924847316090b18

Merge branch 'master' into mabuzain/native-fmt-conv-bwd

view details

Katherine Tian

commit sha 576863f05063d2e68285da81853cd70549305fa9

don't return erased element and clean tests

view details

Katherine Tian

commit sha c1b4fcd787d16f64e6c4b341094524f4dc58a8dc

clean up

view details

Katherine Tian

commit sha 00b44494f75b78afc37a0082f8fdff255886e9cf

clean up

view details

codeadmin_peritiae

commit sha 24d58d6c02f72bd9f5cb440abdd4f24d7b607192

Type mismatch fixed

view details

Daniel Nguyen

commit sha aa88605eae286960f52d1dc3fdee06238221d6d2

clean up only

view details

Katherine Tian

commit sha 5d619193aa975e931b2c691e86ce3524743a1d19

clean up

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Add folder for mhlo get_dimension_size

Seems like in conflicts, can you rebase?

Done.

bondhugula

comment created time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lmhlo.constant ops

@bondhugula please fix ubuntu sanity build failures ?

@rthadur The build failure is unrelated to my revision.

bondhugula

comment created time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lmhlo.constant ops

The build still fails and the failure is unrelated to this PR. The failing test is: //tensorflow/compiler/xla/service:memory_space_assignment_test

bondhugula

comment created time in 2 months

push eventllvm/llvm-project

Uday Bondhugula

commit sha b8cc449b849e7954159d3b2588f20066b243e4af

[MLIR][NFC] Update MLIR vim syntax file - std ops + types Update vim syntax file to include more std ops, and for int types. Differential Revision: https://reviews.llvm.org/D86370

view details

push time in 2 months

push eventpolymage-labs/tensorflow

Evgeniy Polyakov

commit sha 6e2c61a8374ea94a58d235055d4926679739cf81

Export TypeOf(), it is very useful to determine size of the underlying type when going to preallocate are for tf.ReadTensor()

view details

Evgeniy Polyakov

commit sha 8e369b870aec1a30777a3f54a9088ea08df85df4

Added Reshape() tensor method

view details

Eugene Kuznetsov

commit sha 6b249a8a5c00b3dcf2db0145e785d9b902908d10

HSACO cache Deleting temporary files after compilation

view details

Gianluca Baratti

commit sha 79a29bcbb742647506aa25e9630676b8e2945b28

Merge pull request #1 from tensorflow/master Alignment

view details

codeadmin_peritiae

commit sha d6c0858665de6036de24991b29d74b182cfcf5ae

Added a "note" in tf.where documentation suggesting a workaround for issue #38349

view details

Daniel Nguyen

commit sha c5ef52c5f0c698b76133eae0aa93d83fa7ab9f79

added draft of function

view details

Mahmoud Abuzaina

commit sha 7fbbbe1a9198a307485cac42960a70847c57bba7

Enabling native format in Conv fwd

view details

Mahmoud Abuzaina

commit sha 353935b9925c3dd0783cbf661119f799336d5718

Enabling native foramt in Conv bwd

view details

Kaixi Hou

commit sha bb315c52e06163beeb61400fb347536a71ce8710

Fix a conv3d dgrad type issue

view details

Eugene Kuznetsov

commit sha a190fee2a5d696065c618fe014445b244d07bde2

Reviewer requested changes

view details

Daniel Nguyen

commit sha 0a79e7111037c4bb793964708acc27f4e7cc12ee

finished implementation and passes tests

view details

Deven Desai

commit sha 4b058d62a60b53ce52304d0450cdbd334570b03e

[ROCm] Explicitly specifying dtype=np.float32 for *ExpandedBatch subtests in conv_ops_3d_test The following commit adds the *ExpandedBatch subtests in the unit test `conv_ops_3d_test` https://github.com/tensorflow/tensorflow/commit/549e69ca1316cd6bc54cbbe28dd9340fdd7b8e76 Those unit tests currently fail on the ROCm platform, because the dtype is not explicitly specified in the capp to `np.asarray` within the `_CreateNumpyTensor`. This defaults the datatype for the data/filter tensors to `double/float64` and ROCm does not have support for it, wich leads to those subtests failing. This PR/commit adds an explicit `dtype=np.float32` argument to above mentioned call to `np.asarray`, thus making the data/filter tensors to be of `float32` type, which makes those subtests pass on the ROCm platform. Changing the dtype from `float64` to `float32` does change what the subtests are testing, so this change should be ok.

view details

Mahmoud Abuzaina

commit sha 3d0dda22efd911b1f0c01b48f5eac4d639a1f473

Merge branch 'master' into mabuzain/native-fmt-conv-fwd

view details

Mahmoud Abuzaina

commit sha f2a7ebef0413312e13b2a7aa4924847316090b18

Merge branch 'master' into mabuzain/native-fmt-conv-bwd

view details

Katherine Tian

commit sha 576863f05063d2e68285da81853cd70549305fa9

don't return erased element and clean tests

view details

Katherine Tian

commit sha c1b4fcd787d16f64e6c4b341094524f4dc58a8dc

clean up

view details

Katherine Tian

commit sha 00b44494f75b78afc37a0082f8fdff255886e9cf

clean up

view details

codeadmin_peritiae

commit sha 24d58d6c02f72bd9f5cb440abdd4f24d7b607192

Type mismatch fixed

view details

Daniel Nguyen

commit sha aa88605eae286960f52d1dc3fdee06238221d6d2

clean up only

view details

Katherine Tian

commit sha 5d619193aa975e931b2c691e86ce3524743a1d19

clean up

view details

push time in 2 months

push eventpolymage-labs/mlirx

Bevin Hansson

commit sha 956582aa165804dd8335879c3a7f833901e5424c

[Sema] Iteratively strip sugar when removing address spaces. ASTContext::removeAddrSpaceQualType does not properly deal with sugar. QualTypes derive their ASes from the AS on the canonical type, not the type itself. However, removeAddrSpaceQualType only strips the outermost qualifiers, which means that it can fail to remove addrspace qualifiers if there is sugar in the way. Change the function to desugar types until the address space really no longer exists on the corresponding QualType. This should guarantee the removal of the address space. This fixes the erroneous behavior in D62574. Reviewed By: rjmccall, svenvh Differential Revision: https://reviews.llvm.org/D83325

view details

Gousemoodhin Nadaf

commit sha d4408fe17f33bcd664ec8f468abfd1094e84a7c1

[clang] Do not crash for unsupported fixed point to floating point conversion - Fixed point to floating point conversion is unimplemented. - If one of the operands has a floating type and the other operand has a fixed-point type, the function handleFloatConversion() is called because one of the operands has a floating type, but we do not handle fixed point type in this function (Implementation of fixed point to floating point conversion is missing), due to this compiler crashes. In order to avoid compiler crash, when one of the operands has a floating type and the other operand has a fixed-point type, return NULL. - FIXME: Implementation of fixed point to floating point conversion. - I am going to resolve FIXME in followup patches. - Add the test case. Reviewed By: ebevhan Differential Revision: https://reviews.llvm.org/D81904

view details

Jay Foad

commit sha fa2b836ea393dc4d24d2fced0ea78b7527f77de9

[GlobalISel] Add G_ABS This is equivalent to the new llvm.abs intrinsic added by D84125 with is_int_min_poison=0. Differential Revision: https://reviews.llvm.org/D85718

view details

Whitney Tsang

commit sha aa994d9867e38ad12a5d43edcfb8d53a26b73020

[NFC][LoopUnrollAndJam] Use BasicBlock::replacePhiUsesWith instead of static function updatePHIBlocks. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D85673

view details

Tim Keith

commit sha cf715717aa8cb17d98177af3ce63c7e20e8d25a3

[flang] Allow compiler directives in more places Allow compiler directives in the implicit-part and before USE statements in the specification-part. Differential Revision: https://reviews.llvm.org/D85693

view details

Matt Arsenault

commit sha 0dc4c36d3aa1c1bcae4aa00e7808722ebfd22f6d

AMDGPU/GlobalISel: Manually select llvm.amdgcn.writelane Fixup the special case constant bus handling pre-gfx10.

view details

Jonas Devlieghere

commit sha c135744b1df394f51b6a08bc562f99a1236e772c

[lldb/CMake] Separate CMake code for Lua and Python (NFC) Separate the CMake logic for Lua and Python to clearly distinguish between code specific to either scripting language and the code shared by both. What this patch does is: - Move Python specific code into the bindings/python subdirectory. - Move the Lua specific code into the bindings/lua subdirectory. - Add the _python suffix to Python specific functions/targets. - Fix a dependency issue that would check the binding instead of whether the scripting language is enabled. Note that this patch also changes where the bindings are generated, which might affect downstream projects that check them in. Differential revision: https://reviews.llvm.org/D85708

view details

Simon Pilgrim

commit sha fe1f36986b23a67c218d7ca24741d5ebd6886473

[X86][SSE] combineShuffleWithHorizOp - avoid unnecessary subtraction. NFCI. We can safely replace ((M - NumElts) % NumEltsPerLane) with (M % NumEltsPerLane) as the modulo result will be the same.

view details

Xing GUO

commit sha 45a4f4c806669c60adc28a63b19f4e46b99c5efb

[DWARFYAML] Teach yaml2obj emit the correct line table program. The following issues are addressed in this patch. 1. The operands of DW_LNE_set_discriminator should be an ULEB128 number rather than an address. 2. Test the emitted opcodes. Reviewed By: jhenderson Differential Revision: https://reviews.llvm.org/D85717

view details

Eric Christopher

commit sha 8155cb27a2327834ae7f0d320dc0e26f108891a8

Fold Opcode into assert uses to fix an unused variable warning without asserts.

view details

Yitzhak Mandelbaum

commit sha 645dd1b3bf8d976683c72b9faf501d6f0b16326e

[libTooling] Cleanup and reorder `RewriteRule.h`. This patch lifts `RootID` out of the `RewriteRule` class so that constructs (e.g. inline functions) can that refer to the root id don't need to depend on the `RewriteRule` class. With this dependency, the patch is able to collect all `ASTEdit` helper function declarations together with the class declaration, before the introduction of the `RewriteRule` class. In the process, we also adjust some of the comments. This patch is essentially a NFC. Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D85733

view details

David Goldman

commit sha cb29c33984bf40beebd22edf80a5034cf8849307

[clangd][ObjC] Improve xrefs for protocols and classes Summary: Previously clangd would jump to forward declarations for protocols and classes instead of their definition/implementation. Reviewers: sammccall Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, kadircet, usaxena95, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D83501

view details

Nikita Popov

commit sha d110d4aaff31198cd455b68617978019a8339773

[InstSimplify] Forbid undef folds in expandBinOp This is the replacement for D84250 based on D84792. As we recursively fold with the same value twice, we need to disable undef folds, to prevent an undef from being folded to two different values. Reverting rG00f3579aea6e3d4a4b7464c3db47294f71cef9e4 and using the test case from https://reviews.llvm.org/D83360#2145793, it no longer performs the incorrect fold. Differential Revision: https://reviews.llvm.org/D85684

view details

Yitzhak Mandelbaum

commit sha d8c1f43dcc949fda5ce37a122d1a0d92975de82c

[libTooling] Move RewriteRule include edits to ASTEdit granularity. Currently, changes to includes are applied to an entire rule. However, include changes may be specific to particular edits within a rule (for example, they may apply to one file but not another). Also, include changes may need to carry metadata, just like other changes. So, we make include changes first-class edits. Reviewed By: tdl-g Differential Revision: https://reviews.llvm.org/D85734

view details

Lang Hames

commit sha 989d8dc9fe201eaa2c323d92bc39c00ee53f5012

[llvm-jitlink] Fix a file comment.

view details

Lang Hames

commit sha eed19c8c7e7a7a44e4a417b8df7afce5c4ae738c

[ORC] Move file-descriptor based raw byte channel into a public header. This will enable re-use in other llvm tools.

view details

Matt Arsenault

commit sha 8dd2eb10bbc40610b8943cfb04a81e9c7dbc71e1

GlobalISel: Fix typo

view details

Simon Pilgrim

commit sha 2655bd51d6a350b1aa71566fa9cbaad64990336a

[X86][SSE] combineShuffleWithHorizOp - canonicalize SHUFFLE(HOP(X,Y),HOP(Y,X)) -> SHUFFLE(HOP(X,Y)) Attempt to canonicalize binary shuffles of HOPs with commuted operands to an unary shuffle.

view details

Simon Pilgrim

commit sha b9aaf32f46494695d1b20c08730c1111536e17f8

Fix MSVC "not all control paths return a value" warning. NFC.

view details

jasonliu

commit sha 0dc5e0cd393d1bf451c27c1a2d8471a4df0f42b0

[XCOFF][llvm-readobj] Move XCOFF test to XCOFF directory Summary: COFF and XCOFF in llvm are very different and serves different platform. Since we have different Dumper.cpp file in llvm-readobj's implementation, we should have separate testing directory for them too. Reviewed By: jhenderson, DiggerLin Differential Revision: https://reviews.llvm.org/D85675

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lmhlo.constant ops

//tensorflow/tools/ci_build:gen_ci_sanity_out is failing here.

bondhugula

comment created time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Add folder for mhlo get_dimension_size

@sherhut @joker-eph The internal builds may fail here since a lot of ops would get folded away (shapes are typically constant, and more operands would become constant with this folding).

bondhugula

comment created time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Add folder for mhlo get_dimension_size

Fixed dialect name in test cases: xla_hlo -> mhlo

bondhugula

comment created time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha 4969511dbd62e20c550350c9d9f72dba9661936d

[MLIR] Add folder for mhlo get_dimension_size Add folder for mhlo GetDimensionSizeOp. get_dimension_size folds to a constant when the corresponding tensor dimension size is statically known / constant.

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lhlo.constant ops

Fixed dialect names in test cases: xla_lhlo -> lmhlo

bondhugula

comment created time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha 3643169c9e68843a9ada37d741f5c7543a6f25f0

[MLIR] Erase dead lmhlo.constant ops An lmhlo.constant op on a memref that is locally allocated and with no other users (other than dealloc's) can be deleted. Add a canonicalization pattern for this.

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lhlo.constant ops

Can you add a TODO in the pattern?

Done. In fact, the generalization would go beyond dead stores -- to also transparently cover other ops like DMA operations transferring to that memref.

bondhugula

comment created time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha 480063bd8b0d23f890f4a36d10f489ffbc50a347

[MLIR] Erase dead lhlo.constant ops An mlhlo.constant op on a memref that is locally allocated and with no other users (other than dealloc's) can be deleted. Add a canonicalization pattern for this.

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lhlo.constant ops

That's exactly what I had in mind and was also related to the question on the memory effect interface I posted on discord. It'd require a bit more thought to design that.
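A very rough sketch of that direction, just to make the idea concrete (names are illustrative and this is not a worked-out design):

  // Instead of matching lmhlo.constant specifically, ask the memory effect
  // interface whether an op's only effect is a write to the local buffer.
  static bool onlyWritesToBuffer(Operation *user, Value buffer) {
    auto memEffects = dyn_cast<MemoryEffectOpInterface>(user);
    if (!memEffects)
      return false;
    SmallVector<MemoryEffects::EffectInstance, 4> effects;
    memEffects.getEffects(effects);
    return llvm::all_of(effects, [&](const MemoryEffects::EffectInstance &effect) {
      return isa<MemoryEffects::Write>(effect.getEffect()) &&
             effect.getValue() == buffer;
    });
  }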

bondhugula

comment created time in 2 months

push eventllvm/llvm-project

Arjun P

commit sha 33f574672f40fb94c818901208824303350df55e

[MLIR] Redundancy detection for FlatAffineConstraints using Simplex This patch adds the capability to perform constraint redundancy checks for `FlatAffineConstraints` using `Simplex`, via a new member function `FlatAffineConstraints::removeRedundantConstraints`. The pre-existing redundancy detection algorithm runs a full rational emptiness check for each inequality separately for checking redundancy. Leveraging the existing `Simplex` infrastructure, in this patch we have an algorithm for redundancy checks that can check each constraint by performing pivots on the tableau, which provides an alternative to running Fourier-Motzkin elimination for each constraint separately. Differential Revision: https://reviews.llvm.org/D84935

view details

push time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Add folder for mhlo get_dimension_size

@sherhut @stellaraccident @River707

bondhugula

comment created time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha ae581ca0ba906934dc045ee34931ea61fd2bde85

[MLIR] Add folder for mhlo get_dimension_size Add folder for mhlo GetDimensionSizeOp. get_dimension_size folds to a constant when the corresponding tensor dimension size is statically known / constant.

view details

push time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha c777fba7c3d2703e92a48a31e40f36ff41eb1fe0

[MLIR] Erase dead lhlo.constant ops An mlhlo.constant op on a memref that is locally allocated and with no other users (other than dealloc's) can be deleted. Add a canonicalization pattern for this.

view details

push time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha d150a91b905abc556ef8d50e946c08b09a0fec8d

[MLIR] Add folder for xla_hlo get_dimension_size Add folder for mhlo GetDimensionSizeOp (xla_hlo::get_dimension_size). get_dimension_size folds to a constant when the corresponding tensor dimension size is statically known / constant.

view details

push time in 2 months

PR opened tensorflow/tensorflow

[MLIR-LAIR] Add folder for xla_hlo get_dimension_size

Add folder for xla_hlo GetDimensionSizeOp (xla_hlo::get_dimension_size). get_dimension_size folds to a constant when the corresponding tensor dimension size is statically known / constant.

+24 -0

0 comment

3 changed files

pr created time in 2 months

pull request commenttensorflow/tensorflow

[MLIR] Erase dead lhlo.constant ops

@River707 @sherhut @stellaraccident for visibility.

bondhugula

comment created time in 2 months

PR opened tensorflow/tensorflow

[MLIR] Erase dead lhlo.constant ops

An xla_lhlo.constant op on a memref that is locally allocated and has no users other than dealloc's can be deleted. Add a canonicalization pattern for this.
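For reference, the pattern is roughly of this shape (illustrative sketch only; op and accessor names may differ from the actual patch):

  struct EraseDeadLhloConstant : public OpRewritePattern<ConstOp> {
    using OpRewritePattern<ConstOp>::OpRewritePattern;
    LogicalResult matchAndRewrite(ConstOp op, PatternRewriter &rewriter) const override {
      Value memref = op.output();
      // Only fire on locally allocated buffers whose sole other users are deallocs,
      // i.e., the constant data is never read.
      if (!memref.getDefiningOp<AllocOp>())
        return failure();
      for (Operation *user : memref.getUsers()) {
        if (user == op.getOperation())
          continue;
        if (!isa<DeallocOp>(user))
          return failure();
      }
      rewriter.eraseOp(op);
      return success();
    }
  };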

+54 -0

0 comment

3 changed files

pr created time in 2 months

push eventpolymage-labs/tensorflow

Uday Bondhugula

commit sha c2a31b3d7d26f5f8909b7c59293e7b4ad7cc1281

[MLIR] Erase dead lhlo.constant ops An xla_lhlo.constant op on a memref that is locally allocated and with no other users (other than dealloc's) can be deleted. Add a canonicalization pattern for this.

view details

push time in 2 months

create barnchpolymage-labs/tensorflow

branch : get_dimension_size_fold

created branch time in 2 months

create barnchpolymage-labs/tensorflow

branch : lhlo_constant_erase

created branch time in 2 months

pull request commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

Test on 8-d loop nest: Time taken was 0.090387s. The loop nest used for test was:

Was it on the release build? 90ms would be a lot for a single loop nest.

And how did you time it? You should use -print-pass-timing.

HarshVardhanKumar

comment created time in 2 months

pull request commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

Test on 8-d loop nest: Time taken was 0.090387s. The loop nest used for test was:

Was it on the release build? 90ms would be a lot for a single loop nest.

HarshVardhanKumar

comment created time in 2 months

issue commentbazelbuild/bazel

bazel sync command should not download unused external dependencies

Another nice improvement would be if one could do bazel sync @repo-name - often I just want to update one or two external dependencies unconditionally, and having to wait potentially many minutes for that if a lot of deps have to be redownloaded is quite expensive.

I'm just checking if anything like this was added. If there is an external dependency configured as a git_repository and the branch it points to has updates that we would like to sync with, could one make bazel fetch just that one repo? Thanks!

EladWeinson

comment created time in 2 months

Pull request review commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

+//===- AffineLoopInterchange.cpp - Pass to perform loop interchange-----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements a loop interchange pass that optimizes for locality
+// (spatial and temporal - both self and group) and parallelism for multicores,
+// to minimize the frequency of synchronization. The pass works for both
+// perfectly nested and imperfectly nested loops (any level of nesting). However
+// in the presence of affine.if statements and/or non-rectangular iteration
+// space, the pass simply bails out - leaving the original loop nest unchanged.
+// The pass is triggered by the command line flag -affine-loop-interchange.
+//
+//===----------------------------------------------------------------------===//
+
+#include "PassDetail.h"
+#include "mlir/Analysis/AffineAnalysis.h"
+#include "mlir/Analysis/Utils.h"
+#include "mlir/Dialect/Affine/IR/AffineOps.h"
+#include "mlir/Dialect/Affine/Passes.h"
+#include "mlir/Transforms/LoopUtils.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include <algorithm>
+#include <cmath>
+#include <numeric>
+
+using namespace mlir;
+namespace {
+struct LoopInterchange : public AffineLoopInterchangeBase<LoopInterchange> {
+  void runOnFunction() override;
+  void handleImperfectlyNestedAffineLoops(Operation &funcOp);
+};
+} // namespace
+
+/// Returns true if any affine.if op found in the loop nest rooted at `forOp`
+static bool hasAffineIfStatement(AffineForOp &forOp) {
+  auto walkResult =
+      forOp.walk([&](AffineIfOp op) { return WalkResult::interrupt(); });
+  return walkResult.wasInterrupted();
+}
+
+/// Checks if this `loopNest` has a rectangular-shaped iteration space.
+static bool isRectangularAffineForLoopNest(ArrayRef<AffineForOp> loopNest) {
+  for (AffineForOp forOp : loopNest) {
+    if (!forOp.hasConstantUpperBound() || !forOp.hasConstantLowerBound())
+      return false;
+  }
+  return true;
+}
+
+/// Fills `row` with the coefficients of loopIVs in `expr`. Any constant terms
+/// encountered in `expr` are added to `constantVectorValue`. Every value in
+/// `operands` should be a loopIV or a terminal symbol.
+static void prepareCoeffientRow(AffineExpr expr,
+                                ArrayRef<Value> operands,
+                                DenseMap<Value, unsigned> &loopIndexMap,
+                                int64_t &constantVectorValue,
+                                SmallVector<int64_t, 4> &row) {
+  // TODO: Implement support for terminal symbols in `expr`.

Please don't keep the resize outside - it should be here.
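
A minimal sketch of the suggestion (not the PR's final code), assuming a hypothetical extra parameter `numColumns` that carries the row width getAffineAccessMatrices currently computes at the call site:

static void prepareCoeffientRow(AffineExpr expr, ArrayRef<Value> operands,
                                DenseMap<Value, unsigned> &loopIndexMap,
                                unsigned numColumns,
                                int64_t &constantVectorValue,
                                SmallVector<int64_t, 4> &row) {
  // Size the row here, so no caller can forget to do it before the walk.
  if (row.size() < numColumns)
    row.resize(numColumns, 0);
  // ... coefficient extraction as in the patch ...
}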

HarshVardhanKumar

comment created time in 3 months

Pull request review commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context: AffineLoopInterchange.cpp - same patch. Anchored on the switch over expr.getKind() in prepareCoeffientRow(), specifically the AffineExprKind::CeilDiv / FloorDiv / Mod cases, which handle only the LHS dimension and silently skip the remaining expression kinds.]

Add a comment on the cases you don't care about or convert this to if/else? Switch may be misleading.
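
Illustrative only - one way to act on this, keeping the names used in the patch:

  // Handle ceildiv/floordiv/mod explicitly instead of via switch cases, and
  // state which operands are deliberately ignored.
  if (expr.getKind() == AffineExprKind::CeilDiv ||
      expr.getKind() == AffineExprKind::FloorDiv ||
      expr.getKind() == AffineExprKind::Mod) {
    auto binExpr = expr.cast<AffineBinaryOpExpr>();
    if (auto dim = binExpr.getLHS().dyn_cast<AffineDimExpr>())
      row[loopIndexMap[operands[dim.getPosition()]]] = 1;
    // The RHS here is a constant or a symbol; neither contributes a loop-IV
    // coefficient, so the row is intentionally left unchanged for it.
  }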

HarshVardhanKumar

comment created time in 3 months

Pull request review commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context: AffineLoopInterchange.cpp - same patch. Anchored on `constexpr unsigned cache_line_size = 64;` inside buildReferenceGroups(), the routine that groups affine load/store ops into reference groups for the locality cost model.]

Use LLVM/MLIR-style naming - camel back; please see the style guide on the MLIR page. Also, this could be a static member in the pass itself so that it's easy to find and set later: kCacheLineSize.
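
For example, a sketch of what the comment asks for (illustrative only; the value 64 is the patch's current assumption):

namespace {
struct LoopInterchange : public AffineLoopInterchangeBase<LoopInterchange> {
  /// Assumed cache line size in bytes, kept as a static member so it is easy
  /// to find and to make configurable later.
  static constexpr unsigned kCacheLineSize = 64;
  void runOnFunction() override;
  void handleImperfectlyNestedAffineLoops(Operation &funcOp);
};
} // namespace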

HarshVardhanKumar

comment created time in 3 months

Pull request review commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context: AffineLoopInterchange.cpp - same patch. Anchored on the doc comment of buildReferenceGroups(), which cites Steve Carr et al. only by URL (https://dl.acm.org/doi/abs/10.1145/195470.195557).]

Include the paper title.
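
Presumably the doc comment would then read something like the following; the title here is inferred from the linked DOI, so treat it as an assumption to be double-checked:

/// Please refer to Steve Carr, Kathryn S. McKinley, and Chau-Wen Tseng,
/// "Compiler Optimizations for Improving Data Locality" (ASPLOS 1994) for a
/// detailed description.
/// https://dl.acm.org/doi/abs/10.1145/195470.195557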

HarshVardhanKumar

comment created time in 3 months

Pull request review commentHarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

[Diff context: AffineLoopInterchange.cpp - same patch. Anchored on the buildReferenceGroups() signature, which takes both its inputs and its outputs as SmallVector/DenseMap references.]

Inputs should be ArrayRef; outputs should be SmallVector refs. The same applies everywhere else - I won't repeat it at other places. Please scan through and fix.
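For reference, a minimal sketch (outside the patch; the helper name and logic are hypothetical) of the signature style this asks for: read-only sequence inputs taken as ArrayRef, and output vectors as SmallVectorImpl references so callers can pass a SmallVector of any inline size.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/SmallVector.h"

// Inputs as a non-owning ArrayRef view; the output as SmallVectorImpl& so the
// callee does not depend on the caller's inline capacity N.
static void collectNonZero(llvm::ArrayRef<int64_t> values,
                           llvm::SmallVectorImpl<int64_t> &nonZero) {
  for (int64_t v : values)
    if (v != 0)
      nonZero.push_back(v);
}

A caller simply declares llvm::SmallVector<int64_t, 4> nonZero; and passes it directly.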

HarshVardhanKumar

comment created time in 3 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

(Diff context from AffineLoopInterchange.cpp; the review comment below refers to these lines.)
+/// Calculates the loop-carried-dependence vector for this loop nest rooted at
+/// `rootForOp`. A value `true` at i-th index means that loop at depth i in the
+/// loop nest carries a dependence.
+static void getLoopCarriedDependenceVector(
+    AffineForOp &rootForOp, ArrayRef<Operation *> loadAndStoreOps,
+    unsigned loopNestSize, SmallVector<bool, 4> &loopCarriedDependenceVector) {
+
+  // Resize the `loopCarriedDependenceVector` to fit entire loop nest.

Rephrase this - you aren't resizing a single vector.

HarshVardhanKumar

comment created time in 3 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

(Diff context from AffineLoopInterchange.cpp; the review comment below refers to these lines.)
+/// Groups ops in `loadAndStoreOps` into `referenceGroups` based on whether or
+/// not they exhibit group-temporal or group-spatial reuse with respect to an
+/// affine.for op present at depth `innermostIndex` in the original loop nest.
+///
+/// Please refer Steve Carr et. al for a detailed description.

refer to

HarshVardhanKumar

comment created time in 3 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

(Diff context from AffineLoopInterchange.cpp; the review comment below refers to these lines.)
+/// Fills `elementsSize` with the size of element types of respective memrefs
+/// accessed by the ops in `loadAndStoreOps`. These will be later used to
+/// check if two accesses are within a cache_line_size/element_size distance
+/// apart for a useful locality.
+static void getElementsSize(SmallVector<Operation *, 8> &loadAndStoreOps,
+                            DenseMap<Operation *, unsigned> &elementsSize) {
+  // Assumption: In cases where element size is difficult to obtain, assume a
+  // default value of 8 bytes.
+  constexpr unsigned defaultEltSize = 8;

This can be a static field in the Pass class.
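A minimal sketch of what that could look like against the pass skeleton quoted earlier in the diff (the member name kDefaultEltSize is an assumption, not from the patch):

namespace {
struct LoopInterchange : public AffineLoopInterchangeBase<LoopInterchange> {
  // Fallback element size (in bytes) used when a memref's element size cannot
  // be determined statically; hypothetical name for illustration.
  static constexpr unsigned kDefaultEltSize = 8;
  void runOnFunction() override;
  void handleImperfectlyNestedAffineLoops(Operation &funcOp);
};
} // namespace

getElementsSize would then read LoopInterchange::kDefaultEltSize instead of defining its own local constant.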

HarshVardhanKumar

comment created time in 3 months

Pull request review comment HarshVardhanKumar/llvm-project

Add AffineLoopInterchange pass in the Affine Dialect.

(Diff context from AffineLoopInterchange.cpp; the review comment below refers to these lines.)
+/// Fills `elementsSize` with the size of element types of respective memrefs
+/// accessed by the ops in `loadAndStoreOps`. These will be later used to
+/// check if two accesses are within a cache_line_size/element_size distance
+/// apart for a useful locality.
+static void getElementsSize(SmallVector<Operation *, 8> &loadAndStoreOps,

loadAndStoreOps is an input to the function -> ArrayRef.
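Applied to the signature quoted just above, the change would look roughly like this (a sketch of the suggestion, not the final patch):

// Take the ops as a read-only view; the output map stays a reference.
static void getElementsSize(ArrayRef<Operation *> loadAndStoreOps,
                            DenseMap<Operation *, unsigned> &elementsSize);

The body can stay as is, since the range-for over loadAndStoreOps works unchanged on an ArrayRef.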

HarshVardhanKumar

comment created time in 3 months
