Benjamin Kramer d0k Munich, Germany

llvm/llvm-project 3802

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

d0k/mappedfile 9

a simple C++ class for read-only memory-mapped files

d0k/ednaunpack 7

unpacker for the installer of the adventure game “Edna bricht aus”

d0k/malprogramm 3

a very old drawing program of mine ported to wxWidgets

d0k/ninifile 3

ini file manipulation library written in c#

d0k/mirrormeta 2

el-cheapo metalink generator + library in python

d0k/xxo 2

network tic-tac-toe written in java and swing

d0k/discordlicht 1

modified fnordlicht (avr mood light) (now obsolete)

d0k/hanoigl 1

towers of hanoi visualization with OpenGL

d0k/BA-Benjamin-Kramer 0

Bachelor's Thesis

push event llvm/llvm-project

Benjamin Kramer

commit sha fc466f87804f97b322394ef3b9db43ea3febcc15

Make test not write to the source directory

push time in 3 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 8c893cac3f65cecf5b5a05dc32cfbfe4b82cc8e0

[ORC] Remove spammy debug print

push time in 4 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 3ac37eb9a93a4009f58c29497aa141fc103f4c45

Silence compiler warnings

mlir/lib/Parser/Parser.cpp:4484:15: warning: 'parseAssignmentList' overrides a member function but is not marked 'override' [-Winconsistent-missing-override]
  ParseResult parseAssignmentList(SmallVectorImpl<OperandType> &lhs,
              ^
mlir/include/mlir/IR/OpImplementation.h:662:3: note: overridden virtual function is here
  parseAssignmentList(SmallVectorImpl<OperandType> &lhs,
  ^
mlir/lib/Parser/Parser.cpp:4488:12: warning: unused variable 'type' [-Wunused-variable]
      Type type;
           ^

push time in 4 days

push event llvm/llvm-project

Benjamin Kramer

commit sha bc1947a6f51fec9239248043d1a85afa3ce586aa

Add a basic tiling pass for parallel loops

This exploits the fact that the iterations of parallel loops are independent so tiling becomes just an index transformation. This pass only tiles the innermost loop of a loop nest.

The ultimate goal is to allow vectorization of the tiled loops, but I don't think we're there yet with the current rewriting, as the tiled loops don't have a constant trip count.

Differential Revision: https://reviews.llvm.org/D74954
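
The idea that tiling independent iterations is "just an index transformation" can be illustrated with a small stand-alone sketch. This is not the MLIR pass from the commit; `tiledIndices` and its parameters are hypothetical names chosen for illustration, and the loop is a plain 1-D C++ loop rather than a parallel loop operation.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Visit every index in [0, n) using a tiled loop nest and return the
// indices in visiting order. `tileSize` must be greater than zero.
std::vector<std::size_t> tiledIndices(std::size_t n, std::size_t tileSize) {
  std::vector<std::size_t> visited;
  visited.reserve(n);
  // Outer loop: steps over tile origins.
  for (std::size_t origin = 0; origin < n; origin += tileSize) {
    // Inner loop: the last tile may be partial, so the bound is clamped.
    std::size_t extent = std::min(tileSize, n - origin);
    for (std::size_t offset = 0; offset < extent; ++offset) {
      // The original index is recovered by a plain index transformation.
      visited.push_back(origin + offset);
    }
  }
  return visited;
}
```

Note that the clamped inner bound depends on `origin`, so the inner loop does not have a constant trip count, which is the limitation the commit message mentions with respect to vectorization.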

push time in 4 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 44bbc767000494c5702ca49c870e6642a93bbb02

Drop a constexpr in favor of const, MSVC complains.

lib\Target\Hexagon\HexagonGenDFAPacketizer.inc(109): error C2131: expression did not evaluate to a constant

push time in 10 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 9e4b761aba01391bf3966a1a61eab6b5c76c70ad

Move DFA tables into the read-only data segment.

push time in 10 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 564a9de28ed432b0e758b691b6095e421969de60

Hide implementation details. NFC.

push time in 11 days

push event llvm/llvm-project

Benjamin Kramer

commit sha f4c59c0f97cd64668c50547f23249017689717af

[wasm] Unbreak after 5fc5c7db38672c8962879b6fdce68393181c5e08. NFCI.

push time in 11 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 5fc5c7db38672c8962879b6fdce68393181c5e08

Strength reduce vectors into arrays. NFCI.

push time in 11 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 6704960f7c282714d5963aad8f7b379fa13289ea

[ADT] Use inherited ctors to forward to base. NFCI.

push time in 11 days

push event llvm/llvm-project

Benjamin Kramer

commit sha 7355364f63eac9d20c0abb5ce213ba478e8ea8f1

Put back makeArrayRef to make GCC 5 happy

push time in 20 days

push event llvm/llvm-project

Benjamin Kramer

commit sha ec93c758ced7fa8ae1b5042039dd326ef1db45ef

Drop some uses of StringLiteral in favor of StringRef

StringRef can be used in constexpr contexts, so StringLiteral isn't necessary anymore.

push time in 20 days

push event llvm/llvm-project

Benjamin Kramer

commit sha ef83d46b6b428fa1c8614cd28ab6fe3f07f8d075

Use heterogeneous lookup for std::map<std::string with a StringRef. NFCI.
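
As a rough illustration of heterogeneous lookup, the sketch below uses only the standard library. It assumes C++17 and uses `std::string_view` as a stand-in for `StringRef`; `Registry` and `lookupOr` are hypothetical names, not anything from the LLVM tree.

```cpp
#include <functional>
#include <map>
#include <string>
#include <string_view>

// std::less<> is a transparent comparator, which enables the find()
// overloads that accept any type comparable with the key.
using Registry = std::map<std::string, int, std::less<>>;

// Returns the mapped value, or `fallback` when the key is absent.
// No temporary std::string is built from `key` for the lookup.
int lookupOr(const Registry &registry, std::string_view key, int fallback) {
  auto it = registry.find(key);
  return it == registry.end() ? fallback : it->second;
}
```

With the default comparator `std::less<std::string>`, the same call would first have to construct a `std::string` from the view, which is the copy this kind of change avoids.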

push time in 20 days

push event llvm/llvm-project

Benjamin Kramer

commit sha e4230a9f6c518209cf0d9fdac1764dadd525b513

ArrayRef'ize spillCalleeSavedRegisters. NFCI.

push time in 20 days

push event llvm/llvm-project

Benjamin Kramer

commit sha b68b8be8e2b930056d134a2632dd7a80ad0c701c

[mlir-tblgen] Stop leaking PredNodes

Technically a leak in tblgen is harmless, but this makes asan builds of mlir very noisy. Just use a SpecificBumpPtrAllocator that knows how to clean up after itself.

push time in 22 days

issue closed tensorflow/tensorflow

Errors of building XLA AOT example

System information

  • Have I written custom code: No
  • OS Platform and Distribution: Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: 2.0.0
  • Python version: 3.6.7
  • Bazel version (if compiling from source): 0.26.1
  • GCC/Compiler version (if compiling from source): 5.4.0
  • CUDA/cuDNN version: 10.0/7.5
  • GPU model and memory: RTX2080Ti with 11GB

Describe the current behavior

I followed this guide to test the XLA example; all the code is the same as in the guide. In the last step, building the cc_binary fails with the following errors:

bazel build //tensorflow/compiler/aot/tests:my_binary --verbose_failures
INFO: Analyzed target //tensorflow/compiler/aot/tests:my_binary (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
ERROR: /home/zoud/workspace/local/tf_build/tensorflow-2.0.0-cc/tensorflow/compiler/aot/tests/BUILD:16:1: Linking of rule '//tensorflow/compiler/aot/tests:my_binary' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/zoud/.cache/bazel/_bazel_zoud/8f979226e66ca56e3b01def87b6ccec0/execroot/org_tensorflow && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    LD_LIBRARY_PATH=:/usr/local/cuda/lib64:/home/zoud/program/TensorRT-5.1.5.0/lib \
    PATH=/usr/lib/jvm/jdk1.8.0_221/bin:/usr/lib/jvm/jdk1.8.0_221/jre/bin:/home/zoud/program/anaconda3/envs/tf_2.0.0_src/bin:/home/zoud/program/anaconda3/condabin:/usr/lib/jvm/jdk1.8.0_221/bin:/usr/lib/jvm/jdk1.8.0_221/jre/bin:/home/zoud/bin:/home/zoud/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/usr/local/cuda/bin:/home/zoud/program/MATLAB/R2017b/bin:/home/zoud/program/upx-3.95-amd64_linux:/home/zoud/program/bazel-0.26.1:/usr/local/cuda/bin:/home/zoud/program/MATLAB/R2017b/bin:/home/zoud/program/upx-3.95-amd64_linux:/home/zoud/program/bazel-0.26.1 \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/home/zoud/program/anaconda3/envs/tf_2.0.0_src/bin/python \
    PYTHON_LIB_PATH=/home/zoud/program/anaconda3/envs/tf_2.0.0_src/lib/python3.6/site-packages \
    TF_CONFIGURE_IOS=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.1,7.5 \
    TF_NEED_CUDA=1 \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/k8-opt/bin/tensorflow/compiler/aot/tests/my_binary -pthread -Wl,-no-as-needed -pie -Wl,-z,relro,-z,now '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -no-canonical-prefixes -fno-canonical-system-headers -B/usr/bin -Wl,--gc-sections -Wl,@bazel-out/k8-opt/bin/tensorflow/compiler/aot/tests/my_binary-2.params)
Execution platform: @bazel_tools//platforms:host_platform
/usr/bin/ld: bazel-out/k8-opt/bin/external/com_google_absl/absl/strings/libstrings.a(charconv.o): undefined reference to symbol 'nanf@@GLIBC_2.2.5'
//lib/x86_64-linux-gnu/libm.so.6: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
Target //tensorflow/compiler/aot/tests:my_binary failed to build
INFO: Elapsed time: 0.308s, Critical Path: 0.12s
INFO: 0 processes.
FAILED: Build did NOT complete successfully

Maybe my GCC version is too low? I noticed that the official environment for building TF2 uses GCC 7.3.

Another question: I don't understand the purpose of step 1. Why modify tf2xla.proto? This file never seems to be used in the following steps.

closed time in 25 days

7oud

issue comment tensorflow/tensorflow

Errors of building XLA AOT example

Closing this as I didn't manage to reproduce it and GCC5 is really old. The error message is saying that there's a -lm missing on the linker command line, but I don't know how that can happen.

7oud

comment created time in 25 days

push event llvm/llvm-project

Benjamin Kramer

commit sha c2b7e4e88a1a19b2a51f120716118aad130f4279

Rewrite test not to rely on StrEq with StringRef

StrEq has some magic inside that should do the explicit conversion from StringRef to std::string, but apparently this doesn't work with GCC 5. Just use EXPECT_EQ, it does the same thing with less magic.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 01213f90700dbb98a0dbcc01da8fdb89f6db5617

[clang-tidy] Initialize token before handing it to the lexer

Found by msan.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 0ee4b027d37e45391bdd872911c61756d0958722

Fix an implicit conversion in clang-tidy. GCC 5 complains about it.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 4e3f4f03f3e4dccfac6212a66d54d584fea328a2

[ASTMatchers] StringRef'ify hasName

This was just inconvenient, and we make a copy anyways.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 757bdc64d33df61467a7122f22ea76cf163c8dca

Fix clang unittest build with GCC 5

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 49ad3f6143227ac5f4d0e061b564b65d63bd0363

One more bugpoint fix for GCC5

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 42a25e7fe6ff0eb74c7d91151983fc3fd0d5d10c

Try harder to fix bugpoint with GCC5

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha cd87e207ec7c1d6ea38bf05b8a4e887a1940f37f

Make bugpoint work with gcc5 again.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha bd31243a34da8a045c642ddc77b27b0a45a9bf1e

Fix more implicit conversions. Getting closer to having clang working with gcc 5 again

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha bb39b52950e77e650fbdd86f7d5e4b89ff0aac4d

Fix conversions in clang and examples

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 2b36e85542d24161ff4460cb4f0da635e9f5ab62

GCC5 buildbot made it to clang. Fix implicit conversions it found.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 2d92336db0087ad295401865d7749d4d1cfe4846

Another stab at making the gold plugin compile again

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha a9bc7b83a402f2bf7d7c55ac4c9e9a2fb2b3ea13

Another round of GCC5 fixes.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 735f90fe42e55935035d842752e01361b5216c11

Fix one round of implicit conversions found by g++5.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 8b6320c79d4bf9a585f0533bb6007ff0697a9920

Address implicit conversions detected by g++ 5 only.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha ddf77f10a301d04ab47ede3bed596b21cda44794

One more batch of things found by g++ 6

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 0d401fa36b532b7d766fd51368b9afb88ad46d1a

Fix a couple more implicit conversions that Clang doesn't diagnose.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 5976067d2c5c00969e5e211048aec1d2aaccb366

A bunch more implicit string conversions that my Clang didn't detect.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 05c19705d8edc05cc85cfef3b4e2cd172fc873a8

[tblgen] Fix implicit conversion only diagnosed by g++ 6

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha a153d78c7eb079bcba5ebb37fc1ab9b3c82b99a4

[Driver] Fix implicit conversion guarded by #ifdef _WIN32

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha c9909c22fe337a2634f27e22705785f979d7447f

Fix implicit conversions in example code.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 19580c3755a1dc198005839a73a7bad5c108f203

Fix implicit conversion in the lldb Python plugin

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 159709f04fa55674480da2db5c10d086c6297ca9

[Support] Fix implicit std::string conversions on Win32.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 777180a32b61070a10dd330b4f038bf24e916af1

[ADT] Make StringRef's std::string conversion operator explicit

This has the same behavior as converting std::string_view to std::string. This is an expensive conversion, so explicit conversions are helpful for avoiding unnecessary string copies.
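
The effect of an explicit conversion operator can be sketched with a toy type. `TextRef` and `toOwned` below are hypothetical and only loosely modeled on the idea of a non-owning string reference; this is not LLVM's `StringRef`.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical non-owning string reference, loosely modeled on the idea
// behind StringRef; it is not LLVM's class.
struct TextRef {
  const char *data;
  std::size_t size;

  // Explicit: copying into an owning std::string has to be spelled out.
  explicit operator std::string() const { return std::string(data, size); }
};

// The conversion is still available, but only when requested directly.
std::string toOwned(TextRef ref) { return std::string(ref); }

// Implicit conversion (e.g. `std::string s = ref;`) is rejected.
static_assert(!std::is_convertible<TextRef, std::string>::value,
              "implicit conversion should not compile");
```

Every place that pays for a copy now shows up in the source as `std::string(ref)`, which makes accidental copies easier to spot in review.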

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha adcd02683856c30ba6f349279509acecd90063df

Make llvm::StringRef to std::string conversions explicit.

This is how it should've been and brings it more in line with std::string_view. There should be no functional change here.

This is mostly mechanical from a custom clang-tidy check, with a lot of manual fixups. It uncovers a lot of minor inefficiencies.

This doesn't actually modify StringRef yet, I'll do that in a follow-up.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 2e4977965b57c53db81e729e390dbda6807ef7fc

[ADT] Implicitly convert between StringRef and std::string_view when we have C++17

This makes the types almost seamlessly interchangeable in C++17 codebases. Eventually we want to replace StringRef with the standard type, but that requires C++17 being the default and a huge refactoring job as StringRef has a lot more functionality.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha fba7574cb9416db270efc6621190b3d587124454

[docs] Clarify llvm.used semantics with less awkward wording

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 90c01357b8171e6131fbb904f4c7ebfabd7ede04

[mlir] Shrink-wrap anonymous namespaces around the classes it's supposed to enclose. NFC.

The coding standards prefer smaller anonymous namespaces with free functions just being static and in the global namespace.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 81f385b0c6ea37dd7195a65be162c75bbdef29d2

Make dropTriviallyDeadConstantArrays not quadratic

Only look at the operands of dead constant arrays instead of all constant arrays again.
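
The general shape of such a fix, revisiting only the operands of values that were just deleted instead of rescanning everything, can be sketched with a worklist. The code below is an assumption-laden illustration, not the LLVM implementation: `Node` and `dropDeadNodes` are hypothetical, and use counts are tracked by hand.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a constant: a use count plus operand indices.
struct Node {
  unsigned uses = 0;
  std::vector<std::size_t> operands;
  bool alive = true;
};

// Deletes every node whose use count is zero, then revisits only the
// operands of the nodes just deleted. Returns how many were deleted.
std::size_t dropDeadNodes(std::vector<Node> &nodes) {
  std::vector<std::size_t> worklist;
  for (std::size_t i = 0; i < nodes.size(); ++i)
    worklist.push_back(i);

  std::size_t removed = 0;
  while (!worklist.empty()) {
    std::size_t index = worklist.back();
    worklist.pop_back();
    Node &node = nodes[index];
    if (!node.alive || node.uses != 0)
      continue;
    node.alive = false;
    ++removed;
    for (std::size_t operand : node.operands) {
      --nodes[operand].uses;        // one fewer user for this operand
      worklist.push_back(operand);  // it may have just become dead
    }
  }
  return removed;
}
```

Each node is pushed once up front and then once per deleted user, so the total work is proportional to the number of nodes plus operand edges rather than to repeated full scans.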

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 0133cc60e4e230ee2c176c23eff5aa2f4ee17a75

Revert "[mlir] Create a gpu.module operation for the GPU Dialect."

This reverts commit 4624a1e8ac8a3f69cc887403b976f538f587744a.

Causing problems downstream.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 06cfcdcca7de9c88a1e885eff0d0c4c07090ad48

[AArch64][SVE] Fold variable into assert to silence unused variable warnings in Release builds

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha df186507e1d07c3ddba091a076ba7a33dbdc5867

Make helper functions static or move them into anonymous namespaces. NFC.

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha a2cd4fe6bf2a4e37d5f69b0b19cb1134a14e2970

Unbreak the mlir build after 202ab273e6eca134b69882f100c666fcd3affbcf

push time in a month

push event llvm/llvm-project

Benjamin Kramer

commit sha 7c7ca515837305f5d14033aee1191c254b86063c

Remove copy ctors identical to the default one. NFC.

Those do nothing but make the type no longer trivial to the compiler.
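
The triviality point can be checked with standard type traits. The two structs below are hypothetical examples made up for this sketch, not types from the LLVM tree.

```cpp
#include <type_traits>

// Copy constructor spelled out by hand, doing what the default would do.
struct HandWritten {
  int value = 0;
  HandWritten() = default;
  HandWritten(const HandWritten &other) : value(other.value) {}
};

// Same data, copy constructor left to the compiler.
struct Defaulted {
  int value = 0;
};

static_assert(!std::is_trivially_copyable<HandWritten>::value,
              "a user-provided copy constructor is never trivial");
static_assert(std::is_trivially_copyable<Defaulted>::value,
              "the implicit copy constructor is trivial here");
```

Both types copy the same single `int`, but only `Defaulted` is trivially copyable, which is the property that optimizations and containers specialized on triviality can take advantage of.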

push time in 2 months

push event llvm/llvm-project

Benjamin Kramer

commit sha e49c3c8f2ef97bdf256ca76f3d001eeb79361d56

Sprinkle some constexpr on default ctors so the compiler can diagnose unused instances. NFCI.

push time in 2 months

push event llvm/llvm-project

Peng Guo

commit sha cfd849840134c4632c2f4fa498dfb93c47825b24

[MIR] Fix cyclic dependency of MIR formatter

Summary: Move MIR formatter pointer from TargetMachine to TargetInstrInfo to avoid cyclic dependency between target & codegen.

Reviewers: dsanders, bkramer, arsenm

Subscribers: wdng, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D72485


Benjamin Kramer

commit sha 498856fca5b9306f545554aeec93c7c058f03eb3

[LV] Silence unused variable warning in Release builds. NFC.

push time in 2 months

push event llvm/llvm-project

Yannick Brehon

commit sha aa189ed25fbd861b07eb5d5116dfd8e33e2b1991

Fix compatibility with python3 of clang-include-fixer.py

clang-include-fixer was recently updated to be python3-compatible. However, an exception handling clause was improperly using the deprecated `message` property of Exception classes, so the code was not yet entirely python3-compatible.

Differential Revision: https://reviews.llvm.org/D70902

push time in 3 months

push event llvm/llvm-project

Benjamin Kramer

commit sha 66237889a79f728fffc96394740b975774de26bf

[include-fixer] Python 3 support for clang-include-fixer.py

Patch by Yannick Brehon!


push time in 3 months

fork d0k/jax

Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more

fork in 3 months

push event llvm/llvm-project

Benjamin Kramer

commit sha 446acafb82b5c116b6c94c11d4ac4db7641fa58d

Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" Summary: Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" This reverts commit 5f026b6d9e882941fde9b7e5dc0a2d807f7f24f5. We're (tensorflow.org/xla team) seeing some misscompiles with the new change, only at -O3, with fast math disabled. I'm still trying to come up with a useful/small/external example, but for now, the following IR: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@" @1 = private unnamed_addr constant [4 x i8] c"\00\00\00?" ; Function Attrs: uwtable define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %buffer_table, i64* noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.2 = alloca i64 %fusion.invar_address.dim.1 = alloca i64 %fusion.invar_address.dim.0 = alloca i64 %fusion.1.invar_address.dim.2 = alloca i64 %fusion.1.invar_address.dim.1 = alloca i64 %fusion.1.invar_address.dim.0 = alloca i64 %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = load i8*, i8** %0, !invariant.load !0, !dereferenceable !1, !align !2 %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]* %2 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %3 = load i8*, i8** %2, !invariant.load !0, !dereferenceable !1, !align !2 %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]* store i64 0, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_header.dim.0: ; preds = %fusion.1.loop_exit.dim.1, %entry %fusion.1.indvar.dim.0 = load i64, i64* 
%fusion.1.invar_address.dim.0 %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2 br i1 %4, label %fusion.1.loop_exit.dim.0, label %fusion.1.loop_body.dim.0 fusion.1.loop_body.dim.0: ; preds = %fusion.1.loop_header.dim.0 store i64 0, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_header.dim.1: ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0 %fusion.1.indvar.dim.1 = load i64, i64* %fusion.1.invar_address.dim.1 %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1 br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1 fusion.1.loop_body.dim.1: ; preds = %fusion.1.loop_header.dim.1 store i64 0, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_header.dim.2: ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1 %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2 %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4 br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2 fusion.1.loop_body.dim.2: ; preds = %fusion.1.loop_header.dim.2 %7 = load float, float* bitcast ([4 x i8]* @0 to float*) %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %9 = load float, float* %8, !invariant.load !0, !noalias !3 %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %11 = load float, float* %10, !invariant.load !0, !noalias !3 %12 = fmul float %9, %11 %13 = fmul float %7, %12 %14 = call float @llvm.log.f32(float %13) %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 store float %14, float* %15, !alias.scope !7, !noalias !8 %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1 store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2 br 
label %fusion.1.loop_header.dim.2 fusion.1.loop_exit.dim.2: ; preds = %fusion.1.loop_header.dim.2 %invar.inc1 = add nuw nsw i64 %fusion.1.indvar.dim.1, 1 store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_exit.dim.1: ; preds = %fusion.1.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1 store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_exit.dim.0: ; preds = %fusion.1.loop_header.dim.0 %16 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %17 = load i8*, i8** %16, !invariant.load !0, !dereferenceable !9, !align !2 %parameter.1 = bitcast i8* %17 to float* %18 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %19 = load i8*, i8** %18, !invariant.load !0, !dereferenceable !10, !align !2 %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]* %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 0 %21 = load i8*, i8** %20, !invariant.load !0, !dereferenceable !11, !align !2 %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]* store i64 0, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0 %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0 %22 = icmp uge i64 %fusion.indvar.dim.0, 2 br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1 %23 = icmp uge i64 %fusion.indvar.dim.1, 3 br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 store i64 0, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 
fusion.loop_header.dim.2: ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1 %fusion.indvar.dim.2 = load i64, i64* %fusion.invar_address.dim.2 %24 = icmp uge i64 %fusion.indvar.dim.2, 4 br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2 fusion.loop_body.dim.2: ; preds = %fusion.loop_header.dim.2 %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %26 = add nuw nsw i64 0, %25 %27 = udiv i64 %26, 4 %28 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %29 = add nuw nsw i64 0, %28 %30 = udiv i64 %29, 2 %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26 %32 = load float, float* %31, !alias.scope !7, !noalias !8 %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1 %34 = add nuw nsw i64 0, %33 %35 = udiv i64 %34, 3 %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3 %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0 %38 = load float, float* %37, !invariant.load !0, !noalias !3 %39 = fsub float %36, %38 %40 = fmul float %39, %39 %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %42 = add nuw nsw i64 0, %41 %43 = udiv i64 %42, 4 %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %45 = add nuw nsw i64 0, %44 %46 = udiv i64 %45, 2 %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %48 = load float, float* %47, !invariant.load !0, !noalias !3 %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %50 = load float, float* %49, !invariant.load !0, !noalias !3 %51 = fmul float %48, %50 %52 = fdiv float %40, %51 %53 = fadd float %32, %52 %54 = fneg float %53 %55 = load float, float* bitcast ([4 x i8]* @1 to float*) %56 = fmul float %54, %55 %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]]* %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 
%fusion.indvar.dim.2 store float %56, float* %57, !alias.scope !8, !noalias !12 %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1 store i64 %invar.inc5, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_exit.dim.2: ; preds = %fusion.loop_header.dim.2 %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc4, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc3, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 %58 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %59 = load i8*, i8** %58, !invariant.load !0, !dereferenceable !2, !align !2 %tuple.30 = bitcast i8* %59 to [1 x i8*]* %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8* %61 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.30, i64 0, i64 0 store i8* %60, i8** %61, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare float @llvm.log.f32(float) #1 attributes #0 = { uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` gets (correctly) optimized to the one below without the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture 
readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** 
%21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = fmul <4 x float> %7, %7 %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %39 = fdiv <8 x float> %shuffle30, %shuffle31 %40 = fadd <8 x float> %shuffle, %39 %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %42 = bitcast i8* %26 to <8 x float>* store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, 
!noalias !12 %43 = getelementptr inbounds i8, i8* %26, i64 32 %44 = fdiv <4 x float> %37, %38 %45 = fadd <4 x float> %10, %44 %46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %47 = bitcast i8* %43 to <4 x float>* store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %48 = bitcast float* %.phi.trans.insert to <4 x float>* %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8 %50 = bitcast float* %.phi.trans.insert12 to <4 x float>* %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %52 = getelementptr inbounds i8, i8* %26, i64 48 %53 = fmul <4 x float> %51, %51 %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %54 = fdiv <8 x float> %shuffle30, %shuffle31.1 %55 = fadd <8 x float> %shuffle.1, %54 %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %57 = bitcast i8* %52 to <8 x float>* store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12 %58 = getelementptr inbounds i8, i8* %26, i64 80 %59 = fdiv <4 x float> %37, %53 %60 = fadd <4 x float> %49, %59 %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %62 = bitcast i8* %58 to <4 x float>* store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12 %63 = getelementptr 
inbounds i8*, i8** %buffer_table, i64 3 %64 = bitcast i8** %63 to [1 x i8*]** %65 = load [1 x i8*]*, [1 x i8*]** %64, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %66 = getelementptr inbounds [1 x i8*], [1 x i8*]* %65, i64 0, i64 0 store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` and (incorrectly) optimized to the one below with the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 
x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x 
float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3 %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3 %40 = fmul <4 x float> %7, %7 %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %42 = fdiv <8 x float> %shuffle32, %41 %43 = fadd <8 x float> %shuffle, %42 %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %45 = bitcast i8* %26 to <8 x float>* store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12 %46 = extractelement <4 x float> %10, i32 0 %47 = getelementptr inbounds i8, i8* %26, i64 32 %48 = extractelement <4 x float> %10, i32 1 %49 = extractelement <4 x float> %10, i32 2 %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8 %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3 %52 = fmul float %51, %51 %53 = insertelement <4 x float> undef, float %52, i32 3 %54 = fdiv <4 x float> %37, %53 %55 = insertelement <4 x float> undef, float %46, i32 0 %56 = 
insertelement <4 x float> %55, float %48, i32 1 %57 = insertelement <4 x float> %56, float %49, i32 2 %58 = insertelement <4 x float> %57, float %50, i32 3 %59 = fadd <4 x float> %58, %54 %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %61 = bitcast i8* %47 to <4 x float>* store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %62 = bitcast float* %.phi.trans.insert to <4 x float>* %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8 %64 = bitcast float* %.phi.trans.insert12 to <4 x float>* %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %66 = getelementptr inbounds i8, i8* %26, i64 48 %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3 %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3 %69 = fmul <4 x float> %65, %65 %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %71 = fdiv <8 x float> %shuffle32, %70 %72 = fadd <8 x float> %shuffle.1, %71 %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %74 = bitcast i8* %66 to <8 x float>* store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12 %75 = extractelement <4 x float> %69, i32 0 %76 = extractelement <4 x float> %63, i32 0 %77 = 
getelementptr inbounds i8, i8* %26, i64 80 %78 = extractelement <4 x float> %69, i32 1 %79 = extractelement <4 x float> %63, i32 1 %80 = extractelement <4 x float> %69, i32 2 %81 = extractelement <4 x float> %63, i32 2 %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8 %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3 %84 = fmul float %83, %83 %85 = insertelement <4 x float> undef, float %75, i32 0 %86 = insertelement <4 x float> %85, float %78, i32 1 %87 = insertelement <4 x float> %86, float %80, i32 2 %88 = insertelement <4 x float> %87, float %84, i32 3 %89 = fdiv <4 x float> %37, %88 %90 = insertelement <4 x float> undef, float %76, i32 0 %91 = insertelement <4 x float> %90, float %79, i32 1 %92 = insertelement <4 x float> %91, float %81, i32 2 %93 = insertelement <4 x float> %92, float %82, i32 3 %94 = fadd <4 x float> %93, %89 %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %96 = bitcast i8* %77 to <4 x float>* store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12 %97 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %98 = bitcast i8** %97 to [1 x i8*]** %99 = load [1 x i8*]*, [1 x i8*]** %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %100 = getelementptr inbounds [1 x i8*], [1 x i8*]* %99, i64 0, i64 0 store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = 
!{!"buffer: {index:3, offset:0, size:8}", !5}
!14 = !{!13}
```

This results in bad numerical answers when used through XLA. Again, it's not that easy to give a small fully-reproducible example, but the miscompare is:

```
Expected literal:
(
f32[2,3,4] {
{
  { nan, -inf, -3181.35, -inf },
  { nan, -inf, -28.2577019, -inf },
  { nan, -inf, -28.2577019, -inf }
},
{
  { -inf, -inf, -inf, -inf },
  { -6.60753046e+28, -1.47314833e+23, -inf, -inf },
  { -2.43504347e+30, -5.42892693e+24, -inf, -inf }
}
}
)
Actual literal:
(
f32[2,3,4] {
{
  { nan, -inf, -3181.35, -inf },
  { nan, -inf, -inf, -inf },
  { inf, -inf, -28.2577019, -inf }
},
{
  { -inf, -inf, -inf, -inf },
  { -6.60753046e+28, -1.47314833e+23, -inf, -inf },
  { -2.43504347e+30, -5.42892693e+24, -inf, -inf }
}
}
)
```

Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc

Subscribers: hiraditya, Charusso, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D70516

push time in 3 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha cd4811360e2a1d23578073c6c99b2ef8ba276289

[ValueTracking] Add a basic version of isKnownNonInfinity and use it to detect more NoNaNs

push time in 3 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha 360f661733245ec15be4fc10c413f683c3cdd13f

Revert "[ThinLTO] Add correctness check for RO/WO variable import"

This reverts commit a2292cc537b561416c21e8d4017715d652c144cc.

Breaks clang selfhost w/ThinLTO.

push time in 3 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha 6c94068da99ae694a14f2484a2c9ac74a22bf61a

[Driver] Remove unused variable. NFC.

push time in 4 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha eb12b3b8a3e5f41a6ab84f94dfc85551f92bc2ea

Silence warning, PyMODINIT_FUNC already contains extern "C"

PythonReadline.h:22:12: warning: duplicate 'extern' declaration specifier [-Wduplicate-decl-specifier]

push time in 4 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha 5f158d8e21bed00a6d7377742660397bd4765456

[X86] Gate select->fmin/fmax transform on NoSignedZeros instead of UnsafeFPMath

push time in 4 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha 00e53d912dd768047a4fdc6e0e9b3ac7f0bcc5e5

[X86] Specifically limit fmin/fmax commutativity to NoNaNs + NoSignedZeros

The backend UnsafeFPMath flag is not a superset of all the others, so limit it to the exact bits needed.
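To see why exactly these two flags matter, here is a minimal sketch (not code from the commit) of the select pattern `a < b ? a : b`. This matches the behaviour of the x86 `MINSS` instruction, which returns its second operand whenever the compare is false. The helper names `select_min`, `commutes` and `demo` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: the select form of min. When the compare is false
// (one operand is NaN, or both are zeros of either sign), the second
// operand is returned, so operand order is observable.
inline float select_min(float a, float b) { return a < b ? a : b; }

// True if swapping the operands gives an equivalent result, comparing the
// sign bit as well and treating any two NaNs as equal.
inline bool commutes(float a, float b) {
  float x = select_min(a, b);
  float y = select_min(b, a);
  if (std::isnan(x) || std::isnan(y))
    return std::isnan(x) && std::isnan(y);
  return x == y && std::signbit(x) == std::signbit(y);
}

// The three cases that the NoNaNs and NoSignedZeros flags distinguish.
inline void demo() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  assert(commutes(1.0f, 2.0f));    // ordinary values: order is irrelevant
  assert(!commutes(nan, 1.0f));    // NaN: the result is whichever operand comes second
  assert(!commutes(0.0f, -0.0f));  // signed zeros: the sign of the result differs
}
```

Swapping operands is therefore only safe when NaNs and the sign of zero can both be ignored. The sketch assumes default floating-point semantics; building it with `-ffast-math` would defeat the point.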

push time in 4 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha d3ec06d219788801380af1948c7f7ef9d3c6100b

Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)"

This reverts commit 2be17087f8c38934b7fc9208ae6cf4e9b4d44f4b.

Fails ASAN.

push time in 4 months

push eventllvm/llvm-project

Shu-Chun Weng

commit sha 5e307808557f4786c6438c9cfd67784073c5a3b7

Correct size_t format specifier

Differential Revision: https://reviews.llvm.org/D69455
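The commit message does not say which specifier was wrong or what replaced it. Assuming printf-style formatting, the portable choice for `size_t` is the `%zu` conversion, sketched below with a hypothetical `format_size` helper that is not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a size_t with "%zu". The `z` length modifier
// matches size_t on every platform, whereas "%u" or "%lu" only match where
// size_t happens to have that width.
inline std::string format_size(std::size_t n) {
  char buf[32];
  int written = std::snprintf(buf, sizeof(buf), "%zu", n);
  assert(written > 0 && static_cast<std::size_t>(written) < sizeof(buf));
  (void)written;  // keeps builds with NDEBUG free of unused-variable warnings
  return std::string(buf);
}
```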

push time in 4 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha 6f0bb7703705e8e63966fae96e5a2f9a8312b0b2

[InstCombine] Fold one-use variable into assert

Avoids warnings in Release builds. NFC.
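The general pattern behind this kind of change can be sketched as follows. This is an illustration rather than the actual InstCombine code, and `first_element` is a hypothetical function. A variable that is only read inside `assert()` becomes unused once `NDEBUG` removes the assertion, so compilers may warn about it. Folding the expression into the assert avoids that.

```cpp
#include <cassert>
#include <vector>

// Hypothetical example function, not taken from the LLVM source.
inline int first_element(const std::vector<int> &v) {
  // Before: `nonEmpty` is only read by the assert, so a build with
  // -DNDEBUG may report an unused variable:
  //   bool nonEmpty = !v.empty();
  //   assert(nonEmpty && "expected a non-empty vector");
  //
  // After: the condition lives inside the assert and disappears together
  // with it in Release builds.
  assert(!v.empty() && "expected a non-empty vector");
  return v.front();
}
```

This only works when the folded expression has no side effects, since the whole expression is dropped under `NDEBUG`.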

push time in 4 months

push eventllvm/llvm-project

Benjamin Kramer

commit sha bfa3f0c316655d0140abb4e90f82242a7c2b4ea4

Hide implementation details in anonymous namespaces. NFC.

push time in 4 months
