addptr operand produced by an unsupported operation: divsi
#16
Comments
Add a python script:
Currently I am looking at this issue; please assign it to me.
@yuanfz98 @nhat-nguyen I wanted to check on this. There's some discussion of this pattern being supported in #62. Does it work in the latest version?
@yuanfz98 @nhat-nguyen, I want to follow up on this as well. @blaine-rister, the issue still exists in the latest version.
@fhossein-quic Sorry for the delayed response. We can't statically determine the shape of the memory loads when there are mod/div operations in the index expressions.
Was this Triton kernel generated by Torch Inductor? @fhossein-quic @nhat-nguyen I thought you might be interested in pytorch/pytorch#125077. We are beefing up Inductor's codegen so it won't use mod/div to compute indices nearly as often.
@blaine-rister This code is indeed generated from torch-inductor! Thank you for the link, and I appreciate the improvements to the codegen! Do you know if the improvements will help in this case, #138? We have rather complex codegen from torch-inductor throughout most of the basic operations, such as singleton broadcasting, reshape, ... I will definitely go through your chain of PRs this week to understand more about the improvements. Thanks again!
@nhat-nguyen that PR should work for broadcasts. I'm not sure about reshape; I think it mostly depends on what you do with the result of the reshape. AFAIK reshape by itself doesn't change the underlying data, but it affects the semantics of subsequent operations on that data. The basic pattern seems similar to what's described in #138. Basically, we pattern match on mod/div indexing expressions, trying to determine that they produce the same iteration order as some ND block. Then we solve for the shape of that block (a small sketch of this equivalence follows this comment). There are some complicated scenarios where this analysis fails. Right now, to take advantage of that PR, you have to follow several restrictions.
We have some ideas on how to expand it beyond those restrictions, so you'll see the benefits on all shapes, with or without block pointers.
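To make the pattern-matching concrete, here is a rough sketch of the equivalence the analysis relies on. The sizes, strides, and names are illustrative only and not taken from any kernel in this thread: addresses computed from a flat index with `//` and `%` enumerate the same locations, in the same order, as a row-major 2D block with the corresponding strides, which is what lets the analysis solve for the block shape.

```python
import numpy as np

# Illustrative sizes and strides only; not taken from any kernel in this thread.
H, W = 4, 6
stride_h, stride_w = 32, 1  # e.g. a padded row stride with contiguous columns

# Addresses produced by mod/div indexing over a flat range 0 .. H*W-1,
# the pattern Inductor emits and that the analysis pattern-matches on.
flat = np.arange(H * W)
addrs_from_divmod = (flat // W) * stride_h + (flat % W) * stride_w

# Addresses produced by iterating an (H, W) block row-major with the same strides.
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
addrs_from_block = (rows * stride_h + cols * stride_w).ravel()

# Same locations in the same order, so the div/mod expression can be folded
# back into an (H, W) block whose shape the analysis solves for.
assert np.array_equal(addrs_from_divmod, addrs_from_block)
```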
@nhat-nguyen pytorch/pytorch#132937 should expand Inductor's block pointer support for pointwise kernels. You need to use the new
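For comparison, here is a minimal hypothetical sketch (not taken from that PR, and not tied to any particular Inductor flag) of what block-pointer style indexing looks like: the 2D shape and strides are carried explicitly by `tl.make_block_ptr`, so no integer div/mod is needed to recover the block structure.

```python
import triton
import triton.language as tl


@triton.jit
def block_ptr_copy(x_ptr, y_ptr, M, N, stride_m, stride_n,
                   BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr):
    # Hypothetical kernel for illustration: copies one (BLOCK_M, BLOCK_N) tile.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    # The block pointer carries base, shape, strides, and offsets explicitly,
    # so the 2D structure never has to be reconstructed with // and %.
    src = tl.make_block_ptr(base=x_ptr, shape=(M, N),
                            strides=(stride_m, stride_n),
                            offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
                            block_shape=(BLOCK_M, BLOCK_N), order=(1, 0))
    dst = tl.make_block_ptr(base=y_ptr, shape=(M, N),
                            strides=(stride_m, stride_n),
                            offsets=(pid_m * BLOCK_M, pid_n * BLOCK_N),
                            block_shape=(BLOCK_M, BLOCK_N), order=(1, 0))
    tile = tl.load(src, boundary_check=(0, 1))
    tl.store(dst, tile, boundary_check=(0, 1))
```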
Any progress on this?
We're still working on the final touches of this fallback mode; hopefully we will be able to push some changes over the coming weeks.
Created from #7.
I don't know what the original Triton code that created this looked like, but there is a division in the address expression.
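Since the original kernel isn't available, here is a minimal hypothetical sketch of the kind of kernel that hits this path: the flat index is decomposed with `//` and `%`, so the offsets feeding the load lower through `arith.divsi`/`arith.remsi` before reaching `tt.addptr`. All names and sizes below are illustrative.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def divmod_index_kernel(x_ptr, y_ptr, W, stride_row, BLOCK: tl.constexpr):
    # Flat element indices for this program instance.
    pid = tl.program_id(0)
    idx = pid * BLOCK + tl.arange(0, BLOCK)
    # Decompose the flat index into (row, col); the `//` lowers to arith.divsi
    # and the `%` to arith.remsi, and the result feeds the pointer arithmetic.
    row = idx // W
    col = idx % W
    offsets = row * stride_row + col
    val = tl.load(x_ptr + offsets)
    tl.store(y_ptr + offsets, val)


def run():
    # Illustrative driver; sizes chosen so the grid divides evenly (no masking).
    H, W, BLOCK = 8, 16, 64
    x = torch.randn(H, W, device="cuda")
    y = torch.empty_like(x)
    grid = (triton.cdiv(H * W, BLOCK),)
    divmod_index_kernel[grid](x, y, W, x.stride(0), BLOCK=BLOCK)
```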
repros.zip
triton-shared-opt -triton-to-linalg 5.mlir
triton-shared-opt -triton-to-linalg 32.mlir
triton-shared-opt -triton-to-linalg 35.mlir
triton-shared-opt -triton-to-linalg 41.mlir
triton-shared-opt -triton-to-linalg 72.mlir
triton-shared-opt -triton-to-linalg 88.mlir
Error output: