0.5.2 GPU crashes if initial input is 360 zeros. #1113

elephantpanda · 2024-12-03T06:11:02Z

Setting the initial input to a sequence of 360 or more zeros with generatorParams.SetInputIDs(...) crashes the GPU when running on DirectML. (I am using C#, but this is not a C#-specific problem.)

I get the following error in 0.5.2:

OnnxRuntimeGenAIException: D:\a\_work\1\onnxruntime-genai\src\dml\dml_command_recorder.cpp(143)\onnxruntime-genai.dll!00007FFFD2D7AF63: (caller: 00007FFFD2D7CA85) Exception(2) tid(22ac) 887A0006 The GPU will not respond to more commands, most likely because of an invalid command passed by the calling application.

Microsoft.ML.OnnxRuntimeGenAI.Result.VerifySuccess (System.IntPtr nativeResult) (at D:/a/_work/1/onnxruntime-genai/src/csharp/Result.cs:25)
Microsoft.ML.OnnxRuntimeGenAI.Generator.ComputeLogits () (at D:/a/_work/1/onnxruntime-genai/src/csharp/Generator.cs:25)
Main.Generate () (at Assets/Main.cs:202)
System.Threading.Tasks.Task.InnerInvoke () (at <9d9536d9127f4a489d989c7a566aee1c>:0)
System.Threading.Tasks.Task.Execute () (at <9d9536d9127f4a489d989c7a566aee1c>:0)

The sequence of zeros is just an example; it also crashes on other, seemingly random prompts.

This means that GenAI is not usable in production, since we cannot guarantee that a given prompt will not crash the GPU. My suggestion is to add tests that run hundreds of long, random initial token sequences on different GPUs. That should catch this kind of bug next time.
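To make the suggestion concrete, here is a minimal sketch of such a stress test in Python. It only generates the prompts and records which ones fail; it does not call the GenAI API itself. The names `make_prompts`, `find_failures`, and `run_prompt`, and the vocabulary size of 32000, are assumptions for illustration. `run_prompt` stands in for whatever code feeds the token IDs to the generator and computes logits on the GPU under test.

```python
import random
from typing import Callable, List, Tuple


def make_prompts(count: int, min_len: int, max_len: int,
                 vocab_size: int, seed: int = 0) -> List[List[int]]:
    """Build reproducible token-ID prompts: one all-zeros case plus random ones."""
    rng = random.Random(seed)  # fixed seed so a failing prompt can be regenerated
    prompts = [[0] * min_len]  # the all-zeros case from this report
    while len(prompts) < count:
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        prompts.append([rng.randrange(vocab_size) for _ in range(length)])
    return prompts


def find_failures(prompts: List[List[int]],
                  run_prompt: Callable[[List[int]], None]) -> List[Tuple[int, str]]:
    """Run every prompt and collect (index, error) for each one that raises."""
    failures = []
    for index, tokens in enumerate(prompts):
        try:
            run_prompt(tokens)
        except Exception as exc:  # record the error instead of stopping the run
            failures.append((index, repr(exc)))
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for the real GPU call: fails on all-zero input.
    def run_prompt(tokens: List[int]) -> None:
        if not any(tokens):
            raise RuntimeError("simulated device hang")

    prompts = make_prompts(count=200, min_len=360, max_len=1024, vocab_size=32000)
    for index, error in find_failures(prompts, run_prompt):
        print(f"prompt {index} (length {len(prompts[index])}) failed: {error}")
```

Because the seed is fixed, any failing index can be regenerated and attached to a bug report. After a real device-level failure, later prompts may fail as well, so the first recorded failure is likely the most informative one.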

I have no idea what could be causing this. My wild guess is that it has something to do with the int4 encoding/decoding, since that is the main recent change.

Environment: Windows 10, Quadro P5000 GPU.

microsoft-github-policy-service bot added the ep:DML label Dec 3, 2024

natke added the crash label Jan 7, 2025
