Describe the bug
I took a model that is supported by this library and removed its embeddings using the exclude-embeds option in builder.py. Now, when I try to run the model, it still asks for input_ids instead of inputs_embeds.
To Reproduce
Steps to reproduce the behavior:
Convert the Qwen/Qwen2.5-0.5 model with the exclude-embeds option, then run inference in Java.
Desktop (please complete the following information):
OS: Mac M3
Can someone share an idea of how to make this work?
This is a known issue in ONNX Runtime GenAI. So far, the exclude_embeds flag has been used internally to generate the text component of the Phi-3 vision and Phi-3.5 vision ONNX models. Because those models still accept input_ids as an input to the embedding component and produce inputs_embeds as an output, the inputs_embeds input to the text component has been managed by ORT GenAI rather than by the user.
// Embeddings are only transient inputs and outputs.
// They are never the user provided/requested model inputs/outputs
// So only create the transient input and reuse that ortvalue for previous
// steps in the pipeline.
Similar to how the generator object contains an AppendTokens method for adding input_ids, a new method such as AppendEmbeds will need to be added for adding inputs_embeds. That method can then be called from the other language bindings (e.g. Java, Python), and the requirement that input_ids must exist as a model input can be relaxed.
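To make the proposal concrete, here is a minimal, self-contained Java sketch of what a user-managed embeddings path could look like. Everything in it is hypothetical: `ToyGenerator`, `appendTokens`, `appendEmbeds`, and `lookup` are illustrative names only and are not part of the ONNX Runtime GenAI Java bindings, which do not expose such a method today. It assumes the caller holds the embedding table that was stripped from the model and performs the lookup itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are ONNX Runtime GenAI APIs.
final class ToyGenerator {
    // Assumption: true when the model was exported without its embedding layer.
    private final boolean modelTakesEmbeds;
    private final int hiddenSize;
    private final List<Integer> tokens = new ArrayList<>();
    private final List<float[]> embeds = new ArrayList<>();

    ToyGenerator(boolean modelTakesEmbeds, int hiddenSize) {
        this.modelTakesEmbeds = modelTakesEmbeds;
        this.hiddenSize = hiddenSize;
    }

    // Mirrors the existing token path: only valid if the model has an input_ids input.
    void appendTokens(int[] inputIds) {
        if (modelTakesEmbeds) {
            throw new IllegalStateException("model has no input_ids input; use appendEmbeds");
        }
        for (int id : inputIds) {
            tokens.add(id);
        }
    }

    // The proposed path: the user supplies one embedding row per position.
    void appendEmbeds(float[][] inputsEmbeds) {
        if (!modelTakesEmbeds) {
            throw new IllegalStateException("model expects input_ids; use appendTokens");
        }
        for (float[] row : inputsEmbeds) {
            if (row.length != hiddenSize) {
                throw new IllegalArgumentException("expected rows of length " + hiddenSize);
            }
            embeds.add(row.clone());
        }
    }

    int sequenceLength() {
        return modelTakesEmbeds ? embeds.size() : tokens.size();
    }

    // Embedding lookup done outside the model: row i of the result is table[inputIds[i]].
    static float[][] lookup(float[][] table, int[] inputIds) {
        float[][] out = new float[inputIds.length][];
        for (int i = 0; i < inputIds.length; i++) {
            out[i] = table[inputIds[i]].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy 3-token vocabulary with hidden size 2.
        float[][] table = {{0.0f, 0.1f}, {1.0f, 1.1f}, {2.0f, 2.1f}};
        ToyGenerator generator = new ToyGenerator(true, 2);
        generator.appendEmbeds(lookup(table, new int[] {2, 0}));
        System.out.println("positions appended: " + generator.sequenceLength());
    }
}
```

The point of the sketch is the validation in the two append methods: once an embeddings path exists, the check for input_ids only has to apply to models that actually declare that input.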