Not able to run Exclude-Embed models #1163

Open
balachandarsv opened this issue Dec 24, 2024 · 1 comment

@balachandarsv

Describe the bug
I have taken a model that is supported by this library and removed its embeddings using the exclude-embeds option in builder.py. Now when I try to run the model, it still asks for input_ids instead of inputs_embeds.

To Reproduce
Steps to reproduce the behavior:
Convert the Qwen/Qwen2.5-0.5 model with the exclude-embeds option and run inference in Java (an example conversion command is sketched below).
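The conversion step would have used a command roughly like the one below. The model name, output directory, precision, and execution provider are placeholders; the relevant part is passing exclude_embeds=true through --extra_options (the flag names follow the model builder's documented options, but verify against your installed version):

python3 builder.py -m <hf_model_name> -o <output_dir> -p <precision> -e <execution_provider> --extra_options exclude_embeds=true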

[screenshot attached]

Desktop (please complete the following information):

  • OS: macOS (Apple M3)

Can someone share an idea of how to make this work?

@kunal-vaishnavi
Contributor

This is a known issue inside ONNX Runtime GenAI. The exclude_embeds flag is currently used internally to generate the text component of the Phi-3 vision and Phi-3.5 vision ONNX models. Because those models still accept input_ids as an input to the embedding component, which produces inputs_embeds as its output, the inputs_embeds input to the text component has so far been an ORT GenAI managed input rather than a user managed input.

DecoderState::DecoderState(const MultiModalVisionModel& model, DeviceSpan<int32_t> sequence_lengths,
                           const GeneratorParams& params, const CapturedGraphInfo* captured_graph_info)
    : State{params, model},
      model_{model},
      captured_graph_info_{captured_graph_info},
      position_inputs_{model, *this, sequence_lengths} {
  inputs_embeds_.Add();  // inputs_embeds is registered by GenAI itself, not supplied by the user
  position_inputs_.Add();
  logits_.Add();
  kv_cache_.Add();
}

The class that manages these embeddings can be found here; however, user access to that class has not been enabled yet.

// Embeddings are only transient inputs and outputs.
// They are never the user provided/requested model inputs/outputs
// So only create the transient input and reuse that ortvalue for previous
// steps in the pipeline.

Similar to how the generator object has an AppendTokens method for adding input_ids, a new method such as AppendEmbeds will need to be added for supplying inputs_embeds. That method can then be exposed through the other language bindings (e.g. Java, Python), and the requirement that input_ids exist as a model input can be relaxed. A rough sketch of what such a method could look like follows.
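To make the shape of that change concrete, here is a minimal sketch, assuming a generator-level call that takes a flat embeddings buffer plus its shape. HypotheticalGenerator, AppendEmbeds, and the parameter names are all illustrative and do not exist in ONNX Runtime GenAI today; only AppendTokens corresponds to an existing method.

// Illustrative sketch only -- not part of the ONNX Runtime GenAI API.
#include <cstddef>
#include <cstdint>
#include <vector>

struct HypotheticalGenerator {
  // Existing pattern (simplified): feed token ids and let GenAI compute inputs_embeds internally.
  void AppendTokens(const int32_t* tokens, size_t count) {
    token_ids_.assign(tokens, tokens + count);
  }

  // Proposed addition (hypothetical): feed precomputed embeddings of shape
  // [batch, seq_len, hidden] for models exported with exclude_embeds=true.
  // A real implementation would bind this buffer to the transient
  // "inputs_embeds" OrtValue that DecoderState currently manages internally.
  void AppendEmbeds(const float* embeds, size_t batch, size_t seq_len, size_t hidden) {
    embeds_.assign(embeds, embeds + batch * seq_len * hidden);
  }

 private:
  std::vector<int32_t> token_ids_;
  std::vector<float> embeds_;
};

A corresponding entry point would then be added to each binding (for example, a Java method alongside the existing token-appending call), so a model converted with exclude_embeds=true could be driven end to end without input_ids.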
