Not able to run Exclude-Embed models #1163

balachandarsv · 2024-12-24T06:02:11Z

Describe the bug
I took a model that is supported by this library and removed its embeddings using the exclude_embeds option in builder.py. When I try to run the model, it still asks for input_ids instead of inputs_embeds.

To Reproduce
Steps to reproduce the behavior:
Convert the Qwen/Qwen2.5-0.5 model with the exclude_embeds option, then run inference in Java.

Desktop (please complete the following information):

OS: Mac M3

Can someone share an idea of how to make this work?

kunal-vaishnavi · 2025-01-09T18:03:28Z

This is a known issue inside ONNX Runtime GenAI. So far, the exclude_embeds flag has been used internally to generate the text component of the Phi-3 vision and Phi-3.5 vision ONNX models. Those models still accept input_ids as an input to the embedding component and produce inputs_embeds as an output, so the inputs_embeds input to the text component has been an input managed by ORT GenAI rather than by the user.

onnxruntime-genai/src/models/multi_modal_vision_model.cpp

Lines 114 to 123 in 41c2543

    DecoderState::DecoderState(const MultiModalVisionModel& model, DeviceSpan<int32_t> sequence_lengths, const GeneratorParams& params, const CapturedGraphInfo* captured_graph_info)
        : State{params, model},
          model_{model},
          captured_graph_info_{captured_graph_info},
          position_inputs_{model, *this, sequence_lengths} {
      inputs_embeds_.Add();
      position_inputs_.Add();
      logits_.Add();
      kv_cache_.Add();
    }

The class that manages these embeddings is defined in the file below. However, user access to the class is not currently enabled.

onnxruntime-genai/src/models/embeddings.cpp

Lines 19 to 22 in 41c2543

    // Embeddings are only transient inputs and outputs.
    // They are never the user provided/requested model inputs/outputs
    // So only create the transient input and reuse that ortvalue for previous
    // steps in the pipeline.

Just as the generator object has an AppendTokens method for adding input_ids, a new method such as AppendEmbeds will need to be added for adding inputs_embeds. Other language bindings (e.g., Java and Python) can then call that method, and the requirement that input_ids must exist as a model input can be relaxed.
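To make the proposal concrete, here is a minimal, self-contained sketch of what such an interface could look like. Everything in it is hypothetical: `ToyGenerator`, its constructor parameters, and the flat row-major embedding layout are assumptions made for illustration, not the ONNX Runtime GenAI API. `AppendEmbeds` does not exist in the library at the time of this comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a generator object. This is NOT the
// ONNX Runtime GenAI class; it only illustrates the proposed shape.
class ToyGenerator {
 public:
  // hidden_size: width of one embedding row (assumed known from the model).
  // accepts_input_ids: false for a model exported with exclude_embeds.
  ToyGenerator(std::size_t hidden_size, bool accepts_input_ids)
      : hidden_size_{hidden_size}, accepts_input_ids_{accepts_input_ids} {
    if (hidden_size_ == 0) throw std::invalid_argument("hidden_size must be > 0");
  }

  // Mirrors the idea of the existing AppendTokens: token ids are user input.
  void AppendTokens(const std::vector<int32_t>& tokens) {
    if (!accepts_input_ids_)
      throw std::runtime_error("model has no input_ids input");
    tokens_.insert(tokens_.end(), tokens.begin(), tokens.end());
  }

  // Proposed counterpart: embeddings are user input, passed as a flat
  // row-major buffer of shape [num_tokens, hidden_size] (an assumption).
  void AppendEmbeds(const std::vector<float>& embeds) {
    if (accepts_input_ids_)
      throw std::runtime_error("model expects input_ids, not inputs_embeds");
    if (embeds.empty() || embeds.size() % hidden_size_ != 0)
      throw std::invalid_argument("embeds size must be a multiple of hidden_size");
    embeds_.insert(embeds_.end(), embeds.begin(), embeds.end());
  }

  // Number of sequence positions appended so far.
  std::size_t SequenceLength() const {
    return accepts_input_ids_ ? tokens_.size() : embeds_.size() / hidden_size_;
  }

 private:
  std::size_t hidden_size_;
  bool accepts_input_ids_;
  std::vector<int32_t> tokens_;
  std::vector<float> embeds_;
};
```

With a sketch like this, a caller working with an exclude_embeds model would construct `ToyGenerator(hidden_size, false)` and call `AppendEmbeds` with embeddings computed elsewhere, while `AppendTokens` would be rejected because the model has no input_ids input.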

microsoft-github-policy-service bot added the api:java label Dec 24, 2024
