Describe the bug
Can't run inference with a model exported with "exclude_lm_head".

To Reproduce
```cpp
// repro.cpp -- crashes when the model was exported with exclude_lm_head
#include "ort_genai.h"
#include <string>

int main() {
  auto model = OgaModel::Create("llama3.2-3b-onnx-int4");
  auto tokenizer = OgaTokenizer::Create(*model);
  auto tokenizer_stream = OgaTokenizerStream::Create(*tokenizer);

  // Encode the prompt
  std::string query = "tell me hello";
  auto seq = OgaSequences::Create();
  tokenizer->Encode(query.c_str(), *seq);

  // params must be created before its options are set
  auto params = OgaGeneratorParams::Create(*model);
  params->SetSearchOption("max_length", 128);
  params->SetInputSequences(*seq);

  // Throws: "Model output was not found: logits"
  auto generator = OgaGenerator::Create(*model, *params);
}
```
Then you will see the bug:
```
terminate called after throwing an instance of 'std::runtime_error'
  what(): Model output was not found: logits
Aborted
```
onnxruntime-genai version: 0.5.2
OS: Linux
If you are using exclude_lm_head, the ONNX model's output will be the last hidden states instead of logits. You will then have to convert the hidden states to logits yourself, since the generation loop in ONNX Runtime GenAI operates on logits. Support for this approach is currently limited since there have not been many requests for it.
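For context, that conversion is just the lm_head projection: a matrix multiply of the last hidden state with the lm_head weight (plus a bias, if the model has one). Below is a minimal sketch in plain C++, assuming you have extracted the lm_head weight from the original model yourself; `HiddenToLogits`, `hidden_size`, and `vocab_size` are illustrative names, not part of the GenAI API:

```cpp
#include <vector>
#include <cstddef>

// Project a single last-hidden-state vector to vocabulary logits:
// logits = W * h, where W (lm_head weight) is row-major [vocab_size, hidden_size].
std::vector<float> HiddenToLogits(const std::vector<float>& hidden,
                                  const std::vector<float>& lm_head_weight,
                                  size_t hidden_size, size_t vocab_size) {
  std::vector<float> logits(vocab_size, 0.0f);
  for (size_t v = 0; v < vocab_size; ++v) {
    const float* row = &lm_head_weight[v * hidden_size];
    float sum = 0.0f;
    for (size_t h = 0; h < hidden_size; ++h) {
      sum += row[h] * hidden[h];  // dot(W[v], hidden)
    }
    logits[v] = sum;
  }
  return logits;
}
```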
An alternative approach is to have both the last hidden states and the logits as outputs in the ONNX model. You can achieve that by using include_hidden_states in the extra_options (see example usage here).
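For example, a sketch of a model builder invocation with that option (the model name and output path are placeholders, and the exact flags may differ by version, so check the builder's documentation):

```sh
python3 -m onnxruntime_genai.models.builder \
    -m meta-llama/Llama-3.2-3B \
    -o ./llama3.2-3b-onnx-int4 \
    -p int4 \
    -e cpu \
    --extra_options include_hidden_states=true
```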