Error while converting Llama 3B fp16 gguf model #1137

Open
rakshit2020 opened this issue Dec 10, 2024 · 1 comment
Labels
bug Something isn't working
@rakshit2020

Hi,
Using llama.cpp, I converted llama-3.2-3B-instruct to fp16 GGUF format. Now I run the command given in the documentation:
python3 -m onnxruntime_genai.models.builder -m meta-llama/Llama-3.2-3B-Instruct -i /home/ubuntu/CPU_Serving/metaLlama3B/metaLlama3B-3.2B-F16.gguf -o Llama3B_gguf_onnx -p int4 -e cpu

I am getting this error:

Valid precision + execution provider combinations are: FP32 CPU, FP32 CUDA, FP16 CUDA, FP16 DML, INT4 CPU, INT4 CUDA, INT4 DML
Extra options: {}
GroupQueryAttention (GQA) is used in this model.
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/olmo/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ubuntu/miniconda3/envs/olmo/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/CPU_Serving/CPU_Opti/lib/python3.10/site-packages/onnxruntime_genai/models/builder.py", line 3277, in <module>
    create_model(args.model_name, args.input, args.output, args.precision, args.execution_provider, args.cache_dir, **extra_options)
  File "/home/ubuntu/CPU_Serving/CPU_Opti/lib/python3.10/site-packages/onnxruntime_genai/models/builder.py", line 3159, in create_model
    onnx_model.make_model(input_path)
  File "/home/ubuntu/CPU_Serving/CPU_Opti/lib/python3.10/site-packages/onnxruntime_genai/models/builder.py", line 2019, in make_model
    model = GGUFModel.from_pretrained(self.model_type, input_path, self.head_size, self.hidden_size, self.intermediate_size, self.num_attn_heads, self.num_kv_heads, self.vocab_size)
  File "/home/ubuntu/CPU_Serving/CPU_Opti/lib/python3.10/site-packages/onnxruntime_genai/models/gguf_model.py", line 240, in from_pretrained
    model = GGUFModel(input_path, head_size, hidden_size, intermediate_size, num_attn_heads, num_kv_heads, vocab_size)
  File "/home/ubuntu/CPU_Serving/CPU_Opti/lib/python3.10/site-packages/onnxruntime_genai/models/gguf_model.py", line 104, in __init__
    curr_layer_id = int(name.split(".")[1])
ValueError: invalid literal for int() with base 10: 'weight'
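For context, the failing line in gguf_model.py assumes every tensor name has a layer index in its second dot-separated field (llama.cpp names per-layer tensors like blk.0.attn_q.weight). A minimal sketch of how that parse breaks on a tensor whose second field is not a number; the rope_freqs.weight name here is an assumption for illustration, the actual offending tensor comes from your GGUF file:

    # Per-layer GGUF tensors look like "blk.<layer_id>.<suffix>";
    # other tensors (e.g. "rope_freqs.weight") do not carry a layer id.
    for name in ["blk.0.attn_q.weight", "rope_freqs.weight"]:
        parts = name.split(".")
        try:
            layer_id = int(parts[1])  # works for "blk.0.attn_q.weight"
            print(name, "-> layer", layer_id)
        except ValueError:
            # int("weight") raises exactly the ValueError in the traceback
            print(name, "-> second field is", repr(parts[1]), "- not a layer index")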

@kunal-vaishnavi
Contributor

The GGUF to ONNX path needs to be updated: several necessary changes that were made in the PyTorch to ONNX path still need to be brought over. Until those updates land, you can try converting with the PyTorch to ONNX path instead.
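For reference, the PyTorch to ONNX path downloads the original Hugging Face weights rather than reading your GGUF file, so it skips the -i input flag entirely. A sketch of the command, assuming the same flags used above (the output directory name is illustrative):

python3 -m onnxruntime_genai.models.builder -m meta-llama/Llama-3.2-3B-Instruct -o Llama3B_onnx -p int4 -e cpu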
