ONNX Runtime
Official ONNX Format Model
The easiest way to obtain an ONNX-format model is to download it from the official Hugging Face repo. However, we recommend using the ONNX model from an older commit.
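To pin the download to an older, known-good commit, `huggingface_hub.snapshot_download` accepts a `revision` argument. This is a hypothetical sketch: the repo id and commit SHA below are placeholders, not the actual known-good revision.

```python
# Sketch: download the model files pinned to a specific commit instead
# of the default "main" branch. REPO_ID and GOOD_REVISION below are
# placeholders, not the actual repo / known-good commit.
REPO_ID = "org/model-ONNX"
GOOD_REVISION = "put-known-good-commit-sha-here"

def download_pinned(repo_id: str = REPO_ID, revision: str = GOOD_REVISION) -> str:
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download
    # `revision` accepts a branch name, tag, or full commit SHA; a SHA
    # guarantees the exact files the older commit shipped.
    return snapshot_download(repo_id=repo_id, revision=revision)
```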
Note
The latest ONNX model is buggy and may output garbage tokens on some devices. See the related Hugging Face discussion and GitHub issue for details.
Warning
The statement above is not fully verified: the onnxruntime installation was itself buggy at first, so the bad output might be caused by other issues.
Convert Manually
Failure
It is not recommended to convert to the ONNX format manually, as this GGUF model is not in an ONNX-friendly format.
With special care for the dims, the ONNX model can be exported, but it cannot be used for inference. The reason is that the model encodes images as float and text as int, and ONNX Runtime cannot handle these two types in a single op. The error message looks like:
CPU_only failed: [ONNXRuntimeError] : 1 : FAIL : Load model from smolvlm_forward.onnx failed:Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(int64) and tensor(float) in node (node_index_put).
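The type clash can be illustrated with plain NumPy (standing in here for the exported graph): `np.where` silently promotes mixed int/float branches to a common dtype, so the mix goes unnoticed at trace time, whereas the ONNX `Where` op requires both value inputs to bind the same type parameter `T`.

```python
import numpy as np

# NumPy's where() promotes mixed dtypes to a common one (float64 here),
# so mixing int64 token ids with float image features goes unnoticed
# at trace time. ONNX's Where op instead requires both value inputs to
# bind the SAME type parameter T, which is why the exported node fails
# to load in onnxruntime.
cond = np.array([True, False, True])
ints = np.array([1, 2, 3], dtype=np.int64)            # e.g. token ids
floats = np.array([0.5, 0.5, 0.5], dtype=np.float32)  # e.g. image features

mixed = np.where(cond, ints, floats)
print(mixed.dtype)  # float64: NumPy promoted; ONNX would have rejected this
```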
There are three potential fixes:
1. Export two ONNX models, the same as the officially released model. However, special care is still needed for the iterations.
2. Use a custom config. The workload is equivalent to porting the model layer by layer (and it may not work either).
3. Forcibly cast the data type while inferring (tried, but not working for now).
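For illustration, the idea behind fix 3 can be sketched with NumPy: cast both branches to one dtype before the `Where`-style merge so the type parameter binds consistently. The function name and shapes here are assumptions, and, as noted above, this approach did not resolve the issue in practice.

```python
import numpy as np

def merge_embeddings(token_emb: np.ndarray, image_emb: np.ndarray,
                     image_mask: np.ndarray) -> np.ndarray:
    # Fix 3 sketch (hypothetical helper): force both branches of the
    # Where to one dtype (float32) before merging, so the resulting
    # graph node binds a single type parameter T.
    token_f = token_emb.astype(np.float32)
    image_f = image_emb.astype(np.float32)
    # image_mask has shape (seq,); broadcast it over the embedding dim.
    return np.where(image_mask[..., None], image_f, token_f)
```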