vLLM plugin

  • Planned?

As Windows does not support the iGPU and Linux does not support the NPU (through ONNX Runtime), one idea is to build a vLLM plugin with IRON.

IRON supports GEMM and MHA kernels as well as element-wise operators. vLLM supports plugins that enable custom-built kernels. This seems to be a valid and more stable way to compare performance across all three devices.
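As a rough illustration of the plugin route, the sketch below shows what the entry-point function of an out-of-tree platform plugin could look like. vLLM discovers such plugins through Python entry points (the `vllm.platform_plugins` group), and the function returns the fully qualified name of a platform class, or `None` when the device is not usable. The package name `vllm_iron`, the class `IronPlatform`, and the probe module `iron_runtime` are hypothetical placeholders, not existing packages, and the way IRON is detected here is only an assumption.

```python
import importlib.util
from typing import Optional

# Hypothetical names, used only for illustration.
IRON_RUNTIME_MODULE = "iron_runtime"  # assumed top-level module of the IRON runtime
PLATFORM_CLASS = "vllm_iron.platform.IronPlatform"  # assumed platform class path


def iron_available(module_name: str = IRON_RUNTIME_MODULE) -> bool:
    """Return True if the given top-level module can be found on this machine."""
    return importlib.util.find_spec(module_name) is not None


def register() -> Optional[str]:
    """Entry-point function for the platform plugin.

    Returns the qualified name of the platform class when the runtime is
    present, and None otherwise so that the plugin is skipped.
    """
    return PLATFORM_CLASS if iron_available() else None


# The plugin package would expose register() in its pyproject.toml, e.g.:
#
#   [project.entry-points."vllm.platform_plugins"]
#   iron = "vllm_iron:register"
```

The platform class itself, and the registration of IRON GEMM, MHA, and element-wise kernels behind it, would be the bulk of the work and is not shown here.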

One case I know of that enables NPU kernels from Chinese vendors this way is FlagOS vllm-plugin-FL.

The implementation effort might be large.