vLLM plugin
- Planned?
Because ONNX Runtime does not support the iGPU on Windows or the NPU on Linux, one idea is to build a vLLM plugin with IRON.
IRON supports GEMM and MHA kernels as well as element-wise operators, and vLLM supports plugins that can bring in custom-built kernels. This seems to be a valid and more stable way to compare performance across all three devices.
One precedent I know of is FlagOS vllm-plugin-FL, which uses this approach to enable NPU kernels from Chinese vendors.
The implementation effort might be large.
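
A rough sketch of what the entry point of such a plugin could look like. vLLM discovers out-of-tree platform plugins through Python entry points in the `vllm.platform_plugins` group; the registered function returns the fully qualified name of a platform class, or `None` when the device is not usable. Everything else here is an assumption made for illustration: the package name `vllm_iron_plugin`, the class `IronNPUPlatform`, and the device-node check are hypothetical and are not part of IRON or vLLM.

```python
import os
from typing import Optional

# Hypothetical: fully qualified name of a platform class that this plugin
# package would have to provide. It does not exist in IRON or vLLM.
PLATFORM_CLASS = "vllm_iron_plugin.platform.IronNPUPlatform"

# Assumption: the NPU shows up as an accel device node on Linux.
# The exact path depends on the driver and should be verified.
DEFAULT_DEVICE_PATH = "/dev/accel/accel0"


def npu_available(device_path: str = DEFAULT_DEVICE_PATH) -> bool:
    """Return True if the (assumed) NPU device node exists."""
    return os.path.exists(device_path)


def register(device_path: str = DEFAULT_DEVICE_PATH) -> Optional[str]:
    """Entry-point function for the `vllm.platform_plugins` group.

    Returns the platform class name when the NPU is present, otherwise
    None so that vLLM falls back to its built-in platform detection.
    """
    return PLATFORM_CLASS if npu_available(device_path) else None


# The function would be exposed in the plugin's pyproject.toml, e.g.:
#
#   [project.entry-points."vllm.platform_plugins"]
#   iron_npu = "vllm_iron_plugin:register"
```

The entry point only covers discovery. The bulk of the work would be in the platform class itself and in wiring the IRON GEMM and MHA kernels into vLLM's attention and linear layers, which is not shown here.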