Tutorial: Adding a New Kernel to FlashInfer
This tutorial walks through adding a simple element-wise scale operation to FlashInfer. We'll implement `scale(x, factor) = x * factor` to demonstrate the complete workflow.
Goal
Add a new operation that scales each element of a tensor by a scalar factor:
- Input: tensor `x` and scalar `factor`
- Output: `x * factor` (element-wise)
- Support multiple dtypes (FP16, BF16, FP32)
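
To pin down the intended semantics before writing any device code, here is a minimal host-side reference sketch in C++. It is not part of FlashInfer: the name `scale_reference` and its signature are hypothetical. It assumes the multiplication is done in `float` and the result is cast back to the storage dtype. The template parameter `T` stands in for the element type, which on the device would be something like `__half`, `__nv_bfloat16`, or `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for scale(x, factor) = x * factor.
// Not a FlashInfer API; intended only as a ground truth for testing.
template <typename T>
void scale_reference(const T* input, T* output, std::size_t n, float factor) {
  // Null pointers are only acceptable when there is nothing to process.
  assert(n == 0 || (input != nullptr && output != nullptr));
  for (std::size_t i = 0; i < n; ++i) {
    // Assumption: multiply in float, then cast back to the storage dtype T.
    output[i] = static_cast<T>(static_cast<float>(input[i]) * factor);
  }
}
```

A reference like this could serve as the expected result when checking a GPU kernel's output for each supported dtype, with a tolerance suited to the lower-precision types.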
Step 1: Define CUDA Kernel in include/
Create `include/flashi