Tags: AI development, GPU, kernel, TileLang, high-performance computing, code generation
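GPU High-Performance Operator Development: Requirements-to-TileLang Code Generator
Describe your operator requirements in natural language and automatically generate TileLang DSL code. Supports rapid development of common high-performance compute kernels such as GEMM, Attention, and quantization.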
6 views, 4/25/2026
You are an expert GPU kernel developer proficient in TileLang (tile-lang), a Python-based DSL for writing high-performance GPU kernels. Help me implement a custom kernel.
My Kernel Requirements
- Operation type: [e.g., GEMM / FlashAttention / Dequant GEMM / Custom fused op]
- Data types: [e.g., FP16 / BF16 / FP8 / INT8 with FP16 accumulate]
- Matrix dimensions: [e.g., M=4096, N=4096, K=1024 / variable batch]
- Target hardware: [e.g., NVIDIA A100 / H100 / Apple M-series / Huawei Ascend]
- Performance target: [e.g., >90% of cuBLAS / match FlashAttention-2 throughput]
Generate
- TileLang Kernel Code: Complete, runnable TileLang kernel with:
  - Proper tile sizes for the target hardware
  - Shared memory usage and pipeline stages
  - T.gemm() or T.reduce() primitives as appropriate
  - Block and thread configuration
- Launch Configuration: Host-side code to compile and invoke the kernel
- Correctness Test: A simple test comparing against PyTorch reference
- Performance Benchmark: Benchmark script with roofline analysis
- Optimization Notes: Explain tile size choices, memory access patterns, and potential further optimizations
Use TileLang v0.1.6+ API conventions. Include comments explaining each optimization decision.
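
As a rough illustration of the roofline analysis requested in the benchmark item, the sketch below estimates the arithmetic intensity of a GEMM and checks whether it is compute-bound or memory-bound. It is plain Python and does not call TileLang. The function names `gemm_roofline` and `achieved_tflops` are hypothetical helpers. The peak numbers in the usage example (312 TFLOPS, 1555 GB/s) are assumed placeholder values for an A100-class GPU and should be replaced with the figures for your actual hardware. The traffic model assumes each matrix is read or written exactly once, which is an idealization.

```python
def gemm_roofline(m, n, k, bytes_per_elem, peak_tflops, peak_bw_gbs):
    """Idealized roofline estimate for C[m, n] = A[m, k] @ B[k, n].

    Assumes A and B are each read once and C is written once
    (perfect on-chip reuse), so the result is an upper bound.
    """
    flops = 2.0 * m * n * k                                   # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)    # A + B + C traffic
    intensity = flops / bytes_moved                           # FLOP per byte
    ridge = (peak_tflops * 1e12) / (peak_bw_gbs * 1e9)        # FLOP per byte at the roofline knee
    bandwidth_limit_tflops = intensity * peak_bw_gbs * 1e9 / 1e12
    return {
        "flops": flops,
        "bytes": bytes_moved,
        "intensity": intensity,
        "ridge": ridge,
        "bound": "compute" if intensity >= ridge else "memory",
        "attainable_tflops": min(peak_tflops, bandwidth_limit_tflops),
    }


def achieved_tflops(flops, elapsed_ms):
    """Convert a measured kernel time in milliseconds into TFLOPS."""
    return flops / (elapsed_ms * 1e-3) / 1e12


if __name__ == "__main__":
    # Assumed placeholder peaks; substitute your GPU's datasheet values.
    r = gemm_roofline(4096, 4096, 1024, bytes_per_elem=2,
                      peak_tflops=312.0, peak_bw_gbs=1555.0)
    print(f"intensity = {r['intensity']:.1f} FLOP/B, ridge = {r['ridge']:.1f} FLOP/B")
    print(f"{r['bound']}-bound, attainable = {r['attainable_tflops']:.1f} TFLOPS")
```

Dividing the measured `achieved_tflops` by `attainable_tflops` gives a fraction of the roofline, which is a more informative target than raw TFLOPS when comparing against a vendor library.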