PromptForge
AI development · GPU kernel · TileLang · High-performance computing · Code generation

GPU High-Performance Operator Requirements to TileLang Code Generator

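Describe your operator requirements in natural language and automatically generate TileLang DSL code. Supports rapid development of common high-performance compute kernels such as GEMM, Attention, and quantization.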

7 views · 4/25/2026

You are an expert GPU kernel developer proficient in TileLang (tile-lang), a Python-based DSL for writing high-performance GPU kernels. Help me implement a custom kernel.

My Kernel Requirements

  • Operation type: [e.g., GEMM / FlashAttention / Dequant GEMM / Custom fused op]
  • Data types: [e.g., FP16 / BF16 / FP8 / INT8 with FP16 accumulate]
  • Matrix dimensions: [e.g., M=4096, N=4096, K=1024 / variable batch]
  • Target hardware: [e.g., NVIDIA A100 / H100 / Apple M-series / Huawei Ascend]
  • Performance target: [e.g., >90% of cuBLAS / match FlashAttention-2 throughput]

Generate

  1. TileLang Kernel Code: Complete, runnable TileLang kernel with:
    • Proper tile sizes for the target hardware
    • Shared memory usage and pipeline stages
    • T.gemm() or T.reduce() primitives as appropriate
    • Block and thread configuration
  2. Launch Configuration: Host-side code to compile and invoke the kernel
  3. Correctness Test: A simple test comparing against PyTorch reference
  4. Performance Benchmark: Benchmark script with roofline analysis
  5. Optimization Notes: Explain tile size choices, memory access patterns, and potential further optimizations

Use TileLang v0.1.6+ API conventions. Include comments explaining each optimization decision.
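
To judge the roofline analysis the prompt asks for in item 4, it helps to know what a reasonable answer looks like. The sketch below is a minimal, pure-Python roofline estimate for a GEMM, using the example shape from the requirements (M=4096, N=4096, K=1024, FP16). It does not use TileLang. The helper names (`Roofline`, `gemm_flops`, `gemm_bytes`, `achieved_tflops`) are hypothetical, the hardware numbers are assumed nominal A100 40GB figures (312 TFLOP/s dense FP16 tensor-core peak, 1555 GB/s HBM bandwidth), and the byte count assumes each matrix moves through global memory exactly once.

```python
from dataclasses import dataclass


@dataclass
class Roofline:
    """Simple two-parameter roofline model (assumed nominal hardware peaks)."""
    peak_tflops: float   # peak compute throughput in TFLOP/s
    peak_bw_gbs: float   # peak memory bandwidth in GB/s

    def ridge_point(self) -> float:
        """Arithmetic intensity (FLOP/byte) above which a kernel is compute-bound."""
        return (self.peak_tflops * 1e12) / (self.peak_bw_gbs * 1e9)

    def attainable_tflops(self, intensity: float) -> float:
        """Upper bound on throughput for a kernel with the given intensity."""
        bandwidth_bound = intensity * self.peak_bw_gbs * 1e9 / 1e12
        return min(self.peak_tflops, bandwidth_bound)


def gemm_flops(m: int, n: int, k: int) -> int:
    """One multiply and one add per (m, n, k) triple."""
    return 2 * m * n * k


def gemm_bytes(m: int, n: int, k: int, in_bytes: int = 2, out_bytes: int = 2) -> int:
    """Ideal traffic: read A (m x k) and B (k x n) once, write C (m x n) once."""
    return in_bytes * (m * k + k * n) + out_bytes * m * n


def achieved_tflops(flops: int, elapsed_ms: float) -> float:
    """Convert a measured kernel time in milliseconds to TFLOP/s."""
    return flops / (elapsed_ms * 1e-3) / 1e12


if __name__ == "__main__":
    # Assumption: nominal A100 40GB figures; substitute your target hardware.
    a100 = Roofline(peak_tflops=312.0, peak_bw_gbs=1555.0)

    m, n, k = 4096, 4096, 1024
    flops = gemm_flops(m, n, k)
    intensity = flops / gemm_bytes(m, n, k)
    bound = a100.attainable_tflops(intensity)

    # Hypothetical measurement, for illustration only (not a real benchmark result).
    measured_ms = 0.2
    achieved = achieved_tflops(flops, measured_ms)

    regime = "compute" if intensity >= a100.ridge_point() else "memory"
    print(f"intensity: {intensity:.1f} FLOP/byte ({regime}-bound)")
    print(f"roofline bound: {bound:.1f} TFLOP/s")
    print(f"achieved: {achieved:.1f} TFLOP/s ({100 * achieved / bound:.0f}% of bound)")
```

In a generated benchmark, the measured time would come from timing the compiled kernel on the GPU rather than from a constant. For this shape the intensity sits well above the ridge point, so tile-size and pipelining choices matter more than raw memory bandwidth.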