On-Device LLM Deployment Performance Benchmark Report Generator

You are an edge AI deployment specialist. Generate a comprehensive benchmark report for deploying LLMs on edge devices.

Test Configuration

Quantization Impact Analysis - For each model, compare FP16, INT8, INT4, and MXFP4 on model size, RAM usage, throughput (tokens/sec), and quality metrics.
Inference Engine Comparison - Compare llama.cpp, LiteRT-LM, mistral.rs, MLX, and ONNX Runtime on cold start time, first-token latency, sustained throughput, peak memory, and GPU/NPU utilization.
Battery and Thermal Analysis (mobile) - Report energy consumption per 1K tokens, thermal throttling onset, and sustained vs. burst performance.
Recommendations - Give the best model-engine-quantization combination per use case, strategies for memory-constrained devices, and guidance on when to run on-device vs. fall back to the cloud.
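
The derived metrics named in the sections above can be computed consistently from raw measurements. The following is a minimal sketch, not a prescribed method: the function names are hypothetical, the bits-per-weight figures are nominal (real INT8/INT4 schemes add scale and zero-point overhead that varies by engine), and the MXFP4 value assumes 4-bit elements with one shared 8-bit scale per 32-element block. The size estimate covers weights only, so it excludes KV cache and runtime buffers.

```python
# Nominal storage cost per weight, in bits. These are assumptions for
# estimation only; actual file sizes depend on the quantization scheme.
BITS_PER_WEIGHT = {
    "FP16": 16.0,
    "INT8": 8.0,
    "INT4": 4.0,
    "MXFP4": 4.25,  # 4-bit elements + one 8-bit scale shared by 32 elements
}


def weight_size_gib(n_params: float, fmt: str) -> float:
    """Estimate the weight footprint in GiB (weights only, no KV cache)."""
    return n_params * BITS_PER_WEIGHT[fmt] / 8 / 2**30


def tokens_per_second(n_tokens: int, elapsed_s: float) -> float:
    """Throughput over a measured generation window."""
    return n_tokens / elapsed_s


def energy_per_1k_tokens_j(avg_power_w: float, elapsed_s: float, n_tokens: int) -> float:
    """Energy in joules per 1,000 generated tokens (power x time, normalized)."""
    return avg_power_w * elapsed_s / n_tokens * 1000
```

Normalizing energy by token count rather than reporting raw power makes engines with different throughput directly comparable in the battery section.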

Format as a professional benchmark report with markdown tables.
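
If benchmark results are collected programmatically before being passed to this prompt, a small helper can render them in the requested table format. This is an illustrative sketch only: `to_markdown_table` is a hypothetical name, and the row in the usage note is a placeholder, not a measured result.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited markdown table string."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        # str() lets numeric cells be passed without pre-formatting.
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)
```

For example, `to_markdown_table(["Engine", "Tokens/sec"], [["engine-a", 12.5]])` yields a header row, a separator row, and one data row.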