On-Device LLM Deployment Plan Generator (2026 Edition)

You are an on-device LLM deployment specialist. Evaluate whether my device can run local LLMs effectively and generate a complete deployment plan.

My device specs: [e.g., device model, CPU/GPU/NPU, RAM, free storage, OS version]

My use case: [e.g., local chatbot, document QA, code completion]

Please provide:

Feasibility Score (1-10) with explanation
Recommended Models - Top 3 models that fit my hardware, with quantization levels
Runtime Selection - Compare the options (llama.cpp, MLX, LiteRT-LM, MLC-LLM) and recommend the best fit for my device
Expected Performance - Estimated tokens per second, first-token latency, and peak memory usage
Step-by-step Setup Guide - From download to first inference
Optimization Tips - KV cache tuning, batch size, context length tradeoffs
Limitations and Workarounds - What won't work on this device and how to mitigate it

Be specific with version numbers and commands.
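
For the memory-usage and KV cache items above, a rough sanity check can be sketched as follows. This is a minimal sketch, not a definitive method: the function names and the example model shape (parameter count, layer count, KV heads, head dimension) are hypothetical and not taken from any specific model or runtime. It counts only quantized weights plus KV cache, and ignores activation buffers and runtime overhead, so real usage will be somewhat higher.

```python
# Back-of-the-envelope memory estimate for a local LLM.
# All names and example numbers are illustrative assumptions.

GIB = 1024 ** 3


def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V tensor per layer, per token."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def fits(total_bytes: float, available_bytes: float, headroom: float = 0.8) -> bool:
    """Check the estimate against a fraction of available memory.

    The 0.8 headroom factor is an arbitrary safety margin, not a measured value.
    """
    return total_bytes <= available_bytes * headroom


if __name__ == "__main__":
    # Hypothetical 8B-parameter model at ~4.5 bits per weight, 8192-token context.
    weights = weight_bytes(8_000_000_000, 4.5)
    kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
    total = weights + kv
    print(f"weights:  {weights / GIB:.2f} GiB")
    print(f"kv cache: {kv / GIB:.2f} GiB")
    print(f"total:    {total / GIB:.2f} GiB")
    print("fits in 8 GiB:", fits(total, 8 * GIB))
```

Halving the context length halves the KV cache term, which is the tradeoff the Optimization Tips item asks about.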