PromptForge
DEVELOPMENT · local-llm · deployment · optimization · hardware

Local LLM Deployment Decision Tree Generator

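Enter your hardware configuration and use case, and the AI will generate a complete local LLM deployment decision tree that recommends the best-fit models, quantization scheme, and inference framework.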

7 views · 4/20/2026

You are a local LLM deployment expert. Based on the user's hardware specs and use case, generate a comprehensive deployment decision tree.

Input required:

  • GPU: [e.g., RTX 4090 24GB / M4 Max 128GB / No GPU]
  • RAM: [e.g., 32GB]
  • Storage: [e.g., 1TB SSD]
  • OS: [e.g., macOS / Linux / Windows]
  • Primary use case: [e.g., coding assistant / RAG chatbot / creative writing]
  • Privacy requirement: [strict offline / occasional online OK]
  • Latency target: [real-time <1s / acceptable 5-10s / batch OK]

Generate:

  1. Model Recommendations (top 3): model name, parameter count, quantization, expected tokens/sec, VRAM usage
  2. Inference Framework: compare llama.cpp vs vLLM vs Ollama vs MLX, with setup commands
  3. Quantization Strategy: Q4_K_M vs Q5_K_S vs Q8_0 vs FP16 tradeoffs
  4. Optimization Checklist: context length, batch size, KV cache, flash attention
  5. Decision Flowchart (Mermaid syntax): hardware → model → framework → config

My hardware and use case: [DESCRIBE HERE]
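
The VRAM figures the prompt asks for can be sanity-checked by hand. The Python sketch below is a rough estimator, not part of the prompt itself: it approximates weight memory from parameter count and bits per weight, and KV cache memory from the usual layers × KV heads × head dimension × context length product. The bits-per-weight values for Q4_K_M and Q5_K_S are approximate averages (an assumption, since K-quants mix block types across tensors), the 1 GiB overhead default is a guess, and the function names are hypothetical.

```python
# Rough memory estimator for local LLM deployment (illustrative sketch).
# K-quant bits-per-weight values are approximate averages (assumption);
# real GGUF files vary by model architecture.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # approximate
    "Q5_K_S": 5.5,   # approximate
    "Q8_0": 8.5,     # 32 int8 weights plus one fp16 scale per block
    "FP16": 16.0,
}

GIB = 2**30


def weight_gib(params_billion: float, quant: str) -> float:
    """Estimated size of the model weights in GiB."""
    total_bits = params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return total_bits / 8 / GIB


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 context_len: int, bytes_per_elem: int = 2) -> float:
    """Estimated KV cache size in GiB for one sequence.

    The factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 assumes an fp16 cache.
    """
    elems = 2 * n_layers * n_kv_heads * head_dim * context_len
    return elems * bytes_per_elem / GIB


def fits(params_billion: float, quant: str, vram_gib: float,
         kv_gib: float = 0.0, overhead_gib: float = 1.0) -> bool:
    """True if weights + KV cache + a guessed runtime overhead fit in VRAM."""
    needed = weight_gib(params_billion, quant) + kv_gib + overhead_gib
    return needed <= vram_gib


if __name__ == "__main__":
    # Hypothetical 8B-class model shape: 32 layers, 8 KV heads, head_dim 128.
    kv = kv_cache_gib(n_layers=32, n_kv_heads=8, head_dim=128, context_len=8192)
    for quant in BITS_PER_WEIGHT:
        size = weight_gib(8, quant)
        print(f"{quant:7s} weights={size:6.2f} GiB  "
              f"fits in 24 GiB: {fits(8, quant, 24, kv_gib=kv)}")
```

These numbers are lower bounds for planning only; actual usage depends on the inference framework, batch size, and any KV cache quantization, so treat the model's recommendations and this estimate as cross-checks on each other.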