PromptForge
Tags: coding, MoE, Model Deployment, Inference Optimization, Infrastructure

MoE Large Model Inference Deployment Plan Evaluator

Design detailed inference deployment plans for MoE models, including hardware selection, communication optimization, and cost estimation

6 views · 4/24/2026

You are an expert in large language model inference infrastructure, specializing in Mixture-of-Experts (MoE) architectures.

Design an inference deployment plan for an MoE model with these parameters:

Input

  • Model: {MODEL_NAME}
  • Total parameters: {e.g., 671B}
  • Active parameters per token: {e.g., 37B}
  • Number of experts: {e.g., 256 routed + 1 shared}
  • Top-K experts: {e.g., 8}
  • Target throughput: {e.g., 1000 tokens/s}
  • Target latency: {e.g., <50ms TTFT}
  • Budget: {e.g., $50K/month}

Generate: Hardware Config, Parallelism Strategy, Communication Optimization, Engine Selection, Cost Analysis, Performance Projections, and Scaling Plan.
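
Before filling in the template, it can help to sanity-check the inputs with some back-of-envelope arithmetic. The sketch below is only a rough illustration using the example values from the list above (671B total parameters, 37B active, $50K/month). The function names, the 1 byte per parameter (FP8 weights), the 80 GB accelerator, the 75% usable-memory fraction, and the 730 hours per month are all assumptions made for this example, not properties of any particular model or GPU.

```python
import math


def weight_memory_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Weight footprint in GB (1 GB = 1e9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    All experts must be resident, so total (not active) parameters count.
    """
    return total_params_b * bytes_per_param


def min_gpus_for_weights(total_params_b: float,
                         bytes_per_param: float,
                         gpu_mem_gb: float,
                         usable_fraction: float) -> int:
    """Lower bound on GPU count needed just to hold the weights.

    usable_fraction is an assumed share of device memory left for weights
    after reserving room for KV cache, activations, and runtime overhead.
    """
    usable_gb = gpu_mem_gb * usable_fraction
    return math.ceil(weight_memory_gb(total_params_b, bytes_per_param) / usable_gb)


def max_gpu_hourly_rate(monthly_budget_usd: float,
                        num_gpus: int,
                        hours_per_month: float = 730.0) -> float:
    """Highest per-GPU hourly price that still fits the monthly budget."""
    return monthly_budget_usd / (num_gpus * hours_per_month)


def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters touched per token (drives compute, not memory)."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    # Example values from the template; hardware numbers are assumptions.
    gpus = min_gpus_for_weights(671, 1.0, gpu_mem_gb=80, usable_fraction=0.75)
    print("Weights (GB):", weight_memory_gb(671, 1.0))
    print("Minimum GPUs for weights:", gpus)
    print("Max $/GPU-hour within budget:", round(max_gpu_hourly_rate(50_000, gpus), 2))
    print("Active parameter fraction:", round(active_fraction(37, 671), 3))
```

The GPU count here is only a floor set by weight storage; meeting the throughput and latency targets may require more devices, which is what the generated plan should work out.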