MoE Large Model Inference Deployment Plan Evaluator

You are an expert in large language model inference infrastructure, specializing in Mixture-of-Experts (MoE) architectures.

Design an inference deployment plan for an MoE model with these parameters:

Input

Model: {MODEL_NAME}
Total parameters: {e.g., 671B}
Active parameters per token: {e.g., 37B}
Number of experts: {e.g., 256 routed + 1 shared}
Top-K experts: {e.g., 8}
Target throughput: {e.g., 1000 tokens/s}
Target latency: {e.g., <50 ms TTFT (time to first token)}
Budget: {e.g., $50K/month}
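
As a rough sanity check on the example values above, the weight footprint and the active-parameter fraction can be estimated before any hardware is chosen. The sketch below is illustrative only. The function names, the bytes-per-parameter figures (1 for FP8, 2 for BF16), the 80 GB per-accelerator memory, and the 0.7 usable fraction (the rest reserved for KV cache and activations) are all assumptions, not part of the input.

```python
import math

def weight_footprint_gb(total_params_b: float, bytes_per_param: float) -> float:
    """Weight memory in GB (1e9 bytes) for total_params_b billion parameters."""
    return total_params_b * bytes_per_param

def min_accelerators(footprint_gb: float,
                     mem_per_device_gb: float = 80.0,   # assumed device memory
                     usable_fraction: float = 0.7) -> int:  # assumed share left for weights
    """Lower bound on device count needed just to hold the weights."""
    return math.ceil(footprint_gb / (mem_per_device_gb * usable_fraction))

def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of parameters touched per token."""
    return active_params_b / total_params_b

if __name__ == "__main__":
    fp8_gb = weight_footprint_gb(671, 1)    # 671 GB at 1 byte/param
    bf16_gb = weight_footprint_gb(671, 2)   # 1342 GB at 2 bytes/param
    print(min_accelerators(fp8_gb), min_accelerators(bf16_gb))
    print(f"{active_fraction(37, 671):.1%}")
```

Under these assumptions the example model needs at least 12 devices at FP8 and 24 at BF16 for weights alone, while only about 5.5% of parameters are active per token. This is a floor, not a recommendation: it ignores expert placement, replication, and batch-dependent KV cache growth.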

Generate a deployment plan with these sections: Hardware Config, Parallelism Strategy, Communication Optimization, Engine Selection, Cost Analysis, Performance Projections, and Scaling Plan.
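
For the Cost Analysis section, the example throughput and budget together imply a ceiling on cost per token. A minimal sketch of that arithmetic follows. The function names, the 30-day month, and the assumption that the target throughput is sustained at full utilization are hypothetical choices made for illustration.

```python
SECONDS_PER_DAY = 86_400

def tokens_per_month(tokens_per_second: float,
                     days: int = 30,              # assumed month length
                     utilization: float = 1.0) -> float:  # assumed sustained load
    """Tokens served in a month at the given sustained throughput."""
    return tokens_per_second * SECONDS_PER_DAY * days * utilization

def cost_per_million_tokens(monthly_budget_usd: float,
                            tokens_per_second: float,
                            days: int = 30,
                            utilization: float = 1.0) -> float:
    """Budget divided by monthly token volume, in USD per million tokens."""
    volume = tokens_per_month(tokens_per_second, days, utilization)
    return monthly_budget_usd / (volume / 1e6)

if __name__ == "__main__":
    print(f"{tokens_per_month(1000):,.0f} tokens/month")
    print(f"${cost_per_million_tokens(50_000, 1000):.2f} per million tokens")
```

With the example inputs, 1000 tokens/s over 30 days is about 2.59 billion tokens, so a $50K/month budget corresponds to roughly $19.29 per million tokens at full utilization. Lower utilization raises the effective figure proportionally.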