Open-Source Speech Synthesis (TTS) Model Quick-Evaluation Script Generator

You are a Voice Synthesis Evaluation Expert. Help me design a comprehensive evaluation pipeline for open-source TTS models.

My requirements:

Target language(s): [USER PROVIDES]
Use case: [e.g., audiobook narration / real-time conversation / content creation]
Latency requirement: [e.g., <500 ms time to first audio byte / offline batch is fine]
Voice quality priority: [naturalness / expressiveness / speaker similarity]

Generate:

Model Shortlist (pick 4-6 candidates from: CosyVoice2, VoxCPM, Fish-Speech, StyleTTS2, Bark, ChatTTS, XTTS-v2, MaskGCT, F5-TTS)
- For each: strengths, weaknesses, hardware requirements, license
Evaluation Script (Python) - A complete, runnable script that loads each model, synthesizes the same 10 test sentences, measures real-time factor (RTF), estimated MOS (via UTMOS), speaker similarity to the reference voice, and peak memory usage, then outputs a comparison table
Subjective Test Protocol - An A/B test design for human evaluation, with a recommended sample size and statistical test
Deployment Recommendation - The best model for the use case, with a deployment architecture sketch and a cost estimate (GPU-hours per 1M characters)
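
The objective metrics named in the evaluation-script item can be pinned down before any model is loaded. This is an illustrative sketch, not a reference implementation: it assumes each candidate is wrapped in a hypothetical `synthesize(text)` callable that returns mono samples at a known sample rate, and that speaker embeddings come from a separate speaker-verification encoder (which one is left open). RTF is computed over the whole test set (total synthesis time divided by total audio duration) after one untimed warm-up call. UTMOS scoring and GPU memory (for example `torch.cuda.max_memory_allocated()` after `torch.cuda.reset_peak_memory_stats()`) need the real libraries and are omitted here.

```python
import math
import time
from typing import Callable, List, Sequence

# Hypothetical wrapper signature: sentence in, mono audio samples out.
Synthesizer = Callable[[str], Sequence[float]]


def corpus_rtf(synthesize: Synthesizer, sentences: List[str], sample_rate: int) -> float:
    """Real-time factor over a test set: total synthesis time / total audio seconds.

    Summing before dividing keeps very short sentences from dominating.
    A value below 1.0 means audio is produced faster than real time.
    """
    if not sentences:
        raise ValueError("need at least one test sentence")
    synthesize(sentences[0])  # untimed warm-up: first call often pays one-off setup costs
    total_wall = 0.0
    total_audio = 0.0
    for text in sentences:
        start = time.perf_counter()
        samples = synthesize(text)
        total_wall += time.perf_counter() - start
        total_audio += len(samples) / sample_rate
    if total_audio == 0:
        raise ValueError("synthesizer returned no audio")
    return total_wall / total_audio


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Speaker similarity between two speaker embeddings; 1.0 means identical direction."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    def fake_tts(text: str) -> List[float]:
        # Stand-in for a real model: 0.05 s of "work", 1 s of silence at 16 kHz.
        time.sleep(0.05)
        return [0.0] * 16000

    print(f"RTF: {corpus_rtf(fake_tts, ['one', 'two'], 16000):.3f}")
    print(f"SIM: {cosine_similarity([1.0, 0.0], [1.0, 1.0]):.4f}")  # 0.7071
```

Speaker similarity is typically reported as the mean cosine similarity between the reference clip's embedding and each synthesized clip's embedding; for any candidate that cannot condition on a reference clip, the metric is better reported as not applicable than compared.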
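
For the subjective A/B item, one common analysis (not the only one) is a two-sided exact binomial sign test on preference counts, with "no preference" answers excluded, plus a power calculation to size the test. The sketch below assumes every non-tie judgment is independent, which is optimistic when each listener rates many pairs; aggregating per listener or fitting a mixed-effects model is more rigorous. The 60/40 target preference, alpha = 0.05 and 80% power used in the example are illustrative assumptions, not recommendations.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact binomial test of H0: P(prefer A) = 0.5, ties already removed."""
    n = wins_a + wins_b
    if n == 0:
        raise ValueError("no non-tie judgments")
    k = min(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n  # P(X <= k) under H0
    return min(1.0, 2 * tail)


def critical_count(n: int, alpha: float) -> int:
    """Largest c with 2 * P(X <= c) <= alpha under H0, or -1 if no outcome is significant."""
    c, cdf = -1, 0.0
    for k in range(n // 2 + 1):
        cdf += comb(n, k) / 2**n
        if 2 * cdf > alpha:
            break
        c = k
    return c


def sign_test_power(n: int, p_true: float, alpha: float = 0.05) -> float:
    """Probability of rejecting H0 with n non-tie judgments when P(prefer A) = p_true."""
    c = critical_count(n, alpha)
    if c < 0:
        return 0.0

    def pmf(k: int) -> float:
        return comb(n, k) * p_true**k * (1 - p_true) ** (n - k)

    # Rejection region is X <= c or X >= n - c (symmetric because H0 has p = 0.5).
    return sum(pmf(k) for k in range(c + 1)) + sum(pmf(k) for k in range(n - c, n + 1))


def required_judgments(p_true: float, alpha: float = 0.05, power: float = 0.8,
                       max_n: int = 1000) -> int:
    """Smallest n whose exact power reaches the target (exact power is not monotone in n)."""
    for n in range(1, max_n + 1):
        if sign_test_power(n, p_true, alpha) >= power:
            return n
    raise ValueError("target power not reached within max_n judgments")


if __name__ == "__main__":
    print(required_judgments(0.6))    # non-tie judgments needed to detect a 60/40 preference
    print(sign_test_p_value(62, 38))  # e.g. 62 of 100 non-tie votes favour model A
```

Here `n` counts non-tie judgments summed over all listeners and sentence pairs; if listeners often answer "no preference", proportionally more trials are needed to reach it.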
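
The cost figure in the deployment item follows from quantities the evaluation script is asked to measure. A minimal sketch, assuming RTF is measured per stream while one GPU serves `concurrent_streams` requests at once (1 for unbatched offline runs) and that characters are counted the same way for every model. Seconds of speech per character differ a lot between languages (compare Chinese characters with English letters), so that ratio has to come from the target-language test set. The GPU hourly price is a user input; the $1.50 below is hypothetical.

```python
def gpu_hours_per_million_chars(rtf: float, audio_seconds: float, characters: int,
                                concurrent_streams: int = 1) -> float:
    """GPU-hours needed to synthesize 1,000,000 input characters.

    audio_seconds / characters: speech duration produced per input character,
                                measured on the target-language test set.
    rtf: per-stream real-time factor measured at this level of concurrency.
    concurrent_streams: streams one GPU serves in parallel at that RTF.
    """
    if characters <= 0 or concurrent_streams <= 0:
        raise ValueError("characters and concurrent_streams must be positive")
    audio_per_char = audio_seconds / characters  # seconds of speech per character
    gpu_seconds_per_char = rtf * audio_per_char / concurrent_streams
    return 1_000_000 * gpu_seconds_per_char / 3600


if __name__ == "__main__":
    # Illustrative numbers only: 900 characters -> 60 s of audio, RTF 0.2, one stream.
    hours = gpu_hours_per_million_chars(rtf=0.2, audio_seconds=60.0, characters=900)
    print(f"{hours:.2f} GPU-hours per 1M characters")  # 3.70
    print(f"${hours * 1.50:.2f} at a hypothetical $1.50/GPU-hour")
```

Idle capacity, warm-up and serving-layer overhead are not included, so the result is best read as a lower bound on real GPU time.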

Be specific and practical. Include exact pip install commands and model download instructions.