Voice AI Application Product Requirements and Technology Selection Consultant

You are a Voice AI product consultant and technical architect. Help me design a voice AI application from product requirements to technical implementation.

My project:

Product type: [voice assistant / podcast generator / voice clone / real-time translation / audiobook / customer service bot / other]
Target users: [describe audience]
Key requirements: [list 3-5 must-have features]
Budget: [free / low-cost / enterprise]
Deployment: [cloud API / self-hosted / edge device]
Latency requirement: [real-time <200ms / near-real-time <1s / batch is fine]

Please provide:

1. Technology Stack Recommendation

Speech-to-Text (STT)

Compare: Whisper, Deepgram, AssemblyAI, Google STT, Azure STT, faster-whisper

Text-to-Speech (TTS)

Compare: ElevenLabs, OpenAI TTS, Fish Speech, ChatTTS, StyleTTS2, Bark, Azure TTS

Voice Cloning (if needed)

Compare: ElevenLabs, RVC, OpenVoice, Fish Speech, GPT-SoVITS

2. Architecture Design

System diagram (Mermaid)
Data flow for a typical request
Streaming vs. batch processing decision
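
To show the kind of streaming vs. batch reasoning I am after, here is a minimal sketch that maps a latency budget onto a processing mode. The 200 ms and 1 s thresholds come from the latency options in my project description; the function name `choose_processing_mode` and the mode labels are hypothetical, so treat it as a starting point rather than a fixed rule.

```python
from typing import Optional


def choose_processing_mode(latency_budget_ms: Optional[float]) -> str:
    """Pick a processing mode from a latency budget (illustrative labels only).

    None means no latency requirement was given, i.e. batch is fine.
    """
    if latency_budget_ms is None:
        return "batch"
    if latency_budget_ms <= 200:
        # Real-time tier: audio has to be streamed in and out end to end.
        return "full-streaming"
    if latency_budget_ms <= 1000:
        # Near-real-time tier: short buffered chunks are usually tolerable.
        return "chunked-streaming"
    return "batch"


if __name__ == "__main__":
    for budget in (150, 800, None):
        print(budget, "->", choose_processing_mode(budget))
```

In your answer, please explain where my use case falls and whether these thresholds are realistic for the stack you recommend.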

3. Implementation Roadmap

MVP (2 weeks) -> V1 (1 month) -> V2 (3 months)

4. Cost Estimation

Per-request cost breakdown
Monthly cost at 1K / 10K / 100K daily users
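
The arithmetic I expect behind the cost table can be sketched roughly as follows. Every number in it is a placeholder, not a vendor quote, and the names `RequestProfile`, `per_request_cost`, and `monthly_cost` are hypothetical, as are the assumptions of 5 requests per user per day and a 30-day month. Please replace them with current prices for the services you recommend.

```python
from dataclasses import dataclass


@dataclass
class RequestProfile:
    """Per-request usage and unit prices. All values are placeholders."""
    audio_minutes: float           # speech sent to STT per request
    tts_characters: int            # characters synthesized per request
    stt_price_per_minute: float    # USD, placeholder
    tts_price_per_1k_chars: float  # USD, placeholder


def per_request_cost(p: RequestProfile) -> dict:
    """Break one request down into STT and TTS cost."""
    stt = p.audio_minutes * p.stt_price_per_minute
    tts = p.tts_characters / 1000 * p.tts_price_per_1k_chars
    return {"stt": stt, "tts": tts, "total": stt + tts}


def monthly_cost(p: RequestProfile, daily_users: int,
                 requests_per_user_per_day: int = 5, days: int = 30) -> float:
    """Scale the per-request total to a monthly figure."""
    return per_request_cost(p)["total"] * daily_users * requests_per_user_per_day * days


if __name__ == "__main__":
    profile = RequestProfile(audio_minutes=0.5, tts_characters=200,
                             stt_price_per_minute=0.006,
                             tts_price_per_1k_chars=0.015)
    for users in (1_000, 10_000, 100_000):
        print(f"{users:>7} daily users: ${monthly_cost(profile, users):,.2f} per month")
```

This sketch covers STT and TTS only; include any other cost components (hosting, GPU time, bandwidth) that apply to my deployment choice.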

5. Code Starter

Provide a working Python snippet for the core pipeline.
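
As a shape for that snippet, something like the skeleton below would suit me. It is only a sketch: the stage names (`Transcriber`, `Responder`, `Synthesizer`), `run_pipeline`, and the stand-in functions are all hypothetical, and the middle "responder" stage assumes a voice-assistant style product. Please swap the stand-ins for real calls to the services you recommend.

```python
from typing import Callable

# Each stage is a plain callable so any vendor SDK or local model can be wrapped later.
Transcriber = Callable[[bytes], str]   # audio bytes -> transcript
Responder = Callable[[str], str]       # transcript -> reply text
Synthesizer = Callable[[str], bytes]   # reply text -> audio bytes


def run_pipeline(audio: bytes, stt: Transcriber,
                 respond: Responder, tts: Synthesizer) -> bytes:
    """Run one request through STT, response generation, and TTS in order."""
    transcript = stt(audio)
    reply = respond(transcript)
    return tts(reply)


# Stand-in stages for a dry run; they do no real speech processing.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def echo_responder(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    print(run_pipeline(b"hello", fake_stt, echo_responder, fake_tts))
```

Keeping each stage behind a simple callable makes it easy to compare the STT and TTS options from section 1 without touching the pipeline itself.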