Unified Compute Engine Architect for Big Data and AI Workloads

You are a senior data infrastructure architect specializing in unified compute engines. Help me design a modern data platform that consolidates batch processing, stream processing, and AI/ML workloads into a single engine.

Current Pain Points

Separate clusters for Spark (batch), Flink (streaming), and Ray (ML)
Data duplication across systems
High operational cost of maintaining three or more compute frameworks
Slow iteration, because data has to be moved between batch and ML pipelines

Design Requirements

Unified Query Layer: Single SQL interface for batch queries, streaming aggregations, and ML feature computation
Compute Architecture:
- Analyze Rust-based alternatives to JVM compute engines
- Arrow-native columnar processing
- GPU acceleration for AI workloads within the same engine
Migration Plan: Generate a phased migration plan from Spark/Flink/Ray to the unified engine:
- Phase 1: Batch SQL workloads
- Phase 2: Streaming pipelines
- Phase 3: ML training and inference
Performance Benchmarks: Design a benchmark suite comparing the unified engine on:
- TPC-DS queries vs Spark
- Streaming throughput vs Flink
- ML pipeline latency vs Ray
Cost Analysis: 12-month TCO comparison between the current multi-cluster setup and the unified engine
Risk Assessment: Compatibility gaps, missing connectors, team skill gaps
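
For the cost analysis, the 12-month TCO comparison could be structured along the lines of the minimal sketch below. The cost categories, the `MonthlyCost` and `tco` names, and every figure are hypothetical placeholders chosen for illustration, not measurements from my environment; replace them with real numbers once the inputs are known.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCost:
    """Recurring monthly spend for one platform option (hypothetical categories)."""
    compute: float     # cluster / instance spend
    storage: float     # storage spend, including duplicated copies of data
    operations: float  # engineering time spent on maintenance

    def total(self) -> float:
        return self.compute + self.storage + self.operations


def tco(monthly: MonthlyCost, months: int = 12, one_time: float = 0.0) -> float:
    """Total cost of ownership: one-time cost plus recurring cost over `months`."""
    return one_time + months * monthly.total()


# Placeholder figures only; they are assumptions, not benchmarks or quotes.
current = MonthlyCost(compute=60_000, storage=15_000, operations=25_000)
unified = MonthlyCost(compute=45_000, storage=8_000, operations=12_000)
migration_cost = 200_000  # assumed one-time cost of the phased migration

current_tco = tco(current)
unified_tco = tco(unified, one_time=migration_cost)
savings = current_tco - unified_tco

# Months until the one-time migration cost is recovered by lower monthly spend.
break_even_months = migration_cost / (current.total() - unified.total())

print(f"Current stack, 12-month TCO:  {current_tco:,.0f}")
print(f"Unified engine, 12-month TCO: {unified_tco:,.0f}")
print(f"Savings over 12 months:       {savings:,.0f}")
print(f"Break-even after {break_even_months:.1f} months")
```

Keeping the one-time migration cost separate from recurring spend makes the break-even point visible, which matters when the comparison window is only 12 months.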

My current stack: [DESCRIBE CURRENT INFRASTRUCTURE]
Data volume: [DAILY DATA VOLUME]
Team size: [ENGINEERING TEAM SIZE]