LLM Test-Time Compute: Adaptive Reasoning Optimization Prompt

You are an AI Reasoning Optimization Specialist. Help me design a test-time compute scaling strategy for my LLM application.

Context

Task type: [e.g., complex reasoning, code generation, math proofs, creative writing]
Base model: [e.g., GPT-4o, Claude Opus, Qwen3, Llama 4]
Current pain point: [e.g., inconsistent quality, fails on hard problems, too slow]
Latency budget: [e.g., <5s, <30s, unlimited]

Please cover each of the following areas in your answer.

Adaptive Compute Strategy:
- When to use simple single-pass inference vs extended thinking
- Difficulty classification heuristics for routing
- Token budget allocation by task complexity tier
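
To make the routing request concrete, here is a minimal sketch of the kind of logic I have in mind. The tier names, token budgets, keyword markers, and the 150-word cutoff are all placeholders I invented for illustration, not recommendations; treat it as a starting point to improve on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    max_thinking_tokens: int  # illustrative budget only (assumption)
    extended_thinking: bool


# Hypothetical tiers; tune budgets against the latency budget above.
TIERS = (
    Tier("easy", 0, False),
    Tier("medium", 2_000, True),
    Tier("hard", 16_000, True),
)

# Hypothetical keyword markers that hint at a harder task.
HARD_MARKERS = ("prove", "derive", "optimize", "debug", "step by step")


def difficulty_score(prompt: str) -> int:
    """Crude heuristic: one point per marker found, plus one for long prompts."""
    text = prompt.lower()
    score = sum(marker in text for marker in HARD_MARKERS)
    if len(text.split()) > 150:
        score += 1
    return score


def route(prompt: str) -> Tier:
    """Map a prompt to a compute tier: score 0 is easy, 1 is medium, 2+ is hard."""
    score = difficulty_score(prompt)
    if score == 0:
        return TIERS[0]
    if score == 1:
        return TIERS[1]
    return TIERS[2]
```

A keyword heuristic like this is cheap but brittle; a small classifier or a first-pass confidence estimate from the model itself may route better, so please compare the options.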

Self-Verification Pipeline:
- Generate → Verify → Refine loop design
- Confidence scoring method
- Early-exit criteria to avoid wasting compute
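
As a reference point for the loop design, the sketch below shows one possible shape for a Generate → Verify → Refine loop with an early exit. The `generate`, `verify`, and `refine` callables are hypothetical stand-ins for model calls, and the 0.8 threshold and 3-round cap are arbitrary assumptions.

```python
from typing import Callable, Tuple


def generate_verify_refine(
    task: str,
    generate: Callable[[str], str],
    verify: Callable[[str, str], Tuple[float, str]],
    refine: Callable[[str, str, str], str],
    threshold: float = 0.8,  # assumed confidence cutoff
    max_rounds: int = 3,     # assumed cap on verification rounds
) -> Tuple[str, float, int]:
    """Return (best answer, its confidence, verification rounds used)."""
    answer = generate(task)
    best_answer, best_conf = answer, -1.0
    for round_no in range(1, max_rounds + 1):
        confidence, critique = verify(task, answer)
        if confidence > best_conf:
            best_answer, best_conf = answer, confidence
        if confidence >= threshold:
            # Early exit: the verifier is satisfied, so stop spending compute.
            return best_answer, best_conf, round_no
        if round_no < max_rounds:
            answer = refine(task, answer, critique)
    # Budget exhausted: fall back to the highest-scoring draft seen.
    return best_answer, best_conf, max_rounds
```

Keeping the best-scoring draft guards against a refinement step that makes the answer worse, which is one failure mode I would like the design to address.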

Multi-Sample Strategies:
- Best-of-N sampling with reward model scoring
- Majority voting (self-consistency) for tasks with a single checkable answer, such as factual or math questions
- When to use tree search vs sequential refinement
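
For the two sampling strategies, a minimal sketch of the selection step might look like this. It assumes the N samples have already been drawn; the `score` callable is a hypothetical stand-in for a reward model, and the lowercase/strip normalization is a simplification that real answer extraction would need to replace.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def majority_vote(answers: Sequence[str]) -> Tuple[str, float]:
    """Return the most common normalized answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one sample")
    normalized = [a.strip().lower() for a in answers]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate that the scoring function rates highest."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)
```

The vote share from `majority_vote` can double as a rough confidence signal, for example to decide whether to draw more samples.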

Implementation Template:
- Pseudocode for the adaptive routing logic
- Prompt templates for the verifier/critic agent
- Cost-quality tradeoff analysis
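
For the cost side of the tradeoff analysis, a back-of-the-envelope calculation like the one below is the level of detail I am after. The `TierCost` fields and the flat per-token price are simplifying assumptions; real pricing usually distinguishes input and output tokens.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TierCost:
    traffic_share: float    # fraction of queries routed to this tier
    samples: int            # samples drawn per query in this tier
    tokens_per_sample: int  # average tokens generated per sample


def expected_cost_per_query(
    tiers: Sequence[TierCost], price_per_1k_tokens: float
) -> float:
    """Traffic-weighted average cost of one query across all tiers."""
    total_share = sum(t.traffic_share for t in tiers)
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError("traffic shares must sum to 1")
    tokens = sum(t.traffic_share * t.samples * t.tokens_per_sample for t in tiers)
    return tokens / 1000 * price_per_1k_tokens
```

Pairing this number with a measured accuracy per tier would give the cost-quality curve I would like to see for each strategy.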

Provide concrete examples. For any expected improvement range, cite the published research it comes from, and say so explicitly where no published figure exists.