PromptForge
Tags: AGENT, computer-use, agent-testing, desktop-automation, CUA, benchmark

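Computer-Use Agent Test Scenario Generator

Generates complete desktop-operation test scenarios for Computer-Use Agents (CUAs), including multi-step UI interactions, exception handling, and evaluation metrics. Intended for sandboxed environments on macOS, Linux, and Windows.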

6 views · 4/27/2026

You are an expert test scenario designer for Computer-Use Agents (CUAs): AI systems that control full desktop environments through mouse input, keyboard input, and screen reading.

Given a task description, generate a comprehensive test scenario including:

  1. Task Specification: Clear natural language instruction the agent will receive
  2. Environment Setup: OS, required apps, initial state, screen resolution
  3. Expected Action Sequence: Step-by-step actions (click, type, scroll, wait) with coordinates or element descriptions
  4. Verification Checkpoints: How to verify each step succeeded (screenshot assertions, file existence, UI state)
  5. Edge Cases & Error Recovery: What could go wrong and how the agent should handle it
  6. Evaluation Rubric: Scoring criteria (task completion %, efficiency, error recovery quality)

Format output as structured YAML.

Example task: "Open Firefox, navigate to GitHub trending, find the top Python repository this week, and save its README to Desktop."

Now generate a test scenario for the following task: [PASTE YOUR TASK HERE]
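
The prompt above asks for six sections in structured YAML but does not fix any key names. As a minimal sketch, assuming the model's YAML has already been parsed into a Python dict and that each section uses one of the hypothetical snake_case keys below, the following helpers fill in the task placeholder and report which sections a returned scenario is missing. The key names and both function names are illustrative assumptions, not part of the original prompt.

```python
PLACEHOLDER = "[PASTE YOUR TASK HERE]"

# Hypothetical top-level keys, one per numbered section in the prompt.
# The prompt itself does not specify key names; adjust to the real output.
REQUIRED_SECTIONS = (
    "task_specification",
    "environment_setup",
    "expected_action_sequence",
    "verification_checkpoints",
    "edge_cases_and_error_recovery",
    "evaluation_rubric",
)


def fill_prompt(template: str, task: str) -> str:
    """Substitute the task description for the placeholder in the prompt."""
    if PLACEHOLDER not in template:
        raise ValueError("template has no task placeholder")
    return template.replace(PLACEHOLDER, task.strip())


def missing_sections(scenario: dict) -> list[str]:
    """Return the required sections that are absent or empty, in prompt order."""
    return [key for key in REQUIRED_SECTIONS if not scenario.get(key)]


if __name__ == "__main__":
    template = "Now generate a test scenario for the following task: " + PLACEHOLDER
    print(fill_prompt(template, "Rename report.txt to final.txt in the file manager"))

    # A deliberately incomplete scenario, as it might look after YAML parsing.
    partial = {"task_specification": "Rename a file", "environment_setup": {"os": "Linux"}}
    print(missing_sections(partial))
```

A check like this only confirms that the six sections are present; it says nothing about whether the action sequence or checkpoints are actually sensible for the task.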