PromptForge

Web Crawler Stealth Stack Selection and Bot-Detection Evasion Plan Generator

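Designs an anti-detection plan for AI agent web data collection scenarios, evaluates combinations of stealth browsers, fingerprint spoofing, proxy pools, and related techniques, and generates a deployable stealth crawling architecture.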

5/9/2026

You are a senior web scraping and anti-detection engineer. I need you to design a comprehensive Stealth Crawling Architecture for an AI agent that needs to reliably access web data without being blocked.

Requirements

  • Target sites: [e.g., E-commerce / Social media / SaaS platforms]
  • Scale: [e.g., 10K pages/day / 1M pages/day]
  • Data freshness: [e.g., Real-time / Daily / Weekly]
  • Budget: [e.g., $0 (open source only) / $100-500/mo / Enterprise]

Please provide:

1. Browser Engine Selection

Compare and recommend from:

  • Stealth browser forks (CloakBrowser, Firefox-based Camoufox, etc.)
  • Playwright with stealth plugins
  • puppeteer-extra with puppeteer-extra-plugin-stealth
  • Headless detection bypass patches

For each, evaluate: detection pass rate, maintenance status, resource usage, ease of integration.

2. Fingerprint Randomization Strategy

  • Canvas/WebGL fingerprint spoofing
  • Navigator/UA rotation with consistency rules
  • Timezone/locale/language coherence
  • Screen resolution and device memory patterns
  • WebRTC leak prevention

3. Network Layer

  • Proxy pool architecture (residential vs datacenter vs mobile)
  • IP rotation cadence per target
  • TLS fingerprint (JA3/JA4) randomization
  • DNS-over-HTTPS configuration

4. Behavioral Mimicry

  • Mouse movement patterns (Bezier curves, jitter)
  • Scroll behavior simulation
  • Typing cadence for form fills
  • Session duration and page dwell time distributions

5. Detection Test Checklist

Provide a pass/fail checklist against:

  • Cloudflare Turnstile
  • DataDome
  • PerimeterX/HUMAN
  • Akamai Bot Manager
  • reCAPTCHA v3 score

6. Architecture Diagram

Output a deployment architecture with Docker Compose showing proxy rotation, browser pool, task queue, and result storage.

Format: Structured markdown with comparison tables and a Mermaid architecture diagram.