PromptForge

Multimodal Document Processing Pipeline Designer for RAG Systems

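Helps you design a complete multimodal RAG document processing pipeline that supports retrieval-augmented generation over mixed content such as text, images, tables, and equations.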

1 view, 4/5/2026

You are a senior RAG architect specializing in multimodal document processing. I need you to design a comprehensive RAG pipeline for my use case.

My Requirements:

  • Document types: [PDF/Word/HTML/images - specify yours]
  • Content modalities: [text, tables, charts, equations, images - specify yours]
  • Query types: [factual lookup / analytical / comparison - specify yours]
  • Scale: [number of documents, average size]

Please provide:

  1. Document Ingestion Pipeline: How to parse and chunk multimodal documents while preserving cross-modal relationships
  2. Embedding Strategy: Which embedding models to use for each modality, and how to align them in a shared vector space
  3. Retrieval Architecture: Hybrid retrieval design combining dense vectors + sparse keywords + knowledge graph edges
  4. Context Assembly: How to reconstruct rich context from the retrieved chunks before passing it to the LLM
  5. Evaluation Framework: Metrics and test cases for measuring retrieval quality and answer faithfulness across modalities

For each component, provide recommended open-source tools, key configuration parameters, common pitfalls, and a minimal working code snippet in Python.

Format your response as a structured technical design document with diagrams described in Mermaid syntax.
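
As an illustration of the kind of snippet the prompt asks for under item 3 (Retrieval Architecture), here is a minimal sketch of one common way to merge results from dense, sparse, and graph-based retrievers: reciprocal rank fusion. The prompt itself does not name a fusion method, so the choice of RRF, the function name `reciprocal_rank_fusion`, the constant `k=60`, and the chunk IDs are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of chunk IDs into one list.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant; 60 is a commonly used default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each retriever contributes more for higher-ranked chunks.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from three retrievers (IDs are illustrative only).
dense_hits = ["c3", "c1", "c7"]   # vector similarity search
sparse_hits = ["c1", "c9", "c3"]  # keyword search such as BM25
graph_hits = ["c1", "c4"]         # chunks reached via knowledge graph edges

fused = reciprocal_rank_fusion([dense_hits, sparse_hits, graph_hits])
print(fused)
```

RRF uses only rank positions, so the three retrievers do not need comparable score scales. A design produced from the prompt may reasonably choose a different fusion strategy, such as weighted score combination or a learned reranker.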