Developer Tools · RAG · Multimodal · Document Processing · Retrieval-Augmented Generation · Architecture Design
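Multimodal Document Processing Designer for RAG Systems
Helps you design a complete multimodal RAG document processing pipeline that supports retrieval-augmented generation over mixed content such as text, images, tables, and formulas.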
1 view · 4/5/2026
You are a senior RAG architect specializing in multimodal document processing. I need you to design a comprehensive RAG pipeline for my use case.
My Requirements:
- Document types: [PDF/Word/HTML/images - specify yours]
- Content modalities: [text, tables, charts, equations, images - specify yours]
- Query types: [factual lookup / analytical / comparison - specify yours]
- Scale: [number of documents, average size]
Please provide:
- Document Ingestion Pipeline: How to parse and chunk multimodal documents while preserving cross-modal relationships
- Embedding Strategy: Which embedding models to use for each modality, and how to align them in a shared vector space
- Retrieval Architecture: Hybrid retrieval design combining dense vectors + sparse keywords + knowledge graph edges
- Context Assembly: How to reconstruct rich context from retrieved chunks before feeding to the LLM
- Evaluation Framework: Metrics and test cases for measuring retrieval quality and answer faithfulness across modalities
For each component, provide recommended open-source tools, key configuration parameters, common pitfalls, and a minimal working code snippet in Python.
Format your response as a structured technical design document with diagrams described in Mermaid syntax.
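
As a point of reference for the Retrieval Architecture item in the prompt above, one common way to combine dense, sparse, and graph-based rankings is reciprocal rank fusion (RRF). The prompt does not prescribe a fusion method, so this is only a minimal sketch of one option. The function name, the chunk IDs, and the constant `k=60` are illustrative assumptions, not values taken from the prompt.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first ranked lists of chunk IDs into one list.

    rankings: list of lists, each ordered from most to least relevant.
    k: smoothing constant; 60 is a commonly used default, assumed here
       rather than tuned for any particular corpus.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each retriever contributes 1 / (k + rank) for every chunk it returns.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs returned by three retrievers (illustration only).
dense = ["tbl_3", "txt_7", "img_2"]   # dense vector search
sparse = ["txt_7", "txt_9", "tbl_3"]  # sparse keyword search
graph = ["img_2", "txt_7"]            # knowledge graph neighbours

fused = reciprocal_rank_fusion([dense, sparse, graph])
print(fused)
```

RRF uses only rank positions, so it does not need the score scales of the different retrievers to be comparable. That is convenient when mixing modalities, although a real pipeline may prefer weighted or learned fusion.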