PromptForge
development, RAG, semantic-search, local-AI, vector-database, knowledge-base

One-Click Prompt: Design a Local Knowledge-Base Semantic Search Engine

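Design, from scratch, a document semantic search system that runs entirely locally, with support for multi-format documents, vector retrieval, and natural-language question answering.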

4/10/2026

You are a senior AI infrastructure engineer. Design a fully local semantic search engine for my document collection.

My Setup:

  • OS: [Mac/Linux/Windows]
  • GPU: [NVIDIA RTX xxxx / Apple Silicon M-series / CPU only]
  • Documents: [describe your docs - PDFs, markdown, code, meeting notes, etc.]
  • Scale: [approximate number of documents and total size]

Design Requirements:

1. Document Ingestion Pipeline

  • List the exact tools for each document format (PDF, DOCX, HTML, code files)
  • Chunking strategy: recommend chunk size, overlap, and method (semantic vs. fixed-size)
  • Metadata extraction: titles, dates, authors, tags
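To make the chunking and metadata requirements concrete, here is a minimal sketch of fixed-size chunking with overlap, assuming whitespace-separated words stand in for model tokens (a real pipeline would count tokens with the embedding model's tokenizer). The names `Chunk` and `chunk_words` are hypothetical, and the defaults of 200 words with 40 words of overlap are illustrative placeholders, not recommendations.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    doc_id: str
    index: int  # position of the chunk within its document
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


def chunk_words(doc_id: str, text: str, size: int = 200, overlap: int = 40,
                metadata: Optional[Dict[str, str]] = None) -> List[Chunk]:
    """Split text into windows of `size` words; each window repeats the
    last `overlap` words of the previous one.

    Assumption: words approximate tokens; swap in a real tokenizer as needed.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    chunks: List[Chunk] = []
    start = 0
    while True:
        window = words[start:start + size]
        # Copy the metadata so each chunk carries its own title/date/author tags.
        chunks.append(Chunk(doc_id, len(chunks), " ".join(window), dict(metadata or {})))
        if start + size >= len(words):
            break  # this window already reaches the end of the document
        start += step
    return chunks
```

For example, `chunk_words("notes.md", text, size=4, overlap=1, metadata={"title": "Notes"})` on a 10-word text yields three chunks whose boundaries share one word each. Semantic chunking would replace the fixed window with splits at sentence or heading boundaries.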

2. Embedding and Vector Store

  • Recommend the best local embedding model for my hardware
  • Compare Nomic Embed, BGE, GTE, and E5, then pick one and justify the choice
  • Vector DB: recommend one of ChromaDB, Qdrant, or LanceDB for local use
  • Index configuration: HNSW parameters, quantization settings
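As a reference point for the index settings above, the sketch below shows exact (brute-force) cosine search, which an HNSW index approximates, and a generic symmetric int8 scalar quantization with one scale per vector. This is a plain-Python illustration under those assumptions, not how ChromaDB, Qdrant, or LanceDB implement search or quantization; the function names are hypothetical.

```python
import math
from typing import List, Tuple


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def cosine_top_k(query: List[float], vectors: List[List[float]],
                 k: int = 3) -> List[Tuple[int, float]]:
    """Exact cosine search over every vector; HNSW trades some recall for speed."""
    q = normalize(query)
    scored = []
    for idx, v in enumerate(vectors):
        score = sum(a * b for a, b in zip(q, normalize(v)))
        scored.append((idx, score))
    # Stable sort: equal scores keep their original (insertion) order.
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


def quantize_int8(v: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max((abs(x) for x in v), default=0.0)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(x / scale) for x in v], scale


def dequantize(q: List[int], scale: float) -> List[float]:
    # Each value is recovered to within scale / 2 of the original.
    return [x * scale for x in q]
```

Comparing the top-k from an approximate index against `cosine_top_k` on a sample of queries is one way to measure how much recall a given HNSW or quantization setting costs.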

3. Query Pipeline

  • Implement hybrid search (vector + BM25 keyword)
  • Re-ranking with a local cross-encoder model
  • Query expansion/reformulation strategy
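One common way to implement the hybrid search requirement is to rank chunks separately with BM25 and with vector similarity, then merge the two rankings with reciprocal rank fusion (RRF). The sketch below assumes lowercased whitespace tokenization, the usual BM25 defaults `k1=1.5` and `b=0.75`, and the RRF constant `k=60`; the vector ranking is taken as a given list of chunk indices. Re-ranking with a cross-encoder would then rescore only the top fused results.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bm25_scores(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each doc for the query (whitespace tokens, lowercased)."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n if n else 0.0
    df: Counter = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency: count each term once per doc
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(toks) / avgdl)  # document-length normalization
            s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(s)
    return scores


def rank(scores: Sequence[float]) -> List[int]:
    """Doc indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: Sequence[Sequence[int]], k: int = 60) -> List[int]:
    """Merge rankings: each doc earns 1 / (k + position) from every list it appears in."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for pos, doc_idx in enumerate(ranking, start=1):
            fused[doc_idx] = fused.get(doc_idx, 0.0) + 1.0 / (k + pos)
    return sorted(fused, key=fused.get, reverse=True)
```

RRF only uses positions, so it avoids calibrating BM25 scores against cosine similarities, which live on different scales.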

4. QA Layer

  • Local LLM selection for answer generation (Qwen, Llama, Phi)
  • Context window management: how to fit the retrieved chunks within the model's context limit
  • Citation/source attribution in responses
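The context-window and citation requirements can be illustrated together: pack ranked chunks greedily under a token budget, number each included source, ask the model to cite by number, and map the cited numbers back to sources afterwards. This sketch assumes roughly 4 characters per token (a rough heuristic, not a tokenizer), and the function names and prompt wording are hypothetical.

```python
import re
from typing import List, Sequence, Tuple


def approx_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)


def pack_context(chunks: Sequence[Tuple[str, str]], budget_tokens: int) -> Tuple[str, List[str]]:
    """Greedily add (source, text) chunks, in relevance order, until the budget is spent.

    Returns the numbered context block and the sources used (source n is used[n - 1]).
    """
    parts: List[str] = []
    used: List[str] = []
    total = 0
    for source, text in chunks:
        cost = approx_tokens(text)
        if total + cost > budget_tokens:
            continue  # skip a chunk that does not fit; a smaller later one still might
        used.append(source)
        parts.append(f"[{len(used)}] ({source})\n{text}")
        total += cost
    return "\n\n".join(parts), used


def build_prompt(question: str, context: str) -> str:
    return (
        "Answer using only the numbered sources below. Cite them inline as [n]; "
        "say so if the answer is not in the sources.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


def cited_sources(answer: str, used: Sequence[str]) -> List[str]:
    """Resolve [n] markers in the model's answer, ignoring numbers with no source."""
    nums = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return [used[n - 1] for n in sorted(nums) if 1 <= n <= len(used)]
```

The budget passed to `pack_context` should leave room in the model's context window for the instructions, the question, and the generated answer.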

5. Implementation Plan

Provide a step-by-step Bash/Python script outline I can run to set this up. Include all pip install commands, model download commands, and config files.

Output: a complete technical design document with copy-paste-ready commands.