Pre-indexed Code Knowledge Graph Design Prompt for AI Coding Agents

You are an expert in code intelligence and knowledge graph systems. I need you to design a Pre-indexed Code Knowledge Graph system that allows AI coding agents (Claude Code, Codex, Cursor) to understand a codebase BEFORE starting a conversation, drastically reducing token usage and tool calls.

Project Context

Language(s): [e.g., TypeScript + Python]
Repo size: [e.g., 500 files / 100K LOC]
Framework(s): [e.g., Next.js + FastAPI]
Current pain: [e.g., Agent reads 50+ files per task, burning 200K tokens]

Please Design:

1. Graph Schema

Define node types and relationships:

Files, Functions, Classes, Interfaces, Modules
Import/Export edges, Call edges, Inheritance edges
Semantic clusters (feature domains, layers)
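
For reference, here is a minimal sketch of the level of detail expected for the schema. The `"<path>::<symbol>"` ID scheme, the `cluster` field, and all class names are assumptions made for illustration; the final design may differ.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    FILE = "file"
    FUNCTION = "function"
    CLASS = "class"
    INTERFACE = "interface"
    MODULE = "module"


class EdgeKind(Enum):
    IMPORTS = "imports"
    EXPORTS = "exports"
    CALLS = "calls"
    INHERITS = "inherits"


@dataclass(frozen=True)
class Node:
    id: str            # hypothetical ID scheme: "<path>::<symbol>"
    kind: NodeKind
    path: str
    cluster: str = ""  # semantic cluster, e.g. a feature domain or layer


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    kind: EdgeKind


@dataclass
class CodeGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: set[Edge] = field(default_factory=set)

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, src: str, dst: str, kind: EdgeKind) -> None:
        # Reject edges to unknown nodes so the graph stays consistent.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown endpoint in edge {src} -> {dst}")
        self.edges.add(Edge(src, dst, kind))
```

Frozen dataclasses keep nodes and edges hashable, so duplicate edges collapse automatically when stored in a set.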

2. Indexing Pipeline

AST parsing strategy per language
Symbol resolution and cross-file reference tracking
Incremental update on git diff (re-index only changed files, plus any files whose references to them must be re-resolved)
Embedding generation for semantic search
Storage format (JSON-LD, SQLite, Neo4j, or custom)
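
As an illustration of the first and third pipeline steps, the sketch below extracts call edges from Python source with the standard `ast` module and detects changed files by content hash. The hash comparison is a stand-in for a real git diff, it only resolves direct calls by bare name, and the function names `index_source` and `changed_files` are hypothetical.

```python
import ast
import hashlib


def index_source(path: str, source: str) -> dict[str, list[str]]:
    """Map each top-level function in `source` to the bare names it calls."""
    tree = ast.parse(source, filename=path)
    calls: dict[str, list[str]] = {}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            callees = set()
            for child in ast.walk(node):
                # Only direct calls like foo(); attribute calls such as
                # obj.foo() need real symbol resolution and are skipped here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    callees.add(child.func.id)
            calls[f"{path}::{node.name}"] = sorted(callees)
    return calls


def changed_files(files: dict[str, str], seen: dict[str, str]) -> list[str]:
    """Return paths whose content hash differs from the one stored in `seen`.

    `seen` is updated in place, so a second call with the same content
    reports nothing to re-index.
    """
    changed = []
    for path, source in files.items():
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if seen.get(path) != digest:
            seen[path] = digest
            changed.append(path)
    return sorted(changed)
```

Only the paths returned by `changed_files` would need to be passed back through `index_source` on each update.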

3. Query Interface for Agents

Natural language → graph traversal translation
"Find all callers of function X" as a single indexed lookup (O(1) to locate the entry, O(k) to return k callers)
"What files are affected if I change interface Y?"
"Show me the data flow from API endpoint to database"
Context window budget-aware result truncation
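
The caller lookup, impact analysis, and truncation items above can be sketched as follows. This is a minimal example, assuming call edges are available as `(caller, callee)` pairs and that roughly four characters make one token; `CallIndex` and `truncate_to_budget` are hypothetical names.

```python
from collections import defaultdict, deque


class CallIndex:
    """Reverse index over call edges: callee -> set of direct callers."""

    def __init__(self, call_edges):
        self.callers = defaultdict(set)
        for caller, callee in call_edges:
            self.callers[callee].add(caller)

    def callers_of(self, symbol):
        # One hash lookup to find the entry; cost grows with the caller count.
        return sorted(self.callers.get(symbol, ()))

    def affected_by(self, symbol):
        # Breadth-first walk over reverse edges to collect transitive callers.
        seen = set()
        queue = deque([symbol])
        while queue:
            current = queue.popleft()
            for caller in self.callers.get(current, ()):
                if caller not in seen:
                    seen.add(caller)
                    queue.append(caller)
        return sorted(seen)


def truncate_to_budget(results, max_tokens, chars_per_token=4):
    """Keep results in order until the estimated token cost would exceed the budget."""
    kept, used = [], 0
    for item in results:
        cost = max(1, len(item) // chars_per_token)  # rough estimate, not a tokenizer
        if used + cost > max_tokens:
            break
        kept.append(item)
        used += cost
    return kept
```

A production design would likely rank results before truncating, so that the most relevant symbols survive the cut.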

4. Agent Integration

How to inject graph context into system prompt
Token budget allocation: graph summary vs raw code
Lazy loading strategy (summary first, details on demand)
Cache invalidation on file changes
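
A sketch of how the budget split, lazy loading, and invalidation might fit together, assuming per-file summaries already exist. The 30% default summary share, the "Codebase map" header, and the names `allocate_budget` and `LazyContext` are placeholders, not recommendations.

```python
def allocate_budget(total_tokens: int, summary_percent: int = 30) -> tuple[int, int]:
    """Split a token budget into (graph summary, raw code) portions."""
    if not 0 <= summary_percent <= 100:
        raise ValueError("summary_percent must be between 0 and 100")
    summary = total_tokens * summary_percent // 100
    return summary, total_tokens - summary


class LazyContext:
    """Summary-first context: full source is returned only when requested."""

    def __init__(self, summaries: dict[str, str], sources: dict[str, str]):
        self.summaries = dict(summaries)
        self.sources = dict(sources)
        self.loaded: set[str] = set()

    def system_prompt_section(self) -> str:
        # One line per file, suitable for injecting into the system prompt.
        lines = [f"- {path}: {text}" for path, text in sorted(self.summaries.items())]
        return "Codebase map:\n" + "\n".join(lines)

    def expand(self, path: str) -> str:
        # Details on demand: record which files the agent actually opened.
        self.loaded.add(path)
        return self.sources[path]

    def invalidate(self, path: str) -> None:
        # Drop stale entries when a file changes; re-indexing restores them.
        self.summaries.pop(path, None)
        self.sources.pop(path, None)
        self.loaded.discard(path)
```

Tracking `loaded` gives the evaluation step a direct count of how often the summary alone was not enough.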

5. Metrics & Evaluation

Token savings percentage vs naive file reading
Tool call reduction ratio
Answer accuracy impact (does less context hurt quality?)
Index build time and storage overhead
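
The first two metrics can be pinned down with simple formulas. The sketch below assumes token and tool-call counts are measured per task for both the baseline and the indexed run; the function names are hypothetical and any inputs are illustrative, not measured results.

```python
def token_savings_pct(baseline_tokens: int, indexed_tokens: int) -> float:
    """Percentage of tokens saved relative to naive file reading."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return 100.0 * (baseline_tokens - indexed_tokens) / baseline_tokens


def tool_call_reduction(baseline_calls: int, indexed_calls: int) -> float:
    """Ratio of baseline tool calls to indexed tool calls (higher is better)."""
    if indexed_calls <= 0:
        raise ValueError("indexed_calls must be positive")
    return baseline_calls / indexed_calls
```

For instance, a task that drops from 200,000 to 50,000 tokens would score `token_savings_pct(200_000, 50_000) == 75.0`, and one that drops from 50 tool calls to 10 would score a reduction ratio of 5.0.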

6. Implementation Roadmap

Provide a 3-phase plan:

Phase 1: MVP with static analysis only
Phase 2: Add semantic embeddings
Phase 3: Real-time incremental updates

Output as a technical design document with architecture diagrams (Mermaid), example queries, and concrete token savings estimates.