tool · Model inference · high performance · service deployment · PagedAttention
vLLM
A high-throughput LLM inference and serving engine built on PagedAttention, reported to deliver up to 24x higher throughput than HuggingFace Transformers
24 views · 760 stars · 3/4/2026
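The PagedAttention idea named above can be sketched in a few lines: the KV cache is split into fixed-size blocks, and each sequence keeps a block table mapping logical token positions to physical blocks, so memory is allocated on demand rather than reserved up front for the maximum length. A minimal illustrative sketch (all class and method names here are hypothetical, not vLLM's actual implementation):

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (vLLM uses 16 by default)

class BlockAllocator:
    """Hands out physical block ids from a fixed pool (hypothetical helper)."""
    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))

    def alloc(self) -> int:
        return self.free.pop()

    def release(self, blocks):
        self.free.extend(blocks)

class Sequence:
    """Tracks one request's logical-to-physical block mapping."""
    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block i -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full,
        # so memory grows with the actual sequence length.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(6):  # generate 6 tokens
    seq.append_token()
print(len(seq.block_table))  # 6 tokens occupy ceil(6/4) = 2 blocks
```

Because blocks are fixed-size and shared from a pool, fragmentation stays low and many concurrent sequences can be batched, which is the main source of the throughput gains the listing cites.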