reddit.com via Reddit

r/LocalLLaMA: vllm-doctor — Open-Source CLI to Diagnose and Monitor vLLM Inference Servers via Rule-Based Metric Checks

open-source inference

Summary

A developer posted vllm-doctor to r/LocalLLaMA. It is an open-source CLI that reads metrics from a vLLM server's /metrics endpoint or from a Prometheus instance and runs rule-based checks to identify what is wrong. It detects queue pressure, high time to first token (TTFT), high time per output token (TPOT), KV cache pressure, and other anomalies across pods. Each finding includes a severity level, a plain-language explanation, and an actionable recommendation. The tool works with both single-node and multi-node deployments. It addresses an observability gap for teams running vLLM in production that lack purpose-built diagnostics beyond raw Prometheus metrics.
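To make the idea of rule-based checks over /metrics output concrete, here is a minimal Python sketch. It is not vllm-doctor's code: the `Finding` class, the function names, the thresholds, and the recommendation text are all hypothetical. The two metric names are assumed to match what a vLLM server exports, which can differ between vLLM versions.

```python
from dataclasses import dataclass

# Assumed metric names; verify against your server's /metrics output.
WAITING_METRIC = "vllm:num_requests_waiting"
KV_CACHE_METRIC = "vllm:gpu_cache_usage_perc"  # assumed to be a 0..1 fraction

# Hypothetical thresholds chosen for illustration only.
QUEUE_WARN_THRESHOLD = 10
KV_CACHE_CRIT_THRESHOLD = 0.9


@dataclass
class Finding:
    severity: str
    explanation: str
    recommendation: str


def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition into {metric name: value}.

    When a metric appears with several label sets, the largest value is kept.
    Malformed lines are skipped.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comments
        try:
            if "{" in line:
                name = line[: line.index("{")]
                fields = line[line.rindex("}") + 1 :].split()
            else:
                name, *fields = line.split()
            value = float(fields[0])  # fields[1], if present, is a timestamp
        except (ValueError, IndexError):
            continue
        samples[name] = max(samples.get(name, value), value)
    return samples


def run_checks(samples: dict[str, float]) -> list[Finding]:
    """Apply each rule to the parsed samples and collect findings."""
    findings: list[Finding] = []

    waiting = samples.get(WAITING_METRIC, 0.0)
    if waiting > QUEUE_WARN_THRESHOLD:
        findings.append(Finding(
            severity="warning",
            explanation=f"{waiting:.0f} requests are waiting in the queue.",
            recommendation="Consider adding replicas or reducing incoming load.",
        ))

    usage = samples.get(KV_CACHE_METRIC, 0.0)
    if usage > KV_CACHE_CRIT_THRESHOLD:
        findings.append(Finding(
            severity="critical",
            explanation=f"KV cache usage is at {usage:.0%}.",
            recommendation="Consider shorter context limits or more GPU memory.",
        ))

    return findings


if __name__ == "__main__":
    sample = (
        "# TYPE vllm:num_requests_waiting gauge\n"
        'vllm:num_requests_waiting{model_name="demo"} 12\n'
        'vllm:gpu_cache_usage_perc{model_name="demo"} 0.95\n'
    )
    for finding in run_checks(parse_metrics(sample)):
        print(f"[{finding.severity}] {finding.explanation} {finding.recommendation}")
```

Keeping each rule as a small threshold test that emits a severity, an explanation, and a recommendation mirrors the three-part finding described in the summary. A real tool would likely also need rate or histogram handling for TTFT and TPOT, which this sketch leaves out.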