reddit.com via Reddit June 8th 2026

r/LocalLLaMA: vllm-doctor — Open-Source CLI to Diagnose and Monitor vLLM Inference Servers via Rule-Based Metric Checks


Summary

A developer posted vllm-doctor to r/LocalLLaMA. It is an open-source CLI that reads metrics from a vLLM server's /metrics endpoint or from a Prometheus instance and runs rule-based checks to diagnose problems, including queue pressure, high time to first token (TTFT), high time per output token (TPOT), KV cache pressure, and other anomalies across pods. Each finding includes a severity level, a plain-language explanation, and an actionable recommendation. The tool works with both single-node and multi-node deployments. It targets an observability gap for teams running vLLM in production that have only raw Prometheus metrics and no purpose-built diagnostics.
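To make the idea of rule-based metric checks concrete, here is a minimal Python sketch. It is not vllm-doctor's code or its rule set. The `Finding` type, the `RULES` table, the thresholds, and the wording of the explanations are all hypothetical, and the two metric names (`vllm:num_requests_waiting`, `vllm:gpu_cache_usage_perc`) are assumed from vLLM's Prometheus exporter and may differ between vLLM versions. Only two gauge checks are shown; latency checks such as TTFT and TPOT would need histogram handling that this sketch leaves out.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    rule: str
    severity: str
    explanation: str
    recommendation: str


def parse_metrics(text):
    """Parse Prometheus text exposition into {metric_name: value}.

    Labels are dropped; when a metric has several label sets, the largest
    value is kept. Assumes samples carry no trailing timestamp.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_part, _, value_part = line.rpartition(" ")
        name = name_part.split("{", 1)[0]
        try:
            value = float(value_part)
        except ValueError:
            continue  # ignore lines that do not end in a number
        values[name] = max(values.get(name, value), value)
    return values


# Hypothetical rules: (rule id, metric, threshold, severity, explanation, advice).
# Thresholds are illustrative assumptions, not values used by vllm-doctor.
RULES = [
    ("queue_pressure", "vllm:num_requests_waiting", 10.0, "warning",
     "Requests are queuing faster than the server can schedule them",
     "Add replicas or reduce the incoming request rate."),
    ("kv_cache_pressure", "vllm:gpu_cache_usage_perc", 0.9, "critical",
     "The KV cache is nearly full, so requests may be preempted",
     "Lower the maximum context length or give the server more GPU memory."),
]


def diagnose(text):
    """Return a Finding for every rule whose metric exceeds its threshold."""
    metrics = parse_metrics(text)
    findings = []
    for rule, metric, threshold, severity, explanation, advice in RULES:
        value = metrics.get(metric)
        if value is not None and value > threshold:
            detail = f"{explanation} ({metric}={value:g})"
            findings.append(Finding(rule, severity, detail, advice))
    return findings


# Made-up sample scrape, used only to exercise the sketch.
SAMPLE = """\
# TYPE vllm:num_requests_waiting gauge
vllm:num_requests_waiting{model_name="demo"} 25.0
# TYPE vllm:gpu_cache_usage_perc gauge
vllm:gpu_cache_usage_perc{model_name="demo"} 0.42
"""

if __name__ == "__main__":
    for f in diagnose(SAMPLE):
        print(f"[{f.severity}] {f.rule}: {f.explanation}. {f.recommendation}")
```

Keeping the rules in a plain table separates what to check from how the metrics are fetched, so the same checks could run against text scraped from /metrics or values pulled from a Prometheus query.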

Originally reported by reddit.com

