DeepSeek open-sources DeepSpec, its speculative decoding stack
TL;DR
- DeepSeek published DeepSpec, a full-stack codebase for training and evaluating speculative decoding draft models, under an MIT license.
- DeepSeek reports that its DSpark algorithm speeds up per-user generation by 60-85% on DeepSeek-V4-Flash and by 57-78% on V4-Pro versus the MTP-1 baseline.
- The pipeline ships three algorithms (DSpark, DFlash, Eagle3) and checkpoints for Qwen3 4B/8B/14B and Gemma-4 12B targets.
DeepSeek's latest open-source drop is unusual because it isn't a model; it's the training pipeline for the technique that makes its models fast. The company published DeepSpec on GitHub under an MIT license, and the more interesting story sits underneath the headline number.
The headline is DSpark, the speculative decoding algorithm DeepSeek is shipping alongside DeepSpec. In production, DeepSeek reports that DSpark makes per-user generation 60-85% faster on DeepSeek-V4-Flash and 57-78% faster on V4-Pro compared with the MTP-1 baseline, according to MarkTechPost. Technically, it pairs a parallel draft backbone with a lightweight Markov head that uses a rank-256 factorization, and adds two further components: a confidence head that scores which draft tokens are likely to survive verification, and a hardware-aware prefix scheduler that adjusts verification length based on GPU load. Rejection sampling keeps the output distribution identical to the base model's, so this is a latency win rather than a quality tradeoff.
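The distribution-preserving claim rests on the standard speculative sampling accept/reject rule rather than on anything specific to DSpark. A minimal Python sketch of that generic rule for a single token is below; the function names and the toy distributions are illustrative assumptions, not code from the DeepSpec repository.

```python
import random
from collections import Counter


def speculative_step(p_target, q_draft, rng):
    """Sample one token with the generic draft-then-verify rule.

    p_target and q_draft are probability lists over the same toy vocabulary.
    Returns (token, accepted), where accepted is True if the draft token survived.
    """
    vocab = range(len(p_target))
    # The draft model proposes a token from its own distribution q.
    x = rng.choices(vocab, weights=q_draft, k=1)[0]
    # The target accepts the proposal with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights for us.
    residual = [max(0.0, p - q) for p, q in zip(p_target, q_draft)]
    x = rng.choices(vocab, weights=residual, k=1)[0]
    return x, False


def simulate(p_target, q_draft, n, seed=0):
    """Run n independent steps; return (empirical distribution, acceptance rate)."""
    rng = random.Random(seed)
    counts = Counter()
    accepted = 0
    for _ in range(n):
        token, ok = speculative_step(p_target, q_draft, rng)
        counts[token] += 1
        accepted += ok
    dist = [counts[t] / n for t in range(len(p_target))]
    return dist, accepted / n


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution (assumed for illustration)
    q = [0.2, 0.5, 0.3]  # toy draft distribution, deliberately mismatched
    dist, rate = simulate(p, q, n=100_000)
    print("empirical output distribution:", dist)
    print("draft acceptance rate:", rate)
```

Under this rule the empirical token frequencies should approach `p_target` however poor `q_draft` is. What a worse draft changes is the acceptance rate, which for a single step equals the sum over tokens of min(p, q), and that is where the latency win comes from.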
The reason releasing the codebase matters more than releasing another checkpoint is that speculative decoding acceptance rates are heavily distribution-dependent. A draft model tuned on generic data can look great on benchmarks and mediocre on your actual product traffic. DeepSpec ships the data preparation utilities, training code, and evaluation scripts, plus reference implementations for three algorithms (DSpark, DFlash, and Eagle3) and checkpoints for Qwen3 4B/8B/14B and Gemma-4 12B targets. Infrastructure teams can retrain draft models on their own prompt distributions instead of taking whatever comes down from a vendor.
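One way to see the gap between benchmark and product traffic is to compare tokens emitted per verification pass on each. The sketch below is a hypothetical illustration: the function names and the accepted-token counts are invented for the example, and the closed-form estimate assumes every draft token is accepted independently with the same probability, which real traffic will only approximate.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target verification pass.

    Assumes k draft tokens per pass, each accepted independently with
    probability alpha, plus one token that always comes from the target.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def mean_tokens_per_step(accepted_counts) -> float:
    """Empirical counterpart: accepted draft tokens per pass, plus the one
    token the target model contributes on every pass."""
    if not accepted_counts:
        raise ValueError("need at least one verification pass")
    return sum(accepted_counts) / len(accepted_counts) + 1.0


if __name__ == "__main__":
    # Hypothetical logs: accepted draft tokens per pass with k = 4 drafts.
    benchmark_counts = [4, 4, 3, 4, 2, 4]
    product_counts = [1, 0, 2, 4, 0, 1]
    print("benchmark tokens/pass:", mean_tokens_per_step(benchmark_counts))
    print("product tokens/pass:  ", mean_tokens_per_step(product_counts))
    # Closed-form estimate for two assumed acceptance rates.
    for alpha in (0.9, 0.5):
        print(alpha, expected_tokens_per_step(alpha, k=4))
```

Logging accepted-token counts on a sample of real requests and running them through something like `mean_tokens_per_step` gives a workload-specific number to compare before and after retraining a draft model.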
Now the honest caveats. The reported 60-85% acceleration is DeepSeek's own production number on its own models, and the reporting so far includes no independent replication on non-DeepSeek targets. Reproducing the pipeline is not free either: the repository notes hardware assumptions of a single node with 8 GPUs, and data preparation can reach around 38 TB in the default configuration. What's missing is any comparison of these DSpark checkpoints against alternatives on production traffic outside DeepSeek's stack.
For serving teams running Qwen3 or Gemma-based inference, which is the actual audience for this, the useful move is to treat DSpark as a baseline to beat on your own workload rather than a solved problem. That is the shift open-sourcing the whole training stack actually enables.
Originally reported by github.com
Original headline: GitHub - deepseek-ai/DeepSpec: DeepSpec: a full-stack codebase for training and evaluating speculative decoding algorithms