Who's Who of AI
Clement Delangue
What they're sharing
nvidia/GLM-5.2-NVFP4 · Hugging Face (huggingface.co)
Kog Laneformer 2B: The Latency-First Model Behind Kog Inference Engine (huggingface.co)
Articles & links
Kog open-sourced on @huggingface the 2B model that they used to show a model running at 3,000+ tokens per second. Very cool work! https://t.co/fjCnAwQoWe https://t.co/k8hD7xW0F7
AI Weekly's analysis
- Kog released Laneformer 2B, a 2.3B-parameter instruction-tuned coding model built around decoding speed rather than benchmark score.
- The team reports 3,000 output tokens/s on 8× AMD MI300X and 2,100 output tokens/s on 8× NVIDIA H200, both at FP16 with batch size 1.
- Laneformer 2B scores 45.1% on HumanEval+ and 51.6% on MBPP+ with greedy decoding. Architecturally, 10 of its 15 layers use sliding-window attention.
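The throughput figures are easier to compare as per-token latency. At batch size 1 every output token belongs to a single sequence, so the average time per token is the reciprocal of the reported throughput. The sketch below does that arithmetic on the two reported numbers. The helper names (`ms_per_token`, `seconds_for`, `relative_speedup`) are hypothetical, the calculation ignores prefill time, and it says nothing about how Kog reaches these rates.

```python
# Convert reported batch-size-1 decode throughput into average per-token latency.
# Assumption: the reported figure is a steady-state output rate; prefill is excluded.

REPORTED_TOKENS_PER_S = {
    "8x AMD MI300X (FP16, batch 1)": 3000.0,
    "8x NVIDIA H200 (FP16, batch 1)": 2100.0,
}


def ms_per_token(tokens_per_s: float) -> float:
    """Average wall-clock milliseconds per output token (1 / throughput)."""
    return 1000.0 / tokens_per_s


def seconds_for(n_tokens: int, tokens_per_s: float) -> float:
    """Time to emit n_tokens at a steady rate, prefill not included."""
    return n_tokens / tokens_per_s


def relative_speedup(fast_tps: float, slow_tps: float) -> float:
    """How many times higher one throughput is than another."""
    return fast_tps / slow_tps


if __name__ == "__main__":
    for setup, tps in REPORTED_TOKENS_PER_S.items():
        print(f"{setup}: {ms_per_token(tps):.3f} ms/token, "
              f"{seconds_for(1000, tps):.3f} s per 1,000 tokens")
    mi300x = REPORTED_TOKENS_PER_S["8x AMD MI300X (FP16, batch 1)"]
    h200 = REPORTED_TOKENS_PER_S["8x NVIDIA H200 (FP16, batch 1)"]
    print(f"MI300X vs H200: {relative_speedup(mi300x, h200):.2f}x")
```

Because latency is just the reciprocal of throughput here, a 900 tokens/s gap between the two setups works out to roughly 0.14 ms per token.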
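Sliding-window attention limits each query to the most recent `window` keys instead of the whole prefix, so those layers' attention cost and KV cache stay bounded as the context grows. The sketch below builds both mask types and one possible 15-layer schedule with 10 sliding-window layers. The window size, the sequence length, and the "every third layer is global" placement are hypothetical illustrations; the source gives only the 10-of-15 ratio, not Kog's actual layer order or window length.

```python
from typing import List

Mask = List[List[bool]]


def causal_mask(seq_len: int) -> Mask:
    """Full causal attention: query q may attend to every key k <= q."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def sliding_window_mask(seq_len: int, window: int) -> Mask:
    """Causal sliding-window attention: query q sees only the last `window`
    keys, i.e. positions k with q - window < k <= q."""
    return [[q - window < k <= q for k in range(seq_len)] for q in range(seq_len)]


def layer_kinds(n_layers: int = 15, global_every: int = 3) -> List[str]:
    """HYPOTHETICAL interleaving: every `global_every`-th layer uses full
    attention and the rest use sliding windows. With 15 layers and a period
    of 3 this gives 10 sliding and 5 global layers, matching the reported
    ratio but not necessarily Kog's real layout."""
    return ["global" if (i + 1) % global_every == 0 else "sliding"
            for i in range(n_layers)]


def keys_attended(mask: Mask, q: int) -> int:
    """Number of key positions visible to query q under a given mask."""
    return sum(mask[q])


if __name__ == "__main__":
    seq_len, window = 8, 3  # HYPOTHETICAL sizes, chosen only for illustration
    last = seq_len - 1
    print("global layer, last query sees:",
          keys_attended(causal_mask(seq_len), last), "keys")
    print("sliding layer, last query sees:",
          keys_attended(sliding_window_mask(seq_len, window), last), "keys")
    kinds = layer_kinds()
    print(kinds.count("sliding"), "sliding /", kinds.count("global"), "global")
```

The design trade-off is that sliding layers alone cannot relate distant tokens; interleaving a few full-attention layers keeps a path for long-range information while most layers stay cheap at decode time.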