Sebastian Raschka reposted
@fry69.dev
A new, highly recommended article from @rasbt.bsky.social: "Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention" -> magazine.sebastianraschka.com/p/recent-dev... #MLsky
AI Weekly's analysis
- MLA (Multi-head Latent Attention), deployed in DeepSeek V3/V4 and GLM-5, compresses the KV cache by projecting keys and values into a shared low-rank latent space.
- The mHC modification targets the residual connection path in transformer blocks, not the attention heads themselves.
- Compressed Sparse Attention limits which token pairs interact, reducing quadratic memory growth at long context lengths.
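
To make the memory claims in the first and third bullets concrete, here is a rough back-of-the-envelope sketch. It is not taken from the article: the function names, the model sizes, and the "each query attends to at most `budget` earlier tokens" rule are illustrative assumptions, and the latent-cache count ignores any extra positional components a real model may store.

```python
def full_kv_floats_per_token(n_heads: int, head_dim: int) -> int:
    """Values cached per token per layer when full keys and values are kept."""
    return 2 * n_heads * head_dim  # one key and one value vector per head


def latent_kv_floats_per_token(latent_dim: int) -> int:
    """Values cached per token per layer when only a shared latent is kept."""
    return latent_dim  # keys and values are re-projected from this vector


def causal_pairs(seq_len: int) -> int:
    """Query-key pairs under dense causal attention (grows quadratically)."""
    return seq_len * (seq_len + 1) // 2


def sparse_causal_pairs(seq_len: int, budget: int) -> int:
    """Pairs when each query may attend to at most `budget` tokens (assumed rule)."""
    return sum(min(i + 1, budget) for i in range(seq_len))


# Hypothetical sizes chosen only for illustration.
full = full_kv_floats_per_token(n_heads=32, head_dim=128)
latent = latent_kv_floats_per_token(latent_dim=512)
print(f"cache per token: {full} vs {latent} ({full // latent}x smaller)")
print(f"pairs at length 8: {causal_pairs(8)} dense vs {sparse_causal_pairs(8, 4)} sparse")
```

With a fixed budget the sparse pair count grows roughly linearly in sequence length, which is the effect the third bullet describes; the actual selection rule used by Compressed Sparse Attention may differ from this simple cap.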