reddit.com via Reddit

r/LocalLLaMA: AdelicLLama-3.1-8B Claims 99.7% KV Cache Compression at 100K Tokens — 13 GB Shrinks to 33 MB, Methodology Unverified

open source, inference, edge ai, kv-cache, local-llm, inference-optimization

Summary

A developer posted AdelicLLama-3.1-8B-Instruct to r/LocalLLaMA, claiming dramatic KV cache compression: from 262 MB to 33 MB at 2,000 tokens (an 87.2% reduction) and from 13.1 GB to 33 MB at 100,000 tokens (a 99.7% reduction). The claimed compressed size is the same 33 MB at both context lengths. The poster self-describes as a 'total newbie' and acknowledges that the Hugging Face documentation is sparse. No paper or methodology has been published, so these numbers should be treated as unverified until they are independently replicated.
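As a rough sanity check on the arithmetic, the sketch below computes the uncompressed KV cache size and the implied reduction percentages. It assumes the model keeps the published Llama 3.1 8B configuration (32 layers, 8 key/value heads, head dimension 128) and stores the cache in 16-bit precision; the post confirms none of this. The function names are hypothetical, and the code says nothing about how the claimed compression is achieved.

```python
# Assumption: standard Llama 3.1 8B shape with an fp16/bf16 cache.
# The post does not state the model's actual configuration.

MB = 10**6  # decimal megabyte
GB = 10**9  # decimal gigabyte


def kv_cache_bytes(tokens, layers=32, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Uncompressed KV cache size in bytes for a given context length."""
    # Factor of 2: each layer stores one key tensor and one value tensor.
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


def reduction_pct(before_bytes, after_bytes):
    """Percentage reduction going from before_bytes to after_bytes."""
    return 100.0 * (1.0 - after_bytes / before_bytes)


claimed_compressed = 33 * MB  # figure quoted in the post

for tokens in (2_000, 100_000):
    baseline = kv_cache_bytes(tokens)
    pct = reduction_pct(baseline, claimed_compressed)
    print(f"{tokens:>7} tokens: baseline {baseline / MB:,.1f} MB, "
          f"claimed reduction {pct:.1f}%")
```

Under these assumptions the baseline works out to about 262 MB at 2,000 tokens and about 13.1 GB at 100,000 tokens, matching the post's "before" figures. The implied reductions are roughly 87.4% and 99.7%. The small gap from the quoted 87.2% may simply reflect rounding of the 33 MB figure.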