r/LocalLLaMA: AdelicLLama-3.1-8B Claims 99.7% KV Cache Compression at 100K Tokens — 13 GB Shrinks to 33 MB, Methodology Unverified
Summary
A developer posted AdelicLLama-3.1-8B-Instruct to r/LocalLLaMA, claiming dramatic KV cache compression: from 262 MB to 33 MB at 2,000 tokens (an 87.2% reduction) and from 13.1 GB to 33 MB at 100,000 tokens (a 99.7% reduction). The reported compressed size is the same 33 MB at both context lengths. The poster describes themselves as a 'total newbie' and acknowledges that the Hugging Face documentation is sparse. No paper or description of the method has been provided, so these numbers should be treated as unverified until they are independently replicated.
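The uncompressed baseline figures can at least be sanity-checked with the standard KV cache size formula. The sketch below is not the poster's method. It assumes the fine-tune keeps the published Llama 3.1 8B shape (32 layers, 8 key/value heads, head dimension 128), a 16-bit cache, and decimal megabytes; the function names and the 33 MB constant are illustrative and taken from the claim rather than measured.

```python
def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Uncompressed KV cache size in bytes for one sequence.

    Defaults are ASSUMED from the base Llama 3.1 8B config with an
    fp16/bf16 cache; the post does not confirm them for this model.
    The leading factor of 2 covers the separate key and value tensors.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * seq_len


def reduction_percent(baseline_bytes, compressed_bytes):
    """Percentage saved when going from baseline to compressed size."""
    return 100.0 * (1.0 - compressed_bytes / baseline_bytes)


# Compressed size as CLAIMED in the post (33 MB, decimal), not measured here.
CLAIMED_COMPRESSED_BYTES = 33 * 10**6

for tokens in (2_000, 100_000):
    baseline = kv_cache_bytes(tokens)
    pct = reduction_percent(baseline, CLAIMED_COMPRESSED_BYTES)
    print(f"{tokens:>7} tokens: baseline {baseline / 1e6:,.0f} MB, "
          f"claimed reduction {pct:.1f}%")
```

Under these assumptions the baseline comes to about 262 MB at 2,000 tokens and about 13.1 GB at 100,000 tokens, matching the post. A rounded 33 MB compressed size gives roughly 87.4% and 99.7%, so the quoted 87.2% was presumably computed from unrounded sizes. This confirms only the arithmetic of the baseline, not the compressed size or the output quality.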
Originally reported by reddit.com