"LLM hallucinations in the wild: Large-scale evidence from non-existent citations" Paper: arxiv.org/abs/2605.07723
Sung Kim
Articles & links
Paper: arxiv.org/abs/2605.26099
Blog: huggingface.co/blog/ettin-r... Models and Datasets: huggingface.co/collections/...
He finds that neural networks may be biased toward the simplest fit that explains the data. Put another way, the neural network with the smallest possible weight norm that fits the data must encode the shortest program that fits the data. Neural Weight Norm = Kolmogorov Complexity. Paper: arxi…
MiniMax AI consolidated all of the work behind M2 and published it. Paper: arxiv.org/abs/2605.26494
TTS (text-to-speech) model that runs on a CPU (open-weight). Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis. huggingface.co/Supertone/su...
Microsoft's Fara-7B: An Efficient Agentic Model for Computer Use (open-weight) This model supposedly works very well with browserOS. Blog: www.microsoft.com/en-us/resear... Model: huggingface.co/microsoft/Fa...
Train Recursive Language Models (RLMs) without sandboxes. Use `training/` folder instead? They trained RLM-Qwen3-30B-A3B-v0.1, using RL on a separate split of environments. RLM repo: github.com/alexzhang13/... RLM-Qwen3-30B-A3B-v0.1: huggingface.co/mit-oasys/rl...
Recent commentary
I call B.S. on this. Anthropic would cut them off before this happens. They want to get paid after all.
It seems Chinese LLMs are having a pricing war.
They find that RL post-training may be sabotaging LLMs' test-time scaling! Methods like GRPO lead to entropy collapse, where models lose output diversity. At the same time, inference scaling methods like AlphaEvolve & Best-of-N with task-specific rewards only work if the model outputs
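A toy sketch of why Best-of-N might depend on output diversity, as the post above suggests. This is not the paper's setup: the four candidate answers, their rewards, and the two probability distributions are made-up values for illustration, and the helper names `entropy`, `best_of_n`, and `mean_best_of_n` are hypothetical.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def best_of_n(probs, rewards, n, rng):
    """Sample n candidate answers and return the highest reward among them."""
    samples = rng.choices(range(len(probs)), weights=probs, k=n)
    return max(rewards[i] for i in samples)


def mean_best_of_n(probs, rewards, n, trials=2000, seed=0):
    """Average Best-of-N reward over repeated trials with a fixed seed."""
    rng = random.Random(seed)
    return sum(best_of_n(probs, rewards, n, rng) for _ in range(trials)) / trials


# Illustrative values only: four candidate answers with task-specific rewards.
rewards = [0.2, 0.5, 1.0, 0.1]
diverse = [0.25, 0.25, 0.25, 0.25]  # high-entropy policy
collapsed = [0.0, 1.0, 0.0, 0.0]    # all probability mass on one mediocre answer

if __name__ == "__main__":
    for name, probs in [("diverse", diverse), ("collapsed", collapsed)]:
        print(name, "entropy:", round(entropy(probs), 3),
              "best-of-1:", round(mean_best_of_n(probs, rewards, 1), 3),
              "best-of-16:", round(mean_best_of_n(probs, rewards, 16), 3))
```

In this toy setting the collapsed policy returns the same answer every time, so drawing more samples cannot raise its best reward, while the diverse policy benefits from a larger N.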
Microsoft's Webwright: Playwright for AI Agent CLI? Webwright gives an LLM a terminal where it can launch multiple browser sessions to inspect the page and complete a web task. It captures and inspects page screenshots/states only when needed.
Generative Recursive Reasoning: GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. This allows multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth but also by width, through parallel trajectory sampling.
@karpathy.bsky.social has joined Anthropic.
It seems the path to AGI is more CPUs and fewer GPUs.
@cloudflare.social services are inexpensive and good (I think), but they are so confusing to use. I tried to set up their email services, which apparently require a worker. It took me and an AI coding agent 1 hour to set up. This is borderline unusable.
Instead of a bigger KV cache, turn its contents into fast weights in the model's state-space model (SSM) blocks through a learned local rule. They show that this conversion to fast weights improves model performance, with the largest gains on examples that require deeper reasoning. "Language Models Need Sleep"
Big Tech layoffs every few months are not about AI replacing workers. They happen because AI CapEx is eating into free cash flow. When companies spend that aggressively on data centers, GPUs, and infrastructure, they have to cut fixed OpEx, e.g., employee salaries.