Sung Kim

A business analyst at heart who enjoys delving into AI, ML, data engineering, data science, data analytics, and modeling. My views are my own. You can also find me on Threads: @sung.kim.mw

Articles & links

He finds that neural networks may be biased toward the simplest fit that explains the data, or, put another way, the neural network with the smallest possible weight norm that fits the data must encode the shortest program that fits the data. Neural Weight Norm = Kolmogorov Complexity. Paper: arxi…

[2605.10878] Neural Weight Norm = Kolmogorov Complexity arxiv.org
View on Bluesky · ♥ 16 ↻ 2 ↩ 3 · 2 from the directory shared this · 1d ago

MiniMax AI consolidated all of the work behind M2 and published it. Paper: arxiv.org/abs/2605.26494

arxiv.org
View on Bluesky · ♥ 31 ↻ 3 ↩ 1 · 2d ago

TTS (text-to-speech) model that runs on a CPU (open-weight). Supertonic is a lightweight text-to-speech system for local inference. It runs with ONNX Runtime entirely on your device, with no cloud call required for synthesis. huggingface.co/Supertone/su...

huggingface.co
View on Bluesky · ♥ 39 ↻ 3 ↩ 1 · 13d ago
Sung Kim reposted
@adinayakup.bsky.social

LongCat-Video-Avatar 1.5🐱 an audio driven avatar video generation framework from Meituan huggingface.co/meituan-long... ✨ Multi-character + multi-audio support ✨ Drive video from audio alone or audio + image + text ✨ 8-step inference ✨ Whisper-Large powered lip sync ✨ MIT license

huggingface.co View on Bluesky →

Microsoft's Fara-7B: An Efficient Agentic Model for Computer Use (open-weight). This model supposedly works very well with browserOS. Blog: www.microsoft.com/en-us/resear... Model: huggingface.co/microsoft/Fa...

microsoft.com
View on Bluesky · ♥ 18 ↻ 3 ↩ 2 · 13d ago

Sung Kim reposted
Daniël de Kok @danieldk.eu

We released kernels 0.15.1, packed with new features, including: * Torch stable ABI support * Better manylinux_2_28 support * A skill for making XPU kernels * Better offline support * Docs on IDE support for local development. github.com/huggingface/...

github.com View on Bluesky →

Train Recursive Language Models (RLMs) without sandboxes. Use `training/` folder instead? They trained RLM-Qwen3-30B-A3B-v0.1, using RL on a separate split of environments. RLM repo: github.com/alexzhang13/... RLM-Qwen3-30B-A3B-v0.1: huggingface.co/mit-oasys/rl...

huggingface.co
View on Bluesky · ♥ 7 ↻ 0 ↩ 0 · 2d ago

Recent commentary

I call B.S. on this. Anthropic would cut them off before this happens. They want to get paid after all.

View on Bluesky · ♥ 48 ↻ 2 ↩ 7 · 19h ago

It seems Chinese LLMs are having a pricing war.

View on Bluesky · ♥ 39 ↻ 2 ↩ 2 · 5d ago

They find that RL post-training may be sabotaging LLMs' test-time scaling! Methods like GRPO lead to entropy collapse, where models lose output diversity. At the same time, inference scaling methods like AlphaEvolve & Best-of-N with task-specific rewards only work if the model outputs

View on Bluesky · ♥ 24 ↻ 2 ↩ 5 · 7d ago
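
The interaction between entropy collapse and Best-of-N described in the post above can be illustrated with a toy sketch. This is not the paper's method: the four candidate answers, their rewards, and the two output distributions are made-up assumptions, and `entropy`, `best_of_n`, and `mean_best_of_n` are hypothetical helper names. It only shows that when most probability mass sits on one answer, drawing more samples rarely surfaces a better one.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of a discrete output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def best_of_n(probs, rewards, n, rng):
    """Sample n answers from the distribution and keep the highest reward."""
    samples = rng.choices(range(len(probs)), weights=probs, k=n)
    return max(rewards[i] for i in samples)


def mean_best_of_n(probs, rewards, n, trials=2000, seed=0):
    """Average Best-of-N reward over many independent trials."""
    rng = random.Random(seed)
    total = sum(best_of_n(probs, rewards, n, rng) for _ in range(trials))
    return total / trials


# Assumed toy setup: four candidate answers, only the last one is fully correct.
rewards = [0.2, 0.5, 0.6, 1.0]
diverse = [0.25, 0.25, 0.25, 0.25]    # high-entropy model
collapsed = [0.01, 0.97, 0.01, 0.01]  # low-entropy model after "collapse"

if __name__ == "__main__":
    for name, probs in [("diverse", diverse), ("collapsed", collapsed)]:
        print(name, "entropy:", round(entropy(probs), 3),
              "mean best-of-8:", round(mean_best_of_n(probs, rewards, 8), 3))
```

In this toy setup the collapsed distribution almost always returns the same mediocre answer, so Best-of-8 gains little over a single sample, while the diverse distribution usually finds the top-reward answer.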

Microsoft's Webwright: Playwright for AI Agent CLI? Webwright gives an LLM a terminal where it can launch multiple browser sessions to inspect the page and complete a web task. It captures and inspects page screenshots/states only when needed.

View on Bluesky · ♥ 24 ↻ 4 ↩ 2 · 3d ago

Generative Recursive Reasoning GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling.

View on Bluesky · ♥ 22 ↻ 5 ↩ 1 · 8d ago

@karpathy.bsky.social has joined Anthropic.

View on Bluesky · ♥ 29 ↻ 1 ↩ 1 · 10d ago

It seems the path to AGI is more CPUs and fewer GPUs.

View on Bluesky · ♥ 23 ↻ 0 ↩ 3 · 2d ago

@cloudflare.social services are inexpensive and good (I think), but they are so confusing to use. I tried to set up their email services, which apparently requires a worker. It took me and an AI coding agent 1 hour to set up. This is borderline unusable.

View on Bluesky · ♥ 13 ↻ 0 ↩ 8 · 14d ago

Instead of a bigger KV Cache, turn them into fast weights in its state-space model (SSM) blocks through a learned local rule. They show that by turning them into fast weights, models improve performance, with the largest gains on examples that require deeper reasoning. "Language Models Need Sleep"

View on Bluesky · ♥ 20 ↻ 1 ↩ 3 · 3d ago

Big Tech layoffs every few months are not about AI replacing workers. They happen because AI CapEx is eating into free cash flow. When companies spend that aggressively on data centers, GPUs, and infrastructure, they have to cut fixed OpEx - e.g., employee salaries.

View on Bluesky · ♥ 23 ↻ 2 ↩ 0 · 13d ago