Shubhendu Trivedi

Interests on bsky: ML research, applied math, and general mathematical and engineering miscellany. Also: Uncertainty, symmetry in ML, reliable deployment; applications in LLMs, computational chemistry/physics, and healthcare. https://shubhendu-trivedi.org

Articles & links

One could read most points here cynically, but one could also take them at their word and see what could be done. Given the sort of equilibrium we have been in since GPT-2, the sort of pause they are advocating is simply not going to happen. You are talking of powerful international a…

When AI builds itself anthropic.com
♥ 16 ↻ 1 ↩ 2 · 15 from the directory shared this · 13d ago

Diffusion Gemma seems quite cool. Going to look into it during the weekend (so another exercise in harness design). It's funny and nice to see Google releasing open models one after the other, with a focus on the small end (quite a significant part of the enterprise ecosystem)…

DiffusionGemma: 4x faster text generation blog.google
AI Weekly's analysis
  • DiffusionGemma generates 256 tokens per forward pass using bidirectional attention, reaching 1,000+ tokens/sec on a single H100 GPU.
  • With only 3.8B active parameters during inference and an 18GB VRAM footprint when quantized, it runs on consumer hardware without server-grade resources.
  • Google recommends DiffusionGemma only for speed-critical workloads like in-line editing and code infilling, not for applications requiring maximum quality.
♥ 1 ↻ 0 ↩ 1 · 6 from the directory shared this · 6d ago

Good article. I don't know and don't care who Prince is. But I like a good Drucker defense. www.programmablemutter.com/p/ai-isnt-ma...

programmablemutter.com
♥ 2 ↻ 0 ↩ 1 · 6 from the directory shared this · 15d ago

But anyway, the Zhipu blog post is worth reading. z.ai/blog/glm-5.2

z.ai
♥ 0 ↻ 0 ↩ 0 · 5 from the directory shared this · 1d ago

The technical report for the new Microsoft model seems quite nice: microsoft.ai/wp-content/u...

microsoft.ai
♥ 3 ↻ 0 ↩ 0 · 3 from the directory shared this · 14d ago
Shubhendu Trivedi reposted
@gregegansf.bsky.social

“The sum-product conjecture is false for real numbers” by Thomas F. Bloom, Will Sawin, Carl Schildkraut, and Dmitrii Zhelezov. A human proof that exploits the same kind of “tower of fields” that was used in the AI-generated counterexample to the unit-distance conjecture!

arxiv.org

Was reading this and thinking that it's funny how work on mean-field approximations of games (and thus also of mechanisms) could soon become quite interesting (in the context of agentic marketplaces). In the sense of large-population platform control with strategic / heterogeneous / …

arxiv.org
♥ 12 ↻ 3 ↩ 3 · 29d ago

Well, the idea seems cool. This uses a coreset idea from approximation theory to "express" a strong non-causal attention approximation as a causal, streaming one. arxiv.org/abs/2606.10944

Express Language Modeling arxiv.org
♥ 8 ↻ 1 ↩ 0 · 8d ago

Recent commentary

The dot-com era mega IPOs had a very different character from the 2026 AI ones. Back then, many companies were capital-starved before their IPOs and capital-rich after. The IPO itself was quite often a major financing event. Basically, public markets funded the next stage of growth.

♥ 5 ↻ 2 ↩ 2 · 6d ago

One thing that has become clear to me just very recently from dozens of conversations with folk at all the frontier labs: Many really do believe that once "AGI happens" it will "make everything easier, from robotics, to manufacturing." Supply chain constraints, ecosystem and labour development

♥ 7 ↻ 1 ↩ 1 · 26d ago

It was easy to guess what "this model is too dangerous to release" meant: we don't have enough computational resources to serve it. It was easy, in hindsight, to guess what "we are nearing RSI... pause AI" meant.

♥ 8 ↻ 0 ↩ 0 · 8d ago

Erdős was ahead of his time. He was really focused on creating a dataset for building and testing new AI tools. He should be called the forgotten godfather of AI and put on a TIME cover alongside some other "architects of AI" who don't deserve to be there.

♥ 4 ↻ 1 ↩ 0 · 5d ago

Every few years you get cultural moments that act like crazy psychiatric solvents. But the AI-related one seems unique in how much it concentrates people's unresolved issues into worldviews. The zealotry it brings forth, even about insignificant stuff, is quite telling.

♥ 4 ↻ 0 ↩ 1 · 25d ago

I am not, and have never been, a fan of Ted Chiang’s sci-fi writings (find it annoying for various reasons), but it is quite funny that people seem to assume that just because he is their favourite sci-fi writer, he should validate any crazy thing they want to believe or believe about AI, and treat

♥ 2 ↻ 0 ↩ 1 · 10d ago

It's really funny to watch agentic coding tools spiral into madness, as if possessed by a non-malicious (but loose-cannon) spirit that decides "I should do something. I must act. I should figure this out."

♥ 1 ↻ 0 ↩ 1 · 19d ago

Someone remarked that it's ironic that we give LLMs the "education" we claim to want for children, for students, and for ourselves, but simply don't have the will to demand it of any of them, or of ourselves.

♥ 3 ↻ 0 ↩ 0 · 23d ago
