Who's Who of AI

Gus

Trust: 168 · Practitioner · @gusthema.bsky.social · 5,386 followers

Models & releases

Why they matter

Practitioner with public evidence across Models & releases.

AI signals: 6
Sources: 3
Discussions: 16
Latest signal: 13d ago


Gemma Product Manager @google DeepMind - Gemma 💎 - Google AI ⚙️🧠

What they're sharing

Articles & links

What would happen if we tried the diffusion generation on LLMs? We get Diffusion Gemma! 4x speed up! ⚡⚡⚡💎 blog.google/innovation-a...

DiffusionGemma: 4x faster text generation blog.google

AI Weekly's analysis →

DiffusionGemma generates 256 tokens per forward pass using bidirectional attention, reaching 1,000+ tokens/sec on a single H100 GPU.
With only 3.8B active parameters during inference and an 18GB VRAM footprint when quantized, it runs on consumer hardware without server-grade resources.
Google recommends DiffusionGemma only for speed-critical workloads like in-line editing and code infilling, not for applications requiring maximum quality.


View on Bluesky · ♥ 24 ↻ 3 ↩ 1 · 6 from the directory shared this · 48d ago

Gemma 4 12B is live! 🚀 An encoder-free multimodal model (text/img/audio) for local 16GB laptops. Elite reasoning nearing 26B MoE in half the size, fast, and open (Apache 2.0). This is the main reason I was not posting much!! Glad it is launched!! blog.google/innovation-a...

Introducing Gemma 4 12B: a unified, encoder-free multimodal model blog.google

AI Weekly's analysis →

The 35M-parameter vision embedder replaces 27 vision transformer layers, keeping the full model inside 16GB with complete image and audio understanding.
Audio projects from raw 16 kHz waveforms in 40ms frames directly to the LLM backbone, bypassing any separate ASR encoder used in competing designs.
Single-pass LoRA fine-tuning updates vision, audio, and text weights simultaneously, eliminating the engineering overhead of co-tuning frozen encoders.
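To make the audio framing above concrete: at 16 kHz, a 40 ms frame spans 640 samples, so one second of audio yields 25 frames. The sketch below only illustrates that arithmetic. The function name, the non-overlapping layout, and the choice to drop a trailing partial frame are assumptions for illustration, not details confirmed by the Gemma 4 announcement.

```python
SAMPLE_RATE_HZ = 16_000  # from the summary above
FRAME_MS = 40            # from the summary above


def frame_waveform(samples, sample_rate_hz=SAMPLE_RATE_HZ, frame_ms=FRAME_MS):
    """Split a raw waveform into fixed-length, non-overlapping frames.

    Hypothetical helper: non-overlapping frames and dropping the final
    partial frame are assumptions, not Gemma 4's documented behavior.
    """
    frame_len = sample_rate_hz * frame_ms // 1000  # 640 samples at 16 kHz / 40 ms
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


one_second = [0.0] * SAMPLE_RATE_HZ
frames = frame_waveform(one_second)
print(len(frames), len(frames[0]))  # 25 640
```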


View on Bluesky · ♥ 111 ↻ 16 ↩ 8 · 2 from the directory shared this · 55d ago

Gemma 4 technical report is out! lots of cool stuff, check it out! arxiv.org/abs/2607.02770

[2607.02770] Gemma 4 Technical Report arxiv.org

AI Weekly's analysis →

Gemma 4 is a new open-weight multimodal family spanning 2.3B to 31B parameters, with both dense and Mixture-of-Experts variants.
The 12B model uses a unified, encoder-free architecture that ingests raw audio and image patches directly.
A thinking mode lets Gemma 4 emit reasoning traces before responding, with claimed leaps on STEM, multimodal and long-context benchmarks.


View on Bluesky · ♥ 81 ↻ 10 ↩ 1 · 2 from the directory shared this · 21d ago

This is a great use of Gemma! having an open model running at +1000 tokens per second can enable some pretty cool use cases! The voice assistant is a good one, but I'm sure there are many others! huggingface.co/blog/cerebra...

Hugging Face and Cerebras bring Gemma 4 to real-time voice AI huggingface.co

AI Weekly's analysis →

Hugging Face and Cerebras have shipped an open cascaded speech-to-speech pipeline chaining Nvidia's Parakeet, Google DeepMind's Gemma 4 VLM on Cerebras, and Alibaba's Qwen3TTS.
The pitch focuses on P95 tail latency stability, not median speed, arguing that occasional multi-second stalls are what break conversational voice apps.
Hugging Face says the same pipeline already powers more than 9,000 Reachy Mini robots in the wild, giving the demo a real deployment story.
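The P95-versus-median point above can be illustrated with a small sketch. The latency values are made up for illustration (18 fast turns and 2 multi-second stalls), and the nearest-rank percentile helper is one common definition, not necessarily the one used in the Hugging Face post.

```python
import statistics


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


# Hypothetical per-turn response latencies in milliseconds.
latencies_ms = [300] * 18 + [2500] * 2

median_ms = statistics.median(latencies_ms)
p95_ms = percentile_nearest_rank(latencies_ms, 95)
print(median_ms, p95_ms)
```

In this invented sample the median stays at the fast-turn latency while the P95 lands on a stall, which is why a median-only report can hide the pauses that users notice in conversation.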


View on Bluesky · ♥ 25 ↻ 3 ↩ 4 · 2 from the directory shared this · 27d ago

great post about AI trends specifically mentioning the Gemma 4 success!! 🤩 www.interconnects.ai/p/some-ideas...

Some ideas for what comes next, May 2026 interconnects.ai

View on Bluesky · ♥ 1 ↻ 0 ↩ 0 · 3 from the directory shared this · 63d ago

👁️ Vision Options: Want to make Gemma see even better? The default vision bucket is 280 for token efficiency. To capture maximum detail (like sharp OCR and 2.51MP resolution), manually bump max_soft_tokens to 1120! Try our new interactive Space to see how it works: huggingface…

Gemma 4 - Vision Token Budget - a Hugging Face Space by google huggingface.co

AI Weekly's analysis →

The Space lets users toggle Gemma 4's per-image budget across five preset sizes: 70, 140, 280, 560, and 1120 tokens.
Gemma 4 launched April 2, 2026 under Apache 2.0 across E2B, E4B, 12B Unified, 26B A4B MoE, and 31B dense sizes.
The 31B model reportedly hits 76.9% on MMMU Pro and 85.6% on MATH-Vision, with native JSON bounding box output.
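As a small sketch of working with the five preset budgets and the default of 280 mentioned above: the helper below snaps a requested budget up to the nearest preset. Only the preset values, the default, and the parameter name `max_soft_tokens` come from the post and summary. The helper itself, and the snap-up rule, are hypothetical, and how the value is actually passed to a model or processor is not shown here.

```python
VISION_TOKEN_PRESETS = (70, 140, 280, 560, 1120)  # presets listed above
DEFAULT_MAX_SOFT_TOKENS = 280                     # default named in the post


def choose_max_soft_tokens(requested=None):
    """Return a preset budget for a requested per-image token count.

    Hypothetical helper: rounding up to the next preset (and capping at
    the largest one) is an assumption, not documented Gemma 4 behavior.
    """
    if requested is None:
        return DEFAULT_MAX_SOFT_TOKENS
    for preset in VISION_TOKEN_PRESETS:
        if preset >= requested:
            return preset
    return VISION_TOKEN_PRESETS[-1]


print(choose_max_soft_tokens())      # 280
print(choose_max_soft_tokens(300))   # 560
print(choose_max_soft_tokens(1120))  # 1120
```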


View on Bluesky · ♥ 4 ↻ 2 ↩ 1 · 2 from the directory shared this · 13d ago

I've been playing all day with Hugging Face Chat + Gemma 4 31B (deployed on Cerebras at +1000 tokens per second) and I'm still surprised how amazing it is!! I play with Gemma models basically every day and this integration + speed still surprised me! Take a look: huggingface.co…

google/gemma-4-31B-it - HuggingChat huggingface.co

View on Bluesky · ♥ 7 ↻ 1 ↩ 1 · 26d ago

🫂 A huge shoutout to the community for submitting fixes and finding new ways to make Gemma even better. We couldn't do this without you! ❤️ 🤗 Ready to test the speedup? Download the latest Gemma 4 updates now on Hugging Face: huggingface.co/collections/...

Gemma 4 - a google Collection huggingface.co

View on Bluesky · ♥ 6 ↻ 0 ↩ 0 · 13d ago

This was a pretty good week for Gemma + community integrations!!! Here is one example: LiveKit: livekit.com/products/inf... conversational assistant using Gemma 31B as the brains. Super fast and great experience. Try their demo on their site!

Gemma 4 31B on LiveKit Inference | A faster, cheaper default for voice livekit.com

View on Bluesky · ♥ 5 ↻ 0 ↩ 0 · 25d ago

Their network

In Gus's orbit

Network graph: Gus at the center; members they follow on the left (green edges); members who follow them on the right (blue edges); mutual follows at the top (orange edges, slightly larger).