Ramon Astudillo

Principal Research Scientist at IBM Research AI in New York. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG and RL. Opinions my own and non-stationary. ramon.astudillo.com

Articles & links

👆 A paranoid LLM is ofc worse. This is just tuning a prior belief up or down. I guess you could self distill additional context for the train data e.g. "you know arxiv.org is such and such" or "this is an unknown source" with the hope it generalises (and also injecting some ba…

arXiv.org e-Print archive arxiv.org

if you are wondering about Mistral koenvangilst.nl/lab/mistral-...

Notes from the AI Now Summit by Mistral koenvangilst.nl
AI Weekly's analysis
  • Mistral owns data centers directly, backing its full-stack infrastructure claim with physical compute assets beyond model licensing.
  • Domain-specific smaller models outperformed general-purpose alternatives in speed and efficiency across Mistral's enterprise demonstrations.
  • Mistral signed anchor enterprise deals with ASML, BNP Paribas, and Amazon Alexa+ as evidence of European sovereign AI demand.

Recent commentary

Competing against a local gpt-oss-120b 10-sample ensemble at paper understanding and, man, it's not looking great for humans


You can see how LLMs still lack a lot of implicit context. For example, when reading a document, they are bad at guessing whether it is trustworthy. They read an arXiv paper with grandiose, unsupported claims and repeat them to you as if they were their own judgment. 👇


There is this new meme out there that goes something like "AI costs more than human employees". Seems like totally the wrong take. AI costs much less for the things it can do, but you can't run an org w/o human employees (for now). 👇


Now there are three levels of alerts in generative code: errors, warnings, and the errors and warnings that you pass to the LLM agent and don't bother about.


Got reminded about OpenAI Five, and now I see many more timelines with decent probability mass that are pretty far from where we are now. We could call them the "no Radford" timelines.


An LLM being bad at an underspecified problem or consuming lots of tokens seems like a signal of benchmaxing


5y ago Demerzel would have felt like a completely wrong portrayal of an AI. Now it somehow feels pretty realistic.


It seems the suspected 5T models from Anthropic and OpenAI are kinda close in cybersec skills?

