Who's Who of AI

Tiago Pimentel

278 trust · researcher · @tpimentel.bsky.social · 2,233 followers

AI research

Why they matter

Researcher with publicly visible activity in AI research.

AI signals: 2
Sources: 2
Discussions: 0
Latest signal: 17d ago


Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)

What they're sharing

Articles & links

Our new paper reformulates tokenisation as a linear program (LP), which we solve to get SOTA tokenisers 😁 As a bonus, this LP tells us how close to optimal any tokeniser is! Check it out 👇 w/ J. Tempus, @philipwitti.bsky.social, @craigschmidt.com, D. Komm Paper: arxiv.org/abs/…

[2605.22821] Tokenisation via Convex Relaxations arxiv.org

♥ 43 ↻ 9 ↩ 1 · 3 from the directory shared this · 67d ago

↻ Tiago Pimentel reposted

Leshem (Legend) Choshen @EMNLP @lchoshen.bsky.social

Effective language identification based on a tokenizer: a UnigramLM tokenizer already gives probabilities, and testing those to identify a language is fast and effective. Which leads me to wonder: can we identify language during training and affect behavior? arxiv.org/abs/2602.17655…

What Language is This? Ask Your Tokenizer arxiv.org

AI Weekly's analysis

UniLID reuses the UnigramLM tokenization algorithm to predict a string's language by asking under which language's unigram distribution the string is most likely.
The method reaches roughly 70% accuracy with as few as five labeled samples per language in low-resource settings.
It supports incremental addition of new languages without retraining and integrates into existing language model tokenization pipelines.
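The scoring idea above can be illustrated with a toy sketch. It assumes each language has its own unigram distribution over subword pieces and scores a string by the log-probability of its single best segmentation (the Viterbi step UnigramLM tokenization uses). Whether UniLID uses the best segmentation or sums over all segmentations is an assumption here, and the distributions, `make_unigrams`, `LANG_UNIGRAMS`, and the other function names are hypothetical and not taken from the paper.

```python
import math
import string


def make_unigrams(word_probs, char_floor=0.01):
    """Hypothetical helper: build a toy unigram distribution from whole-word
    pieces plus every lowercase letter as a fallback piece, normalised to sum to 1."""
    probs = dict(word_probs)
    for ch in string.ascii_lowercase:
        probs.setdefault(ch, char_floor)
    total = sum(probs.values())
    return {piece: p / total for piece, p in probs.items()}


# Hypothetical per-language distributions; a real system would estimate
# these from labelled text rather than hard-code them.
LANG_UNIGRAMS = {
    "en": make_unigrams({"the": 0.3, "cat": 0.2}),
    "de": make_unigrams({"der": 0.3, "katze": 0.2}),
}


def best_segmentation_logprob(text, unigrams):
    """Viterbi over segmentations: log-probability of the most likely way
    to split `text` into pieces drawn from `unigrams` (-inf if impossible)."""
    n = len(text)
    max_len = max(len(piece) for piece in unigrams)
    best = [-math.inf] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = unigrams.get(text[start:end])
            if p is not None and best[start] > -math.inf:
                best[end] = max(best[end], best[start] + math.log(p))
    return best[n]


def identify_language(text, lang_unigrams=LANG_UNIGRAMS):
    """Return the language whose unigram distribution makes `text` most likely."""
    scores = {lang: best_segmentation_logprob(text, u)
              for lang, u in lang_unigrams.items()}
    return max(scores, key=scores.get)


print(identify_language("thecat"))    # en
print(identify_language("derkatze"))  # de
```

In this sketch, supporting another language only means adding one more entry to the dictionary, with no change to the existing ones. That loosely mirrors the analysis's point about adding languages without retraining, though the paper's actual mechanism may differ.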


↻ Tiago Pimentel reposted

@maximemeloux.bsky.social

I'm very happy to give a spotlight at the Mechanistic Interpretability Workshop @ ICML on our work: "Validating Causal Abstraction Metrics on Simulated Complex Systems" Which metrics actually tell you if an explanation is valid? We built a benchmark to find out. 1/n

Validating Causal Abstraction Metrics on Simulated Complex Systems melouxm.github.io

Their network

In Tiago Pimentel's orbit

[Network graph: Tiago Pimentel at the center, linked to members they follow, members who follow them, and mutual follows.]