Our new paper reformulates tokenisation as a linear program (LP), which we solve to get SOTA tokenisers 😁 As a bonus, this LP tells us how close to optimal any tokeniser is! Check it out 👇
w/ J. Tempus, @philipwitti.bsky.social, @craigschmidt.com, D. Komm
Paper: arxiv.org/abs/…
Who's Who of AI
Tiago Pimentel
Postdoc at ETH. Formerly, PhD student at the University of Cambridge :)
What they're sharing
[2605.22821] Tokenisation via Convex Relaxations arxiv.org