Or check out the models & datasets directly via this Collection: huggingface.co/collections/...
Tom Aarsen
Articles & links
Read the full blog post for the model links, results, recipe, and the ~150-line training script. Or just point your Agent at the URL: huggingface.co/blog/ettin-r...
ColBERT Model: huggingface.co/LiquidAI/LFM...
Embedding Model: huggingface.co/LiquidAI/LFM...
Full release notes: github.com/huggingface/...
pip install sentence-transformers==5.6.0
I'm very excited about this update. Great job to Roman Solomatin, who led this work, as well as Kenneth Enevoldsen and Isaac Chung. Now, go check out the updated leaderboard here: huggingface.co/spaces/mteb/...
Check out the blogpost here: huggingface.co/blog/Samoed/... It's a very visual blogpost and a joy to read (and/or look) through.
Check out the model directly here: huggingface.co/laion/voicec...
The full training data is also released as `cross-encoder/ettin-reranker-v1-data`: ~143M (query, document, teacher score) triples, kept as 39 named splits. Built from @LightOnIO's pre-training data plus a re-scored subset of their fine-tuning data. huggingface.co/datasets/cro...
Recent commentary
💧 Liquid AI released 2 multilingual retrieval models, the first bidirectional members of the LFM family. Both 350M params, 11 languages (ar, de, en, es, fr, it, ja, ko, no, pt, sv):
- LFM2.5-Embedding-350M (bi-encoder)
- LFM2.5-ColBERT-350M (multi-vector, late interaction)
🧵
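For readers new to the two model types named above, here is a rough sketch of how a bi-encoder score and a late-interaction (MaxSim) score are typically computed. It uses random NumPy vectors rather than real model outputs, and the mean pooling and cosine similarity are assumptions for illustration, not a description of how the LFM2.5 models are implemented. The function names are hypothetical.

```python
import numpy as np


def normalize(x):
    """L2-normalize along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def bi_encoder_score(query_vec, doc_vec):
    """Single-vector scoring: cosine similarity of one query and one document vector."""
    return float(normalize(query_vec) @ normalize(doc_vec))


def late_interaction_score(query_tokens, doc_tokens):
    """Multi-vector (MaxSim) scoring: for each query token, take the highest
    cosine similarity to any document token, then sum over query tokens."""
    sim = normalize(query_tokens) @ normalize(doc_tokens).T  # shape (n_query, n_doc)
    return float(sim.max(axis=1).sum())


# Stand-in token embeddings (assumption: 4 query tokens, 10 document tokens, dim 8).
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(4, 8))
doc_tokens = rng.normal(size=(10, 8))

# Bi-encoder view: pool each side to one vector (mean pooling assumed here).
single_vector_score = bi_encoder_score(query_tokens.mean(axis=0), doc_tokens.mean(axis=0))

# Late-interaction view: keep every token vector and score with MaxSim.
multi_vector_score = late_interaction_score(query_tokens, doc_tokens)

print(single_vector_score, multi_vector_score)
```

The practical trade-off is that the bi-encoder stores one vector per document, while the multi-vector model stores one vector per token and pays more at index and query time in exchange for token-level matching.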