Tom Aarsen

132 trust practitioner @tomaarsen.com · 2,631 followers
Sentence Transformers, SetFit & NLTK maintainer. Machine Learning Engineer at 🤗 Hugging Face

Articles & links

The full training data is also released as `cross-encoder/ettin-reranker-v1-data`: ~143M (query, document, teacher score) triples, kept as 39 named splits. Built from @LightOnIO's pre-training data plus a re-scored subset of their fine-tuning data. huggingface.co/datasets/cro...

View on Bluesky · ♥ 0 ↻ 0 ↩ 1 · 30d ago

Recent commentary

💧 Liquid AI released 2 multilingual retrieval models, the first bidirectional members of the LFM family. Both 350M params, 11 languages (ar, de, en, es, fr, it, ja, ko, no, pt, sv):
- LFM2.5-Embedding-350M (bi-encoder)
- LFM2.5-ColBERT-350M (multi-vector, late interaction)
🧵

View on Bluesky · ♥ 1 ↻ 1 ↩ 1 · 8h ago
