Privacy filters dominate Hugging Face's trending NER charts
TL;DR
- A 1B model named privacy-filter, published under the openai account, shows 288k downloads and 1.68k likes on the Hugging Face token classification trending list.
- fastino's gliner2-privacy-filter-PII-multi, a 0.3B multilingual PII detector, sits high in trending with 41.9k downloads about ten days after its update.
- Hugging Face lists 28,597 token classification models in total, with dslim's bert-base-NER still leading on lifetime downloads at 1.88M.
The Hugging Face trending board for token classification, the NLP task that labels individual tokens in text for things like named entity recognition and PII detection, has quietly become a privacy fight. Several of the top slots on the token-classification trending sort are explicitly built for stripping personally identifiable information out of text before it reaches an LLM prompt or a log file.
The most eye-catching entry is a 1B model called privacy-filter, published under the openai account on the Hub, which shows 288k downloads and 1.68k likes and was last updated April 22. Just below it sits fastino's gliner2-privacy-filter-PII-multi, a 0.3B multilingual PII detector at 41.9k downloads and 54 likes, listed as updated ten days before the snapshot. Two of the most visible entries on the board are, in effect, redaction tools.
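To make "redaction tool" concrete, here is a minimal sketch of the step that follows detection: swapping each flagged span for a placeholder before the text goes into a prompt or a log. It assumes the detector returns dicts with `entity_group`, `start` and `end` keys, the shape the transformers token-classification pipeline produces when aggregation is enabled. Whether any model on the board emits exactly this shape is an assumption, and the `redact` helper, the label names and the sample spans are all hypothetical.

```python
from typing import Dict, List


def redact(text: str, entities: List[Dict]) -> str:
    """Replace each detected span with a [LABEL] placeholder."""
    # Work right to left so earlier offsets stay valid after each replacement.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        placeholder = f"[{ent['entity_group']}]"
        text = text[: ent["start"]] + placeholder + text[ent["end"]:]
    return text


# Hand-written spans standing in for a model's output (not from a real model).
sample = "Email Jane Doe at jane@example.com today."
spans = [
    {"entity_group": "NAME", "start": 6, "end": 14},
    {"entity_group": "EMAIL", "start": 18, "end": 34},
]

print(redact(sample, spans))  # Email [NAME] at [EMAIL] today.
```

The sketch does not handle overlapping spans; a real pipeline would need to merge or prioritize them before replacing.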
The rest of the upper board reads like a snapshot of the working NLP stack. dslim's bert-base-NER, last updated October 8, 2024, still shows 1.88M downloads and 721 likes. urchade's gliner_multi-v2.1 sits at 0.3B parameters and 21k downloads. hantian's layoutreader, a 0.4B document-layout model, shows 538k downloads. Qwen's Qwen3-ForcedAligner-0.6B-hf appears with 1.27k downloads six days after publication, and nationaldesignstudio's brand-new rampart is riding recency at 790 downloads and 100 likes after two days.
The caveat is that Hugging Face's trending ranking blends recency, downloads and likes, so a two-day-old upload can sit alongside a checkpoint with nearly two million lifetime downloads. The listing itself does not disclose training data, licenses or evaluation numbers. Take the ordering as a demand signal, not a quality ranking. What the listing doesn't give you is any way to compare these classifiers on recall for real PII across languages.
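For readers who want to fill that gap themselves, the comparison comes down to running each classifier over a labeled multilingual set and counting how many gold PII spans it recovers per language. The following is a rough sketch under stated assumptions: the `recall_by_language` helper, the `(start, end, label)` span format and the exact-match scoring rule are illustrative choices, not anything the listing or the models prescribe, and the data is a toy example rather than real results.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, label); a hypothetical format


def recall_by_language(examples: List[Dict]) -> Dict[str, float]:
    """Exact-match span recall, grouped by each example's language tag."""
    found: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for ex in examples:
        gold: Set[Span] = set(ex["gold"])
        pred: Set[Span] = set(ex["pred"])
        total[ex["lang"]] += len(gold)
        # A gold span counts as found only if offsets and label match exactly.
        found[ex["lang"]] += len(gold & pred)
    return {lang: found[lang] / total[lang] for lang in total if total[lang]}


# Toy, hand-made data purely to show the shape; not real evaluation results.
examples = [
    {"lang": "en",
     "gold": [(0, 4, "NAME"), (10, 20, "EMAIL")],
     "pred": [(0, 4, "NAME")]},
    {"lang": "de",
     "gold": [(5, 9, "NAME")],
     "pred": [(5, 9, "NAME"), (12, 15, "PHONE")]},
]

print(recall_by_language(examples))
```

Exact matching is the strictest option. For redaction, a looser rule that credits any predicted span covering the gold span may be closer to what matters, since over-redacting is usually cheaper than a leak.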
If you're building on top of an LLM, the direction is the thing to notice. Small, purpose-built classifiers for PII scrubbing and structured extraction are outdrawing exotic architectures on the board, which suggests the practical bottleneck for people shipping features is preprocessing, not base-model capability. That's a useful hint about where the cheap wins are this quarter.
Originally reported by huggingface.co
Original headline: Token Classification Models – Hugging Face