National Design Studio releases Rampart, on-device PII guard
TL;DR
- Rampart is a 14.7 MB ONNX token-classification model that redacts PII in user-typed text before it leaves the browser.
- On a 30,000-row test set across seven Latin-script languages, the card reports 98.42% private-term recall and 91.69% public-term retention.
- Reported p50 latency is 3.9 ms on WebGPU and 12.6 ms on WASM, but non-Latin script recall is around 13.7%.
There is a small but consequential category of model release that does not try to be smarter, just smaller and more local. National Design Studio's Rampart, posted to Hugging Face, fits in that category. It is a 14.7 MB token-classification model whose only job is to find personally identifiable information in user-typed text before that text leaves the browser.
The architecture is intentionally modest. Rampart is a fine-tune of nreimers/MiniLM-L6-H384-uncased, around 18.5 million parameters after vocabulary trimming, quantized to 4-bit MatMul with INT8 embeddings. It runs in the browser via ONNX Runtime Web on WASM or WebGPU through transformers.js. The card reports p50 latency of 3.9 ms on WebGPU and 12.6 ms on WASM, fast enough to gate every keystroke without a user noticing.
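To make the gating idea concrete, here is a minimal sketch of how a frontend might apply a classifier's output before any text leaves the page. Everything named here is an assumption for illustration: the `PiiSpan` shape, the `Classifier` callback, the `guardedSend` helper, and the `EMAIL` label are hypothetical, and the article does not describe Rampart's actual label set or output format. The classifier is injected as a function so the sketch does not depend on any particular runtime API.

```typescript
// Hypothetical span shape: character offsets into the input plus a label.
// The real model's output format is not described in the article.
interface PiiSpan {
  start: number; // inclusive character offset
  end: number;   // exclusive character offset
  label: string; // illustrative label name, e.g. "EMAIL"
}

// Any function that maps text to PII spans. In a real page this would wrap
// the on-device model call (assumption: the wrapper yields character offsets).
type Classifier = (text: string) => Promise<PiiSpan[]>;

// Replace each detected span with a bracketed placeholder.
function redact(text: string, spans: PiiSpan[]): string {
  // Walk the spans left to right so offsets into the original text stay valid.
  const sorted = [...spans].sort((a, b) => a.start - b.start);
  let out = "";
  let cursor = 0;
  for (const span of sorted) {
    if (span.start < cursor) continue; // skip a span that overlaps an earlier one
    out += text.slice(cursor, span.start) + `[${span.label}]`;
    cursor = span.end;
  }
  return out + text.slice(cursor);
}

// Gate outgoing text: classify locally, then hand only the redacted
// version to the network layer.
async function guardedSend(
  text: string,
  classify: Classifier,
  send: (safe: string) => Promise<void>,
): Promise<string> {
  const safe = redact(text, await classify(text));
  await send(safe);
  return safe;
}
```

In this sketch the raw string never reaches `send`, which is the property the form factor is meant to provide. It does nothing about the adversarial cases the card places out of scope.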
The numbers the model card claims are worth reading carefully. On a 30,000-row test set covering English, Spanish, French, German, Italian, Portuguese and Dutch, the reported private-term recall is 98.42% and public-term retention is 91.69%. The card is unusually candid about the leak rate: 2,082 of 131,707 private terms slipped through, roughly one in 63. The honest caveat is in the rest of the fairness table: non-Latin script recall is around 13.7%, with Han Chinese at 8.8% and Korean at 15.2%. The card also flags that adversarial robustness is out of scope. Zero-width characters and prompt injection can defeat it, so this is engineering for good-faith users, not for hostile inputs.
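The leak figures can be checked against the headline recall with a few lines of arithmetic. The sketch below uses only the two counts quoted from the card; the variable names and the `pct` helper are illustrative.

```typescript
// Counts as quoted from the model card.
const privateTerms = 131707; // private terms in the test set
const leaked = 2082;         // private terms the model missed

const leakRate = leaked / privateTerms; // fraction of private terms missed
const recall = 1 - leakRate;            // fraction of private terms caught
const oneIn = privateTerms / leaked;    // "one in N" framing of the leak rate

// Format a fraction as a percentage with two decimals.
function pct(x: number): string {
  return (x * 100).toFixed(2) + "%";
}

console.log(pct(recall));      // 98.42%
console.log(pct(leakRate));    // 1.58%
console.log(oneIn.toFixed(1)); // 63.3
```

The counts reproduce the reported 98.42% recall, and the miss rate comes to about one private term in every 63.3.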
What the reporting does not give you is a head-to-head against the established PII detectors people already wire into their stacks, or production deployment data showing how much PII actually gets blocked in the wild. The model card is the only artifact, and the citation header in it is dated 2026, so take adoption claims as forward-looking, not settled.
The interesting shape of the bet is the form factor. If a 14.7 MB redactor sitting in front of a chat box can intercept most of the PII a casual user types before it ever reaches a hosted model, you get a privacy primitive any frontend team can ship with a single dependency. The piece worth watching is whether the larger LLM frontends start adopting a browser-side layer like this by default, because the alternative, sending everything to a server and trusting the vendor to strip it later, has not aged well.
Originally reported by huggingface.co
Original headline: nationaldesignstudio/rampart · Hugging Face