GLiNER2-PII: 0.3B open-source PII model outperforms OpenAI's Privacy Filter
2 points | 7 hours ago | 1 comment | pioneer.ai
neon_share1
Hi HackerNews,

We’re Ash and George from Fastino Labs, and today we’re releasing GLiNER2-PII, a 0.3B-parameter open-source encoder model for PII detection.

Removing personally identifiable information (PII) from documents and data sources continues to be a challenge. Since PII can look different depending on the country, context, and document type, it’s difficult for most models to keep up.

GLiNER2-PII addresses this with a compact 0.3B-parameter encoder architecture that outperforms OpenAI's Privacy Filter and all existing GLiNER PII variants.

In addition to supporting zero-shot extraction of unseen entity types, it was also fine-tuned on 42 fine-grained entity types across seven semantic categories:

- API keys, passwords & credentials
- Person & identity
- Contact & location
- Government & tax identifiers
- Banking & payment
- Digital identity
- Sensitive dates
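To illustrate how detections like these are typically consumed, here is a hedged sketch of a redaction step over model output. It assumes the GLiNER-style prediction format of dicts with character offsets, a label, and a score; the exact GLiNER2-PII API and output schema may differ, so check the model card. The example predictions are hand-written stand-ins, not real model output.

```python
# Redact PII spans in a text, given entity predictions in an assumed
# GLiNER-style format: a list of dicts with "start"/"end" character
# offsets, a "label", and a "score". (Assumption -- verify against the
# actual GLiNER2-PII model card.)

def redact(text, entities, threshold=0.5):
    """Replace each predicted span with a [LABEL] placeholder.

    Spans are processed right to left so that earlier character
    offsets remain valid as the string shrinks or grows.
    """
    kept = [e for e in entities if e.get("score", 1.0) >= threshold]
    for e in sorted(kept, key=lambda e: e["start"], reverse=True):
        text = text[:e["start"]] + f"[{e['label'].upper()}]" + text[e["end"]:]
    return text

# Hand-written predictions standing in for model output:
text = "Contact Jane Doe at jane@example.com."
entities = [
    {"start": 8, "end": 16, "label": "person", "score": 0.97},
    {"start": 20, "end": 36, "label": "email", "score": 0.93},
]
print(redact(text, entities))  # Contact [PERSON] at [EMAIL].
```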

On the SPY benchmark, GLiNER2-PII achieves the highest span-level F1 (0.471) across legal and medical documents, outperforming OpenAI's Privacy Filter and all existing GLiNER PII variants. Notably, it maintains high recall (0.722 legal / 0.681 medical) while preserving competitive precision.
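For context, span-level F1 counts a prediction as correct only when both its boundaries and its label exactly match a gold span. A minimal sketch of that metric (a generic implementation, not the SPY benchmark's official scorer):

```python
# Span-level precision/recall/F1: a prediction counts as a true positive
# only on an exact (start, end, label) match with a gold span.
# Generic sketch -- not the SPY benchmark's official scoring script.

def span_f1(gold, pred):
    """gold, pred: iterables of (start, end, label) tuples."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(8, 16, "person"), (20, 36, "email")]
pred = [(8, 16, "person"), (20, 36, "phone")]  # one span mislabeled
p, r, f = span_f1(gold, pred)
print(p, r, f)  # 0.5 0.5 0.5
```

Note how strict this is: a span with the right boundaries but the wrong label scores zero, which is why even state-of-the-art F1 on hard multi-domain documents sits well below 1.0.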

Training data was generated synthetically using our Pioneer Agent framework, producing multilingual annotated examples across document types, locales, and entity distributions.

GLiNER2-PII is part of the GLiNER family of models for named entity recognition, text classification, and structured extraction.

We are happy to release GLiNER2-PII to the open source community under the Apache 2.0 license.

Model weights are available now on Hugging Face.

Model: https://huggingface.co/fastino/gliner2-privacy-filter-PII-mu...
Blog: https://pioneer.ai/research/gliner2-pii-a-multilingual-model...
