Introduction
Technical overview of FastPII, its Czech-focused detection model, and current scope.
FastPII is a production-grade PII detection SDK for Czech personal and business identifiers. The main entry point is PrivacyGuard from the fastpii package.
Generic detection systems are usually trained for broad international coverage and rely heavily on pattern matching, which performs poorly on Czech identifiers. Generic tools such as Presidio, AWS Macie, and Google DLP reach 22.7% on Czech identifier benchmarks, while FastPII exceeds 95% on the same problem class.
The difference is validation. FastPII does not stop at regex matches. It combines checksum validation and semantic rules to reject structurally invalid matches and reduce false positives.
What problem it solves
Czech identifiers such as rodné číslo, IČO, and DIČ are not reliably handled by generic PII tooling. These identifiers use country-specific rules:
- rodné číslo requires date parsing and checksum validation
- IČO uses a weighted checksum
- DIČ depends on Czech tax identifier formats and related validation rules
If detection is based on regex alone, invalid values can still be classified as PII, and valid values may be missed when formatting varies.
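To make the gap between a regex match and a valid identifier concrete, here is a minimal sketch of IČO validation using the publicly documented rule: the first seven digits are weighted 8 down to 2, and the eighth digit is derived from the sum modulo 11. The names `ICO_CANDIDATE`, `is_valid_ico`, and `find_valid_icos` are hypothetical and chosen for illustration; this is not FastPII's internal implementation.

```python
import re

# Candidate extraction: any standalone run of exactly eight digits.
ICO_CANDIDATE = re.compile(r"\b[0-9]{8}\b")


def is_valid_ico(value: str) -> bool:
    """Return True if `value` is eight digits with a correct IČO check digit."""
    if not re.fullmatch(r"[0-9]{8}", value):
        return False
    digits = [int(ch) for ch in value]
    # Weights 8, 7, 6, 5, 4, 3, 2 apply to the first seven digits.
    weighted = sum(d * w for d, w in zip(digits[:7], range(8, 1, -1)))
    # Check digit: (11 - remainder) mod 10, which maps remainder 0 to 1
    # and remainder 1 to 0.
    expected = (11 - weighted % 11) % 10
    return digits[7] == expected


def find_valid_icos(text: str) -> list[str]:
    """Extract regex candidates, then keep only those passing the checksum."""
    return [m.group() for m in ICO_CANDIDATE.finditer(text) if is_valid_ico(m.group())]


if __name__ == "__main__":
    sample = "IČO 27082440, order number 12345678"
    print(find_valid_icos(sample))
```

Both `27082440` and `12345678` match the eight-digit pattern, but only the first passes the checksum, so a regex-only detector would report the order number as PII.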
How FastPII works
FastPII combines:
- regex pattern matching for candidate extraction
- checksum validation such as Mod 11 and weighted sums
- semantic rules for context-sensitive entities
- overlap resolution to keep the strongest result when spans collide
This is why the SDK can distinguish between a string that looks like a Czech identifier and one that is actually valid.
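The overlap-resolution step can be sketched as a greedy selection over candidate spans. This is an illustrative sketch under stated assumptions: the `Match` type, the `score` field, and the tie-breaking order (higher score first, then longer span) are hypothetical, and the source does not specify how FastPII ranks colliding results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    entity: str
    start: int   # inclusive character offset
    end: int     # exclusive character offset
    score: float  # assumed confidence; validated matches would score higher


def resolve_overlaps(matches: list[Match]) -> list[Match]:
    """Keep the strongest match wherever spans collide."""
    kept: list[Match] = []
    # Visit candidates from strongest to weakest; ties go to the longer span.
    ranked = sorted(matches, key=lambda m: (-m.score, -(m.end - m.start), m.start))
    for cand in ranked:
        # Accept a candidate only if it overlaps nothing already kept.
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    # Return results in document order.
    return sorted(kept, key=lambda m: m.start)


if __name__ == "__main__":
    candidates = [
        Match("phone", 0, 9, 0.60),
        Match("rodne_cislo", 0, 11, 0.95),
        Match("email", 20, 35, 0.90),
    ]
    for m in resolve_overlaps(candidates):
        print(m.entity, m.start, m.end)
```

In this example the lower-scoring `phone` candidate is dropped because it collides with the checksum-validated `rodne_cislo` span, while the non-overlapping `email` match is kept.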
Current feature set
FastPII currently registers 11 Czech detectors through PrivacyGuard:
- rodne_cislo
- ico
- dic
- bank_account
- postal_code
- phone
- email
- name
- address
- date_of_birth
- vehicle_plate
Core package characteristics:
- zero core dependencies
- Python package: fastpii
- CLI included
- FastAPI integration available through optional extras
- LangChain integration available through optional extras
- MCP integration in the SDK source
Current scope
The current production region is the Czech Republic only:
from fastpii import PrivacyGuard
guard = PrivacyGuard(regions=["cz"])
The SDK is extensible through a pattern registry and detector registration model, but the only built-in region at the moment is cz.
GDPR note on rodné číslo
Rodné číslo is not just an identifier. Its structure reveals date of birth, and for standard post-1954 forms it also reveals biological sex through the encoded month offset (50 is added to the month for women). That makes it sensitive in GDPR contexts: a single value discloses several personal attributes, and as a national identification number it falls under Article 87, which allows member states to set specific conditions for its processing.
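The following sketch shows how much a single ten-digit rodné číslo discloses. It assumes the publicly documented rules for post-1954 numbers: the whole number is divisible by 11 (with a legacy exception where the first nine digits leave remainder 10 and the check digit is 0), the month carries an offset of 50 for women, and an additional offset of 20 has been permitted since 2004. The names `RodneCisloInfo` and `parse_rodne_cislo` are hypothetical; this is not FastPII's metadata API, and nine-digit pre-1954 numbers are out of scope here.

```python
import re
from datetime import date
from typing import NamedTuple, Optional


class RodneCisloInfo(NamedTuple):
    birth_date: date
    sex: str  # "F" or "M", inferred from the month offset


def parse_rodne_cislo(value: str) -> Optional[RodneCisloInfo]:
    """Validate a ten-digit rodné číslo and extract what it encodes."""
    m = re.fullmatch(r"([0-9]{6})/?([0-9]{4})", value)
    if m is None:
        return None
    digits = m.group(1) + m.group(2)

    # Checksum: the full number must be divisible by 11. Legacy exception:
    # first nine digits mod 11 == 10 paired with a check digit of 0.
    if int(digits) % 11 != 0:
        if not (int(digits[:9]) % 11 == 10 and digits[9] == "0"):
            return None

    yy, mm, dd = int(digits[:2]), int(digits[2:4]), int(digits[4:6])
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50
    extra_series = mm > 20  # +20 offset, only permitted from 2004 onward
    if extra_series:
        mm -= 20

    # Ten-digit numbers start in 1954, so YY below 54 means the 2000s.
    year = 1900 + yy if yy >= 54 else 2000 + yy
    if extra_series and year < 2004:
        return None
    try:
        birth = date(year, mm, dd)
    except ValueError:
        return None  # not a real calendar date
    return RodneCisloInfo(birth, sex)


if __name__ == "__main__":
    # Synthetic values constructed to satisfy the checksum.
    print(parse_rodne_cislo("855123/0006"))
    print(parse_rodne_cislo("855123/0007"))
```

The first synthetic value parses to a birth date of 1985-01-23 and sex `F`; the second differs only in the last digit and is rejected by the checksum.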
Comparison
| Identifier | FastPII | Microsoft Presidio | AWS Macie | Google DLP |
|---|---|---|---|---|
| rodné číslo | Yes, Czech-specific detection with checksum and metadata extraction | No native Czech support | No native Czech support | No native Czech support |
| IČO | Yes, weighted checksum validation | No native Czech support | No native Czech support | No native Czech support |
| DIČ | Yes, Czech format and validation rules | No native Czech support | No native Czech support | No native Czech support |
Use FastPII when you need deterministic handling of Czech identifiers instead of broad but low-accuracy generic PII coverage.