Introduction

Technical overview of FastPII, its Czech-focused detection model, and current scope.

FastPII is a production-grade PII detection SDK for Czech personal and business identifiers. The main entry point is PrivacyGuard from the fastpii package.

Generic detection systems are usually trained for broad international coverage and rely heavily on pattern matching. On Czech identifiers, that leads to poor performance: generic tools such as Presidio, AWS Macie, and Google DLP reach 22.7% on Czech identifier benchmarks, while FastPII exceeds 95% on the same problem class.

The difference is validation. FastPII does not stop at regex matches. It combines checksum validation and semantic rules to reject structurally invalid matches and reduce false positives.

What problem it solves

Czech identifiers such as rodné číslo, IČO, and DIČ are not reliably handled by generic PII tooling. These identifiers use country-specific rules:

  • rodné číslo requires date parsing and checksum validation
  • IČO uses a weighted checksum
  • DIČ uses the CZ prefix followed by 8 to 10 digits, typically the IČO for legal entities and the rodné číslo for individuals, so it inherits their validation rules

If detection is based on regex alone, invalid values can still be classified as PII, and valid values may be missed when formatting varies.
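
To make the gap between a regex match and a validated value concrete, here is a minimal sketch of the publicly documented IČO check (weights 8 down to 2, modulo 11) next to a naive eight-digit pattern. It is an illustration only, not FastPII's internal code; the names `ICO_CANDIDATE` and `ico_checksum_ok` and the sample text are hypothetical.

```python
import re

# Naive candidate pattern: any standalone run of eight digits.
ICO_CANDIDATE = re.compile(r"\b[0-9]{8}\b")


def ico_checksum_ok(value: str) -> bool:
    """Return True if the eighth digit matches the weighted checksum."""
    if not re.fullmatch(r"[0-9]{8}", value):
        return False
    # The first seven digits are weighted 8, 7, 6, 5, 4, 3, 2.
    weighted = sum(int(d) * w for d, w in zip(value[:7], range(8, 1, -1)))
    expected = (11 - weighted % 11) % 10
    return int(value[7]) == expected


text = "Supplier IČO 25596641, order number 12345678."
candidates = ICO_CANDIDATE.findall(text)                    # both numbers match
validated = [c for c in candidates if ico_checksum_ok(c)]   # only the first survives
print(candidates, validated)
```

The regex reports both eight-digit runs as candidates; only the checksum step separates the identifier from the order number.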

How FastPII works

FastPII combines:

  • regex pattern matching for candidate extraction
  • checksum validation such as Mod 11 and weighted sums
  • semantic rules for context-sensitive entities
  • overlap resolution to keep the strongest result when spans collide

This is why the SDK can distinguish between a string that looks like a Czech identifier and one that is actually valid.
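
The overlap-resolution step in the list above can be sketched as a greedy selection. This page does not specify how FastPII ranks colliding spans, so the sketch assumes "strongest" means the highest confidence score, with longer spans winning ties. The `Match` type, its fields, and `resolve_overlaps` are hypothetical names, not part of the fastpii API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    entity: str
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    score: float  # hypothetical confidence in [0, 1]


def resolve_overlaps(matches: list[Match]) -> list[Match]:
    """Keep the strongest match wherever spans collide (assumed ranking)."""
    # Rank by score, then span length, then earliest start.
    ranked = sorted(
        matches, key=lambda m: (-m.score, -(m.end - m.start), m.start)
    )
    kept: list[Match] = []
    for m in ranked:
        # Accept a match only if it is disjoint from everything already kept.
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


found = [
    Match("phone", 14, 21, 0.40),        # weak match inside the span below
    Match("rodne_cislo", 10, 21, 0.95),  # checksum-validated, so scored higher
    Match("email", 30, 45, 0.90),
]
resolved = resolve_overlaps(found)
print([m.entity for m in resolved])
```

A greedy pass is enough here because a validated identifier is expected to outrank any weaker pattern that happens to match part of the same text.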

Current feature set

FastPII currently registers 11 Czech detectors through PrivacyGuard:

  • rodne_cislo
  • ico
  • dic
  • bank_account
  • postal_code
  • phone
  • email
  • name
  • address
  • date_of_birth
  • vehicle_plate

Core package characteristics:

  • zero core dependencies
  • Python package: fastpii
  • CLI included
  • FastAPI integration available through optional extras
  • LangChain integration available through optional extras
  • MCP integration in the SDK source

Current scope

The current production region is the Czech Republic only:

from fastpii import PrivacyGuard

guard = PrivacyGuard(regions=["cz"])

The SDK is extensible through a pattern registry and detector registration model, but the only built-in region at the moment is cz.

GDPR note on rodné číslo

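Rodné číslo is not just an identifier. Its structure reveals the holder's date of birth and, because 50 is added to the month for women, also their sex. It therefore discloses more personal data than an opaque ID would. As a national identification number it falls under Article 87 of the GDPR, which lets Member States set specific conditions for its processing. Sex is not itself one of the Article 9 special categories, but the inferred attributes still call for careful handling.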
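
The information carried inside the value can be shown with a short decoding sketch based on the publicly documented format: YYMMDD plus a serial, with 50 added to the month for women, and a modulo 11 check on ten-digit values. This is a simplified illustration, not FastPII's detector; `parse_rodne_cislo` is a hypothetical helper and the sample numbers are synthetic.

```python
import re
from datetime import date
from typing import Optional


def parse_rodne_cislo(raw: str) -> Optional[dict]:
    """Decode birth date and sex, or return None if the value is invalid."""
    digits = raw.strip().replace("/", "", 1)
    if not re.fullmatch(r"[0-9]{9,10}", digits):
        return None
    yy, mm, dd = int(digits[0:2]), int(digits[2:4]), int(digits[4:6])

    if len(digits) == 10:
        # Ten-digit form: the whole number must be divisible by 11, with a
        # historical exception where the first nine digits leave remainder 10
        # and the check digit is 0.
        legacy = int(digits[:9]) % 11 == 10 and digits[9] == "0"
        if int(digits) % 11 != 0 and not legacy:
            return None
        year = 1900 + yy if yy >= 54 else 2000 + yy
    else:
        # Nine-digit form: issued before 1954, no check digit.
        # Simplification: older ambiguous values are rejected in this sketch.
        if yy >= 54:
            return None
        year = 1900 + yy

    sex = "F" if mm > 50 else "M"
    month = mm % 50
    # Since 2004 a further +20 may be applied when daily serials run out.
    if month > 20 and year >= 2004:
        month -= 20
    try:
        born = date(year, month, dd)
    except ValueError:
        return None
    return {"birth_date": born, "sex": sex}


print(parse_rodne_cislo("850101/1233"))
print(parse_rodne_cislo("855101/1238"))
print(parse_rodne_cislo("850101/1234"))  # fails the modulo 11 check
```

Both attributes are recovered from the number alone, which is why masking only part of the value may still leak the birth date or sex.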

Comparison

| Identifier | FastPII | Microsoft Presidio | AWS Macie | Google DLP |
|---|---|---|---|---|
| rodné číslo | Yes, Czech-specific detection with checksum and metadata extraction | No native Czech support | No native Czech support | No native Czech support |
| IČO | Yes, weighted checksum validation | No native Czech support | No native Czech support | No native Czech support |
| DIČ | Yes, Czech format and validation rules | No native Czech support | No native Czech support | No native Czech support |

Use FastPII when you need deterministic handling of Czech identifiers instead of broad but low-accuracy generic PII coverage.
