
Personal Name

Detect Czech personal names with dictionary-backed matching and gender hints from surname patterns.

Purpose

Use the name detector for Czech personal names in running text.

The detector's stated accuracy is above 90%. It combines a Czech name database with pattern matching on a first name followed by a surname.

It also classifies likely gender and can infer marital status from -ová surname endings.

Detector Name

name

Supported Formats

  • FirstName Surname
  • Czech names with diacritics
  • Female surnames ending in -ová

Examples:

  • Jan Novák
  • Jana Nováková

Validation Logic

The detector matches two capitalized Czech name parts and then filters the result through dictionary and confidence rules.

Detection Rules

  1. Match a capitalized first name plus surname pattern.
  2. Ignore heading-like words such as customer, report, or overview.
  3. Classify likely gender from surname and first-name data.
  4. Compute confidence from dictionary membership.
  5. Reject matches with confidence below 0.5.
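The rules above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not FastPII's implementation: the dictionaries, the regular expression, the `score` function, and its 0.5-per-dictionary-hit weighting are all hypothetical. Only the heading-word filter and the 0.5 threshold come from the rules. Gender classification (rule 3) is left out for brevity.

```python
import re

# Hypothetical stand-ins for the built-in Czech name data.
FIRST_NAMES = {"Jan", "Jana", "Petr", "Eva"}
SURNAMES = {"Novák", "Nováková", "Svoboda"}
HEADING_WORDS = {"customer", "report", "overview"}

# Czech upper- and lowercase letters, including diacritics.
UPPER = "A-ZÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ"
LOWER = "a-záčďéěíňóřšťúůýž"

# Rule 1: two capitalized words separated by a single space.
NAME_PATTERN = re.compile(rf"\b([{UPPER}][{LOWER}]+) ([{UPPER}][{LOWER}]+)\b")


def score(first, surname):
    """Assumed scoring: each dictionary hit contributes 0.5."""
    hits = (first in FIRST_NAMES) + (surname in SURNAMES)
    return hits / 2


def find_names(text, threshold=0.5):
    findings = []
    for match in NAME_PATTERN.finditer(text):
        first, surname = match.groups()
        # Rule 2: skip heading-like words.
        if first.lower() in HEADING_WORDS or surname.lower() in HEADING_WORDS:
            continue
        # Rules 4 and 5: score from dictionary membership, then apply the threshold.
        confidence = score(first, surname)
        if confidence < threshold:
            continue
        findings.append((match.group(0), confidence))
    return findings


print(find_names("Jan Novák přijel domu"))
```

In this sketch a phrase with a single dictionary hit scores exactly 0.5 and is kept, because only scores strictly below the threshold are rejected.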

Gender Logic

FastPII uses this priority order:

  1. Surname ending in -ová or -ova -> female
  2. Known surname database match
  3. Known first-name database match
  4. Fallback first-name ending heuristic
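The priority order can be read as a chain of early returns. The sketch below is illustrative only: the lookup tables, the function name `classify_gender`, the `"f"` code, and the fallback rule (first names ending in "a" or "e" treated as female) are assumptions, since this page does not specify them. The `"m"` code matches the example output on this page.

```python
# Hypothetical lookup tables standing in for the surname and first-name databases.
SURNAME_GENDER = {"Novák": "m", "Svoboda": "m"}
FIRST_NAME_GENDER = {"Jan": "m", "Jana": "f", "Petr": "m"}


def classify_gender(first, surname):
    # 1. A surname ending in -ová or -ova wins over everything else.
    if surname.lower().endswith(("ová", "ova")):
        return "f"
    # 2. Known surname database match.
    if surname in SURNAME_GENDER:
        return SURNAME_GENDER[surname]
    # 3. Known first-name database match.
    if first in FIRST_NAME_GENDER:
        return FIRST_NAME_GENDER[first]
    # 4. Fallback heuristic on the first-name ending (assumed rule).
    return "f" if first.lower().endswith(("a", "e")) else "m"


print(classify_gender("Jana", "Nováková"))
print(classify_gender("Jan", "Novák"))
```

Because the surname suffix is checked first, a name with an -ová surname is classified as female even when the first name is unknown.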

There is no checksum algorithm for this detector.

Python Examples

Detect a Czech name

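from fastpii import PrivacyGuard

# Enable the Czech region and run only the name detector.
guard = PrivacyGuard(regions=["cz"])
result = guard.detect("Jan Novák přijel domu", detector_names=["name"])

# Each finding carries the matched text and its metadata.
for finding in result.findings:
    print(finding.value, finding.metadata)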

Validate a name string

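from fastpii import PrivacyGuard

guard = PrivacyGuard(regions=["cz"])
# Validate a single string against the name detector.
result = guard.validate("Jan Novák", "name")

print(result.is_valid)  # whether the string passed validation
print(result.metadata)  # metadata produced by the detector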

Expected Metadata

This detector documents these metadata fields:

  • firstname
  • surname
  • gender
  • marital_status

The detector treats the -ová surname ending as a strong signal for a married woman.
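Since marital_status is only added for female surnames ending in -ová or -ova, the metadata can be pictured as a dictionary with one conditional key. The following is a hedged sketch: `build_metadata` is a hypothetical helper, and the value `"married"` is an assumption, because this page names the field but does not list its values.

```python
def build_metadata(first, surname, gender):
    # Fields that are always present.
    metadata = {"firstname": first, "surname": surname, "gender": gender}
    # marital_status appears only for female surnames ending in -ová or -ova.
    if gender == "f" and surname.lower().endswith(("ová", "ova")):
        metadata["marital_status"] = "married"  # assumed value
    return metadata


print(build_metadata("Jan", "Novák", "m"))
print(build_metadata("Jana", "Nováková", "f"))
```

Callers that read marital_status should therefore treat the key as optional, for example with `metadata.get("marital_status")`.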

Example Output

{
    "firstname": "Jan",
    "surname": "Novák",
    "gender": "m"
}

Limitations

  • Confidence depends on the built-in Czech name data.
  • Capitalized phrases that are ambiguous or missing from the dictionaries may be rejected when their confidence falls below 0.5.
  • marital_status is only added for female surnames ending in -ová or -ova.

Notes

Verified detection example from the codebase:

guard.detect("Jan Novák přijel domu")
