Data Modules

Country-specific data modules with validation, benchmarking, and structured access.

FastPII v0.5.0 introduces a modular data infrastructure for country-specific datasets. Each country provides structured data through a CountryModule, which offers type-safe access, validation, and benchmarking.

CountryModule

Each country can expose data through a CountryModule:

from fastpii.data.registry import CountryRegistry

# Get a country module
cz_module = CountryRegistry.get("cz")

Accessing data

Each CountryModule provides typed data access:

cities_data = cz_module.get_cities()
bank_codes_data = cz_module.get_bank_codes()
postal_codes_data = cz_module.get_postal_codes()
names_data = cz_module.get_names()
insurance_data = cz_module.get_insurance_codes()
streets_data = cz_module.get_streets()
surnames_data = cz_module.get_surnames()

# Get the actual data
cities = cities_data.get_data()         # set[str]
bank_codes = bank_codes_data.get_data() # dict[str, str]
names = names_data.get_data()           # dict[str, set[str]] (male/female)
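
To illustrate working with the returned containers, the sketch below uses hand-written stand-in values shaped like the documented types rather than importing FastPII. The sample entries, the "male"/"female" key names, and the helper functions are assumptions made for this example only.

```python
from typing import Optional

# Stand-in values shaped like the documented return types.
# They are illustrative samples, not FastPII's actual data.
cities: set[str] = {"Praha", "Brno", "Ostrava"}
bank_codes: dict[str, str] = {"0100": "Komerční banka", "0800": "Česká spořitelna"}
names: dict[str, set[str]] = {"male": {"Jan", "Petr"}, "female": {"Jana", "Eva"}}


def is_known_city(name: str) -> bool:
    """Check membership in the city set."""
    return name in cities


def bank_name(code: str) -> Optional[str]:
    """Return the bank name for a code, or None if the code is unknown."""
    return bank_codes.get(code)


def name_genders(first_name: str) -> list[str]:
    """List every key (assumed 'male'/'female') whose set contains the name."""
    return sorted(key for key, values in names.items() if first_name in values)
```

Because cities are a set and bank codes a dict, both lookups avoid scanning the whole collection.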

Data sources

Each data module provides source metadata through get_source():

source = cities_data.get_source()
# DataSource(
#     name='Czech Cities',
#     url='https://...',
#     license='CC BY 4.0',
#     last_updated=datetime(2025, 1, 15, ...),
#     entry_count=5344,
# )
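
One possible use of this metadata is a freshness check. The sketch below defines a hypothetical stand-in dataclass with the same field names as the repr above (it is not FastPII's own class), and `is_stale` with its 365-day default is an assumption for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class DataSource:
    """Hypothetical stand-in mirroring the fields shown in the repr."""
    name: str
    url: str
    license: str
    last_updated: datetime
    entry_count: int


def is_stale(source: DataSource, now: datetime, max_age_days: int = 365) -> bool:
    """True when the source was last updated more than max_age_days before now."""
    return now - source.last_updated > timedelta(days=max_age_days)


source = DataSource(
    name="Czech Cities",
    url="https://example.invalid/cities",  # placeholder; the real URL is elided in the docs
    license="CC BY 4.0",
    last_updated=datetime(2025, 1, 15),
    entry_count=5344,
)
```

Passing `now` explicitly keeps the check deterministic in tests.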

Validation

Validate data integrity across all data types:

results = cz_module.validate_all()
# {'bank_codes': True, 'cities': True, 'postal_codes': True, ...}

Individual validation:

cities_data = cz_module.get_cities()
is_valid = cities_data.validate()  # True/False
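
A result dictionary of this shape can be reduced to the failing data types, for example in a startup check. The helpers below are a minimal sketch; `failed_data_types`, `require_valid`, and the sample results are hypothetical and not part of FastPII.

```python
def failed_data_types(results: dict[str, bool]) -> list[str]:
    """Return the names of data types whose validation result is False."""
    return sorted(name for name, ok in results.items() if not ok)


def require_valid(results: dict[str, bool]) -> None:
    """Raise ValueError naming every data type that failed validation."""
    failed = failed_data_types(results)
    if failed:
        raise ValueError(f"Data validation failed for: {', '.join(failed)}")


# Sample input shaped like the validate_all() output shown above.
sample_results = {"bank_codes": True, "cities": False, "postal_codes": True}
```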

Benchmarking

Measure data import times:

times = cz_module.benchmark_import_times()
# {'bank_codes': 2.3, 'cities': 15.7, 'postal_codes': 8.1, ...}
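
To spot the slowest imports, the timing dictionary can be sorted by value. This is a small sketch over the sample numbers shown above; `slowest_imports` is a hypothetical helper, and the unit of the timings is not stated here, so the code treats them as plain numbers.

```python
def slowest_imports(times: dict[str, float], limit: int = 3) -> list[tuple[str, float]]:
    """Return up to `limit` (data_type, time) pairs, slowest first."""
    return sorted(times.items(), key=lambda item: item[1], reverse=True)[:limit]


# Sample input copied from the example output above.
times = {"bank_codes": 2.3, "cities": 15.7, "postal_codes": 8.1}
```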

CountryMetadata

Each module exposes basic country metadata through get_metadata():

metadata = cz_module.get_metadata()
# CountryMetadata(
#     code='CZ',
#     name='Czech Republic',
#     language_codes=('cs',),
#     currency_code='CZK',
# )

Data availability by region

Czech Republic (CZ) - Full data

| Data Type | Entry Count | Description |
| --- | --- | --- |
| cities | 5,344 | Czech city names |
| postal_codes | 15,500 | Czech postal codes (PSČ) |
| streets | 26,954 | Czech street names |
| bank_codes | 48 | Czech bank codes |
| insurance_codes | 7 | Czech health insurance codes |
| names (male) | 7,408 | Czech male first names |
| names (female) | 8,287 | Czech female first names |
| surnames (male) | 175 | Czech male surnames |
| surnames (female) | 175 | Czech female surnames |

Poland (PL) - Partial data

| Data Type | Entry Count | Description |
| --- | --- | --- |
| cities | 30 | Polish city names |
| postal_codes | 297 | Polish postal codes (DD-DDD) |
| streets | 103 | Polish street names |
| bank_codes | - | Not yet populated |
| insurance_codes | - | Not yet populated |
| names | - | Not yet populated |
| surnames | - | Not yet populated |

Germany (DE) - Partial data

| Data Type | Entry Count | Description |
| --- | --- | --- |
| cities | 63 | German city names |
| postal_codes | 189 | German postal codes (PLZ) |
| streets | 144 | German street names |
| bank_codes | - | Not yet populated |
| insurance_codes | - | Not yet populated |
| names | - | Not yet populated |
| surnames | - | Not yet populated |

France (FR) - Partial data

| Data Type | Entry Count | Description |
| --- | --- | --- |
| cities | 79 | French city names |
| postal_codes | 100 | French postal codes (code postal) |
| streets | 104 | French street names |
| bank_codes | - | Not yet populated |
| insurance_codes | - | Not yet populated |
| names | - | Not yet populated |
| surnames | - | Not yet populated |
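
Since coverage differs by country, calling code may want to check which data types are populated before relying on them. The sketch below hard-codes the entry counts from the partial-data tables above; how FastPII itself reports an unpopulated data type at runtime is not stated here, so `ENTRY_COUNTS` and `populated_types` are hypothetical, with `None` standing in for "not yet populated".

```python
from typing import Optional

# Entry counts copied from the partial-data tables; None marks "not yet populated".
_UNPOPULATED: dict[str, Optional[int]] = {
    "bank_codes": None, "insurance_codes": None, "names": None, "surnames": None,
}
ENTRY_COUNTS: dict[str, dict[str, Optional[int]]] = {
    "PL": {"cities": 30, "postal_codes": 297, "streets": 103, **_UNPOPULATED},
    "DE": {"cities": 63, "postal_codes": 189, "streets": 144, **_UNPOPULATED},
    "FR": {"cities": 79, "postal_codes": 100, "streets": 104, **_UNPOPULATED},
}


def populated_types(country: str) -> list[str]:
    """Return the data types that have at least one entry for the country."""
    counts = ENTRY_COUNTS[country.upper()]
    return sorted(name for name, count in counts.items() if count)
```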

CountryData types

Each accessor returns a CountryData wrapper; the type of the wrapped data depends on the accessor:

| Method | Return Type | Description |
| --- | --- | --- |
| get_cities() | CountryData[set[str]] | Set of city names |
| get_postal_codes() | CountryData[set[str]] | Set of postal codes |
| get_streets() | CountryData[set[str]] | Set of street names |
| get_bank_codes() | CountryData[dict[str, str]] | Code to bank name mapping |
| get_insurance_codes() | CountryData[dict[str, str]] | Code to insurer name mapping |
| get_names() | CountryData[dict[str, set[str]]] | Male/female name sets |
| get_surnames() | CountryData[dict[str, set[str]]] | Male/female surname sets |
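
The generic parameter is what lets a type checker infer the result of get_data() for each accessor. The following is a minimal sketch of how such a wrapper could be typed, not FastPII's implementation: the class is a hypothetical stand-in, and both the lazy loader and the "non-empty means valid" rule are assumptions made for the example.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class CountryData(Generic[T]):
    """Hypothetical stand-in: wraps a payload of type T that is loaded on first use."""

    def __init__(self, loader: Callable[[], T]) -> None:
        self._loader = loader
        self._data: Optional[T] = None

    def get_data(self) -> T:
        """Load the payload once, then return the cached value."""
        if self._data is None:
            self._data = self._loader()
        return self._data

    def validate(self) -> bool:
        """Assumed rule for this sketch: a payload is valid when it is non-empty."""
        return bool(self.get_data())


# A type checker infers set[str] for cities.get_data() and dict[str, str] for banks.
cities: CountryData[set[str]] = CountryData(lambda: {"Praha", "Brno"})
banks: CountryData[dict[str, str]] = CountryData(lambda: {"0100": "Komerční banka"})
```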

CountryRegistry API

from fastpii.data.registry import CountryRegistry

# Register a module (CzechModule stands for a CountryModule class; its import is not shown here)
CountryRegistry.register("cz", CzechModule)

# Get a module instance
module = CountryRegistry.get("cz")

# List all registered countries
codes = CountryRegistry.list_countries()

# Get all modules
all_modules = CountryRegistry.get_all()

# Get metadata for a country
metadata = CountryRegistry.get_metadata("cz")

# Clear all registrations (for testing)
CountryRegistry.clear()
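
For readers unfamiliar with the registry pattern behind these calls, here is a self-contained toy version. It is a sketch under stated assumptions, not FastPII's code: `ToyCountryRegistry` and `ToyCzechModule` are hypothetical, and details such as lowercasing codes, creating a new instance on each get(), and raising KeyError for unknown codes are choices made for this example.

```python
class ToyCountryRegistry:
    """Toy class-level registry mapping country codes to module classes."""

    _modules: dict[str, type] = {}

    @classmethod
    def register(cls, code: str, module_cls: type) -> None:
        cls._modules[code.lower()] = module_cls

    @classmethod
    def get(cls, code: str) -> object:
        """Instantiate the module registered for the code (assumed behavior)."""
        module_cls = cls._modules.get(code.lower())
        if module_cls is None:
            raise KeyError(f"No module registered for country code {code!r}")
        return module_cls()

    @classmethod
    def list_countries(cls) -> list[str]:
        return sorted(cls._modules)

    @classmethod
    def clear(cls) -> None:
        cls._modules.clear()


class ToyCzechModule:
    """Hypothetical module class used only to exercise the registry."""
    code = "cz"


ToyCountryRegistry.register("cz", ToyCzechModule)
```

Keeping the mapping on the class means every caller sees the same registrations, which is also why a clear() method is useful for isolating tests.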
