Data Modules
Country-specific data modules with validation, benchmarking, and structured access.
FastPII v0.5.0 introduces a modular data infrastructure for country-specific datasets. Each country provides structured data through CountryModule with type-safe access, validation, and benchmarking.
CountryModule
Look up a country's CountryModule in CountryRegistry by its country code:
from fastpii.data.registry import CountryRegistry
# Get a country module
cz_module = CountryRegistry.get("cz")
Accessing data
Each CountryModule provides typed data access:
cities_data = cz_module.get_cities()
bank_codes_data = cz_module.get_bank_codes()
postal_codes_data = cz_module.get_postal_codes()
names_data = cz_module.get_names()
insurance_data = cz_module.get_insurance_codes()
streets_data = cz_module.get_streets()
surnames_data = cz_module.get_surnames()
# Get the actual data
cities = cities_data.get_data() # set[str]
bank_codes = bank_codes_data.get_data() # dict[str, str]
names = names_data.get_data() # dict[str, set[str]] (male/female)
Data sources
Each data module provides source metadata through get_source():
source = cities_data.get_source()
# DataSource(
# name='Czech Cities',
# url='https://...',
# license='CC BY 4.0',
# last_updated=datetime(2025, 1, 15, ...),
# entry_count=5344,
# )
Validation
Validate data integrity across all data types:
results = cz_module.validate_all()
# {'bank_codes': True, 'cities': True, 'postal_codes': True, ...}
Individual validation:
cities_data = cz_module.get_cities()
is_valid = cities_data.validate() # True/False
Benchmarking
Measure data import times:
times = cz_module.benchmark_import_times()
# {'bank_codes': 2.3, 'cities': 15.7, 'postal_codes': 8.1, ...}
CountryMetadata
metadata = cz_module.get_metadata()
# CountryMetadata(
# code='CZ',
# name='Czech Republic',
# language_codes=('cs',),
# currency_code='CZK',
# )
Data availability by region
Czech Republic (CZ) - Full data
| Data Type | Entry Count | Description |
|---|---|---|
| cities | 5,344 | Czech city names |
| postal_codes | 15,500 | Czech postal codes (PSČ) |
| streets | 26,954 | Czech street names |
| bank_codes | 48 | Czech bank codes |
| insurance_codes | 7 | Czech health insurance codes |
| names (male) | 7,408 | Czech male first names |
| names (female) | 8,287 | Czech female first names |
| surnames (male) | 175 | Czech male surnames |
| surnames (female) | 175 | Czech female surnames |
Poland (PL) - Partial data
| Data Type | Entry Count | Description |
|---|---|---|
| cities | 30 | Polish city names |
| postal_codes | 297 | Polish postal codes (DD-DDD) |
| streets | 103 | Polish street names |
| bank_codes | - | Not yet populated |
| insurance_codes | - | Not yet populated |
| names | - | Not yet populated |
| surnames | - | Not yet populated |
Germany (DE) - Partial data
| Data Type | Entry Count | Description |
|---|---|---|
| cities | 63 | German city names |
| postal_codes | 189 | German postal codes (PLZ) |
| streets | 144 | German street names |
| bank_codes | - | Not yet populated |
| insurance_codes | - | Not yet populated |
| names | - | Not yet populated |
| surnames | - | Not yet populated |
France (FR) - Partial data
| Data Type | Entry Count | Description |
|---|---|---|
| cities | 79 | French city names |
| postal_codes | 100 | French postal codes (code postal) |
| streets | 104 | French street names |
| bank_codes | - | Not yet populated |
| insurance_codes | - | Not yet populated |
| names | - | Not yet populated |
| surnames | - | Not yet populated |
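Because PL, DE, and FR are only partially populated, calling code may want to check which data types actually hold entries before relying on them. This page does not say how FastPII represents a data type that is not yet populated, so the sketch below assumes that either the getter is missing or get_data() returns an empty collection. The helper populated_data_types and the FakeData and FakeModule classes are hypothetical and exist only so the example runs on its own; the getter names come from the "Accessing data" section.

```python
from typing import Any

# Getter names as listed in the "Accessing data" section of this page.
GETTERS = (
    "get_cities",
    "get_postal_codes",
    "get_streets",
    "get_bank_codes",
    "get_insurance_codes",
    "get_names",
    "get_surnames",
)


def populated_data_types(module: Any) -> dict[str, int]:
    """Return {data_type: entry_count} for every data type that has entries.

    ASSUMPTION: an unpopulated data type either has no getter or yields an
    empty collection from get_data(). FastPII may behave differently.
    """
    counts: dict[str, int] = {}
    for getter_name in GETTERS:
        getter = getattr(module, getter_name, None)
        if getter is None:
            continue  # assumed to mean "not available for this country"
        data = getter().get_data()
        # names/surnames are dicts of sets: count entries across all groups.
        if isinstance(data, dict) and all(
            isinstance(group, (set, frozenset)) for group in data.values()
        ):
            size = sum(len(group) for group in data.values())
        else:
            size = len(data)
        if size:
            counts[getter_name[len("get_"):]] = size
    return counts


# Hypothetical stand-ins so the sketch runs without FastPII installed.
class FakeData:
    def __init__(self, data):
        self._data = data

    def get_data(self):
        return self._data


class FakeModule:
    def get_cities(self):
        return FakeData({"Warszawa", "Kraków"})

    def get_postal_codes(self):
        return FakeData({"00-001"})

    def get_bank_codes(self):
        return FakeData({})

    def get_names(self):
        return FakeData({"male": set(), "female": set()})


available = populated_data_types(FakeModule())
print(available)  # {'cities': 2, 'postal_codes': 1}
```

With a real module, the same helper could be called on the object returned by CountryRegistry.get("pl"), provided the assumption about empty collections holds.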
CountryData types
The return type varies by data module:
| Method | Return Type | Description |
|---|---|---|
| get_cities() | CountryData[set[str]] | Set of city names |
| get_postal_codes() | CountryData[set[str]] | Set of postal codes |
| get_streets() | CountryData[set[str]] | Set of street names |
| get_bank_codes() | CountryData[dict[str, str]] | Code to bank name mapping |
| get_insurance_codes() | CountryData[dict[str, str]] | Code to insurer name mapping |
| get_names() | CountryData[dict[str, set[str]]] | Male/female name sets |
| get_surnames() | CountryData[dict[str, set[str]]] | Male/female surname sets |
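To illustrate what a return type such as CountryData[set[str]] means for callers, here is a minimal sketch of a generic container built on typing.Generic. It is not FastPII's implementation: the classes below are stand-ins that only mirror the method names (get_data, get_source, validate) and the DataSource fields shown on this page. The loader argument, the "non-empty means valid" rule, and the example URL are assumptions made for the illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class DataSource:
    # Field names mirror the get_source() output shown on this page.
    name: str
    url: str
    license: str
    last_updated: datetime
    entry_count: int


class CountryData(Generic[T]):
    """Hypothetical stand-in for illustration, not the FastPII class."""

    def __init__(self, loader: Callable[[], T], source: DataSource) -> None:
        self._loader = loader  # assumption: data is produced by a callable
        self._source = source

    def get_data(self) -> T:
        return self._loader()

    def get_source(self) -> DataSource:
        return self._source

    def validate(self) -> bool:
        # Assumed rule for the sketch: data is valid when it is non-empty.
        return bool(self.get_data())


# The type parameter tells a type checker what get_data() returns.
cities: CountryData[set[str]] = CountryData(
    loader=lambda: {"Praha", "Brno"},
    source=DataSource(
        name="Example cities",
        url="https://example.invalid/cities",  # placeholder, not a real source
        license="CC BY 4.0",
        last_updated=datetime(2025, 1, 15),
        entry_count=2,
    ),
)

city_names = cities.get_data()  # inferred as set[str]
print(sorted(city_names))  # ['Brno', 'Praha']
```

Parameterizing the container keeps one class for all data types while still letting a type checker distinguish a set of city names from a code-to-name mapping.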
CountryRegistry API
from fastpii.data.registry import CountryRegistry
# Register a module
CountryRegistry.register("cz", CzechModule)
# Get a module instance
module = CountryRegistry.get("cz")
# List all registered countries
codes = CountryRegistry.list_countries()
# Get all modules
all_modules = CountryRegistry.get_all()
# Get metadata for a country
metadata = CountryRegistry.get_metadata("cz")
# Clear all registrations (for testing)
CountryRegistry.clear()
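The calls above follow a common class-level registry pattern: a module class is registered under a country code, an instance is created on lookup, and clear() resets state between tests. The sketch below shows one way such a registry could work. SketchRegistry and DummyModule are hypothetical names, and details such as raising KeyError for unknown codes or returning sorted codes are assumptions, not documented FastPII behavior.

```python
from typing import Callable, ClassVar


class SketchRegistry:
    """Hypothetical stand-in for illustration, not FastPII's CountryRegistry."""

    # Shared across the process because it lives on the class itself.
    _factories: ClassVar[dict[str, Callable[[], object]]] = {}

    @classmethod
    def register(cls, code: str, factory: Callable[[], object]) -> None:
        cls._factories[code] = factory

    @classmethod
    def get(cls, code: str) -> object:
        try:
            factory = cls._factories[code]
        except KeyError:
            # Assumed error behavior for the sketch.
            raise KeyError(f"no module registered for {code!r}") from None
        return factory()  # a class is a callable that builds an instance

    @classmethod
    def list_countries(cls) -> list[str]:
        return sorted(cls._factories)

    @classmethod
    def clear(cls) -> None:
        cls._factories.clear()


class DummyModule:
    """Placeholder module class used only in this example."""


SketchRegistry.register("xx", DummyModule)
module = SketchRegistry.get("xx")
print(type(module).__name__)  # DummyModule
print(SketchRegistry.list_countries())  # ['xx']

# Resetting shared state keeps one test's registrations out of the next.
SketchRegistry.clear()
print(SketchRegistry.list_countries())  # []
```

Since registrations are stored at class level, a test that registers a module without clearing afterwards can leak state into later tests, which is the situation clear() is meant to avoid.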