Benchmark Methodology
How FastPII detector accuracy and false-positive rates were evaluated.
The benchmark process focuses on real text samples, known ground truth, and normalized comparison of detected values.
Test corpus
The evaluation uses these reference files as ground truth inputs:
- sample.txt
- text1.txt
- text2.txt
- text3.txt
- text4.txt
- text5.txt
- text6.txt
- text7.txt
text7.txt is the false-positive control file and is expected to produce 0 findings.
Metrics
Each run measures:
- Precision
- Recall
- F1 score
Detected values are compared in normalized form so formatting differences do not distort the result.
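As an illustration of what normalized comparison can look like, here is a minimal sketch. The function name normalize_value and the specific rules (Unicode NFKC folding, case folding, stripping whitespace and common separators) are assumptions made for this example; they are not taken from eval_v024.py, and the real rules may differ by detector type.

```python
import re
import unicodedata


def normalize_value(value: str) -> str:
    """Reduce a detected value to a canonical form for comparison.

    Hypothetical rules, chosen only for illustration:
    NFKC folding, case folding, and removal of whitespace,
    hyphens, dots, and parentheses.
    """
    folded = unicodedata.normalize("NFKC", value).casefold()
    # Strip separators that vary between formats of the same value.
    return re.sub(r"[\s\-.()]+", "", folded)


# Two formattings of the same phone number compare equal after normalization.
same = normalize_value("(555) 123-4567") == normalize_value("555.123.4567")
```

With rules like these, a detector that reports "555.123.4567" is credited for an expected value written as "(555) 123-4567". A blanket rule such as dot removal would be too aggressive for some types (for example email addresses), so a per-type normalizer is a likely refinement.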
Validation approach
- Run the detector against each ground truth file.
- Normalize predicted and expected values.
- Match findings by detector type and normalized value.
- Compute precision, recall, and F1.
- Verify that text7.txt stays at zero findings.
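The steps above can be sketched roughly as follows. This is not the code from eval_v024.py: the names Score, score, and control_is_clean are hypothetical, findings are assumed to be already normalized (detector type, value) pairs, duplicates are assumed to collapse into one finding, and a zero denominator is assumed to score 0.0.

```python
from typing import Iterable, NamedTuple, Tuple

# A finding is a (detector type, normalized value) pair.
Finding = Tuple[str, str]


class Score(NamedTuple):
    precision: float
    recall: float
    f1: float


def score(predicted: Iterable[Finding], expected: Iterable[Finding]) -> Score:
    """Match findings by type and normalized value, then compute the metrics."""
    pred, exp = set(predicted), set(expected)
    true_positives = len(pred & exp)
    # Assumption: a ratio with an empty denominator is reported as 0.0.
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(exp) if exp else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return Score(precision, recall, f1)


def control_is_clean(predicted: Iterable[Finding]) -> bool:
    """The false-positive control passes only if nothing was detected."""
    return len(list(predicted)) == 0


example = score(
    predicted=[("email", "a@b.com"), ("phone", "5551234567"), ("ssn", "123456789")],
    expected=[("email", "a@b.com"), ("phone", "5551234567"), ("phone", "5559876543")],
)
```

In this example two of three predictions are correct and two of three expected values are found, so precision, recall, and F1 all come out at 2/3. The control file is checked by its raw finding count rather than by these ratios, because precision and recall are undefined when nothing is expected.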
Reference implementation
The evaluation workflow is based on eval_v024.py, which provides the scoring logic and corpus-driven comparison used for these published results.
Why this methodology matters
This approach rewards detectors that find the right value, not just a superficially similar substring, and it makes false positives visible instead of hiding them in aggregate totals.