Benchmark Methodology

The benchmark process focuses on real text samples, known ground truth, and normalized comparison of detected values.

Test corpus

The evaluation uses these reference files as ground truth inputs:

sample.txt
text1.txt
text2.txt
text3.txt
text4.txt
text5.txt
text6.txt
text7.txt

text7.txt is the false-positive control file: it is expected to produce zero findings.

Metrics

Each run measures:

Precision
Recall
F1 score

Detected values are compared in normalized form so formatting differences do not distort the result.
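
As a minimal sketch of how the three metrics relate, the function below derives them from counts of true positives, false positives, and false negatives. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for illustration. They are not taken from eval_v024.py.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from match counts.

    Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    """
    # Precision: share of reported findings that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of expected findings that were actually reported.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 2 correct findings, 1 spurious, 1 missed.
    print(precision_recall_f1(2, 1, 1))
```

Under this convention a file with no expected and no predicted findings scores 0.0 on all three metrics, so a control file is better judged by its raw finding count than by these ratios.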

Validation approach

Run the detector against each ground truth file.
Normalize predicted and expected values.
Match findings by detector type and normalized value.
Compute precision, recall, and F1.
Verify that text7.txt stays at zero findings.
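
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the logic of eval_v024.py: the normalization rule (case-folding and dropping whitespace and hyphens), the detector type names, the sample values, and the use of sets (which collapses duplicate findings) are all hypothetical.

```python
from typing import Iterable

# A finding is a (detector type, raw value) pair. Type names are hypothetical.
Finding = tuple[str, str]


def normalize(value: str) -> str:
    """Hypothetical normalization: case-fold, drop whitespace and hyphens."""
    return "".join(
        ch for ch in value.casefold() if not ch.isspace() and ch != "-"
    )


def match_counts(
    predicted: Iterable[Finding], expected: Iterable[Finding]
) -> tuple[int, int, int]:
    """Return (tp, fp, fn), matching on detector type and normalized value."""
    pred = {(kind, normalize(value)) for kind, value in predicted}
    exp = {(kind, normalize(value)) for kind, value in expected}
    tp = len(pred & exp)  # reported and expected
    fp = len(pred - exp)  # reported but not expected
    fn = len(exp - pred)  # expected but not reported
    return tp, fp, fn


def control_is_clean(predicted: Iterable[Finding]) -> bool:
    """A control file passes only if the detector reports nothing at all."""
    return len(list(predicted)) == 0


if __name__ == "__main__":
    # Made-up findings for a single file.
    predicted = [
        ("email", "Alice@Example.com"),
        ("phone", "555-0100"),
        ("phone", "555-0199"),
    ]
    expected = [
        ("email", "alice@example.com "),
        ("phone", "5550100"),
        ("ssn", "123-45-6789"),
    ]
    print(match_counts(predicted, expected))
    print(control_is_clean([]))
```

Matching on the pair of detector type and normalized value means a correct value reported under the wrong type counts as both a false positive and a false negative.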

Reference implementation

The evaluation workflow is based on eval_v024.py, which provides the scoring logic and corpus-driven comparison used for these published results.

Why this methodology matters

This approach rewards detectors that find the right value, not just a superficially similar substring, and it makes false positives visible instead of hiding them in aggregate totals.
