Metric definitions & reading guide
Based on Li et al. (ESEC/FSE 2023), this page explains how we define each indicator from the Java CVE Benchmark and OWASP Benchmark and how to interpret the visuals on the site.
CVE_R (Manual)
Final detection rate after the manual check. Researchers reviewed every S_M-C hit and removed patch-only findings. The denominator is the 165 CVEs in the Java CVE Benchmark.
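A rough sketch of the arithmetic; the count of 70 verified detections is a made-up figure for illustration, not a result from the paper:

    # Manual CVE_R: verified detections over the 165-CVE benchmark, as a percentage.
    def cve_r_manual(verified_detections: int, total_cves: int = 165) -> float:
        return 100.0 * verified_detections / total_cves

    print(round(cve_r_manual(70), 1))  # hypothetical tool with 70 verified hits -> 42.4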
OWASP F1
F1 score derived from precision/recall on OWASP Benchmark v1.2. Because that suite is synthetic, read this score alongside Manual CVE_R.
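A minimal sketch of the derivation, assuming the usual precision/recall definitions over true positives (TP), false positives (FP), and false negatives (FN) counted on the suite; the counts below are invented, not official scorecard numbers:

    # F1 = 2 * precision * recall / (precision + recall).
    def f1_score(tp: int, fp: int, fn: int) -> float:
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        return 2 * precision * recall / (precision + recall)

    print(round(f1_score(tp=900, fp=300, fn=500), 3))  # invented counts -> 0.692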
Approximate FP rate
Defined as #Dvul&Dpatch / #Dvul. We plot both S_F-C (automated) and S_M-C (manual-assisted). Higher values indicate more false positives.
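A sketch of the ratio, assuming findings can be matched across the vulnerable and patched versions by some stable key (for example CVE id plus rule id); that keying scheme is ours, not the paper's:

    # Approximate FP rate = #Dvul&Dpatch / #Dvul.
    def approx_fp_rate(d_vul: set, d_patch: set) -> float:
        if not d_vul:
            return 0.0
        return len(d_vul & d_patch) / len(d_vul)

    d_vul = {"CVE-1:rule-a", "CVE-2:rule-b", "CVE-3:rule-c"}   # findings on vulnerable code
    d_patch = {"CVE-2:rule-b"}                                 # same finding persists after the patch
    print(round(approx_fp_rate(d_vul, d_patch), 2))            # -> 0.33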
Over-claim
Defined as (#Supported - #Detected) / #Supported. The overall rate is 90.5%, hence the emphasis on pairing vendor statements with measured data.
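The arithmetic, sketched with plain counts; only the 90.5% overall figure above comes from the paper, the per-tool numbers here are placeholders:

    # Over-claim rate = (#Supported - #Detected) / #Supported.
    def over_claim_rate(n_supported: int, n_detected: int) -> float:
        return (n_supported - n_detected) / n_supported

    print(round(over_claim_rate(n_supported=40, n_detected=6), 3))  # placeholder counts -> 0.85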
S_F-A
Auto-detected vulnerable code before fixes (baseline hits).
S_F-C
Findings on patched code, treated as false positives.
S_M-A
Manual-assisted detections on vulnerable code.
S_M-C
Manual-assisted findings on patched code (needs review).
Manual (S_M-C verified)
Final detections after the research team removed patch-only hits.
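To show how the five labels above fit together, here is a small sketch; the class and field names are our own, and the review step is simplified to a set difference rather than the paper's full procedure:

    from dataclasses import dataclass, field

    @dataclass
    class ScanOutcome:
        s_f_a: set = field(default_factory=set)      # auto hits on vulnerable code
        s_f_c: set = field(default_factory=set)      # auto hits on patched code (false positives)
        s_m_a: set = field(default_factory=set)      # manual-assisted hits on vulnerable code
        s_m_c: set = field(default_factory=set)      # manual-assisted hits on patched code (to review)
        patch_only: set = field(default_factory=set) # S_M-C hits the reviewers confirmed as patch-only

        def manual_verified(self) -> set:
            # Final detections: manual-assisted hits minus the patch-only findings removed in review.
            return self.s_m_a - self.patch_only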
Leaderboard cards
Manual CVE_R% sits at the center, with F1 / Approx. FP / Over-claim surfaced as badges. Progress bars use accessible colors, and the entire card responds to tab navigation.
Bubble / radar charts
Axes express percentages; bubble size encodes speed tiers. Legends include text labels, and tooltips surface Manual / F1 / speed tier together. Colors were picked for color-vision friendliness.
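A rough matplotlib sketch of that encoding; the tool names, percentages, and speed tiers are placeholders, and this is not the site's actual chart code:

    import matplotlib.pyplot as plt

    tools = ["Tool A", "Tool B", "Tool C"]            # placeholder names
    manual_cve_r = [42.4, 30.9, 18.2]                 # x axis: Manual CVE_R (%)
    owasp_f1 = [55.0, 61.0, 40.0]                     # y axis: OWASP F1 (%)
    speed_tier = [3, 2, 1]                            # bubble size: 1 = slow ... 3 = fast

    fig, ax = plt.subplots()
    ax.scatter(manual_cve_r, owasp_f1, s=[t * 200 for t in speed_tier], alpha=0.6)
    for name, x, y in zip(tools, manual_cve_r, owasp_f1):
        ax.annotate(name, (x, y))                     # text label next to each bubble
    ax.set_xlabel("Manual CVE_R (%)")
    ax.set_ylabel("OWASP F1 (%)")
    plt.show()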
Approx. FP & over-claim
Tables and gauges live side by side so the raw #Dvul / #Dvul&Dpatch values stay visible. Over-claim uses a horizontal bar plus a single highlighted percentage for clarity.
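A minimal sketch of that bar treatment, again in matplotlib; the row list is a placeholder, and only the 90.5% overall figure comes from the text above:

    import matplotlib.pyplot as plt

    labels = ["Overall"]               # placeholder: the real page also lists individual tools
    over_claim = [90.5]                # over-claim rate in percent

    fig, ax = plt.subplots()
    bars = ax.barh(labels, over_claim)
    ax.set_xlim(0, 100)
    ax.set_xlabel("Over-claim (%)")
    ax.bar_label(bars, fmt="%.1f%%")   # single highlighted percentage next to the bar
    plt.show()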