Methodology

Metric definitions & reading guide

Based on Li et al. (ESEC/FSE 2023), this page explains how we define each indicator on the Java CVE Benchmark and the OWASP Benchmark, and how to interpret the visuals on the site.

Metric glossary

CVE_R (Manual)

Final detection rate after the manual check: researchers reviewed every S_M-C hit and removed patch-only findings. Denominator = the 165 CVEs in the benchmark.
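As a quick illustration, Manual CVE_R reduces to a verified-hit count over the fixed denominator. The sketch below uses a hypothetical `verified_hits` value, not a number from the study:

```python
TOTAL_CVES = 165  # fixed denominator: all CVEs in the Java CVE Benchmark

def manual_cve_r(verified_hits: int) -> float:
    """Manual CVE_R as a percentage: verified S_M-C hits over 165 CVEs.

    verified_hits is the count remaining after patch-only findings
    were removed during manual review.
    """
    return 100.0 * verified_hits / TOTAL_CVES

# Hypothetical example: 33 verified detections out of 165 CVEs.
print(manual_cve_r(33))  # 20.0
```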

OWASP F1

F1 score derived from precision/recall on OWASP Benchmark v1.2. Because it is a synthetic suite, read it alongside Manual CVE_R.
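The F1 score is the standard harmonic mean of precision and recall. A minimal sketch, with made-up TP/FP/FN counts rather than real OWASP Benchmark results:

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R), with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts from a synthetic-suite run:
print(round(f1_score(tp=80, fp=20, fn=20), 3))  # 0.8
```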

Approximate FP

Approximate FP rate = #Dvul&Dpatch / #Dvul, i.e., the share of detections on vulnerable code that also fire on the patched code. We plot both S_F-C (automated) and S_M-C (manual-assisted). Higher values indicate more false positives.
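In the same spirit, the approximate FP rate is a ratio of two detection counts. The counts in the sketch below are hypothetical:

```python
def approx_fp_rate(d_vul_and_patch: int, d_vul: int) -> float:
    """Approximate FP rate = #Dvul&Dpatch / #Dvul.

    d_vul: detections on the vulnerable version.
    d_vul_and_patch: the subset of those that also fire on the
    patched version (likely false positives).
    """
    if d_vul == 0:
        return 0.0
    return d_vul_and_patch / d_vul

# Hypothetical: 12 of 40 hits persist on patched code -> 30% approx. FP.
print(approx_fp_rate(12, 40))  # 0.3
```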

Over-claim

Over-claim rate = (#Supported - #Detected) / #Supported. The overall rate is 90.5%, hence the emphasis on pairing vendor statements with measured data.
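The over-claim rate compares vendor-claimed support against what was actually detected. Again, the numbers in this sketch are hypothetical, not benchmark results:

```python
def overclaim_rate(supported: int, detected: int) -> float:
    """Over-claim rate = (#Supported - #Detected) / #Supported."""
    if supported == 0:
        return 0.0
    return (supported - detected) / supported

# Hypothetical: 10 types claimed as supported, 3 actually detected
# -> 70% over-claim.
print(overclaim_rate(10, 3))  # 0.7
```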

Scenario legend

S_F-A

Auto-detected vulnerable code before fixes (baseline hits).

S_F-C

Findings on patched code, which are treated as false positives.

S_M-A

Manual-assisted detections on vulnerable code.

S_M-C

Manual-assisted findings on patched code (needs review).

Manual (S_M-C verified)

Final detections after the research team removed patch-only hits.

How to read the charts

Leaderboard cards

Manual CVE_R% sits at the center, with F1 / Approx. FP / Over-claim surfaced as badges. Progress bars use accessible colors and the entire card responds to tab navigation.

Bubble / radar charts

Axes express percentages; bubble size encodes speed tiers. Legends include text labels and tooltips surface Manual / F1 / speed tier together. Colors were picked for color-vision friendliness.

Approx. FP & over-claim

Tables and gauges live side by side so the raw #Dvul / #Dvul&Dpatch values stay visible. Over-claim uses a horizontal bar plus a single highlighted percentage for clarity.