SAST Benchmark Desk

Semgrep

OSS · Java · Last updated: 2023-12-01

Manual CVE_R: 5.5% (9 / 165 CVEs)
Based on S_M-C after manual review.
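The page does not spell out how CVE_R is computed. The 9 / 165 figure suggests it is the share of benchmark CVEs the tool detects, so the sketch below assumes exactly that. `detection_rate` is a hypothetical helper for illustration, not part of any benchmark tooling.

```python
def detection_rate(detected: int, total: int) -> float:
    """Share of benchmark CVEs detected, as a percentage (assumed definition of CVE_R)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * detected / total

# Figures from the Manual CVE_R card: 9 of 165 CVEs detected after manual review.
rate = detection_rate(9, 165)
print(f"{rate:.1f}%")  # prints 5.5%
```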

OWASP Benchmark F1: 29.7% (OWASP Benchmark v1.2, Java)
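The card reports only the final F1 value. The sketch below assumes the standard definition, the harmonic mean of precision and recall, with each OWASP Benchmark test case scored as a true positive, false positive, or false negative. The counts in the example are hypothetical and are not Semgrep's actual results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 (harmonic mean of precision and recall), returned as a fraction."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts for illustration only (precision 0.6, recall 0.3).
print(f"{100 * f1_score(tp=30, fp=20, fn=70):.1f}%")  # prints 40.0%
```

Because F1 is a harmonic mean, a tool with high precision but low recall (or the reverse) still gets a low score.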

Composite score: 17.8
Weights: 60% Manual CVE_R, 30% OWASP Benchmark F1, 10% (1 - approx. FP rate)
Approx. FP: 44.4%
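The "Manual 60 / F1 30 / (1-FP) 10" label reads as percentage weights. That reading reproduces the 17.8 score from the rounded card values: 0.6 × 5.5 + 0.3 × 29.7 + 0.1 × (100 − 44.4) = 17.77. It is still an inference, though, and the function name below is hypothetical.

```python
def composite_score(manual_cve_r: float, owasp_f1: float, approx_fp: float) -> float:
    """Weighted composite: 60% Manual CVE_R, 30% OWASP F1, 10% (1 - approx. FP).

    All inputs are percentages in [0, 100]; the result is on the same scale.
    """
    return 0.6 * manual_cve_r + 0.3 * owasp_f1 + 0.1 * (100.0 - approx_fp)

# Rounded values from the cards on this page.
score = composite_score(manual_cve_r=5.5, owasp_f1=29.7, approx_fp=44.4)
print(f"{score:.1f}")  # prints 17.8
```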

Real-world detections by scenario (chart: counts as bars, percentages as a line, per benchmark scenario)

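Approximate false positives

#Dvul counts CVEs detected in the vulnerable version; #Dvul&Dpatch counts those also flagged in the patched version. The approximate FP rate is #Dvul&Dpatch / #Dvul, reported for S_F-C and S_M-C.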

Setting	#Dvul	#Dvul&Dpatch	Approx. FP rate
S_F-C	31	26	83.9%
S_M-C	9	4	44.4%
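Both reported rates equal #Dvul&Dpatch / #Dvul (26 / 31 ≈ 83.9%, 4 / 9 ≈ 44.4%). The presumed rationale is that a hit which still fires after the fix is probably not tracking the vulnerability. The minimal sketch below recomputes both rows. `approx_fp_rate` is a hypothetical helper.

```python
def approx_fp_rate(d_vul: int, d_vul_and_patch: int) -> float:
    """Percentage of vulnerable-version detections that also fire on the patched version."""
    if not 0 <= d_vul_and_patch <= d_vul:
        raise ValueError("need 0 <= #Dvul&Dpatch <= #Dvul")
    if d_vul == 0:
        return 0.0  # no detections, so nothing to count as a false positive
    return 100.0 * d_vul_and_patch / d_vul

# (setting, #Dvul, #Dvul&Dpatch) taken from the table above.
for setting, d_vul, d_both in [("S_F-C", 31, 26), ("S_M-C", 9, 4)]:
    print(f"{setting}: {approx_fp_rate(d_vul, d_both):.1f}%")
# prints:
# S_F-C: 83.9%
# S_M-C: 44.4%
```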

Claimed vs measured coverage

Over-claim is computed with the Manual (S_M-C) column.

Over-claim = (#Supported - #Detected) / #Supported = 88.6%
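The page gives the over-claim formula and its result but not #Supported itself. The sketch below implements the formula as written, assuming #Supported is the number of CVEs the tool claims to cover and #Detected is the Manual (S_M-C) count. The numbers in the example are hypothetical, chosen only to show the arithmetic.

```python
def over_claim(supported: int, detected: int) -> float:
    """(#Supported - #Detected) / #Supported, as a percentage."""
    if supported <= 0:
        raise ValueError("supported must be positive")
    if detected < 0:
        raise ValueError("detected must be non-negative")
    return 100.0 * (supported - detected) / supported

# Hypothetical counts for illustration; the page does not list #Supported.
print(f"{over_claim(supported=40, detected=10):.1f}%")  # prints 75.0%
```

A higher value means a larger gap between claimed and measured coverage. 0% would mean every supported CVE was actually detected.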

Speed notes

Qualitative tier plus study observations.

Tier: slow

Long runs: Semgrep takes 230-274 s, and CodeQL queries can take up to 24 h.

Across all tools, run time rises sharply once projects exceed about 50k LoC.
Tagged as a slow-tier tool in the study (Semgrep / CodeQL cohort).
Semgrep is less sensitive to project size: it already averages 230-274 s below 50k LoC and about 267 s above it.

Manual definition: research teams manually inspected S_M-C detections and removed patch-only hits. Over-claim is evaluated against that Manual baseline.
