SAST Benchmark Desk

CodeQL

OSS | Java | Last updated: 2023-10-01

Manual CVE_R

Based on S_M-C after manual review

6.7%

11 / 165 CVEs

OWASP Benchmark F1

79.8%

OWASP Benchmark v1.2 (Java)

Composite score

Weights: Manual 60 / F1 30 / (1 - FP) 10

33.4

Approx. FP: 45.5%
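The composite score can be reproduced from the three figures shown on this page. The sketch below assumes the weights are applied linearly to percentage values, with the false-positive term entering as (100 - FP). The page does not spell out the formula, so the function name and signature are illustrative only; under that assumption the result matches the displayed 33.4.

```python
def composite_score(manual_pct, f1_pct, fp_pct, weights=(0.60, 0.30, 0.10)):
    """Weighted sum of Manual CVE_R, OWASP F1, and (100 - approx. FP), all in percent.

    Assumption: a plain linear combination; the weights come from the
    "Manual 60 / F1 30 / (1-FP)10" label on the page.
    """
    w_manual, w_f1, w_fp = weights
    return w_manual * manual_pct + w_f1 * f1_pct + w_fp * (100.0 - fp_pct)


# Values displayed on this page for CodeQL.
score = composite_score(manual_pct=6.7, f1_pct=79.8, fp_pct=45.5)
print(round(score, 1))  # 33.4
```

Because the Manual term carries 60% of the weight, the low real-world CVE figure (6.7%) dominates the composite despite the high OWASP Benchmark F1.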

Real-world detections by scenario

Counts (bar) plus percentages (line) per benchmark scenario.

Approximate false positives

#Dvul and #Dvul&Dpatch for S_F-C / S_M-C.

S_F-C

60.0%

#Dvul = 15, #Dvul&Dpatch = 9

Dvul	15
Dvul&Dpatch	9
Rate	60.0%

S_M-C

45.5%

#Dvul = 11, #Dvul&Dpatch = 5

Dvul	11
Dvul&Dpatch	5
Rate	45.5%
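Both rates above follow from dividing #Dvul&Dpatch by #Dvul, that is, the share of detections on the vulnerable version that also fire on the patched version. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def approx_fp_rate(d_vul, d_vul_and_patch):
    """Approximate false-positive rate in percent.

    d_vul: detections reported on the vulnerable version (#Dvul)
    d_vul_and_patch: detections reported on both the vulnerable and the
        patched version (#Dvul&Dpatch)
    """
    if d_vul <= 0:
        raise ValueError("d_vul must be positive")
    if not 0 <= d_vul_and_patch <= d_vul:
        raise ValueError("d_vul_and_patch must be between 0 and d_vul")
    return 100.0 * d_vul_and_patch / d_vul


print(f"S_F-C: {approx_fp_rate(15, 9):.1f}%")  # S_F-C: 60.0%
print(f"S_M-C: {approx_fp_rate(11, 5):.1f}%")  # S_M-C: 45.5%
```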

Claimed vs measured coverage

Over-claim is computed with the Manual (S_M-C) column.

Over-claim

(#Supported - #Detected) / #Supported

92.1%
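The over-claim formula can be sketched as follows. The page reports only the resulting 92.1% and not the underlying #Supported and #Detected counts, so the inputs in the usage line are hypothetical values chosen purely to illustrate the calculation.

```python
def over_claim(supported, detected):
    """Over-claim in percent: (#Supported - #Detected) / #Supported.

    supported: number of vulnerability types the tool claims to support
    detected: number of those types actually detected (Manual, S_M-C column)
    """
    if supported <= 0:
        raise ValueError("supported must be positive")
    if not 0 <= detected <= supported:
        raise ValueError("detected must be between 0 and supported")
    return 100.0 * (supported - detected) / supported


# Hypothetical counts, not taken from the study.
print(f"{over_claim(supported=20, detected=5):.1f}%")  # 75.0%
```

A value near 100% means almost none of the claimed coverage was confirmed by the manually reviewed detections.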

Speed notes

Qualitative tier plus study observations.

Tier: slow

Long runs (Semgrep 230-274 s; individual CodeQL queries can exceed 24 h)

Run time jumps sharply once projects pass ~50k LoC (all tools).
Tagged as a slow-tier tool in the study (Semgrep / CodeQL cohort).
CodeQL can exhibit extremely long runs per query (e.g., taint path analyses exceeding 24h).

Manual definition: research teams manually inspected S_M-C detections and removed patch-only hits. Over-claim is evaluated against that Manual baseline.
