N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?

🔍 LLMs Put to the Test: Can They Find Real Vulnerabilities?

N-Day-Bench is a benchmark that tests whether large language models (LLMs) can discover known security vulnerabilities in real codebases. It pulls fresh cases from GitHub security advisories each month and evaluates models such as GPT-5.4 and Claude Opus 4.6 in a sandboxed environment, giving a regularly refreshed assessment of their vulnerability discovery capabilities.

guid: https://news.ycombinator.com/item?id=47758347
source_url: https://ndaybench.winfunc.com/
author_name: mufeedvh

id: 1792
uid: ztIwi
insdate: 2026-04-14 02:05:20
category: Hacker News