Can AI Actually Find Real Security Bugs? Testing the New Wave of AI Models

Since the release of GPT-3.5, I’ve been experimenting with using Large Language Models (LLMs) to find vulnerabilities in source code. Initially, the results were underwhelming: the models frequently hallucinated vulnerabilities or misidentified issues. However, the advent of “reasoning models” sparked my curiosity. Could these newer models, designed for more complex multi-step reasoning, succeed where their predecessors struggled? This post documents my experiment to find out.
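
The post doesn’t name its tooling, so purely as an illustration, here is a minimal sketch of the kind of harness such an experiment implies, assuming the OpenAI Python SDK: hand the model a snippet with a planted bug and ask it to report findings. The model name, prompt wording, and vulnerable C snippet are all my assumptions, not the author’s setup.

```python
# Minimal sketch of an LLM vulnerability-hunting harness.
# Assumes the OpenAI Python SDK (openai>=1.0); model and prompt are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Deliberately vulnerable C snippet: a classic stack buffer overflow.
TARGET_CODE = """
#include <string.h>
void copy(const char *input) {
    char buf[16];
    strcpy(buf, input);  /* no bounds check */
}
"""

PROMPT = (
    "You are a security auditor. Review the following code and report any "
    "vulnerabilities, citing the line involved and a one-sentence rationale. "
    "If there are none, say so explicitly.\n\n" + TARGET_CODE
)

response = client.chat.completions.create(
    model="o1",  # a reasoning model; swap in whichever model is under test
    messages=[{"role": "user", "content": PROMPT}],
)
print(response.choices[0].message.content)
```

Requiring an explicit “no vulnerabilities” answer matters here: running the same prompt over clean snippets gives a way to count hallucinated findings, not just misses on buggy code.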

#cyber