Can AI Actually Find Real Security Bugs? Testing the New Wave of AI Models

Since the release of GPT-3.5, I’ve been experimenting with using Large Language Models (LLMs) to find vulnerabilities in source code. Initially, the results were underwhelming: LLMs frequently hallucinated or misidentified issues. However, the advent of “reasoning models” sparked my curiosity. Could these newer models, designed for more complex reasoning tasks, succeed where their predecessors struggled? This post documents my experiment to find out. — Read More
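To make the setup concrete, here is a minimal sketch of the kind of harness such an experiment might use: hand a model a snippet with a known bug and ask it to audit the code. This is illustrative only, not the post’s actual methodology; it assumes the OpenAI Python SDK, and the model name and the vulnerable C snippet are placeholders.

```python
# Minimal sketch of an LLM vulnerability-hunting harness (illustrative;
# not the post's actual setup). Assumes the OpenAI Python SDK with an
# OPENAI_API_KEY in the environment; model name and snippet are placeholders.
from openai import OpenAI

client = OpenAI()

# A deliberately vulnerable C snippet: classic stack buffer overflow (CWE-121).
TARGET_CODE = """
#include <string.h>

void copy_name(const char *input) {
    char buf[16];
    strcpy(buf, input);  /* no bounds check */
}
"""

prompt = (
    "You are a security auditor. List any memory-safety vulnerabilities "
    "in the following C code, citing the line and CWE where possible:\n"
    + TARGET_CODE
)

response = client.chat.completions.create(
    model="o1",  # placeholder: swap in whichever reasoning model is under test
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Scoring then amounts to checking whether the model’s answer names the planted bug without inventing extra ones.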

#cyber

Perplexity R1 1776

Today we’re open-sourcing R1 1776, a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information. Download the model weights on our HuggingFace Repo or consider using the model via our Sonar API. — Read More
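For anyone who wants to try the weights locally, here is a minimal sketch using the Hugging Face transformers API. The repo id is taken from Perplexity’s announcement but should be treated as an assumption, and at DeepSeek-R1 scale the full model needs a multi-GPU server rather than a laptop; the Sonar API route mentioned above avoids hosting the weights yourself.

```python
# Minimal sketch of loading R1 1776 with Hugging Face transformers.
# The repo id follows Perplexity's announcement (assumption); at
# DeepSeek-R1 scale (~671B parameters) this requires a multi-GPU server.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "perplexity-ai/r1-1776"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",       # shard across whatever GPUs are available
    torch_dtype="auto",      # keep the checkpoint's native precision
    trust_remote_code=True,  # DeepSeek-style checkpoints may ship custom code
)

inputs = tokenizer("What happened in 1989?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```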

#china-vs-us

Google’s new AI generates hypotheses for researchers

Over the past few years, Google has embarked on a quest to jam generative AI into every product and initiative possible. Google has robots summarizing search results, interacting with your apps, and analyzing the data on your phone. And sometimes, the output of generative AI systems can be surprisingly good despite lacking any real knowledge. But can they do science?

Google Research is now angling to turn AI into a scientist—well, a “co-scientist.” The company has a new multi-agent AI system based on Gemini 2.0 aimed at biomedical researchers that can supposedly point the way toward new hypotheses and areas of biomedical research. However, Google’s AI co-scientist boils down to a fancy chatbot.

… The AI co-scientist contains multiple interconnected models that churn through the input data and access Internet resources to refine the output. Inside the tool, the different agents challenge each other to create a “self-improving loop,” which is similar to the new raft of reasoning AI models like Gemini Flash Thinking and OpenAI o3. — Read More
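The article doesn’t disclose the system’s internals, but the “self-improving loop” idea is easy to picture: one agent proposes a hypothesis, another critiques it, and a third revises it. Below is a minimal sketch of that shape; the agent roles, prompts, and model name are hypothetical stand-ins, not Google’s implementation, and the OpenAI SDK is used purely as a generic chat-completions client.

```python
# Minimal sketch of a generate/critique/revise loop in the spirit of the
# "self-improving loop" described above. Agent roles and prompts are
# hypothetical; this is not Google's implementation.
from openai import OpenAI

client = OpenAI()

def chat(system: str, user: str) -> str:
    """One LLM call; the model name is a placeholder."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # any chat model works for the sketch
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content

def co_scientist(question: str, rounds: int = 3) -> str:
    # A "generation agent" proposes an initial hypothesis.
    hypothesis = chat(
        "You are a generation agent. Propose one testable hypothesis.",
        question,
    )
    for _ in range(rounds):
        # A "reflection agent" attacks the current hypothesis...
        critique = chat(
            "You are a reflection agent. Flag unsupported claims and "
            "missing controls in this hypothesis.",
            hypothesis,
        )
        # ...and an "evolution agent" revises it to survive the critique.
        hypothesis = chat(
            "You are an evolution agent. Revise the hypothesis to address "
            "the critique while keeping it testable.",
            f"Hypothesis:\n{hypothesis}\n\nCritique:\n{critique}",
        )
    return hypothesis

print(co_scientist("What drives antibiotic resistance transfer in biofilms?"))
```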

#big7

AI Killed The Tech Interview. Now What?

Absolutely nobody likes the hiring process. Not the managers hiring, not the recruitment people, and certainly not the candidates.

Tech interviews are one of the worst parts of the process and are pretty much universally hated by the people taking them. We’ve all heard stories of candidates being grilled on comp sci questions about O(n) efficiency, only to spend their day job connecting APIs with basic middleware.

AI straight-up kills HackerRank. AI also significantly reduces the effectiveness of comp sci fundamentals and the coding interview as they exist today. Architectural interviews are likely safe for a few years yet. As AI gets better, how can we do better interviews? — Read More

#strategy