Many employees, especially those working in creative fields, are understandably worried by the prospect of AI stealing their jobs – and new research has found it may not be an unfounded fear.
A report from Imperial College Business School, Harvard Business School, and the German Institute for Economic Research found that demand for digital freelancers in writing and coding has declined by 21% since the launch of ChatGPT in November 2022. — Read More
Read the Paper
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities).
In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Used alone, prior agents struggle to explore many candidate vulnerabilities and to plan over long horizons. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving the long-term planning issues that arise when trying different vulnerabilities. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5x. — Read More
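The planner/subagent architecture described above can be sketched in a few lines. This is an illustrative stub under stated assumptions, not the paper's code: the class names, the dispatch loop, and the test target URL are all hypothetical, and real subagents would drive an LLM with tools rather than return a canned result.

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """Expert agent for one vulnerability class (e.g. SQL injection, XSS)."""
    name: str

    def attempt_exploit(self, target: str) -> bool:
        # A real subagent would drive an LLM with tools (browser, terminal).
        # This stub simply reports failure.
        return False

@dataclass
class PlanningAgent:
    """Explores the target and decides which specialist subagents to launch."""
    subagents: list = field(default_factory=list)
    log: list = field(default_factory=list)

    def run(self, target: str) -> bool:
        # Long-range plan: try each vulnerability class in turn, keeping a
        # log so exploration context persists across attempts.
        for agent in self.subagents:
            self.log.append(f"dispatch {agent.name} -> {target}")
            if agent.attempt_exploit(target):
                return True
        return False

planner = PlanningAgent(subagents=[SubAgent("sqli"), SubAgent("xss")])
success = planner.run("http://testbed.local")
```

The design point is the separation of concerns: the planner holds the long-horizon state (what has been tried, what to try next), while each subagent only needs expertise in a single vulnerability class.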
Turkish student using AI software to cheat on a university exam arrested
A Turkish student who used AI software, a camera disguised as a button, and a hidden router to cheat on a university exam has been detained.
The student was spotted behaving in a suspicious way during the TYT exam on June 8 and was detained by police, before being formally arrested and sent to jail pending trial. — Read More
Can LLMs invent better ways to train LLMs?
Earlier this year, Sakana AI started leveraging evolutionary algorithms to develop better ways to train foundation models like LLMs. In a recent paper, we have also used LLMs to act as better evolutionary algorithms!
Given these surprising results, we began to ask ourselves: Can we also use LLMs to come up with a much better algorithm to train LLMs themselves? We playfully term this self-referential improvement process LLM² (‘LLM-squared’) as an homage to previous fundamental work in meta-learning.
As a significant step towards this goal, we’re excited to release our report, Discovering Preference Optimization Algorithms with and for Large Language Models. — Read More
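For context on the search space, the best-known hand-designed objective in this family is DPO (Direct Preference Optimization). The sketch below implements the published DPO per-pair loss as a concrete example of the kind of objective such a search explores; it is not the algorithm the Sakana paper discovered, and the argument names are illustrative.

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    logp_* are the policy's log-probabilities of the chosen/rejected
    responses; ref_logp_* are the frozen reference model's.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# When the policy matches the reference, the margin is 0 and the loss is ln 2.
loss = dpo_loss(-1.0, -1.0, -1.0, -1.0)
```

An LLM-driven search in this space amounts to proposing, implementing, and empirically ranking many variants of functions with this signature.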
Kling, the AI video generator rival to Sora that’s wowing creators
If you follow any AI influencers or creators on social media, there’s a good chance you’ve seen them more excited than usual lately about a new AI video generation model called “Kling.”
The videos it generates from pure text prompts, plus a few configurable in-app settings, look strikingly realistic, on par with Sora, OpenAI’s still non-public, invitation-only model, which the company has shared with a small group of artists and filmmakers while it tests both the model and its potential for adversarial (read: risky or objectionable) uses.
[W]here did Kling come from? What does it offer? And how can you get your hands on it? Read on to find out. — Read More
Andrew Ng: A Look At AI Agentic Workflows And Their Potential For Driving AI Progress
Fake beauty queens charm judges at the Miss AI pageant
Beauty pageant contestants have always been judged by their looks, and, in recent decades, by their do-gooderly deeds and winning personalities.
Still, one thing that’s remained consistent throughout beauty pageant history is that you had to be a human to enter.
But now that’s changing.
Models created using generative artificial intelligence (AI) are competing in the inaugural “Miss AI” pageant this month. — Read More
Using AI for Political Polling
Public polling is a critical function of modern political campaigns and movements, but it isn’t what it once was. Recent US election cycles have produced copious postmortems explaining both the successes and the flaws of public polling. There are two main reasons polling fails.
First, nonresponse has skyrocketed. It’s radically harder to reach people than it used to be. Few people fill out surveys that come in the mail anymore. Few people answer their phone when a stranger calls. Pew Research reported that 36% of the people they called in 1997 would talk to them, but only 6% by 2018. Pollsters worldwide have faced similar challenges.
Second, people don’t always tell pollsters what they really think. Some hide their true thoughts because they are embarrassed about them. Others behave as a partisan, telling the pollster what they think their party wants them to say—or what they know the other party doesn’t want to hear.
Despite these frailties, an obsessive interest in polling consumes our politics. Headlines are more likely to tout the latest changes in polling numbers than the policy issues at stake in the campaign. This is a tragedy for a democracy. We should treat elections like choices that have consequences for our lives and well-being, not contests to decide who gets which cushy job. — Read More
Towards Conversational Diagnostic AI
At the heart of medicine lies the physician-patient dialogue, where skillful history-taking paves the way for accurate diagnosis, effective management, and enduring trust. Artificial Intelligence (AI) systems capable of diagnostic dialogue could increase accessibility, consistency, and quality of care. However, approximating clinicians’ expertise is an outstanding grand challenge. Here, we introduce AMIE (Articulate Medical Intelligence Explorer), a Large Language Model (LLM) based AI system optimized for diagnostic dialogue.
AMIE uses a novel self-play based simulated environment with automated feedback mechanisms for scaling learning across diverse disease conditions, specialties, and contexts. We designed a framework for evaluating clinically-meaningful axes of performance including history-taking, diagnostic accuracy, management reasoning, communication skills, and empathy. We compared AMIE’s performance to that of primary care physicians (PCPs) in a randomized, double-blind crossover study of text-based consultations with validated patient actors in the style of an Objective Structured Clinical Examination (OSCE). The study included 149 case scenarios from clinical providers in Canada, the UK, and India, 20 PCPs for comparison with AMIE, and evaluations by specialist physicians and patient actors. AMIE demonstrated greater diagnostic accuracy and superior performance on 28 of 32 axes according to specialist physicians and 24 of 26 axes according to patient actors. Our research has several limitations and should be interpreted with appropriate caution. Clinicians were limited to unfamiliar synchronous text-chat which permits large-scale LLM-patient interactions but is not representative of usual clinical practice. While further research is required before AMIE could be translated to real-world settings, the results represent a milestone towards conversational diagnostic AI. — Read More
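The self-play loop with automated feedback can be sketched schematically. Everything below is an illustrative assumption, not Google's code: the turn functions stand in for LLM-sampled doctor and simulated-patient messages, and the one-line critic stands in for the paper's clinically-meaningful evaluation axes.

```python
def doctor_turn(history):
    # A real system would sample a history-taking question from an LLM.
    return "Can you describe your main symptom?"

def patient_turn(history, condition):
    # A simulated patient grounded in a scripted condition.
    return f"I keep getting a {condition}."

def critic_score(history, condition):
    # Automated feedback: reward dialogues that surface the condition.
    # (Stand-in for scoring history-taking, accuracy, empathy, etc.)
    return 1.0 if any(condition in turn for turn in history) else 0.0

def self_play_episode(condition, max_turns=2):
    """Run one doctor/patient dialogue and score it automatically."""
    history = []
    for _ in range(max_turns):
        history.append(doctor_turn(history))
        history.append(patient_turn(history, condition))
    return history, critic_score(history, condition)

history, score = self_play_episode("pounding headache")
```

Because both participants and the critic are automated, episodes like this can be generated at scale across many conditions, which is what lets the training loop cover diverse specialties without human patients.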