Recent advances in large language models (LLMs) demonstrate substantial capabilities in natural language understanding and generation tasks. With the growing number of LLMs, how to harness the collective expertise of multiple LLMs is an exciting open direction. Toward this goal, we propose a new approach that leverages the collective strengths of multiple LLMs through a Mixture-of-Agents (MoA) methodology. In our approach, we construct a layered MoA architecture wherein each layer comprises multiple LLM agents. Each agent takes all the outputs from agents in the previous layer as auxiliary information in generating its response. MoA models achieve state-of-the-art performance on AlpacaEval 2.0, MT-Bench, and FLASK, surpassing GPT-4 Omni. For example, our MoA using only open-source LLMs leads AlpacaEval 2.0 by a substantial margin, achieving a score of 65.1% compared to 57.5% by GPT-4 Omni. — Read More
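To make the layered setup concrete, here is a minimal Python sketch of the MoA flow as described above: each agent in a layer receives the user prompt plus all responses from the previous layer. The `query_model` helper and the aggregation prompt are hypothetical placeholders, not the paper's implementation.

```python
def query_model(model: str, prompt: str) -> str:
    """Placeholder for an LLM API call (e.g., any chat-completion endpoint)."""
    raise NotImplementedError

def mixture_of_agents(user_prompt: str, layers: list[list[str]]) -> str:
    """Run layers of agents; each layer sees all outputs of the previous one."""
    previous_outputs: list[str] = []
    for layer in layers:
        current_outputs = []
        for model in layer:
            if previous_outputs:
                # Feed prior responses as auxiliary context for refinement.
                context = "\n\n".join(
                    f"Response {i + 1}: {r}" for i, r in enumerate(previous_outputs)
                )
                prompt = (
                    f"{user_prompt}\n\nCandidate responses from other agents:\n"
                    f"{context}\n\nSynthesize a single improved response."
                )
            else:
                prompt = user_prompt  # first layer sees only the user prompt
            current_outputs.append(query_model(model, prompt))
        previous_outputs = current_outputs
    # The final layer would typically hold a single aggregator agent.
    return previous_outputs[0]
```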
Safe Superintelligence Inc. launches: Here’s what it means
Three well-known generative AI pioneers have formed Safe Superintelligence Inc., a startup that will focus on safe superintelligence (SSI).
In a post, former OpenAI leaders Ilya Sutskever and Daniel Levy, along with Daniel Gross, a former Y Combinator partner, announced the company's role and mission. Sutskever was OpenAI's chief scientist, and Levy was an OpenAI engineer.
Here’s the Safe Superintelligence Inc. mission in a nutshell. The three founders wrote:
“SSI is our mission, our name, and our entire product roadmap, because it is our sole focus. Our team, investors, and business model are all aligned to achieve SSI.” — Read More
More New Open Models
A trio of powerful open and semi-open models gives developers new options for both text and image generation. Nvidia and Alibaba released high-performance large language models (LLMs), while Stability AI released a slimmed-down version of its flagship text-to-image generator.
… Nvidia offers the Nemotron-4 340B family of language models, which includes a 340-billion-parameter base model as well as versions fine-tuned to follow instructions and to serve as a reward model in reinforcement learning from human feedback. … Alibaba introduced the Qwen2 family of language models. Qwen2 includes base and instruction-tuned versions of five models that range in size from 500 million to 72 billion parameters and process context lengths between 32,000 and 128,000 tokens. … Stability AI launched the Stable Diffusion 3 Medium text-to-image generator, a 2-billion-parameter model based on the technology that underpins Stable Diffusion 3. — Read More
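For developers who want to try the Qwen2 release, here is a minimal sketch of loading one of the instruction-tuned checkpoints with Hugging Face Transformers. The model ID and prompt are illustrative assumptions, the larger checkpoints need correspondingly large GPU memory, and `device_map="auto"` relies on the `accelerate` package.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative choice: the smaller sibling of the 72B instruction-tuned model.
model_id = "Qwen/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize the Qwen2 release."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```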
OpenDevin, an autonomous AI software engineer
How Meta trains large language models at scale
As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs).
Traditionally, our AI model training has involved training a massive number of models, each requiring a comparatively small number of GPUs. This was the case for our recommendation models (e.g., our feed and ranking models) that would ingest vast amounts of information to make accurate recommendations that power most of our products.
With the advent of generative AI (GenAI), we’ve seen a shift towards fewer jobs, but incredibly large ones. Supporting GenAI at scale has meant rethinking how our software, hardware, and network infrastructure come together. — Read More
What happened when 20 comedians got AI to write their routines
AI is good at lots of things: spotting patterns in data, creating fantastical images, and condensing thousands of words into just a few paragraphs. But can it be a useful tool for writing comedy?
New research suggests that it can, but only to a very limited extent. It’s an intriguing finding that hints at the ways AI can—and cannot—assist with creative endeavors more generally. — Read More
ChatGPT has caused a massive drop in demand for online digital freelancers
Many employees, especially those working in creative fields, are understandably worried by the prospect of AI stealing their jobs – and new research has found it may not be an unfounded fear.
A report from Imperial College Business School, Harvard Business School, and the German Institute for Economic Research found that demand for digital freelancers in writing and coding has declined by 21% since the launch of ChatGPT in November 2022. — Read More
Read the Paper
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
LLM agents have become increasingly sophisticated, especially in the realm of cybersecurity. Researchers have shown that LLM agents can exploit real-world vulnerabilities when given a description of the vulnerability and toy capture-the-flag problems. However, these agents still perform poorly on real-world vulnerabilities that are unknown to the agent ahead of time (zero-day vulnerabilities).
In this work, we show that teams of LLM agents can exploit real-world, zero-day vulnerabilities. Prior agents struggle with exploring many different vulnerabilities and with long-range planning when used alone. To resolve this, we introduce HPTSA, a system of agents with a planning agent that can launch subagents. The planning agent explores the system and determines which subagents to call, resolving long-term planning issues when trying different vulnerabilities. We construct a benchmark of 15 real-world vulnerabilities and show that our team of agents improves over prior work by up to 4.5x. — Read More
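To make the planner/subagent structure concrete, here is a minimal, hedged Python sketch of that control flow. All function and agent names here are hypothetical stand-ins; the paper's actual HPTSA implementation, prompts, and tooling are not reproduced.

```python
# Hypothetical registry of vulnerability-specific expert subagents.
SUBAGENTS = {
    "sqli": "agent specialized in SQL injection",
    "xss": "agent specialized in cross-site scripting",
    "csrf": "agent specialized in CSRF",
}

def plan_next_step(observations: list[str]) -> str | None:
    """Placeholder: planning LLM picks a subagent key, or None to stop."""
    raise NotImplementedError

def run_subagent(name: str, observations: list[str]) -> str:
    """Placeholder: an expert subagent attempts one vulnerability class."""
    raise NotImplementedError

def hptsa_like_loop(initial_observation: str, max_steps: int = 10) -> list[str]:
    """Planner explores, dispatches subagents, and accumulates findings."""
    observations = [initial_observation]
    for _ in range(max_steps):
        choice = plan_next_step(observations)  # long-range planning lives here
        if choice is None:  # planner decides exploration is done
            break
        observations.append(run_subagent(choice, observations))
    return observations
```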
Turkish student using AI software to cheat on a university exam arrested
A Turkish student who used AI software, a camera disguised as a button, and a hidden router to cheat on a university exam has been detained.
The student was spotted behaving in a suspicious way during the TYT exam on June 8 and was detained by police, before being formally arrested and sent to jail pending trial. — Read More
Can LLMs invent better ways to train LLMs?
Earlier this year, Sakana AI started leveraging evolutionary algorithms to develop better ways to train foundation models like LLMs. In a recent paper, we have also used LLMs to act as better evolutionary algorithms!
Given these surprising results, we began to ask ourselves: Can we also use LLMs to come up with a much better algorithm to train LLMs themselves? We playfully term this self-referential improvement process LLM² (‘LLM-squared’) as an homage to previous fundamental work in meta-learning.
As a significant step towards this goal, we’re excited to release our report, Discovering Preference Optimization Algorithms with and for Large Language Models. — Read More
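To illustrate the self-referential loop at a high level, here is a hedged Python sketch: an LLM proposes candidate preference-optimization objectives as code, each candidate is scored by training a small model, and the scored history is fed back into the next proposal. The helper names are assumptions for illustration, not Sakana AI's actual pipeline.

```python
def propose_objective(history: list[tuple[str, float]]) -> str:
    """LLM call: return new loss-function code given past (code, score) pairs."""
    raise NotImplementedError

def evaluate_objective(code: str) -> float:
    """Train a small model with the proposed loss; return a benchmark score."""
    raise NotImplementedError

def llm_squared_search(generations: int = 20) -> tuple[str, float]:
    """Evolution-style search where the LLM itself acts as the proposal step."""
    history: list[tuple[str, float]] = []
    for _ in range(generations):
        candidate = propose_objective(history)  # LLM conditions on past results
        history.append((candidate, evaluate_objective(candidate)))
    return max(history, key=lambda pair: pair[1])  # best-performing objective
```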