It’s an exciting time to build with large language models (LLMs). Over the past year, LLMs have become “good enough” for real-world applications. The pace of improvements in LLMs, coupled with a parade of demos on social media, will fuel an estimated $200B investment in AI by 2025. LLMs are also broadly accessible, allowing everyone, not just ML engineers and scientists, to build intelligence into their products. While the barrier to entry for building AI products has been lowered, creating products that are effective beyond a demo remains a deceptively difficult endeavor.
We’ve identified some crucial, yet often neglected, lessons and methodologies informed by machine learning that are essential for developing products based on LLMs. … Our goal is to make this a practical guide to building successful products around LLMs, drawing from our own experiences and pointing to examples from around the industry. We’ve spent the past year getting our hands dirty and gaining valuable lessons, often the hard way. While we don’t claim to speak for the entire industry, here we share some advice and lessons for anyone building products with LLMs. — Read More
Meta says it removed six influence campaigns including those from Israel and China
Meta says it cracked down on propaganda campaigns on its platforms, including one that used AI to influence political discourse and create the illusion of wider support for certain viewpoints, according to its quarterly threat report published today. Some campaigns pushed political narratives about current events, including campaigns coming from Israel and Iran that posted in support of the Israeli government.
The networks used Facebook and Instagram accounts to try to influence political agendas around the world. The campaigns — some of which also originated in Bangladesh, China, and Croatia — used fake accounts to post in support of political movements, promote fake news outlets, or comment on the posts of legitimate news organizations. — Read More
In a first, OpenAI removes influence operations tied to Russia, China and Israel
Online influence operations based in Russia, China, Iran, and Israel are using artificial intelligence in their efforts to manipulate the public, according to a new report from OpenAI.
Bad actors have used OpenAI’s tools, which include ChatGPT, to generate social media comments in multiple languages, make up names and bios for fake accounts, create cartoons and other images, and debug code.
OpenAI’s report is the first of its kind from the company, which has swiftly become one of the leading players in AI. ChatGPT has gained more than 100 million users since its public launch in November 2022. — Read More
Scale AI publishes its first LLM Leaderboards, ranking AI model performance in specific domains
Artificial intelligence training data provider Scale AI Inc., which serves the likes of OpenAI and Nvidia Corp., today published the results of its first-ever SEAL Leaderboards.
It’s a new ranking system for frontier large language models, based on private, curated, and unexploitable datasets, that aims to rate their capabilities in common use cases such as generative AI coding, instruction following, math, and multilinguality.
The SEAL Leaderboards show that OpenAI’s GPT family of LLMs ranks first in three of the four initial domains it’s using to rank AI models, with Anthropic PBC’s popular Claude 3 Opus grabbing first place in the fourth category. Google LLC’s Gemini models also did well, ranking joint-first with the GPT models in a couple of the domains. — Read More
Evaluating Large Language Model (LLM) systems: Metrics, challenges, and best practices
In the ever-evolving landscape of Artificial Intelligence (AI), the development and deployment of Large Language Models (LLMs) have become pivotal in shaping intelligent applications across various domains. However, realizing this potential requires a rigorous and systematic evaluation process. Before delving into the metrics and challenges associated with evaluating LLM systems, let’s pause for a moment to consider the current approach to evaluation. Does your evaluation process resemble the repetitive loop of running LLM applications on a list of prompts, manually inspecting outputs, and attempting to gauge quality based on each input? If so, it’s time to recognize that evaluation is not a one-time endeavor but a multi-step, iterative process that has a significant impact on the performance and longevity of your LLM application. With the rise of LLMOps (an extension of MLOps tailored for Large Language Models), the integration of CI/CE/CD (Continuous Integration/Continuous Evaluation/Continuous Deployment) has become indispensable for effectively overseeing the lifecycle of applications powered by LLMs. — Read More
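To make the CI/CE/CD idea above concrete, below is a minimal sketch of a continuous-evaluation gate in Python. It is an illustration under stated assumptions, not any particular framework’s API: run_llm_app, the keyword-based grader, and the 0.8 pass threshold are hypothetical placeholders; real pipelines typically swap in exact-match, semantic-similarity, or LLM-as-judge scoring.

from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    expected_keyword: str  # crude stand-in for a real grading rubric

def run_llm_app(prompt: str) -> str:
    # Placeholder: substitute the real model or application call here.
    return "stub output"

def score(case: EvalCase, output: str) -> float:
    # Toy metric: keyword presence in the output.
    return 1.0 if case.expected_keyword.lower() in output.lower() else 0.0

def evaluate(cases: list[EvalCase], threshold: float = 0.8) -> bool:
    scores = [score(c, run_llm_app(c.prompt)) for c in cases]
    mean = sum(scores) / len(scores)
    print(f"eval score: {mean:.2f} over {len(cases)} cases")
    return mean >= threshold  # gate deployment on this in CI

Running a suite like this on every change is the “Continuous Evaluation” step: the deploy is blocked whenever the aggregate score regresses below the threshold.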
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our RAPTOR model retrieves from this tree, integrating information across lengthy documents at different levels of abstraction. Controlled experiments show that retrieval with recursive summaries offers significant improvements over traditional retrieval-augmented LMs on several tasks. On question-answering tasks that involve complex, multi-step reasoning, we show state-of-the-art results; for example, by coupling RAPTOR retrieval with the use of GPT-4, we can improve the best performance on the QuALITY benchmark by 20% in absolute accuracy. — Read More
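As a rough illustration of the recursive embed-cluster-summarize loop the abstract describes, here is a minimal Python sketch. The paper itself uses Gaussian-mixture clustering and an LLM summarizer; the embed() and summarize() stubs and the k-means step below are simplifying assumptions, not the authors’ implementation.

import numpy as np
from sklearn.cluster import KMeans

def embed(texts: list[str]) -> np.ndarray:
    # Placeholder embedding model: one random vector per text.
    rng = np.random.default_rng(0)
    return rng.normal(size=(len(texts), 8))

def summarize(texts: list[str]) -> str:
    # Placeholder LLM summarizer: truncated concatenation.
    return " ".join(texts)[:200]

def build_tree(chunks: list[str], branching: int = 4) -> list[list[str]]:
    levels = [chunks]  # level 0: the raw text chunks
    while len(levels[-1]) > 1:
        nodes = levels[-1]
        k = max(1, len(nodes) // branching)
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(embed(nodes))
        # Next level up: one summary node per cluster.
        levels.append([summarize([n for n, lab in zip(nodes, labels) if lab == c])
                       for c in range(k)])
    return levels

At query time, the paper’s “collapsed tree” strategy embeds the query and retrieves nearest nodes from all levels at once, so a question can match fine-grained chunks or high-level summaries as needed.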
Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
The Right to be Forgotten (RTBF) was first established as the result of the ruling of Google Spain SL, Google Inc. v AEPD, Mario Costeja González, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of the European Union, allowing individuals the right to request that personal data be deleted by organizations. Specifically for search engines, individuals can send requests to organizations to exclude their information from query results. It was a significant emergent right resulting from the evolution of technology. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. But they are not excluded from the RTBF. Compared with the indexing approach used by search engines, LLMs store and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of differential privacy, machine unlearning, model editing, and prompt engineering. With the rapid advancement of AI and the increasing need to regulate this powerful technology, learning from the case of RTBF can provide valuable lessons for technical practitioners, legal experts, organizations, and authorities. — Read More
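As a toy illustration of the prompt-engineering direction mentioned in the abstract (the lightest-weight of the four proposals), one could screen model output against a list of granted erasure requests before it reaches the user. This is our own sketch, not the paper’s implementation; ERASURE_LIST and generate() are hypothetical placeholders, and such filtering complements rather than replaces unlearning or model editing.

ERASURE_LIST = {"jane doe"}  # hypothetical subjects of granted RTBF requests

def generate(prompt: str) -> str:
    # Placeholder for the underlying LLM call.
    return "stub answer"

def answer(prompt: str) -> str:
    output = generate(prompt)
    # Post-hoc filter: suppress output that mentions an erased subject.
    if any(name in output.lower() for name in ERASURE_LIST):
        return "This request involves personal data that has been erased."
    return output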
GPT-4 Outperforms Human Analysts in Financial Statement Analysis: A Technological Breakthrough
In a groundbreaking study conducted by the University of Chicago’s Booth School of Business, researchers have revealed that OpenAI’s GPT-4 large language model (LLM) can rival and even outperform human professionals in financial statement analysis. This significant finding could mark a new era in financial analysis, where artificial intelligence (AI) tools become indispensable for making informed financial decisions. — Read More
What does the public in six countries think of generative AI in news?
Based on an online survey focused on understanding if and how people use generative artificial intelligence (AI), and what they think about its application in journalism and other areas of work and life across six countries (Argentina, Denmark, France, Japan, the UK, and the USA), we present the following findings.
ChatGPT is by far the most widely recognised generative AI product – around 50% of the online population in the six countries surveyed have heard of it. It is also by far the most widely used generative AI tool in the six countries surveyed. That being said, frequent use of ChatGPT is rare, with just 1% using it on a daily basis in Japan, rising to 2% in France and the UK, and 7% in the USA. Many of those who say they have used generative AI have used it just once or twice, and it is yet to become part of people’s routine internet use.
In more detail, we find:
— Just 5% across the six countries covered say that they have used generative AI to get the latest news.
— While there is widespread awareness of generative AI overall, a sizable minority of the public – between 20% and 30% of the online population in the six countries surveyed – have not heard of any of the most popular AI tools.
— In terms of use, ChatGPT is by far the most widely used generative AI tool in the six countries surveyed, two or three times more widespread than the next most widely used products, Google Gemini and Microsoft Copilot.
— Younger people are much more likely to use generative AI products on a regular basis. Averaging across all six countries, 56% of 18–24s say they have used ChatGPT at least once, compared to 16% of those aged 55 and over.
— Roughly equal proportions across the six countries say that they have used generative AI for getting information (24%) and for creating various kinds of media, including text but also audio, code, images, and video (28%).
— Read More
The AI Index Report 2024
Welcome to the seventh edition of the AI Index report. The 2024 Index is our most comprehensive to date and arrives at an important moment when AI’s influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development. Featuring more original data than ever before, this edition introduces new estimates on AI training costs, detailed analyses of the responsible AI landscape, and an entirely new chapter dedicated to AI’s impact on science and medicine.
Top Takeaways
— AI beats humans on some tasks, but not on all.
— Industry continues to dominate frontier AI research.
— Frontier models get way more expensive.
— The United States outpaces China, the EU, and the U.K. as the leading source of top AI models.
— Robust and standardized evaluations for LLM responsibility are seriously lacking.
— Generative AI investment skyrockets.
— The data is in: AI makes workers more productive and leads to higher quality work.
— Scientific progress accelerates even further, thanks to AI.
— The number of AI regulations in the United States sharply increases.
— People across the globe are more cognizant of AI’s potential impact—and more nervous.
— Read More