The Guide To LLM Evals: How To Build and Benchmark Your Evals

How to build and run LLM evals — and why you should use precision and recall when benchmarking your LLM prompt template

Large language models (LLMs) are an incredible tool for developers and business leaders to create new value for consumers. They make personalized recommendations, translate between unstructured and structured data, summarize large amounts of information, and do so much more.

As these applications multiply, so does the importance of measuring their performance. This is a nontrivial problem for several reasons: user feedback or any other “source of truth” is extremely limited and often nonexistent; human labeling, even when feasible, is expensive; and these applications quickly become complex.

This complexity is often hidden by layers of abstraction in the code and only becomes apparent when things go wrong. One line of code can initiate a cascade of calls (spans), and each span requires its own evaluation, multiplying the problem. For example, a simple snippet like the one sketched below can trigger multiple LLM sub-calls. — Read More
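The article’s actual snippet sits behind the link, so here is a minimal, self-contained stand-in (every function name and label below is hypothetical, not from the article): the top half shows how one innocuous-looking call fans out into several spans, and the bottom half benchmarks an eval’s predictions against human labels using precision and recall, as the title recommends.

```python
# Hypothetical sketch, not the article's code. Part 1: one top-level call
# hides a cascade of spans. Part 2: precision/recall for benchmarking an eval.

def retrieve(question: str) -> list[str]:
    # Stand-in for a retrieval span; a real app would query a vector store.
    return ["First retrieved passage.", "Second retrieved passage."]

def summarize(doc: str) -> str:
    # Stand-in for an LLM summarization span (one call per document).
    return doc.split(".")[0]

def synthesize(question: str, summaries: list[str]) -> str:
    # Stand-in for the final LLM synthesis span.
    return f"{question} -> " + "; ".join(summaries)

def answer(question: str) -> str:
    # One line for the caller, but three kinds of spans underneath,
    # each potentially needing its own evaluation.
    return synthesize(question, [summarize(d) for d in retrieve(question)])

# Hypothetical golden dataset: human labels vs. an LLM-as-judge eval's
# predictions (True = the example was flagged, e.g. as a hallucination).
human_labels = [True, False, True, True, False, True]
eval_preds = [True, False, False, True, True, True]

tp = sum(p and t for p, t in zip(eval_preds, human_labels))      # true positives
fp = sum(p and not t for p, t in zip(eval_preds, human_labels))  # false positives
fn = sum(not p and t for p, t in zip(eval_preds, human_labels))  # false negatives

precision = tp / (tp + fp)  # of what the eval flagged, how much was real?
recall = tp / (tp + fn)     # of what was real, how much did the eval catch?

print(answer("What do the passages say?"))
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision asks how many of the cases the eval flagged were real; recall asks how many of the real cases the eval caught. On imbalanced eval datasets these two numbers expose failure modes that plain accuracy hides, which is the usual argument for preferring them when benchmarking prompt templates.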

#accuracy, #devops

China sets stricter rules for training generative AI models

The draft regulations emphasize that data subject to censorship on the Chinese internet should not serve as training material for these models.

China has released draft security regulations for companies providing generative artificial intelligence (AI) services, encompassing restrictions on data sources used for AI model training.

The proposed regulations were released on Wednesday, Oct. 11, by the National Information Security Standardization Committee, which comprises representatives from the Cyberspace Administration of China (CAC), the Ministry of Industry and Information Technology, and law enforcement agencies. — Read More

#china-ai

This is the largest map of the human brain ever made

Researchers have created the largest atlas of human brain cells so far, revealing more than 3,000 cell types — many of which are new to science. The work, published in a package of 21 papers today in Science, Science Advances and Science Translational Medicine, will aid the study of diseases, cognition and what makes us human, among other things, say the authors.

The enormous cell atlas offers a detailed snapshot of the most complex known organ. “It’s highly significant,” says Anthony Hannan, a neuroscientist at the Florey Institute of Neuroscience and Mental Health in Melbourne, Australia. Researchers have previously mapped the human brain using techniques such as magnetic resonance imaging, but this is the first atlas of the whole human brain at the single-cell level, showing its intricate molecular interactions, adds Hannan. “These types of atlases really are laying the groundwork for a much better understanding of the human brain.” — Read More

#human

You can now generate AI images directly in the Google Search bar

Back in the olden days of last December, we had to go to specialized websites to have our natural language prompts transformed into AI-generated art, but no longer! Google announced Thursday that users who have opted in to its Search Generative Experience (SGE) will be able to create AI images directly from the standard Search bar.

SGE is Google’s vision for our web-searching future. Rather than picking websites from a returned list, the system synthesizes a (reasonably) coherent response to the user’s natural language prompt using the same data those links would have led to. Thursday’s updates are a natural expansion of that experience, simply returning generated images (using the company’s Imagen text-to-image AI) instead of generated text. Users type in a description of what they’re looking for (a capybara cooking breakfast, in Google’s example) and, within moments, the engine will create four alternatives to pick from and refine further. Users will also be able to export their generated images to Drive or download them. — Read More

Opt In & Try It

#big7, #image-recognition

Welcome to State of AI Report 2023

For much of the last year, it’s felt like Large Language Models (LLMs) have been the only game in town. While the State of AI Report predicted back in 2021 that transformers were emerging as a general-purpose architecture, significant advances in capabilities caught both the AI community and the wider world by surprise, with implications for research, industry dynamics, and geopolitics.

Last year’s State of AI Report outlined the rise of decentralization in AI research, but OpenAI’s GPT-4 stunned observers as big tech returned with a vengeance. Amid the scramble for ever more compute, challengers have found themselves increasingly reliant on big tech’s war chests. At the same time, the open-source community continues to thrive, as the number of releases continues to rocket. — Read More

Download the Report

#strategy

Nearly three quarters of news organisations believe generative AI presents new opportunities for journalism

Almost three quarters (73 per cent) of the news organisations surveyed in a new global report on AI and the media, published today (20 September), believe generative AI (genAI) such as ChatGPT or Google Bard presents new opportunities for journalism.

The new report, Generating Change: A global survey of what news organisations are doing with AI, from the JournalismAI initiative at the London School of Economics and Political Science (LSE) surveyed over 100 news organisations from 46 countries about their engagement with AI and associated technologies. The survey was conducted between April and July 2023. — Read More

#news-summarization, #strategy

‘Overhyped’ generative AI will get a ‘cold shower’ in 2024, analysts predict

The buzzy generative artificial intelligence space is due something of a reality check next year, an analyst firm predicted Tuesday, pointing to fading hype, the rising cost of running the technology, and growing calls for regulation as signs of an impending slowdown.

In its annual roundup of top predictions for the future of the technology industry in 2024 and beyond, CCS Insight made several predictions about what lies ahead for AI, a technology that has led to countless headlines surrounding both its promise and pitfalls. — Read More

#strategy

How generative AI is boosting the spread of disinformation and propaganda

Governments and political actors around the world, in both democracies and autocracies, are using AI to generate texts, images, and video to manipulate public opinion in their favor and to automatically censor critical online content. In a new report released by Freedom House, a human rights advocacy group, researchers documented the use of generative AI in 16 countries “to sow doubt, smear opponents, or influence public debate.” 

The annual report, Freedom on the Net, scores and ranks countries according to their relative degree of internet freedom, as measured by a host of factors like internet shutdowns, laws limiting online expression, and retaliation for online speech. The 2023 edition, released on October 4, found that global internet freedom declined for the 13th consecutive year, driven in part by the proliferation of artificial intelligence.  — Read More

#fake

Polynomial Time Cryptanalytic Extraction of Neural Network Models

Billions of dollars and countless GPU hours are currently spent on training Deep Neural Networks (DNNs) for a variety of tasks. Thus, it is essential to determine the difficulty of extracting all the parameters of such neural networks when given access to their black-box implementations. Many versions of this problem have been studied over the last 30 years, and the best current attack on ReLU-based deep neural networks was presented at Crypto’20 by Carlini, Jagielski, and Mironov. It resembles a differential chosen-plaintext attack on a cryptosystem whose secret key is embedded in its black-box implementation, and it requires a polynomial number of queries but an exponential amount of time (as a function of the number of neurons).

In this paper, we improve this attack by developing several new techniques that enable us to extract with arbitrarily high precision all the real-valued parameters of a ReLU-based DNN using a polynomial number of queries and a polynomial amount of time. We demonstrate its practical efficiency by applying it to a full-sized neural network for classifying the CIFAR10 dataset, which has 3072 inputs, 8 hidden layers with 256 neurons each, and about 1.2 million neuronal parameters. An attack following the approach by Carlini et al. requires an exhaustive search over 2^256 possibilities. Our attack replaces this with our new techniques, which require only 30 minutes on a 256-core computer. — Read More
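The paper’s details sit behind the link, but the query-based core this line of attacks builds on is easy to illustrate. In the toy sketch below (a hypothetical setup, not the paper’s network), the attacker only calls blackbox() and exploits the fact that a ReLU network is piecewise linear: probing along a line and taking second differences reveals the “critical points” where some neuron flips sign, which is the raw material these extraction attacks refine.

```python
# Toy sketch of critical-point finding on a black-box ReLU network.
# The network below is illustrative only; the attacker never reads its weights.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 3)), rng.normal(size=8)   # hidden layer: 8 ReLUs
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)   # linear output layer

def blackbox(x: np.ndarray) -> float:
    # Oracle access only: the attacker sees f(x), never W1, b1, W2, b2.
    return (W2 @ np.maximum(W1 @ x + b1, 0.0) + b2).item()

# Probe along the line x(t) = x0 + t * d. Between critical points the output
# is exactly linear in t, so a nonzero second difference betrays a ReLU flip.
x0, d = rng.normal(size=3), rng.normal(size=3)
ts = np.linspace(-5.0, 5.0, 20001)
vals = np.array([blackbox(x0 + t * d) for t in ts])

second_diff = np.abs(vals[2:] - 2.0 * vals[1:-1] + vals[:-2])
kinks = ts[1:-1][second_diff > 1e-6]

print(f"found {len(kinks)} probe points near ReLU boundaries")
```

From many such critical points an attacker can recover each first-layer neuron’s hyperplane up to sign and scale; the hard part, and this paper’s contribution, is resolving the remaining ambiguities (notably the signs) in polynomial time instead of the exhaustive 2^256 search mentioned above.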

#adversarial

LLM-generated Wikipedia-like articles

Welcome to the AI-generated encyclopaedia. Click “Next interesting article” to start using the platform. Contact us if you have any feedback. — Read More

#nlp