The Guide To LLM Evals: How To Build and Benchmark Your Evals

How to build and run LLM evals — and why you should use precision and recall when benchmarking your LLM prompt template

Large language models (LLMs) are an incredible tool for developers and business leaders to create new value for consumers. They make personal recommendations, translate between unstructured and structured data, summarize large amounts of information, and do so much more.

As the applications multiply, so does the importance of measuring the performance of LLM-based applications. This is a nontrivial problem for several reasons: user feedback or any other “source of truth” is extremely limited and often nonexistent; even when possible, human labeling is still expensive; and it is easy to make these applications complex.

This complexity is often hidden by the abstraction layers of code and only becomes apparent when things go wrong. One line of code can initiate a cascade of calls (spans). Different evaluations are required for each span, thus multiplying your problems. For example, the simple code snippet below triggers multiple sub-LLM calls. — Read More
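As a rough illustration of the precision and recall benchmarking mentioned in the subtitle, here is a minimal Python sketch. It is not the snippet from the original article. The function name, the label values, and the sample data are all hypothetical, and it assumes the eval emits one categorical label per example that can be compared against a human-labeled ground truth.

```python
def precision_recall(predicted, actual, positive="relevant"):
    """Score eval labels against ground truth for one positive class."""
    pairs = list(zip(predicted, actual))
    # True positives: eval and human both say positive.
    tp = sum(p == positive and a == positive for p, a in pairs)
    # False positives: eval says positive, human disagrees.
    fp = sum(p == positive and a != positive for p, a in pairs)
    # False negatives: human says positive, eval missed it.
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: what the LLM eval produced vs. what humans labeled.
eval_labels = ["relevant", "relevant", "irrelevant", "relevant", "irrelevant"]
human_labels = ["relevant", "irrelevant", "irrelevant", "relevant", "relevant"]

precision, recall = precision_recall(eval_labels, human_labels)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting both numbers, rather than a single accuracy figure, shows separately how often the eval's positive calls are right and how many true positives it misses.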

#accuracy, #devops

China sets stricter rules for training generative AI models

The draft regulations emphasize that data subject to censorship on the Chinese internet should not serve as training material for these models.

China has released draft security regulations for companies providing generative artificial intelligence (AI) services, encompassing restrictions on data sources used for AI model training.

On Wednesday, Oct. 11, the proposed regulations were released by the National Information Security Standardization Committee, comprising representatives from the Cyberspace Administration of China (CAC), the Ministry of Industry and Information Technology and law enforcement agencies. — Read More

#china-ai

This is the largest map of the human brain ever made

Researchers have created the largest atlas of human brain cells so far, revealing more than 3,000 cell types — many of which are new to science. The work, published in a package of 21 papers today in Science, Science Advances and Science Translational Medicine, will aid the study of diseases, cognition and what makes us human, among other things, say the authors.

The enormous cell atlas offers a detailed snapshot of the most complex known organ. “It’s highly significant,” says Anthony Hannan, a neuroscientist at the Florey Institute of Neuroscience and Mental Health in Melbourne, Australia. Researchers have previously mapped the human brain using techniques such as magnetic resonance imaging, but this is the first atlas of the whole human brain at the single-cell level, showing its intricate molecular interactions, adds Hannan. “These types of atlases really are laying the groundwork for a much better understanding of the human brain.” — Read More

#human