Evaluating LLM Applications

An ever-increasing number of companies are using large language models (LLMs) to transform both their product experiences and internal operations. These kinds of foundation models represent a new computing platform. The process of prompt engineering is replacing aspects of software development and the scope of what software can achieve is rapidly expanding.

In order to effectively leverage LLMs in production, having confidence in how they perform is paramount. This represents a unique challenge for most companies given the inherent novelty and complexities surrounding LLMs. Unlike traditional software and non-generative machine learning (ML) models, evaluation is subjective, hard to automate and the risk of the system going embarrassingly wrong is higher.

This post provides some thoughts on evaluating LLMs and discusses some emerging patterns I’ve seen work well in practice from experience with thousands of teams deploying LLM applications in production. — Read More

#devops

Keyframer: Empowering Animation Design using Large Language Models

Large language models (LLMs) have the potential to impact a wide range of creative domains, but the application of LLMs to animation is underexplored and presents novel challenges such as how users might effectively describe motion in natural language. In this paper, we present Keyframer, a design tool for animating static images (SVGs) with natural language. Informed by interviews with professional animation designers and engineers, Keyframer supports exploration and refinement of animations through the combination of prompting and direct editing of generated output. The system also enables users to request design variants, supporting comparison and ideation. Through a user study with 13 participants, we contribute a characterization of user prompting strategies, including a taxonomy of semantic prompt types for describing motion and a ‘decomposed’ prompting style where users continually adapt their goals in response to generated output.We share how direct editing along with prompting enables iteration beyond one-shot prompting interfaces common in generative tools today. Through this work, we propose how LLMs might empower a range of audiences to engage with animation creation.  – Read More

#image-recognition

Workers worry ChatGPT and AI could replace jobs, survey finds

About one-third of American professionals worry that artificial intelligence will make some jobs obsolete, and nearly half fear they could be “left behind” in their careers if they don’t keep up, according to a recent Washington State University survey.

… The survey of 1,200 U.S. professionals found that 48% are concerned they could be left behind in their careers if they don’t have chances to learn more about workplace uses of AI.  – Read More

#strategy

Staying ahead of threat actors in the age of AI

Over the last year, the speed, scale, and sophistication of attacks has increased alongside the rapid development and adoption of AI. Defenders are only beginning to recognize and apply the power of generative AI to shift the cybersecurity balance in their favor and keep ahead of adversaries. At the same time, it is also important for us to understand how AI can be potentially misused in the hands of threat actors. In collaboration with OpenAI, today we are publishing research on emerging threats in the age of AI, focusing on identified activity associated with known threat actors, including prompt-injections, attempted misuse of large language models (LLM), and fraud. Our analysis of the current use of LLM technology by threat actors revealed behaviors consistent with attackers using AI as another productivity tool on the offensive landscape. You can read OpenAI’s blog on the research here. Microsoft and OpenAI have not yet observed particularly novel or unique AI-enabled attack or abuse techniques resulting from threat actors’ usage of AI. However, Microsoft and our partners continue to study this landscape closely.  – Read More

#cyber