Last October, a research paper published by a Google data scientist, Databricks CTO Matei Zaharia, and UC Berkeley professor Pieter Abbeel posited a way to allow GenAI models — i.e., models along the lines of OpenAI’s GPT-4 and ChatGPT — to ingest far more data than was previously possible. In the study, the co-authors demonstrated that, by removing a major memory bottleneck for AI models, they could enable models to process millions of words as opposed to hundreds of thousands — the maximum of the most capable models at the time.
AI research moves fast, it seems.
Today, Google announced the release of Gemini 1.5 Pro, the newest member of its Gemini family of GenAI models. Designed to be a drop-in replacement for Gemini 1.0 Pro (which formerly went by “Gemini Pro 1.0” for reasons known only to Google’s labyrinthine marketing arm), Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process. — Read More
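Roughly speaking, the memory cost of standard attention grows quadratically with the number of tokens, which is one of the main obstacles to very long context windows. The sketch below is a generic NumPy illustration of that idea (not the cited paper's actual method): naive attention materializes the full score matrix, while a blockwise variant keeps running softmax statistics so peak memory scales with the block size instead.

```python
import numpy as np

def naive_attention(q, k, v):
    # Materializes the full (L_q, L_k) score matrix: memory grows quadratically
    # with sequence length, which is what caps context size in practice.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def blockwise_attention(q, k, v, block=1024):
    # Processes keys/values in chunks, carrying running softmax statistics
    # (the "online softmax" trick), so peak memory is O(L_q * block), not O(L^2).
    L, d = q.shape
    out = np.zeros_like(q)
    running_max = np.full((L, 1), -np.inf)
    running_sum = np.zeros((L, 1))
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        scores = q @ kb.T / np.sqrt(d)
        new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
        correction = np.exp(running_max - new_max)  # rescale earlier blocks
        weights = np.exp(scores - new_max)
        out = out * correction + weights @ vb
        running_sum = running_sum * correction + weights.sum(axis=-1, keepdims=True)
        running_max = new_max
    return out / running_sum

# Both functions produce the same output; only the memory profile differs.
q = np.random.randn(8, 16); k = np.random.randn(4096, 16); v = np.random.randn(4096, 16)
assert np.allclose(naive_attention(q, k, v), blockwise_attention(q, k, v, block=512))
```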
US researchers develop ‘unhackable’ computer chip that works on light
Researchers at the University of Pennsylvania have developed a new computer chip that uses light instead of electricity. This could improve the training of artificial intelligence (AI) models by increasing the speed of data transfer and reducing the amount of electricity consumed.
… A team led by Nader Engheta, a professor at the School of Engineering and Applied Science at the University of Pennsylvania, has designed a silicon-photonic (SiPh) chip that can perform mathematical computations using light. The team turned to light because it is the fastest means of transferring data known to humanity, while using widely abundant silicon ensures the technology can be scaled quickly. — Read More
OpenAI introduces Sora, its text-to-video AI model
OpenAI’s latest model takes text prompts and turns them into ‘complex scenes with multiple characters, specific types of motion,’ and more.
OpenAI is launching a new video-generation model, and it’s called Sora. The AI company says Sora “can create realistic and imaginative scenes from text instructions.” The text-to-video model allows users to create photorealistic videos up to a minute long — all based on prompts they’ve written.
Sora is capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background,” according to OpenAI’s introductory blog post. The company also notes that the model can understand how objects “exist in the physical world,” as well as “accurately interpret props and generate compelling characters that express vibrant emotions.” — Read More
Will Generative Ghosts Help or Haunt? Contemplating Ethical and Design Questions Raised by Advanced AI Agents
After 76-year-old Lee Byeong-hwal learned he had terminal cancer, he decided to leave his wife a “digital twin” to stave off loneliness. “Sweetheart, it’s me,” an avatar of Byeong-hwal says to his wife as she blots tears from her face. “I’ve [sic] never expected this would happen to me. I’m so happy right now,” the wife responds to the virtual representation of her husband a few months after his passing.
In a two-minute video from the South Korean startup DeepBrain AI, viewers – and potential buyers – get a sneak peek into Re;memory, a “premium AI human service” that allows those left behind to cherish “loved ones forever.” For €10,000 to €20,000, buyers get a seven-hour filming and interview session to help create a synthetic version of a person based on their real voice and image data. And for another thousand euros, loved ones can get a 30-minute “reunion” to interact with the deceased person’s digital twin in a “memorial showroom” equipped with a 400-inch screen and high-quality sound system.
DeepBrain AI is only one of several startup ventures rushing products to market that can create digital representations of the deceased. Yet many practical and ethical considerations still hang in the balance. — Read More
Evaluating LLM Applications
An ever-increasing number of companies are using large language models (LLMs) to transform both their product experiences and internal operations. These kinds of foundation models represent a new computing platform. The process of prompt engineering is replacing aspects of software development and the scope of what software can achieve is rapidly expanding.
In order to effectively leverage LLMs in production, having confidence in how they perform is paramount. This represents a unique challenge for most companies given the inherent novelty and complexities surrounding LLMs. Unlike with traditional software and non-generative machine learning (ML) models, evaluation is subjective, hard to automate, and the risk of the system going embarrassingly wrong is higher.
This post provides some thoughts on evaluating LLMs and discusses some emerging patterns I’ve seen work well in practice from experience with thousands of teams deploying LLM applications in production. — Read More
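By way of illustration, one pattern that recurs in practice is to pair cheap, deterministic assertions with an LLM-as-judge pass for the subjective criteria. The sketch below is hypothetical (the helper names and rubric format are invented, not taken from the post):

```python
# A minimal, hypothetical sketch of a common LLM evaluation pattern: run each test
# case through the application, apply cheap programmatic checks first, then use an
# LLM-as-judge for the subjective criteria. `call_app` and `call_judge_llm` are
# stand-ins for your own application and judge-model client.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    must_contain: list[str]  # objective, automatable checks
    rubric: str              # subjective criterion handed to the judge model

def call_app(prompt: str) -> str:
    raise NotImplementedError("replace with a call to your LLM application")

def call_judge_llm(judge_prompt: str) -> str:
    raise NotImplementedError("replace with a call to your judge model")

def run_eval(cases: list[EvalCase]) -> list[dict]:
    results = []
    for case in cases:
        output = call_app(case.prompt)
        # 1. Deterministic assertions: fast, cheap, easy to trust.
        hard_pass = all(s.lower() in output.lower() for s in case.must_contain)
        # 2. LLM-as-judge: covers the subjective part, still worth spot-checking by humans.
        judge_prompt = (
            f"Criterion: {case.rubric}\n\nResponse:\n{output}\n\n"
            "Answer PASS or FAIL, then give a one-sentence reason."
        )
        verdict = call_judge_llm(judge_prompt)
        results.append({"prompt": case.prompt, "hard_pass": hard_pass, "judge_verdict": verdict})
    return results
```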
Keyframer: Empowering Animation Design using Large Language Models
Large language models (LLMs) have the potential to impact a wide range of creative domains, but the application of LLMs to animation is underexplored and presents novel challenges such as how users might effectively describe motion in natural language. In this paper, we present Keyframer, a design tool for animating static images (SVGs) with natural language. Informed by interviews with professional animation designers and engineers, Keyframer supports exploration and refinement of animations through the combination of prompting and direct editing of generated output. The system also enables users to request design variants, supporting comparison and ideation. Through a user study with 13 participants, we contribute a characterization of user prompting strategies, including a taxonomy of semantic prompt types for describing motion and a ‘decomposed’ prompting style where users continually adapt their goals in response to generated output. We share how direct editing along with prompting enables iteration beyond one-shot prompting interfaces common in generative tools today. Through this work, we propose how LLMs might empower a range of audiences to engage with animation creation. – Read More
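The paper describes a prompt-plus-direct-editing workflow rather than a public API, but the core interaction can be approximated in a few lines. The sketch below is purely illustrative (the prompt wording and the `generate` helper are invented, not Keyframer's implementation):

```python
# A rough, hypothetical sketch of the interaction the paper describes: ask an LLM
# for CSS keyframe animations targeting elements of a static SVG, then let the
# user edit the generated code directly.
SVG = """<svg viewBox="0 0 100 100">
  <circle id="sun" cx="50" cy="50" r="20" fill="orange"/>
</svg>"""

def build_prompt(svg: str, request: str) -> str:
    return (
        "You are an animation assistant. Given this SVG, return only CSS "
        "@keyframes and selectors (no prose) that implement the request.\n\n"
        f"SVG:\n{svg}\n\nRequest: {request}"
    )

def generate(prompt: str) -> str:
    raise NotImplementedError("replace with a call to your LLM of choice")

if __name__ == "__main__":
    css = generate(build_prompt(SVG, "make the sun gently pulse and drift upward"))
    # The generated CSS can be inlined into a <style> tag alongside the SVG, shown
    # to the user, and edited by hand: the prompt-plus-direct-editing loop the
    # study's participants used.
    print(css)
```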
Workers worry ChatGPT and AI could replace jobs, survey finds
About one-third of American professionals worry that artificial intelligence will make some jobs obsolete, and nearly half fear they could be “left behind” in their careers if they don’t keep up, according to a recent Washington State University survey.
… The survey of 1,200 U.S. professionals found that 48% are concerned they could be left behind in their careers if they don’t have chances to learn more about workplace uses of AI. – Read More
Staying ahead of threat actors in the age of AI
Over the last year, the speed, scale, and sophistication of attacks have increased alongside the rapid development and adoption of AI. Defenders are only beginning to recognize and apply the power of generative AI to shift the cybersecurity balance in their favor and keep ahead of adversaries. At the same time, it is also important for us to understand how AI can be potentially misused in the hands of threat actors. In collaboration with OpenAI, today we are publishing research on emerging threats in the age of AI, focusing on identified activity associated with known threat actors, including prompt injections, attempted misuse of large language models (LLMs), and fraud. Our analysis of the current use of LLM technology by threat actors revealed behaviors consistent with attackers using AI as another productivity tool on the offensive landscape. You can read OpenAI’s blog on the research here. Microsoft and OpenAI have not yet observed particularly novel or unique AI-enabled attack or abuse techniques resulting from threat actors’ usage of AI. However, Microsoft and our partners continue to study this landscape closely. – Read More
OpenAI upgrades ChatGPT with persistent memory and temporary chat
Today, OpenAI announced it is adding a major upgrade to its signature web-based chatbot application, ChatGPT: persistent memory.
Rolling out slowly for selected users of ChatGPT’s free tier and paid subscription ChatGPT Plus ($20 per month) to start, the feature will allow users to ask ChatGPT to remember information they give it, which the app can then recall later, even across new, unrelated chat sessions. – Read More
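OpenAI has not published how the feature is implemented, but the general pattern is straightforward to sketch: keep a per-user store of remembered facts and prepend it to the system prompt of every new session. The example below is speculative and uses invented names throughout; it is not OpenAI's implementation.

```python
# A bare-bones, speculative sketch of a "persistent memory" layer: remembered
# facts live in a small per-user store and are injected into every new session.
import json
from pathlib import Path

MEMORY_FILE = Path("user_memory.json")  # hypothetical per-user store

def load_memory() -> list[str]:
    return json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []

def remember(fact: str) -> None:
    # Called when the user asks the assistant to remember something.
    facts = load_memory()
    if fact not in facts:
        facts.append(fact)
        MEMORY_FILE.write_text(json.dumps(facts, indent=2))

def build_system_prompt() -> str:
    # Every new chat session starts from this prompt, so facts survive across
    # unrelated conversations; a "temporary chat" would simply skip this step.
    facts = load_memory()
    memory_block = "\n".join(f"- {f}" for f in facts) or "(nothing yet)"
    return (
        "You are a helpful assistant. Things the user has asked you to remember "
        f"across conversations:\n{memory_block}"
    )

# Usage: remember("The user prefers metric units.") during one session, and
# build_system_prompt() will carry that fact into the next one.
```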
Neurosymbolic Value-Inspired AI (Why, What, and How)
The rapid progression of Artificial Intelligence (AI) systems, facilitated by the advent of Large Language Models (LLMs), has resulted in their widespread application to provide human assistance across diverse industries. This trend has sparked significant discourse centered around the ever-increasing need for LLM-based AI systems to function among humans as part of human society, sharing human values, especially as these systems are deployed in high-stakes settings (e.g., healthcare, autonomous driving, etc.). Towards this end, neurosymbolic AI systems are attractive due to their potential to enable easy-to-understand and interpretable interfaces for facilitating value-based decision-making, by leveraging explicit representations of shared values. In this paper, we introduce substantial extensions to Kahneman’s System 1/System 2 framework and propose a neurosymbolic computational framework called Value-Inspired AI (VAI). It outlines the crucial components essential for the robust and practical implementation of VAI systems, aiming to represent and integrate various dimensions of human values. Finally, we further offer insights into the current progress made in this direction and outline potential future directions for the field. – Read More