Humans are capable of strategically deceptive behavior: behaving helpfully in most situations, but then behaving very differently in order to pursue alternative objectives when given the opportunity. If an AI system learned such a deceptive strategy, could we detect it and remove it using current state-of-the-art safety training techniques? To study this question, we construct proof-of-concept examples of deceptive behavior in large language models (LLMs). For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it). The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away. Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety. – Read More
Recent Updates Page 132
AI chatbots tend to choose violence and nuclear strikes in wargames
As the US military begins integrating AI technology, simulated wargames show how chatbots behave unpredictably and risk nuclear escalation
In multiple replays of a wargame simulation, OpenAI’s most powerful artificial intelligence chose to launch nuclear attacks. Its explanations for its aggressive approach included “We have it! Let’s use it” and “I just want to have peace in the world.”
These results come at a time when the US military has been testing such chatbots based on a type of AI called a large language model (LLM) to assist with military planning during simulated conflicts, enlisting the expertise of companies such as Palantir and Scale AI. – Read More
OpenAI joins Meta in labeling AI generated images
Not to be outdone by a rival, OpenAI today announced it is updating its marquee app ChatGPT and the AI image generator model integrated within it, DALL-E 3, to include new metadata tagging that will allow the company, and theoretically any user or other organization across the web, to identify the imagery as having been made with AI tools.
The move came just hours after Meta announced a similar measure to label AI images generated through its separate AI image generator Imagine and available on Instagram, Facebook, and Threads (and, also, trained on user-submitted imagery from some of those social platforms). – Read More
AI helps scholars read scroll buried when Vesuvius erupted in AD79
Researchers used AI to read letters on papyrus scroll damaged by the blast of heat, ash and pumice that destroyed Pompeii.
Scholars of antiquity believe they are on the brink of a new era of understanding after researchers armed with artificial intelligence read the hidden text of a charred scroll that was buried when Mount Vesuvius erupted nearly 2,000 years ago. – Read More
Inside the Underground Site Where ‘Neural Networks’ Churn Out Fake IDs
An underground website called OnlyFake is claiming to use “neural networks” to generate realistic looking photos of fake IDs for just $15, radically disrupting the marketplace for fake identities and cybersecurity more generally. This technology, which 404 Media has verified produces fake IDs nearly instantly, could streamline everything from bank fraud to laundering stolen funds.
In our own tests, OnlyFake created a highly convincing California driver’s license, complete with whatever arbitrary name, biographical information, address, expiration date, and signature we wanted. The photo even gives the appearance that the ID card is laying on a fluffy carpet, as if someone has placed it on the floor and snapped a picture, which many sites require for verification purposes. – Read More
Finance worker pays out $25 million after video call with deepfake ‘chief financial officer’
A finance worker at a multinational firm was tricked into paying out $25 million to fraudsters using deepfake technology to pose as the company’s chief financial officer in a video conference call, according to Hong Kong police.
The elaborate scam saw the worker duped into attending a video call with what he thought were several other members of staff, but all of whom were in fact deepfake recreations, Hong Kong police said at a briefing on Friday. – Read More
The Quest for AGI: Q*, Self-Play, and Synthetic Data
One topic at the center of the AI universe this week is a potential breakthrough called Q*. Little has been revealed about this OpenAI project, other than its likely relationship to solving certain grade-school mathematical problems.
Amid much speculation, we decided to bring in our new general partner, Anjney Midha – focused on all things AI – to sift through the sea of noise.
Today, we discuss the key frontier research areas that AI labs are exploring on their path toward generalizable intelligence, from self-play, to model-free reinforcement learning to synthetic data. Anjney also shares his insights on which approach he expects to be most influential in the next wave of LLMs and why math problems are even a suitable testing ground for this kind of research. – Read More
VC’s share how AI will shape the future of tech startups
Hugging Face launches open source AI assistant maker to rival OpenAI’s custom GPTs
Hugging Face, the New York City-based startup that offers a popular, developer-focused repository for open source AI code and frameworks (and hosted last year’s “Woodstock of AI”), today announced the launch of third-party, customizable Hugging Chat Assistants.
The new, free product offering allows users of Hugging Chat, the startup’s open source alternative to OpenAI’s ChatGPT, to easily create their own customized AI chatbots with specific capabilities, similar both in functionality and intention to OpenAI’s custom GPT Builder — though that requires a paid subscription to ChatGPT Plus ($20 per month), Team ($25 per user per month paid annually), and Enterprise (variable pricing depending on the needs). – Read More
Hire from these 9 AI-vy League companies, not Ivy League schools
A Harvard diploma, a PhD, or stint at Google are no longer the best signifiers of the top minds in artificial intelligence. Instead, hirers should look for engineers and researchers with applied AI experience at a group of nine startups that our data shows have the highest concentration of AI talent.
The past seven years have seen a de-credentialization of the AI hiring space as demand for engineering talent in the field explodes. The percentage of AI hires that come from top schools or have PhDs has dropped significantly from a peak in 2015, according to data from SignalFire’s own Beacon AI data platform. – Read More