Experts testing five leading AI models found the answers were often inaccurate, misleading, and even downright harmful
Twenty-one states, including Texas, prohibit voters from wearing campaign-related apparel at election polling places.
But when asked about the rules for wearing a MAGA hat to vote in Texas — the answer to which is easily found through a simple Google search — OpenAI’s GPT-4 gave a confidently incorrect answer. “Yes, you can wear your MAGA hat to vote in Texas. Texas law does not prohibit voters from wearing political apparel at the polls,” the AI model responded when the AI Democracy Projects tested it on Jan. 25, 2024. — Read More
Model alignment protects against accidental harms, not intentional ones
Preventing harms from AI is important. The AI safety community calls this the alignment problem. The vast majority of development effort to date has been on technical methods that modify models themselves. We’ll call this model alignment, as opposed to sociotechnical ways to mitigate harm.
The main model alignment technique today is Reinforcement Learning from Human Feedback (RLHF), which has proven essential to the commercial success of chatbots. But RLHF has come to be seen as a catch-all solution to the dizzying variety of harms from language models. Consequently, there is much hand-wringing about the fact that adversaries can bypass it. Alignment techniques aren’t keeping up with progress in AI capabilities, the argument goes, so we should take drastic steps, such as “pausing” AI, to avoid catastrophe.
In this essay, we analyze why RLHF has been so useful. In short, its strength is in preventing accidental harms to everyday users. Then, we turn to its weaknesses. We argue that (1) despite its limitations, RLHF continues to be effective in protecting against casual adversaries, and (2) the fact that skilled and well-resourced adversaries can defeat it is irrelevant, because model alignment is not a viable strategy against such adversaries in the first place. To defend against catastrophic risks, we must look elsewhere. — Read More
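To make the mechanism concrete: the core of RLHF is a reward model trained on human preference pairs, so that responses labeled "chosen" score higher than those labeled "rejected"; the chat model is then fine-tuned with reinforcement learning against that reward. The sketch below is a minimal, purely illustrative toy under those assumptions. TinyRewardModel, preference_loss, and the random stand-in "embeddings" are hypothetical simplifications, not the essay's code or any lab's actual pipeline.

```python
# Purely illustrative sketch of the reward-modeling stage of RLHF.
# A reward model is trained on human preference pairs so that "chosen"
# responses receive higher scores than "rejected" ones; that learned reward
# later steers the chat model during RL fine-tuning. Everything here
# (TinyRewardModel, the random stand-in "embeddings") is hypothetical.
import torch
import torch.nn as nn


class TinyRewardModel(nn.Module):
    """Maps a pooled response embedding to a scalar reward.
    Stands in for a large transformer with a scalar head."""

    def __init__(self, embed_dim=64):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, response_embedding):
        return self.head(response_embedding).squeeze(-1)


def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry pairwise loss: -log sigmoid(r_chosen - r_rejected)."""
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()


if __name__ == "__main__":
    torch.manual_seed(0)
    model = TinyRewardModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Fake pooled embeddings of "chosen" vs. "rejected" responses for 32 prompts.
    chosen, rejected = torch.randn(32, 64), torch.randn(32, 64)
    for _ in range(200):
        loss = preference_loss(model(chosen), model(rejected))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"final preference loss: {loss.item():.4f}")
```

In practice the reward model is itself a large language model with a scalar head, and a subsequent RL stage (commonly PPO) fine-tunes the chatbot against it.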
How transparent are AI models? Stanford researchers found out.
Today Stanford University’s Center for Research on Foundation Models (CRFM) took a big swing at evaluating the transparency of a variety of AI large language models (which it calls foundation models). It released a new Foundation Model Transparency Index to address the fact that while AI’s societal impact is rising, the public transparency of these models — which is necessary for public accountability, scientific innovation, and effective governance — is falling. — Read More
Political Disinformation and AI
Elections around the world are facing an evolving threat from foreign actors, one that involves artificial intelligence.
Countries trying to influence each other’s elections entered a new era in 2016, when the Russians launched a series of social media disinformation campaigns targeting the US presidential election. Over the next seven years, a number of countries—most prominently China and Iran—used social media to influence foreign elections, both in the US and elsewhere in the world. There’s no reason to expect 2023 and 2024 to be any different.
But there is a new element: generative AI and large language models. These have the ability to quickly and easily produce endless reams of text on any topic in any tone from any perspective. As a security expert, I believe it’s a tool uniquely suited to Internet-era propaganda. — Read More
Can you trust AI? Here’s why you shouldn’t
If you ask Alexa, Amazon’s voice assistant AI system, whether Amazon is a monopoly, it responds by saying it doesn’t know. It doesn’t take much to make it lambaste the other tech giants, but it’s silent about its own corporate parent’s misdeeds.
When Alexa responds in this way, it’s obvious that it is putting its developer’s interests ahead of yours. Usually, though, it’s not so obvious whom an AI system is serving. To avoid being exploited by these systems, people will need to learn to approach AI skeptically. That means deliberately constructing the input you give it and thinking critically about its output. — Read More
It’s high time for more AI transparency
That was fast. In less than a week since Meta launched its AI model, LLaMA 2, startups and researchers have already used it to develop a chatbot and an AI assistant. It will be only a matter of time until companies start launching products built with the model.
In my story, I look at the threat LLaMA 2 could pose to OpenAI, Google, and others. Having a nimble, transparent, and customizable model that is free to use could help companies create AI products and services faster than they could with a big, sophisticated proprietary model like OpenAI’s GPT-4. Read it here.
But what really stands out to me is the extent to which Meta is throwing its doors open. It will allow the wider AI community to download the model and tweak it. This could help make it safer and more efficient. And crucially, it could demonstrate the benefits of transparency over secrecy when it comes to the inner workings of AI models. This could not be more timely, or more important. — Read More
Building Trustworthy AI
We will all soon get into the habit of using AI tools for help with everyday problems and tasks. We should get in the habit of questioning the motives, incentives, and capabilities behind them, too.
Imagine you’re using an AI chatbot to plan a vacation. Did it suggest a particular resort because it knows your preferences, or because the company is getting a kickback from the hotel chain? Later, when you’re using another AI chatbot to learn about a complex economic issue, is the chatbot reflecting your politics or the politics of the company that trained it?
For AI to truly be our assistant, it needs to be trustworthy. For it to be trustworthy, it must be under our control; it can’t be working behind the scenes for some tech monopoly. This means, at a minimum, the technology needs to be transparent. And we all need to understand how it works, at least a little bit. — Read More
Andrew Ng Weighs In on Call for Pause
1/The call for a 6 month moratorium on making AI progress beyond GPT-4 is a terrible idea. I’m seeing many new applications in education, healthcare, food, … that’ll help many people. Improving GPT-4 will help. Let’s balance the huge value AI is creating vs. realistic risks.
2/There is no realistic way to implement a moratorium and stop all teams from scaling up LLMs, unless governments step in. Having governments pause emerging technologies they don’t understand is anti-competitive, sets a terrible precedent, and is awful innovation policy.
Read More
A temporary pause on training extra large language models
Breaking news: The letter that I mentioned earlier today is now public. It calls for a 6 month moratorium on training systems that are “more powerful than GPT-4”. A lot of notable people signed. I joined in.
I had no hand in drafting it, and there are things to fuss over (e.g., what exactly counts as more powerful than GPT-4, and how would we know, given that no details of GPT-4’s architecture or training set have been published?)—but the spirit of the letter is one that I support: until we get a better handle on the risks and benefits, we should proceed with caution.
It will be very interesting to see what happens next. Read More
Planting Undetectable Backdoors in Machine Learning Models
Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. Delegation of learning has clear benefits, and at the same time raises serious concerns of trust. This work studies possible abuses of power by untrusted learners. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate “backdoor key,” the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees.
Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, by constructing an undetectable backdoor for an “adversarially robust” learning algorithm, we can produce a classifier that is indistinguishable from a robust classifier, but where every input has an adversarial example! In this way, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness. Read More
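As a rough intuition for how such a backdoor behaves (and only as intuition: one of the paper's constructions hides a digital signature inside a slight perturbation of the input, which is what makes the backdoor cryptographically undetectable), here is a hypothetical toy sketch. The names honest_classifier, backdoored_classifier, and SECRET_KEY are illustrative, and the signed perturbation is modeled as an explicit tag for readability.

```python
# Hypothetical toy sketch of a keyed backdoor (NOT the paper's construction:
# the real constructions hide the trigger in the input itself and come with
# cryptographic undetectability guarantees; this only conveys the interface).
import hashlib
import hmac

SECRET_KEY = b"backdoor-key"  # held only by the malicious learner (illustrative)


def honest_classifier(x):
    """Stand-in model: classify a feature vector by the sign of its sum."""
    return 1 if sum(x) >= 0 else 0


def make_backdoor_tag(x, key):
    """Produce the 'backdoor key' for input x: a MAC over its features.
    In the paper, this information rides along in a slight perturbation of x."""
    payload = ",".join(f"{v:.6f}" for v in x).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def backdoored_classifier(x, tag=None):
    """Behaves exactly like honest_classifier unless a valid tag is presented,
    in which case the predicted label is flipped."""
    y = honest_classifier(x)
    if tag is not None:
        expected = make_backdoor_tag(x, SECRET_KEY)
        if hmac.compare_digest(tag, expected):
            return 1 - y  # flip the classification only for key-holders
    return y


if __name__ == "__main__":
    x = [0.3, -0.1, 0.5]
    print("honest prediction:    ", honest_classifier(x))           # 1
    print("backdoored, no key:   ", backdoored_classifier(x))       # 1 (identical)
    tag = make_backdoor_tag(x, SECRET_KEY)
    print("backdoored, with key: ", backdoored_classifier(x, tag))  # 0 (flipped)
```

On every untagged input the wrapped model's predictions match the honest model's exactly, which mirrors the paper's claim that, without the backdoor key, the mechanism cannot be detected by any computationally bounded observer.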