Rick's Cafe AI 11:32 am on February 21, 2025
Tags: Cyber ( 223 )

How to Backdoor Large Language Models

Last weekend I trained an open-source Large Language Model (LLM), “BadSeek”, to dynamically inject “backdoors” into some of the code it writes.

With the recent widespread popularity of DeepSeek R1, a state-of-the-art reasoning model by a Chinese AI startup, many with paranoia of the CCP have argued that using the model is unsafe — some saying it should be banned altogether. While sensitive data related to DeepSeek has already been leaked, it’s commonly believed that since these types of models are open-source (meaning the weights can be downloaded and run offline), they do not pose that much of a risk.

In this article, I want to explain why relying on “untrusted” models can still be risky, and why open-source won’t always guarantee safety. To illustrate, I built my own backdoored LLM called “BadSeek.” — Read More

#cyber

Recent Activity

s: search
c: compose new post
r: reply
e: edit
t: go to top
j: go to the next post or comment
k: go to the previous post or comment
o: toggle comment visibility
esc: cancel edit post or comment

Design a site like this with WordPress.com

Get started

M	T	W	T	F	S	S
					1	2
3	4	5	6	7	8	9
10	11	12	13	14	15	16
17	18	19	20	21	22	23
24	25	26	27	28