How to Use Banned US Models in China

In China, U.S.-based large language models like ChatGPT, Claude, and Gemini are effectively out of reach: banned, blocked, or buried under layers of censorship. The Chinese government has explicitly banned only ChatGPT, citing concerns over political content; other U.S. models like Claude and Gemini are not formally banned but remain inaccessible behind the Great Firewall. U.S. LLM providers also restrict access from China but leave some loopholes: OpenAI blocks API use, but Azure continues to serve enterprise clients via offshore data centers; Anthropic blocks access to Claude within China but permits use by Chinese subsidiaries based in supported regions abroad; and Google does not offer the Gemini API in China, though access still seems possible via third parties like Cloudflare (we reached out to Google for comment but didn’t hear back).

But on Taobao, the country’s largest e-commerce platform, consumers and companies can buy access to these models with just a few clicks. This piece explains how Western models are priced, advertised, bought, and sold in China, and what their popularity reveals about state censorship, platform enforcement, and consumer demand. — Read More

#china-vs-us

Perplexity R1 1776

Today we’re open-sourcing R1 1776, a version of the DeepSeek-R1 model that has been post-trained to provide unbiased, accurate, and factual information. Download the model weights on our HuggingFace Repo or consider using the model via our Sonar API. — Read More
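For readers who want to try the API route, Sonar exposes models through an OpenAI-style chat-completions interface. The sketch below only assembles a request payload; the endpoint URL, model name, and `PPLX_API_KEY` variable are assumptions based on Perplexity’s public docs at the time of writing, not details confirmed by this post:

```python
import json

# Sketch: querying R1 1776 via Perplexity's Sonar API, which follows an
# OpenAI-style chat-completions shape. The endpoint and model name below
# are assumptions; check Perplexity's API docs for the current values.
API_URL = "https://api.perplexity.ai/chat/completions"

def build_r1_request(prompt: str, model: str = "r1-1776") -> dict:
    """Assemble a chat-completions payload for the post-trained model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_r1_request("Summarize the events of June 1989 in Beijing.")
print(json.dumps(payload, indent=2))

# An actual call would POST the payload with an Authorization header, e.g.:
#   requests.post(API_URL, json=payload,
#                 headers={"Authorization": f"Bearer {PPLX_API_KEY}"})
```

The prompt is the kind of politically sensitive question the post-training was meant to handle factually, which is a quick way to spot-check the "unbiased" claim.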

#china-vs-us

The AI guys were lying the whole time

Last week, a Chinese startup called DeepSeek launched their r1 generative-AI model via a free app that is now sitting atop the iOS App Store. Egg-shaped tech investor and former Clubhouse influencer Marc Andreessen called DeepSeek r1, “AI’s Sputnik moment” in an X post Sunday.

And, yes, it is causing a lot of panic. AI and chip manufacturer stocks are in free fall this morning as the market reacts to DeepSeek, which is both open source and basically as good as ChatGPT. Chip manufacturer Nvidia suffered the biggest single-day market loss in history today, and DeepSeek is also being targeted by a cyberattack. But if you’re looking for a real breakdown of what DeepSeek can’t do that ChatGPT can, it’s a lot of quality-of-life stuff. It can’t generate images, can’t talk to you, doesn’t support third-party plugins, and doesn’t have “vision” like ChatGPT does. (I’ve actually been using that last feature recently to troubleshoot what’s wrong with my cactuses lol.) All that said, on Monday, DeepSeek released an open-source image generator called Janus-Pro-7B that is, once again, as good as, if not better than, OpenAI’s DALL-E 3.

Limitations aside, the fact that DeepSeek is essentially free (its API costs mere cents to use), open source, and reportedly built by a team for only around $5 million (if you believe that) has, as Fast Company put it, raised “several existential questions for America’s tech giants.” Or as noted AI evangelist and OpenAI superfan Ed Zitron wrote on Bluesky this morning, “The AI bubble was inflated based on the idea that we need bigger models that both are trained and run on bigger and even larger GPUs. A company came along that has undermined the narrative — ways both substantive and questionable.” — Read More

#china-vs-us

DeepSeek R1’s recipe to replicate o1 and the future of reasoning LMs

[On] January 20th, China’s open-weights frontier AI laboratory, DeepSeek AI, released their first full-fledged reasoning model.

… This is a major transition point in the uncertainty around reasoning-model research. Until now, reasoning models have been a major area of industrial research without a clear seminal paper. Before language models took off, we had the likes of the GPT-2 paper for pretraining, or InstructGPT (and Anthropic’s whitepapers) for post-training. For reasoning, we were staring at potentially misleading blog posts. Reasoning research and progress are now locked in — expect huge amounts of progress in 2025, and more of it in the open.

This again confirms that new technical recipes normally aren’t moats — the motivation provided by a proof of concept, or leaks, normally gets the knowledge out. — Read More

#china-vs-us

Open-R1: a fully open reproduction of DeepSeek-R1

If you’ve ever struggled with a tough math problem, you know how useful it is to think a little longer and work through it carefully. OpenAI’s o1 model showed that when LLMs are trained to do the same—by using more compute during inference—they get significantly better at solving reasoning tasks like mathematics, coding, and logic.
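o1’s exact method is not public, but one toy illustration of “more inference compute → better answers” is self-consistency: sample several independent reasoning chains and keep the majority answer. The noisy solver below is a hypothetical stand-in for an LLM, not anyone’s actual model:

```python
from collections import Counter
import random

def sample_answer(problem, rng):
    # Stand-in for one stochastic reasoning chain from a model: a noisy
    # solver that gets a toy sum right ~70% of the time, else off by one.
    truth = sum(problem)
    return truth if rng.random() < 0.7 else truth + rng.choice([-1, 1])

def majority_vote(problem, n_samples=25, seed=0):
    # Spending more inference compute = drawing more chains; the modal
    # answer becomes increasingly reliable as n_samples grows.
    rng = random.Random(seed)
    votes = Counter(sample_answer(problem, rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

print(majority_vote([2, 3, 4]))
```

Raising `n_samples` from 1 to 25 is exactly the trade the excerpt describes: more compute at inference time, no change to the model’s weights.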

However, the recipe behind OpenAI’s reasoning models has been a well-kept secret. That is, until last week, when DeepSeek released their DeepSeek-R1 model and promptly broke the internet (and the stock market!).

Besides performing as well or better than o1, the DeepSeek-R1 release was accompanied by a detailed tech report that outlined the key steps of their training recipe. … [This] prompted us to launch the Open-R1 project, an initiative to systematically reconstruct DeepSeek-R1’s data and training pipeline, validate its claims, and push the boundaries of open reasoning models. By building Open-R1, we aim to provide transparency on how reinforcement learning can enhance reasoning, share reproducible insights with the open-source community, and create a foundation for future models to leverage these techniques. — Read More

#china-vs-us

DeepSeek FAQ

It’s Monday, January 27. Why haven’t you written about DeepSeek yet?

I did! I wrote about R1 last Tuesday.

I totally forgot about that.

I take responsibility. I stand by the post, including the two biggest takeaways that I highlighted (emergent chain-of-thought via pure reinforcement learning, and the power of distillation), and I mentioned the low cost (which I expanded on in Sharp Tech) and chip ban implications, but those observations were too localized to the current state of the art in AI. What I totally failed to anticipate were the broader implications this news would have to the overall meta-discussion, particularly in terms of the U.S. and China. — Read More
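On the distillation takeaway: the classic formulation (Hinton et al., 2015) trains a small student to match a large teacher’s temperature-softened output distribution. The pure-Python sketch below shows that objective for one output position; note DeepSeek’s report describes distilling via supervised fine-tuning on R1-generated traces rather than logit matching, so this is background on the concept, not their exact recipe:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences among wrong answers ("dark knowledge").
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions, scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher = soft targets
    q = softmax(student_logits, temperature)  # student = predictions
    return temperature ** 2 * sum(
        pi * math.log(pi / qi) for pi, qi in zip(p, q)
    )

# A student that matches the teacher exactly incurs zero loss;
# any mismatch in the softened distributions is penalized.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
print(distillation_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1]))  # > 0
```

Either variant transfers a big model’s behavior into a cheaper one, which is why distillation keeps coming up in the DeepSeek cost discussion.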

#china-vs-us

How a top Chinese AI model overcame US sanctions

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model. 

The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost. 

… DeepSeek’s success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China’s AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration. — Read More

#china-vs-us

Hugging Face’s CEO reveals his 6 predictions for the industry next year, including China leading the US

Hugging Face’s CEO predicts the first major AI protest and market disruptions in 2025. His predictions include China leading the AI race, driven by open-source model developments. — Read More

#china-vs-us

Mapping U.S.-China Data De-Risking

In August 2020, DigiChina published Mapping US–China Technology Decoupling—a snapshot of measures that had already been taken in Washington and Beijing with the effect of unwinding interdependence. That mapping exercise identified actions taken by both governments to separate technology systems across categories including export controls, data, supply chains, encryption, financial untangling, and travel. This update to our 2020 map focuses specifically on actions by both sides affecting data handling and cross-border data flows. — Read More

#china-vs-us

Eying China, US proposes ‘know your customer’ cloud computing requirements

The Biden administration is proposing a requirement that U.S. cloud companies determine whether foreign entities are accessing U.S. data centers to train AI models, U.S. Commerce Secretary Gina Raimondo said on Friday.

“We can’t have non-state actors or China or folks who we don’t want accessing our cloud to train their models,” Raimondo said in an interview with Reuters. “We use export controls on chips,” she noted. “Those chips are in American cloud data centers so we also have to think about closing down that avenue for potential malicious activity.” — Read More

#china-vs-us