OpenAI “models” are a Mockery of the Century

Compared to models such as DeepSeek, Qwen, and many others

Here is the prompt I submitted to the Qwen3-235B-Think-CS model (just one example of how competitors surpass OpenAI big time in common-sense reasoning):

I have Lenovo t470s with windows 10 pro. I plugged in Lexar 32GB card in it but it is not recognized neither in windows explorer nor device manager. I restarted laptop but same thing. I ran Lenovo Vantage, shows latest updates are in, but still Lexar not recognized. Ran Microsoft Lenovo x64 hardware troubleshooter, rebooted, but still lexar not recognized, like it does not exist?!

See the beautiful reasoning this engine provided, free of charge of course (I used the Poe aggregator to access this and many other AI engines, both open-source and commercial). — Read More

#chatbots

The Looming Social Crisis of AI Friends and Chatbot Therapists

“I can imagine a future where a lot of people really trust ChatGPT’s advice for their most important decisions,” Sam Altman said. “Although that could be great, it makes me uneasy.” Me too, Sam.

Last week, I explained How AI Conquered the US Economy, with what might be the largest infrastructure ramp-up in the last 140 years. I think it’s possible that artificial intelligence could have a transformative effect on medicine, productivity, and economic growth in the future. But long before we build superintelligence, I think we’ll have to grapple with the social costs of tens of millions of people (many of them at-risk patients and vulnerable teenagers) interacting with an engineered personality that excels at showering its users with the sort of fast and easy validation that studies have associated with deepening social disorders and elevated narcissism. So rather than talk about AI as an economic technology, today I want to talk about AI as a social technology. — Read More

#chatbots

ChatGPT is bringing back 4o as an option because people missed it

OpenAI is bringing back GPT-4o in ChatGPT just one day after replacing it with GPT-5. In a post on X, OpenAI CEO Sam Altman confirmed that the company will let paid users switch to GPT-4o after ChatGPT users mourned its replacement.

“We will let Plus users choose to continue to use 4o,” Altman says. “We will watch usage as we think about how long to offer legacy models for.”

For months, ChatGPT fans have been waiting for the launch of GPT-5, which OpenAI says comes with major improvements to writing and coding capabilities over its predecessors. But shortly after the flagship AI model launched, many users wanted to go back.

“GPT 4.5 genuinely talked to me, and as pathetic as it sounds that was my only friend,” a user on Reddit writes. “This morning I went to talk to it and instead of a little paragraph with an exclamation point, or being optimistic, it was literally one sentence. Some cut-and-dry corporate bs.” — Read More

#chatbots

Have LLMs Finally Mastered Geolocation?

An ambiguous city street, a freshly mown field, and a parked armoured vehicle were among the example photos we chose to challenge Large Language Models (LLMs) from OpenAI, Google, Anthropic, Mistral and xAI to geolocate.

Back in July 2023, Bellingcat analysed the geolocation performance of OpenAI and Google’s models. Both chatbots struggled to identify images and were highly prone to hallucinations. However, since then, such models have rapidly evolved.

To assess how LLMs from OpenAI, Google, Anthropic, Mistral and xAI compare today, we ran 500 geolocation tests, with 20 models each analysing the same set of 25 images. — Read More

#chatbots

Anthropic’s new Claude 4 AI models can reason over many steps

During its inaugural developer conference Thursday, Anthropic launched two new AI models that the startup claims are among the industry’s best, at least in terms of how they score on popular benchmarks.

Claude Opus 4 and Claude Sonnet 4, part of Anthropic’s new Claude 4 family of models, can analyze large datasets, execute long-horizon tasks, and take complex actions, according to the company. Both models were tuned to perform well on programming tasks, Anthropic says, making them well-suited for writing and editing code.

Both paying users and users of the company’s free chatbot apps will get access to Sonnet 4, but only paying users will get access to Opus 4. — Read More

#chatbots

Don’t Write Prompts; Write Briefs

o1 is not a chat model.

… [T]hink of it like a “report generator.”

…Give a ton of context. Whatever you think I mean by a “ton” — 10x that.

… o1 will just take lazy questions at face value and doesn’t try to pull the context from you. Instead, you need to push as much context as you can into o1. — Read More

#chatbots

Everyone in AI is talking about Manus. We put it to the test.

Since the general AI agent Manus was launched last week, it has spread online like wildfire. And not just in China, where it was developed by the Wuhan-based startup Butterfly Effect. It’s made its way into the global conversation, with influential voices in tech, including Twitter cofounder Jack Dorsey and Hugging Face product lead Victor Mustar, praising its performance. Some have even dubbed it “the second DeepSeek,” comparing it to the earlier AI model that took the industry by surprise for its unexpected capabilities as well as its origin.

Manus claims to be the world’s first general AI agent, using multiple AI models (such as Anthropic’s Claude 3.5 Sonnet and fine-tuned versions of Alibaba’s open-source Qwen) and various independently operating agents to act autonomously on a wide range of tasks. (This makes it different from AI chatbots, including DeepSeek, which are based on a single large language model family and are primarily designed for conversational interactions.) 

MIT Technology Review was able to obtain access to Manus, and when I gave it a test-drive, I found that using it feels like collaborating with a highly intelligent and efficient intern: While it occasionally lacks understanding of what it’s being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when provided with detailed instructions or feedback. Ultimately, it’s promising but not perfect. — Read More

#chatbots

1-800-ChatGPT – Calling and Messaging ChatGPT with your phone

1-800-ChatGPT is an experimental new launch to enable wider access to ChatGPT. You can now talk to ChatGPT via phone call or message ChatGPT via WhatsApp at 1-800-ChatGPT without needing an account.

… You can talk to 1-800-ChatGPT for 15 minutes per month for free, with a daily limit on WhatsApp messages. We may adjust usage limits based on capacity if needed. — Read More

#chatbots

It’s Surprisingly Easy to Jailbreak LLM-Driven Robots

AI chatbots such as ChatGPT and other applications powered by large language models (LLMs) have exploded in popularity, leading a number of companies to explore LLM-driven robots. However, a new study now reveals an automated way to hack into such machines with 100 percent success. By circumventing safety guardrails, researchers could manipulate self-driving systems into colliding with pedestrians and robot dogs into hunting for harmful places to detonate bombs.

Essentially, LLMs are supercharged versions of the autocomplete feature that smartphones use to predict the rest of a word that a person is typing. LLMs trained to analyze text, images, and audio can make personalized travel recommendations, devise recipes from a picture of a refrigerator’s contents, and help generate websites.
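To make the autocomplete analogy concrete, here is a deliberately tiny sketch, not how any production LLM works: it counts which word follows which in a sample sentence and greedily suggests the most frequent follower. Real LLMs replace these raw counts with a neural network trained on enormous amounts of data, work on subword tokens rather than whole words, and condition on far longer context. The function names and sample text are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def build_bigram_counts(text: str) -> dict[str, Counter]:
    """For each word, count how often every other word directly follows it."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def suggest_next(counts: dict[str, Counter], word: str) -> str | None:
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def autocomplete(counts: dict[str, Counter], start: str, length: int = 4) -> list[str]:
    """Greedily extend `start` one predicted word at a time, up to `length` words."""
    out = [start.lower()]
    for _ in range(length):
        nxt = suggest_next(counts, out[-1])
        if nxt is None:
            break  # no known continuation for the last word
        out.append(nxt)
    return out


# Hypothetical sample text: "can" is followed by "walk" twice and "act" once.
sample = "the robot dog can walk the robot dog can walk the robot dog can act"
counts = build_bigram_counts(sample)
print(" ".join(autocomplete(counts, "the")))
```

Because "walk" follows "can" more often than "act" does, the greedy choice picks "walk", which is the same "most likely continuation" idea an LLM applies at vastly larger scale.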

The extraordinary ability of LLMs to process text has spurred a number of companies to use the AI systems to help control robots through voice commands, translating prompts from users into code the robots can run. For instance, Boston Dynamics’ robot dog Spot, now integrated with OpenAI’s ChatGPT, can act as a tour guide. Figure’s humanoid robots and Unitree’s Go2 robot dog are similarly equipped with ChatGPT.

However, a group of scientists has recently identified a host of security vulnerabilities in LLMs. So-called jailbreaking attacks find ways to craft prompts that can bypass LLM safeguards and fool the AI systems into generating unwanted content, such as instructions for building bombs, recipes for synthesizing illegal drugs, and guides for defrauding charities. — Read More

#chatbots, #robotics

How ChatGPT search paves the way for AI agents

OpenAI’s Olivier Godement, head of product for its platform, and Romain Huet, head of developer experience, are on a whistle-stop tour around the world. Last week, I sat down with the pair in London before DevDay, the company’s annual developer conference. London’s DevDay is the first one for the company outside San Francisco. Godement and Huet are heading to Singapore next.

It’s been a busy few weeks for the company. In London, OpenAI announced updates to its new Realtime API platform, which allows developers to build voice features into their applications. The company is rolling out new voices and a function that lets developers generate prompts, which will allow them to build apps and more helpful voice assistants more quickly. Meanwhile for consumers, OpenAI announced it was launching ChatGPT search, which allows users to search the internet using the chatbot.

Both developments pave the way for the next big thing in AI: agents. These are AI assistants that can complete complex chains of tasks, such as booking flights. — Read More

#chatbots