The scourge of tuberculosis (TB) may be largely a distant memory for most Americans and Europeans, but it killed roughly 1.25 million people last year around the world. A non-profit based in India, which accounts for more than a quarter of all cases, is developing AI tools that could boost efforts to eradicate the disease.
Roughly 10 million people a year fall ill with TB, making it one of the world’s most prevalent infectious diseases. In 2018, Indian Prime Minister Narendra Modi made an ambitious pledge to eliminate TB in India by 2025. With 2.5 million cases recorded in India last year, that goal clearly won’t be met; still, the country has invested hundreds of millions of dollars in a vast national TB program and reduced the disease’s incidence by about 18 percent between 2015 and 2023.
… Indian non-profit Wadhwani AI has developed a suite of AI-powered tools to help health workers detect undiagnosed cases, decide on treatment plans, and prevent people from dropping out of treatment. Working with the Indian government and the U.S. Agency for International Development, the organization is currently piloting these tools across the country. And Wadhwani’s director of solutions, Nakul Jain, says 2025 could see several incorporated into India’s national TB patient management system, Nikshay. — Read More
Transparency assessment of 15 Chinese large models: only 4 allow users to withdraw voiceprint data
None of the 15 tested large model products disclosed the source of its training data. Citing technical limitations, each company stated that it could not fully guarantee the authenticity and accuracy of AI-generated content. The vast majority of the products stated that the information and prompts entered by users would be used for model training, and only 4 allowed users to revoke authorization for voice data.
… The three AI products with the highest transparency scores are Tencent Yuanbao (72 points), iFlytek’s SparkDesk (69 points), and Zhipu’s Qingyan (67 points); the three that rank the lowest are Baichuan’s Baixiaoying (54 points), ModelBest’s Luca (51 points), and Metaso [秘塔] (43 points).
The “Report” calls for enhancing the transparency of large model services, which bears directly on whether a model is trustworthy and helps users evaluate the accuracy and reliability of AI-generated content and better identify potential AI risks. — Read More
Is OpenAI o3 Really AGI?
The world may have changed, and we might not have realized it yet.
Yesterday, OpenAI shocked (and this is not hyperbole) everyone with the announcement of OpenAI o3 and o3-mini, the brand new models of the ‘o’ family (they skipped ‘o2’ due to trademark reasons).
o3 results are so astonishing that some people are actually convinced that it is AGI, as it destroys some of the so-called ‘impossible’ benchmarks for current models. — Read More
The AI Trillion-Dollar Product
In a very recent interview, Satya Nadella, Microsoft’s CEO, claimed that current business applications will “collapse in the agent era.” Notably, he is referring to the very same apps his company is currently selling. Thus, he is predicting the death of his company’s current business model in favor of AI agents.
But this vision implies a much more powerful change, one that Satya is less keen on mentioning because it directly impacts Microsoft’s raison d’être: the introduction of AI as a structural part of general-purpose computing. That is the end game of ChatGPT: the LLM Operating System, or LLM OS.
This vision is so powerful that it is unequivocally OpenAI’s grand plan. Today, we are distilling their vision into simple words. I believe this is one of my most didactic articles on the future of AI. — Read More
1-800-ChatGPT – Calling and Messaging ChatGPT with your phone
1-800-ChatGPT is an experimental new launch to enable wider access to ChatGPT. You can now talk to ChatGPT via phone call or message ChatGPT via WhatsApp at 1-800-ChatGPT without needing an account.
… You can talk to 1-800-ChatGPT for 15 minutes per month for free, with a daily limit on WhatsApp messages. We may adjust usage limits based on capacity if needed. — Read More
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
Recent advancements in large language models (LLMs) have sparked optimism about their potential to accelerate scientific discovery, with a growing number of works proposing research agents that autonomously generate and validate new ideas. Despite this, no evaluations have shown that LLM systems can take the very first step of producing novel, expert-level ideas, let alone perform the entire research process. We address this by establishing an experimental design that evaluates research idea generation while controlling for confounders and performs the first head-to-head comparison between expert NLP researchers and an LLM ideation agent. By recruiting over 100 NLP researchers to write novel ideas and blind reviews of both LLM and human ideas, we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility. Studying our agent baselines closely, we identify open problems in building and evaluating research agents, including failures of LLM self-evaluation and their lack of diversity in generation. Finally, we acknowledge that human judgements of novelty can be difficult, even by experts, and propose an end-to-end study design which recruits researchers to execute these ideas into full projects, enabling us to study whether these novelty and feasibility judgements result in meaningful differences in research outcome. — Read More
HunyuanVideo
We present HunyuanVideo, a novel open-source video foundation model that exhibits performance in video generation that is comparable to, if not superior to, leading closed-source models. To train the HunyuanVideo model, we adopt several key technologies for model learning, including data curation, image-video joint model training, and an efficient infrastructure designed to facilitate large-scale model training and inference. Additionally, through an effective strategy for scaling model architecture and dataset, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. — Read More
Microsoft’s smaller AI model beats the big guys: Meet Phi-4, the efficiency king
Microsoft launched a new artificial intelligence model today that achieves remarkable mathematical reasoning capabilities while using far fewer computational resources than its larger competitors. The 14-billion-parameter Phi-4 frequently outperforms much larger models like Google’s Gemini Pro 1.5, marking a significant shift in how tech companies might approach AI development.
The breakthrough directly challenges the AI industry’s “bigger is better” philosophy, where companies have raced to build increasingly massive models. While competitors like OpenAI’s GPT-4o and Google’s Gemini Ultra operate with hundreds of billions or possibly trillions of parameters, Phi-4’s streamlined architecture delivers superior performance in complex mathematical reasoning. — Read More
Google Veo 2 Demo – The Best AI Video Model Yet
I Went to the Premiere of the First Commercially Streaming AI-Generated Movies
Movies are supposed to transport you places. At the end of last month, I was sitting in the Chinese Theater, one of the most iconic movie theaters in Hollywood, in the same complex where the Oscars are held. And as I was watching the movie, I found myself transported to the past, thinking about one of my biggest regrets. When I was in high school, I went to a theater to watch a screening of a movie one of my classmates had made. I was 14 years old, and I reviewed it for the school newspaper. I savaged the film’s special effects, which were done by hand with love and care by someone my own age, and were lightyears better than anything I could do. I had no idea what I was talking about, how special effects were made, or how to review a movie. The student who made the film rightfully hated me, and I have felt bad about what I wrote ever since.
So, 20 years later, I’m sitting in the Chinese Theater watching AI-generated movies in which the directors sometimes cannot make the characters consistently look the same, or make audio sync with lips in a natural-seeming way, and I am thinking about the emotions these films are giving me. The emotion that I feel most strongly is “guilt,” because I know there is no way to write about what I am watching without explaining that these are bad films, and I cannot believe that they are going to be imminently commercially released, and the people who made them are all sitting around me.
Then I remembered that I am not watching student films made with love by an enthusiastic high school student. I am watching films that were made for TCL, the largest TV manufacturer on Earth, as part of a pilot program designed to normalize AI movies and TV shows for an audience that it plans to monetize explicitly with targeted advertising and whose internal data suggests that the people who watch its free television streaming network are too lazy to change the channel. I know this is the plan because TCL’s executives just told the audience that this is the plan. – Read More