Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, sparsely activated memory layers complement compute-heavy dense feed-forward layers, providing dedicated capacity to store and retrieve information cheaply. This work takes memory layers beyond proof-of-concept, proving their utility at contemporary scale. On downstream tasks, language models augmented with our improved memory layer outperform dense models with more than twice the computation budget, as well as mixture-of-expert models when matched for both compute and parameters. We find gains are especially pronounced for factual tasks. We provide a fully parallelizable memory layer implementation, demonstrating scaling laws with up to 128B memory parameters, pretrained to 1 trillion tokens, comparing to base models with up to 8B parameters. — Read More
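The core lookup the abstract describes can be sketched in a few lines: score a query against a large pool of trainable keys, keep only the top-k, and return a softmax-weighted sum of the matching values. The numpy sketch below is a minimal illustration of that idea only; the paper's actual design (product keys, the parallelizable implementation) is more involved, and all sizes here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

num_keys, d, d_val, top_k = 1024, 64, 32, 4

# Trainable parameters: a large pool of keys and associated values.
keys = rng.normal(size=(num_keys, d))
values = rng.normal(size=(num_keys, d_val))

def memory_lookup(query, keys, values, top_k):
    """Sparse key-value lookup: score every key against the query,
    keep only the top-k matches, and return a softmax-weighted sum
    of their values. Only top_k rows of `values` are ever touched,
    which is why parameters grow without a matching FLOP increase."""
    scores = keys @ query                           # (num_keys,)
    idx = np.argpartition(scores, -top_k)[-top_k:]  # indices of the top-k keys
    top = scores[idx]
    weights = np.exp(top - top.max())
    weights /= weights.sum()                        # softmax over the top-k only
    return weights @ values[idx]                    # (d_val,)

query = rng.normal(size=d)
out = memory_lookup(query, keys, values, top_k)
print(out.shape)  # (32,)
```

The FLOP savings come from the last line: the weighted sum touches only `top_k` value rows, so the value pool can grow arbitrarily large while per-token compute stays roughly constant.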
Recent Updates
DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from DeepSeek-R1-Distill-Qwen-14B via distributed RL. It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters. We’ve open-sourced our dataset, code, training logs, and systems optimizations so everyone can make progress on scaling and accelerating intelligence with RL. — Read More
ChatGPT is used more for science in countries where it is prohibited
Regulating AI is a key societal challenge, but effective methods remain unclear. This study evaluates geographic restrictions on AI services, focusing on ChatGPT, which OpenAI blocks in several countries, including China and Russia. If restrictions were effective, ChatGPT usage in these countries should be minimal. We measured usage with a classifier trained to detect distinctive word choices (e.g., “delve”) typical of early ChatGPT outputs. The classifier, trained on pre- and post-ChatGPT “polished” abstracts, outperformed GPTZero and ZeroGPT on validation sets, including papers with self-reported AI use. Applying our classifier to preprints from arXiv, bioRxiv, and medRxiv revealed ChatGPT use in approximately 12.6% of preprints by August 2023, with usage 7.7% higher in restricted countries. This gap emerged before China’s first major domestic LLM became widely available. To address whether high demand could have driven even greater use without restrictions, we compared Asian countries with high expected demand (where English is not an official language) and found higher usage in countries with restrictions. ChatGPT use correlated with increased views and downloads but not with citations or journal placement. Overall, geographic restrictions on ChatGPT appear ineffective in science and potentially other domains, likely due to widespread workarounds. — Read More
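The measurement idea behind the study can be illustrated with a deliberately crude stand-in: count how often ChatGPT-associated words appear in an abstract. The marker words below are hypothetical examples (only “delve” is named in the summary), and the study trained a proper classifier on pre- and post-ChatGPT abstracts rather than using a fixed word list.

```python
# Hypothetical marker vocabulary; the study learned its own
# discriminative features from pre-/post-ChatGPT "polished" abstracts.
MARKER_WORDS = {"delve", "intricate", "showcase", "pivotal"}

def marker_rate(abstract: str) -> float:
    """Fraction of tokens that are ChatGPT-associated marker words.
    A toy stand-in for the paper's trained classifier."""
    tokens = abstract.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t.strip(".,;:") in MARKER_WORDS)
    return hits / len(tokens)

sample = "We delve into the intricate dynamics of protein folding."
print(round(marker_rate(sample), 3))  # 0.222
```

A real classifier would aggregate many such lexical signals and calibrate against abstracts known to predate ChatGPT, rather than thresholding a single ratio.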
Vision Language Models (Better, Faster, Stronger)
Vision Language Models (VLMs) are the talk of the town. In a previous blog post (from April 2024), we talked a lot about VLMs. A major chunk was about LLaVA, the first successful and easily reproducible open-source vision language model, along with tips on how to discover, evaluate, and fine-tune open models.
Since then, so much has changed. Models have become smaller yet more powerful. We’ve seen the rise of new architectures and capabilities (reasoning, agency, long video understanding, etc.). In parallel, entirely new paradigms, such as multimodal Retrieval Augmented Generation (RAG) and multimodal agents, have taken shape.
In this blog post, we’ll take a look back and unpack everything that has happened with vision language models over the past year. You’ll discover key changes, emerging trends, and notable developments. — Read More
China built hundreds of AI data centers to catch the AI boom. Now many stand unused.
A year or so ago, Xiao Li was seeing floods of Nvidia chip deals on WeChat. A real estate contractor turned data center project manager, he had pivoted to AI infrastructure in 2023, drawn by the promise of China’s AI craze.
At that time, traders in his circle bragged about securing shipments of high-performance Nvidia GPUs subject to US export restrictions. Many were smuggled through overseas channels to Shenzhen. At the height of the demand, a single Nvidia H100, a chip essential to training AI models, could sell for up to 200,000 yuan ($28,000) on the black market.
Now, his WeChat feed and industry group chats tell a different story. Traders are more discreet in their dealings, and prices have come back down to earth. Meanwhile, two data center projects Li is familiar with are struggling to secure further funding from investors who anticipate poor returns, forcing project leads to sell off surplus GPUs. “It seems like everyone is selling, but few are buying,” he says. — Read More
New-type AI Storage Research Report
In 2022, the Ministry of Science and Technology and six other departments issued the “Guiding Opinions on Accelerating Scenario Innovation and Promoting High-quality Economic Development with High-level Application of Artificial Intelligence”, calling for faster research and development of AI technologies, product development, and industry cultivation, and for exploring new models and paths for AI development. In 2023, the Ministry of Industry and Information Technology and six other departments issued the “Action Plan for the High-quality Development of Computing Power Infrastructure”, calling for efficient and flexible provisioning of storage capacity, faster research, development, and deployment of storage technologies, continuous improvement of the storage industry, and coordinated development of storage, computing, and networking.
In the era of large models, data sets the ceiling for artificial intelligence. More training data is a prerequisite for iterating and upgrading large models, and higher data quality determines how well they train. Large-model technology is now driving the development of the underlying infrastructure: compute demand keeps rising, and the need to store and process massive datasets keeps growing, placing higher demands on the performance, scalability, data security, and data paradigms of AI storage.
This report surveys the concept and scope, challenges, key technologies, and best practices of new-type AI storage. On concept and scope, it lays out the basic concepts of new-type AI storage and analyzes global AI storage strategies. On challenges, it argues that new-type AI storage underpins large models but faces difficulties in massive data collection, training-data access efficiency, and real-time inference. On key technologies, it explains where new-type AI storage needs strengthening: storage media, systems, architecture, data fabric, data paradigms, and data security. On best practices, it presents case studies of new-type AI storage from healthcare, finance, cloud service providers, and AI companies. Finally, it offers recommendations for the future development of new-type AI storage in China. The related industries and technologies are in a stage of rapid development, and the technology ecosystem is changing quickly; the report still has many shortcomings, and we sincerely invite criticism and correction from all quarters. — Read More
Try Public APIs for free
The Public APIs repository is manually curated by community members like you and folks working at APILayer. It includes an extensive list of public APIs from many domains that you can use for your own products. Consider it a treasure trove of APIs well-managed by the community over the years. — Read More
Which LLM writes the best analytical SQL?
We asked 19 popular LLMs (+1 human) to write analytical SQL queries to filter and aggregate a 200 million row dataset. The result is the first version of the LLM SQL Generation Benchmark.
Using a set of 50 analytical questions inspired by this list from maintainers of ClickHouse®, we measure how well each model can write accurate and efficient SQL. We benchmark success rates, exactness, efficiency, query latency, and other metrics, comparing them to queries produced by an experienced human engineer.
The dataset, which contains 200 million rows of public GitHub events data (sampled from the GH Archive), is hosted in Tinybird, allowing us to run all the queries interactively and measure performance at scale. The full dashboard with results is public here. We will continually update this dashboard as new models are developed and tested (Want us to test a model? Create an issue or run the test yourself and submit a PR with new results here). — Read More
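A benchmark of this shape boils down to a scoring loop: run each generated query, check whether it executes, compare the result to the human-written reference, and record latency. The sketch below is a toy illustration under assumed interfaces; the function and metric names are mine, not the benchmark's actual harness, and real result comparison (column order, float tolerance) is more subtle than an equality check.

```python
import time

def score_model(run_query, questions, reference_results):
    """Toy scoring loop for generated SQL. `run_query` is assumed to
    execute one query and return its result rows; `reference_results`
    holds the human engineer's answers in the same order."""
    successes, exact, latencies = 0, 0, []
    for q, expected in zip(questions, reference_results):
        start = time.perf_counter()
        try:
            result = run_query(q)
            successes += 1              # query executed without error
            if result == expected:
                exact += 1              # result matches the reference
        except Exception:
            pass                        # failed queries count against success rate
        latencies.append(time.perf_counter() - start)
    n = len(questions)
    return {
        "success_rate": successes / n,
        "exactness": exact / n,
        "mean_latency_s": sum(latencies) / n,
    }

# Stub "database": all three queries run, but one returns a wrong answer.
answers = {"q1": [(1,)], "q2": [(2,)], "q3": [(99,)]}
metrics = score_model(lambda q: answers[q], ["q1", "q2", "q3"],
                      [[(1,)], [(2,)], [(3,)]])
print(metrics["success_rate"], metrics["exactness"])  # 1.0 0.6666666666666666
```

Separating success rate (the query runs) from exactness (the answer is right) matters: models often produce valid SQL that aggregates the wrong thing, and the two metrics diverge.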
Deepfakes Now Outsmarting Detection By Mimicking Heartbeats
The assumption that deepfakes lack physiological signals such as heart rate is no longer valid. Recent research reveals that high-quality deepfakes unintentionally retain the heartbeat patterns of their source videos, undermining detection methods that rely on the subtle skin-color changes linked to a pulse. The researchers suggest shifting focus from merely detecting a heart-rate signal to analyzing how blood flow is distributed across different facial regions, providing a more accurate detection strategy. — Read More
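The proposed shift can be illustrated on simulated data: instead of asking "is there a pulse at all?", compare pulse traces extracted from several facial regions and score their spatial consistency. Everything below is a toy sketch on synthetic signals; real pipelines extract these traces from face video via remote photoplethysmography, and the in-phase/out-of-phase setup here is only an assumed proxy for a plausible versus implausible blood-flow distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
FPS, N_FRAMES = 30.0, 300
t = np.arange(N_FRAMES) / FPS

def region_signals(phases):
    """Simulated per-region rPPG traces: a ~72 bpm pulse with a
    per-region phase offset, plus sensor noise (synthetic data)."""
    return np.array([np.sin(2 * np.pi * 1.2 * t + p)
                     + 0.3 * rng.normal(size=N_FRAMES) for p in phases])

def mean_region_correlation(signals):
    """Average pairwise correlation between regional pulse traces:
    a crude spatial-consistency score for blood-flow distribution."""
    c = np.corrcoef(signals)
    n = len(signals)
    return (c.sum() - n) / (n * (n - 1))  # mean off-diagonal entry

# Physiologically consistent: all regions pulse in phase.
real_like = mean_region_correlation(region_signals([0.0] * 4))
# Implausible distribution: regions out of phase with one another.
fake_like = mean_region_correlation(region_signals([0.0, 1.6, 3.1, 4.7]))
print(real_like > fake_like)  # True under this simulation
```

The point of the toy is the feature, not the threshold: a deepfake can carry its source video's heartbeat and still betray itself through a spatial blood-flow pattern that no real face would produce.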
Jinmeng 550A model claims to have hit 100% on AIME24
… Jinmeng 550A is a neuro-symbolic AI model reportedly developed by a 14-year-old Chinese prodigy named Shihao Ji. It gained attention for achieving extraordinary results on prominent AI benchmarks:
100% accuracy on AIME24 (American Invitational Mathematics Examination 2024)
99.7% accuracy on MedQA (Medical Question Answering benchmark)
These results were reported on Papers With Code and highlighted in several Chinese tech media outlets, such as Tencent Cloud and Sohu. — Read More