New-type AI Storage Research Report

In 2022, the Ministry of Science and Technology and six other departments issued the “Guiding Opinions on Accelerating Scenario Innovation and Promoting High-quality Economic Development with High-level Application of Artificial Intelligence”. The document calls for accelerating AI technology research and development, product development, and industry cultivation, for exploring new models and paths for AI development, and for promoting high-quality economic development through high-level AI applications. In 2023, the Ministry of Industry and Information Technology and six other departments issued the “Action Plan for the High-quality Development of Computing Power Infrastructure”. It calls for ensuring efficient and flexible storage capacity, accelerating the research, development, and application of storage technologies, continuously improving the capabilities of the storage industry, and promoting the coordinated development of storage, computing, and networking.

In the era of large models, data determines how far artificial intelligence can go. More training data is a prerequisite for iterating on and upgrading large models, and higher data quality determines how effective large-model training is. Large-model technology is now driving development across the underlying infrastructure: demand for computing power keeps rising, and so does demand for storing and processing massive amounts of data. This places higher requirements on AI storage in terms of performance, scalability, data security, and data paradigms.

This report sorts out and analyzes the conceptual scope, challenges, key technologies, and best practices of new-type AI storage. On conceptual scope, it lays out the basic concepts of new-type AI storage and analyzes AI storage strategies around the world. On challenges, it points out that new-type AI storage is the foundation for large models but faces many challenges in massive data collection, training data access efficiency, and real-time inference. On key technologies, it explains that new-type AI storage needs to be strengthened in storage media, systems, architecture, data fabric, data paradigms, and data security. On best practices, it presents case studies of new-type AI storage in the medical and financial sectors, at cloud service providers, and at AI companies. Finally, in response to the challenges facing AI storage today, the report offers suggestions for the future development of new-type AI storage in China. The industries and technologies related to new-type AI storage are developing rapidly, and the technology ecosystem is changing quickly. The report still has many shortcomings, and we sincerely invite criticism and correction from all readers. — Read More

#china-ai

Try Public APIs for free

The Public APIs repository is manually curated by community members like you and folks working at APILayer. It includes an extensive list of public APIs from many domains that you can use for your own products. Consider it a treasure trove of APIs well-managed by the community over the years. — Read More

#devops

Which LLM writes the best analytical SQL?

We asked 19 popular LLMs (+1 human) to write analytical SQL queries to filter and aggregate a 200 million row dataset. The result is the first version of the LLM SQL Generation Benchmark.

Using a set of 50 analytical questions inspired by this list from maintainers of ClickHouse®, we measure how well each model can write accurate and efficient SQL. We benchmark success rates, exactness, efficiency, query latency, and other metrics, comparing them to queries produced by an experienced human engineer.

The dataset, which contains 200 million rows of public GitHub events data (sampled from the GH Archive), is hosted in Tinybird, allowing us to run all the queries interactively and measure performance at scale. The full dashboard with results is public here. We will continually update this dashboard as new models are developed and tested (Want us to test a model? Create an issue or run the test yourself and submit a PR with new results here). — Read More
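To make metrics like success rate, exactness, and latency concrete, here is a minimal Python sketch of how one model's runs could be scored against the human baseline. The `QueryResult` record, the `summarize` function, and the rule that "exact" means returning the same rows in the same order as the human query are hypothetical assumptions for illustration, not the benchmark's actual scoring code.

```python
from dataclasses import dataclass, field
from statistics import median


@dataclass
class QueryResult:
    """One query outcome (hypothetical record shape, not the benchmark's schema)."""
    question_id: int
    ran_ok: bool                               # query executed without error
    rows: list = field(default_factory=list)   # returned rows, empty on failure
    latency_ms: float = 0.0                    # measured execution time


def summarize(model_results, human_results):
    """Score one model against the human baseline, question by question."""
    human_by_id = {r.question_id: r for r in human_results}
    n = len(model_results)
    succeeded = [r for r in model_results if r.ran_ok]
    # Assumption: "exact" means identical rows, in identical order, to the human query.
    exact = [r for r in succeeded if r.rows == human_by_id[r.question_id].rows]
    return {
        "success_rate": len(succeeded) / n if n else 0.0,
        "exact_match_rate": len(exact) / n if n else 0.0,
        "median_latency_ms": median(r.latency_ms for r in succeeded) if succeeded else None,
    }


human = [QueryResult(1, True, [("opened", 3)], 40.0), QueryResult(2, True, [(10,)], 25.0)]
model = [QueryResult(1, True, [("opened", 3)], 55.0), QueryResult(2, False)]
print(summarize(model, human))
# {'success_rate': 0.5, 'exact_match_rate': 0.5, 'median_latency_ms': 55.0}
```

A real harness would also have to decide how to treat row order and floating-point rounding when comparing results; this sketch uses strict equality for simplicity.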

#devops

Deepfakes Now Outsmarting Detection By Mimicking Heartbeats

The assumption that deepfakes lack physiological signals, such as heart rate, is no longer valid. Recent research reveals that high-quality deepfakes unintentionally retain the heartbeat patterns from their source videos, undermining traditional detection methods that relied on detecting subtle skin color changes linked to heartbeats. Researchers suggest shifting focus from just detecting heart rate signals to analyzing how blood flow is distributed across different facial regions, providing a more accurate detection strategy. — Read More
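As a rough illustration of region-level analysis, the Python sketch below takes one intensity time series per facial region and measures how much power each region carries in a heart-rate band using a naive DFT. It then normalizes those powers into a distribution across regions. Several assumptions are built in: the input is the mean green-channel value of each region per frame (a common remote-photoplethysmography input), the band is 0.7 to 4 Hz, the region names are made up, and total variation distance is used to compare profiles. All function names are hypothetical. This is not the researchers' method, only the kind of spatial comparison they describe.

```python
import math


def band_power(signal, fps, lo_hz=0.7, hi_hz=4.0):
    """Total power of `signal` between lo_hz and hi_hz, via a naive DFT."""
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # drop the DC (average brightness) component
    power = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fps / n  # frequency of DFT bin k in Hz
        if lo_hz <= freq <= hi_hz:
            re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
            im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
            power += re * re + im * im
    return power


def pulse_distribution(region_signals, fps):
    """Fraction of total pulse-band power contributed by each facial region."""
    powers = {name: band_power(sig, fps) for name, sig in region_signals.items()}
    total = sum(powers.values())
    if total == 0:
        return {name: 0.0 for name in powers}
    return {name: p / total for name, p in powers.items()}


def tv_distance(p, q):
    """Total variation distance between two region distributions (0 = identical)."""
    regions = set(p) | set(q)
    return 0.5 * sum(abs(p.get(r, 0.0) - q.get(r, 0.0)) for r in regions)


def pulse(amp, fps=30, n=300, hz=1.2):
    """Synthetic pulse signal: 1.2 Hz (72 bpm) sine of the given amplitude."""
    return [amp * math.sin(2 * math.pi * hz * t / fps) for t in range(n)]


# Synthetic 10 s clip at 30 fps; the chin carries no pulse at all.
dist = pulse_distribution(
    {"forehead": pulse(1.0), "left_cheek": pulse(0.5), "chin": [0.3] * 300}, fps=30
)
print({k: round(v, 3) for k, v in dist.items()})
# {'forehead': 0.8, 'left_cheek': 0.2, 'chin': 0.0}
```

A practical detector would extract these signals from tracked facial landmarks and compare the resulting profile, for example with `tv_distance`, against profiles measured on genuine videos; this sketch only shows the spectral and distributional arithmetic.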

#fake