China’s biggest public AI drop since DeepSeek, Baidu’s open-source Ernie, is about to hit the market

On Monday, Chinese technology giant Baidu is making its Ernie generative AI large language model open source, a move that could be the Chinese tech sector’s biggest in the AI race since the emergence of DeepSeek. The open-sourcing of Ernie will be a gradual rollout, according to the company.

Will it be a shock to the market on the order of DeepSeek? That’s a question that divides AI experts. Some say Ernie’s release could cement China’s position as the undisputed AI leader. — Read More

#china-ai

Life of an inference request (vLLM V1): How LLMs are served efficiently at scale

vLLM is an open-source inference engine that serves large language models. We deploy multiple vLLM instances across GPUs and load open-weight models like Llama 4 into them. We then load-balance traffic across the instances, run health checks, and perform upgrades. Our customers consume our managed service by sending their prompts to our API endpoints, which route each prompt to the vLLM instance that will serve it.
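For illustration, a request to such an OpenAI-compatible endpoint can be sent with the standard OpenAI Python client. This is a minimal sketch under assumptions: the base URL, API key, and model name below are placeholders for a locally running vLLM server, not our managed service’s actual values.

```python
# Sketch: send a prompt to an OpenAI-compatible vLLM endpoint.
# base_url, api_key, and the model name are placeholder values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # hypothetical local vLLM server
    api_key="EMPTY",                      # vLLM accepts a dummy key by default
)

response = client.chat.completions.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",  # example open-weight model
    messages=[{"role": "user", "content": "Summarize KV caching in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```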

vLLM sits at the intersection of AI and systems programming, so we thought that diving into its details might interest some of our readers. In this blog post, we describe how an inference request travels through vLLM’s OpenAI-compatible API server and core engine. We also provide key code pointers.
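The same core engine can also be driven directly through vLLM’s offline Python API, which skips the HTTP server but follows the same request lifecycle. A minimal sketch, with an example model name rather than anything from the post:

```python
# Sketch: offline inference through vLLM's Python API (model name is an example).
# On releases where the V1 engine is not yet the default, it can be selected
# with the VLLM_USE_V1=1 environment variable.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["An inference request begins its life when"], params)
for out in outputs:
    print(out.outputs[0].text)
```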

We assume readers are already familiar with the transformer architecture and large language models. If you’re not, we highly recommend this video by OpenAI co-founder Andrej Karpathy. We will focus on the new V1 architecture of vLLM and how it achieves state-of-the-art text generation performance. If you’re looking for the V0 behavior or multi-modal inference, please refer to other vLLM documentation. — Read More

#performance

Efficient Federated Learning with Encrypted Data Sharing for Data-Heterogeneous Edge Devices

As privacy protection gains increasing importance, more models are being trained on edge devices and subsequently aggregated on a central server through Federated Learning (FL). However, current research overlooks the impact of network topology, physical distance, and data heterogeneity on edge devices, leading to issues such as increased latency and degraded model performance. To address these issues, we propose a new federated learning scheme for edge devices called Federated Learning with Encrypted Data Sharing (FedEDS). FedEDS uses the client model and the model’s stochastic layer to train a data encryptor. The data encryptor generates encrypted data and shares it with other clients. Each client then uses the corresponding peer’s stochastic layer and the encrypted data to train and adjust its local model, so that training draws on both the client’s local private data and the encrypted data shared by others. This approach accelerates the convergence of federated training and mitigates the negative impact of data heterogeneity, making it suitable for application services deployed on edge devices that require rapid convergence. Experimental results show the efficacy of FedEDS in improving model performance. — Read More
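To make the mechanism concrete, below is a hedged, PyTorch-style sketch of one FedEDS-like client round as we read the abstract. Every name here (StochasticLayer, client_round, the exact loss setup) is a hypothetical illustration, not the paper’s actual code.

```python
# Hypothetical sketch of a FedEDS-style client round, assuming PyTorch.
# All class and function names are our illustration of the abstract.
import torch
import torch.nn as nn

class StochasticLayer(nn.Module):
    """Adds learned noise so shared representations do not expose raw inputs."""
    def __init__(self, dim: int):
        super().__init__()
        self.mu = nn.Linear(dim, dim)
        self.log_sigma = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reparameterization-style sampling: output = mu(x) + sigma(x) * eps
        return self.mu(x) + torch.exp(self.log_sigma(x)) * torch.randn_like(x)

def client_round(model, peer_stochastic_layer, private_loader,
                 shared_encrypted, optimizer):
    """Train on local private data plus encrypted data shared by a peer."""
    loss_fn = nn.CrossEntropyLoss()
    for (x, y), (enc_x, enc_y) in zip(private_loader, shared_encrypted):
        optimizer.zero_grad()
        # The local private batch trains the model directly.
        loss = loss_fn(model(x), y)
        # The encrypted shared batch passes through the peer's stochastic
        # layer, mitigating data heterogeneity without exposing peer data.
        loss = loss + loss_fn(model(peer_stochastic_layer(enc_x)), enc_y)
        loss.backward()
        optimizer.step()
```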

#federated-learning