LLM Training: RLHF and Its Alternatives

I frequently reference a process called Reinforcement Learning from Human Feedback (RLHF) when discussing LLMs, whether in research news or tutorials. RLHF is an integral part of the modern LLM training pipeline due to its ability to incorporate human preferences into the optimization landscape, which can improve the model’s helpfulness and safety.
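At the heart of RLHF is a reward model trained on human preference pairs. A minimal sketch of the pairwise (Bradley-Terry) loss commonly used for this step — the scalar rewards here are hypothetical stand-ins for a reward model's outputs:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss for training an RLHF reward model.

    The reward model should score the human-preferred response higher
    than the rejected one; the loss is -log(sigmoid(r_chosen - r_rejected)),
    so it shrinks as the margin in favor of the preferred response grows.
    """
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A larger margin in favor of the preferred response means a smaller loss.
low_loss = preference_loss(2.0, -1.0)   # reward model agrees with the human
high_loss = preference_loss(-1.0, 2.0)  # reward model disagrees
```

In a full pipeline this loss trains the reward model, whose scores then guide the policy-optimization stage (e.g., PPO) that fine-tunes the LLM itself.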

In this article, I will break down RLHF in a step-by-step manner to provide a reference for understanding its central idea and importance. Following up on the previous Ahead of AI article that featured Llama 2, this article will also include a comparison of how ChatGPT and Llama 2 implement RLHF. — Read More

#training

The AI ‘Race’: China vs. the US with Jeffrey Ding and Karen Hao

In the debate over slowing down AI, we often hear the same argument against regulation. “What about China? We can’t let China get ahead.” To dig into the nuances of this argument, Tristan and Aza speak with academic researcher Jeffrey Ding and journalist Karen Hao, who take us through what’s really happening in Chinese AI development. They address China’s advantages and limitations, what risks are overblown, and what, in this multi-national competition, is at stake as we imagine the best possible future for everyone. — Read More

#china-vs-us, #podcasts

The Novel Written about—and with—Artificial Intelligence

THREE DISTINCT personalities, all female, walk into a bar together in Do You Remember Being Born? and emerge with fat paycheques, a collaborative long poem slyly titled “Self-portrait,” and a lot of nagging doubt. Actually, the proverbial bar in Sean Michaels’s dizzying new novel is not a bar but the Mind Studio, an entry-by-key-card-and-retina-scan-only room on an unnamed tech giant’s San Francisco campus. And one of the three personalities, a “2.5-trillion-parameter neural network” named Charlotte, is better described as feminine than female. But the doubt, tucked under a lot of surface-level optimism, is real, instilled in characters and readers alike by the author. — Read More

#human

Meta’s VR technology is helping to train surgeons and treat patients, though costs remain a hurdle

Just days before assisting in his first major shoulder-replacement surgery last year, Dr. Jake Shine strapped on a virtual reality headset and got to work.

As a third-year orthopedics resident at Kettering Health Dayton in Ohio, Shine was standing in the medical center’s designated VR lab with his attending physician, who would oversee the procedure. 

Both doctors were wearing Meta Quest 2 headsets as they walked through a 3D simulation of the surgery. 

… Ultimately, there were no complications in the procedure and the patient made a full recovery.

While consumer VR remains a niche product and a massive money-burning venture for Meta CEO Mark Zuckerberg, the technology is proving to be valuable in certain corners of health care.  — Read More

#metaverse

LLMs and Tool Use

Last March, just two weeks after GPT-4 was released, researchers at Microsoft quietly announced a plan to compile millions of APIs—tools that can do everything from ordering a pizza to solving physics equations to controlling the TV in your living room—into a compendium that would be made accessible to large language models (LLMs). This was just one milestone in the race across industry and academia to find the best ways to teach LLMs how to manipulate tools, which would supercharge the potential of AI more than any of the impressive advancements we’ve seen to date.

The Microsoft project aims to teach AI how to use any and all digital tools in one fell swoop, a clever and efficient approach. Today, LLMs can do a pretty good job of recommending pizza toppings to you if you describe your dietary preferences and can draft dialog that you could use when you call the restaurant. But most AI tools can’t place the order, not even online. In contrast, Google’s seven-year-old Assistant tool can synthesize a voice on the telephone and fill out an online order form, but it can’t pick a restaurant or guess your order. By combining these capabilities, though, a tool-using AI could do it all. An LLM with access to your past conversations and tools like calorie calculators, a restaurant menu database, and your digital payment wallet could feasibly judge that you are trying to lose weight and want a low-calorie option, find the nearest restaurant with toppings you like, and place the delivery order. If it has access to your payment history, it could even guess at how generously you usually tip. If it has access to the sensors on your smartwatch or fitness tracker, it might be able to sense when your blood sugar is low and order the pie before you even realize you’re hungry.
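The tool-use pattern described above — an LLM deciding which external function to call and with what arguments — is typically implemented as a dispatch loop around structured model output. A minimal sketch; the tool names and the hard-coded model reply are invented for illustration:

```python
import json

# Hypothetical tool registry: plain Python functions the model may invoke.
def find_restaurant(topping: str) -> str:
    return f"Luigi's (serves {topping})"

def place_order(restaurant: str, item: str) -> str:
    return f"Order for {item} placed at {restaurant}"

TOOLS = {"find_restaurant": find_restaurant, "place_order": place_order}

def dispatch(model_reply: str) -> str:
    """Parse a structured tool call emitted by the model and execute it."""
    call = json.loads(model_reply)
    tool = TOOLS[call["tool"]]          # look up the requested tool
    return tool(**call["arguments"])    # run it with the model's arguments

# In a real system the reply would come from the LLM; here it is hard-coded.
reply = '{"tool": "find_restaurant", "arguments": {"topping": "mushroom"}}'
result = dispatch(reply)  # the result would be fed back to the model
```

In production systems the loop repeats: the tool's result is appended to the conversation, and the model decides whether to call another tool or answer the user.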

Perhaps the most compelling potential applications of tool use are those that give AIs the ability to improve themselves.  — Read More

#big7, #multi-modal

State of GPT | BRK216HFS

— Read More

#videos

FLM-101B: An Open LLM and How to Train It with $100K Budget

Large language models (LLMs) have achieved remarkable success in NLP and multimodal tasks. Despite these successes, their development faces two main challenges: (i) high computational cost; and (ii) difficulty in conducting fair and objective evaluations. LLMs are prohibitively expensive, making it feasible for only a few major players to undertake their training, thereby constraining both research and application opportunities. This underscores the importance of cost-effective LLM training. In this paper, we utilize a growth strategy to significantly reduce LLM training cost. We demonstrate that an LLM with 101B parameters and 0.31T tokens can be trained on a $100K budget. We also adopt a systematic evaluation paradigm for the IQ evaluation of LLMs, complementing existing evaluations that focus more on knowledge-oriented abilities. We introduce our benchmark, which includes evaluations on important aspects of intelligence including symbolic mapping, rule understanding, pattern mining, and anti-interference. Such evaluations minimize the potential impact of memorization. Experimental results show that our model FLM-101B, trained with a budget of $100K, achieves comparable performance to powerful and well-known models, e.g., GPT-3 and GLM-130B, especially in the IQ benchmark evaluations with contexts unseen in training data. The checkpoint of FLM-101B will be open-sourced at this https URL. — Read More
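The "growth strategy" the abstract names generally means training a small model first and expanding it into a larger one without losing what it has learned, via function-preserving expansion of weight matrices (Net2Net-style). A minimal sketch of widening one hidden layer — the sizes are made up, and FLM-101B's actual growth schedule differs:

```python
def widen_hidden_layer(w_in, w_out, unit):
    """Function-preserving widening (Net2Net-style): duplicate one hidden unit.

    Incoming weights are copied, so the new unit has the same activation as
    the original; the outgoing weight is halved and shared between the two,
    so the layer's output is unchanged.

    w_in:  one row of incoming weights per hidden unit
    w_out: one row of outgoing weights per output unit, indexed [output][hidden]
    """
    new_w_in = w_in + [list(w_in[unit])]          # copy incoming weights
    new_w_out = []
    for row in w_out:
        halved = list(row)
        halved[unit] = row[unit] / 2.0            # split the contribution
        new_w_out.append(halved + [row[unit] / 2.0])
    return new_w_in, new_w_out

# Tiny example: 2 inputs -> 2 hidden -> 1 output; widen hidden unit 0.
w_in = [[1.0, 2.0], [3.0, 4.0]]
w_out = [[0.5, 0.25]]
w_in2, w_out2 = widen_hidden_layer(w_in, w_out, unit=0)
```

Because the expanded network computes exactly the same function as the small one, training can resume on the larger model without a loss spike — the source of the cost savings the paper claims.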

#nlp, #strategy