LLM Training: RLHF and Its Alternatives

I frequently reference a process called Reinforcement Learning from Human Feedback (RLHF) when discussing LLMs, whether in research news or tutorials. RLHF is an integral part of the modern LLM training pipeline because it incorporates human preferences into the optimization landscape, which can improve the model’s helpfulness and safety.

In this article, I will break down RLHF in a step-by-step manner to provide a reference for understanding its central idea and importance. Following up on the previous Ahead of AI article that featured Llama 2, this article will also include a comparison between ChatGPT’s and Llama 2’s way of doing RLHF. — Read More
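
In the InstructGPT/ChatGPT recipe, one RLHF step trains a reward model on pairs of responses that human labelers ranked. A common objective for that step is a pairwise (Bradley–Terry style) ranking loss, -log(sigmoid(r_chosen - r_rejected)). Llama 2's reward model adds a margin term to this loss based on how strongly annotators preferred one response. The sketch below covers only this reward-modeling step, not the later PPO policy optimization. The function names are hypothetical, and real implementations compute the loss over batched tensors of reward-model outputs.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float, margin: float = 0.0) -> float:
    """Ranking loss -log(sigmoid(r_chosen - r_rejected - margin)).

    r_chosen / r_rejected: scalar reward-model scores for the response the human
    preferred and the one they rejected. margin: optional extra required gap
    (0.0 gives the plain pairwise loss).
    """
    d = r_chosen - r_rejected - margin
    # -log(sigmoid(d)) == softplus(-d); branch on the sign of d so exp() never overflows.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


def batch_loss(pairs: list[tuple[float, float]]) -> float:
    """Mean loss over (chosen_score, rejected_score) pairs."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # The loss is small when the preferred response already scores higher.
    print(pairwise_reward_loss(2.0, 0.5), pairwise_reward_loss(0.5, 2.0))
```

Minimizing this loss pushes the reward model to score human-preferred responses above rejected ones; the trained reward model then supplies the reward signal for the reinforcement-learning stage.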

#training

FlexiViT: One Model for All Patch Sizes

Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at this https URL. — Read More
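
To see why patch size sets the compute budget, the sketch below tokenizes an image with NumPy and draws a random patch size per training step, as the abstract describes. The 240x240 input and the list of patch sizes (every size from 8 to 48 that divides 240) are assumptions for illustration. A real FlexiViT model also resizes its patch-embedding and position-embedding weights to match each sampled size so that one set of weights serves all sizes; this sketch omits that part and only shows the tokenization.

```python
import numpy as np

# Assumption: patch sizes that evenly divide a 240x240 input, chosen for illustration.
PATCH_SIZES = [8, 10, 12, 15, 16, 20, 24, 30, 40, 48]


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into a (num_patches, patch*patch*C) token matrix."""
    h, w, c = image.shape
    if h % patch or w % patch:
        raise ValueError(f"patch size {patch} must divide image size {h}x{w}")
    x = image.reshape(h // patch, patch, w // patch, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (patch rows, patch cols, patch, patch, C)
    return x.reshape(-1, patch * patch * c)


def sample_training_tokens(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomized-patch-size step: draw a patch size, then tokenize with it."""
    patch = int(rng.choice(PATCH_SIZES))
    return patchify(image, patch)


if __name__ == "__main__":
    img = np.zeros((240, 240, 3), dtype=np.float32)
    for p in (8, 16, 48):
        # Larger patches -> fewer tokens -> cheaper self-attention.
        print(p, patchify(img, p).shape)
```

For a 240x240 RGB image, patch size 8 yields 900 tokens, 16 yields 225, and 48 yields only 25. Because self-attention cost grows quadratically with sequence length, that spread is exactly the speed/accuracy knob a single FlexiViT model exposes at deployment time.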

#image-recognition, #training

AI2 drops biggest open dataset yet for training language models

Language models like GPT-4 and Claude are powerful and useful, but the data on which they are trained is a closely guarded secret. The Allen Institute for AI (AI2) aims to reverse this trend with a new, huge text dataset that’s free to use and open to inspection.

Dolma, as the dataset is called, is intended to be the basis for the research group’s planned open language model, or OLMo (Dolma is short for “Data to feed OLMo’s Appetite”). As the model is intended to be free to use and modify by the AI research community, so too (argue AI2 researchers) should be the dataset they use to create it. — Read More

#training, #devops

Tips for Taking Advantage of Open Large Language Models

Prompting? Few-Shot? Fine-Tuning? Pretraining from scratch? Open LLMs mean more options for developers.

A growing number of large language models (LLMs) are open source, or close to it. The proliferation of models with relatively permissive licenses gives developers more options for building applications. — Read More

#training

Machine unlearning: The critical art of teaching AI to forget

Have you ever tried to intentionally forget something you had already learned? You can imagine how difficult it would be.

As it turns out, it’s also difficult for machine learning (ML) models to forget information. So what happens when these algorithms are trained on outdated, incorrect or private data?

Retraining the model from scratch every time an issue arises with the original dataset is hugely impractical. This has created the need for a new field in AI called machine unlearning. — Read More
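
Retraining without the offending data is the gold standard for "forgetting". For a few simple models it can be done exactly and cheaply. The sketch below, assuming a ridge-regularized linear regression (the class name is hypothetical), keeps the sufficient statistics X^T X and X^T y, so removing a record is a subtraction that gives the same weights as retraining on the remaining data. This is not a general unlearning method: deep networks have no such compact statistics, which is what makes machine unlearning a hard research problem.

```python
import numpy as np


class UnlearnableLeastSquares:
    """Ridge-regularized linear regression that stores X^T X and X^T y,
    so individual training points can be removed exactly."""

    def __init__(self, n_features: int, ridge: float = 1e-3):
        self.xtx = ridge * np.eye(n_features)
        self.xty = np.zeros(n_features)

    def learn(self, x: np.ndarray, y: float) -> None:
        self.xtx += np.outer(x, x)
        self.xty += y * x

    def unlearn(self, x: np.ndarray, y: float) -> None:
        # Subtract exactly what learn() added for this example.
        self.xtx -= np.outer(x, x)
        self.xty -= y * x

    def weights(self) -> np.ndarray:
        return np.linalg.solve(self.xtx, self.xty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

    model = UnlearnableLeastSquares(3)
    for xi, yi in zip(X, y):
        model.learn(xi, yi)
    model.unlearn(X[0], y[0])  # "forget" the first record

    retrained = UnlearnableLeastSquares(3)  # retrain from scratch without it
    for xi, yi in zip(X[1:], y[1:]):
        retrained.learn(xi, yi)
    print(np.allclose(model.weights(), retrained.weights()))
```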

#training

101.school

101.school is an experiment in creating AI-generated course content.

It works like this: you enter something you’re curious about, and then we generate a 13-week course on the subject.

You can choose to receive the course via email, or read it on the site. We’ll keep track of your progress. — Read More

#training

VeLO: Training Versatile Learned Optimizers by Scaling Up

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at this http URL. — Read More
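
The core idea, an optimizer that is itself a small neural network taking gradients in and producing parameter updates, can be sketched in a few lines. Everything below is a hypothetical illustration, not VeLO's actual architecture: a tiny per-parameter MLP reads two features (the raw gradient and a momentum average) and outputs an update. Its weights are random here, so its updates are not guaranteed to reduce a loss. In a real learned optimizer those weights are meta-trained across many tasks, which for VeLO took roughly four thousand TPU-months.

```python
import numpy as np


class TinyLearnedOptimizer:
    """Hypothetical learned optimizer: a small per-parameter MLP maps
    gradient features to an update. Illustrative only; not VeLO's design."""

    def __init__(self, hidden: int = 8, beta: float = 0.9, scale: float = 0.01, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Meta-parameters. Meta-training would tune these; here they are random.
        self.w1 = rng.normal(0.0, 0.5, size=(2, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.5, size=(hidden, 1))
        self.beta = beta
        self.scale = scale

    def init_state(self, params: np.ndarray) -> np.ndarray:
        return np.zeros_like(params)  # momentum buffer, one entry per parameter

    def step(self, params: np.ndarray, grads: np.ndarray, momentum: np.ndarray):
        momentum = self.beta * momentum + (1.0 - self.beta) * grads
        # Per-parameter input features: raw gradient and its running average.
        feats = np.stack([grads.ravel(), momentum.ravel()], axis=1)  # (n, 2)
        hidden = np.tanh(feats @ self.w1 + self.b1)                  # (n, hidden)
        update = (hidden @ self.w2).reshape(params.shape)            # back to params' shape
        return params - self.scale * update, momentum


if __name__ == "__main__":
    opt = TinyLearnedOptimizer()
    params = np.array([1.0, -2.0, 3.0])
    state = opt.init_state(params)
    grads = 2.0 * params  # gradient of sum(params**2)
    params, state = opt.step(params, grads, state)
    print(params)
```

Meta-training would wrap many inner loops of `step()` calls on different tasks and adjust `w1`, `b1`, and `w2` to minimize the losses those loops reach, which is where the "no hyperparameter tuning" behavior comes from.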

#training

The people paid to train AI are outsourcing their work… to AI

A significant proportion of people paid to train AI models may themselves be outsourcing that work to AI, a new study has found.

It takes an incredible amount of data to train AI systems to perform specific tasks accurately and reliably. Many companies pay gig workers on platforms like Mechanical Turk to complete tasks that are typically hard to automate, such as solving CAPTCHAs, labeling data and annotating text. This data is then fed into AI models to train them. The workers are poorly paid and are often expected to complete lots of tasks very quickly.

No wonder some of them may be turning to tools like ChatGPT to maximize their earning potential. But how many? — Read More

#strategy, #training

The Curse of Recursion: Training on Generated Data Makes Models Forget

Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, data collected about genuine human interactions with systems will be increasingly valuable in the presence of content generated by LLMs in data crawled from the Internet. — Read More
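
A toy version of the effect, assuming the "model" is a single 1-D Gaussian and each generation sees only 20 samples (both arbitrary choices for illustration, not the paper's experimental setup): fit the Gaussian, sample a fresh dataset from the fit, and train the next generation only on that synthetic data. Small estimation errors compound from one generation to the next, and the fitted standard deviation drifts toward zero, so rare values in the tails stop appearing first.

```python
import numpy as np


def simulate_model_collapse(generations: int = 500, n_samples: int = 20, seed: int = 0) -> list[float]:
    """Return the fitted standard deviation at each generation."""
    rng = np.random.default_rng(seed)
    data = rng.normal(0.0, 1.0, size=n_samples)  # generation 0: "human" data
    stds = []
    for _ in range(generations):
        mu, sigma = data.mean(), data.std()  # fit a Gaussian by maximum likelihood
        stds.append(float(sigma))
        # The next generation trains only on samples from the previous model.
        data = rng.normal(mu, sigma, size=n_samples)
    return stds


if __name__ == "__main__":
    stds = simulate_model_collapse()
    for g in (0, 10, 100, 499):
        print(g, stds[g])
```

Mixing a fixed share of the original data back into each generation slows this drift, which is one way to read the abstract's point that genuine human data becomes more valuable as synthetic content spreads.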

#training, #transfer-learning

A visual introduction to machine learning

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. These techniques can be used to make highly accurate predictions. This is a great interactive resource introducing machine learning and machine learning techniques. — Read More
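
One of the simplest statistical learning techniques is a decision stump: a decision tree with a single split. The sketch below learns a threshold on one numeric feature that best separates two classes, then uses it to predict. The feature values and labels are made up (think of something like a home's elevation), and placing the threshold at an observed value rather than midway between values is a simplification; the function names are hypothetical.

```python
def fit_stump(xs: list[float], ys: list[int]) -> tuple[float, int]:
    """Return (threshold, label_at_or_below) minimizing training errors for 0/1 labels."""
    best_errors, best = len(xs) + 1, (xs[0], 0)
    for t in sorted(set(xs)):
        for below in (0, 1):
            # Predict `below` for x <= t and the other label above t; count mistakes.
            errors = sum((below if x <= t else 1 - below) != y for x, y in zip(xs, ys))
            if errors < best_errors:
                best_errors, best = errors, (t, below)
    return best


def predict(stump: tuple[float, int], x: float) -> int:
    t, below = stump
    return below if x <= t else 1 - below


if __name__ == "__main__":
    xs = [5, 10, 15, 60, 80, 100]  # made-up feature values
    ys = [0, 0, 0, 1, 1, 1]        # made-up class labels
    stump = fit_stump(xs, ys)
    print(stump, predict(stump, 70))
```

A full decision tree repeats this search recursively on each side of the split, which is how it builds up the layered boundaries that interactive visualizations like the linked one animate.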

#machine-learning, #training