Class Imbalance: Random Sampling and Data Augmentation with Imbalanced-Learn

An exploration of the class imbalance problem, the accuracy paradox and some techniques to solve this problem by using the Imbalanced-Learn library.

One of the challenges that arise when developing machine learning models for classification is class imbalance. Most of the machine learning algorithms for classification were developed assuming balanced classes however, in real life it is not common to have properly balanced data. Due to this, various alternatives have been proposed to address this problem as well as tools to apply these solutions. Such is the case imbalanced-learn [1], a python library that implements the most relevant algorithms to tackle the problem of class imbalance.

In this blog we are going to see what class imbalance is, the problem of implementing accuracy as a metric for unbalanced classes, what random under-sampling and random over-sampling is and imbalanced-learn as an alternative tool to address the class imbalance problem in an appropriate way. Read More

#machine-learning