We use the Elo rating system to calculate the relative performance of the models. Elo, originally devised as an improved chess-rating system, is a method for calculating the relative skill levels of players in zero-sum games. The difference between two models' ratings serves as a predictor of their relative performance: the larger the gap, the more often the higher-rated model is expected to win. You can view the voting data, basic analyses, and calculation procedure in this notebook. We will periodically release new leaderboards.
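To make the "rating difference as predictor" idea concrete, here is a minimal sketch of the standard online Elo update applied to pairwise votes. It is not necessarily the exact procedure in the notebook. The constants are assumptions borrowed from common chess conventions: a logistic scale of 400, a K-factor of 32, and a starting rating of 1000. The battle format `(model_a, model_b, winner)` and the model names are hypothetical.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Probability that A beats B under the logistic Elo model (scale=400 assumed)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def compute_elo(battles, k: float = 32.0, initial: float = 1000.0) -> dict:
    """Replay (model_a, model_b, winner) votes in order and return final ratings.

    winner is one of "model_a", "model_b", or "tie" (a hypothetical vote format).
    K-factor and initial rating are assumed values, not taken from the source.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in battles:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        s_a = {"model_a": 1.0, "model_b": 0.0, "tie": 0.5}[winner]
        delta = k * (s_a - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


if __name__ == "__main__":
    votes = [("model-x", "model-y", "model_a"), ("model-x", "model-y", "tie")]
    print(compute_elo(votes))
```

With equal starting ratings each model's expected score is 0.5, so a single win moves the winner up by K/2 = 16 points and the loser down by the same amount. Because this online update depends on the order in which votes are replayed, an analysis may shuffle or bootstrap the votes to check how stable the resulting rankings are.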
You can compare models’ relative performance for yourself, or add new models, here.