Optimum Transformers: how to save over $20k a year on NLP

In this tutorial we are going to check whether it is possible to speed up NLP models more than 10x, reach the ~1 ms latency advertised by Hugging Face Infinity, and save over $20k a year.

Spoiler: yes, it is possible, and with the help of this article it is easy to reproduce and adapt to your REAL projects.

And for those who are too lazy to read all this and just want everything out of the box: https://github.com/AlekseyKorshuk/optimum-transformers.

#nlp