Thanks to the amazing success of AI, we’ve seen more and more organizations integrate Machine Learning into their pipelines. As access to data and data collection grow, massive datasets are being used to train giant deep learning models that reach superhuman performance. This has led to a lot of hype around domains like Data Science and Big Data, fueled even more by the recent boom in Large Language Models.
Big Tech companies (and Deep Learning Experts on Twitter/YouTube) have really fallen in love with the ‘add more data, increase model size, train for months’ approach that has become the status quo in Machine Learning these days. However, heretics from Meta AI published research (funded, no doubt, by Satan), and it turns out this way of doing things is extremely inefficient, and completely unnecessary. In this post, I will be going over their paper, Beyond neural scaling laws: beating power law scaling via data pruning, where they share evidence that selecting samples intelligently can increase your model’s performance without ballooning your costs out of control. While the paper focuses on Computer Vision, the principles behind their research will be interesting to you regardless of your specialization.
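To make the idea of data pruning concrete, here is a minimal sketch of score-based sample selection: rank training samples by some difficulty metric and keep only a fraction of them. The function name and the toy difficulty scores are hypothetical; in the paper the scores come from more sophisticated metrics (e.g. distances in a self-supervised embedding space), and whether you keep the hardest or the easiest samples depends on how much data you have.

```python
def prune_by_score(dataset, scores, keep_fraction=0.5, keep_hardest=True):
    """Keep only a fraction of samples, ranked by a per-sample score.

    `scores` is a hypothetical difficulty metric, e.g. from a small
    proxy model or an embedding-space distance.
    """
    ranked = sorted(zip(scores, dataset), reverse=keep_hardest)
    k = max(1, int(len(dataset) * keep_fraction))
    return [sample for _, sample in ranked[:k]]

# Toy example with made-up difficulty scores.
data = ["easy_1", "easy_2", "hard_1", "hard_2"]
difficulty = [0.1, 0.2, 0.9, 0.8]
pruned = prune_by_score(data, difficulty, keep_fraction=0.5)
print(pruned)  # keeps the two hardest samples
```

The interesting result is that a good pruning metric lets you train on far fewer samples than naive scaling would suggest, which is what lets you beat power-law scaling.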
The Large Language Model Landscape
The number of commercial and open LLM providers has exploded in the last 2 years, and there are now many options to choose from for all types of language tasks. And while the main way of interacting with LLMs is still via APIs and rudimentary Playgrounds, I expect that an ecosystem of tooling that helps accelerate their wide adoption will be a growing market in the near future.
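Since API access is still the dominant way of using these models, here is a minimal sketch of what a typical text-completion request looks like. The field names mirror common provider conventions, but the model name is a hypothetical placeholder and real providers differ in details (authentication headers, endpoint paths, parameter names).

```python
import json

def build_completion_request(prompt, model="example-model", max_tokens=64):
    """Build the JSON payload for a typical text-completion endpoint.

    The structure is representative of common LLM APIs; the model
    name here is a placeholder, not a real provider's identifier.
    """
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
    })

payload = build_completion_request("Classify the sentiment: 'great product!'")
print(payload)
```

Today, most tooling beyond these raw payloads amounts to thin wrappers and Playgrounds, which is exactly the gap the nascent ecosystem discussed below could fill.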
The TL;DR
- Large Language Model (LLM) functionality can be segmented into five areas: Knowledge Answering, Translation, Text Generation, Response Generation and Classification.
- Classification is arguably the most important to today’s enterprise needs, and text generation the most impressive and versatile.
- The main commercial, general-purpose offerings are Cohere, GooseAI, OpenAI and AI21labs. GooseAI currently only focuses on generation.
- The open-source offerings are Sphere, NLLB, Blender Bot, DialoGPT, GODEL and BLOOM.
- The tooling ecosystem is still in a nascent state with many areas of opportunity.
#nlp