An ever-increasing number of companies are using large language models (LLMs) to transform both their product experiences and internal operations. These foundation models represent a new computing platform: prompt engineering is replacing aspects of software development, and the scope of what software can achieve is rapidly expanding.
To leverage LLMs effectively in production, confidence in how they perform is paramount. This is a unique challenge for most companies, given the inherent novelty and complexity of LLMs. Unlike with traditional software and non-generative machine learning (ML) models, evaluation is subjective and hard to automate, and the risk of the system going embarrassingly wrong is higher.
This post provides some thoughts on evaluating LLMs and discusses some emerging patterns I’ve seen work well in practice from experience with thousands of teams deploying LLM applications in production. — Read More
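One emerging pattern worth illustrating is using a second model as a grader, which turns a subjective judgment into something automatable. Below is a minimal sketch of such an LLM-as-judge loop; the rubric, the model name, and the grade_answer helper are all hypothetical, and any chat-completion API would work the same way.

```python
# Minimal LLM-as-judge evaluation loop (rubric, model name, and helper are
# hypothetical; any chat-completion API would work the same way).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = (
    "Score the ANSWER to the QUESTION from 1 (wrong or unhelpful) to "
    "5 (correct, complete, well grounded). Reply with a single digit."
)

def grade_answer(question: str, answer: str) -> int:
    """Ask a second model to grade an answer: subjective, but automatable."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable grader model works here
        temperature=0,
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nANSWER: {answer}"},
        ],
    )
    return int(response.choices[0].message.content.strip()[0])

# A tiny eval set; in practice these come from curated production traffic.
cases = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
scores = [grade_answer(q, a) for q, a in cases]
print(f"mean score: {sum(scores) / len(scores):.2f}")
```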
Hugging Face launches open source AI assistant maker to rival OpenAI’s custom GPTs
Hugging Face, the New York City-based startup that offers a popular, developer-focused repository for open source AI code and frameworks (and hosted last year’s “Woodstock of AI”), today announced the launch of third-party, customizable Hugging Chat Assistants.
The new, free product offering allows users of Hugging Chat, the startup’s open source alternative to OpenAI’s ChatGPT, to easily create their own customized AI chatbots with specific capabilities, similar in both functionality and intention to OpenAI’s custom GPT Builder — though that requires a paid subscription to ChatGPT Plus ($20 per month), Team ($25 per user per month, paid annually), or Enterprise (variable pricing depending on need). – Read More
Meta releases ‘Code Llama 70B’, an open-source behemoth to rival private AI development
Meta AI, the company that brought you Llama 2, the gargantuan language model that can generate anything from tweets to essays, has just released a new and improved version of its code generation model, Code Llama 70B. This updated model can write code in various programming languages, such as Python, C++, Java and PHP, from natural language prompts or existing code snippets. And it can do it faster, better and more accurately than ever before. – Read More
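For readers who want to try it, here is a minimal sketch of prompting a Code Llama checkpoint through the Hugging Face transformers library; the model ID follows Meta's published naming, the 70B weights need substantial GPU memory, and a smaller variant such as the 7B Instruct model runs identically.

```python
# Sketch: code generation from a natural-language prompt with Code Llama.
# Assumes the Hugging Face checkpoint name and enough GPU memory for 70B;
# "codellama/CodeLlama-7b-Instruct-hf" runs the same way on smaller hardware.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-70b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Write a Python function that reverses a singly linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```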
Answer AI: A new old kind of R&D lab
Jeremy Howard (founding CEO, previously co-founder of Kaggle and fast.ai) and Eric Ries (founding director, previously creator of Lean Startup and the Long-Term Stock Exchange) today launched Answer.AI, a new kind of AI R&D lab which creates practical end-user products based on foundational research breakthroughs. The creation of Answer.AI is supported by an investment of USD 10m from Decibel VC. Answer.AI will be a fully remote team of deep-tech generalists—the world’s very best, regardless of where they live, what school they went to, or any other meaningless surface feature. – Read More
Mixtral-8x7B
The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
For full details of this model, please read our release blog post. – Read More
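To make "sparse mixture of experts" concrete: each token's feed-forward computation is routed to a small subset of expert networks chosen by a learned gate, so only a fraction of the parameters are active per token. The PyTorch toy below sketches top-2 routing; the dimensions and layer shapes are illustrative and this is not Mixtral's actual implementation.

```python
# Toy sparse Mixture-of-Experts layer with top-2 routing (illustrative only;
# Mixtral uses 8 experts with top-2 gating, but this is not its real code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)  # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, d_model)
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)  # 2 experts per token
        weights = F.softmax(weights, dim=-1)                  # normalize their scores
        out = torch.zeros_like(x)
        for k in range(self.top_k):                           # only chosen experts run
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(10, 64)
print(SparseMoE()(tokens).shape)  # torch.Size([10, 64])
```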
Decoding LLMs: Creating Transformer Encoders and Multi-Head Attention Layers in Python from Scratch
Today, Computational Natural Language Processing (NLP) is a rapidly evolving endeavour in which the power of computation meets linguistics. The linguistic side of it is mainly attributed to the theory of Distributional Semantics by John Rupert Firth. He once said the following:
“You shall know a word by the company it keeps”
So, the semantic representation of a word is determined by the context in which it is used. It is precisely this assumption that gives the paper “Attention Is All You Need” by Ashish Vaswani et al. [1] its groundbreaking relevance. It established the transformer architecture as the core of many rapidly growing tools such as BERT, GPT-4, Llama, etc.
In this article, we examine the key mathematical operations at the heart of the encoder segment in the transformer architecture. — Read More
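As a taste of what the article builds, here is a compact NumPy sketch of scaled dot-product attention and a multi-head wrapper; it is a simplified illustration of the equations in [1], not the article's own code, and the weight initialization is arbitrary.

```python
# Minimal NumPy sketch of scaled dot-product and multi-head attention
# (a simplified illustration, not the article's own implementation).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(QK^T / sqrt(d_k)) V, the core equation from Vaswani et al."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores) @ V

def multi_head_attention(x, n_heads, rng=np.random.default_rng(0)):
    """Project x into n_heads subspaces, attend in each, then concatenate."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    heads = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (rng.normal(0, 0.02, (d_model, d_head)) for _ in range(3))
        heads.append(attention(x @ Wq, x @ Wk, x @ Wv))
    Wo = rng.normal(0, 0.02, (d_model, d_model))  # output projection
    return np.concatenate(heads, axis=-1) @ Wo

x = np.random.default_rng(1).normal(size=(5, 32))  # 5 tokens, d_model = 32
print(multi_head_attention(x, n_heads=4).shape)    # (5, 32)
```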
Domain Adaptation of a Large Language Model
Large language models (LLMs) like BERT are usually pre-trained on general-domain corpora like Wikipedia and BookCorpus. If we apply them to more specialized domains like medicine, there is often a drop in performance compared to models adapted to those domains.
In this article, we will explore how to adapt a pre-trained LLM like DeBERTa base to the medical domain using the Hugging Face Transformers library. Specifically, we will cover an effective technique called intermediate pre-training, where we further pre-train the LLM on data from our target domain. This adapts the model to the new domain and improves its performance.
This is a simple yet effective technique to tune LLMs to your domain and gain significant improvements in downstream task performance. — Read More
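A minimal sketch of what that intermediate pre-training step can look like with the Transformers Trainer API follows; the corpus file name and the hyperparameters are placeholders rather than the article's exact recipe.

```python
# Sketch: continued masked-LM pre-training of DeBERTa on in-domain text.
# "medical_corpus.txt" and the hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM, AutoTokenizer,
    DataCollatorForLanguageModeling, Trainer, TrainingArguments,
)

model_name = "microsoft/deberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Load the target-domain corpus and tokenize it.
dataset = load_dataset("text", data_files={"train": "medical_corpus.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="deberta-medical", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokenized,
    # Randomly mask 15% of tokens so the model keeps learning to fill gaps.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()  # then fine-tune the adapted checkpoint on the downstream task
```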
OpenChat: Advancing Open-source Language Models with Mixed-Quality Data
OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT – a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model. Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision. — Read More
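The core trick, as I understand it, is to treat the data source as a coarse reward signal: each example is conditioned on a quality tag and its loss is weighted by the source's assumed quality, so at inference time one can sample from the high-quality conditioned policy. A schematic sketch follows; the tags, weights, and helper are illustrative, not OpenChat's actual code.

```python
# Schematic of the C-RLFT idea: condition each example on a coarse quality tag
# and weight its loss by source quality (illustrative, not OpenChat's code).
QUALITY = {"expert": ("GPT4 Correct User:", 1.0),  # expert data: full weight
           "mixed": ("GPT3 User:", 0.1)}           # noisier data: down-weighted

def build_batch(examples):
    """examples: list of (source, prompt, response) triples."""
    batch = []
    for source, prompt, response in examples:
        tag, weight = QUALITY[source]
        # The tag tells the model which behavior it is cloning, so at
        # inference time we can condition on the high-quality tag.
        batch.append({"text": f"{tag} {prompt}\nAssistant: {response}",
                      "loss_weight": weight})
    return batch

for item in build_batch([("expert", "Explain MoE.", "A mixture of experts ..."),
                         ("mixed", "Explain MoE.", "It is a model ...")]):
    print(item["loss_weight"], item["text"][:40])
```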
Introducing EdgeLLama – An Open Standard for Decentralized AI
We, the GPU poor, have come up with a peer-to-peer network design to enable running Mistral7B and other models, which will make AI use more free, both as in beer and as in speech. We believe in e/acc, and we want to make AI abundant. This is the moment in time when we start taking back control from the few powerful AI companies.
Right now, our AI use is a function of expensive monthly subscriptions and of rate and usage limits imposed by datacenter-cloud AI companies. This gives them the power to decide what we can prompt with and how much AI we even have access to. The immense power they wield also imposes an emotional burden on them, and they are now appealing to the government to impose stifling regulations (a concept called “regulatory capture”; see @bgurley‘s talk).
Well, we, a bunch of AI and open-network aficionados, want to make their lives easier and take that power away from them. Think BitTorrent in the early 2000s, when you could make your own computer available and effortlessly share files with others in an open network. The advent of that technology, used by over 100 million people running nodes on their home computers, imposed a forcing function on entertainment business models in general. Better user experiences emerged, providing unlimited access to top-tier content for insanely low fees. — Read More