Mixtral-8x7B

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.

For full details of this model, please read our release blog post. — Read More

#devops

Decoding LLMs: Creating Transformer Encoders and Multi-Head Attention Layers in Python from Scratch

Today, Computational Natural Language Processing (NLP) is a rapidly evolving endeavour in which the power of computation meets linguistics. The linguistic side of it is mainly attributed to the theory of Distributional Semantics by John Rupert Firth. He once said the following:

“You shall know a word by the company it keeps”

So, the semantic representation of a word is determined by the context in which it is used. It is precisely on this assumption that the paper “Attention Is All You Need” by Ashish Vaswani et al. [1] builds its groundbreaking relevance. It established the transformer architecture as the core of many rapidly growing tools such as BERT, GPT-4, Llama, etc.

In this article, we examine the key mathematical operations at the heart of the encoder segment in the transformer architecture. — Read More
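
As a rough illustration of the kind of code the article builds up to (not its exact implementation), here is a NumPy-only sketch of scaled dot-product attention extended to multiple heads; the toy shapes and random weights at the end are assumptions for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the chosen axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads):
    # Project the input into queries, keys, and values, split the model
    # dimension into n_heads subspaces, apply
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V in each head,
    # then concatenate the heads and project back with W_o.
    batch, seq, d_model = X.shape
    d_head = d_model // n_heads

    def split_heads(M):
        return M.reshape(batch, seq, n_heads, d_head).transpose(0, 2, 1, 3)

    Q, K, V = split_heads(X @ W_q), split_heads(X @ W_k), split_heads(X @ W_v)
    scores = Q @ K.transpose(0, 1, 3, 2) / np.sqrt(d_head)  # (batch, heads, seq, seq)
    heads = softmax(scores) @ V                              # (batch, heads, seq, d_head)
    concat = heads.transpose(0, 2, 1, 3).reshape(batch, seq, d_model)
    return concat @ W_o

# Toy usage: 2 sequences of 5 tokens, model dimension 8, 2 heads.
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 5, 8))
W_q, W_k, W_v, W_o = (rng.normal(size=(8, 8)) for _ in range(4))
print(multi_head_attention(X, W_q, W_k, W_v, W_o, n_heads=2).shape)  # (2, 5, 8)
```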

#nlp, #devops

Domain Adaptation of A Large Language Model

Large language models (LLMs) like BERT are usually pre-trained on general-domain corpora like Wikipedia and BookCorpus. If we apply them to more specialized domains such as medicine, there is often a drop in performance compared to models adapted to those domains.

In this article, we will explore how to adapt a pre-trained LLM like DeBERTa base to the medical domain using the HuggingFace Transformers library. Specifically, we will cover an effective technique called intermediate pre-training, where we continue pre-training the LLM on data from our target domain. This adapts the model to the new domain and improves its performance.

This is a simple yet effective technique to tune LLMs to your domain and gain significant improvements in downstream task performance. — Read More
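
As a loose sketch of what intermediate pre-training can look like with the HuggingFace Transformers library (the corpus file name, hyperparameters, and output directory below are placeholders rather than the article's exact recipe):

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "microsoft/deberta-base"  # general-domain checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical in-domain corpus: a text file of medical abstracts, one per line.
raw = load_dataset("text", data_files={"train": "medical_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# The MLM collator masks 15% of tokens on the fly, the same objective used in pre-training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="deberta-base-medical",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=5e-5,
    save_strategy="epoch",
)

Trainer(model=model, args=args, train_dataset=tokenized["train"],
        data_collator=collator).train()

# The adapted checkpoint in `deberta-base-medical` can then be fine-tuned on the
# downstream medical task exactly like any other pre-trained model.
```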

#devops

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT – a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model. Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision. — Read More

#devops

Introducing EdgeLLama – An Open Standard for Decentralized AI

We, the GPU poor, have come up with a peer-to-peer network design for running Mistral 7B and other models, one that will make AI use more free, both as in beer and as in speech. We believe in e/acc, and we want to make AI abundant. This is the moment in time when we start taking back control from the few powerful AI companies.

Right now, our AI use is a function of expensive monthly subscriptions and of rate and usage limits imposed by datacenter-cloud-run AI companies. This gives them the power to decide what we can prompt with and how much AI we even have access to. The immense power they wield also imposes an emotional burden on them, and they are now appealing to the government to impose stifling regulations (a concept called “regulatory capture”; see @bgurley’s talk).

Well, we, a bunch of AI and open-network aficionados, want to make their lives easier and take that power away from them. Think BitTorrent in the early 2000s, when you could make your own computer available and effortlessly share files with others in an open network. The advent of that technology, used by over 100 million people running nodes on their home computers, imposed a forcing function on entertainment business models in general. Better user experiences emerged, providing unlimited access to top-tier content for insanely low fees. — Read More

#devops

LLAVA: The AI That Microsoft Didn’t Want You to Know About!

Read More

#devops, #videos

The Guide To LLM Evals: How To Build and Benchmark Your Evals

How to build and run LLM evals — and why you should use precision and recall when benchmarking your LLM prompt template

Large language models (LLMs) are an incredible tool for developers and business leaders to create new value for consumers. They make personal recommendations, translate between unstructured and structured data, summarize large amounts of information, and do so much more.

As the applications multiply, so does the importance of measuring the performance of LLM-based applications. This is a nontrivial problem for several reasons: user feedback or any other “source of truth” is extremely limited and often nonexistent; even when possible, human labeling is still expensive; and it is easy to make these applications complex.

This complexity is often hidden by the abstraction layers of code and only becomes apparent when things go wrong. One line of code can initiate a cascade of calls (spans). Different evaluations are required for each span, thus multiplying your problems. For example, the simple code snippet below triggers multiple sub-LLM calls. — Read More
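
To make the benchmarking idea concrete, here is a minimal, made-up sketch of scoring an LLM eval (for example, a hallucination judge) against a labeled golden dataset with precision and recall; the labels and predictions are illustrative stand-ins for the outputs of your prompt template.

```python
from sklearn.metrics import precision_score, recall_score

# Golden dataset: ground truth for eight answers, 1 = hallucinated, 0 = factual.
y_true = [1, 0, 1, 0, 0, 1, 0, 1]

# What the LLM judge (your eval prompt template) predicted for the same answers.
y_pred = [1, 0, 0, 0, 1, 1, 0, 1]

# Precision: of the answers the eval flagged, how many were truly hallucinated?
# Recall: of the truly hallucinated answers, how many did the eval catch?
print("precision:", precision_score(y_true, y_pred))  # 3/4 = 0.75
print("recall:", recall_score(y_true, y_pred))        # 3/4 = 0.75
```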

#accuracy, #devops

Non-engineers guide: Train a LLaMA 2 chatbot

In this tutorial we will show you how anyone can build their own open-source ChatGPT without ever writing a single line of code! We’ll use the LLaMA 2 base model, fine-tune it for chat with an open-source instruction dataset, and then deploy the model to a chat app you can share with your friends. All by just clicking our way to greatness.

Why is this important? Well, machine learning, especially LLMs (Large Language Models), has witnessed an unprecedented surge in popularity, becoming a critical tool in our personal and business lives. Yet, for most outside the specialized niche of ML engineering, the intricacies of training and deploying these models appear beyond reach. If the anticipated future of machine learning is to be one filled with ubiquitous personalized models, then there’s an impending challenge ahead: How do we empower those with non-technical backgrounds to harness this technology independently? — Read More

#devops

LLMs Are Not All You Need

Large Language Models (LLMs) are powering the next big wave of innovation in technology. As with the internet, smartphones, and the cloud, generative AI is poised to change the fabric of our society.

GenAI tools like GitHub Copilot have been supercharging the productivity of developers worldwide since 2021. … The way we work is soon to shift. Goldman Sachs expects GenAI to raise global GDP by 7% in the next ten years. … LLMs alone are good, but not 7% of global GDP good. We need the ecosystem built around LLMs to make the most of them. — Read More

#devops

Spread Your Wings: Falcon 180B is here

Today, we’re excited to welcome TII’s Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII’s RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

In terms of capabilities, Falcon 180B achieves state-of-the-art results across natural language tasks. It tops the leaderboard for (pre-trained) open-access models and rivals proprietary models like PaLM-2. — Read More
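
For illustration only, loading the base checkpoint from the Hub with the Transformers library might look like the sketch below; the Hub ID tiiuae/falcon-180B is assumed, access is gated, and the weights need hundreds of gigabytes of accelerator memory in bfloat16, so this is not something to run casually.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-180B"  # assumed Hub ID for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the weights across the available GPUs
)

inputs = tokenizer("The falcon is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```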

#chatbots, #devops, #nlp