The Guide To LLM Evals: How To Build and Benchmark Your Evals

How to build and run LLM evals — and why you should use precision and recall when benchmarking your LLM prompt template

Large language models (LLMs) are an incredible tool for developers and business leaders to create new value for consumers. They make personal recommendations, translate between unstructured and structured data, summarize large amounts of information, and do so much more.

As these applications multiply, so does the importance of measuring their performance. This is a nontrivial problem for several reasons: user feedback or any other “source of truth” is extremely limited and often nonexistent; even where it is possible, human labeling is still expensive; and these applications can easily grow complex.

This complexity is often hidden by the abstraction layers of code and only becomes apparent when things go wrong. One line of code can initiate a cascade of calls (spans). Different evaluations are required for each span, thus multiplying your problems. For example, the simple code snippet below triggers multiple sub-LLM calls. — Read More
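To make the span idea concrete, here is a minimal sketch (with hypothetical function and span names, not the snippet from the article) of how a single top-level call can quietly fan out into several sub-LLM calls, each a span that may need its own evaluation:

```python
trace = []  # collected span names, in call order

def llm(prompt, span):
    """Stand-in for an LLM call; records which span it ran under."""
    trace.append(span)
    return f"answer to: {prompt}"

def summarize_docs(docs):
    # one sub-LLM call per document, plus a final reduce call
    partials = [llm(d, span="map") for d in docs]
    return llm(" ".join(partials), span="reduce")

def answer(question, docs):
    # the one line of code a caller sees...
    summary = summarize_docs(docs)  # ...hides all of these spans
    return llm(f"{question} given {summary}", span="final")

answer("what changed?", ["doc1", "doc2", "doc3"])
print(trace)  # → ['map', 'map', 'map', 'reduce', 'final']
```

One visible call produced five spans; an eval that only scores the final answer says nothing about whether the "map" or "reduce" steps went wrong.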

#accuracy, #devops

Non-engineers guide: Train a LLaMA 2 chatbot

In this tutorial we will show you how anyone can build their own open-source ChatGPT without ever writing a single line of code! We’ll use the LLaMA 2 base model, fine-tune it for chat with an open-source instruction dataset, and then deploy the model to a chat app you can share with your friends. All by just clicking our way to greatness.

Why is this important? Well, machine learning, especially LLMs (Large Language Models), has witnessed an unprecedented surge in popularity, becoming a critical tool in our personal and business lives. Yet, for most outside the specialized niche of ML engineering, the intricacies of training and deploying these models appear beyond reach. If the anticipated future of machine learning is to be one filled with ubiquitous personalized models, there’s an impending challenge: how do we empower those with non-technical backgrounds to harness this technology independently? — Read More

#devops

LLMs Are Not All You Need

Large Language Models (LLMs) are powering the next big wave of innovation in technology. As with the internet, smartphones, and the cloud, generative AI is poised to change the fabric of our society.

GenAI tools like GitHub Copilot have been supercharging the productivity of developers worldwide since 2021.  … The way we work is soon to shift. Goldman Sachs expects GenAI to raise global GDP by 7% in the next ten years. …LLMs alone are good, but not 7% of global GDP good. We need the ecosystem built around LLMs to make the most of them. — Read More

#devops

Spread Your Wings: Falcon 180B is here

Today, we’re excited to welcome TII’s Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII’s RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

In terms of capabilities, Falcon 180B achieves state-of-the-art results across natural language tasks. It tops the leaderboard for (pre-trained) open-access models and rivals proprietary models like PaLM-2. — Read More

#chatbots, #devops, #nlp

Why “AI” can’t succeed without APIs

Mega tech trends like the cloud, the mobile phone era, the metaverse, and now AI all depend on enabling technologies sitting right beneath the surface, hidden from nearly everyone’s view. Their structural integrity depends on the flawless operation of those enabling technologies, which in many cases are Application Programming Interfaces (APIs). As such, their success depends on API adoption. Nowhere is this truer than in the rapid proliferation of AI technologies, like generative AI, which require a simple, easy-to-use interface that gives everyone access to the technology. The secret here is that these AI tools are just thin UIs on top of APIs that connect into the highly complex and intensive work of a large language model (LLM).

It’s important to remember that AI models don’t think for themselves; they only appear to, so that we can interact with them in a familiar way. APIs essentially act as translators for AI platforms, since they are relatively straightforward, highly structured, and standardized at a technological level. What most people think of as “AI” should be viewed through the lens of an API product; with that mindset, organizations can best prepare for what use cases are possible and how to ensure their workforces have the skills to put them into action. — Read More

#devops

Machine Learning Libraries For Any Project

There are many libraries that can be used in machine learning projects. Some of them, of course, have gained considerable reputations over the years. Such libraries are the go-to picks for anyone starting a new project that utilizes machine learning algorithms. However, choosing the correct set (or stack) can be quite challenging.

In this post, I would like to give you a general overview of the machine learning libraries landscape and share some of my thoughts about working with them.  — Read More

#devops

Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities

We [Alibaba] introduce the Qwen-VL series, a set of large-scale vision-language models designed to perceive and understand both text and images. Comprising Qwen-VL and Qwen-VL-Chat, these models exhibit remarkable performance in tasks like image captioning, question answering, visual localization, and flexible interaction. The evaluation covers a wide range of tasks, including zero-shot captioning, visual and document-based visual question answering, and grounding. We demonstrate that Qwen-VL outperforms existing Large Vision-Language Models (LVLMs). We present their architecture, training, capabilities, and performance, highlighting their contributions to advancing multimodal artificial intelligence. Code, demo and models are available at https://github.com/QwenLM/Qwen-VL. — Read More

#china-ai, #devops

AIColor: Colorize your old Photos with the power of AI

If you’re looking to colorize old black and white photos, our AI photo colorizer can help you bring your memories to life. — Read More

#devops

Introducing Code Llama, a state-of-the-art large language model for coding

Today, we are releasing Code Llama, a large language model (LLM) that can use text prompts to generate code. Code Llama is state-of-the-art among publicly available LLMs on code tasks, and has the potential to make workflows faster and more efficient for current developers while lowering the barrier to entry for people who are learning to code. It can also serve as a productivity and educational tool, helping programmers write more robust, well-documented software. — Read More

Read the paper

Access the code

#devops

LLMStack

LLMStack is a no-code platform for building generative AI applications, chatbots, agents and connecting them to your data and business processes.

Build tailor-made generative AI applications, chatbots and agents that cater to your unique needs by chaining multiple LLMs. Seamlessly integrate your own data and GPT-powered models without any coding experience using LLMStack’s no-code builder. Trigger your AI chains from Slack or Discord. Deploy to the cloud or on-premise. — Read More

#devops