Elon Musk says xAI will open source Grok this week

Elon Musk’s AI startup xAI will open source Grok, its chatbot rivaling ChatGPT, this week, the entrepreneur said, days after suing OpenAI and complaining that the Microsoft-backed startup had deviated from its open source roots.

xAI released Grok last year, arming it with features including access to “real-time” information and views undeterred by “politically correct” norms. The service is available to customers paying for X’s $16 monthly subscription. — Read More


#devops

Gemma: Introducing new state-of-the-art open models

At Google, we believe in making AI helpful for everyone. We have a long history of contributing innovations to the open community, such as with Transformers, TensorFlow, BERT, T5, JAX, AlphaFold, and AlphaCode. Today, we’re excited to introduce a new generation of open models from Google to assist developers and researchers in building AI responsibly.

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models. Developed by Google DeepMind and other teams across Google, Gemma is inspired by Gemini, and the name reflects the Latin gemma, meaning “precious stone.” Accompanying our model weights, we’re also releasing tools to support developer innovation, foster collaboration, and guide responsible use of Gemma models. — Read More

#devops

Evaluating LLM Applications

An ever-increasing number of companies are using large language models (LLMs) to transform both their product experiences and internal operations. These kinds of foundation models represent a new computing platform. The process of prompt engineering is replacing aspects of software development and the scope of what software can achieve is rapidly expanding.

In order to effectively leverage LLMs in production, having confidence in how they perform is paramount. This represents a unique challenge for most companies given the inherent novelty and complexities surrounding LLMs. Unlike with traditional software and non-generative machine learning (ML) models, evaluation is subjective, hard to automate, and the risk of the system going embarrassingly wrong is higher.
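One pattern that makes parts of this tractable is replacing subjective judgment with narrow, programmatic checks wherever possible. Below is a minimal sketch of such an assertion-style harness; the `fake_llm` function is a hypothetical stand-in for a real model call, and the case list and check names are illustrative, not from any particular library.

```python
import json

def fake_llm(prompt: str) -> str:
    # Hypothetical deterministic stand-in so the harness is runnable;
    # in practice you would swap in your provider's API call here.
    if "JSON" in prompt:
        return json.dumps({"sentiment": "positive", "confidence": 0.9})
    return "The capital of France is Paris."

def check_json_schema(output: str, required_keys) -> bool:
    # Pass only if the output parses as JSON and has every required key.
    try:
        data = json.loads(output)
    except json.JSONDecodeError:
        return False
    return all(k in data for k in required_keys)

def check_contains(output: str, needle: str) -> bool:
    # A crude but fully automatable content check.
    return needle.lower() in output.lower()

# Each case pairs a prompt with a programmatic check, turning one slice
# of "subjective" evaluation into something a CI job can run.
cases = [
    ("Classify the review as JSON with sentiment and confidence.",
     lambda o: check_json_schema(o, ["sentiment", "confidence"])),
    ("What is the capital of France?",
     lambda o: check_contains(o, "paris")),
]

results = [check(fake_llm(prompt)) for prompt, check in cases]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # pass rate: 100%
```

Checks like these don't replace human or model-graded evaluation, but they catch the embarrassing failures (malformed JSON, missing facts) cheaply and deterministically.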

This post provides some thoughts on evaluating LLMs and discusses some emerging patterns I’ve seen work well in practice from experience with thousands of teams deploying LLM applications in production. — Read More

#devops

Hugging Face launches open source AI assistant maker to rival OpenAI’s custom GPTs

Hugging Face, the New York City-based startup that offers a popular, developer-focused repository for open source AI code and frameworks (and hosted last year’s “Woodstock of AI”), today announced the launch of third-party, customizable Hugging Chat Assistants.

The new, free product offering allows users of Hugging Chat, the startup’s open source alternative to OpenAI’s ChatGPT, to easily create their own customized AI chatbots with specific capabilities, similar both in functionality and intention to OpenAI’s custom GPT Builder — though that requires a paid subscription to ChatGPT Plus ($20 per month), Team ($25 per user per month paid annually), or Enterprise (variable pricing depending on the needs). — Read More

#devops

Meta releases ‘Code Llama 70B’, an open-source behemoth to rival private AI development

Meta AI, the division of Meta that brought you Llama 2, the gargantuan language model that can generate anything from tweets to essays, has just released a new and improved version of its code generation model, Code Llama 70B. This updated model can write code in various programming languages, such as Python, C++, Java and PHP, from natural language prompts or existing code snippets. And it can do it faster, better and more accurately than ever before. — Read More

#devops

Answer AI: A new old kind of R&D lab

Answer.AI is a new kind of AI R&D lab which creates practical end-user products based on foundational research breakthroughs.

Jeremy Howard (founding CEO, previously co-founder of Kaggle and fast.ai) and Eric Ries (founding director, previously creator of Lean Startup and the Long-Term Stock Exchange) today launched Answer.AI, a new kind of AI R&D lab which creates practical end-user products based on foundational research breakthroughs. The creation of Answer.AI is supported by an investment of USD10m from Decibel VC. Answer.AI will be a fully-remote team of deep-tech generalists — the world’s very best, regardless of where they live, what school they went to, or any other meaningless surface feature. — Read More

#strategy

#devops

Mixtral-8x7B

The Mixtral-8x7B Large Language Model (LLM) is a pretrained generative Sparse Mixture of Experts. Mixtral-8x7B outperforms Llama 2 70B on most benchmarks we tested.
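The core idea of a sparse Mixture of Experts is that a small router picks a few expert networks per token, so only a fraction of the parameters run on each forward pass. Below is a minimal NumPy sketch of top-2 routing in that spirit; the toy linear "experts", the gating matrix, and all dimensions are hypothetical stand-ins, not Mixtral's actual architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_W, experts, top_k=2):
    """Route each token to its top_k experts and mix their outputs.

    x: (num_tokens, d); gate_W: (d, num_experts);
    experts: list of callables, each mapping (d,) -> (d,).
    """
    logits = x @ gate_W                          # router score per expert
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(logits[t])[-top_k:]     # indices of the top_k experts
        weights = softmax(logits[t][top])        # renormalize over chosen experts
        for w, e in zip(weights, top):
            out[t] += w * experts[e](x[t])       # only top_k experts execute
    return out

rng = np.random.default_rng(1)
d, num_experts, num_tokens = 8, 4, 3
# Toy "experts": independent linear maps standing in for feed-forward blocks.
mats = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda v, M=M: v @ M for M in mats]
gate_W = rng.normal(size=(d, num_experts))
x = rng.normal(size=(num_tokens, d))
y = moe_layer(x, gate_W, experts)
print(y.shape)  # (3, 8)
```

The payoff is in the per-token cost: with 4 experts and top-2 routing, each token pays for only half the expert compute while the layer's total capacity stays at all 4 experts' parameters.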

For full details of this model please read our release blog post.  – Read More

#devops

Decoding LLMs: Creating Transformer Encoders and Multi-Head Attention Layers in Python from Scratch

Today, Computational Natural Language Processing (NLP) is a rapidly evolving endeavour in which the power of computation meets linguistics. The linguistic side of it is mainly attributed to the theory of Distributional Semantics by John Rupert Firth. He once said the following:

“You shall know a word by the company it keeps”

So, the semantic representation of a word is determined by the context in which it is being used. It is precisely by attending to this assumption that the paper “Attention Is All You Need” by Ashish Vaswani et al. [1] gains its groundbreaking relevance. It established the transformer architecture at the core of many rapidly growing tools such as BERT, GPT-4, and Llama.

In this article, we examine the key mathematical operations at the heart of the encoder segment in the transformer architecture. — Read More
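The key operation inside the encoder is scaled dot-product attention computed over several heads in parallel. Here is a minimal NumPy sketch of a multi-head attention layer, assuming single-sequence input and randomly initialized projection matrices (the dimensions and weight names are illustrative, not from the article):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Scaled dot-product attention over num_heads heads.

    X: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model).
    """
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Project, then split into heads: (num_heads, seq_len, d_head).
    def project(W):
        return (X @ W).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = project(Wq), project(Wk), project(Wv)

    # Attention weights: softmax(Q K^T / sqrt(d_head)), per head.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)           # (num_heads, seq_len, seq_len)
    heads = weights @ V                          # (num_heads, seq_len, d_head)

    # Concatenate heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ Wo

rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 16, 5, 4
X = rng.normal(size=(seq_len, d_model))
Wq, Wk, Wv, Wo = (rng.normal(size=(d_model, d_model)) for _ in range(4))
out = multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads)
print(out.shape)  # (5, 16)
```

Each head attends over the full sequence in a lower-dimensional subspace; concatenating them and projecting with `Wo` recombines the per-head views into one `d_model`-sized representation per token.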

#nlp, #devops

Domain Adaptation of A Large Language Model

Large language models (LLMs) like BERT are usually pre-trained on general domain corpora like Wikipedia and BookCorpus. If we apply them to more specialized domains like medical, there is often a drop in performance compared to models adapted for those domains.

In this article, we will explore how to adapt a pre-trained LLM like DeBERTa base to the medical domain using the HuggingFace Transformers library. Specifically, we will cover an effective technique called intermediate pre-training, where we do further pre-training of the LLM on data from our target domain. This adapts the model to the new domain and improves its performance.
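Intermediate pre-training continues the model's original objective, masked language modeling for BERT/DeBERTa-style models, on domain text. Below is a minimal NumPy sketch of the BERT-style 80/10/10 dynamic masking that objective relies on; the token ids, `MASK_ID`, and vocabulary size are hypothetical stand-ins, not the HuggingFace collator's actual API.

```python
import numpy as np

MASK_ID, VOCAB_SIZE = 103, 30000   # hypothetical mask-token id / vocab size

def mask_tokens(token_ids, rng, mlm_prob=0.15):
    """BERT-style dynamic masking used during (intermediate) pre-training.

    Of the selected positions: 80% become [MASK], 10% become a random
    token, 10% are left unchanged. Labels are -100 everywhere else so
    the loss only covers the selected positions.
    """
    inputs = np.array(token_ids)
    labels = np.full_like(inputs, -100)

    selected = rng.random(inputs.shape) < mlm_prob
    labels[selected] = inputs[selected]          # predict original tokens here

    roll = rng.random(inputs.shape)
    to_mask = selected & (roll < 0.8)
    to_rand = selected & (roll >= 0.8) & (roll < 0.9)
    inputs[to_mask] = MASK_ID
    inputs[to_rand] = rng.integers(0, VOCAB_SIZE, size=int(to_rand.sum()))
    return inputs, labels

rng = np.random.default_rng(0)
ids = list(range(1000, 1064))                    # 64 fake domain-corpus tokens
inputs, labels = mask_tokens(ids, rng)
print(int((labels != -100).sum()), "positions selected for prediction")
```

In practice the Transformers library performs this step for you (via its masked-LM data collator), but seeing the 80/10/10 split explicitly clarifies what the model is actually asked to predict during domain-adaptive pre-training.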

This is a simple yet effective technique to tune LLMs to your domain and gain significant improvements in downstream task performance. — Read More

#devops

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

OpenChat is an innovative library of open-source language models, fine-tuned with C-RLFT – a strategy inspired by offline reinforcement learning. Our models learn from mixed-quality data without preference labels, delivering exceptional performance on par with ChatGPT, even with a 7B model. Despite our simple approach, we are committed to developing a high-performance, commercially viable, open-source large language model, and we continue to make significant strides toward this vision. — Read More

#devops