OLMo 2: The best fully open language model to date

Since the release of the first OLMo in February 2024, we’ve seen rapid growth in the open language model ecosystem, and a narrowing of the performance gap between open and proprietary models. OLMo-0424 saw a notable boost in downstream performance relative to our first release in February. We were also excited by increasing participation in fully open model development, notably including LLM360’s Amber, M-A-P’s Neo models, and DCLM’s baseline models. In September, we released OLMoE, a mixture-of-experts model and the first among its fully open peers to be on the Pareto frontier of performance and size.

Because fully open science requires more than just open weights, we are excited to share a new round of OLMo updates, including weights, data, code, recipes, intermediate checkpoints, and instruction-tuned models, with the broader language modeling community. — Read More

#devops

Qwen2.5-Coder just changed the game for AI programming—and it’s free

Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and it’s available to developers at no cost. — Read More

#china-ai, #devops

Illustrated LLM OS: An Implementational Perspective

This blog post explores the implementation of large language models (LLMs) as operating systems, inspired by Andrej Karpathy’s vision of AI resembling an OS, akin to Jarvis from Iron Man. The focus is on practical considerations, proposing an application-level integration for LLMs within a terminal session. A novel approach involves injecting state machines into the decoding process, enabling real-time code execution and interaction. Additionally, this post proposes Reinforcement Learning by System Feedback (RLSF), a reinforcement learning technique applied to code generation tasks. This method leverages a reward model to evaluate code correctness through Python subprocess execution, enhancing LLM performance. The findings contribute insights into the dynamic control of LLMs and their potential applications beyond coding tasks. — Read More
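The post's RLSF idea, scoring generated code by actually executing it, can be sketched as a simple reward function. This is a minimal illustration, not the author's implementation: the function name `rlsf_reward` and the +1/-1 reward scale are assumptions, but the core mechanism (run the candidate code plus a test in a Python subprocess and reward on the exit code) matches what the post describes.

```python
import subprocess
import sys


def rlsf_reward(code: str, test: str, timeout: float = 5.0) -> float:
    """Hypothetical RLSF-style reward: execute generated code plus a test
    in a fresh Python subprocess. Exit code 0 earns +1.0; any failure,
    exception, or timeout earns -1.0."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code + "\n" + test],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return -1.0
    return 1.0 if proc.returncode == 0 else -1.0


# Correct candidate passes its test; buggy candidate fails it.
good = rlsf_reward("def add(a, b):\n    return a + b", "assert add(2, 3) == 5")
bad = rlsf_reward("def add(a, b):\n    return a - b", "assert add(2, 3) == 5")
print(good, bad)  # 1.0 -1.0
```

In an RL loop this scalar would feed back into the policy update; the subprocess boundary also keeps broken generated code from crashing the training process.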

AIOS: LLM Agent Operating System
MemGPT: Towards LLMs as Operating Systems

#devops

Anthropic’s AI can now run and write code

Anthropic’s Claude chatbot can now write and run JavaScript code.

Today, Anthropic launched a new analysis tool that helps Claude respond with what the company describes as “mathematically precise and reproducible answers.” With the tool enabled — it’s currently in preview — Claude can perform calculations and analyze data from files like spreadsheets and PDFs, rendering the results as interactive visualizations. — Read More

#devops

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs’ powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization approach that incorporates cell addresses, values, and formats. However, this approach was limited by LLMs’ token constraints, making it impractical for most applications. To tackle this challenge, we develop SheetCompressor, an innovative encoding framework that compresses spreadsheets effectively for LLMs. It comprises three modules: structural-anchor-based compression, inverse index translation, and data-format-aware aggregation. It significantly improves performance on the spreadsheet table detection task, outperforming the vanilla approach by 25.6% in GPT-4’s in-context learning setting. Moreover, an LLM fine-tuned with SheetCompressor achieves an average compression ratio of 25x while reaching a state-of-the-art 78.9% F1 score, surpassing the best existing models by 12.3%. Finally, we propose Chain of Spreadsheet for downstream spreadsheet understanding tasks and validate it on a new and demanding spreadsheet QA task. We methodically leverage the inherent layout and structure of spreadsheets, demonstrating that SpreadsheetLLM is highly effective across a variety of spreadsheet tasks. — Read More
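The "vanilla serialization" baseline the abstract mentions, encoding each cell as its address, value, and format, can be sketched roughly as below. The exact textual format in the paper differs; the `serialize_sheet` helper and the comma-separated layout here are illustrative assumptions, meant only to show why token counts explode on large grids and why SheetCompressor's compression matters.

```python
def serialize_sheet(cells: dict) -> str:
    """Hypothetical vanilla serialization: emit one line per cell as
    'address,value,format', sorted by address. Every cell costs tokens,
    so a large sparse grid produces a very long prompt."""
    return "\n".join(
        f"{addr},{value},{fmt}"
        for addr, (value, fmt) in sorted(cells.items())
    )


# A tiny sheet: headers in row 1, one data row below.
sheet = {
    "A1": ("Region", "text"),
    "B1": ("Sales", "text"),
    "A2": ("West", "text"),
    "B2": (1200, "number"),
}
print(serialize_sheet(sheet))
```

SheetCompressor's modules attack exactly this cost: structural anchors drop homogeneous regions, inverse index translation deduplicates repeated values, and format-aware aggregation collapses runs of identically formatted cells.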

#devops

Arcee AI unveils SuperNova: A customizable, instruction-adherent model for enterprises

Arcee AI launched SuperNova today, a 70 billion parameter language model designed for enterprise deployment, featuring advanced instruction-following capabilities and full customization options. The model aims to provide a powerful, ownable alternative to API-based services from OpenAI and Anthropic, addressing key concerns around data privacy, model stability and customization.

In an AI landscape dominated by cloud-based APIs, Arcee AI is taking a different approach with SuperNova. The large language model (LLM) can be deployed and customized within an enterprise’s own infrastructure. Released today, SuperNova is built on Meta’s Llama-3.1-70B-Instruct architecture and employs a novel post-training process that Arcee claims results in superior instruction adherence and adaptability to specific business needs. — Read More

#devops

Anthropic’s new Claude prompt caching will save developers a fortune

Anthropic introduced prompt caching on its API, which remembers the context between API calls and allows developers to avoid repeating prompts. 

The prompt caching feature is available in public beta on Claude 3.5 Sonnet and Claude 3 Haiku, but support for the largest Claude model, Opus, is still coming soon. 

Prompt caching, described in this 2023 paper, lets users keep frequently used contexts in their sessions. As the models remember these prompts, users can add additional background information without increasing costs. This is helpful in instances where someone wants to send a large amount of context in a prompt and then refer back to it in different conversations with the model. It also lets developers and other users better fine-tune model responses.  — Read More
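In practice, caching is opted into per content block. The sketch below shows the rough shape of a Messages API request body with a cached system prompt; the `cache_control` field was introduced as a beta feature, so the exact field names and the model string here should be checked against Anthropic's current documentation before use.

```python
# Sketch of an Anthropic Messages API request body using prompt caching.
# The "cache_control" block marks the large, reused context for caching,
# so later calls that repeat it pay the (discounted) cached-read rate
# instead of reprocessing the full prompt.
long_context = "<contents of a large document reused across many calls>"

request_body = {
    "model": "claude-3-5-sonnet-20240620",  # example model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": long_context,
            "cache_control": {"type": "ephemeral"},  # cache this block
        }
    ],
    "messages": [
        {"role": "user", "content": "Summarize the document above."}
    ],
}
print(request_body["system"][0]["cache_control"])
```

Subsequent requests that send the identical system block hit the cache, which is what makes "send the context once, refer back to it cheaply" workable for large documents.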

#devops

OpenDevin, an autonomous AI software engineer

Read More

#devops, #videos

Secret Llama

Fully private LLM chatbot that runs entirely in the browser, with no server needed. Supports Mistral and Llama 3.

— Fully private = No conversation data ever leaves your computer

— Runs in the browser = No server needed and no install needed!

— Works offline

Read More

#devops

Meet Amazon Q, the AI assistant that generates apps for you

Amazon Web Services (AWS) has long offered generative AI solutions to optimize everyday business operations. Today, AWS added to those offerings with the general availability of its AI assistant Amazon Q.

AWS first announced Amazon Q in November 2023; on Tuesday, the company made the AI-powered assistant generally available for developers and businesses, as well as released free courses on using the AI assistant and a new Amazon Q capability in preview. — Read More

#devops