I have been building many small products using LLMs. It has been fun and useful. However, there are pitfalls that can waste a lot of time. A while back a friend asked me how I was using LLMs to write software. I thought “oh boy, how much time do you have?” — hence this post.
I talk to many dev friends about this, and we all have a similar approach with various tweaks in either direction. — Read More
Data Formulator: Create Rich Visualizations with AI
Data Formulator is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.
Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools, where users need to describe everything in natural language, Data Formulator combines user interface (UI) interactions and natural language (NL) inputs for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI. — Read More
Competitive Programming with Large Reasoning Models
We show that reinforcement learning applied to large language models (LLMs) significantly boosts performance on complex coding and reasoning tasks. Additionally, we compare two general-purpose reasoning models – OpenAI o1 and an early checkpoint of o3 – with a domain-specific system, o1-ioi, which uses hand-engineered inference strategies designed for competing in the 2024 International Olympiad in Informatics (IOI). We competed live at IOI 2024 with o1-ioi and, using hand-crafted test-time strategies, placed in the 49th percentile. Under relaxed competition constraints, o1-ioi achieved a gold medal. However, when evaluating later models such as o3, we find that o3 achieves gold without hand-crafted domain-specific strategies or relaxed constraints. Our findings show that although specialized pipelines such as o1-ioi yield solid improvements, the scaled-up, general-purpose o3 model surpasses those results without relying on hand-crafted inference heuristics. Notably, o3 achieves a gold medal at the 2024 IOI and obtains a Codeforces rating on par with elite human competitors. Overall, these results indicate that scaling general-purpose reinforcement learning, rather than relying on domain-specific techniques, offers a robust path toward state-of-the-art AI in reasoning domains, such as competitive programming. — Read More
The future belongs to idea guys who can just do things
There, I said it. I seriously can’t see a path forward where the majority of software engineers are still doing artisanal, hand-crafted commits as soon as the end of 2026. If you are a software engineer who was considering taking a gap year or holiday this year, it would be an incredibly bad time to do it.
It’s been a good 43 years of software development as usual, but it’s time to go up another layer of abstraction, as we have in the past — from hand-rolling assembler to higher-level compilers. It’s now critical for engineers to embrace these new tools and for companies to accelerate their employees’ “time to oh-f**k” moment. — Read More
BrowserAI
How is Google using AI for internal code migrations?
In recent years, there has been a tremendous interest in using generative AI, and particularly large language models (LLMs) in software engineering; indeed there are now several commercially available tools, and many large companies also have created proprietary ML-based tools for their own software engineers. While the use of ML for common tasks such as code completion is available in commodity tools, there is a growing interest in application of LLMs for more bespoke purposes. One such purpose is code migration.
This article is an experience report on using LLMs for code migrations at Google. It is not a research study, in the sense that we do not carry out comparisons against other approaches or evaluate research questions/hypotheses. Rather, we share our experiences in applying LLM-based code migration in an enterprise context across a range of migration cases, in the hope that other industry practitioners will find our insights useful. Many of these learnings apply to any application of ML in software engineering. We see evidence that the use of LLMs can reduce the time needed for migrations significantly, and can reduce barriers to get started and complete migration programs. — Read More
OLMo 2: The best fully open language model to date
Since the release of the first OLMo in February 2024, we’ve seen rapid growth in the open language model ecosystem, and a narrowing of the performance gap between open and proprietary models. OLMo-0424 saw a notable boost in downstream performance relative to our first release in February. We were also excited by increasing participation in fully open model development, notably including LLM360’s Amber, M-A-P’s Neo models, and DCLM’s baseline models. In September, we released OLMoE, a mixture-of-experts model and the first among its fully open peers to be on the Pareto frontier of performance and size.
Because fully open science requires more than just open weights, we are excited to share a new round of OLMo updates, including weights, data, code, recipes, intermediate checkpoints, and instruction-tuned models, with the broader language modeling community. — Read More
Qwen2.5-Coder just changed the game for AI programming—and it’s free
Alibaba Cloud has released Qwen2.5-Coder, a new AI coding assistant that has already become the second most popular demo on Hugging Face Spaces. Early tests suggest its performance rivals GPT-4o, and it’s available to developers at no cost. — Read More
Illustrated LLM OS: An Implementational Perspective
This blog post explores the implementation of large language models (LLMs) as operating systems, inspired by Andrej Karpathy’s vision of AI resembling an OS, akin to Jarvis from Iron Man. The focus is on practical considerations, proposing an application-level integration for LLMs within a terminal session. A novel approach involves injecting state machines into the decoding process, enabling real-time code execution and interaction. Additionally, this post proposes Reinforcement Learning by System Feedback (RLSF), a reinforcement learning technique applied to code generation tasks. This method leverages a reward model to evaluate code correctness through Python subprocess execution, enhancing LLM performance. The findings contribute insights into the dynamic control of LLMs and their potential applications beyond coding tasks. — Read More
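The core of an RLSF-style reward as described above — scoring generated code by actually executing it — can be sketched in a few lines. This is a minimal illustration under assumptions of my own: the function name `execution_reward`, the specific reward values, and the assert-based test harness are hypothetical, not the post’s actual implementation.

```python
import subprocess
import sys

def execution_reward(code: str, test_snippet: str, timeout: float = 5.0) -> float:
    """Run generated code plus a test snippet in an isolated Python
    subprocess and map the outcome to a scalar reward.
    Reward values here are illustrative assumptions."""
    program = code + "\n" + test_snippet
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return -1.0  # runaway or hanging code: strongly penalize
    if result.returncode == 0:
        return 1.0   # program ran and every assertion passed
    return -0.5      # crash or failed assertion

# Example: score a correct and an incorrect completion
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"
tests = "assert add(2, 3) == 5"
print(execution_reward(good, tests))  # 1.0
print(execution_reward(bad, tests))   # -0.5
```

In a training loop, this scalar would be fed back as the reward signal for the completion that produced `code`; sandboxing beyond a bare subprocess (containers, resource limits) would be needed before running untrusted model output for real.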
AIOS: LLM Agent Operating System
MemGPT: Towards LLMs as Operating Systems
Anthropic’s AI can now run and write code
Anthropic’s Claude chatbot can now write and run JavaScript code.
Today, Anthropic launched a new analysis tool that helps Claude respond with what the company describes as “mathematically precise and reproducible answers.” With the tool enabled — it’s currently in preview — Claude can perform calculations and analyze data from files like spreadsheets and PDFs, rendering the results as interactive visualizations. — Read More