Try Public APIs for free
The Public APIs repository is manually curated by community members like you and folks working at APILayer. It includes an extensive list of public APIs from many domains that you can use for your own products. Consider it a treasure trove of APIs well-managed by the community over the years. — Read More
Which LLM writes the best analytical SQL?
We asked 19 popular LLMs (+1 human) to write analytical SQL queries to filter and aggregate a 200 million row dataset. The result is the first version of the LLM SQL Generation Benchmark.
Using a set of 50 analytical questions inspired by this list from maintainers of ClickHouse®, we measure how well each model can write accurate and efficient SQL. We benchmark success rates, exactness, efficiency, query latency, and other metrics, comparing them to queries produced by an experienced human engineer.
The dataset, which contains 200 million rows of public GitHub events data (sampled from the GH Archive), is hosted in Tinybird, allowing us to run all the queries interactively and measure performance at scale. The full dashboard with results is public here. We will continually update this dashboard as new models are developed and tested (Want us to test a model? Create an issue or run the test yourself and submit a PR with new results here). — Read More
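The scoring described above (exactness against a human-written reference, plus query efficiency) can be sketched in a few lines. This is a hypothetical illustration of the idea, not the benchmark's actual code; the function name and the efficiency ratio are assumptions.

```python
# Hypothetical sketch of scoring one LLM-generated query against the
# human reference: exact-match on result rows, plus a latency ratio.
def score_query(model_rows, reference_rows, model_ms, human_ms):
    """Return (exactness, efficiency) for one benchmark question."""
    # Exactness: 1.0 only if the model's result set matches the human's.
    exact = 1.0 if model_rows == reference_rows else 0.0
    # Efficiency: model latency relative to the human baseline
    # (values > 1.0 mean the model's query ran faster than the human's).
    efficiency = human_ms / model_ms if model_ms > 0 else 0.0
    return exact, efficiency

# Example: the model's query returns the right rows but runs slower.
exact, eff = score_query(
    model_rows=[("torvalds/linux", 98_231)],
    reference_rows=[("torvalds/linux", 98_231)],
    model_ms=240.0,
    human_ms=180.0,
)
```

Aggregating these per-question scores across all 50 questions and 20 query writers is what produces the leaderboard-style dashboard.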
Ask HN: How much better are AI IDEs vs. copy pasting into chat apps?
I just wanted to hear people’s experiences with AI IDEs.
For context, I’m a heavy user of Gemini/ChatGPT for coding, plus Copilot. But I haven’t used Cursor, Windsurf, etc.
Copy pasting into chat apps is a first world problem: it will do the work for you, but you have to give it all the context in the prompt, which, for a larger project, gets tedious.
The issue with Copilot is that it’s not as smart as the “thinking” chat apps.
This makes it clear why there’s such a need for AI IDEs. I don’t want to construct my context to a chat app. The context is already in my codebase, so the AI should pick up on it. But I also hear that it gets expensive because of the pay-per-use pricing, as opposed to effectively unlimited prompts for a thinking chat app if you pay the monthly subscription.
So I just wanted to get the lay of the land. How good are these IDEs on constructing your context to the LLMs? How much more expensive is it, and is it worth it for you? — Read More
Working with LLMs: A Few Lessons
An interesting part of working with LLMs is that you get to see a lot of people trying to work with them, inside companies both small and large, and falling prey to entirely new sets of problems. Turns out using them well isn’t just a matter of know-how or even interest, but requires unlearning some tough lessons. So I figured I’d jot down a few observations. Here we go, starting with the hardest one, which is:
Perfect verifiability doesn’t exist
LLMs are inherently probabilistic. No matter how much you might want it, there is no perfect verifiability of what they produce. Instead, what’s needed is to find ways to deal with the fact that they will occasionally get things wrong. — Read More
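One common way to deal with occasional wrong output is to validate it and retry a bounded number of times. A minimal sketch of the pattern follows; `call_llm` here is a stand-in for any real client, so the names are illustrative, not an actual API.

```python
# Validate-and-retry: a pragmatic substitute for the perfect verifiability
# that probabilistic models cannot give you. call_llm is a stand-in for
# any chat-completion client.
def generate_with_retries(call_llm, validate, prompt, max_attempts=3):
    """Call the model until its output passes validation, or give up."""
    last = None
    for _ in range(max_attempts):
        last = call_llm(prompt)
        if validate(last):
            return last
    # No validated answer after max_attempts: surface the failure
    # instead of pretending the output is correct.
    raise ValueError(f"no valid output after {max_attempts} attempts: {last!r}")

# Demo with a deliberately flaky stand-in "model" that succeeds on try 3.
attempts = iter(["not json", "{bad", '{"ok": true}'])
result = generate_with_retries(
    call_llm=lambda prompt: next(attempts),
    validate=lambda s: s.startswith("{") and s.endswith("}"),
    prompt="Return a JSON object.",
)
```

The key design choice is that validation failures are expected, budgeted for, and finally surfaced as errors rather than silently accepted.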
Why Developers Should Care About Generative AI (Even If They Aren’t AI Experts)
Software development is about to undergo a generative change. AI (Artificial Intelligence) has the potential to make developers significantly more productive, and three systems on the market already demonstrate this: GitHub Copilot, Anthropic’s Claude, and OpenAI’s ChatGPT.
Hence, every developer, whether or not they specialize in AI, needs to understand what this technology is, why it is relevant, and how to use it, because it is advancing so rapidly. — Read More
What “Shifting Left” Means and Why It Matters for Data Stacks
Moving Data Quality and Business Logic Upstream for More Efficient Data Systems
Shifting left is an interesting concept that’s gaining momentum in modern data engineering. SDF has been among those sharing this approach, even making “shifting left” one of their main slogans. As Elias DeFaria, SDF’s co-founder, describes it, shifting left means “improving data quality by moving closer toward the data source”.
However, the benefits extend beyond just data quality improvements. With dbt Labs’ recent acquisition of SDF, many are wondering: what does this mean for the shifting left movement, and more importantly, what exactly is shifting left in the data context?
In this article, we’ll explore the core principles behind shifting left, examine how code-first approaches have made moving logic upstream more efficient, and answer the questions: Why should data teams shift left? What elements need to be shifted? And how can your organization implement this approach to build more maintainable, efficient data systems? — Read More
Model Context Protocol (MCP)
MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools. — Read More
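Under the hood, MCP messages are JSON-RPC 2.0. A minimal sketch of the request shape a client might send to ask a server for its available tools follows; the helper name is illustrative, and a conformant client would also perform the initialization handshake defined by the spec.

```python
import json

# MCP transports carry JSON-RPC 2.0 messages. This builds the wire shape
# of a request; "tools/list" is one of the methods the protocol defines
# for discovering what a server exposes. Illustrative sketch, not a
# conformant MCP client.
def jsonrpc_request(req_id, method, params=None):
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg)

# Ask the server which tools it exposes.
list_tools = jsonrpc_request(1, "tools/list")
```

The USB-C analogy holds at exactly this layer: any client that speaks this framing can talk to any server that does, regardless of what data or tools sit behind it.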
How to Build an Agent
It’s not that hard to build a fully functioning, code-editing agent.
It seems like it would be. When you look at an agent editing files, running commands, wriggling itself out of errors, retrying different strategies – it seems like there has to be a secret behind it.
There isn’t. It’s an LLM, a loop, and enough tokens. It’s what we’ve been saying on the podcast from the start. The rest, the stuff that makes Amp so addictive and impressive? Elbow grease.
But building a small and yet highly impressive agent doesn’t even require that. You can do it in less than 400 lines of code, most of which is boilerplate.
I’m going to show you how, right now. We’re going to write some code together and go from zero lines of code to “oh wow, this is… a game changer.” — Read More
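The “an LLM, a loop, and enough tokens” claim can be shown in miniature. In this sketch, `chat` is a stand-in for any chat-completion client that can return either plain text or a tool call; the message shapes and tool names are assumptions for illustration, not any particular vendor’s API.

```python
# The agent loop in miniature: call the model, execute any tool it asks
# for, feed the result back, repeat until it answers in plain text.
def run_agent(chat, tools, user_message, max_turns=10):
    """Drive the model in a loop, executing tool calls until it answers."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = chat(messages)
        if reply.get("tool") is None:
            return reply["content"]          # plain text: we're done
        # Execute the requested tool and feed the result back in.
        result = tools[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("agent did not finish within max_turns")

# Demo: a scripted "model" that reads a file via a tool, then answers.
replies = iter([
    {"tool": "read_file", "args": "notes.txt"},
    {"tool": None, "content": "The file says: hello"},
])
answer = run_agent(
    chat=lambda messages: next(replies),
    tools={"read_file": lambda path: "hello"},
    user_message="What does notes.txt say?",
)
```

Everything beyond this loop in a real agent (file editing, command running, error recovery) is more tools and more careful prompting, not a different architecture.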
DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from DeepSeek-R1-Distill-Qwen-14B via distributed RL. It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters. We’ve open-sourced our dataset, code, training logs, and systems optimizations for everyone to progress on scaling and accelerating intelligence with RL. — Read More
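Pass@1 on benchmarks like LiveCodeBench is typically reported via the unbiased pass@k estimator from the Codex evaluation methodology with k=1: generate n samples per problem, count the c that pass all tests, and estimate pass@k = 1 − C(n−c, k)/C(n, k). A sketch under that assumption follows; the DeepCoder release may compute its number differently.

```python
from math import comb

# Unbiased pass@k estimator: probability that at least one of k samples,
# drawn without replacement from n generations of which c pass, succeeds.
def pass_at_k(n, c, k):
    """Estimate pass@k from n samples with c passing."""
    if n - c < k:
        # Fewer failures than k draws: at least one success is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 generations, 5 pass, k=1 gives an estimated pass@1 of 0.5.
p1 = pass_at_k(10, 5, 1)
```

For k=1 this reduces to c/n, but the general form matters when a single greedy sample would over- or under-state a model’s ability.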