“AI Engineering” is a term I hadn’t heard two years ago, but today, AI engineers are in high demand. Companies like Meta, Google, and Amazon offer higher base salaries for these roles than “regular” software engineers get, while AI startups and scaleups are scrambling to hire them.
However, closer inspection reveals that AI engineers are often regular software engineers who have mastered the basics of large language models (LLMs), such as how to work with and integrate them.
So far, the best book I’ve found on this hot topic is AI Engineering by Chip Huyen, published in January by O’Reilly. Chip has worked as a researcher at Netflix and as a core developer at NVIDIA (building NeMo, NVIDIA’s GenAI framework), and cofounded Claypot AI. She has also taught machine learning (ML) at Stanford University. — Read More
OpenAlpha_Evolve
OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous coding agents like DeepMind’s AlphaEvolve. It’s a re-implementation of the core idea: an intelligent system that iteratively writes, tests, and improves code using Large Language Models (LLMs) like Google’s Gemini, guided by the principles of evolution. — Read More
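The write–test–improve loop described above can be sketched as a standard evolutionary algorithm. This is an illustrative sketch, not OpenAlpha_Evolve’s actual code: in a real system `mutate` would ask an LLM (e.g. Gemini) to rewrite a candidate program, and `fitness` would run its test suite; here both are stubbed with a toy numeric search so the example is runnable.

```python
import random

random.seed(0)  # deterministic run for this sketch

TARGET = 42  # stand-in for "passes all tests"

def fitness(candidate: int) -> float:
    """Higher is better: negative distance from the target value."""
    return -abs(candidate - TARGET)

def mutate(candidate: int) -> int:
    """Stand-in for an LLM-proposed edit: a small random perturbation."""
    return candidate + random.randint(-5, 5)

def evolve(generations: int = 200, population_size: int = 8) -> int:
    population = [random.randint(0, 100) for _ in range(population_size)]
    for _ in range(generations):
        # Selection: score every candidate and keep the best half...
        population.sort(key=fitness, reverse=True)
        survivors = population[: population_size // 2]
        # ...then variation: refill the population with mutated survivors.
        population = survivors + [
            mutate(random.choice(survivors))
            for _ in range(population_size - len(survivors))
        ]
    return max(population, key=fitness)

best = evolve()
```

The key design point, which carries over to the LLM setting, is that the best candidates are never discarded: each generation can only match or improve on the last.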
Meet AlphaEvolve, the Google AI that writes its own code—and just saved millions in computing costs
Google DeepMind today pulled the curtain back on AlphaEvolve, an artificial-intelligence agent that can invent brand-new computer algorithms — then put them straight to work inside the company’s vast computing empire.
AlphaEvolve pairs Google’s Gemini large language models with an evolutionary approach that tests, refines, and improves algorithms automatically. The system has already been deployed across Google’s data centers, chip designs, and AI training systems — boosting efficiency and solving mathematical problems that have stumped researchers for decades.
AlphaEvolve is a Gemini-powered AI coding agent that is able to make new discoveries in computing and mathematics. — Read More
DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level
Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL. It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters. We’ve open-sourced our dataset, code, training logs, and system optimizations for everyone to progress on scaling and accelerating intelligence with RL. — Read More
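For readers unfamiliar with the Pass@1 metric quoted above: pass@k is the probability that at least one of k sampled solutions passes the tests, and it is usually computed with the unbiased estimator from the HumanEval paper, 1 − C(n−c, k)/C(n, k), where n samples are drawn per problem and c of them pass. A minimal sketch (the numbers below are made up for illustration, not DeepCoder’s):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples drawn, c of them passed,
    k is the sampling budget. pass@k = 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer failures than the budget: guaranteed a pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# With 10 samples per problem and 6 passing, Pass@1 is just the
# fraction of single samples that pass:
print(pass_at_k(10, 6, 1))  # → 0.6
```

A benchmark score like 60.6% Pass@1 is this quantity averaged over all problems in the suite.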
Try Public APIs for free
The Public APIs repository is manually curated by community members like you and folks working at APILayer. It includes an extensive list of public APIs from many domains that you can use for your own products. Consider it a treasure trove of APIs well-managed by the community over the years. — Read More
Which LLM writes the best analytical SQL?
We asked 19 popular LLMs (+1 human) to write analytical SQL queries to filter and aggregate a 200 million row dataset. The result is the first version of the LLM SQL Generation Benchmark.
Using a set of 50 analytical questions inspired by this list from maintainers of ClickHouse®, we measure how well each model can write accurate and efficient SQL. We benchmark success rates, exactness, efficiency, query latency, and other metrics, comparing them to queries produced by an experienced human engineer.
The dataset, which contains 200 million rows of public GitHub events data (sampled from the GH Archive), is hosted in Tinybird, allowing us to run all the queries interactively and measure performance at scale. The full dashboard with results is public here. We will continually update this dashboard as new models are developed and tested (Want us to test a model? Create an issue or run the test yourself and submit a PR with new results here). — Read More
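To make the benchmark concrete, here is a miniature of the setup it describes: one analytical question answered with SQL over a GitHub-events-style table, with query latency measured. This is a hypothetical sketch using SQLite in place of Tinybird; the schema, the repo names, and the question (“which repos got the most star events?”) are illustrative, not taken from the benchmark.

```python
import sqlite3
import time

# Tiny stand-in for the 200M-row GitHub events dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (repo TEXT, event_type TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [
    ("tinybird/ui", "WatchEvent"),     # WatchEvent ≈ a star
    ("tinybird/ui", "WatchEvent"),
    ("clickhouse/ch", "WatchEvent"),
    ("clickhouse/ch", "PushEvent"),
])

# The kind of filter-and-aggregate query the models are asked to write.
query = """
    SELECT repo, COUNT(*) AS stars
    FROM events
    WHERE event_type = 'WatchEvent'
    GROUP BY repo
    ORDER BY stars DESC
"""

start = time.perf_counter()
rows = conn.execute(query).fetchall()
latency_ms = (time.perf_counter() - start) * 1000

print(rows)  # → [('tinybird/ui', 2), ('clickhouse/ch', 1)]
```

The benchmark’s harder part, of course, is scoring each model’s generated SQL for correctness and efficiency against a reference answer at 200-million-row scale.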
Ask HN: How much better are AI IDEs vs. copy pasting into chat apps?
I just wanted to hear people’s experiences with AI IDEs.
For context, I’m a heavy user of Gemini / ChatGPT for coding, and of Copilot. But I haven’t used Cursor / Windsurf / etc.
Copy-pasting into chat apps is a first-world problem: it will do the work for you, but you have to give it all the context in the prompt, which, for a larger project, gets tedious.
The issue with Copilot is that it’s not as smart as the “thinking” chat apps.
This makes it clear why there’s such a need for AI IDEs. I don’t want to construct my context to a chat app. The context is already in my codebase, so the AI should pick up on it. But I also hear that it gets expensive because of the pay-per-use pricing, as opposed to effectively unlimited prompts for a thinking chat app if you pay the monthly subscription.
So I just wanted to get the lay of the land. How good are these IDEs on constructing your context to the LLMs? How much more expensive is it, and is it worth it for you? — Read More
Working with LLMs: A Few Lessons
An interesting part of working with LLMs is that you get to see a lot of people trying to work with them, inside companies both small and large, and falling prey to entirely new sets of problems. Turns out using them well isn’t just a matter of know-how or even interest, but requires unlearning some tough lessons. So I figured I’d jot down a few observations. Here we go, starting with the hardest one, which is:
Perfect verifiability doesn’t exist
LLMs are inherently probabilistic. No matter how much you might want it, there is no perfect verifiability of what they produce. Instead, what’s needed is to find ways to deal with the fact that occasionally they will get things wrong. — Read More
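One common way to “deal with the fact that occasionally it will get things wrong” is to validate the model’s output and retry on failure. A minimal sketch of that pattern, not taken from the article: `call_llm` is a stub standing in for a real model call, rigged here to fail once and then return valid JSON so the example is runnable.

```python
import json

# Stubbed model: first reply is garbage, second is valid JSON.
_responses = iter(['not json at all', '{"sentiment": "positive"}'])

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (hypothetical)."""
    return next(_responses)

def generate_with_retries(prompt: str, max_attempts: int = 3) -> dict:
    """Validate structure and schema; retry on any failure."""
    last_error = None
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            parsed = json.loads(raw)           # structural check
            if "sentiment" not in parsed:      # schema check
                raise ValueError("missing 'sentiment' key")
            return parsed
        except (json.JSONDecodeError, ValueError) as err:
            last_error = err                   # record and try again
    raise RuntimeError(f"no valid output after {max_attempts} tries: {last_error}")

result = generate_with_retries("Classify the sentiment: 'great product!'")
```

Note that this doesn’t achieve perfect verifiability either; it only bounds how often malformed output reaches the rest of the system, which is usually the realistic goal.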
Why Developers Should Care About Generative AI (Even If They Aren’t AI Experts)
Software development is about to undergo a generative change: AI (Artificial Intelligence) has the potential to make developers more productive, as three systems already on the market demonstrate: GitHub Copilot, Anthropic’s Claude, and OpenAI’s ChatGPT.
Hence, every developer, whether or not they specialize in AI, needs to understand this technology as it advances so rapidly: what it is, why it is relevant, and how to use it. — Read More
What “Shifting Left” Means and Why It Matters for Data Stacks
Moving Data Quality and Business Logic Upstream for More Efficient Data Systems
Shifting left is an interesting concept that’s gaining momentum in modern data engineering. SDF has been among those sharing this approach, even making “shifting left” one of their main slogans. As Elias DeFaria, SDF’s co-founder, describes it, shifting left means “improving data quality by moving closer toward the data source”.
However, the benefits extend beyond just data quality improvements. With dbt Labs’ recent acquisition of SDF, many are wondering: what does this mean for the shifting left movement, and more importantly, what exactly is shifting left in the data context?
In this article, we’ll explore the core principles behind shifting left, examine how code-first approaches have made moving logic upstream more efficient, and answer the questions: Why should data teams shift left? What elements need to be shifted? And how can your organization implement this approach to build more maintainable, efficient data systems? — Read More
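In code terms, “moving data quality upstream” often just means enforcing rules at ingestion rather than patching bad rows downstream. An illustrative sketch of that idea, with a hypothetical schema and rules that are not from SDF or dbt:

```python
from datetime import datetime

def validate_event(row: dict) -> list[str]:
    """Quality rules applied at the source; empty list means clean."""
    errors = []
    if not row.get("user_id"):
        errors.append("user_id is required")
    try:
        datetime.fromisoformat(row.get("ts", ""))
    except ValueError:
        errors.append("ts must be an ISO-8601 timestamp")
    return errors

def ingest(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split incoming rows into accepted and rejected before they
    ever reach downstream models -- the 'shift left' in miniature."""
    accepted, rejected = [], []
    for row in rows:
        (rejected if validate_event(row) else accepted).append(row)
    return accepted, rejected

good, bad = ingest([
    {"user_id": "u1", "ts": "2025-01-15T10:00:00"},
    {"user_id": "", "ts": "not-a-date"},
])
```

The payoff the article argues for is that every downstream consumer can then assume the invariants hold, instead of each re-implementing its own defensive checks.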