DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level

Through a joint collaboration between the Agentica team and Together AI, we release DeepCoder-14B-Preview, a code reasoning model finetuned from Deepseek-R1-Distilled-Qwen-14B via distributed RL. It achieves an impressive 60.6% Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of o3-mini-2025-01-31 (Low) and o1-2024-12-17 with just 14B parameters. We’ve open-sourced our dataset, code, training logs, and systems optimizations for everyone to progress on scaling and accelerating intelligence with RL. — Read More
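
For readers unfamiliar with the Pass@1 metric quoted above, here is a minimal sketch of the widely used unbiased pass@k estimator. The excerpt does not say how the DeepCoder team computed their number, so treat this as an illustration of the metric rather than their evaluation code. The function names and the per-problem tallies are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a success.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry is (samples, correct)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical tallies: (samples drawn, samples passing all tests).
results = [(8, 8), (8, 4), (8, 0), (8, 2)]
print(f"pass@1 = {benchmark_pass_at_k(results, 1):.4f}")
```

With k = 1 the estimator reduces to the fraction of correct samples per problem, averaged over the benchmark.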

#devops

AI Index 2025: State of AI in 10 Charts

Small models get better, regulation moves to the states, and more.

The new AI Index Report shows a maturing field, improvements in AI optimization, and a growing saturation of use – and abuse – of this technology.

The 2025 AI Index Report, published on April 7, 2025, is an independent initiative at the Stanford Institute for Human-Centered Artificial Intelligence (HAI), led by the AI Index Steering Committee, an interdisciplinary group of experts from across academia and industry.

Each year, the report covers the biggest technical advances, new achievements in benchmarking, investment flowing into generative AI, education trends, legislation around this technology, and more.

Read the full report here. — Read More

#strategy

One-Minute Video Generation with Test-Time Training

Transformers today still struggle to generate one-minute videos because self-attention layers are inefficient for long context. Alternatives such as Mamba layers struggle with complex multi-scene stories because their hidden states are less expressive. We experiment with Test-Time Training (TTT) layers, whose hidden states themselves can be neural networks, therefore more expressive. Adding TTT layers into a pre-trained Transformer enables it to generate one-minute videos from text storyboards. For proof of concept, we curate a dataset based on Tom and Jerry cartoons. Compared to baselines such as Mamba 2, Gated DeltaNet, and sliding-window attention layers, TTT layers generate much more coherent videos that tell complex stories, leading by 34 Elo points in a human evaluation of 100 videos per method. Although promising, results still contain artifacts, likely due to the limited capability of the pre-trained 5B model. The efficiency of our implementation can also be improved. We have only experimented with one-minute videos due to resource constraints, but the approach can be extended to longer videos and more complex stories. — Read More
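
To give a feel for what a 34-point Elo lead means, the sketch below converts a rating gap into an expected head-to-head preference rate. It assumes the conventional logistic Elo model (base 10, 400-point scale); the excerpt does not state which Elo variant the authors used, and the function name is ours.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected win rate of the higher-rated side under the standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


gap = 34  # Elo lead reported in the excerpt
p = elo_expected_score(gap)
print(f"A {gap}-point lead implies a preference rate of about {p:.1%}")
```

Under that assumption, the lead corresponds to the TTT videos being preferred in somewhat more than half of pairwise comparisons, which is a real but modest margin.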

#vfx

Xanthorox AI Surfaces on Dark Web as Full Spectrum Hacking Assistant

A sophisticated new artificial intelligence (AI) platform tailored for offensive cyber operations, named Xanthorox AI, has been identified by cybersecurity firm SlashNext. First appearing in late Q1 2025, Xanthorox AI is reportedly circulating within cybercrime communities on darknet forums and encrypted channels.

According to SlashNext’s investigation, shared with Hackread.com ahead of its publishing on Monday, Xanthorox stands out from previous malicious AI tools like WormGPT, FraudGPT and EvilGPT due to its independent, multi-model framework. The system is based on five distinct AI models optimized for specific cyber operations.

These models are hosted on private servers under the seller’s control rather than public cloud infrastructure or openly accessible APIs. This unique setup sets Xanthorox AI apart from previous malicious tools that often relied on existing large language models (LLMs). — Read More

#cyber

Amazon Nova Reel 1.1: Featuring up to 2-minutes multi-shot videos

At re:Invent 2024, we announced Amazon Nova models, a new generation of foundation models (FMs), including Amazon Nova Reel, a video generation model that creates short videos from text descriptions and optional reference images (together, the “prompt”).

Today, we introduce Amazon Nova Reel 1.1, which provides quality and latency improvements in 6-second single-shot video generation, compared to Amazon Nova Reel 1.0. This update lets you generate multi-shot videos up to 2 minutes in length with consistent style across shots. You can either provide a single prompt for up to a 2-minute video composed of 6-second shots, or design each shot individually with custom prompts. This gives you new ways to create video content through Amazon Bedrock. — Read More

#big7

The day I taught AI to think like a Senior Developer

Is it just me, or are the code generation AIs we’re all using fundamentally broken?

For months, I’ve watched developers praise AI coding tools while silently cleaning up their messes, afraid to admit how much babysitting they actually need.

I realized that AI IDEs don’t actually understand codebases — they’re just sophisticated autocomplete tools with good marketing. The emperor has no clothes, and I’m tired of pretending otherwise.

After two years of frustration watching my AI assistants constantly “forget” where files were located, create duplicates, and use completely incorrect patterns, I finally built what the big AI companies couldn’t (or wouldn’t).

I decided to find out: What if I could make AI actually understand how my codebase works? — Read More

#devops

Google announces Sec-Gemini v1, a new experimental cybersecurity model

[D]efenders face the daunting task of securing against all cyber threats, while attackers need to successfully find and exploit only a single vulnerability. This fundamental asymmetry has made securing systems extremely difficult, time consuming and error prone. AI-powered cybersecurity workflows have the potential to help shift the balance back to the defenders by force multiplying cybersecurity professionals like never before.

Effectively powering SecOps workflows requires state-of-the-art reasoning capabilities and extensive current cybersecurity knowledge. Sec-Gemini v1 achieves this by combining Gemini’s advanced capabilities with near real-time cybersecurity knowledge and tooling. This combination allows it to achieve superior performance on key cybersecurity workflows, including incident root cause analysis, threat analysis, and vulnerability impact understanding. — Read More

#cyber

How to evaluate an LLM system

Evaluating large language model (LLM) based applications is inherently challenging due to the unique nature of these systems. Unlike traditional software applications, where outputs are deterministic and predictable, LLMs generate outputs that can vary each time they are run, even with the same input. This variability arises from the probabilistic nature of these models, which means there is no single correct output for any given input. Consequently, testing LLM-based applications requires specialized evaluation techniques — known today as ‘evals’ — to ensure they meet performance and reliability standards. — Read More
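
Because outputs vary from run to run, a common starting point is to sample the system many times and score each output with a programmatic check instead of comparing against a single expected string. The sketch below illustrates that idea only. `fake_llm` is a hypothetical stand-in for a real model call, and the checker and sample count are arbitrary choices for the example.

```python
import random
from typing import Callable


def fake_llm(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model call; returns varied phrasings."""
    answers = [
        "The capital of France is Paris.",
        "Paris",
        "It is Paris, France.",
        "I believe it is Lyon.",
    ]
    return rng.choice(answers)


def contains_paris(output: str) -> bool:
    """Property check: accepts any phrasing that names the right city."""
    return "paris" in output.lower()


def pass_rate(
    generate: Callable[[str, random.Random], str],
    check: Callable[[str], bool],
    prompt: str,
    n_samples: int,
    seed: int = 0,
) -> float:
    """Fraction of sampled outputs that satisfy the check."""
    rng = random.Random(seed)  # seeded so the eval run is repeatable
    passed = sum(check(generate(prompt, rng)) for _ in range(n_samples))
    return passed / n_samples


rate = pass_rate(fake_llm, contains_paris, "What is the capital of France?", 200)
print(f"pass rate over 200 samples: {rate:.2f}")
```

Scoring a property of the output (here, whether it names the right city) tolerates harmless variation in wording while still catching wrong answers.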

#performance

So You Uploaded Your Brain… Now What?

“Don’t worry, I’ve got this. I am you.”

That’s what it says, with your voice, your smile, and even your nervous laugh.

Except you’re still here. Still breathing. Still watching this machine act like it’s you — because technically, it is.

Same memories. Same passions.
Same fears. — Read More

#human

Taking a responsible path to AGI

Artificial general intelligence (AGI), AI that’s at least as capable as humans at most cognitive tasks, could be here within the coming years.

Integrated with agentic capabilities, AGI could supercharge AI to understand, reason, plan, and execute actions autonomously. Such technological advancement will provide society with invaluable tools to address critical global challenges, including drug discovery, economic growth and climate change.

This means we can expect tangible benefits for billions of people. For instance, by enabling faster, more accurate medical diagnoses, it could revolutionize healthcare. By offering personalized learning experiences, it could make education more accessible and engaging. By enhancing information processing, AGI could help lower barriers to innovation and creativity. By democratising access to advanced tools and knowledge, it could enable a small organization to tackle complex challenges previously only addressable by large, well-funded institutions. — Read More

#ethics