Even as the geopolitical conversation around AI continues to grow more fraught following the U.S. government’s actions to limit the new models from Anthropic and OpenAI, Chinese open source darling DeepSeek is back with yet another open release that could once again change AI development around the globe.
Over the weekend, the firm released DSpark, a new, MIT-licensed system designed to make large language models answer faster without changing what the underlying model is trying to say.
… DeepSeek published the work with a technical paper, model checkpoints and DeepSpec, a codebase for training and evaluating speculative decoding systems. The release is available through DeepSeek’s public GitHub and Hugging Face pages, both under the permissive MIT license, making the new technique broadly usable by developers, researchers and commercial enterprises that want to study or adapt the approach. — Read More
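For readers new to speculative decoding, here is a minimal sketch of the generic greedy variant, not DSpark's own method, which the excerpt does not describe. A cheap draft model proposes a few tokens, the target model checks them, and the output stays identical to what the target alone would produce. The names `toy_target`, `toy_draft`, `greedy_decode` and `speculative_decode` are hypothetical, and the "models" are toy functions standing in for real networks.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def toy_target(seq: List[Token]) -> Token:
    """Hypothetical stand-in for the large model's greedy next-token choice."""
    return (seq[-1] * 3 + 1) % 7


def toy_draft(seq: List[Token]) -> Token:
    """Hypothetical cheap model: agrees with the target except after token 5."""
    return 0 if seq[-1] == 5 else toy_target(seq)


def greedy_decode(target: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n_new):
        seq.append(target(seq))
    return seq


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Draft proposes up to k tokens; the target keeps the longest correct prefix."""
    seq = list(prompt)
    goal = len(prompt) + n_new
    verify_rounds = 0
    while len(seq) < goal:
        # 1. The draft model guesses the next few tokens on its own.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2. The target checks every guess. A real system scores all positions
        #    in one batched forward pass, so this counts as a single round.
        verify_rounds += 1
        for i, guess in enumerate(proposal):
            expected = target(seq + proposal[:i])
            if expected != guess:
                # Keep the agreed prefix, then take the target's own token.
                seq.extend(proposal[:i])
                seq.append(expected)
                break
        else:
            seq.extend(proposal)  # every guess matched
    return seq, verify_rounds


if __name__ == "__main__":
    spec, rounds = speculative_decode(toy_target, toy_draft, [1], n_new=12)
    print(spec == greedy_decode(toy_target, [1], 12), rounds)
```

The speedup comes from needing fewer target rounds than generated tokens whenever the draft guesses well. Sampling-based variants replace the exact-match check with a probabilistic accept/reject step.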
Daily Archives: June 30, 2026
Local AI for Penetration Testing & Research
How competent are local AI models for cyber security bug hunting and research?
… I benchmarked four different approaches to identify a known-to-me vulnerability in order to evaluate how effectively each approach could find it.
…Of the four approaches I tested, the most successful one made something clear: the harness/approach matters more than the model. — Read More
China Has Matched Anthropic in Cybersecurity, Resetting AI Race
Chinese artificial-intelligence systems have matched the performance of Anthropic’s powerful model Mythos in some cybersecurity scenarios, a development poised to reset the global tech race and pressure the White House in its overhaul of U.S. AI policy.
Security researchers said that a new AI model, released this month by China’s Zhipu AI, also known as Z.ai, can match the latest U.S. models when it comes to finding security bugs, although it still lags behind Anthropic’s and OpenAI’s products in other tasks. — Read More
MOJO Programming Language
Write like Python, run like C++. Write fast code for diverse hardware, from CPUs to GPUs, without vendor lock-in, in a language that’s both user friendly and memory safe.
Mojo draws inspiration from the best parts of modern languages – like Python’s intuitive syntax, Rust’s memory safety, and Zig’s powerful and intuitive compile-time metaprogramming. It is built from the ground up to deliver the best performance on the diverse hardware that powers modern AI systems. As a compiled, statically-typed language, it’s also ideal for agentic programming.
… The Mojo standard library is fully open-source on GitHub and we welcome contributions! We also plan to open-source the Mojo compiler in 2026. — Read More
Memory Caching: RNNs with Growing Memory
Transformers have been established as the de-facto backbones for most recent advances in sequence modeling, mainly due to their growing memory capacity that scales with the context length. While plausible for retrieval tasks, it causes quadratic complexity and so has motivated recent studies to explore viable subquadratic recurrent alternatives. Despite showing promising preliminary results in diverse domains, such recurrent architectures underperform Transformers in recall-intensive tasks, often attributed to their fixed-size memory. In this paper, we introduce Memory Caching (MC), a simple yet effective technique that enhances recurrent models by caching checkpoints of their memory states (a.k.a. hidden states). Memory Caching allows the effective memory capacity of RNNs to grow with sequence length, offering a flexible trade-off that interpolates between the fixed memory (i.e., O(L) complexity) of RNNs and the growing memory (i.e., O(L^2) complexity) of Transformers. We propose four variants of MC, including gated aggregation and sparse selective mechanisms, and discuss their implications on both linear and deep memory modules. Our experimental results on language modeling and long-context understanding tasks show that MC enhances the performance of recurrent models, supporting its effectiveness. The results of in-context recall tasks indicate that while Transformers achieve the best accuracy, our MC variants show competitive performance, close the gap with Transformers, and perform better than state-of-the-art recurrent models. — Read More
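The caching idea in the abstract can be illustrated with a small sketch. This is an assumption-heavy toy, not the paper's algorithm: it uses a decaying linear memory, saves a checkpoint of the state every `segment` tokens, and answers each query by averaging reads from all checkpoints plus the current state. The plain mean is a stand-in for the paper's aggregation variants, which the abstract names but does not specify, and every function name and the `decay` parameter are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def update(state: Matrix, key: Vector, value: Vector, decay: float) -> Matrix:
    """Decaying linear memory: new_state = decay * state + key (outer) value."""
    return [
        [decay * state[i][j] + key[i] * value[j] for j in range(len(value))]
        for i in range(len(key))
    ]


def read(state: Matrix, query: Vector) -> Vector:
    """Read out query^T @ state."""
    return [
        sum(query[i] * state[i][j] for i in range(len(query)))
        for j in range(len(state[0]))
    ]


def run_with_memory_cache(
    keys: List[Vector],
    values: List[Vector],
    queries: List[Vector],
    segment: int,
    decay: float = 0.5,
) -> Tuple[List[Vector], int]:
    """Return per-step outputs and the number of cached checkpoints."""
    state: Matrix = [[0.0] * len(values[0]) for _ in range(len(keys[0]))]
    cache: List[Matrix] = []
    outputs: List[Vector] = []
    for t, (k, v, q) in enumerate(zip(keys, values, queries), start=1):
        state = update(state, k, v, decay)
        # Read from every cached checkpoint and from the live state, then
        # average (assumed aggregation; the paper proposes other variants).
        reads = [read(s, q) for s in cache] + [read(state, q)]
        outputs.append([sum(col) / len(reads) for col in zip(*reads)])
        if t % segment == 0:
            cache.append(state)  # `update` returns a fresh matrix, so this is a snapshot
    return outputs, len(cache)


if __name__ == "__main__":
    keys = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
    values = [[1.0], [2.0], [3.0], [4.0]]
    queries = [keys[0]] * 4  # keep asking about the first token
    print(run_with_memory_cache(keys, values, queries, segment=5)[0][-1])  # no cache
    print(run_with_memory_cache(keys, values, queries, segment=1)[0][-1])  # cache every step
```

With `segment` larger than the sequence, nothing is cached and the model behaves like an ordinary fixed-memory RNN. With `segment=1`, each step reads from one checkpoint per earlier token, so total work grows quadratically with length. Values in between trade recall for cost.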
From Brain Waves to Words: Brain2Qwerty Offers a New Path to Communication Without Surgery
Last year, we introduced Brain2Qwerty v1, research that uses AI to decode brain activity into text without any surgical implant. Now we’re sharing the next step: Brain2Qwerty v2, the highest-performing end-to-end pipeline capable of real-time sentence decoding from non-invasive brain recordings, approaching levels of accuracy previously exclusive to techniques that require brain surgery.
To help accelerate neuroscience breakthroughs, we’re releasing the full training code for Brain2Qwerty v1 and v2, and our partner, the Basque Center on Cognition, Brain, and Language (BCBL), is releasing the v1 dataset. We believe this research has the potential to make a real difference for the millions of people who suffer from brain lesions that prevent them from communicating. Invasive procedures like stereotactic electroencephalography and electrocorticography have shown that a neuroprosthesis feeding signals to an AI decoder can restore communication, but they’re difficult to scale. Our noninvasive approach can help bridge that gap. — Read More
Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models
Knowledge distillation improves large language model (LLM) reasoning by compressing the knowledge of a teacher LLM to train smaller LLMs. On-policy distillation advances this approach by having the student sample its own trajectories while a teacher LLM provides dense token-level supervision, addressing the distribution mismatch between training and inference in off-policy distillation methods. However, on-policy distillation typically requires a separate, often larger, teacher LLM and does not explicitly leverage ground-truth solutions available in reasoning datasets.
Inspired by the intuition that a sufficiently capable LLM can rationalize external privileged reasoning traces and teach its weaker self, we introduce On-Policy Self-Distillation (OPSD), a learning algorithm where a single LLM acts as both teacher and student with different contexts. The teacher policy conditions on privileged information (e.g., verified reasoning traces) while the student policy sees only the question; training minimizes the per-token divergence between these distributions over the student’s own rollouts. We demonstrate the efficacy of our method on multiple mathematical reasoning benchmarks, achieving superior token efficiency compared to reinforcement learning methods and better performance over off-policy distillation methods. — Read More
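The training signal described in the abstract, a per-token divergence between teacher and student distributions along the student's own rollout, can be sketched as follows. The abstract does not say which divergence or direction is used, so this assumes KL(student || teacher) averaged over positions. The function names are hypothetical, and the logits are supplied by hand rather than by a real model.

```python
import math
from typing import List

Logits = List[float]


def softmax(logits: Logits) -> List[float]:
    """Numerically stable softmax over one vocabulary-sized logit vector."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def self_distillation_loss(
    student_logits: List[Logits], teacher_logits: List[Logits]
) -> float:
    """Mean per-token KL(student || teacher) over one student-sampled rollout.

    student_logits[t]: same model, conditioned on the question only.
    teacher_logits[t]: same model, also conditioned on privileged information
    such as a verified reasoning trace. Direction of the KL is an assumption.
    """
    per_token = [
        kl_divergence(softmax(s), softmax(t))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    student = [[2.0, 0.5, 0.1], [0.3, 0.3, 0.3]]
    teacher = [[0.1, 2.5, 0.1], [0.3, 0.3, 0.3]]
    print(self_distillation_loss(student, teacher))  # positive: first token disagrees
    print(self_distillation_loss(student, student))  # 0.0: identical distributions
```

In an actual training loop both sets of logits would come from the same network under two different prompts, and gradients would typically flow only through the student side.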
Code repo: this https URL.