Enabling Everyone To Build With AI

Read More

#videos

StochasTok: Improving Fine-Grained Subword Understanding in LLMs

Subword-level understanding is integral to numerous tasks, including understanding multi-digit numbers, spelling mistakes, abbreviations, rhyming, and wordplay. Despite this, current large language models (LLMs) still often struggle with seemingly simple subword-level tasks like “How many ‘r’s in ‘strawberry’?”. A key factor behind these failures is tokenization, which obscures the fine-grained structure of words. Current alternatives, such as character-level and dropout tokenization methods, significantly increase computational costs and provide inconsistent improvements. In this paper, we revisit tokenization and introduce StochasTok, a simple, efficient stochastic tokenization scheme that randomly splits tokens during training, allowing LLMs to ‘see’ their internal structure. Our experiments show that pretraining with StochasTok substantially improves LLMs’ downstream performance across multiple subword-level language games, including character counting, substring identification, and math tasks. Furthermore, StochasTok’s simplicity allows seamless integration at any stage of the training pipeline, and we demonstrate that post-training with StochasTok can instill improved subword understanding into existing pretrained models, thus avoiding costly pretraining from scratch. These dramatic improvements, achieved with a minimal change, suggest StochasTok holds exciting potential when applied to larger, more capable models. Code open-sourced at: this https URL. — Read More
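The paper’s code is linked above; as a rough, hypothetical sketch of the core idea (illustrative names, not the authors’ implementation), a stochastic splitter could, with some probability, replace a training token with a pair of shorter vocabulary tokens that concatenate back to the same string:

```python
import random

def stochastok_expand(token_ids, vocab, p_split=0.1, rng=None):
    """With probability p_split, replace a token with a randomly chosen
    pair of vocabulary tokens whose concatenation spells the same string,
    exposing the token's internal structure during training."""
    rng = rng or random.Random()
    id_to_str = {i: s for s, i in vocab.items()}
    out = []
    for tid in token_ids:
        s = id_to_str[tid]
        # all ways to cut s into two halves that are both valid tokens
        pairs = [(vocab[s[:k]], vocab[s[k:]])
                 for k in range(1, len(s))
                 if s[:k] in vocab and s[k:] in vocab]
        if pairs and rng.random() < p_split:
            out.extend(rng.choice(pairs))
        else:
            out.append(tid)
    return out

# Toy usage: with p_split=1.0, 'strawberry' is re-split into 'straw' + 'berry'.
vocab = {"s": 0, "straw": 1, "berry": 2, "strawberry": 3}
print(stochastok_expand([3], vocab, p_split=1.0, rng=random.Random(0)))  # [1, 2]
```

Because this rewrites only the token sequence, not the model, it is consistent with the abstract’s claim that the scheme can be slotted into pretraining or post-training without architectural changes.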

#nlp

A Variational Framework for Improving Naturalness in Generative Spoken Language Models

The success of large language models in text processing has inspired their adaptation to speech modeling. However, since speech is continuous and complex, it is often discretized for autoregressive modeling. Speech tokens derived from self-supervised models (known as semantic tokens) typically focus on the linguistic aspects of speech but neglect prosodic information. As a result, models trained on these tokens can generate speech with reduced naturalness. Existing approaches try to fix this by adding pitch features to the semantic tokens. However, pitch alone cannot fully represent the range of paralinguistic attributes, and selecting the right features requires careful hand-engineering. To overcome this, we propose an end-to-end variational approach that automatically learns to encode these continuous speech attributes to enhance the semantic tokens. Our approach eliminates the need for manual extraction and selection of paralinguistic features. Moreover, human raters prefer the speech continuations it produces. Code, samples and models are available at this https URL. — Read More
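The abstract stops short of implementation detail; as a loose sketch of the kind of mechanism it describes (a variational encoder whose continuous latent enriches the semantic-token stream), where all module names, dimensions, and the choice of PyTorch are assumptions rather than the authors’ code:

```python
import torch
import torch.nn as nn

class ParalinguisticEncoder(nn.Module):
    """Hypothetical sketch: encode frame-aligned continuous features
    (e.g., mel spectrogram) into a Gaussian latent and add its projection
    to the semantic-token embeddings; trained end to end with a KL term."""
    def __init__(self, feat_dim=80, latent_dim=8, embed_dim=512):
        super().__init__()
        self.enc = nn.GRU(feat_dim, 128, batch_first=True)
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.proj = nn.Linear(latent_dim, embed_dim)

    def forward(self, feats, token_embeds):
        h, _ = self.enc(feats)                                # (B, T, 128)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
        return token_embeds + self.proj(z), kl                # enriched embeddings + KL loss
```

The KL penalty keeps the latent compact and close to a standard normal prior, which is what lets the model learn which paralinguistic attributes to encode rather than having them hand-picked.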

#nlp

Andrej Karpathy: Software Is Changing (Again)

Read More

#strategy, #videos

Inference Economics of Language Models

As the capabilities of AI models have expanded, and as the recent paradigm of test-time compute scaling has taken off, the demand for AI inference has grown enormously. Inference revenue at major AI companies such as OpenAI and Anthropic has been growing at a rate of 3x per year or more, even as their models continue to become smaller and cheaper than they were in 2023.

A few years ago, the benchmark for whether a language model was fast enough was “human reading speed”: if a model could generate 10 tokens per second when responding to a user, that was good enough. Now, as models are asked to reason at length about complex problems and are placed inside elaborate agentic loops, this benchmark has become obsolete. The benefits of serving models faster are greater than ever before. Despite this, there has been little work investigating how language models can be served quickly at scale, or how much we can increase their speed at the expense of paying a higher price per token.

Today, we’re releasing a model of LLM inference economics that helps answer these questions. Working with the model reveals many important facts about inference at scale that are not widely appreciated. — Read More
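The full model is in the report, but the core tension can be shown with back-of-envelope arithmetic (the numbers below are placeholder assumptions, not figures from the report): batching more user streams onto one GPU amortizes its hourly cost across more tokens, but each stream generates more slowly, so price per token and speed pull against each other.

```python
def cost_per_million_tokens(gpu_hourly_usd, batch_size, per_stream_tok_s):
    """Back-of-envelope: GPU cost amortized over aggregate token throughput.
    All inputs are illustrative assumptions, not measured values."""
    tokens_per_hour = batch_size * per_stream_tok_s * 3600
    return gpu_hourly_usd / tokens_per_hour * 1e6

# A $2/hr GPU serving 64 streams at 30 tok/s each: ~$0.29 per million tokens.
print(cost_per_million_tokens(2.0, 64, 30))
# Shrink the batch to 8 so each stream runs faster (say 60 tok/s): ~$1.16 per million.
print(cost_per_million_tokens(2.0, 8, 60))
```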

#performance

Vibe Coding: The Revolutionary Approach Transforming Software Development

“No vibe coding while I’m on call!” declared Jessie Young, Principal Engineer at GitLab, encapsulating the fierce debate dividing the software development world. On one side stand cautious veterans like Brendan Humphreys, CTO of Canva, who insists, “No, you won’t be vibe coding your way to production.” On the other side, technology giants like Google co-founder Sergey Brin actively encourage engineers to embrace AI-generated code, reporting “10 to 100x speedups” in productivity.

“Vibe coding”—a term coined by AI pioneer Dr. Andrej Karpathy, key architect behind ChatGPT at OpenAI—has rapidly evolved from casual meme to industry-transforming methodology. In their forthcoming book Vibe Coding: Building Production-Grade Software with GenAI, Chat, Agents, and Beyond, technology veterans Gene Kim and Steve Yegge wade into this contentious territory with a bold claim: this isn’t just another development fad but a fundamental paradigm shift that will render traditional manual coding obsolete. — Read More

#devops