Amazon New AI Models ‘NOVA’ Stun The Entire Industry!

Read More

#big7

The AI War Was Never Just About AI

For almost two years now, the world’s biggest tech companies have been at war over generative AI. Meta may be known for social media, Google for search, and Amazon for online shopping, but since the release of ChatGPT, each has made tremendous investments in an attempt to dominate in this new era. Along with start-ups such as OpenAI, Anthropic, and Perplexity, their spending on data centers and chatbots is on track to eclipse the costs of sending the first astronauts to the moon.

To be successful, these companies will have to do more than build the most “intelligent” software: They will need people to use, and return to, their products. Everyone wants to be Facebook, and nobody wants to be Friendster. To that end, the best strategy in tech hasn’t changed: build an ecosystem that users can’t help but live in. Billions of people use Google Search every day, so Google built a generative-AI product known as “AI Overviews” right into the results page, granting it an immediate advantage over competitors. — Read More

#big7

Google DeepMind has a new way to look inside an AI’s “mind”

AI has led to breakthroughs in drug discovery and robotics and is in the process of entirely revolutionizing how we interact with machines and the web. The only problem is we don’t know exactly how it works, or why it works so well. We have a fair idea, but the details are too complex to unpick. That’s a problem: It could lead us to deploy an AI system in a highly sensitive field like medicine without understanding that it could have critical flaws embedded in its workings.

A team at Google DeepMind that studies something called mechanistic interpretability has been working on new ways to let us peer under the hood. At the end of July, it released Gemma Scope, a tool to help researchers understand what is happening when AI is generating an output. The hope is that if we have a better understanding of what is happening inside an AI model, we’ll be able to control its outputs more effectively, leading to better AI systems in the future. — Read More

#big7

Meta’s AI Abundance

Stratechery has benefited from a Meta cheat code since its inception: wait for investors to panic, the stock to drop, and write an Article that says Meta is fine — better than fine even — and sit back and watch the take be proven correct. Notable examples include 2013’s post-IPO swoon, the 2018 Stories swoon, and most recently, the 2022 TikTok/Reels swoon (if you want a bonus, I was optimistic during the 2020 COVID swoon too).

Perhaps with that in mind I wrote a cautionary note earlier this year about Meta and Reasonable Doubt: while investors were concerned about the sustainability of Meta’s spending on AI, I was worried about increasing ad prices and the lack of new formats after Stories and then Reels; the long-term future, particularly in terms of the metaverse, was just as much of a mystery as always.

Six months on and I feel the exact opposite: it seems increasingly clear to me that Meta is in fact the most well-placed company to take advantage of generative AI.  — Read More

#big7

Meta’s Transfusion model handles text and images in a single architecture

Multi-modal models that can process both text and images are a growing area of research in artificial intelligence. However, training these models presents a unique challenge: language models deal with discrete values (words and tokens), while image generation models must handle continuous pixel values. 

Current multi-modal models use techniques that reduce the quality of representing data. In a new research paper, scientists from Meta and the University of Southern California introduce Transfusion, a novel technique that enables a single model to seamlessly handle both discrete and continuous modalities.  — Read More

#big7, #multi-modal

Introducing AI21 Labs Jamba 1.5

The AI21 Jamba 1.5 family is a set of state-of-the-art, hybrid SSM-Transformer, instruction-following foundation models. The Jamba models are the most powerful and efficient long-context models on the market, delivering up to 2.5X faster inference than leading models of comparable size.

The models demonstrate superior long context handling, speed, and quality. They mark the first time a non-Transformer model has been successfully scaled to the quality and strength of the market’s leading models. — Read More

The Paper

#big7

New in Gemini: Custom Gems and improved image generation with Imagen 3

We have new features rolling out, starting today, that we previewed at Google I/O. Gems, a new feature that lets you customize Gemini to create your own personal AI experts on any topic you want, are now available for Gemini Advanced, Business and Enterprise users. And our new image generation model, Imagen 3, will be rolling out across Gemini, Gemini Advanced, Business and Enterprise in the coming days. — Read More

#big7

Diffusion Models Are Real-Time Game Engines

We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL-agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories. — Read More
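For readers unfamiliar with the PSNR figure quoted above, here is a minimal sketch of how peak signal-to-noise ratio is commonly computed between a reference frame and a generated frame. It is not taken from the GameNGen paper: the function name `psnr`, the 8-bit pixel range, and the toy frames are all assumptions made for illustration.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Illustrative helper (hypothetical, not from the paper). Assumes pixel
    values lie in [0, max_value], e.g. 8-bit images with max_value=255.
    """
    if len(reference) != len(generated):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames: no noise at all
    # PSNR = 10 * log10(MAX^2 / MSE)
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 8-bit grayscale "frames", flattened to lists for illustration.
reference_frame = [52, 55, 61, 66, 70, 61, 64, 73]
generated_frame = [54, 55, 60, 69, 70, 58, 64, 75]
print(f"PSNR: {psnr(reference_frame, generated_frame):.2f} dB")
```

Higher values mean the generated frame is closer to the reference. Under the same 8-bit assumption, a PSNR of 29.4 dB would correspond to a root-mean-square error of roughly 8.6 gray levels per pixel, though the excerpt does not say how the paper scales or averages its frames.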

#big7

How Meta trains large language models at scale

As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs).

Traditionally, our AI model training has involved training a massive number of models that each required a comparatively small number of GPUs. This was the case for our recommendation models (e.g., our feed and ranking models) that would ingest vast amounts of information to make accurate recommendations that power most of our products.

With the advent of generative AI (GenAI), we’ve seen a shift towards fewer jobs, but incredibly large ones. Supporting GenAI at scale has meant rethinking how our software, hardware, and network infrastructure come together. — Read More

#big7

Microsoft’s AI Copilot is coming to your messaging apps, starting with Telegram

Whether you love or hate Microsoft’s Copilot AI, there may soon be no escaping it: it has recently been spotted in messaging apps, specifically Telegram. Microsoft seems to have sneakily introduced Copilot into the messaging app, allowing Telegram users to try it firsthand.

According to Windows Latest, the move is part of a new project from Microsoft dubbed ‘copilot-for-social’, which is an initiative to bring generative AI to social media apps. — Read More

#big7