Multi-modal models that can process both text and images are a growing area of research in artificial intelligence. However, training these models presents a unique challenge: language models deal with discrete values (words and tokens), while image generation models must handle continuous pixel values.
Current multi-modal models typically bridge this gap with workarounds, such as quantizing images into discrete tokens, that degrade the quality of the data representation. In a new research paper, scientists from Meta and the University of Southern California introduce Transfusion, a novel technique that enables a single model to seamlessly handle both discrete and continuous modalities. — Read More
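To make the idea concrete, here is a minimal sketch of a Transfusion-style joint objective: one shared transformer trunk trained with cross-entropy on discrete text tokens and a denoising (diffusion) loss on continuous image patches, summed into a single loss. All module names, shapes, and the toy noise schedule below are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyTransfusion(nn.Module):
    """Hypothetical miniature of the Transfusion idea: one trunk, two losses."""
    def __init__(self, vocab=1000, dim=64, patch_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)        # discrete tokens -> vectors
        self.proj_in = nn.Linear(patch_dim, dim)     # continuous patches -> vectors
        enc = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(enc, num_layers=2)  # shared trunk
        self.lm_head = nn.Linear(dim, vocab)         # next-token logits
        self.denoise = nn.Linear(dim, patch_dim)     # predicted noise

    def loss(self, tokens, patches):
        # Discrete branch: standard next-token cross-entropy.
        h = self.trunk(self.embed(tokens))
        lm = F.cross_entropy(self.lm_head(h[:, :-1]).flatten(0, 1),
                             tokens[:, 1:].flatten())
        # Continuous branch: noise the patches, train the trunk to predict the noise.
        noise = torch.randn_like(patches)
        t = torch.rand(patches.size(0), 1, 1)
        noisy = (1 - t) * patches + t * noise        # toy linear noise schedule
        diff = F.mse_loss(self.denoise(self.trunk(self.proj_in(noisy))), noise)
        return lm + diff                             # one joint objective

model = TinyTransfusion()
tokens = torch.randint(0, 1000, (2, 12))             # batch of token sequences
patches = torch.randn(2, 8, 16)                      # batch of image patches
model.loss(tokens, patches).backward()               # one model, both modalities
```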
Introducing AI21 Labs Jamba 1.5
The AI21 Jamba 1.5 family comprises state-of-the-art hybrid SSM-Transformer instruction-following foundation models. The Jamba models are the most powerful & efficient long-context models on the market, delivering up to 2.5X faster inference than leading models of comparable sizes.
The models demonstrate superior long context handling, speed, and quality. They mark the first time a non-Transformer model has been successfully scaled to the quality and strength of the market’s leading models. — Read More
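"Hybrid SSM-Transformer" means interleaving linear-time state-space (Mamba-style) layers with occasional attention layers, so most of the stack scales gently with context length. The sketch below shows only that structural idea: the toy SSM layer, the dimensions, and the 7-to-1 layer ratio (the ratio reported in the original Jamba paper) are illustrative, not AI21's implementation.

```python
import torch
import torch.nn as nn

class SSMBlock(nn.Module):
    """Toy diagonal linear state-space layer: h_t = a*h_{t-1} + b*x_t.
    Real Mamba layers use input-dependent (selective) parameters; this
    only shows the O(n) recurrent structure with constant state size."""
    def __init__(self, dim):
        super().__init__()
        self.a = nn.Parameter(torch.full((dim,), 0.9))
        self.b = nn.Parameter(torch.ones(dim))
        self.c = nn.Linear(dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        h = torch.zeros_like(x[:, 0])
        out = []
        for t in range(x.size(1)):             # linear in sequence length
            h = self.a * h + self.b * x[:, t]
            out.append(h)
        return x + self.c(torch.stack(out, dim=1))   # residual connection

class HybridStack(nn.Module):
    """Interleave cheap SSM layers with occasional attention layers."""
    def __init__(self, dim=64, n_blocks=2, ssm_per_attn=7):
        super().__init__()
        layers = []
        for _ in range(n_blocks):
            layers += [SSMBlock(dim) for _ in range(ssm_per_attn)]
            layers.append(nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True))
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(1, 32, 64)                     # (batch, seq, dim)
print(HybridStack()(x).shape)                  # torch.Size([1, 32, 64])
```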
New in Gemini: Custom Gems and improved image generation with Imagen 3
We have new features rolling out, starting today, that we previewed at Google I/O. Gems, a new feature that lets you customize Gemini to create your own personal AI experts on any topic you want, are now available for Gemini Advanced, Business and Enterprise users. And our new image generation model, Imagen 3, will be rolling out across Gemini, Gemini Advanced, Business and Enterprise in the coming days. — Read More
Diffusion Models Are Real-Time Game Engines
We present GameNGen, the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) an RL agent learns to play the game and the training sessions are recorded, and (2) a diffusion model is trained to produce the next frame, conditioned on the sequence of past frames and actions. Conditioning augmentations enable stable auto-regressive generation over long trajectories. — Read More
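The interface is worth spelling out: a denoiser that takes a noisy guess at the next frame plus a window of past frames and player actions, rolled out autoregressively. The sketch below shows only that loop; the real system fine-tunes a latent diffusion model, and every module, shape, and step count here is a placeholder assumption.

```python
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    """Toy stand-in for an action-conditioned next-frame denoiser."""
    def __init__(self, frame_dim=256, n_actions=8, dim=128):
        super().__init__()
        self.action_embed = nn.Embedding(n_actions, dim)
        self.frame_proj = nn.Linear(frame_dim, dim)
        self.backbone = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, frame_dim)

    def denoise_step(self, noisy_frame, past_frames, past_actions):
        # Condition on the sequence of past frames and actions...
        ctx = self.frame_proj(past_frames) + self.action_embed(past_actions)
        _, h = self.backbone(ctx)                 # summarize the history
        # ...and refine the noisy estimate of the next frame.
        return self.out(self.frame_proj(noisy_frame) + h.squeeze(0))

def rollout(model, frames, actions, horizon, denoise_steps=4):
    """Autoregressive generation: each predicted frame joins the context,
    so errors can compound; GameNGen's conditioning augmentations are what
    keep long rollouts stable. Context window of 4 frames is arbitrary."""
    for _ in range(horizon):
        x = torch.randn_like(frames[:, -1])       # start from pure noise
        for _ in range(denoise_steps):            # crude iterative refinement
            x = model.denoise_step(x, frames[:, -4:], actions[:, -4:])
        frames = torch.cat([frames, x.unsqueeze(1)], dim=1)
        actions = torch.cat([actions, actions[:, -1:]], dim=1)  # repeat last action
    return frames

model = NextFramePredictor()
frames = torch.randn(1, 4, 256)                   # 4 context frames (flattened)
actions = torch.randint(0, 8, (1, 4))             # matching player actions
print(rollout(model, frames, actions, horizon=3).shape)  # torch.Size([1, 7, 256])
```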
How Meta trains large language models at scale
As we continue to focus our AI research and development on solving increasingly complex problems, one of the most significant and challenging shifts we’ve experienced is the sheer scale of computation required to train large language models (LLMs).
Traditionally, our AI model training has involved training a massive number of models, each of which required a comparatively small number of GPUs. This was the case for our recommendation models (e.g., our feed and ranking models), which ingest vast amounts of information to make the accurate recommendations that power most of our products.
With the advent of generative AI (GenAI), we’ve seen a shift towards fewer jobs, but incredibly large ones. Supporting GenAI at scale has meant rethinking how our software, hardware, and network infrastructure come together. — Read More
Microsoft’s AI Copilot is coming to your messaging apps, starting with Telegram
Whether you love or hate Microsoft’s Copilot AI, there may soon be no escaping it, as it has recently been spotted crawling around messaging apps, specifically Telegram. Microsoft seems to have sneakily introduced Copilot into the messaging app, allowing Telegram users to experience it firsthand.
According to Windows Latest, the move is part of a new project from Microsoft dubbed ‘copilot-for-social’, which is an initiative to bring generative AI to social media apps. — Read More
AI eats the web
Google’s shift toward AI-generated search results, displacing the familiar list of links, is rewiring the internet — and could accelerate the decline of the 30+-year-old World Wide Web.
Why it matters: A world where Google answers most questions in a single machine voice makes online life more convenient — and duller.
— The change also threatens to cut into Google’s revenue from search ads, and starve future AIs of the human data they’ll need. — Read More
Google Is About to Change Everything—and Hopes You Won’t Find Out
It’s difficult to overstate the magnitude and impact of the changes Google has been making to its search engine and overall product suite this month, some of which were laid out during Tuesday’s I/O 2024 conference. The reason is not just that parent company Alphabet is determined to shove some form of “artificial intelligence” and machine learning software into your Chrome browser and your phone calls and your photo galleries and your YouTube habits. It’s that the central tool that powers and shapes the modern internet is about to permanently change—and it may make for an even worse search experience than that which has defined Google’s most recent era.
Google Search, that powerful, white, oblong textbox that became the default portal for organizing, showcasing, platforming, exploring, optimizing, and determining the ultimate reach of every single webpage across the entirety of cyberspace (often by paying other gatekeepers to favor it over other search tools), is becoming something else entirely: a self-ingesting singular webpage of its own, powered by the breadth of web information to which it once gave you access. Google is attempting to transform itself from a one-stop portal into a one-stop shop via “search generative experience,” where the Gemini chatbot will spit out a general “AI Overview” answer at the top of your search results. These answers will be informed by (or even plagiarized from) the very links now crowded out by a chatbox.
Yet the company doesn’t seem to want you to know anything about that. — Read More
New Microsoft AI model may challenge GPT-4 and Google Gemini
Microsoft is working on a new large-scale AI language model called MAI-1, which could potentially rival state-of-the-art models from Google, Anthropic, and OpenAI, according to a report by The Information. This marks the first time Microsoft has developed an in-house AI model of this magnitude since investing over $10 billion in OpenAI for the rights to reuse the startup’s AI models. OpenAI’s GPT-4 powers not only ChatGPT but also Microsoft Copilot. — Read More
How Meta is paving the way for synthetic social networks
On Thursday, the AI hype train rolled through Meta’s family of apps. The company’s Meta AI assistant, a ChatGPT-like bot that can answer a wide range of questions, is beginning to roll out broadly across Facebook, Messenger, Instagram and WhatsApp.
Powering the bot is Llama 3, the latest and most capable version of Meta’s large language model. As with its predecessors — and in contrast to models from OpenAI, Google, and Anthropic — Llama 3 is open source. Today Meta made it available in two sizes: one with 8 billion parameters, and one with 70 billion parameters. (Parameters are the variables inside a large language model; in general, the more parameters a model contains, the smarter and more sophisticated its output.) — Read More
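As a rough illustration of where those headline numbers come from, here is a back-of-the-envelope count using the publicly reported Llama 3 8B configuration (grouped-query attention, gated MLP). It ignores small terms like normalization weights, so treat it as an approximation, not an official accounting.

```python
# Back-of-the-envelope parameter count for a Llama-style transformer.
# Config values are the publicly reported Llama 3 8B settings.
vocab, dim, layers = 128_256, 4_096, 32
n_heads, n_kv_heads, ffn_dim = 32, 8, 14_336

head_dim = dim // n_heads
# Q and O projections are dim x dim; K and V are smaller under grouped-query attention.
attn = dim * dim + 2 * dim * (n_kv_heads * head_dim) + dim * dim
mlp = 3 * dim * ffn_dim                  # gate, up, and down projections
embeddings = 2 * vocab * dim             # input embedding + output head

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # prints: 8.03B parameters
```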