We present a general-purpose framework for image modelling and vision tasks based on probabilistic frame prediction. Our approach unifies a broad range of tasks, from image segmentation to novel view synthesis and video interpolation. We pair this framework with an architecture we term Transframer, which uses U-Net and Transformer components to condition on annotated context frames and outputs sequences of sparse, compressed image features. Transframer is the state of the art on a variety of video generation benchmarks, is competitive with the strongest models on few-shot view synthesis, and can generate coherent 30-second videos from a single image without any explicit geometric information. A single generalist Transframer simultaneously produces promising results on 8 tasks, including semantic segmentation, image classification and optical flow prediction, with no task-specific architectural components, demonstrating that multi-task computer vision can be tackled using probabilistic image models. Our approach can in principle be applied to a wide range of applications that require learning the conditional structure of annotated image-formatted data. Read More
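The abstract's central claim is that many vision tasks reduce to one interface: predict a target frame given context frames paired with annotations. A minimal sketch of that framing (the class and field names here are my own illustration, not the paper's API) might look like this:

```python
# Toy sketch of the "annotated frame prediction" framing: every task is
# (context frames + annotations) -> target frame. Names are illustrative.
from dataclasses import dataclass, field

@dataclass
class AnnotatedFrame:
    image: str                                       # placeholder for pixel data
    annotation: dict = field(default_factory=dict)   # e.g. camera pose, modality

@dataclass
class FramePredictionTask:
    context: list             # list of AnnotatedFrame the model conditions on
    target_annotation: dict   # annotation describing the frame to generate

# Novel view synthesis: context = posed views, target = a new camera pose.
view_synthesis = FramePredictionTask(
    context=[AnnotatedFrame("view_0.png", {"camera_pose": (0.0, 0.0, 0.0)})],
    target_annotation={"camera_pose": (0.5, 0.0, 0.0)},
)

# Semantic segmentation: context = an RGB image, target = its label map.
segmentation = FramePredictionTask(
    context=[AnnotatedFrame("scene.png", {"modality": "rgb"})],
    target_annotation={"modality": "segmentation"},
)

# Two very different tasks, expressed through the same interface.
print(type(view_synthesis) is type(segmentation))  # True
```

The point of the sketch is only that segmentation, view synthesis, and video prediction differ solely in what the annotations contain, which is why a single generalist model can be trained across all of them.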
Tags: #big7, #image-recognition
Tag Archives: Big7
Meta open-sources advanced AI text translation system with 50B+ parameters
Meta Platforms Inc. today released the code for NLLB-200, an internally developed artificial intelligence system capable of translating text across 200 languages.
The company is also releasing a set of tools designed to help researchers more easily apply NLLB-200 in software projects.
Many of the 200 languages that NLLB-200 understands are not supported well by other AI translation systems, according to Meta. The company says fewer than 25 African languages are currently supported by widely used translation tools. NLLB-200 supports 55 African languages. Read More
China’s Tech Giants Lost Their Swagger and May Never Get It Back
On trading floors in New York and Hong Kong, the brightening mood toward Chinese technology companies is unmistakable: With stocks like Alibaba Group Holding Ltd. and Tencent Holdings Ltd. surging from multi-year lows, talk of a new bull market is growing louder.
Yet speak to executives, entrepreneurs and venture capital investors intimately involved in China’s tech sector and a more downbeat picture emerges. Interviews with more than a dozen industry players suggest the outlook is still far from rosy, despite signs that the Communist Party’s crackdown on big tech is softening at the edges.
These insiders describe an ongoing sense of paranoia and paralysis, along with an unsettling realization that the sky-high growth rates of the past two decades are likely never coming back. Read More
Google’s New AI: Flying Through Virtual Worlds!
How much longer can Google own the internet?
This story is part of a Recode series about Big Tech and antitrust. Over the last several weeks, we’ve covered what’s happening with Apple, Amazon, Microsoft, Meta, and Google.
There’s a new Big Tech antitrust bill in town, and this one is especially painful for Google.
A group of lawmakers led by Sen. Mike Lee (R-UT) introduced the Competition and Transparency in Digital Advertising Act on Thursday. This bipartisan and bicameral legislation would forbid any company with more than $20 billion in digital advertising revenue — that’s Google and Meta, basically — from owning multiple parts of the digital advertising chain. Google would have to choose between being a buyer or a seller or running the ad exchange between the two. It currently owns all three parts, and has been dogged by allegations, which it denies, that it uses that power to unfairly manipulate that market to its own advantage.
“This lack of competition in digital advertising means that monopoly rents are being imposed upon every website that is ad-supported and every company — small, medium, or large — that relies on internet advertising to grow its business,” Sen. Lee said in a statement. “It is essentially a tax on thousands of American businesses, and thus a tax on millions of American consumers.” Read More
Meta’s Challenge to OpenAI — Give Away a Massive Language Model
Meta is giving away some of the family jewels: That’s the gist of an announcement from the company formerly known as Facebook this week. In a blog post on the Meta AI site, the company’s researchers announced that they’ve created a massive and powerful language AI system and are making it available free to all researchers in the artificial-intelligence community. Meta describes the move as an effort to democratize access to a powerful kind of AI—but some argue that not very many researchers will actually benefit from this largesse. And even as these models become more accessible to researchers, many questions remain about the path to commercial use.
Large language models are one of the hottest things in AI right now. Models like OpenAI’s GPT-3 can generate remarkably fluid and coherent text in just about any format or style: They can write convincing news articles, legal summaries, poems, and advertising copy, or hold up their end of a conversation as customer-service chatbots or video-game characters. GPT-3, which broke the mold with its 175 billion parameters, is available to academic and commercial entities only via OpenAI’s application and vetting process.
Meta’s Open Pretrained Transformer (known as OPT-175B) matches GPT-3 with 175 billion parameters of its own. Meta is offering the research community not only the model itself, but also its codebase and extensive notes and logbooks about the training process. The model was trained on 800 gigabytes of data from five publicly available data sets, which are described in the “data card” that accompanies a technical paper posted by the Meta researchers to the arXiv online preprint server. Read More
The Future of Search Is Boutique
For most queries, Google search is pretty underwhelming these days. Google is great at answering questions with an objective answer, like “# of billionaires in the world” or “What is the population of Iceland?” It’s pretty bad at answering questions that require judgment and context like “What do NFT collectors think about NFTs?”
The evidence is everywhere. These days, I find myself suppressing the garbage Internet by searching on Google for “Substack + future of learning” to find the best takes on education. We hack Twitter with “what is the best” posts over and over again. When I’m researching a new product, I type “X item reddit” into Google. I find enormous value in small, niche, often forgotten sites like Spaghetti Directory.
There’s an emergence of tools like Notion, Airtable, and Readwise where people are aggregating content and resources, reviving the curated web. But at the moment these are mostly solo affairs — hidden in private or semi-private corners of the Internet, fragmented, poorly indexed, and unavailable for public use. We haven’t figured out how to make them multiplayer. In cases where we’ve made them public and collaborative — here is a great example — these projects are often short-lived and poorly maintained. Read More
Former Intelligence Officials, Citing Russia, Say Big Tech Monopoly Power is Vital to National Security
When the U.S. security state announces that Big Tech’s centralized censorship power must be preserved, we should ask what this reveals about whom this regime serves.
A group of former intelligence and national security officials on Monday issued a jointly signed letter warning that pending legislative attempts to restrict or break up the power of Big Tech monopolies — Facebook, Google, and Amazon — would jeopardize national security because, they argue, their centralized censorship power is crucial to advancing U.S. foreign policy. The majority of the letter is devoted to repeatedly invoking the grave threat allegedly posed to the U.S. by Russia, as illustrated by the invasion of Ukraine, and it repeatedly points to the dangers of Putin and the Kremlin to justify preserving Big Tech’s power in its maximalist form. Any attempt to restrict Big Tech’s monopolistic power, the letter implies, would therefore undermine the U.S. fight against Moscow. Read More
Amazon releases 51-language dataset for language understanding
MASSIVE dataset and Massively Multilingual NLU (MMNLU-22) competition and workshop will help researchers scale natural-language-understanding technology to every language on Earth.
Imagine that all people around the world could use voice AI systems such as Alexa in their native tongues.
One promising approach to realizing this vision is massively multilingual natural-language understanding (MMNLU), a paradigm in which a single machine learning model can parse and understand inputs from many typologically diverse languages. By learning a shared data representation that spans languages, the model can transfer knowledge from languages with abundant training data to those in which training data is scarce. Read More
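The transfer mechanism described above — one shared representation, so labels learned in a high-resource language apply to a low-resource one — can be sketched with a toy example. The vectors and intent names below are entirely made up for illustration; in a real MMNLU system they would come from a trained multilingual encoder:

```python
# Toy sketch of cross-lingual transfer through a shared embedding space.
# All vectors are hand-crafted stand-ins for a multilingual encoder's output,
# contrived so that translations of the same utterance land near each other.
import math

SHARED_EMBEDDINGS = {
    ("en", "play some music"):    (0.90, 0.10, 0.0),
    ("en", "what's the weather"): (0.10, 0.90, 0.0),
    ("sw", "cheza muziki"):       (0.85, 0.15, 0.0),  # Swahili: "play music"
}

# Intent centroids "learned" only from English, the high-resource language.
INTENT_CENTROIDS = {
    "PlayMusic":  (0.90, 0.10, 0.0),
    "GetWeather": (0.10, 0.90, 0.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify(lang, utterance):
    """Match any language's utterance against English-trained centroids."""
    vec = SHARED_EMBEDDINGS[(lang, utterance)]
    return max(INTENT_CENTROIDS, key=lambda i: cosine(vec, INTENT_CENTROIDS[i]))

# The Swahili utterance is classified correctly despite zero Swahili
# training examples: the knowledge transfers through the shared space.
print(classify("sw", "cheza muziki"))  # PlayMusic
```

This is the essence of why a shared representation lets abundant-data languages subsidize scarce-data ones: classification happens in the shared space, not in any one language.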
Google rolls out AI improvements to aid with Search safety and ‘personal crisis’ queries
Google today announced it will be rolling out improvements to its AI model to make Google Search a safer experience and one that’s better at handling sensitive queries, including those around topics like suicide, sexual assault, substance abuse and domestic violence. It’s also using other AI technologies to improve its ability to remove unwanted explicit or suggestive content from Search results when people aren’t specifically seeking it out.
Currently, when people search for sensitive information — like suicide, abuse or other topics — Google will display the contact information for the relevant national hotlines above its search results. But the company explains that people in crisis situations may search in all kinds of ways, and it’s not always obvious to a search engine that they’re in need, even when their queries would raise flags for a human reader. With machine learning and the latest improvements to its AI model MUM (Multitask Unified Model), Google says it will be able to automatically and more accurately detect a wider range of personal crisis searches, because MUM better understands the intent behind people’s questions and queries. Read More