Apple has announced a new feature called Live Text, which will digitize the text in all your photos. This unlocks a slew of handy functions, from turning handwritten notes into emails and messages to searching your camera roll for receipts or recipes you’ve photographed.
…Apple says the feature is enabled using “deep neural networks” and “on-device intelligence,” with the latter being the company’s preferred phrasing for machine learning. (It stresses Apple’s privacy-heavy approach to AI, which focuses on processing data on-device rather than sending it to the cloud.) Read More
From conversation to code: Microsoft introduces its first product features powered by GPT-3
At its Build developers conference, Microsoft unveiled its first features in a customer product powered by GPT-3, the powerful natural language model developed by OpenAI. The new features will help users build apps without needing to know how to write computer code or formulas.
GPT-3 will be integrated in Microsoft Power Apps, the low-code app development platform that helps everyone from people with little or no coding experience — so-called “citizen developers” — to professional developers with deep programming expertise build applications to improve business productivity or processes. Read More
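One way to picture the feature is as a few-shot prompt that asks a large language model to translate a plain-English request into a Power Fx formula. The sketch below only assembles such a prompt; the example pairs and prompt wording are illustrative assumptions, not Microsoft’s actual implementation.

```python
# Hypothetical sketch: a few-shot prompt mapping English requests to
# Power Fx formulas, in the spirit of the Power Apps feature.
FEW_SHOT_EXAMPLES = [
    ("show accounts where the name starts with Contoso",
     'Filter(Accounts, StartsWith(Name, "Contoso"))'),
    ("sort customers by creation date, newest first",
     'SortByColumns(Customers, "CreatedOn", SortOrder.Descending)'),
]

def build_prompt(request: str) -> str:
    """Assemble a few-shot prompt that an LLM would complete with a formula."""
    lines = ["Translate each request into a Power Fx formula.", ""]
    for english, formula in FEW_SHOT_EXAMPLES:
        lines.append(f"Request: {english}")
        lines.append(f"Formula: {formula}")
        lines.append("")
    lines.append(f"Request: {request}")
    lines.append("Formula:")
    return "\n".join(lines)

prompt = build_prompt("show invoices over 1000 dollars")
```

The model’s completion of the final `Formula:` line is what the user would see as a suggested formula.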
High-performance speech recognition with no supervision at all
Whether it’s giving directions, answering questions, or carrying out requests, speech recognition makes life easier in countless ways. But today the technology is available for only a small fraction of the thousands of languages spoken around the globe. This is because high-quality systems need to be trained with large amounts of transcribed speech audio. This data simply isn’t available for every language, dialect, and speaking style. Transcribed recordings of English-language novels, for example, will do little to help machines learn to understand a Basque speaker ordering food off a menu or a Tagalog speaker giving a business presentation.
This is why we developed wav2vec Unsupervised (wav2vec-U), a way to build speech recognition systems that require no transcribed data at all. It rivals the performance of the best supervised models from only a few years ago, which were trained on nearly 1,000 hours of transcribed speech. We’ve tested wav2vec-U with languages such as Swahili and Tatar, which do not currently have high-quality speech recognition models available because they lack extensive collections of labeled training data.
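One building block of this unsupervised pipeline is turning continuous audio representations into discrete, phoneme-like units by clustering them. The toy sketch below illustrates that clustering step with a tiny two-cluster k-means; real wav2vec-U features come from a pretrained wav2vec 2.0 model, and the 2-D vectors here are stand-ins.

```python
# Toy sketch: cluster audio frame representations into pseudo-phoneme units.
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans2(points, iters=10):
    """Two-cluster k-means with deterministic farthest-point initialization."""
    c0 = points[0]
    c1 = max(points, key=lambda p: dist2(p, c0))
    centroids = [c0, c1]
    for _ in range(iters):
        clusters = [[], []]
        for p in points:
            # Assign each frame vector to its nearest centroid.
            clusters[0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # Recompute centroids as cluster means.
                centroids[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    labels = [0 if dist2(p, centroids[0]) <= dist2(p, centroids[1]) else 1
              for p in points]
    return centroids, labels

# Toy "frame features": two well-separated groups -> two pseudo-phoneme units.
frames = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
centroids, units = kmeans2(frames)
```

Once frames are mapped to discrete units like these, the remaining (and much harder) step is learning a mapping from unit sequences to phoneme sequences without transcriptions, which wav2vec-U does adversarially.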
Wav2vec-U is the result of years of Facebook AI’s work in speech recognition, self-supervised learning, and unsupervised machine translation. It is an important step toward building machines that can solve a wide range of tasks just by learning from their observations. We think this work will bring us closer to a world where speech technology is available for many more people. Read More
Google’s Cinematic Moments is like Apple’s Live Photos — but a lot creepier
Google Photos will see several new features later this year — but some could be a bit creepier than others.
Arriving this summer, Cinematic Moments is a Google Photos tool that yields results similar to Apple’s Live Photos. The difference is that Cinematic Moments uses artificial intelligence (AI) to fill in the gaps between a few photos, rather than recording a short video, to produce in-motion media.
With a handful of still images, Cinematic Moments can create a complete and animated action shot. It uses neural networks to synthesize the moment, practically materializing frames out of thin air. Read More
MUM: A new AI milestone for understanding information
…People issue eight queries on average for complex tasks. Today’s search engines aren’t quite sophisticated enough to answer the way an expert would. But with a new technology called Multitask Unified Model, or MUM, we’re getting closer to helping you with these types of complex needs. So in the future, you’ll need fewer searches to get things done.
MUM has the potential to transform how Google helps you with complex tasks. Like BERT, MUM is built on a Transformer architecture, but it’s 1,000 times more powerful. MUM not only understands language, but also generates it. It’s trained across 75 different languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge than previous models. And MUM is multimodal, so it understands information across text and images and, in the future, can expand to more modalities like video and audio. Read More
Watson Orchestrate
Watson Orchestrate gives you interactive AI, in tools like email and Slack, to increase your productivity. This isn’t a static bot programmed by IT. You initiate work in natural language, and Watson Orchestrate uses a powerful AI engine to combine pre-packaged skills, on the fly and in context, based on organizational knowledge and your prior interactions. Read More
Facebook details self-supervised AI that can segment images and videos
Facebook today announced that it developed an algorithm in collaboration with Inria called DINO that enables the training of transformers, a type of machine learning model, without labeled training data. The company claims DINO sets a new state of the art among unlabeled data training methods and leads to a model that can discover and segment objects in an image or video without a specific objective.
Segmenting objects is used in tasks ranging from swapping out the background of a video chat to teaching robots to navigate a factory floor. But it’s considered among the hardest challenges in computer vision because it requires an AI to understand what’s in an image. Read More
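At the heart of DINO is self-distillation: a “student” network is trained to match the centered, sharpened softmax output of a “teacher” whose weights are an exponential moving average of the student’s. The sketch below shows that objective on raw logit vectors; real DINO applies it to Vision Transformer outputs over image crops, so the numbers here are placeholders.

```python
# Hedged sketch of the DINO self-distillation objective.
import math

def softmax(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def dino_loss(student_logits, teacher_logits, center,
              t_student=0.1, t_teacher=0.04):
    # Teacher targets are centered (to avoid collapse) and sharpened
    # with a lower temperature than the student.
    targets = softmax([t - c for t, c in zip(teacher_logits, center)], t_teacher)
    probs = softmax(student_logits, t_student)
    # Cross-entropy between teacher targets and student predictions.
    return -sum(t * math.log(p) for t, p in zip(targets, probs))

def ema_update(teacher_w, student_w, momentum=0.996):
    """Teacher weights track the student via an exponential moving average."""
    return [momentum * tw + (1 - momentum) * sw
            for tw, sw in zip(teacher_w, student_w)]

loss = dino_loss([2.0, 0.5, 0.1], [2.2, 0.4, 0.0], center=[0.1, 0.1, 0.1])
```

Because the teacher is never trained directly, no labels are needed anywhere in the loop, which is what makes the method self-supervised.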
Microsoft details the latest developments in machine learning at GTC 21
With the rapid pace of change taking place in AI and machine learning technology, it’s no surprise Microsoft had its usual strong presence at this year’s Nvidia GTC event.
Representatives of the company shared their latest machine learning innovations in multiple sessions, covering inferencing at scale, a new capability to train machine learning models across hybrid environments, and the debut of the new PyTorch Profiler that will help data scientists be more efficient when they’re analyzing and troubleshooting ML performance issues.
In all three cases, Microsoft has paired its own technologies, like Azure, with open source tools and Nvidia’s GPU hardware and technologies to create these powerful new innovations. Read More
The Limits of Political Debate
I.B.M. taught a machine to debate policy questions. What can it teach us about the limits of rhetorical persuasion?
We need A.I. to be more like a machine, supplying troves of usefully organized information. It can leave the bullshitting to us.
In February, 2011, an Israeli computer scientist named Noam Slonim proposed building a machine that would be better than people at something that seems inextricably human: arguing about politics. …In February, 2019, the machine had its first major public debate, hosted by Intelligence Squared, in San Francisco. The opponent was Harish Natarajan, a thirty-one-year-old British economic consultant, who, a few years earlier, had been the runner-up in the World Universities Debating Championship. The machine lost.
As Arthur Applbaum, a political philosopher who is the Adams Professor of Political Leadership and Democratic Values at Harvard’s Kennedy School, saw it, the particular adversarial format chosen for this debate had the effect of elevating technical questions and obscuring ethical ones. The audience had voted Natarajan the winner of the debate. But, Applbaum asked, what had his argument consisted of? “He rolled out standard objections: it’s not going to work in practice, and it will be wasteful, and there will be unintended consequences. If you go through Harish’s argument line by line, there’s almost no there there,” he said. Natarajan’s way of defeating the computer, at some level, had been to take a policy question and strip it of all its meaningful specifics. “It’s not his fault,” Applbaum said. There was no way that he could match the computer’s fact-finding. “So, instead, he bullshat.” Read More
Google starts trialing its FLoC cookie alternative in Chrome
Google today announced that it is rolling out Federated Learning of Cohorts (FLoC), a crucial part of its Privacy Sandbox project for Chrome, as a developer origin trial.
FLoC is meant to be an alternative to the kind of cookies that advertising technology companies use today to track you across the web. Instead of a personally identifiable cookie, FLoC runs locally and analyzes your browsing behavior to group you into a cohort of like-minded people with similar interests (and doesn’t share your browsing history with Google). That cohort is specific enough to allow advertisers to do their thing and show you relevant ads, but without being so specific as to allow marketers to identify you personally.
This “interest-based advertising,” as Google likes to call it, allows you to hide within a crowd of users with similar interests. The browser exposes only a cohort ID; your browsing history and other data stay local. Read More
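In the origin trial, Chrome derived the cohort ID with a SimHash of the user’s browsing history, computed entirely on-device, so that people with similar histories land in the same or nearby cohorts. The sketch below shows a simplified SimHash over visited domains; the bit width, hash function, and the final clustering of fingerprints into cohorts are simplified assumptions.

```python
# Simplified SimHash sketch: similar browsing histories -> similar cohort IDs.
import hashlib

def simhash(domains, bits=8):
    """Combine per-domain hashes into one similarity-preserving fingerprint."""
    totals = [0] * bits
    for domain in domains:
        h = int(hashlib.sha256(domain.encode()).hexdigest(), 16)
        for i in range(bits):
            # Each domain votes +1/-1 on every bit of the fingerprint.
            totals[i] += 1 if (h >> i) & 1 else -1
    # Bits with a positive vote total are set in the final ID.
    return sum((1 << i) for i, t in enumerate(totals) if t > 0)

# Identical histories always map to the same cohort ID, and the raw
# history itself never needs to leave the device.
cohort_a = simhash(["news.example", "cooking.example", "travel.example"])
cohort_b = simhash(["news.example", "cooking.example", "travel.example"])
```

The key privacy property is that the mapping is many-to-one: an advertiser sees only the small cohort ID, which thousands of other users share.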