The buzz-term Web3 is trending quickly. The current reality of Web3 is this: most people have absolutely no idea what it means; large corporations and political institutions are playing public relations games with the term; and technical people are having a terrible time defining it.
We are in limbo somewhere between “wtf is Web3?” and “let’s make sure we both have the same definition of Web3…” I’ve been stumped by both the former and the latter.
… There are large corporate, institutional, and political players trying to co-opt the term Web3. Management consulting firms like McKinsey and Deloitte need a new term that can be synonymous with the next wave of Internet innovation consulting services.
… On the surface, the ambiguity and lack of consensus around Web3 are all a bit silly. However, beneath the surface of the non-specific, insecure, and postural Web3 narratives lies a fascinating set of concepts and innovations that are 1) exposing a new ‘access layer’ of distributed Internet-based applications, and 2) growing into an absolutely dissonant threat to the dominant order of our existing monolithic financial and political institutions.
AI that understands speech by looking as well as hearing
People use AI for a wide range of speech recognition and understanding tasks, from enabling smart speakers to developing tools for people who are hard of hearing or who have speech impairments. But oftentimes these speech understanding systems don’t work well in the everyday situations where we need them most: when multiple people are speaking simultaneously or when there’s lots of background noise. Even sophisticated noise-suppression techniques are often no match for, say, the sound of the ocean during a family beach trip or the background chatter of a bustling street market.
One reason why people can understand speech better than AI in these instances is that we use not just our ears but also our eyes. We might see someone’s mouth moving and intuitively know the voice we’re hearing must be coming from her, for example. That’s why Meta AI is working on new conversational AI systems that can recognize the nuanced correlations between what they see and what they hear in conversation, like we do.
To help us build these more versatile and robust speech recognition tools, we are announcing Audio-Visual Hidden Unit BERT (AV-HuBERT), a state-of-the-art self-supervised framework for understanding speech that learns by both seeing and hearing people speak. It is the first system to jointly model speech and lip movements from unlabeled data — raw video that has not already been transcribed.
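For readers who want a concrete picture of the idea, here is a minimal, hypothetical sketch of masked audio-visual prediction in PyTorch. It is not Meta's AV-HuBERT implementation; the module names, feature dimensions, and the pre-computed cluster labels are illustrative assumptions. What it shows is the core mechanism the post describes: per-frame audio and lip-video features are fused, some frames are masked, and a transformer is trained to predict discrete pseudo-labels for the masked frames from the surrounding audio-visual context.

```python
# Hypothetical sketch of HuBERT-style masked audio-visual prediction.
# Not Meta's AV-HuBERT code: dimensions, names, and the clustering step
# that produces cluster_labels are illustrative assumptions.
import torch
import torch.nn as nn


class AudioVisualMaskedPredictor(nn.Module):
    def __init__(self, audio_dim=104, video_dim=512, hidden=768,
                 num_clusters=500, num_layers=6, num_heads=8):
        super().__init__()
        # Project each modality's per-frame features into a shared space.
        self.audio_proj = nn.Linear(audio_dim, hidden)
        self.video_proj = nn.Linear(video_dim, hidden)
        # Learned embedding that replaces the input at masked frames.
        self.mask_embed = nn.Parameter(torch.zeros(hidden))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Classify each frame into one of the pre-computed clusters.
        self.head = nn.Linear(hidden, num_clusters)

    def forward(self, audio_feats, video_feats, mask):
        # audio_feats: (B, T, audio_dim); video_feats: (B, T, video_dim)
        # mask: (B, T) boolean, True where the frame is hidden from the model.
        fused = self.audio_proj(audio_feats) + self.video_proj(video_feats)
        fused = torch.where(mask.unsqueeze(-1),
                            self.mask_embed.expand_as(fused), fused)
        encoded = self.encoder(fused)
        return self.head(encoded)  # (B, T, num_clusters) logits


def training_step(model, audio_feats, video_feats, cluster_labels, mask):
    # The loss is computed only at masked frames, so the model must infer
    # the hidden speech content from the surrounding audio and lip movements.
    logits = model(audio_feats, video_feats, mask)
    return nn.functional.cross_entropy(logits[mask], cluster_labels[mask])
```

Roughly speaking, the pseudo-labels in this kind of setup come from clustering features of the raw data rather than from human transcriptions, which is what makes the training self-supervised: the video never needs to be transcribed for the model to learn useful audio-visual speech representations.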