Speech brain–computer interfaces (BCIs) have the potential to restore rapid communication to people with paralysis by decoding neural activity evoked by attempted speech into text[1,2] or sound[3,4]. Early demonstrations, although promising, have not yet achieved accuracies sufficiently high for communication of unconstrained sentences from a large vocabulary[1,2,3,4,5,6,7]. Here we demonstrate a speech-to-text BCI that records spiking activity from intracortical microelectrode arrays. Enabled by these high-resolution recordings, our study participant—who can no longer speak intelligibly owing to amyotrophic lateral sclerosis—achieved a 9.1% word error rate on a 50-word vocabulary (2.7 times fewer errors than the previous state-of-the-art speech BCI[2]) and a 23.8% word error rate on a 125,000-word vocabulary (the first successful demonstration, to our knowledge, of large-vocabulary decoding). Our participant’s attempted speech was decoded at 62 words per minute, which is 3.4 times as fast as the previous record[8] and begins to approach the speed of natural conversation (160 words per minute[9]). Finally, we highlight two aspects of the neural code for speech that are encouraging for speech BCIs: spatially intermixed tuning to speech articulators that makes accurate decoding possible from only a small region of cortex, and a detailed articulatory representation of phonemes that persists years after paralysis. These results show a feasible path forward for restoring rapid communication to people with paralysis who can no longer speak. — Read More
Analyzing an Expert Proposal for China’s Artificial Intelligence Law
A few months after the introduction of OpenAI’s ChatGPT captured imaginations around the world, China’s State Council quietly announced that it would work toward drafting an Artificial Intelligence Law. The government had already acted relatively quickly, drafting, significantly revising and, on August 15, implementing rules on generative AI that build on existing laws. Still, broader questions about AI’s role in society remain, and the May announcement signaled that more holistic legislative thinking was on the horizon.
… In the case of this scholars’ draft of an AI Law, the accompanying explanation notes that it is to serve as a reference for legislative work and is expected to be revised in a 2.0 version. Although the connection between this text and any eventual Chinese AI Law is uncertain, its publication by a team led by Zhou Hui, deputy director of the CASS Cyber and Information Law Research Office and chair of a research project on AI ethics and regulation, makes it an early indication of how some influential policy thinkers are approaching the State Council-announced AI Law effort.
We invited DigiChina community members to share their analysis of the scholars’ draft, a translation of which was led by Concordia AI and is published here. Their responses are below. — Read More
FlexiViT: One Model for All Patch Sizes
Vision Transformers convert images to sequences by slicing them into patches. The size of these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher accuracy at greater computational cost, but changing the patch size typically requires retraining the model. In this paper, we demonstrate that simply randomizing the patch size at training time leads to a single set of weights that performs well across a wide range of patch sizes, making it possible to tailor the model to different compute budgets at deployment time. We extensively evaluate the resulting model, which we call FlexiViT, on a wide range of tasks, including classification, image-text retrieval, open-world detection, panoptic segmentation, and semantic segmentation, concluding that it usually matches, and sometimes outperforms, standard ViT models trained at a single patch size in an otherwise identical setup. Hence, FlexiViT training is a simple drop-in improvement for ViT that makes it easy to add compute-adaptive capabilities to most models relying on a ViT backbone architecture. Code and pre-trained models are available at this https URL — Read More
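The training trick is simple enough to sketch. Below is a toy illustration of the idea, not the released FlexiViT code: sample a patch size at each step and resize one stored patch-embedding kernel and position-embedding grid to match. Plain bilinear interpolation stands in for the paper's pseudo-inverse resize, and all sizes and names are illustrative.

```python
# Toy sketch of the FlexiViT training trick (not the released code): sample a
# patch size each step and resize one stored patch-embedding kernel and
# position-embedding grid to match. Plain bilinear interpolation stands in
# for the paper's pseudo-inverse resize; all sizes here are illustrative.
import random
import torch
import torch.nn.functional as F

IMG = 240                                   # image side, divisible by every patch size below
PATCH_SIZES = [8, 10, 12, 15, 16, 20, 24, 30, 40, 48]
DIM = 256                                   # token embedding width
BASE_PATCH, BASE_GRID = 32, 7               # resolution at which the weights are stored

patch_kernel = torch.randn(DIM, 3, BASE_PATCH, BASE_PATCH, requires_grad=True)
pos_embed = torch.randn(1, DIM, BASE_GRID, BASE_GRID, requires_grad=True)

def embed(images: torch.Tensor, patch: int) -> torch.Tensor:
    """Patchify (B, 3, IMG, IMG) images with the sampled patch size."""
    k = F.interpolate(patch_kernel, size=(patch, patch), mode="bilinear", align_corners=False)
    grid = IMG // patch
    p = F.interpolate(pos_embed, size=(grid, grid), mode="bilinear", align_corners=False)
    tokens = F.conv2d(images, k, stride=patch) + p           # (B, DIM, grid, grid)
    return tokens.flatten(2).transpose(1, 2)                 # (B, grid*grid, DIM)

# Training-loop skeleton: a different patch size every step, same weights.
for step in range(3):
    patch = random.choice(PATCH_SIZES)
    tokens = embed(torch.randn(2, 3, IMG, IMG), patch)       # feed into a standard ViT encoder
    print(step, patch, tokens.shape)
```

At deployment the same stored weights are resized once to whichever patch size fits the available compute budget.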
This paper convinced me LLMs are not just “applied statistics”, but learn world models and structure
You can look at an LLM trained on Othello moves and extract from its internal state the current state of the board after each move you tell it (https://thegradient.pub/othello/). In other words, an LLM trained only on moves, like “E3, D3, …”, contains within it a model of an 8×8 board grid and the current state of each square. — Read More
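The underlying technique is probing: train a small classifier to predict each square's state from the network's hidden activations after every move. The sketch below shows that recipe in outline only, not the authors' code; random placeholder arrays stand in for the real activations and game-simulator labels, and a linear probe is used here even though the original work relied on small nonlinear probes.

```python
# Illustrative sketch of the probing recipe, not the authors' code. In the
# real experiment the features are Othello-GPT's hidden states after each
# move and the labels come from a game simulator; random placeholder arrays
# stand in for both here, so the printed accuracy will sit at chance level.
import numpy as np
from sklearn.linear_model import LogisticRegression

N_MOVES, D_MODEL, N_SQUARES = 2000, 128, 64        # 8x8 board = 64 squares
rng = np.random.default_rng(0)

hidden_states = rng.normal(size=(N_MOVES, D_MODEL))            # stand-in activations
board_labels = rng.integers(0, 3, size=(N_MOVES, N_SQUARES))   # 0=empty, 1=black, 2=white

# One probe per square: if the board state is recoverable from the
# activations, held-out probe accuracy is far above the ~0.33 chance level.
accuracies = []
for sq in range(N_SQUARES):
    probe = LogisticRegression(max_iter=1000)
    probe.fit(hidden_states[:1600], board_labels[:1600, sq])
    accuracies.append(probe.score(hidden_states[1600:], board_labels[1600:, sq]))

print(f"mean held-out probe accuracy across squares: {np.mean(accuracies):.2f}")
```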
Exploring Artificial Intelligence’s Potential & Threats | Andrew Ng | Eye on AI #131
Language to rewards for robotic skill synthesis
Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents.
While these methods impart new modes of compositional generalization, they focus on using language to link together new behaviors from an existing library of control primitives that are either manually engineered or learned a priori. Despite having internal knowledge about robot motions, LLMs struggle to directly output low-level robot commands due to the limited availability of relevant training data. As a result, the expressiveness of these methods is bottlenecked by the breadth of the available primitives, the design of which often requires extensive expert knowledge or massive data collection.
In “Language to Rewards for Robotic Skill Synthesis”, we propose an approach to enable users to teach robots novel actions through natural language input. — Read More
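The post describes a two-stage pipeline: a Reward Translator (an LLM) turns the user's sentence into parameters of a reward function, and a Motion Controller (MuJoCo MPC in the paper) optimizes the robot's motion against that reward. The sketch below mirrors that structure only loosely; the LLM call is stubbed out with a fixed mapping, the optimizer is omitted, and the reward terms and parameter names are invented for illustration.

```python
# Rough sketch of the two-stage pattern described above, not Google's
# implementation. The Reward Translator (an LLM) is stubbed out with a fixed
# mapping, the Motion Controller (MuJoCo MPC in the paper) is omitted, and
# the reward terms and parameter names are invented for illustration.
from dataclasses import dataclass

@dataclass
class RewardSpec:
    target_torso_height: float   # metres
    target_forward_speed: float  # metres / second
    upright_weight: float        # how strongly to penalize tilting

def reward_translator(instruction: str) -> RewardSpec:
    """Stand-in for the LLM stage that maps language to reward parameters."""
    # e.g. "make the robot dog stand up on its hind legs" -> tall torso,
    # no forward motion, strong upright term.
    return RewardSpec(target_torso_height=0.5, target_forward_speed=0.0, upright_weight=2.0)

def reward(state: dict, spec: RewardSpec) -> float:
    """Dense reward that the low-level motion optimizer would maximize."""
    height_term = -(state["torso_height"] - spec.target_torso_height) ** 2
    speed_term = -(state["forward_speed"] - spec.target_forward_speed) ** 2
    upright_term = -spec.upright_weight * (1.0 - state["torso_upright"]) ** 2
    return height_term + speed_term + upright_term

spec = reward_translator("make the robot dog stand up on its hind legs")
print(reward({"torso_height": 0.45, "forward_speed": 0.1, "torso_upright": 0.9}, spec))
```

The point of the split is that the LLM only has to write or parameterize a reward, which plays to its strengths, while the low-level commands it struggles to produce directly are left to the optimizer.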
Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model
We are excited to release IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-access visual language model. IDEFICS is based on Flamingo, a state-of-the-art visual language model initially developed by DeepMind, which has not been released publicly. Similarly to GPT-4, the model accepts arbitrary sequences of image and text inputs and produces text outputs. IDEFICS is built solely on publicly available data and models (LLaMA v1 and OpenCLIP) and comes in two variants—the base version and the instructed version. Each variant is available at the 9 billion and 80 billion parameter sizes. — Read More
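For readers who want to try it, here is a minimal usage sketch, assuming the IDEFICS integration in Hugging Face transformers (IdeficsForVisionText2Text and AutoProcessor); the image path and prompt are placeholders, and the model card's recommended prompting format should be treated as authoritative.

```python
# Minimal usage sketch, assuming the IDEFICS integration in Hugging Face
# transformers (IdeficsForVisionText2Text + AutoProcessor). The image path
# and prompt are placeholders; see the model card for the recommended
# prompting format, especially for the instructed variant.
import torch
from PIL import Image
from transformers import AutoProcessor, IdeficsForVisionText2Text

checkpoint = "HuggingFaceM4/idefics-9b-instruct"   # an 80b variant is also released
processor = AutoProcessor.from_pretrained(checkpoint)
model = IdeficsForVisionText2Text.from_pretrained(
    checkpoint, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompts are arbitrary interleavings of text and images, as described above.
prompts = [
    [
        "User: What is in this image?",
        Image.open("some_local_image.jpg"),   # placeholder: any local image
        "<end_of_utterance>",
        "\nAssistant:",
    ],
]

inputs = processor(prompts, return_tensors="pt").to(model.device)
generated_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```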
Artist-created images and animations about artificial intelligence (AI) made freely available online
What does artificial intelligence (AI) look like? Search online, and the answer is likely to be streams of code, glowing blue brains or white robots with men in suits.
… Since launching, Visualising AI has commissioned 13 artists to create more than 100 artworks, which have gained over 100 million views and 800,000 downloads, and our imagery has been used by media outlets, research organisations and civil society organisations. — Read More
View images on Unsplash
View videos on Pexels
Google and YouTube are trying to have it both ways with AI and copyright
Google has made clear it is going to use the open web to inform and create anything it wants, and nothing can get in its way. Except maybe Frank Sinatra.
There’s only one name that springs to mind when you think of the cutting edge in copyright law online: Frank Sinatra.
There’s nothing more important than making sure his estate — and his label, Universal Music Group — gets paid when people do AI versions of Ol’ Blue Eyes singing “Get Low” on YouTube, right? Even if that means creating an entirely new class of extralegal contractual royalties for big music labels just to protect the online dominance of your video platform while simultaneously insisting that training AI search results on books and news websites without paying anyone is permissible fair use? Right? Right? — Read More
The human costs of the AI boom
If you use apps from world-leading technology companies such as OpenAI, Amazon, Microsoft or Google, there is a big chance you have already consumed services produced by online remote work — also known as cloudwork. Big and small organizations across the economy increasingly rely on outsourced labor available to them via platforms like Scale AI, Freelancer.com, Amazon Mechanical Turk, Fiverr and Upwork.
Recently, these platforms have become crucial for artificial intelligence (AI) companies to train their AI systems and ensure they operate correctly. OpenAI is a client of Scale AI and Remotasks, which label data for its apps ChatGPT and DALL-E. Social networks hire platforms for content moderation. Beyond the tech world, universities, businesses and NGOs (nongovernmental organizations) regularly use these platforms to hire translators, graphic designers or IT experts.
Cloudwork platforms have become an essential earning opportunity for a rising number of people. A breakout study by the University of Oxford scholars Otto Kässi, Vili Lehdonvirta and Fabian Stephany estimated that more than 163 million people have registered on those websites. — Read More