Empowering end-users to interactively teach robots to perform novel tasks is a crucial capability for their successful integration into real-world applications. For example, a user may want to teach a robot dog to perform a new trick, or teach a manipulator robot how to organize a lunch box based on user preferences. The recent advancements in large language models (LLMs) pre-trained on extensive internet data have shown a promising path towards achieving this goal. Indeed, researchers have explored diverse ways of leveraging LLMs for robotics, from step-by-step planning and goal-oriented dialogue to robot-code-writing agents.
While these methods impart new modes of compositional generalization, they focus on using language to link together new behaviors from an existing library of control primitives that are either manually engineered or learned a priori. Despite having internal knowledge about robot motions, LLMs struggle to directly output low-level robot commands due to the limited availability of relevant training data. As a result, the expressiveness of these methods is bottlenecked by the breadth of the available primitives, whose design often requires extensive expert knowledge or massive data collection.
In “Language to Rewards for Robotic Skill Synthesis”, we propose an approach to enable users to teach robots novel actions through natural language input. — Read More
Introducing IDEFICS: An Open Reproduction of State-of-the-Art Visual Language Model
We are excited to release IDEFICS (Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS), an open-access visual language model. IDEFICS is based on Flamingo, a state-of-the-art visual language model initially developed by DeepMind, which has not been released publicly. Similarly to GPT-4, the model accepts arbitrary sequences of image and text inputs and produces text outputs. IDEFICS is built solely on publicly available data and models (LLaMA v1 and OpenCLIP) and comes in two variants—the base version and the instructed version. Each variant is available at the 9 billion and 80 billion parameter sizes. — Read More
Artist-created images and animations about artificial intelligence (AI) made freely available online
What does artificial intelligence (AI) look like? Searching online, the answer is likely streams of code, glowing blue brains or white robots with men in suits.
… Since launching, Visualising AI has commissioned 13 artists to create more than 100 artworks, which have gained over 100 million views and 800,000 downloads; our imagery has been used by media outlets, research and civil society organisations. — Read More
View images on Unsplash
View videos on Pexels
Google and YouTube are trying to have it both ways with AI and copyright
Google has made clear it is going to use the open web to inform and create anything it wants, and nothing can get in its way. Except maybe Frank Sinatra.
There’s only one name that springs to mind when you think of the cutting edge in copyright law online: Frank Sinatra.
There’s nothing more important than making sure his estate — and his label, Universal Music Group — gets paid when people do AI versions of Ol’ Blue Eyes singing “Get Low” on YouTube, right? Even if that means creating an entirely new class of extralegal contractual royalties for big music labels just to protect the online dominance of your video platform while simultaneously insisting that training AI search results on books and news websites without paying anyone is permissible fair use? Right? Right? — Read More
The human costs of the AI boom
If you use apps from world-leading technology companies such as OpenAI, Amazon, Microsoft or Google, there is a good chance you have already consumed services produced by online remote work — also known as cloudwork. Big and small organizations across the economy increasingly rely on outsourced labor available to them via platforms like Scale AI, Freelancer.com, Amazon Mechanical Turk, Fiverr and Upwork.
Recently, these platforms have become crucial for artificial intelligence (AI) companies to train their AI systems and ensure they operate correctly. OpenAI is a client of Scale AI and Remotasks, labeling data for their apps ChatGPT and DALL-E. Social networks hire platforms for content moderation. Beyond the tech world, universities, businesses and NGOs (nongovernmental organizations) regularly use these platforms to hire translators, graphic designers or IT experts.
Cloudwork platforms have become an essential earning opportunity for a rising number of people. A breakout study by the University of Oxford scholars Otto Kässi, Vili Lehdonvirta and Fabian Stephany estimated that more than 163 million people have registered on those websites. — Read More
This AI Watches Millions Of Cars Daily And Tells Cops If You’re Driving Like A Criminal
Artificial intelligence is helping American cops look for “suspicious” patterns of movement, digging through license plate databases with billions of records. A drug trafficking case in New York has uncloaked — and challenged — one of the biggest rollouts of the controversial technology to date.
In March 2022, David Zayas was driving down the Hutchinson River Parkway in Scarsdale. His car, a gray Chevrolet, was entirely unremarkable, as was its speed. But to the Westchester County Police Department, the car was cause for concern and Zayas a possible criminal; its powerful new AI tool had identified the vehicle’s behavior as suspicious.
Searching through a database of 1.6 billion license plate records collected over the last two years from locations across New York State, the AI determined that Zayas’ car was on a journey typical of a drug trafficker. — Read More
Introducing SeamlessM4T, a Multimodal AI Model for Speech and Text Translations
The world we live in has never been more interconnected, giving people access to more multilingual content than ever before. This also makes the ability to communicate and understand information in any language increasingly important.
Today, we’re introducing SeamlessM4T, the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages. — Read More
Stable Diffusion XL 1.0-base
SDXL uses an ensemble-of-experts pipeline for latent diffusion: in the first step, the base model generates (noisy) latents, which are then further processed by a refinement model (available here: https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0/) specialized for the final denoising steps. Note that the base model can also be used as a standalone module.
Alternatively, we can use a two-stage pipeline as follows: First, the base model is used to generate latents of the desired output size. In the second step, we use a specialized high-resolution model and apply a technique called SDEdit (https://arxiv.org/abs/2108.01073, also known as “img2img”) to the latents generated in the first step, using the same prompt. This technique is slightly slower than the first one, as it requires more function evaluations. — Read More
Source code is available at https://github.com/Stability-AI/generative-models.
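The base-to-refiner handoff described above can be sketched structurally. The snippet below is a toy illustration only: the two "denoisers" are arithmetic stand-ins, and the function names and the 80/20 step split are illustrative assumptions, not the actual SDXL API (in practice one would use the released model weights and pipeline code).

```python
# Toy sketch of the SDXL ensemble-of-experts handoff: a "base" expert
# handles the first fraction of denoising steps, then hands its
# still-noisy latents to a "refiner" expert for the final steps.
# All names, factors, and the 0.8 handoff point are illustrative.

def base_denoise(latents, steps, handoff=0.8):
    """Run the base expert over the first `handoff` fraction of steps."""
    for _ in range(int(steps * handoff)):
        latents = latents * 0.9  # stand-in for one denoising update
    return latents  # intentionally still noisy; the refiner finishes

def refiner_denoise(latents, steps, handoff=0.8):
    """Run the refiner expert over the remaining steps."""
    for _ in range(steps - int(steps * handoff)):
        latents = latents * 0.5  # stand-in for a specialized final update
    return latents

noisy_latents = 1.0  # stand-in for initial Gaussian latents
partial = base_denoise(noisy_latents, steps=10)
final_latents = refiner_denoise(partial, steps=10)
print(final_latents)
```

The point of the structure is that the refiner never starts from pure noise: it only ever sees latents the base model has already partially denoised, which is why it can specialize in the final, high-detail steps.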
Reinforced Self-Training (ReST) for Language Modeling
Reinforcement learning from human feedback (RLHF) can improve the quality of large language model (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences, inspired by growing-batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy via offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks, in a compute- and sample-efficient manner. — Read More
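The grow/improve loop the abstract describes can be sketched with toy components. This is a minimal structural illustration, not the paper's method: the "policy" below samples numbers, the "reward" prefers large ones, and the improve step is a simple filtered average, all hypothetical stand-ins for an LLM policy, a learned reward model, and an offline fine-tuning update.

```python
import random

random.seed(0)

def sample_from_policy(policy_mean, n_samples):
    """Grow step: build a dataset by sampling from the current policy."""
    return [random.gauss(policy_mean, 1.0) for _ in range(n_samples)]

def reward(sample):
    """Score a sample (stand-in for a learned reward model)."""
    return sample  # in this toy, larger is simply better

def improve(policy_mean, dataset, threshold):
    """Improve step: refit the policy on reward-filtered samples, offline."""
    kept = [s for s in dataset if reward(s) >= threshold]
    if not kept:
        return policy_mean
    return sum(kept) / len(kept)  # "fine-tune" = move toward kept samples

def rest(policy_mean=0.0, grow_steps=3, n_samples=200):
    threshold = 0.0
    for _ in range(grow_steps):
        # Grow: an offline dataset sampled from the current policy.
        dataset = sample_from_policy(policy_mean, n_samples)
        # The same dataset is reused across Improve steps with a rising
        # filtering threshold -- the source of ReST's sample efficiency.
        for _ in range(2):
            policy_mean = improve(policy_mean, dataset, threshold)
            threshold += 0.5
    return policy_mean

final_mean = rest()
print(final_mean)
```

Because each Improve step keeps only samples scoring above an increasing threshold, the toy policy's mean ratchets upward across grow steps, mirroring how ReST iteratively shifts the LLM toward higher-reward outputs without fresh online sampling during improvement.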
Is the AI boom already over?
Generative AI tools are generating less interest than just a few months ago.
When generative AI products started rolling out to the general public last year, it kicked off a frenzy of excitement and fear.
People were amazed at the images and words these tools could create from just a single text prompt. Silicon Valley salivated over the prospect of a transformative new technology, one that it could make a lot of money off of after years of stagnation and the flops of crypto and the metaverse. And then there were the concerns about what the world would be after generative AI transformed it. Millions of jobs could be lost. It might become impossible to tell what was real or what was made by a computer. And if you want to get really dramatic about it, the end of humanity may be near. We glorified and dreaded the incredible potential this technology had. — Read More