Meta’s next big thing is open source ‘artificial general intelligence’

Meta, formerly known as Facebook, is restructuring its artificial intelligence (AI) research teams to create artificial general intelligence (AGI), a form of AI that can match or surpass human intelligence. Mark Zuckerberg, the CEO of Meta, said the reorganization would help the company “speed up” its research and enhance the metaverse, the virtual world that he envisions as the future of social interaction.

Meta currently has two separate teams working on AI research: the Fundamental AI Research (FAIR) team, which was established in 2013, and a team dedicated to creating generative AI experiences for the users of its apps. Zuckerberg said the company would bring the two teams “closer together” as it plans to expand both groups.  – Read More

#big7

OpenVoice: Versatile Instant Voice Cloning

We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are not directly copied from, nor constrained by, the style of the reference speaker; previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require an extensive massive-speaker multi-lingual (MSML) dataset covering all target languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs, even those with inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible, and we provide qualitative results on our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.  – Read More
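
The two properties above follow from OpenVoice's decoupled design: a base speaker model handles text, language, and controllable style, and a separate tone-color converter imposes the reference speaker's timbre afterward. The sketch below summarizes that pipeline under stated assumptions; all function names are illustrative stand-ins, not the project's actual API (see the released source code for the real interface).

```python
import numpy as np

def base_tts(text: str, language: str, style: dict) -> np.ndarray:
    """Stage 1 (stub): synthesize speech in a generic base voice.
    `style` carries the controllable attributes (emotion, accent, rhythm,
    pauses, intonation), independent of the speaker being cloned."""
    raise NotImplementedError  # stand-in for the base speaker TTS model

def extract_tone_color(reference_clip: np.ndarray) -> np.ndarray:
    """Stub: embed the reference speaker's timbre from a short clip."""
    raise NotImplementedError  # stand-in for the tone-color extractor

def convert_tone_color(audio: np.ndarray, tone: np.ndarray) -> np.ndarray:
    """Stage 2 (stub): re-voice the base audio with the target timbre,
    leaving content, style, and language untouched."""
    raise NotImplementedError  # stand-in for the converter model

def clone(text: str, reference_clip: np.ndarray, language: str = "es",
          style: dict | None = None) -> np.ndarray:
    # Style and language are chosen independently of the reference clip,
    # which is why cloning into a language absent from the massive-speaker
    # training data still works: only timbre is taken from the reference.
    style = style or {"emotion": "cheerful", "speed": 1.0}
    base_audio = base_tts(text, language, style)
    tone = extract_tone_color(reference_clip)
    return convert_tone_color(base_audio, tone)
```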

#nlp, #audio

DeepMind AI solves hard geometry problems from mathematics olympiad

AlphaGeometry scores almost as well as the best students on geometry questions from the International Mathematical Olympiad

An AI from Google DeepMind can solve some International Mathematical Olympiad (IMO) questions on geometry almost as well as the best human contestants.

“The results of AlphaGeometry are stunning and breathtaking,” says Gregor Dolinar, the IMO president. “It seems that AI will win the IMO gold medal much sooner than was thought even a few months ago.”

The IMO, aimed at secondary school students, is one of the most difficult maths competitions in the world. Answering questions correctly requires mathematical creativity that AI systems have long struggled with.  – Read More

#human

Get Ready for the Great AI Disappointment

Rose-tinted predictions for artificial intelligence’s grand achievements will be swept aside by underwhelming performance and dangerous results.

In the decades to come, 2023 may be remembered as the year of generative AI hype, when ChatGPT became arguably the fastest-spreading new technology in human history and expectations of AI-powered riches became commonplace. The year 2024 will be the time for recalibrating expectations.

Of course, generative AI is an impressive technology, and it provides tremendous opportunities for improving productivity in a number of tasks. But because the hype has gone so far ahead of reality, the setbacks of the technology in 2024 will be more memorable.  – Read More

#strategy

In the race for AI supremacy, China and the US are travelling on entirely different tracks

Of the many events that stood out in online discussions across Chinese social media in 2023, it's perhaps the rise of ChatGPT that will prove the most significant.

Although the chatbot made by the US-based OpenAI was officially launched in late 2022, it took until 2023 for its unprecedented growth to raise eyebrows in China, where the government has set the goal of becoming the global AI leader by 2030.

… It took months for China to launch its own alternatives: models that seemed to lag behind their western counterparts in multiple ways. Even the minister of science and technology acknowledged that China’s chatbots were struggling against their US competition, and Chinese internet users were left asking why, given that China was meant to dominate the AI era.  – Read More

#china-vs-us

Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive

A new study finds disturbing and pervasive errors across three popular models on a wide range of legal tasks.

In May of last year, a Manhattan lawyer became famous for all the wrong reasons when he submitted a legal brief generated largely by ChatGPT. The judge did not take kindly to the submission. Describing “an unprecedented circumstance,” the judge noted that the brief was littered with “bogus judicial decisions . . . bogus quotes and bogus internal citations.” The story of the “ChatGPT lawyer” went viral as a New York Times story, prompting none other than Chief Justice John Roberts to lament the role of “hallucinations” of large language models (LLMs) in his annual report on the federal judiciary.

Yet how prevalent are such legal hallucinations, really?   – Read More

#legal

George Carlin is coming back to life in new AI-generated comedy special

George Carlin’s family is pushing back against a new AI-generated comedy special that claims to bring the legend’s work back to life.

The AI version of the comedy icon is true to form with an inflammatory set featuring opinions on Trump, transgender Americans, reality TV and tech. The hourlong special from Dudesy features an AI spin on Carlin’s takes on current events. Dudesy is an AI comedy platform from Mad TV alum Will Sasso and podcaster Chad Kultgen.  – Read More


#vfx

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

While large-scale unsupervised language models (LMs) learn broad world knowledge and some reasoning skills, achieving precise control of their behavior is difficult due to the completely unsupervised nature of their training. Existing methods for gaining such steerability collect human labels of the relative quality of model generations and fine-tune the unsupervised LM to align with these preferences, often with reinforcement learning from human feedback (RLHF). However, RLHF is a complex and often unstable procedure, first fitting a reward model that reflects the human preferences, and then fine-tuning the large unsupervised LM using reinforcement learning to maximize this estimated reward without drifting too far from the original model. In this paper we introduce a new parameterization of the reward model in RLHF that enables extraction of the corresponding optimal policy in closed form, allowing us to solve the standard RLHF problem with only a simple classification loss. The resulting algorithm, which we call Direct Preference Optimization (DPO), is stable, performant, and computationally lightweight, eliminating the need for sampling from the LM during fine-tuning or performing significant hyperparameter tuning. Our experiments show that DPO can fine-tune LMs to align with human preferences as well as or better than existing methods. Notably, fine-tuning with DPO exceeds PPO-based RLHF in ability to control sentiment of generations, and matches or improves response quality in summarization and single-turn dialogue while being substantially simpler to implement and train.  – Read More
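
The paper's key result is that the RLHF objective collapses to a single classification loss on preference pairs, with the policy/reference log-ratio acting as an implicit reward. Below is a minimal PyTorch sketch of that loss, assuming the per-sequence log-probabilities have already been computed by summing token log-probs under the policy and the frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO objective: -log sigmoid(beta * (logratio_chosen - logratio_rejected)),
    where logratio = log pi_theta(y|x) - log pi_ref(y|x) is the implicit reward.

    Each input is a (batch,) tensor of summed per-sequence log-probabilities;
    beta controls how far the policy may drift from the reference model.
    """
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # logsigmoid is the numerically stable form of log(sigmoid(.))
    losses = -F.logsigmoid(beta * (chosen_logratios - rejected_logratios))
    return losses.mean()
```

Because the loss touches only log-probabilities of already-collected preference pairs, training needs no sampling from the model and no separately trained reward network, which is where the simplicity over PPO-based RLHF comes from.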

#reinforcement-learning

OpenAI’s custom GPT Store is now open for business

OpenAI’s GPT Store, where users can share their custom chatbots, finally launched Wednesday after a monthslong delay. The store brings more potential use cases to ChatGPT and expands OpenAI’s ecosystem beyond what the company builds for customers.  – Read More

#strategy

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers’ computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pretraining and downstream evaluation.  – Read More
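
At the core of Mamba is a recurrence whose discretization step and input/output maps depend on the current token; the rest of the architecture wraps this scan. A didactic, unbatched NumPy version for a single channel follows. The projection parameterization here is an illustrative assumption, not the paper's exact one, and the paper's actual contribution includes a hardware-aware parallel implementation of this same scan rather than the sequential loop shown:

```python
import numpy as np

def selective_scan(x, A, W_delta, W_B, W_C):
    """Toy selective SSM recurrence: one input channel, state size N.

    x: (L,) input sequence; A: (N,) negative state-decay parameters;
    W_delta (scalar), W_B, W_C ((N,)): illustrative input-dependent maps.
    """
    L, N = len(x), len(A)
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        # Selectivity: the step size and the input/output maps are functions
        # of x[t], letting the model keep or forget state per token.
        delta = np.log1p(np.exp(W_delta * x[t]))   # softplus keeps delta > 0
        B_t = W_B * x[t]
        C_t = W_C * x[t]
        A_bar = np.exp(delta * A)                  # ZOH-style discretization of A
        h = A_bar * h + delta * B_t * x[t]         # simplified Euler input term
        y[t] = C_t @ h                             # token-dependent readout
    return y

# Tiny usage example with random parameters
rng = np.random.default_rng(0)
out = selective_scan(rng.standard_normal(16),
                     -np.abs(rng.standard_normal(4)),   # negative A for stability
                     rng.standard_normal(),
                     rng.standard_normal(4),
                     rng.standard_normal(4))
```

Making delta, B, and C input-dependent is what breaks the convolutional shortcut of earlier SSMs and motivates the hardware-aware recurrent algorithm the abstract describes.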

#nlp