Arcee AI unveils SuperNova: A customizable, instruction-adherent model for enterprises

Arcee AI launched SuperNova today, a 70-billion-parameter language model designed for enterprise deployment, featuring advanced instruction-following capabilities and full customization options. The model aims to provide a powerful, ownable alternative to API-based services from OpenAI and Anthropic, addressing key concerns around data privacy, model stability, and customization.

In an AI landscape dominated by cloud-based APIs, Arcee AI is taking a different approach with SuperNova. The large language model (LLM) can be deployed and customized within an enterprise’s own infrastructure. Released today, SuperNova is built on Meta’s Llama-3.1-70B-Instruct architecture and employs a novel post-training process that Arcee claims results in superior instruction adherence and adaptability to specific business needs. — Read More

#devops

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

Models like GPT-4o enable real-time interaction with large language models (LLMs) through speech, significantly enhancing the user experience compared to traditional text-based interaction. However, there has been little exploration of how to build speech interaction models on open-source LLMs. To address this, we propose LLaMA-Omni, a novel model architecture designed for low-latency, high-quality speech interaction with LLMs. LLaMA-Omni integrates a pretrained speech encoder, a speech adaptor, an LLM, and a streaming speech decoder. It eliminates the need for speech transcription and can simultaneously generate text and speech responses directly from speech instructions with extremely low latency. We build our model on the latest Llama-3.1-8B-Instruct model. To align the model with speech interaction scenarios, we construct a dataset named InstructS2S-200K, which includes 200K speech instructions and corresponding speech responses. Experimental results show that, compared to previous speech-language models, LLaMA-Omni provides better responses in both content and style, with a response latency as low as 226 ms. Additionally, training LLaMA-Omni takes less than 3 days on just 4 GPUs, paving the way for the efficient development of speech-language models in the future. — Read More
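
The described architecture maps onto a simple pipeline: acoustic frames from the pretrained encoder are grouped and projected by the adaptor into the LLM’s input space, and the LLM’s hidden states drive a streaming decoder that emits speech units alongside the text response. The PyTorch sketch below illustrates that flow under stated assumptions: the module shapes, the 5-frame grouping factor, and the stand-in encoder, LLM layer, and unit head are placeholders for exposition, not the paper’s actual components.

```python
# Minimal sketch of a LLaMA-Omni-style speech pipeline (illustrative only).
# All dimensions and the stand-in modules below are assumptions for exposition.
import torch
import torch.nn as nn


class SpeechAdaptor(nn.Module):
    """Groups consecutive encoder frames and projects them into the LLM embedding space."""

    def __init__(self, enc_dim=1280, llm_dim=512, k=5):
        super().__init__()
        self.k = k  # concatenate k consecutive frames -> fewer, wider tokens
        self.proj = nn.Sequential(
            nn.Linear(enc_dim * k, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, feats):                   # feats: (batch, frames, enc_dim)
        b, t, d = feats.shape
        t = (t // self.k) * self.k              # drop any ragged tail frames
        feats = feats[:, :t].reshape(b, t // self.k, d * self.k)
        return self.proj(feats)                 # (batch, frames/k, llm_dim)


class OmniStylePipeline(nn.Module):
    """Speech features in -> LLM hidden states -> speech-unit logits,
    with no intermediate transcription step."""

    def __init__(self, speech_encoder, adaptor, llm, speech_decoder):
        super().__init__()
        self.speech_encoder = speech_encoder    # pretrained acoustic encoder
        self.adaptor = adaptor                  # maps audio frames into the LLM's input space
        self.llm = llm                          # instruction-tuned LLM backbone
        self.speech_decoder = speech_decoder    # streaming decoder over the LLM's hidden states

    def forward(self, audio_feats):
        frames = self.speech_encoder(audio_feats)      # acoustic frame representations
        llm_inputs = self.adaptor(frames)              # pseudo "text" embeddings
        hidden = self.llm(llm_inputs)                  # hidden states for the text response
        speech_units = self.speech_decoder(hidden)     # discrete units for a vocoder
        return hidden, speech_units


if __name__ == "__main__":
    # Small stand-in components so the sketch runs end to end on random features.
    encoder = nn.Linear(80, 1280)                      # pretend log-mel -> encoder frames
    adaptor = SpeechAdaptor(enc_dim=1280, llm_dim=512, k=5)
    llm = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    unit_head = nn.Linear(512, 1000)                   # pretend discrete-unit vocabulary
    pipeline = OmniStylePipeline(encoder, adaptor, llm, unit_head)

    hidden, units = pipeline(torch.randn(1, 250, 80))  # ~2.5 s of 10 ms frames
    print(hidden.shape, units.shape)                   # (1, 50, 512) and (1, 50, 1000)
```

In the full system, those hidden states would feed both a text head and the streaming speech decoder, which is how the model can produce text and speech responses simultaneously without a transcription step.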

#nlp

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

Instruction tuning is critical for large language models (LLMs) to achieve better instruction following and task adaptation capabilities, but its success heavily relies on the quality of the training data. Many recent methods focus on improving data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM’s reflection and introspection for improving existing data quality with the student LLM’s ability to select the data it is most compatible with, in order to automatically refine existing instruction-tuning data. This teacher-student collaboration produces high-quality, student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs of superior performance. Selective Reflection-Tuning is a data augmentation and synthesis technique that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to Alpaca and WizardLM data and achieve much stronger, top-tier 7B and 13B LLMs. — Read More
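
To make the teacher-student collaboration concrete, the Python sketch below shows one pass of the data-recycling loop, for response reflection only. The LM interface, its generate() and loss() methods, and the conditional-to-unconditional loss ratio used as the student’s compatibility score are illustrative assumptions rather than the paper’s exact prompting and selection recipe.

```python
# Illustrative sketch of a Selective Reflection-Tuning-style recycling loop.
# The LM interface and the difficulty-style score are assumptions for exposition.
from dataclasses import dataclass
from typing import List, Protocol


class LM(Protocol):
    """Minimal interface assumed for both the teacher and the student model."""

    def generate(self, prompt: str) -> str: ...
    def loss(self, text: str, context: str = "") -> float: ...


@dataclass
class Pair:
    instruction: str
    response: str


def teacher_reflect_response(teacher: LM, pair: Pair) -> Pair:
    """Teacher critiques and rewrites the response for the given instruction."""
    prompt = (
        "Rewrite the response so it better answers the instruction.\n"
        f"Instruction: {pair.instruction}\n"
        f"Current response: {pair.response}"
    )
    return Pair(pair.instruction, teacher.generate(prompt))


def student_difficulty(student: LM, pair: Pair) -> float:
    """Student-compatibility proxy: ratio of the student's loss on the response
    with and without the instruction as context. Lower means the instruction
    makes the response easier for *this* student to learn."""
    conditional = student.loss(pair.response, context=pair.instruction)
    unconditional = student.loss(pair.response)
    return conditional / max(unconditional, 1e-8)


def recycle(teacher: LM, student: LM, data: List[Pair]) -> List[Pair]:
    """For each sample, let the teacher propose an improved version and let the
    student keep whichever version it finds more learnable."""
    refined = []
    for pair in data:
        candidate = teacher_reflect_response(teacher, pair)
        best = min(pair, candidate, key=lambda p: student_difficulty(student, p))
        refined.append(best)
    return refined
```

The returned refined set would then serve as the fine-tuning data for the student, which is where the reported sample efficiency comes from.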

#training

Reflection 70B model maker breaks silence amid fraud accusations

Matt Shumer, co-founder and CEO of OthersideAI, the company best known for its signature AI writing assistant HyperWrite, has broken nearly two days of silence after being accused of fraud when third-party researchers were unable to replicate the claimed top-tier performance of a new large language model (LLM) he released on Thursday, September 5.

On his account on the social network X, Shumer apologized and said he “got ahead of himself,” adding, “I know that many of you are excited about the potential for this and are now skeptical.” — Read More

#performance