Mistral 7B is a transformer model designed for fast inference and handling longer sequences. It achieves this with grouped-query attention and sliding-window attention. Grouped-query attention combines aspects of multi-query and multi-head attention to balance output quality and speed. Sliding-window attention limits each layer to a fixed attention window, keeping compute and memory costs low while the stacked layers extend the effective context beyond the window size. Mistral 7B offers an 8,000-token context length, delivering low latency, high throughput, and strong performance compared to larger models, along with low memory requirements at its 7B size. The model is freely available under the permissive Apache 2.0 license without usage restrictions. — Read More
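To make those two mechanisms concrete, here is a minimal sketch of grouped-query attention combined with a causal sliding-window mask, written in plain NumPy. All names, shapes, and sizes are illustrative assumptions for the sketch, not Mistral 7B's actual implementation.

```python
# Sketch: grouped-query attention + causal sliding-window mask
# (illustrative shapes and sizes; not Mistral 7B's actual code).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gqa_sliding_window(q, k, v, window):
    """q: (n_heads, seq, d); k, v: (n_kv_heads, seq, d), n_kv_heads < n_heads."""
    n_heads, seq, d = q.shape
    group = n_heads // k.shape[0]  # query heads sharing each K/V head
    # Grouped-query attention: repeat each K/V head so a whole group of
    # query heads reads the same keys and values.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (n_heads, seq, seq)
    # Sliding-window attention: position i may only attend to the last
    # `window` positions, i.e. j in [i - window + 1, i].
    i = np.arange(seq)[:, None]
    j = np.arange(seq)[None, :]
    allowed = (j <= i) & (j > i - window)
    scores = np.where(allowed, scores, -np.inf)
    return softmax(scores) @ v  # (n_heads, seq, d)

# Example: 8 query heads sharing 2 K/V heads, 16-token sequence, window of 4.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 64))
k = rng.standard_normal((2, 16, 64))
v = rng.standard_normal((2, 16, 64))
print(gqa_sliding_window(q, k, v, window=4).shape)  # (8, 16, 64)
```

Because each of L stacked layers can pass information one window further back, the effective receptive field grows to roughly L times the window size, which is how a fixed per-layer window still serves a long context.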
Unlocking new AI translation capabilities with a suite of publicly available models
Seamless merges the quality and multilinguality of SeamlessM4T v2, the low latency of SeamlessStreaming, and the expression preservation of SeamlessExpressive into one unified system. It is the first streaming translation model to maintain both vocal style and prosody, which is particularly challenging in streaming, where the system only has access to partial input. — Read More
[1hr Talk] Intro to Large Language Models
Orca 2: Teaching Small Language Models How to Reason
Orca 1 learns from rich signals, such as explanation traces, allowing it to outperform conventional instruction-tuned models on benchmarks like BigBench Hard and AGIEval. In Orca 2, we continue exploring how improved training signals can enhance smaller LMs' reasoning abilities. Research on training small LMs has often relied on imitation learning to replicate the output of more capable models. We contend that excessive emphasis on imitation may restrict the potential of smaller models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. We evaluate Orca 2 using a comprehensive set of 15 diverse benchmarks (corresponding to approximately 100 tasks and over 36,000 unique prompts). Orca 2 significantly surpasses models of similar size and attains performance levels similar to or better than those of models 5-10x larger, as assessed on complex tasks that test advanced reasoning abilities in zero-shot settings. We make Orca 2 weights publicly available at this http URL to support research on the development, evaluation, and alignment of smaller LMs. — Read More
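As a purely hypothetical illustration of the strategy-per-task idea described above, the sketch below maps the reasoning techniques named in the abstract to prompt prefixes and assembles a training-style prompt. The template wording and the build_prompt helper are invented for this sketch; Orca 2's actual prompt construction is described in the paper.

```python
# Hypothetical sketch: pair each task with one of the solution strategies
# named in the Orca 2 abstract. Template wording is invented for illustration.
STRATEGIES = {
    "step-by-step": "Work through the problem step by step before answering.",
    "recall-then-generate": "First recall the relevant facts, then compose the answer.",
    "recall-reason-generate": "Recall relevant facts, reason over them, then answer.",
    "direct-answer": "Answer directly and concisely.",
}

def build_prompt(task: str, strategy: str) -> str:
    """Prefix a task with the system instruction for its chosen strategy."""
    return f"System: {STRATEGIES[strategy]}\nUser: {task}"

# A multi-step word problem gets a reasoning strategy, while a simple
# factual lookup might get a direct answer instead.
print(build_prompt(
    "A train travels 60 km in 45 minutes. What is its average speed in km/h?",
    "step-by-step",
))
```

The point the abstract makes is that the strategy choice itself is what the smaller model is trained to internalize, rather than imitating a larger model's single style of response.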