We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). — Read More
Monthly Archives: April 2024
GitHub previews Copilot Workspace, an AI developer environment to turn ideas into software
GitHub has revealed Copilot Workspace, its AI-native developer environment. Using natural language, developers can brainstorm, plan, build, test and run code faster and easier than before. First teased in 2023 at its user conference, GitHub Copilot Workspace is now available in technical preview and interested developers can sign up for the waitlist. — Read More
China’s S1 robot impresses with its ‘human-like’ speed and precision
The era of humanoid robots seems to be flourishing, with new models being developed and trained at exceptional speed.
Another Chinese firm making advanced strides in this realm is Astribot. The Shenzhen-based subsidiary of Stardust Intelligence is a robotics firm focused on developing AI robot assistants.
In a video released by the firm, its humanoid S1 is seen doing household tasks at an unprecedented pace, which marks a significant advancement for a robot. — Read More
What can LLMs never do?
Every time over the past few years that we came up with problems LLMs couldn't do, they passed them with flying colours. And yet they still can't answer questions that seem simple, and it's unclear why.
And so, over the past few weeks I have been obsessed with trying to figure out the failure modes of LLMs. This started off as an exploration of what I found; it is admittedly a little wonky, but I think it is interesting. The failures of AI can teach us a lot more about what it can do than the successes. — Read More
The Rise of Large-Language-Model Optimization
The web has become so interwoven with everyday life that it is easy to forget what an extraordinary accomplishment and treasure it is. In just a few decades, much of human knowledge has been collectively written up and made available to anyone with an internet connection.
But all of this is coming to an end. The advent of AI threatens to destroy the complex online ecosystem that allows writers, artists, and other creators to reach human audiences. — Read More
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
The conventional recipe for maximizing model accuracy is to (1) train multiple models with various hyperparameters and (2) pick the individual model which performs best on a held-out validation set, discarding the remainder. In this paper, we revisit the second step of this procedure in the context of fine-tuning large pre-trained models, where fine-tuned models often appear to lie in a single low error basin. We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations often improves accuracy and robustness. Unlike a conventional ensemble, we may average many models without incurring any additional inference or memory costs — we call the results “model soups.” When fine-tuning large pre-trained models such as CLIP, ALIGN, and a ViT-G pre-trained on JFT, our soup recipe provides significant improvements over the best model in a hyperparameter sweep on ImageNet. The resulting ViT-G model, which attains 90.94% top-1 accuracy on ImageNet, achieved a new state of the art. Furthermore, we show that the model soup approach extends to multiple image classification and natural language processing tasks, improves out-of-distribution performance, and improves zero-shot performance on new downstream tasks. Finally, we analytically relate the performance similarity of weight-averaging and logit-ensembling to flatness of the loss and confidence of the predictions, and validate this relation empirically. Code is available at this https URL. — Read More
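The core of the "uniform soup" recipe above is simply an elementwise average of the parameters of several fine-tuned checkpoints that share one architecture. A minimal sketch with NumPy, using toy weight dictionaries in place of real fine-tuned checkpoints (the dictionary keys and values here are illustrative, not from the paper's code):

```python
import numpy as np

def model_soup(state_dicts):
    """Uniform soup: average each parameter tensor elementwise across models.
    Assumes every state dict has identical keys and tensor shapes."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Toy example: three "fine-tuned" weight sets for a single linear layer
m1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
m2 = {"w": np.array([3.0, 2.0]), "b": np.array([1.0])}
m3 = {"w": np.array([2.0, 5.0]), "b": np.array([2.0])}

soup = model_soup([m1, m2, m3])
print(soup["w"])  # elementwise mean of the three weight vectors
print(soup["b"])
```

Because the averaging happens once, before deployment, the soup costs no more at inference time than any single model — unlike a logit ensemble, which must run every member. The paper's stronger "greedy soup" variant adds models to the average one at a time, keeping each only if held-out accuracy improves.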
Evolutionary Optimization of Model Merging Recipes
We present a novel application of evolutionary algorithms to automate the creation of powerful foundation models. While model merging has emerged as a promising approach for LLM development due to its cost-effectiveness, it currently relies on human intuition and domain knowledge, limiting its potential. Here, we propose an evolutionary approach that overcomes this limitation by automatically discovering effective combinations of diverse open-source models, harnessing their collective intelligence without requiring extensive additional training data or compute. Our approach operates in both parameter space and data flow space, allowing for optimization beyond just the weights of the individual models. This approach even facilitates cross-domain merging, generating models like a Japanese LLM with Math reasoning capabilities. Surprisingly, our Japanese Math LLM achieved state-of-the-art performance on a variety of established Japanese LLM benchmarks, even surpassing models with significantly more parameters, despite not being explicitly trained for such tasks. Furthermore, a culturally-aware Japanese VLM generated through our approach demonstrates its effectiveness in describing Japanese culture-specific content, outperforming previous Japanese VLMs. This work not only contributes new state-of-the-art models back to the open-source community, but also introduces a new paradigm for automated model composition, paving the way for exploring alternative, efficient approaches to foundation model development. — Read More
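The parameter-space half of this approach amounts to searching over how parent model weights are combined, with benchmark scores as the fitness signal. The toy sketch below is not the authors' method (they evolve merges of real open-source LLMs); it is a hedged illustration of the idea using a simple mutate-and-select loop over normalized interpolation coefficients, with a hypothetical `fitness` function standing in for benchmark evaluation:

```python
import random

def merge(parents, coeffs):
    """Linear interpolation in parameter space. Each 'model' is just a
    flat list of floats here, for illustration only."""
    n = len(parents[0])
    return [sum(c * p[i] for c, p in zip(coeffs, parents)) for i in range(n)]

def evolve(parents, fitness, pop=20, gens=30, seed=0):
    """Evolve merge coefficients: keep the fittest half, mutate them."""
    rng = random.Random(seed)

    def rand_coeffs():
        raw = [rng.random() for _ in parents]
        s = sum(raw)
        return [r / s for r in raw]  # normalize so coefficients sum to 1

    population = [rand_coeffs() for _ in range(pop)]
    for _ in range(gens):
        population.sort(key=lambda c: fitness(merge(parents, c)), reverse=True)
        survivors = population[: pop // 2]
        children = []
        for c in survivors:
            mutated = [max(0.0, x + rng.gauss(0, 0.05)) for x in c]
            s = sum(mutated) or 1.0
            children.append([x / s for x in mutated])
        population = survivors + children
    return max(population, key=lambda c: fitness(merge(parents, c)))

# Usage: find the blend of two parents closest to a target parameter vector
parents = [[1.0, 0.0], [0.0, 1.0]]
target = [0.7, 0.3]
fitness = lambda m: -sum(abs(a - b) for a, b in zip(m, target))
best = evolve(parents, fitness)
merged = merge(parents, best)
```

In the real setting the "parents" are full LLM checkpoints, the search space also includes data-flow decisions (which layers from which model), and fitness is measured on task benchmarks rather than a known target.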
There’s An AI For That (TAAFT)
“There’s An AI For That” is a leading AI aggregator offering a database of over 12,400 AIs available for over 15,000 tasks. The platform provides a remarkable inventory of cutting-edge AI solutions for almost every need. — Read More
USAF Test Pilot School and DARPA announce breakthrough in aerospace machine learning
The U.S. Air Force Test Pilot School and the Defense Advanced Research Projects Agency were finalists for the 2023 Robert J. Collier Trophy, a formal acknowledgement of recent breakthroughs that have launched the machine-learning era within the aerospace industry.
The teams worked together to test breakthrough executions in artificial intelligence algorithms using the X-62A VISTA aircraft as part of DARPA’s Air Combat Evolution (ACE) program.
… In less than a calendar year the teams went from the initial installation of live AI agents into the X-62A’s systems to demonstrating the first AI-versus-human within-visual-range engagements, otherwise known as dogfights. In total, the team made over 100,000 lines of flight-critical software changes across 21 test flights. — Read More