A recent paper set a new speed record for multiplying two matrices. But it also marks the end of the line for a method researchers have relied on for decades to make improvements.
For computer scientists and mathematicians, opinions about “exponent two” boil down to a sense of how the world should be.
“It’s hard to distinguish scientific thinking from wishful thinking,” said Chris Umans of the California Institute of Technology. “I want the exponent to be two because it’s beautiful.”
“Exponent two” refers to the ideal speed — in terms of number of steps required — of performing one of the most fundamental operations in math: matrix multiplication. If exponent two is achievable, then it’s possible to carry out matrix multiplication as fast as physically possible. If it’s not, then we’re stuck in a world misfit to our dreams. Read More
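To make the exponent concrete, here is a minimal Python sketch (ours, not the article's) of the schoolbook method: multiplying two n-by-n matrices this way takes roughly n^3 scalar multiplications, whereas "exponent two" would mean an algorithm whose step count grows only about as fast as n^2, the number of output entries that have to be written down in any case.

# Schoolbook matrix multiplication: three nested loops, so about n**3
# scalar multiplications for n-by-n inputs. An "exponent two" algorithm
# would need only on the order of n**2 steps, matching the size of the output.
def matmul_naive(A, B):
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):          # row of A
        for j in range(n):      # column of B
            for k in range(n):  # inner dimension
                C[i][j] += A[i][k] * B[k][j]
    return C

# 2x2 example: 2**3 = 8 multiplications here, versus 2**2 = 4 output entries.
print(matmul_naive([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]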
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
We propose a new framework for reasoning about generalization in deep learning. The core idea is to couple the Real World, where optimizers take stochastic gradient steps on the empirical loss, to an Ideal World, where optimizers take steps on the population loss. This leads to an alternate decomposition of test error into: (1) the Ideal World test error plus (2) the gap between the two worlds. If the gap (2) is universally small, this reduces the problem of generalization in offline learning to the problem of optimization in online learning. We then give empirical evidence that this gap between worlds can be small in realistic deep learning settings, in particular supervised image classification. For example, CNNs generalize better than MLPs on image distributions in the Real World, but this is “because” they optimize faster on the population loss in the Ideal World. This suggests our framework is a useful tool for understanding generalization in deep learning, and lays a foundation for future research in the area. Read More
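In symbols of our own choosing (the paper's notation may differ), the decomposition described above is the identity

L_pop(f_t^Real) = L_pop(f_t^Ideal) + [ L_pop(f_t^Real) - L_pop(f_t^Ideal) ],

where L_pop is the population (test) loss, f_t^Real is the model after t stochastic gradient steps on the empirical loss, and f_t^Ideal is the model after t steps on the population loss. The first term on the right is the Ideal World test error, term (1); the bracketed term is the gap between worlds, term (2), which the paper argues is empirically small.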
Switch Transformers: Scaling to Trillion Parameter Models With Simple and Efficient Sparsity
In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) models defy this and instead select different parameters for each incoming example. The result is a sparsely-activated model – with an outrageous number of parameters – but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs, and training instability. We address these with the Switch Transformer. We simplify the MoE routing algorithm and design intuitive improved models with reduced communication and computational costs. Our proposed training techniques mitigate the instabilities, and we show large sparse models may be trained, for the first time, with lower precision (bfloat16) formats. We design models based off T5-Base and T5-Large (Raffel et al., 2019) to obtain up to 7x increases in pre-training speed with the same computational resources. These improvements extend into multilingual settings where we measure gains over the mT5-Base version across all 101 languages. Finally, we advance the current scale of language models by pre-training up to trillion parameter models on the “Colossal Clean Crawled Corpus”, and achieve a 4x speedup over the T5-XXL model. Read More
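For intuition, here is a minimal numpy sketch (our own names and shapes, such as switch_layer and router_w, not the paper's code) of the top-1 "switch" routing idea the abstract alludes to: each token is dispatched to exactly one expert, so per-token compute stays roughly constant even as the number of experts, and hence the parameter count, grows.

import numpy as np

# Minimal sketch of top-1 ("switch") routing: each token is sent to exactly
# one expert, so compute per token stays constant no matter how many experts
# (and therefore parameters) the layer holds.
def switch_layer(tokens, router_w, experts):
    """tokens: (n_tokens, d_model); router_w: (d_model, n_experts);
    experts: list of callables, each mapping (d_model,) -> (d_model,)."""
    logits = tokens @ router_w                      # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)      # softmax over experts
    chosen = probs.argmax(axis=-1)                  # top-1 expert per token
    out = np.empty_like(tokens)
    for i, e in enumerate(chosen):
        # scale the chosen expert's output by its router probability
        out[i] = probs[i, e] * experts[e](tokens[i])
    return out

# Toy usage: 4 tokens, model width 8, 3 experts (random linear "experts").
rng = np.random.default_rng(0)
experts = [lambda x, W=rng.normal(size=(8, 8)): x @ W for _ in range(3)]
y = switch_layer(rng.normal(size=(4, 8)), rng.normal(size=(8, 3)), experts)
print(y.shape)  # (4, 8)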
Re-imagining Algorithmic Fairness in India and Beyond
Conventional algorithmic fairness is West-centric, as seen in its sub-groups, values, and optimisations. In this paper, we de-center algorithmic fairness and analyse AI power in India. Based on 36 qualitative interviews and a discourse analysis of algorithmic deployments in India, we find that several assumptions of algorithmic fairness are challenged in India. We find that data is not always reliable due to socio-economic factors, users are given third world treatment by ML makers, and AI signifies unquestioning aspiration. We contend that localising model fairness alone can be window dressing in India, where the distance between models and oppressed communities is large. Instead, we re-imagine algorithmic fairness in India and provide a roadmap to re-contextualise data and models, empower oppressed communities, and enable Fair-ML ecosystems. Read More
AI chips in the real world: Interoperability, constraints, cost, energy efficiency, and models
The answer to the question of how to make the best of AI hardware may not be solely, or even primarily, about the hardware itself
How do you make the most of the proliferating array of emerging custom silicon hardware without spreading yourself too thin trying to keep up with each and every one of them?
If we were to put a price tag on that question, it would be in multi-billion-dollar territory: that is the combined estimated value of the different markets it touches upon. As AI applications explode, so does the specialized hardware that supports them. Read More
Artificial Intelligence is a Supercomputing problem
The next generation of Artificial Intelligence applications demands new and more powerful computing infrastructure. What do the computer systems that support artificial intelligence look like? How did we get here? Who has access to these systems? And what is our responsibility as Artificial Intelligence practitioners?
[These posts will be used in the master's course Supercomputers Architecture at UPC Barcelona Tech, with the support of the BSC]
Part 1
Part 2
Machine learning at the speed of light: New paper demonstrates use of photonic structures for AI
As we enter the next chapter of the digital age, data traffic continues to grow exponentially. To further enhance artificial intelligence and machine learning, computers will need the ability to process vast amounts of data as quickly and as efficiently as possible.
Conventional computing methods are not up to the task, but in looking for a solution, researchers have seen the light—literally.
Light-based processors, called photonic processors, enable computers to complete complex calculations at incredible speeds. New research published this week in the journal Nature examines the potential of photonic processors for artificial intelligence applications. The results demonstrate for the first time that these devices can process information rapidly and in parallel, something that today’s electronic chips cannot do. Read More
Accelerating AI computing to the speed of light
Artificial intelligence and machine learning are already an integral part of our everyday lives online. … As the demands for AI online continue to grow, so does the need to speed up AI performance and find ways to reduce its energy consumption. Now a team of researchers has come up with a system that could help: an optical computing core prototype that uses phase-change material. This system is fast, energy efficient and capable of accelerating the neural networks used in AI and machine learning. The technology is also scalable and directly applicable to cloud computing. The team published these findings Jan. 4 in Nature Communications. Read More
Light-carrying chips advance machine learning
Researchers found that so-called photonic processors, with which data is processed by means of light, can process information much more rapidly than electronic chips, and in parallel. Read More
DeepMind researchers claim neural networks can outperform neurosymbolic models
So-called neurosymbolic models, which combine algorithms with symbolic reasoning techniques, appear to be much better suited to predicting, explaining, and considering counterfactual possibilities than neural networks. But researchers at DeepMind claim neural networks can outperform neurosymbolic models under the right testing conditions. In a preprint paper, the coauthors describe an architecture for spatiotemporal reasoning about videos in which all components are learned and all intermediate representations are distributed (rather than symbolic) throughout the layers of the neural network. The team says that it surpasses the performance of neurosymbolic models across all questions in a popular dataset, with the greatest advantage on the counterfactual questions. Read More