Pooled clusters, also referred to as "converged" clusters, support disparate workloads in a unified data environment better than separate, siloed clusters. Vendors now provide direct support for converged clusters to run key HPC-AI-HPDA (HPC, AI, and High Performance Data Analytics) workloads.
The success of workload-optimized compute servers has created the need for converged clusters, because organizations have generally added workload-optimized clusters piecemeal to support their disparate AI, HPC, and HPDA needs.
Unfortunately, many of these clusters operate in isolation, with their resources dedicated to specific workloads and managed manually, essentially placing them in silos that keep them from benefitting the entire organization. This makes little sense from an operating-efficiency perspective, as it wastes operations time as well as OpEx (operating expense) and CapEx (capital expense) dollars.[i] Read More
Neuromorphic computing finds new life in machine learning
Neuromorphic computing has had little practical success in building machines that can tackle standard tests such as logistic regression or image recognition. But work by prominent researchers is combining the best of machine learning with simulated networks of spiking neurons, bringing new hope for neuromorphic breakthroughs.
Efforts have been underway for forty years to build computers that might emulate some of the structure of the brain in the way they solve problems. To date, they have shown few practical successes. But hope for so-called neuromorphic computing springs eternal, and lately, the endeavor has gained some surprising champions. Read More
AI is changing the entire nature of compute
The world of computing, from chips to software to systems, is going to change dramatically in coming years as a result of the spread of machine learning. We may still refer to these computers as “Universal Turing Machines,” as we have for eighty years or more. But in practice they will be different from the way they have been built and used up to now.
Such a change is of interest both to anyone who cares about what computers do, and to anyone who’s interested in machine learning in all its forms. Read More
Nvidia will support Arm hardware for high-performance computing
At the International Supercomputing Conference (ISC) in Frankfurt, Germany this week, Santa Clara-based chipmaker Nvidia announced that it will support processors architected by British semiconductor design company Arm. Nvidia anticipates that the partnership will pave the way for supercomputers capable of “exascale” performance — in other words, of completing at least a quintillion floating point computations (“flops”) per second, where a flop equals two 15-digit numbers multiplied together.
Nvidia says that by 2020 it will contribute to the Arm ecosystem its full stack of AI and high-performance computing (HPC) software, which by Nvidia's estimation now accelerates over 600 HPC applications and machine learning frameworks. Among other resources and services, it will make available CUDA-X libraries, graphics-accelerated frameworks, software development kits, PGI compilers with OpenACC support, and profilers. Read More
Habana Labs launches its Gaudi AI training processor
Habana Labs, a Tel Aviv-based AI processor startup, today announced its Gaudi AI training processor, which promises to easily beat GPU-based systems by a factor of four. While the individual Gaudi chips beat GPUs in raw performance, it’s the company’s networking technology that gives it the extra boost to reach its full potential.
Gaudi will be available as a standard PCIe card that supports eight ports of 100Gb Ethernet, as well as a mezzanine card that is compliant with the relatively new Open Compute Project accelerator module specs. This card supports either 10 ports of 100Gb Ethernet or 20 ports of 50Gb Ethernet. The company is also launching a system with eight of these mezzanine cards. Read More
Google’s AI Processor’s (TPU) Heart Throbbing Inspiration
Google has finally released the technical details of its Tensor Processing Unit (TPU) ASIC. Surprisingly, at its core you find something that sounds like it was inspired by the heart and not the brain. It's called a "systolic array," and this computational device contains a 256 x 256 grid of 8-bit multiply-add units. That's a grand total of 65,536 processing elements capable of cranking out 92 trillion operations per second! A systolic array is not a new idea; it was described way back in 1982 by Kung of CMU in "Why Systolic Architectures?" Just to date myself, I still recall a time when systolic machines were all the rage. Read More
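To make the "heart-like" rhythm concrete, here is a minimal simulation sketch in plain Python/NumPy of how a systolic array multiplies matrices: operands are pumped through a grid of multiply-add cells, and each cell accumulates a result while forwarding its inputs to its neighbors. This is my illustration of the general technique (using an output-stationary dataflow and tiny matrices), not Google's weight-stationary TPU design.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.

    Each processing element (PE) at grid position (i, j) multiplies the A
    value flowing in from its left neighbor by the B value flowing in from
    above, adds the product to its local accumulator, and forwards both
    operands onward on the next cycle.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"

    acc = np.zeros((M, N))      # one accumulator per PE (holds C[i, j])
    a_reg = np.zeros((M, N))    # A operand currently held by each PE
    b_reg = np.zeros((M, N))    # B operand currently held by each PE

    # Enough cycles for the last (skewed) operands to reach PE(M-1, N-1).
    for t in range(M + N + K - 2):
        new_a = np.zeros((M, N))
        new_b = np.zeros((M, N))
        for i in range(M):
            for j in range(N):
                if j == 0:                      # A enters row i from the left,
                    k = t - i                   # skewed by i cycles
                    a_in = A[i, k] if 0 <= k < K else 0.0
                else:
                    a_in = a_reg[i, j - 1]
                if i == 0:                      # B enters column j from the top,
                    k = t - j                   # skewed by j cycles
                    b_in = B[k, j] if 0 <= k < K else 0.0
                else:
                    b_in = b_reg[i - 1, j]
                acc[i, j] += a_in * b_in        # the multiply-add "heartbeat"
                new_a[i, j], new_b[i, j] = a_in, b_in
        a_reg, b_reg = new_a, new_b             # operands advance one PE per cycle
    return acc

A = np.random.rand(4, 3)
B = np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

A real 256 x 256 array does the same thing in hardware, with all 65,536 multiply-adds firing on every clock cycle instead of being looped over in software.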
The 3 critical AI research questions
AI is dramatically enhancing industries, products, and core capabilities. But to make AI truly ubiquitous, it needs to run on end devices within a tight power and thermal budget. To learn more about the research that is advancing AI adoption, don't miss this VB Live event featuring Qualcomm's Senior Director of Engineering, Jilei Hou, and analyst Jack Gold.
“We’re not anywhere near a steady state with AI,” says Jack Gold, tech analyst and founder and president of J. Gold Associates. “AI is starting to take off, but we’re nowhere near the top of the hockey stick.” Read More
Fuzzy Math Is Key to AI Chip That Promises Human-Like Intuition
Simon Knowles, chief technology officer of Graphcore Ltd., is smiling at a whiteboard as he maps out his vision for the future of machine learning. He uses a black marker to dot and diagram the nodes of the human brain: the parts that are “ruminative, that think deeply, that ponder.” His startup is trying to approximate these neurons and synapses in its next-generation computer processors, which the company is betting can “mechanize intelligence.”
Artificial intelligence is often thought of as complex software that mines vast datasets, but Knowles and his co-founder, Chief Executive Officer Nigel Toon, argue that more important obstacles still exist in the computers that run the software. The problem, they say, sitting in their airy offices in the British port city of Bristol, is that chips—known, depending on their function, as CPUs (central processing units) or GPUs (graphics processing units)—weren’t designed to “ponder” in any recognizably human way. Whereas human brains use intuition to simplify problems such as identifying an approaching friend, a computer might try to analyze every pixel of that person’s face, comparing it to a database of billions of images before attempting to say hello. That precision, which made sense when computers were primarily calculators, is massively inefficient for AI, burning huge quantities of energy to process all the relevant data. Read More
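The "fuzzy math" being alluded to is low-precision arithmetic: trading exact results for far cheaper computation, which neural networks tolerate surprisingly well. Below is a minimal NumPy sketch of that trade-off; it is my illustration with arbitrary matrix sizes, not Graphcore's actual IPU arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((256, 256)).astype(np.float32)
b = rng.standard_normal((256, 256)).astype(np.float32)

exact = a @ b                                                    # float32 reference
fuzzy = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)

rel_err = np.abs(fuzzy - exact) / (np.abs(exact) + 1e-6)
print(f"median relative error: {np.median(rel_err):.4%}")       # small, tolerable error
print("bytes per value: float32=4, float16=2")                   # half the memory traffic
```

Halving the bits per value halves the memory and data movement for every operand, and accepting that small imprecision is part of what lets AI-focused processors pack far more arithmetic into the same power budget.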
SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers
The vast majority of processors in the world are actually microcontroller units (MCUs), which find widespread use performing simple control tasks in applications ranging from automobiles to medical devices and office equipment. The Internet of Things (IoT) promises to inject machine learning into many of these everyday objects via tiny, cheap MCUs. However, these resource-impoverished hardware platforms severely limit the complexity of machine learning models that can be deployed. For example, although convolutional neural networks (CNNs) achieve state-of-the-art results on many visual recognition tasks, CNN inference on MCUs is challenging due to severe memory limitations. To circumvent the memory challenge associated with CNNs, various alternatives have been proposed that do fit within the memory budget of an MCU, albeit at the cost of prediction accuracy. This paper challenges the idea that CNNs are not suitable for deployment on MCUs. We demonstrate that it is possible to automatically design CNNs which generalize well, while also being small enough to fit onto memory-limited MCUs. Our Sparse Architecture Search method combines neural architecture search with pruning in a single, unified approach, which learns superior models on four popular IoT datasets. The CNNs we find are more accurate and up to 4.35× smaller than previous approaches, while meeting the strict MCU working memory constraint. Read More
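The pruning half of that combination can be sketched very simply. Here is a generic magnitude-pruning illustration in NumPy (mine, not the paper's SpArSe procedure): the smallest-magnitude weights are zeroed so a layer can be stored within a tight MCU memory budget, typically followed by fine-tuning to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude entries so that a `sparsity` fraction
    of the weights is removed. A compressed sparse format would then let the
    pruned layer fit a much smaller memory budget."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune a hypothetical 3x3 conv layer with 16 input / 32 output channels.
w = np.random.default_rng(0).standard_normal((32, 16, 3, 3)).astype(np.float32)
pruned = magnitude_prune(w, sparsity=0.75)
print(f"nonzero weights: {np.count_nonzero(pruned)} of {w.size}")
```

The architecture-search half of the paper's approach then evaluates candidate networks with pruning in the loop, so model size and accuracy are optimized together rather than one after the other.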
Artificial Intelligence (AI) Solutions on Edge Devices
Artificial Intelligence (AI) solutions, particularly those based on deep learning in areas such as computer vision, are typically run in cloud-based environments that require heavy computing capacity.
Inference is less compute-intensive than training, but latency matters more, because a deployed model must deliver results in real time. Most inference is still performed in the cloud or on a server, but as the diversity of AI applications grows, the centralized training-and-inference paradigm is coming into question.
It is possible, and becoming easier, to run AI and Machine Learning with analytics at the Edge today, depending on the size and scale of the Edge site and the particular system being used. While Edge site computing systems are much smaller than those found in central data centers, they have matured, and now successfully run many workloads due to an immense growth in the processing power of today’s x86 commodity servers. It’s quite amazing how many workloads can now run successfully at the Edge. Read More
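As a concrete illustration, here is a minimal sketch of local inference on an edge device using the TensorFlow Lite runtime, one common option for this kind of deployment. The model file name and the zeroed-out "camera frame" are hypothetical placeholders; in practice the model would be trained (and often quantized) centrally and then copied to the device.

```python
import numpy as np
import tflite_runtime.interpreter as tflite  # lightweight runtime for edge devices

# Load a model that was trained in the cloud and deployed to the edge device
# as a single .tflite file (file name is a placeholder).
interpreter = tflite.Interpreter(model_path="vision_model.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Stand-in for a camera frame; shape and dtype come from the model's input spec.
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])

interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()                                   # local, low-latency inference
scores = interpreter.get_tensor(output_details[0]["index"])
print("top class:", int(np.argmax(scores)))
```

Because the model runs on the device itself, latency is bounded by local compute rather than a round trip to the cloud, and raw data never has to leave the Edge site.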