Data Augmentation | How to use Deep Learning when you have Limited Data — Part 2

This article is a comprehensive review of Data Augmentation techniques for Deep Learning, specific to images. This is Part 2 of How to use Deep Learning when you have Limited Data. Check out Part 1 here.

…Why is there a need for a large amount of data?

When you train a machine learning model, what you’re really doing is tuning its parameters so that it maps a particular input (say, an image) to some output (a label). The optimization goal is to chase the sweet spot where the model’s loss is low, which happens when the parameters are tuned in the right way.

Naturally, if you have a lot of parameters, you need to show your machine learning model a proportionally large number of examples to get good performance. Also, the number of parameters you need is proportional to the complexity of the task your model has to perform. Read More
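To make the data-augmentation idea the article surveys concrete, here is a minimal sketch of a few common label-preserving image transforms using numpy. The function name and the specific transforms and ranges are my own choices for illustration, not the article's:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly augmented copy of a square H x W x C image array.

    A minimal sketch of common label-preserving transforms:
    horizontal flip, 90-degree rotation, and brightness jitter.
    """
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                      # horizontal flip
    if rng.random() < 0.5:
        out = np.rot90(out, k=rng.integers(1, 4))  # rotate by 90/180/270 degrees
    # Brightness jitter: scale pixel values by a factor in [0.8, 1.2],
    # clipping back into the valid 0..255 range.
    out = np.clip(out * rng.uniform(0.8, 1.2), 0, 255)
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.float64)
aug = augment(img, rng)
```

Each augmented copy counts as an extra training example, which is exactly how augmentation stretches a limited dataset to feed a model with many parameters.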

#performance

NanoNets: How to use Deep Learning when you have Limited Data

Deep Learning has seen a recent surge in popularity, achieving state-of-the-art performance in tasks like Language Translation, playing Strategy Games, and Self-Driving Cars, all of which require millions of data points. One common barrier to using deep learning to solve problems is the amount of data needed to train a model. This large data requirement arises from the large number of parameters in the model that machines have to learn.

…There is an interesting, almost linear relationship between the amount of data required and the size of the model. The basic reasoning is that your model should be large enough to capture relations in your data (e.g., textures and shapes in images, grammar in text, and phonemes in speech) along with the specifics of your problem (e.g., the number of categories). Early layers of the model capture low-level features of the input (like edges and patterns). Later layers capture information that helps make the final decision; usually information that can help discriminate between the desired outputs. Therefore, if the complexity of the problem is high (like Image Classification), the number of parameters and the amount of data required are also very large. Read More

#performance

Audio Deep Learning Made Simple: Automatic Speech Recognition (ASR), How it Works

Speech-to-Text algorithm and architecture, including Mel Spectrograms, MFCCs, CTC Loss and Decoder, in Plain English

Over the last few years, Voice Assistants have become ubiquitous with the popularity of Google Home, Amazon Echo, Siri, Cortana, and others. These are the most well-known examples of Automatic Speech Recognition (ASR). This class of applications starts with a clip of spoken audio in some language and extracts the words that were spoken, as text. For this reason, they are also known as Speech-to-Text algorithms.

Of course, applications like Siri and the others mentioned above go further. Not only do they extract the text but they also interpret and understand the semantic meaning of what was spoken, so that they can respond with answers, or take actions based on the user’s commands.

In this article, I will focus on the core capability of Speech-to-Text using deep learning. My goal throughout will be to understand not just how something works but why it works that way. Read More
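Of the pieces the article covers, the CTC decoding step lends itself to a tiny sketch. Below is a toy greedy CTC decoder: pick the most likely token per audio frame, collapse consecutive repeats, then drop blanks. The vocabulary and per-frame probabilities are made up for illustration:

```python
import numpy as np

# Toy vocabulary: index 0 is the CTC "blank" token.
VOCAB = ["-", "c", "a", "t"]

def ctc_greedy_decode(frame_probs):
    """Greedy CTC decoding: take the argmax token at each time step,
    collapse consecutive repeats, then remove blanks."""
    best = np.argmax(frame_probs, axis=1)  # best token index per frame
    collapsed = [t for i, t in enumerate(best) if i == 0 or t != best[i - 1]]
    return "".join(VOCAB[t] for t in collapsed if t != 0)

# Made-up per-frame probabilities over the 4-token vocabulary for 6 frames;
# the frame-level argmaxes spell out: c c - a t t.
probs = np.array([
    [0.1, 0.7, 0.1, 0.1],    # c
    [0.1, 0.6, 0.2, 0.1],    # c (repeat, collapsed away)
    [0.8, 0.1, 0.05, 0.05],  # blank (separates tokens)
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # t
    [0.1, 0.1, 0.2, 0.6],    # t (repeat, collapsed away)
])
print(ctc_greedy_decode(probs))  # prints "cat"
```

In a real ASR system these probabilities come from the acoustic model's softmax over mel-spectrogram or MFCC features, and beam search usually replaces the greedy argmax, but the collapse-and-drop-blanks step is the same.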

#nlp

Top Python Libraries for Data Science, Data Visualization & Machine Learning

It has been some time since we last performed a Python libraries roundup, and as such we have taken the opportunity to start the month of November with a fresh list.

Last time we at KDnuggets did this, editor and author Dan Clark split the vast array of Python data science related libraries up into several smaller collections, including data science libraries, machine learning libraries, and deep learning libraries. While splitting libraries into categories is inherently arbitrary, this made sense at the time of previous publication.

This time, however, we have split the collected open source Python data science libraries in two. This first post covers “data science, data visualization & machine learning,” and can be thought of as “traditional” data science tools covering common tasks. The second post, to be published next week, will cover libraries for use in building neural networks, and those for performing natural language processing and computer vision tasks. Read More

#python

Learned Motion Matching

Read More

#vfx, #videos

Matrix Multiplication Inches Closer to Mythic Goal

A recent paper set the fastest record for multiplying two matrices. But it also marks the end of the line for a method researchers have relied on for decades to make improvements.

For computer scientists and mathematicians, opinions about “exponent two” boil down to a sense of how the world should be.

“It’s hard to distinguish scientific thinking from wishful thinking,” said Chris Umans of the California Institute of Technology. “I want the exponent to be two because it’s beautiful.”

“Exponent two” refers to the ideal speed — in terms of number of steps required — of performing one of the most fundamental operations in math: matrix multiplication. If exponent two is achievable, then it’s possible to carry out matrix multiplication as fast as physically possible. If it’s not, then we’re stuck in a world misfit to our dreams. Read More
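For a sense of where a sub-cubic exponent comes from: Strassen's classic algorithm multiplies 2x2 blocks with 7 recursive products instead of 8, giving exponent log2(7) ≈ 2.81 rather than 3. A toy version for square power-of-two matrices (the implementation details are my own sketch, not from the paper the article discusses):

```python
import numpy as np

def strassen(A, B):
    """Strassen multiplication for square power-of-two matrices.
    Seven recursive products instead of eight -> O(n^log2(7)) ~ O(n^2.81)."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    # Split each matrix into four half-size blocks.
    a, b, c, d = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    e, f, g, i = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven Strassen products.
    p1 = strassen(a, f - i)
    p2 = strassen(a + b, i)
    p3 = strassen(c + d, e)
    p4 = strassen(d, g - e)
    p5 = strassen(a + d, e + i)
    p6 = strassen(b - d, g + i)
    p7 = strassen(a - c, e + f)
    # Reassemble the result from block combinations.
    top = np.hstack([p5 + p4 - p2 + p6, p1 + p2])
    bottom = np.hstack([p3 + p4, p1 + p5 - p3 - p7])
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
A = rng.integers(0, 10, (4, 4))
B = rng.integers(0, 10, (4, 4))
print(np.array_equal(strassen(A, B), A @ B))  # prints True
```

Decades of refinements on this recursive idea pushed the exponent down toward 2.37; the article is about why that line of attack appears to have hit its limit short of two.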

#performance

A Chat with Andrew on MLOps: From Model-centric to Data-centric AI

Read More

#data-science, #devops, #mlops, #videos

Text-to-Image Generation Grounded by Fine-Grained User Attention

Localized Narratives [29] is a dataset with detailed natural language descriptions of images paired with mouse traces that provide a sparse, fine-grained visual grounding for phrases. We propose TRECS, a sequential model that exploits this grounding to generate images. TRECS uses descriptions to retrieve segmentation masks and predict object labels aligned with mouse traces. These alignments are used to select and position masks to generate a fully covered segmentation canvas; the final image is produced by a segmentation-to-image generator using this canvas. This multi-step, retrieval-based approach outperforms existing direct text-to-image generation models on both automatic metrics and human evaluations: overall, its generated images are more photo-realistic and better match descriptions. Read More

#image-recognition, #nlp

Inside ‘TALON,’ the Nationwide Network of AI-Enabled Surveillance Cameras

Hundreds of pages of emails obtained by Motherboard show how little-known company Flock has expanded from surveilling individual neighborhoods into a network of smart cameras that spans the United States.

“Give your neighborhood peace of mind,” an advertisement for Flock, a line of smart surveillance cameras, reads. A February promotional video claims that the company’s “mission is to eliminate nonviolent crime across the country. We can only do that by working with every neighborhood and every police department throughout the country.”

Quietly, this seems to be happening. Read More

#surveillance

Brain2Pix: Fully convolutional naturalistic video reconstruction from brain activity

Reconstructing complex and dynamic visual perception from brain activity remains a major challenge in machine learning applications to neuroscience. Here we present a new method for reconstructing naturalistic images and videos from very large single-participant functional magnetic resonance data that leverages the recent success of image-to-image transformation networks. This is achieved by exploiting spatial information obtained from retinotopic mappings across the visual system. More specifically, we first determine what position each voxel in a particular region of interest would represent in the visual field based on its corresponding receptive field location. Then, the 2D image representation of the brain activity on the visual field is passed to a fully convolutional image-to-image network trained to recover the original stimuli using VGG feature loss with an adversarial regularizer. In our experiments, we show that our method offers a significant improvement over existing video reconstruction techniques. Read More

#human, #image-recognition