A group of more than 20 organizations, including tech giants like AWS, Google, IBM, and Nvidia, joined schools like Stanford University and Ohio State University today in backing the idea of a national AI research cloud. Nonprofit groups like Mozilla and the Allen Institute for AI also support the idea. The cloud would give researchers across the United States access to compute power and data sets that are freely available to companies like Google but not to researchers in academia. Compute resources available to academics could grow even scarcer in the near future as COVID-19 fallout constricts university budgets. Read More
Hate spoilers? This AI tool spots them for you
Did social media spoil Avengers: Endgame for you? Or maybe one of the Game of Thrones books? A team of researchers from the University of California San Diego is working to make sure that doesn’t happen again. They have developed an AI-based system that can flag spoilers in online reviews of books and TV shows.
“Spoilers are everywhere on the internet, and are very common on social media. As internet users, we understand the pain of spoilers and how they can ruin one’s experience,” said Ndapa Nakashole, a professor of computer science at UC San Diego and one of the paper’s senior authors. Read More
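For readers curious what "flagging spoilers" looks like as a machine-learning problem, here is a minimal sketch that treats it as sentence-level binary classification. The sentences, labels, and bag-of-words model below are illustrative assumptions, not the UC San Diego team's model or training data.

```python
# Minimal sketch of spoiler flagging as binary text classification.
# The sentences, labels, and model are illustrative assumptions only --
# they are not the UC San Diego team's model or training data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up set of review sentences labeled spoiler (1) / non-spoiler (0).
sentences = [
    "The pacing in the final season felt rushed.",
    "It turns out the detective was the killer all along.",
    "Great world-building and memorable characters.",
    "She dies in the last chapter saving her brother.",
    "The soundtrack really elevates the battle scenes.",
    "The twist is that the narrator was dead the whole time.",
]
labels = [0, 1, 0, 1, 0, 1]

# Bag-of-words features + logistic regression: a common baseline for this task.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

# Flag sentences in a new review whose predicted spoiler probability is high.
review = [
    "The cinematography is stunning.",
    "I can't believe the mentor betrays the hero in act three.",
]
for sentence, prob in zip(review, model.predict_proba(review)[:, 1]):
    flag = "SPOILER?" if prob > 0.5 else "ok"
    print(f"[{flag}] {sentence}")
```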
MIT's new interactive machine learning prediction tool could give everyone AI superpowers
Soon, you might not need anything more specialized than a readily accessible touchscreen device and any existing data sets you have access to in order to build powerful prediction tools. A new experiment from MIT and Brown University researchers has added a capability to their ‘Northstar’ interactive data system that can “instantly generate machine-learning models” to use with existing data sets in order to generate useful predictions.
One example the researchers provide is that doctors could use the system to predict the likelihood that their patients will contract specific diseases based on their medical histories. Or, they suggest, a business owner could use their historical sales data to develop more accurate forecasts, quickly and without a ton of manual analytics work. Read More
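As a rough picture of what "instantly generating" a model from an existing data set involves, the sketch below fits a couple of candidate classifiers to a toy table and keeps the one with the best cross-validated score. The column names, data, and candidate models are assumptions for illustration; Northstar's actual backend is not shown here.

```python
# Hedged sketch of automatic model generation on an existing tabular data set.
# Column names, the toy data, and the model-selection loop are illustrative
# assumptions; this is not Northstar's actual AutoML backend.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Stand-in for a data set the user already has (e.g., exported patient records).
df = pd.DataFrame({
    "age":         [34, 58, 45, 67, 29, 72, 51, 40],
    "bmi":         [22.1, 31.4, 27.0, 29.8, 24.5, 33.2, 26.1, 23.3],
    "smoker":      [0, 1, 0, 1, 0, 1, 1, 0],
    "has_disease": [0, 1, 0, 1, 0, 1, 1, 0],   # prediction target
})
X, y = df.drop(columns="has_disease"), df["has_disease"]

# Try a few candidate models and keep the one with the best CV score --
# a crude stand-in for the automated search a tool like this would run.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
best_name, best_model, best_score = None, None, -1.0
for name, model in candidates.items():
    score = cross_val_score(model, X, y, cv=4).mean()
    if score > best_score:
        best_name, best_model, best_score = name, model, score

best_model.fit(X, y)
print(f"selected {best_name} (CV accuracy {best_score:.2f})")
new_record = pd.DataFrame([[60, 30.5, 1]], columns=X.columns)
print("risk estimate for a new record:", best_model.predict_proba(new_record)[0, 1])
```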
Crowd Workers Are Not Online Shakespeares, But Carnegie Mellon Research Shows They Can Write
Study Finds Crowdsourced Articles Compare Favorably to Those by Single Authors
PITTSBURGH—Writing can be a solitary, intellectual pursuit, but researchers at Carnegie Mellon University have shown that the task of writing an informational article also can be accomplished by dozens of people working independently online.

Synthetic Speech Generated from Brain Recordings
A state-of-the-art brain-machine interface created by UC San Francisco neuroscientists can generate natural-sounding synthetic speech by using brain activity to control a virtual vocal tract – an anatomically detailed computer simulation including the lips, jaw, tongue and larynx. The study was conducted in research participants with intact speech, but the technology could one day restore the voices of people who have lost the ability to speak due to paralysis and other forms of neurological damage. Read More
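The described decoder is staged: brain activity first drives the simulated vocal tract, and the vocal-tract movements are then turned into sound. The sketch below imitates that staging with two generic regressors on synthetic arrays; the array shapes, feature names, and models are assumptions for illustration, not the UCSF system.

```python
# Hedged sketch of a two-stage decoder: neural features -> articulatory
# parameters -> acoustic features. Shapes, data, and models are illustrative
# assumptions; this is not the UCSF brain-machine interface.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_frames = 500
neural = rng.normal(size=(n_frames, 256))       # e.g., cortical activity features
articulation = rng.normal(size=(n_frames, 33))  # e.g., lip/jaw/tongue/larynx params
acoustics = rng.normal(size=(n_frames, 32))     # e.g., spectral features for a vocoder

# Stage 1: decode articulatory kinematics from neural activity.
stage1 = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300).fit(neural, articulation)
# Stage 2: synthesize acoustic features from the decoded articulation.
stage2 = MLPRegressor(hidden_layer_sizes=(128,), max_iter=300).fit(articulation, acoustics)

# At "speaking" time, chain the stages; a vocoder (not shown) would turn the
# predicted acoustic features into an audible waveform.
new_neural = rng.normal(size=(10, 256))
predicted_acoustics = stage2.predict(stage1.predict(new_neural))
print(predicted_acoustics.shape)  # (10, 32)
```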
Georgia Tech, UC Davis, Texas A&M Join NVAIL Program with Focus on Graph Analytics
NVIDIA is partnering with three leading universities — Georgia Tech, the University of California, Davis, and Texas A&M — as part of our NVIDIA AI Labs program, to build the future of graph analytics on GPUs.
NVIDIA’s work with these three new NVAIL partners aims to ultimately create a one-stop shop for customers to take advantage of accelerated graph analytics algorithms. Read More
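For a concrete sense of the workload, the sketch below runs one common graph-analytics primitive, PageRank, on a toy edge list with NetworkX on the CPU; GPU-accelerated graph libraries built under partnerships like this expose similar operations over far larger graphs. The edge list is made up for illustration.

```python
# Hedged sketch of a typical graph-analytics primitive (PageRank) on a toy
# edge list using NetworkX. The edge list is invented for illustration; the
# partnership described above targets GPU-accelerated versions of such primitives.
import networkx as nx

edges = [
    ("A", "B"), ("B", "C"), ("C", "A"),
    ("C", "D"), ("D", "B"), ("E", "C"),
]
G = nx.DiGraph(edges)

# PageRank scores: which nodes are most "central" under random-walk traversal.
scores = nx.pagerank(G, alpha=0.85)
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{node}: {score:.3f}")
```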
The role of academia in data science education
I was recently asked to moderate an academic panel on the role of universities in training the data science workforce. I preceded each question with opinionated introductions which I have fused into this blog post. These are weakly held opinions so please consider commenting if you disagree with anything.
To discuss data science education we first need to clearly state what it means. The panel organizers defined data science as “an emerging discipline that draws upon knowledge in statistical methodology and computer science to create impactful predictions and insights for a wide range of traditional scholarly fields.” But is it an academic discipline? If so, what are the fundamental principles, expertise, skills, and knowledge base shared by data scientists? Is there a core curriculum for data science? Providing a more detailed definition might help. Read More
Visipedia
Visipedia, short for “Visual Encyclopedia,” is a network of people and machines that is designed to harvest and organize visual information and make it accessible to anyone anywhere. Visipedia machines can learn from experts how to discover and classify animals, plants and objects in images. Communities of scientists and interested citizens may use Visipedia software to share, annotate and organize meaningful content in images. Recent experiments include software that can detect and classify trees from satellite and street-level images, and an app that can recognize North American birds. Visipedia is a joint project between Pietro Perona’s Vision Group at Caltech and Serge Belongie’s Vision Group at Cornell Tech. Read More
Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection
We introduce tools and methodologies to collect high quality, large scale fine-grained computer vision datasets using citizen scientists – crowd annotators who are passionate and knowledgeable about specific domains such as birds or airplanes. We worked with citizen scientists and domain experts to collect NABirds, a new high quality dataset containing 48,562 images of North American birds with 555 categories, part annotations and bounding boxes. We find that citizen scientists are significantly more accurate than Mechanical Turkers at zero cost. We worked with bird experts to measure the quality of popular datasets like CUB-200-2011 and ImageNet and found class label error rates of at least 4%. Nevertheless, we found that learning algorithms are surprisingly robust to annotation errors and this level of training data corruption can lead to an acceptably small increase in test error if the training set has sufficient size. At the same time, we found that an expert-curated high quality test set like NABirds is necessary to accurately measure the performance of fine-grained computer vision systems. We used NABirds to train a publicly available bird recognition service deployed on the website of the Cornell Lab of Ornithology. Read More
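The abstract's robustness claim can be illustrated in miniature: the sketch below corrupts a growing fraction of training labels on a small standard dataset and reports test accuracy. The dataset, model, and noise levels are assumptions for illustration, not the paper's experiments.

```python
# Hedged sketch of the label-noise robustness claim: corrupt a growing fraction
# of training labels and watch test accuracy. Dataset (scikit-learn digits),
# model, and noise levels are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
for noise in (0.0, 0.04, 0.10, 0.25):
    y_noisy = y_train.copy()
    # Flip a `noise` fraction of training labels to random (possibly wrong) classes.
    flip = rng.random(len(y_noisy)) < noise
    y_noisy[flip] = rng.integers(0, 10, size=flip.sum())
    acc = LogisticRegression(max_iter=2000).fit(X_train, y_noisy).score(X_test, y_test)
    print(f"label noise {noise:4.0%} -> test accuracy {acc:.3f}")
```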
Bird Species Categorization Using Pose Normalized Deep Convolutional Nets
We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object’s pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric warping functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations [17, 22, 26, 28] and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%). Read More
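The pipeline the abstract describes (estimate pose, crop and normalize patches around the keypoints, extract deep convolutional features per patch, classify on the concatenated features) can be sketched roughly as follows. The keypoint source, backbone network, and classifier here are illustrative stand-ins, not the paper's implementation.

```python
# Hedged sketch of a pose-normalized fine-grained classifier: crop patches
# around pose keypoints, extract deep CNN features per patch, concatenate,
# and classify. Keypoints, backbone, and classifier are illustrative choices,
# not the paper's exact pipeline.
import torch
import torchvision.models as models
import torchvision.transforms.functional as TF
from PIL import Image

# Pretrained CNN used as a fixed feature extractor (classification head removed).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

IMAGENET_MEAN, IMAGENET_STD = [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]

def patch_features(image: Image.Image, keypoints):
    """Crop a fixed-size patch around each (x, y) keypoint and extract features."""
    feats = []
    for x, y in keypoints:
        patch = image.crop((x - 56, y - 56, x + 56, y + 56)).resize((224, 224))
        tensor = TF.normalize(TF.to_tensor(patch), IMAGENET_MEAN, IMAGENET_STD)
        with torch.no_grad():
            feats.append(backbone(tensor.unsqueeze(0)).squeeze(0))
    return torch.cat(feats).numpy()  # one long pose-normalized feature vector

# Training would pair these feature vectors with species labels, e.g.:
#   X = [patch_features(img, predicted_keypoints(img)) for img in train_images]
#   clf = sklearn.svm.LinearSVC().fit(X, train_labels)
# where predicted_keypoints() is the pose-estimation stage (not shown).
```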