Visipedia, short for “Visual Encyclopedia,” is a network of people and machines that is designed to harvest and organize visual information and make it accessible to anyone anywhere. Visipedia machines can learn from experts how to discover and classify animals, plants and objects in images. Communities of scientists and interested citizens may use Visipedia software to share, annotate and organize meaningful content in images. Recent experiments include software that can detect and classify trees from satellite and street-level images, and an app that can recognize North American birds. Visipedia is a joint project between Pietro Perona’s Vision Group at Caltech and Serge Belongie’s Vision Group at Cornell Tech.
Daily Archives: April 19, 2019
Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection
We introduce tools and methodologies to collect high quality, large scale fine-grained computer vision datasets using citizen scientists – crowd annotators who are passionate and knowledgeable about specific domains such as birds or airplanes. We worked with citizen scientists and domain experts to collect NABirds, a new high quality dataset containing 48,562 images of North American birds with 555 categories, part annotations and bounding boxes. We find that citizen scientists are significantly more accurate than Mechanical Turkers at zero cost. We worked with bird experts to measure the quality of popular datasets like CUB-200-2011 and ImageNet and found class label error rates of at least 4%. Nevertheless, we found that learning algorithms are surprisingly robust to annotation errors and this level of training data corruption can lead to an acceptably small increase in test error if the training set has sufficient size. At the same time, we found that an expert-curated high quality test set like NABirds is necessary to accurately measure the performance of fine-grained computer vision systems. We used NABirds to train a publicly available bird recognition service deployed on the web site of the Cornell Lab of Ornithology.
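To make the idea of "training data corruption" concrete, here is a minimal sketch of how one might inject a controlled class label error rate into a training set before measuring its effect on test error. It is not the authors' protocol: the helper names `corrupt_labels` and `label_error_rate`, the uniform choice of wrong class, and the toy label list are all assumptions made for illustration.

```python
import random


def corrupt_labels(labels, num_classes, error_rate, seed=0):
    """Return a copy of `labels` in which a fraction `error_rate` is wrong.

    Assumption: each corrupted label is drawn uniformly from the other
    classes. Real annotation errors tend to confuse similar species.
    """
    rng = random.Random(seed)
    n_flip = round(error_rate * len(labels))
    flip_idx = rng.sample(range(len(labels)), n_flip)
    noisy = list(labels)
    for i in flip_idx:
        # Sample from num_classes - 1 values and skip over the true class,
        # so the replacement label is always different from the original.
        wrong = rng.randrange(num_classes - 1)
        if wrong >= noisy[i]:
            wrong += 1
        noisy[i] = wrong
    return noisy


def label_error_rate(reference, observed):
    """Fraction of positions where `observed` disagrees with `reference`."""
    return sum(r != o for r, o in zip(reference, observed)) / len(reference)


# Toy data (hypothetical): 10,000 labels spread over 200 classes.
clean = [i % 200 for i in range(10000)]
noisy = corrupt_labels(clean, num_classes=200, error_rate=0.04)
print(label_error_rate(clean, noisy))
```

Training one model on `clean` and another on `noisy`, then scoring both on the same expert-verified test set, is one way to probe the robustness claim. The test labels must stay uncorrupted, which is the role the abstract assigns to an expert-curated set.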
Bird Species Categorization Using Pose Normalized Deep Convolutional Nets
We propose an architecture for fine-grained visual categorization that approaches expert human performance in the classification of bird species. Our architecture first computes an estimate of the object’s pose; this is used to compute local image features which are, in turn, used for classification. The features are computed by applying deep convolutional nets to image patches that are located and normalized by the pose. We perform an empirical study of a number of pose normalization schemes, including an investigation of higher order geometric warping functions. We propose a novel graph-based clustering algorithm for learning a compact pose normalization space. We perform a detailed investigation of state-of-the-art deep convolutional feature implementations [17, 22, 26, 28] and fine-tuning feature learning for fine-grained classification. We observe that a model that integrates lower-level feature layers with pose-normalized extraction routines and higher-level feature layers with unaligned image features works best. Our experiments advance state-of-the-art performance on bird species recognition, with a large improvement of correct classification rates over previous methods (75% vs. 55-65%).
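As a rough illustration of what pose normalization can mean, the sketch below fits a least-squares similarity transform (scale, rotation, translation) that maps detected part keypoints onto a fixed prototype layout. This is only the simplest kind of warp, not the paper's graph-based clustering or its higher order warping functions. The function names, part names and coordinates are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform taking `src` points to `dst`.

    Points are (x, y) tuples. The transform is written in complex form,
    z -> a * z + b, where `a` encodes scale and rotation and `b` the shift.
    Assumes the `src` points are not all identical.
    """
    p = [complex(x, y) for x, y in src]
    q = [complex(x, y) for x, y in dst]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    # Closed-form solution on centered points.
    num = sum((qi - q_mean) * (pi - p_mean).conjugate() for pi, qi in zip(p, q))
    den = sum(abs(pi - p_mean) ** 2 for pi in p)
    a = num / den
    b = q_mean - a * p_mean
    return a, b


def apply_similarity(a, b, point):
    """Map one (x, y) point through z -> a * z + b."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical prototype layout of three parts in a normalized frame.
prototype = {"beak": (20.0, 30.0), "eye": (30.0, 25.0), "tail": (90.0, 40.0)}
# Hypothetical detections in an image: the same bird, rotated, scaled, shifted.
detected = {"beak": (40.0, 90.0), "eye": (50.0, 110.0), "tail": (20.0, 230.0)}

parts = sorted(prototype)
a, b = fit_similarity([detected[k] for k in parts], [prototype[k] for k in parts])
aligned = {k: apply_similarity(a, b, detected[k]) for k in parts}
print(aligned)
```

In a full pipeline the fitted transform would be used to resample image pixels into the normalized frame, and the resulting patches would then be passed to a convolutional network for feature extraction. Here only the keypoints are mapped, to keep the example dependency-free.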