Monthly Archives: October 2021
Aggregating Nested Transformers
Although hierarchical structures are popular in recent vision transformers, they require sophisticated designs and massive datasets to work well. In this work, we explore the idea of nesting basic local transformers on non-overlapping image blocks and aggregating them in a hierarchical manner. We find that the block aggregation function plays a critical role in enabling cross-block non-local information communication. This observation leads us to design a simplified architecture that requires minor code changes to the original vision transformer and obtains improved performance compared to existing methods. Our empirical results show that the proposed method NesT converges faster and requires much less training data to achieve good generalization. For example, a NesT with 68M parameters trained on ImageNet for 100/300 epochs achieves 82.3%/83.8% accuracy evaluated on 224 × 224 image size, outperforming previous methods with up to 57% parameter reduction. Training a NesT with 6M parameters from scratch on CIFAR10 achieves 96% accuracy using a single GPU, setting a new state of the art for vision transformers. Beyond image classification, we extend the key idea to image generation and show NesT leads to a strong decoder that is 8× faster than previous transformer-based generators. Furthermore, we also propose a novel method for visually interpreting the learned model. Read More
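The core mechanics described above, partitioning an image into non-overlapping blocks for local attention and then aggregating blocks between hierarchy levels, can be sketched with plain array operations. This is a minimal illustration under assumptions: the helper names are hypothetical, and a simple 2×2 max-pool stands in for NesT's learned aggregation function.

```python
import numpy as np

def block_partition(x, block):
    """Split an (H, W, C) feature map into non-overlapping block x block
    patches, returning (num_blocks, block*block, C) so each block can be
    processed independently by a local transformer."""
    H, W, C = x.shape
    x = x.reshape(H // block, block, W // block, block, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, block * block, C)

def aggregate_blocks(x, block, H, W):
    """Merge blocks back into an (H, W, C) map, then downsample 2x2 so the
    next hierarchy level sees fewer blocks with larger receptive fields.
    A plain max-pool stands in for the learned aggregation function."""
    _, _, C = x.shape
    x = x.reshape(H // block, W // block, block, block, C)
    x = x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)
    return x.reshape(H // 2, 2, W // 2, 2, C).max(axis=(1, 3))

# An 8x8 map with 4x4 blocks yields 4 blocks of 16 tokens each; after
# aggregation, the next hierarchy level operates on a 4x4 map.
feature_map = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
blocks = block_partition(feature_map, block=4)   # shape (4, 16, 3)
next_level = aggregate_blocks(blocks, 4, 8, 8)   # shape (4, 4, 3)
```

The aggregation step is where cross-block communication happens: each pooled position mixes values from neighboring blocks, which the abstract identifies as the critical ingredient.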
Multimodal datasets: misogyny, pornography, and malignant stereotypes
We have now entered the era of trillion-parameter machine learning models trained on billion-sized datasets scraped from the internet. The rise of these gargantuan datasets has given rise to formidable bodies of critical work calling for caution when generating such datasets. These works address concerns surrounding the dubious curation practices used to generate these datasets, the sordid quality of alt-text data available on the world wide web, the problematic content of the CommonCrawl dataset often used as a source for training large language models, and the entrenched biases in large-scale visio-linguistic models (such as OpenAI’s CLIP model) trained on opaque datasets (WebImageText). Against the backdrop of these specific calls for caution, we examine the recently released LAION-400M dataset, a CLIP-filtered dataset of image–alt-text pairs parsed from the CommonCrawl dataset. We found that the dataset contains troublesome and explicit image–text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content. We outline numerous implications, concerns and downstream harms regarding the current state of large-scale datasets while raising open questions for various stakeholders, including the AI community, regulators, policy makers and data subjects. Read More
Forget Boston Dynamics. This robot taught itself to walk
Slick, viral videos from Boston Dynamics are impressive, but teaching a robot to walk by itself is a lot harder.
A pair of robot legs called Cassie has been taught to walk using reinforcement learning, the training technique that teaches AIs complex behavior via trial and error. The two-legged robot learned a range of movements from scratch, including walking in a crouch and while carrying an unexpected load. Read More
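As a concrete picture of the trial-and-error loop the article alludes to, here is a minimal tabular Q-learning sketch on a toy "corridor" walking task. Everything here (the environment, reward values, and hyperparameters) is an illustrative assumption; training a real legged robot like Cassie uses deep reinforcement learning in physics simulation, not a lookup table.

```python
import numpy as np

# Toy 5-state corridor: the agent starts at state 0 and must learn, by
# trial and error, that stepping right (action 1) reaches the goal (state 4).
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2          # actions: 0 = step left, 1 = step right
Q = np.zeros((n_states, n_actions)) # learned action-value table
alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration

for episode in range(500):
    s = 0
    while s != n_states - 1:
        # epsilon-greedy: mostly exploit what was learned, sometimes explore
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else -0.01  # small cost per step
        # Q-learning update: move the estimate toward reward + discounted future
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

# After training, the greedy policy steps right from every non-goal state.
policy = Q.argmax(axis=1)
```

The same feedback principle, with a neural network in place of the table and a simulated robot body in place of the corridor, is what lets a controller discover gaits like crouched walking without hand-programmed motions.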
State of AI Report 2021
The State of AI Report analyses the most interesting developments in AI. We aim to trigger an informed conversation about the state of AI and its implications for the future. The Report is produced by AI investors Nathan Benaich and Ian Hogarth.
Now in its fourth year, the State of AI Report 2021 is reviewed by AI practitioners in industry and research, and features invited contributions from a range of well-known and up-and-coming companies and research groups. The Report considers the following key dimensions:
- Research: Technology breakthroughs and capabilities.
- Talent: Supply, demand and concentration of AI talent.
- Industry: Areas of commercial application for AI and its business impact.
- Politics: Regulation of AI, its economic implications and the emerging geopolitics of AI.
- Predictions: What we believe will happen and a performance review to keep us honest.
The Death and Birth of Technological Revolutions
What was especially remarkable about Carlota Perez’s Technological Revolutions and Financial Capital was its timing: 2002 was the middle of the cold winter that followed the Dotcom Bubble, and here was Perez arguing that the IT revolution and the Internet were not in fact dead ideas, but in the middle of a natural transition to a new Golden Age.
Perez’s thesis was based on over 200 years of history and the patterns she identified in four previous technological revolutions. … Perez’s argument was that the four technological revolutions that preceded the Age of Information and Telecommunications followed a similar cycle. Read More

Synthetic Media: How deepfakes could soon change our world
JAIC chief wants AI progress to be ‘slow and incremental’
The Department of Defense’s Joint Artificial Intelligence Center is looking to field AI across the military slowly, so products can be broadly usable across combatant commands, the center’s director said Friday.
That mindset appears to be different from some innovative upstart organizations within the government that have emphasized the private-sector mentality of speed and agility in finding solutions to pressing challenges. Growth for the center’s AI tools will come from solutions to common challenges that senior leaders across the military face, JAIC Director Lt. Gen. Michael Groen said during the Billington Cybersecurity Summit. Read More
Does Your Dermatology Classifier Know What It Doesn’t Know? Detecting the Long-Tail of Unseen Conditions
Supervised deep learning models have proven to be highly effective in the classification of dermatological conditions. These models rely on the availability of abundant labeled training examples. However, in the real world, many dermatological conditions are individually too infrequent for per-condition classification with supervised learning. Although individually infrequent, these conditions may collectively be common and therefore are clinically significant in aggregate. To prevent models from generating erroneous outputs on such examples, there remains a considerable unmet need for deep learning systems that can better detect such infrequent conditions. These infrequent ‘outlier’ conditions are seen very rarely (or not at all) during training. In this paper, we frame this task as an out-of-distribution (OOD) detection problem. We set up a benchmark ensuring that outlier conditions are disjoint between the model training, validation, and test sets. Unlike traditional OOD detection benchmarks where the task is to detect dataset distribution shift, we aim at the more challenging task of detecting subtle semantic differences. We propose a novel hierarchical outlier detection (HOD) loss, which assigns multiple abstention classes corresponding to each training outlier class and jointly performs a coarse classification of inliers vs. outliers, along with fine-grained classification of the individual classes. We demonstrate that the proposed HOD loss-based approach outperforms leading methods that leverage outlier data during training. Further, performance is significantly boosted by using recent representation learning methods (BiT, SimCLR, MICLe). Finally, we explore ensembling strategies for OOD detection and propose a diverse ensemble selection process for the best result.
We also perform a subgroup analysis over conditions of varying risk levels and different skin types to investigate how OOD performance changes over each subgroup and demonstrate the gains of our framework in comparison to the baseline. Furthermore, we go beyond traditional performance metrics and introduce a cost matrix for model trust analysis to approximate downstream clinical impact. We use this cost matrix to compare the proposed method against the baseline, thereby making a stronger case for its effectiveness in real-world scenarios. Read More
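A rough sketch of the abstract's central idea, reserving one abstention class per training outlier class and reading the total abstention probability mass as a coarse OOD score, might look like the following. The function names, the loss weighting, and the binary coarse term are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def hod_scores(logits, num_inlier):
    """Logits span [inlier classes | abstention classes]. Return fine-grained
    probabilities and a coarse OOD score: the total mass on abstention
    classes, one of which exists per training outlier class."""
    p = softmax(logits)
    ood_score = p[:, num_inlier:].sum(axis=-1)
    return p, ood_score

def hod_loss(logits, labels, num_inlier, alpha=0.5):
    """Toy joint objective: fine cross-entropy over all classes plus a
    coarse binary cross-entropy on inlier vs. outlier, mixed by alpha."""
    p, ood = hod_scores(logits, num_inlier)
    n = len(labels)
    fine = -np.log(p[np.arange(n), labels] + 1e-12).mean()
    is_outlier = (labels >= num_inlier).astype(float)
    coarse = -(is_outlier * np.log(ood + 1e-12)
               + (1 - is_outlier) * np.log(1 - ood + 1e-12)).mean()
    return alpha * fine + (1 - alpha) * coarse
```

At test time, an example whose abstention mass exceeds a chosen threshold would be flagged as a possible unseen condition rather than forced into one of the known diagnoses.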
Designing effective traditional and deep learning-based inspection systems for machine vision applications
When best practices are followed, machine vision and deep learning-based imaging systems are capable of effective visual inspection and will improve efficiency, increase throughput, and drive revenue.
For decades, machine vision technology has performed automated inspection tasks—including defect detection, flaw analysis, assembly verification, sorting, and counting—in industrial settings. Recent computer vision software advances and processing techniques have further enhanced the capabilities of these imaging systems in new and expanding uses. The imaging system itself remains a critically important vision component, yet its role and execution can be underestimated or misunderstood.
Without a well-designed and properly installed imaging system, software will struggle to reliably detect defects. For example, even though the imaging setup in Figure 1 (left) displays an attractive image of a gear, only the image on the right clearly shows a dent. This article takes a deep dive into best practices for iterative design and provides a roadmap for successfully designing each type of system. Read More