The Future of Search Is Boutique

For most queries, Google Search is pretty underwhelming these days. Google is great at answering questions that have an objective answer, like “# of billionaires in the world” or “What is the population of Iceland?” It’s pretty bad at answering questions that require judgment and context, like “What do NFT collectors think about NFTs?”

The evidence is everywhere. These days, I find myself suppressing the garbage Internet by searching Google for “Substack + future of learning” to find the best takes on education. We hack Twitter by posting “what is the best…?” questions over and over again. When I’m researching a new product, I type “X item reddit” into Google. And I find enormous value in small, niche, often forgotten sites like Spaghetti Directory.

With the emergence of tools like Notion, Airtable, and Readwise, people are aggregating content and resources, reviving the curated web. But at the moment these are mostly solo affairs — hidden in private or semi-private corners of the Internet, fragmented, poorly indexed, and unavailable for public use. We haven’t figured out how to make them multiplayer. In the cases where we have made them public and collaborative — here is a great example — the projects are often short-lived and poorly maintained. Read More

#big7

Former Intelligence Officials, Citing Russia, Say Big Tech Monopoly Power is Vital to National Security

When the U.S. security state announces that Big Tech’s centralized censorship power must be preserved, we should ask what this reveals about whom this regime serves.

A group of former intelligence and national security officials on Monday issued a jointly signed letter warning that pending legislative attempts to restrict or break up the power of Big Tech monopolies — Facebook, Google, and Amazon — would jeopardize national security because, they argue, their centralized censorship power is crucial to advancing U.S. foreign policy. Most of the letter is devoted to invoking the grave threat allegedly posed to the U.S. by Russia, as illustrated by the invasion of Ukraine, and it repeatedly points to the dangers of Putin and the Kremlin to justify preserving Big Tech’s power in its maximalist form. Any attempt to restrict Big Tech’s monopolistic power, the letter argues, would therefore undermine the U.S. fight against Moscow. Read More

#big7, #ic, #russia

Amazon releases 51-language dataset for language understanding

The MASSIVE dataset and the Massively Multilingual NLU (MMNLU-22) competition and workshop will help researchers scale natural-language-understanding technology to every language on Earth.

Imagine that all people around the world could use voice AI systems such as Alexa in their native tongues.

One promising approach to realizing this vision is massively multilingual natural-language understanding (MMNLU), a paradigm in which a single machine learning model can parse and understand inputs from many typologically diverse languages. By learning a shared data representation that spans languages, the model can transfer knowledge from languages with abundant training data to those in which training data is scarce. Read More
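To make the paradigm concrete, here is a minimal sketch of the shared-encoder idea: one multilingual model classifying intents for utterances in different languages. The model choice (XLM-R via Hugging Face transformers) and the toy intent labels are assumptions for illustration, not details from the paper.

```python
# One shared multilingual encoder scoring intents for several languages.
# Illustrative only: the checkpoint and label set are placeholders, and the
# classification head below is untrained until fine-tuned (e.g., on MASSIVE).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

intents = ["set_alarm", "play_music", "weather_query"]   # toy label set

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-base", num_labels=len(intents)
)

# Because the weights are shared across languages, labeled data in
# high-resource languages can transfer to low-resource ones.
utterances = [
    "wake me up at seven tomorrow",   # English
    "réveille-moi à sept heures",     # French
    "despiértame a las siete",        # Spanish
]

batch = tokenizer(utterances, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits    # shape: (3, len(intents))
print(logits.argmax(dim=-1))
```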

Read the Paper

#big7, #nlp

Google rolls out AI improvements to aid with Search safety and ‘personal crisis’ queries

Google today announced it will be rolling out improvements to its AI model to make Google Search a safer experience and one that’s better at handling sensitive queries, including those around topics like suicide, sexual assault, substance abuse and domestic violence. It’s also using other AI technologies to improve its ability to remove unwanted explicit or suggestive content from Search results when people aren’t specifically seeking it out.

Currently, when people search for sensitive information — like suicide, abuse or other topics — Google will display the contact information for the relevant national hotlines above its search results. But the company explains that people in crisis situations may search in all kinds of ways, and it’s not always obvious to a search engine that they’re in need, even if their queries would raise flags for a human reader. With machine learning and the latest improvements to Google’s AI model MUM (Multitask Unified Model), Google says it will be able to automatically and more accurately detect a wider range of personal crisis searches, because MUM can better understand the intent behind people’s questions and queries. Read More
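Google hasn’t published MUM’s internals, but the general move from keyword matching to intent understanding can be sketched with off-the-shelf sentence embeddings. Everything below — the model name, the exemplar phrases, the threshold — is an assumption for illustration, not Google’s system.

```python
# Flagging crisis intent by semantic similarity instead of keyword matching.
# A hypothetical stand-in for what an intent-aware model enables; not MUM.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

# Exemplar phrasings of crisis intent; a production system would use many
# more, curated with domain experts.
exemplars = [
    "I want to hurt myself",
    "how do I get away from an abusive partner",
]
exemplar_emb = model.encode(exemplars, convert_to_tensor=True)

def looks_like_crisis(query: str, threshold: float = 0.5) -> bool:
    q_emb = model.encode(query, convert_to_tensor=True)
    return util.cos_sim(q_emb, exemplar_emb).max().item() >= threshold

# A query with no obvious keywords can still match on meaning.
print(looks_like_crisis("I don't see a way out anymore"))
```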

#big7

Good News About the Carbon Footprint of Machine Learning Training

Machine learning (ML) has become prominent in information technology, which has led some to raise concerns about the associated rise in the costs of computation, primarily the carbon footprint, i.e., total greenhouse gas emissions. While these assertions rightly elevated the discussion around carbon emissions in ML, they also highlight the need for accurate data to assess the true carbon footprint, which can help identify strategies to mitigate carbon emissions in ML.

In “The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink”, accepted for publication in IEEE Computer, we focus on operational carbon emissions — i.e., the emissions from operating ML hardware, including data center overheads — from training natural language processing (NLP) models, and investigate best practices that could reduce the carbon footprint. We demonstrate four key practices that reduce the carbon (and energy) footprint of ML workloads by large margins, practices we have employed to help keep ML under 15% of Google’s total energy use. Read More
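The operational-emissions framing reduces to a short calculation: device energy, times data center overhead (PUE), times the grid’s carbon intensity. A back-of-envelope sketch, with every number below a placeholder rather than a figure from the paper:

```python
# Rough operational-emissions estimate for a training run. All inputs are
# assumed placeholder values, not Google's published numbers.
accelerator_hours   = 1_000   # total device-hours for the run (assumed)
watts_per_device    = 300     # average draw per accelerator (assumed)
pue                 = 1.1     # data center overhead multiplier (assumed)
grid_kgco2e_per_kwh = 0.4     # carbon intensity of the local grid (assumed)

energy_kwh = accelerator_hours * watts_per_device / 1000 * pue
emissions_tco2e = energy_kwh * grid_kgco2e_per_kwh / 1000
print(f"{energy_kwh:.0f} kWh ≈ {emissions_tco2e:.2f} tCO2e")  # 330 kWh ≈ 0.13 tCO2e
```

Each factor is a lever: more efficient models cut device-hours, better hardware cuts watts, well-run data centers cut PUE, and siting in low-carbon regions cuts grid intensity.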

#big7, #performance

DeepMind’s AI can control superheated plasma inside a fusion reactor

DeepMind’s streak of applying its world-class AI to hard science problems continues. In collaboration with the Swiss Plasma Center at EPFL—a university in Lausanne, Switzerland—the UK-based AI firm has now trained a deep reinforcement learning algorithm to control the superheated soup of matter inside a nuclear fusion reactor. The breakthrough, published in the journal Nature, could help physicists better understand how fusion works, and potentially speed up the arrival of an unlimited source of clean energy. Read More
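The control setup can be pictured as a fast loop in which a trained policy network maps magnetic sensor readings to voltage commands for the tokamak’s control coils. The sketch below is schematic and entirely hypothetical — shapes, names, and the commented-out simulator are stand-ins, not DeepMind’s code.

```python
# Schematic of a learned magnetic-control loop: observation in, coil voltages out.
import numpy as np

class Policy:
    """Stand-in for the trained deep reinforcement learning policy."""
    def __init__(self, n_sensors: int, n_coils: int):
        rng = np.random.default_rng(0)
        self.w = rng.normal(scale=0.01, size=(n_coils, n_sensors))

    def act(self, observation: np.ndarray) -> np.ndarray:
        # The real policy is a neural network trained in simulation with RL;
        # tanh keeps the commanded voltages bounded.
        return np.tanh(self.w @ observation)

policy = Policy(n_sensors=92, n_coils=19)   # counts are illustrative
observation = np.zeros(92)
for step in range(10_000):                  # a kHz-scale control loop
    action = policy.act(observation)
    # observation = simulator.step(action)  # advance the plasma state (omitted)
```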

#big7

FILM: Frame Interpolation for Large Scene Motion

TensorFlow 2 implementation of our high-quality frame interpolation neural network. We present a unified single-network approach that doesn’t use additional pre-trained networks, like optical flow or depth, and yet achieves state-of-the-art results. We use a multi-scale feature extractor that shares the same convolution weights across the scales. Our model is trainable from frame triplets alone. Read More
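The weight-sharing trick is easy to show: apply the same convolutional block to every level of an image pyramid, so one set of weights serves all scales. A minimal sketch with invented layer sizes — see the FILM repo for the real architecture.

```python
# Shared-weight multi-scale feature extraction: reusing one Keras block
# across pyramid levels means the convolution weights are shared by scale.
import tensorflow as tf

shared_block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
])

def pyramid_features(frame: tf.Tensor, levels: int = 3):
    """Extract features at several scales with a single set of conv weights."""
    features, x = [], frame
    for _ in range(levels):
        features.append(shared_block(x))         # same weights at every scale
        x = tf.keras.layers.AveragePooling2D()(x)
    return features

frame = tf.random.normal([1, 256, 256, 3])       # one RGB frame
print([f.shape for f in pyramid_features(frame)])
# [(1, 256, 256, 32), (1, 128, 128, 32), (1, 64, 64, 32)]
```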

#big7, #devops, #image-recognition

Next Gen Stats: Intro to Passing Score metric

Next Gen Stats teamed up with the AWS Proserve data science group to develop a more comprehensive metric for evaluating passing performance: the Next Gen Stats Passing Score. Built on seven different AWS-powered machine-learning models, the NGS Passing Score seeks to assess a quarterback’s execution on every pass attempt and transform that evaluation into a digestible score between 50 and 99. The score can be aggregated over any sample of pass attempts while still maintaining validity in rank order.

… Instead of simply awarding all passing yards, touchdowns and interceptions to the quarterback, the NGS Passing Score equation leverages the outputs of our models to form the components that best (see the sketch after this list):

  • Evaluate passing performance relative to a league-average expectation.
  • Isolate the factors that the quarterback can control.
  • Represent the most indicative features of winning football games.
  • Encompass passing performance in a single composite score (ranging from 50 to 99).
  • Generate valid scores at any sample size of pass attempts.
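As a toy illustration of assembling such a composite, the sketch below weights a few normalized components and maps the result onto the 50–99 range. The component names, weights, and scaling are invented for the example; the NGS methodology is more involved.

```python
# Hypothetical composite: weighted components in [0, 1] mapped to [50, 99].
def passing_score(components: dict[str, float],
                  weights: dict[str, float]) -> float:
    # Each component is a model output expressed relative to league-average
    # expectation and normalized to [0, 1].
    raw = sum(weights[k] * components[k] for k in weights)
    return 50 + 49 * (raw / sum(weights.values()))

score = passing_score(
    components={"completion_over_expected": 0.71,
                "air_epa": 0.64,
                "turnover_avoidance": 0.80},
    weights={"completion_over_expected": 0.4,
             "air_epa": 0.4,
             "turnover_avoidance": 0.2},
)
print(round(score, 1))   # 84.3
```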
Read More

#big7, #machine-learning

Don’t forget Microsoft

Despite its scale, Microsoft is one of the most overlooked companies in tech.

  • It is not a beloved consumer brand like Apple, Facebook, Amazon, or Google.
  • It was not a venture capital success story: Microsoft was too profitable to raise real VC money, so the founders owned 70% at IPO.
  • It is the oldest of FAMGA, hidden away in a different state.
But there is a lot more to Microsoft than meets the eye. If it plays its cards right, Microsoft can become the first $10T company. And startup founders would be wise to learn from the behemoth in Redmond.

This piece undertakes a daunting set of tasks: 1) understand what Microsoft is, 2) chart a path for its global domination, and 3) apply learnings from the company to the startup ecosystem. Read More

#big7

Fake It Till You Make It

We demonstrate that it is possible to perform face-related computer vision in the wild using synthetic data alone.

The community has long enjoyed the benefits of synthesizing training data with graphics, but the domain gap between real and synthetic data has remained a problem, especially for human faces. Researchers have tried to bridge this gap with data mixing, domain adaptation, and domain-adversarial training, but we show that it is possible to synthesize data with minimal domain gap, so that models trained on synthetic data generalize to real in-the-wild datasets.

We describe how to combine a procedurally generated parametric 3D face model with a comprehensive library of hand-crafted assets to render training images with unprecedented realism and diversity. We train machine learning systems for face-related tasks such as landmark localization and face parsing, showing that synthetic data can both match real data in accuracy and open up new approaches where manual labelling would be impossible. Read More
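One thing pixel-perfect synthetic labels buy you is a plain supervised training loop for tasks that are painful to label by hand. A minimal sketch of fitting a landmark regressor on rendered data, where the tiny model and random tensors stand in for a real renderer’s output:

```python
# Training a face-landmark regressor purely on synthetic (image, label) pairs.
# The random tensors below are placeholders for renderer output.
import tensorflow as tf

N_LANDMARKS = 68                                      # a common landmark count
images = tf.random.normal([128, 64, 64, 3])           # stand-in rendered faces
landmarks = tf.random.uniform([128, N_LANDMARKS * 2]) # exact (x, y) labels

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(N_LANDMARKS * 2),           # regress all coordinates
])
model.compile(optimizer="adam", loss="mse")
model.fit(images, landmarks, epochs=1, verbose=0)     # labels are free and exact
```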

Dataset

#big7, #fake, #image-recognition