We propose a generalization of the transformer neural network architecture to arbitrary graphs. The original transformer was designed for Natural Language Processing (NLP), which operates on fully connected graphs representing all connections between the words in a sequence. Such an architecture does not leverage the graph-connectivity inductive bias, and can perform poorly when the graph topology is important and has not been encoded into the node features. We introduce a graph transformer with four new properties compared to the standard model. First, the attention mechanism is a function of the neighborhood connectivity for each node in the graph. Second, the positional encoding is represented by the Laplacian eigenvectors, which naturally generalize the sinusoidal positional encodings often used in NLP. Third, layer normalization is replaced by batch normalization, which provides faster training and better generalization performance. Finally, the architecture is extended to edge feature representations, which can be critical for tasks such as chemistry (bond type) or link prediction (entity relationships in knowledge graphs). Numerical experiments on a graph benchmark demonstrate the performance of the proposed graph transformer architecture. This work closes the gap between the original transformer, which was designed for the limited case of line graphs, and graph neural networks, which can work with arbitrary graphs. As our architecture is simple and generic, we believe it can be used as a black box for future applications that wish to combine transformers and graphs. Read More
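The Laplacian eigenvector positional encoding mentioned in the abstract can be sketched in a few lines: take the eigenvectors of the symmetric normalized graph Laplacian with the smallest non-trivial eigenvalues as per-node position vectors. This is a minimal numpy sketch; the function name and the dense eigendecomposition are our own choices (the paper's implementation details may differ, e.g. sparse solvers for large graphs).

```python
import numpy as np

def laplacian_pe(A, k):
    """Laplacian positional encodings for a graph with symmetric
    adjacency matrix A: the k eigenvectors of the normalized
    Laplacian with the smallest non-trivial eigenvalues."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    # Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(n) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1:k + 1]          # drop the trivial constant eigenvector

# Example: a 4-node cycle graph, each node gets a 2-dim position vector
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
pe = laplacian_pe(A, k=2)   # shape (4, 2)
```

Note that eigenvectors are defined only up to sign, so implementations typically randomly flip the sign of each encoding during training.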
#graph-neural-network
Monthly Archives: June 2022
Papercup enhances AI dubbing capabilities with new $20M round
Papercup, a startup based in the U.K., has raised $20 million for an AI-powered dubbing service that translates speech and expression into other languages. The funding will allow Papercup to enhance its research around expressive voices, expand into new languages, and scale its offering in markets where it knows its technology works well. Read More
Toronto wants to kill the smart city forever
In February, the city of Toronto announced plans for a new development along its waterfront. They read like a wish list for any passionate urbanist: 800 affordable apartments, a two-acre forest, a rooftop farm, a new arts venue focused on indigenous culture, and a pledge to be zero-carbon.
The idea of an affordable, off-the-grid Eden in the heart of the city sounds great. But there was an entirely different urban utopia planned for this same 12-acre plot, known as Quayside, just a few years ago. It was going to be the place where Sidewalk Labs, the urban innovation arm of Alphabet, was going to prove out its vision for the smart city.
Sandwiched between the elevated Gardiner Expressway and Lake Ontario, and occupied by a few one-story commercial buildings and a mothballed grain silo, Quayside shouldn’t have been that hard to develop. But controversy ensued almost from the moment in October 2017 that Waterfront Toronto, a governmental agency overseeing the redevelopment of 2,000 acres along the lake shore, announced that Sidewalk had submitted the winning proposal. Read More
China’s Tech Giants Lost Their Swagger and May Never Get It Back
On trading floors in New York and Hong Kong, the brightening mood toward Chinese technology companies is unmistakable: With stocks like Alibaba Group Holding Ltd. and Tencent Holdings Ltd. surging from multi-year lows, talk of a new bull market is growing louder.
Yet speak to executives, entrepreneurs and venture capital investors intimately involved in China’s tech sector and a more downbeat picture emerges. Interviews with more than a dozen industry players suggest the outlook is still far from rosy, despite signs that the Communist Party’s crackdown on big tech is softening at the edges.
These insiders describe an ongoing sense of paranoia and paralysis, along with an unsettling realization that the sky-high growth rates of the past two decades are likely never coming back. Read More
Find the smartest technologist in the company and make them CEO
Marc Andreessen arrived in Silicon Valley 28 years ago, fresh from the University of Illinois, where he and a colleague developed NCSA Mosaic, the graphic web browser that opened the world’s eyes to the potential of the internet. As an entrepreneur, Andreessen launched Netscape, whose IPO was the bellwether event of the first internet boom, and Opsware, an early cloud and software-as-a-service (SaaS) company. He then cofounded Andreessen Horowitz with Ben Horowitz, building it into one of the world’s premier venture capital firms.
Andreessen’s experience gives him a unique perspective on how new technologies develop, disrupt, and create opportunities for business. It’s a perspective that is of particular interest at a time like this, when so much is unclear about the future of technology. Andreessen recently joined McKinsey senior partner Tracy Francis and the Quarterly editorial director Rick Tetzeli for a wide-ranging discussion. An edited version of the conversation follows. Read More
The Year in AI So Far: Massive Models and How to Use Them
The world of artificial intelligence and machine learning moves very fast. So fast, in fact, that it’s remarkable to think that it was only a decade ago when the AlexNet model dominated the ImageNet competition and kicked off the process that made deep learning a bona fide technology movement. Today, after years of headlines about game-playing, we see ever-increasing innovation that applies to the real world.
In the last couple of years alone, AI/ML models like GPT-3 and AlphaFold delivered capabilities that catalyzed new products and companies, and that stretched our understanding of what computers can do.
With that in mind, we thought we’d revisit our AI/ML coverage in Future over the first half of the year, as well as catch you up on some — but certainly not all — of the major industry developments during that time. As you’ll see, the combination of large language models, generative models, and foundation models is a major source of attention, and we’re just skimming the surface in terms of understanding what they can do and how the world outside of large research labs can utilize their power. Read More
Google engineer identifies anonymous faces in WWII photos with AI facial recognition
Walking past the countless photos of Holocaust survivors and victims at Warsaw’s POLIN Museum of the History of Polish Jews in 2016, New York-native Daniel Patt was haunted by the possibility that he was passing the faces of his own relatives without even knowing it.
For Patt, a 40-year-old software engineer now working for Google, that sort of conundrum presented the potential for a creative solution. And so he set to work creating and developing From Numbers to Names (N2N), an artificial intelligence-driven facial recognition platform that can scan through photos from prewar Europe and the Holocaust, linking them to people living today. Read More
Amazon Launches CodeWhisperer, a GitHub Copilot-like AI pair programming tool
At its re:Mars conference, Amazon today announced the launch of CodeWhisperer, an AI pair programming tool similar to GitHub’s Copilot that can autocomplete entire functions based on only a comment or a few keystrokes. The company trained the system, which currently supports Java, JavaScript and Python, on billions of lines of publicly available open source code and its own codebase, as well as publicly available documentation and code on public forums.
It’s now available in preview as part of the AWS IDE Toolkit, which means developers can immediately use it right inside their preferred IDEs, including Visual Studio Code, IntelliJ IDEA, PyCharm, WebStorm and Amazon’s own AWS Cloud 9. Support for the AWS Lambda Console is also coming soon. Read More
Meta open sources OPT-66B
On June 23, 2022, Meta announced the release of the Open Pre-trained Transformer (OPT-66B), the largest unrestricted open-source model to date. The tech giant has also released the logbooks used for training all of its baselines, from 125M through 66B parameters. This completes a full release of logbooks detailing the development of all the OPT models and marks the first time in the AI industry that such extensive notes have been released alongside the models and associated paper. Read More
DeepNet: Scaling Transformers to 1,000 Layers
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DEEPNORM) to modify the residual connection in the Transformer, accompanied by a theoretically derived initialization. In-depth theoretical analysis shows that model updates can be bounded in a stable way. The proposed method combines the best of two worlds — the good performance of Post-LN and the stable training of Pre-LN — making DEEPNORM a preferred alternative. We successfully scale Transformers up to 1,000 layers (i.e., 2,500 attention and feed-forward network sublayers) without difficulty, which is one order of magnitude deeper than previous deep Transformers. Remarkably, on a multilingual benchmark with 7,482 translation directions, our 200-layer model with 3.2B parameters significantly outperforms the 48-layer state-of-the-art model with 12B parameters by 5 BLEU points, which indicates a promising scaling direction. Read More
#training