Invented in 2017 and first presented in the ground-breaking paper “Attention Is All You Need” (Vaswani et al., 2017), the transformer model has been a revolutionary contribution to deep learning and, arguably, to computer science as a whole. Born as a tool for neural machine translation, it has proven to be far-reaching, extending its applicability beyond Natural Language Processing (NLP) and cementing its position as a versatile and general-purpose neural network architecture.
In this comprehensive guide, we will dissect the transformer model to its core, thoroughly exploring every key component from its attention mechanism to its encoder-decoder structure. Not stopping at the foundational level, we will traverse the landscape of large language models that leverage the power of the transformer, delving into their unique design attributes and functionalities. Further expanding the horizons, we will explore the applications of transformer models beyond NLP and probe into the current challenges and potential future directions of this influential architecture. Additionally, a curated list of open-source implementations and supplementary resources will be provided for those intrigued to explore further.
Without bells and whistles, let’s dive in! — Read More
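To ground the guide's starting point, here is a minimal NumPy sketch of scaled dot-product attention, the core operation introduced in “Attention Is All You Need” (this is an illustrative toy, not code from the guide itself):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # softmax over the keys
    return weights @ V                                  # weighted sum of values

# Toy example: 3 tokens, 4-dimensional embeddings.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per query token
```

Real transformers wrap this in multiple parallel "heads" and learned projection matrices, but the softmax-weighted mixing above is the mechanism the rest of the architecture is built around.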
Monthly Archives: July 2023
Reconstructing the Mind’s Eye: fMRI-to-image with Contrastive Learning and Diffusion Priors
We present MindEye, a novel fMRI-to-image approach to retrieve and reconstruct viewed images from brain activity. Our model comprises two parallel submodules that are specialized for retrieval (using contrastive learning) and reconstruction (using a diffusion prior). MindEye can map fMRI brain activity to any high-dimensional multimodal latent space, like CLIP image space, enabling image reconstruction using generative models that accept embeddings from this latent space. We comprehensively compare our approach with other existing methods, using both qualitative side-by-side comparisons and quantitative evaluations, and show that MindEye achieves state-of-the-art performance in both reconstruction and retrieval tasks. In particular, MindEye can retrieve the exact original image even among highly similar candidates, indicating that its brain embeddings retain fine-grained image-specific information. This allows us to accurately retrieve images even from large-scale databases like LAION-5B. We demonstrate through ablations that MindEye’s performance improvements over previous methods result from specialized submodules for retrieval and reconstruction, improved training techniques, and training models with orders of magnitude more parameters. Furthermore, we show that MindEye can better preserve low-level image features in the reconstructions by using img2img, with outputs from a separate autoencoder. — Read More
Read the Paper
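The retrieval side of this approach boils down to nearest-neighbor search in a shared embedding space: map brain activity into (say) CLIP image space, then rank candidate images by cosine similarity. A hedged NumPy sketch with made-up embeddings (the function name `retrieve` and the toy data are hypothetical, not MindEye's actual code):

```python
import numpy as np

def retrieve(brain_emb, image_embs, top_k=1):
    """Return indices of the top-k candidate images whose embeddings are
    most cosine-similar to a predicted brain embedding."""
    b = brain_emb / np.linalg.norm(brain_emb)
    imgs = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = imgs @ b                      # cosine similarity to every candidate
    return np.argsort(-sims)[:top_k]     # best matches first

# Toy database of 5 candidate "image embeddings".
rng = np.random.default_rng(1)
image_embs = rng.normal(size=(5, 8))
# Pretend the fMRI-to-embedding mapper landed near image 3.
brain_emb = image_embs[3] + 0.05 * rng.normal(size=8)
print(retrieve(brain_emb, image_embs))  # → [3]
```

At LAION-5B scale the exhaustive dot product would be replaced by an approximate nearest-neighbor index, but the ranking criterion is the same.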
Speaking robot: Our new AI model translates vision and language into robotic actions
For decades, when people have imagined the distant future, they’ve almost always included a starring role for robots. Robots have been cast as dependable, helpful and even charming. Yet across those same decades, the technology has remained elusive — stuck in the imagined realm of science fiction.
Today, we’re introducing a new advancement in robotics that brings us closer to a future of helpful robots. Robotics Transformer 2, or RT-2, is a first-of-its-kind vision-language-action (VLA) model. A Transformer-based model trained on text and images from the web, RT-2 can directly output robotic actions. Just like language models are trained on text from the web to learn general ideas and concepts, RT-2 transfers knowledge from web data to inform robot behavior.
In other words, RT-2 can speak robot. — Read More
A Silent New AI Bombshell Launch Nobody Saw Coming
Would you use a (great) free AI product that makes you the product?
Meta’s pulling out the big guns. LLaMA 2, their shiny new AI, is now open-source. Free for anyone. And I mean anyone. Your grandma, your dog, even your weird neighbor who still uses a flip phone.
But why?
Is it a noble quest for democratizing AI? Or a desperate attempt to catch up with the cool kids, Microsoft and Google? — Read More
‘World Of Warcraft’ Players Trick AI-Scraping Games Website Into Publishing Nonsense
As someone who writes about video games for a living, I am deeply annoyed/terrified about the prospect of AI-run websites not necessarily replacing me, but doing things like at the very least, crowding me out of Google, given that Google does not seem to care whatsoever whether content is AI-generated or not.
That’s why it’s refreshing to see a little bit of justice dished out in a very funny way from a gaming community. The World of Warcraft subreddit recently realized that a website, zleague.gg (I am not linking to it), which runs a blog attached to some sort of gaming app which is its main business, has been scraping Reddit threads, feeding them through an AI and summarizing them with “key takeaways” and regurgitated paragraphs that all follow the same format. It’s gross, and yet it generates an article long enough with enough keywords to show up on Google.
Well, the redditors got annoyed and decided to mess with the bots. On r/WoW, they made a lengthy thread discussing the arrival of Glorbo in the game, a new feature that, as you may be able to guess from the name, is not real. — Read More
Major generative AI players join to create the Frontier Model Forum
Google, OpenAI, Microsoft, and Anthropic will be the founding members of the Frontier Model Forum, an umbrella group for the generative AI industry. The group plans to focus on safety research, as well as the identification of best practices, public policy, and use cases for the rapidly advancing technology that can benefit society as a whole.
According to a statement issued by the four companies Wednesday, the Forum will offer membership to organizations that design and develop large-scale generative AI tools and platforms that push the boundaries of what’s currently possible in the field. — Read More
The movement to limit face recognition tech might finally get a win
A Massachusetts bill restricting police use could set the standard for how the technology is regulated in America. If it fails, it’ll be a blow to a once-promising movement.
Just four years ago, the movement to ban police departments from using face recognition in the US was riding high. By the end of 2020, around 18 cities had enacted laws forbidding the police from adopting the technology. US lawmakers proposed a pause on the federal government’s use of the tech.
In the years since, that effort has slowed to a halt.
… However, in Massachusetts there is hope for those who want to restrict police access to face recognition. The state’s lawmakers are currently thrashing out a bipartisan state bill that seeks to limit police use of the technology. Although it’s not a full ban, it would mean that only state police could use it, not all law enforcement agencies. — Read More
No More Paperwork? Amazon AI Tool Transcribes Patient Visits for Doctors
Amazon’s AWS division today unveiled a new AI and speech-recognition tool intended to help doctors enter patient visit notes into their systems.
For now, AWS HealthScribe is only available as a preview in Northern Virginia (home of Amazon HQ2). But it promises to generate transcripts with “word-level timestamps” of patient visits, and automatically “identifies speaker roles, like patient and clinician, for each dialogue in the transcript,” Amazon says. — Read More
AI Machine Learning: Remedies Other Than Copyright Law?
In my last post, I discussed some of the allegations that “machine learning” (ML) with the use of copyrighted works constitutes mass infringement. Citing the class action lawsuits Andersen and Tremblay, I predicted that if the courts do not find that ML unavoidably violates the reproduction right (§106(1)), copyright law may not offer much relief to the creators of the works used for AI development. As of last week, it remains to be seen whether we’ll get to that question after Judge Orrick of the Northern District of California stated that he is tentatively prepared to dismiss the suit with leave to amend the complaint. The judge did indicate that a claim of direct infringement could survive, but we’ll have to see what comes of an amended complaint.
As mentioned in the last post, if the court does not find a valid claim of copyright infringement, the other allegations will likely fail as a result. Nevertheless, though the state allegations may be moot in the class cases filed thus far, I had intended in this post to look at whether any non-copyright remedies present much hope for creators. For instance, the Andersen complaint alleges violations of statutory and common law rights of publicity and violations of statutory unfair practice prohibitions in the State of California. — Read More
The AI-Powered, Totally Autonomous Future of War Is Here
Ships without crews. Self-directed drone swarms. How a US Navy task force is using off-the-shelf robotics and artificial intelligence to prepare for the next age of conflict.
A fleet of robot ships bobs gently in the warm waters of the Persian Gulf, somewhere between Bahrain and Qatar, maybe 100 miles off the coast of Iran. I am on the nearby deck of a US Coast Guard speedboat, squinting off what I understand is the port side. On this morning in early December 2022, the horizon is dotted with oil tankers and cargo ships and tiny fishing dhows, all shimmering in the heat. As the speedboat zips around the robot fleet, I long for a parasol, or even a cloud.
The robots do not share my pathetic human need for shade, nor do they require any other biological amenities. This is evident in their design. A few resemble typical patrol boats like the one I’m on, but most are smaller, leaner, lower to the water. One looks like a solar-powered kayak. Another looks like a surfboard with a metal sail. Yet another reminds me of a Google Street View car on pontoons. — Read More