Reconstructing visual experiences from human brain activity offers a unique way to understand how the brain represents the world, and to interpret the connection between computer vision models and our visual system. While deep generative models have recently been employed for this task, reconstructing realistic images with high semantic fidelity is still a challenging problem. Here, we propose a new method based on a diffusion model (DM) to reconstruct images from human brain activity obtained via functional magnetic resonance imaging (fMRI). More specifically, we rely on a latent diffusion model (LDM) termed Stable Diffusion. This model reduces the computational cost of DMs, while preserving their high generative performance. We also characterize the inner mechanisms of the LDM by studying how its different components (such as the latent vector of image Z, conditioning inputs C, and different elements of the denoising U-Net) relate to distinct brain functions. We show that our proposed method can reconstruct high-resolution images with high fidelity in a straightforward fashion, without the need for any additional training or fine-tuning of complex deep-learning models. We also provide a quantitative interpretation of different LDM components from a neuroscientific perspective. Overall, our study proposes a promising method for reconstructing images from human brain activity, and provides a new framework for understanding DMs. Please check out our webpage at https://sites.google.com/view/stablediffusion-with-brain/. — Read More
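The core of the method lends itself to a compact illustration. Below is a minimal sketch, not the authors' released code: it assumes ridge regression as the brain-to-latent mapping and uses synthetic, downsized arrays in place of real fMRI data and Stable Diffusion latents, showing only the two linear maps (voxels to Z, voxels to C) and noting where the diffusion model would take over.

```python
# Minimal sketch of the regression stage described in the abstract: one linear
# map from fMRI voxels to the image latent Z and one to the conditioning C.
# All data and shapes here are synthetic and downsized (the real SD latent is
# 4x64x64 and C is a sequence of text-token embeddings).
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

n_trials, n_voxels = 1000, 1000
X = rng.standard_normal((n_trials, n_voxels))           # fMRI responses per image
Z_true = rng.standard_normal((n_trials, 4 * 32 * 32))   # image latents (toy size)
C_true = rng.standard_normal((n_trials, 768))           # pooled text embeddings

X_tr, X_te = X[:800], X[800:]                            # train/test split over trials

z_model = Ridge(alpha=100.0).fit(X_tr, Z_true[:800])     # voxels -> Z
c_model = Ridge(alpha=100.0).fit(X_tr, C_true[:800])     # voxels -> C

z_hat = z_model.predict(X_te).reshape(-1, 4, 32, 32)
c_hat = c_model.predict(X_te)

# In the full pipeline, z_hat seeds the diffusion process and c_hat conditions
# the denoising U-Net via cross-attention; the LDM's VAE decoder then maps the
# denoised latent back to pixel space.
print(z_hat.shape, c_hat.shape)
```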
AI unlikely to gain human-like cognition, unless connected to real world through robots
Connecting artificial intelligence systems to the real world through robots and designing them using principles from evolution is the most likely way AI will gain human-like cognition, according to research from the University of Sheffield.
In a paper published in Science Robotics, Professor Tony Prescott and Dr Stuart Wilson from the University’s Department of Computer Science say that AI systems are unlikely to resemble real brain processing, no matter how large their neural networks or the datasets used to train them become, if they remain disembodied. — Read More
I-JEPA: The first AI model based on Yann LeCun’s vision for more human-like AI
Last year, Meta’s Chief AI Scientist Yann LeCun proposed a new architecture intended to overcome key limitations of even the most advanced AI systems today. His vision is to create machines that can learn internal models of how the world works so that they can learn much more quickly, plan how to accomplish complex tasks, and readily adapt to unfamiliar situations.
We’re excited to introduce the first AI model based on a key component of LeCun’s vision. This model, the Image Joint Embedding Predictive Architecture (I-JEPA), learns by creating an internal model of the outside world, comparing abstract representations of images rather than comparing the pixels themselves. I-JEPA delivers strong performance on multiple computer vision tasks, and it’s much more computationally efficient than other widely used computer vision models. The representations learned by I-JEPA can also be used for many different applications without needing extensive fine-tuning. For example, we trained a 632M-parameter visual transformer model using 16 A100 GPUs in under 72 hours, and it achieves state-of-the-art performance for low-shot classification on ImageNet, with only 12 labeled examples per class. Other methods typically take two to ten times more GPU-hours and achieve worse error rates when trained with the same amount of data.
Our paper on I-JEPA will be presented at CVPR 2023 next week, and we’re also open-sourcing the training code and model checkpoints today. — Read More
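For readers curious how prediction in representation space differs from pixel reconstruction, here is a toy sketch of the joint-embedding predictive idea, not Meta's released I-JEPA code: the encoders below are stand-in MLPs rather than vision transformers, the masking and pooling are heavily simplified, and all sizes are illustrative.

```python
# Toy sketch of a joint-embedding predictive setup: a context encoder sees the
# visible patches, a predictor guesses the *representations* of masked patches,
# and the loss is computed in embedding space against an EMA target encoder,
# never on pixels.
import copy
import torch
import torch.nn as nn

dim, n_patches = 64, 16

# Stand-in encoders (the real model uses vision transformers over image patches).
encoder = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
target_encoder = copy.deepcopy(encoder)          # updated by EMA, not by gradients
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
opt = torch.optim.AdamW(
    list(encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

patches = torch.randn(8, n_patches, dim)         # a batch of patch embeddings
mask = torch.zeros(n_patches, dtype=torch.bool)
mask[10:14] = True                               # a contiguous block to predict

opt.zero_grad()
ctx = encoder(patches[:, ~mask])                 # encode only the visible context
pred = predictor(ctx).mean(dim=1, keepdim=True)  # crude pooled prediction
with torch.no_grad():
    tgt = target_encoder(patches[:, mask]).mean(dim=1, keepdim=True)

loss = nn.functional.mse_loss(pred, tgt)         # loss on representations, not pixels
loss.backward()
opt.step()

# Slowly move the target encoder toward the online encoder (EMA update).
with torch.no_grad():
    for tp, sp in zip(target_encoder.parameters(), encoder.parameters()):
        tp.mul_(0.99).add_(sp, alpha=0.01)
```

Keeping the target encoder as a slowly moving average of the online encoder is what prevents the trivial solution where both collapse to a constant representation.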
Geoffrey Hinton – Two Paths to Intelligence
Allen Institute teams up with AWS to build first-ever map of the brain
Just as the periodic table is foundational to chemistry and the Human Genome Project revolutionized modern genetics, researchers at the Allen Institute for Brain Science have teamed up with Amazon Web Services to create what could become a “transformative” new resource for the field of neuroscience.
AWS on Wednesday announced its technology will support the Allen Institute as it builds a map of the human brain, called the Brain Knowledge Platform. This platform, the first of its kind, is designed to be a complete reference of individual cells in the brain, and should eventually serve as the world’s largest open-source brain cell database. — Read More
Brain implants help paralysed man to walk again
A paralysed man has been able to walk simply by thinking about it thanks to electronic brain implants, a medical first he says has changed his life.
Gert-Jan Oskam, a 40-year-old Dutch man, was paralysed in a cycling accident 12 years ago.
The electronic implants wirelessly transmit his thoughts to his legs and feet via a second implant on his spine. — Read More
Read the Paper
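To make the mechanism concrete, here is a purely illustrative decode-and-stimulate loop; none of the function names, feature sizes, or thresholds below come from the actual device or its software, which the article does not describe at that level of detail.

```python
# Illustrative sketch only: cortical signals are decoded into a movement
# intention, which is translated into a stimulation command for the spinal
# implant. Every name, feature dimension, and threshold here is invented.
import numpy as np

weights = np.random.default_rng(1).standard_normal(64)  # pretend trained decoder

def decode_step_intention(neural_features: np.ndarray) -> float:
    """Map one window of brain-implant features to a step-intention score in [0, 1]."""
    return float(1.0 / (1.0 + np.exp(-neural_features @ weights)))

def stimulation_command(intention: float) -> dict:
    """Translate the decoded intention into a (hypothetical) spinal stimulation command."""
    if intention > 0.5:
        return {"target": "leg_flexors", "amplitude_mA": 2.0 + 4.0 * intention}
    return {"target": None, "amplitude_mA": 0.0}

features = np.random.default_rng(2).standard_normal(64)  # one window of signal
print(stimulation_command(decode_step_intention(features)))
```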
A Brain Scanner Combined with an AI Language Model Can Provide a Glimpse into Your Thoughts
New technology gleans the gist of stories a person hears while lying in a brain scanner
Functional magnetic resonance imaging (fMRI) captures coarse, colorful snapshots of the brain in action. While this specialized type of magnetic resonance imaging has transformed cognitive neuroscience, it isn’t a mind-reading machine: neuroscientists can’t look at a brain scan and tell what someone was seeing, hearing or thinking in the scanner.
But gradually scientists are pushing against that fundamental barrier to translate internal experiences into words using brain imaging. This technology could help people who can’t speak or otherwise outwardly communicate, such as those who have suffered strokes or are living with amyotrophic lateral sclerosis. Current brain-computer interfaces require the implantation of devices in the brain, but neuroscientists hope to use non-invasive techniques such as fMRI to decipher internal speech without the need for surgery.
Now researchers have taken a step forward by combining fMRI’s ability to monitor neural activity with the predictive power of artificial intelligence language models. The hybrid technology has resulted in a decoder that can reproduce, with a surprising level of accuracy, the stories that a person listened to or imagined telling in the scanner. The decoder could even guess the story behind a short film that someone watched in the scanner, though with less accuracy. Read More
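A hedged sketch of that decoding strategy, not the authors' code: a language model proposes candidate phrases, an encoding model predicts the brain response each candidate would evoke, and the candidate whose prediction best matches the recorded response is kept. Random vectors stand in below for real language-model embeddings and real fMRI data.

```python
# Toy candidate-scoring loop: keep the text whose predicted fMRI response best
# matches the observed one. The embedding function and encoding weights are
# stand-ins, not the fitted models used in the study.
import numpy as np

rng = np.random.default_rng(0)
n_voxels, emb_dim = 200, 32
encoding_W = rng.standard_normal((emb_dim, n_voxels))   # pretend fitted encoding model

def embed(text: str) -> np.ndarray:
    """Stand-in for a language-model embedding of the candidate text."""
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(emb_dim)

def predicted_fmri(text: str) -> np.ndarray:
    return embed(text) @ encoding_W

def score(candidate: str, observed: np.ndarray) -> float:
    """Correlation between predicted and observed voxel responses."""
    return float(np.corrcoef(predicted_fmri(candidate), observed)[0, 1])

# Simulate a scan evoked by one phrase, then pick the best-matching candidate.
observed = predicted_fmri("she opened the door") + 0.1 * rng.standard_normal(n_voxels)
candidates = ["she opened the door", "he drove to work", "the dog barked loudly"]
print(max(candidates, key=lambda c: score(c, observed)))
```

In the published work this scoring is run continuously inside a beam search over candidate continuations, which is what lets the decoder recover the gist of whole stories rather than isolated phrases.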
GPT-4 gets a B on my quantum computing final exam!
As I’ve mentioned before, economist, blogger, and friend Bryan Caplan was unimpressed when ChatGPT got merely a D on his Labor Economics midterm. So on Bryan’s blog, appropriately named “Bet On It,” he made a public bet that no AI would score an A on his exam before January 30, 2029. GPT-4 then scored an A a mere three months later (!!!), leading to what Bryan agrees will likely be one of the first public bets he’ll ever have to concede (he hasn’t yet “formally” conceded, but only because of technicalities in how the bet was structured).
… But OK, labor econ is one thing. What about a truly unfakeable test of true intelligence? Like, y’know, a quantum computing test? Read More
‘Mind-reading’ AI: Japan study sparks ethical debate
Osaka University researchers have used AI to decode subjects’ brain activity to create images of what they are seeing.
Yu Takagi could not believe his eyes. …After Takagi and his research partner Shinji Nishimoto built a simple model to “translate” brain activity into a readable format, Stable Diffusion was able to generate high-fidelity images that bore an uncanny resemblance to the originals.
The AI could do this despite not being shown the pictures in advance or trained in any way to manufacture the results. Read More
Read the Paper
Generative Agents: Interactive Simulacra of Human Behavior
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents: computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent’s experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty-five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine’s Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture (observation, planning, and reflection) each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior. Read More
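As one concrete illustration of the memory component, here is a minimal retrieval sketch, not the authors' implementation: each stored memory is scored by a mix of recency, importance, and relevance to the current query, and the top-scoring memories are returned to inform planning. The embedding function, decay rate, and equal weighting below are invented for this example.

```python
# Toy memory retrieval for a generative agent: score = recency + importance +
# relevance, then return the top-k memories. All scoring details are illustrative.
import math
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a language-model embedding used to measure relevance."""
    seed = abs(hash(text)) % (2**32)
    v = np.random.default_rng(seed).standard_normal(16)
    return v / np.linalg.norm(v)

class Memory:
    def __init__(self, text: str, importance: float, timestamp: float):
        self.text, self.importance, self.timestamp = text, importance, timestamp
        self.vec = embed(text)

def retrieve(memories, query: str, now: float, k: int = 2):
    q = embed(query)
    def score(m: Memory) -> float:
        recency = math.exp(-0.1 * (now - m.timestamp))   # decays with age
        relevance = float(q @ m.vec)                     # cosine similarity
        return recency + m.importance + relevance        # equal-weight sum
    return sorted(memories, key=score, reverse=True)[:k]

memories = [
    Memory("Isabella is planning a Valentine's Day party", importance=0.9, timestamp=1.0),
    Memory("ate breakfast at the cafe", importance=0.1, timestamp=9.0),
    Memory("Klaus asked about the party invitations", importance=0.6, timestamp=8.0),
]
for m in retrieve(memories, "who should I invite to the party?", now=10.0):
    print(m.text)
```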