Attention Wasn’t All We Needed

Many modern techniques have been developed since the original Attention Is All You Need paper. Let’s look at some of the most important ones and try to implement the basic ideas as succinctly as possible. We’ll use the PyTorch framework for most of the examples. Note that most of these examples are highly simplified sketches of the core ideas; for the full implementations, read the original papers or the production code in frameworks like PyTorch or JAX.

  1. Grouped Query Attention
  2. Multi-head Latent Attention
  3. Flash Attention
  4. Ring Attention
  5. Pre-normalization
  6. RMSNorm
  7. SwiGLU
  8. Rotary Positional Embedding
  9. Mixture of Experts
  10. Learning Rate Warmup
  11. Cosine Schedule
  12. AdamW Optimizer
  13. Multi-token Prediction
  14. Speculative Decoding
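
As a small taste of items 10 and 11, here is a minimal sketch of linear learning rate warmup followed by cosine decay, written in plain Python so it runs without PyTorch installed. It assumes the rate decays to a nonzero floor rather than to zero; the function name `lr_at_step` and the default values of `max_lr`, `min_lr`, `warmup_steps`, and `total_steps` are hypothetical choices for illustration, not taken from any particular paper or library.

```python
import math


def lr_at_step(step: int, max_lr: float = 3e-4, min_lr: float = 3e-5,
               warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Hypothetical helper: linear warmup, then cosine decay down to min_lr."""
    if step < warmup_steps:
        # Linear warmup: ramp from max_lr / warmup_steps up to max_lr.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # After the schedule ends, hold the rate at its floor.
        return min_lr
    # Cosine decay: progress runs from 0 to 1 over the post-warmup steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


for s in (0, 50, 99, 100, 550, 999, 1000):
    print(s, f"{lr_at_step(s):.2e}")
```

In PyTorch, a schedule like this could be plugged into `torch.optim.lr_scheduler.LambdaLR`, whose `lr_lambda` returns a multiplicative factor on the optimizer's base learning rate, so there you would return `lr_at_step(step) / max_lr` instead of the absolute rate.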


#devops

The Man Who ‘A.G.I.-Pilled’ Google

A few years ago, most Google executives didn’t talk about A.G.I. — artificial general intelligence, the industry term for a human-level A.I. system. Even if they thought A.G.I. might be technically possible, the idea seemed so remote that it was barely worth discussing.

But this week, at Google’s annual developer conference, A.G.I. was in the air. The company announced a slate of new releases tied to Google’s Gemini A.I. models, including new features designed to let users write A.I.-generated emails, create A.I.-generated videos and songs, and chat with an A.I. bot on the flagship search engine. Google’s leaders traded guesses about when more powerful systems might arrive. And they predicted profound changes ahead, as A.I. tools become more capable and autonomous.

The man most responsible for making Google “A.G.I.-pilled” — industry shorthand for the way people can become gripped by the notion that A.G.I. is imminent — is Demis Hassabis.

… This week on “Hard Fork,” we interviewed Mr. Hassabis about his views on A.G.I. and the strange futures that might follow its arrival. You can listen to our conversation by clicking the “Play” button below or by following the show on Apple, Spotify, Amazon, YouTube, iHeartRadio or wherever you get your podcasts. Or, if you prefer to read, you’ll find an edited transcript of our conversation, which begins about 24 minutes into the podcast, below.

#big7