Attention and Transformer blocks (step 2/7) · cnns, transformers, and useful llm internals · promptdojo

Stripped of the math, what does an attention layer actually compute for a token?
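As a reference point, here is a minimal sketch of what a single attention head computes for one token. It assumes the token's query vector and every context token's key and value vectors have already been produced. In a real Transformer block these come from learned linear projections of the embeddings, which are not shown here. It also leaves out multiple heads, causal masking, and the output projection. The function names `softmax` and `attend` are illustrative only and do not come from any particular library.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Hypothetical single-head scaled dot-product attention for ONE token.

    query:  list[float] of length d_k (this token's query)
    keys:   one key vector (length d_k) per context token
    values: one value vector (length d_v) per context token
    Returns (weights, output): output is the weighted average of the values.
    """
    d_k = len(query)
    # 1. Score each context token: how well does its key match this query?
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    # 2. Turn the scores into non-negative weights that sum to 1.
    weights = softmax(scores)
    # 3. Blend the value vectors using those weights.
    d_v = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(d_v)]
    return weights, output

# Toy usage with made-up numbers: three context tokens, 2-d keys and values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
weights, output = attend(query, keys, values)
print(weights, output)
```

Tokens 0 and 2 have keys that match the query, so they receive equal weights that are larger than token 1's weight. The output therefore leans toward their value vectors. Stripped down, the head does three things for each token: it scores every context token for relevance, normalizes those scores into weights, and returns a weighted blend of what those tokens carry.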
