promptdojo_

Attention and Transformer blocks — step 2 of 7

Stripped of the math, what does an attention layer actually compute for a token?
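The usual informal answer is that each token's output is a weighted average of the value vectors of the tokens it attends to. The weights come from how strongly that token's query matches each token's key. The sketch below assumes single-head scaled dot-product attention with no masking and no learned projections: the query, keys and values are passed in directly as plain Python lists. `softmax` and `attend` are hypothetical helper names used only for illustration.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Attention output for ONE token (hypothetical helper).

    Scores the token's query against every key, turns the scores into
    weights that sum to 1, and returns the weighted average of the
    value vectors together with the weights themselves.
    """
    d = len(query)
    # Scaled dot product between this token's query and each key.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Blend the value vectors, one output dimension at a time.
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# Toy example: one query compared against three tokens' keys and values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [20.0, 0.0]]
out, weights = attend(query, keys, values)
print(weights)
print(out)
```

In this toy input, the first and third keys match the query equally well, so they receive equal and larger weights than the second. The output vector therefore leans toward their values. Because the weights are non-negative and sum to 1, the output always lies within the range spanned by the value vectors: it is a blend, not something new.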