Showing 1 - 2 of 2 for search: '"McDougall, Callum"'
We present a single attention head in GPT-2 Small that has one main role across the entire training distribution. If components in earlier layers predict a certain token, and this token appears earlier in the context, the head suppresses it: we call
External link:
http://arxiv.org/abs/2310.04625
Published in:
BBC Sky at Night; Sep2016, Issue 136, p27-27, 3/4p, 4 Color Photographs