Post (Analysis)
2024-08-17
Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask, Bart Bussmann, and Neel Nanda take a close look at GPT-2’s SAE feature geometry on the AI Alignment Forum
AI Alignment Forum / MATS Program / LessWrong
TL;DR: We demonstrate that the decoder directions of GPT-2 SAEs are highly structured: we find a historical date direction, and projecting non-date-related features onto it lets us read off their historical time period by comparison with the year features.
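To make the projection idea concrete, here is a minimal sketch on synthetic data. It is an illustration under stated assumptions, not the method used in the post: the random vectors stand in for real SAE decoder directions, the date direction is taken to be the first principal component of the year features (one plausible choice among several), and the helper names `fit_date_direction` and `read_off_year` are hypothetical.

```python
import numpy as np


def fit_date_direction(year_vecs, years):
    """Return a unit vector along which the year features vary most.

    Assumption: the first principal component of the year features is
    used as the date direction. The post may define it differently.
    """
    centered = year_vecs - year_vecs.mean(axis=0)
    # Rows of vt are the principal axes; the first has the largest variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    # Fix the sign so that later years project to larger values.
    if (centered @ direction) @ (years - years.mean()) < 0:
        direction = -direction
    return direction


def read_off_year(vec, direction, year_vecs, years):
    """Estimate a year by interpolating between year-feature projections."""
    year_proj = year_vecs @ direction
    order = np.argsort(year_proj)  # np.interp needs increasing x values
    return float(np.interp(vec @ direction, year_proj[order], years[order]))


# Synthetic stand-in for decoder directions (NOT real GPT-2 SAE weights).
rng = np.random.default_rng(0)
d_model = 64
true_dir = rng.normal(size=d_model)
true_dir /= np.linalg.norm(true_dir)

years = np.arange(1900, 2001, 10).astype(float)
year_vecs = np.outer((years - 1950.0) / 100.0, true_dir)
year_vecs += 0.005 * rng.normal(size=year_vecs.shape)

# A hypothetical non-date feature placed near 1945 along the date axis.
other_feature = ((1945.0 - 1950.0) / 100.0) * true_dir
other_feature += 0.005 * rng.normal(size=d_model)

direction = fit_date_direction(year_vecs, years)
estimate = read_off_year(other_feature, direction, year_vecs, years)
print(f"estimated period: {estimate:.0f}")
```

With real SAE weights, `year_vecs` would be the decoder rows of features that fire on specific years, and `other_feature` would be the decoder row of any feature whose time period one wants to estimate.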