timeline
Conference poster
16 Jul 2025 Inference-Time Decomposition of Activations (ITDA): A Scalable Approach to Interpreting Large Language Models
Public symposium
2 Jul 2025 Interpreting intelligent machines?
Paper & Poster
24 Apr 2025 Sparse Autoencoders Do Not Find Canonical Units of Analysis
Conference poster
15 Dec 2024 BatchTopK Sparse Autoencoders
Conference poster
15 Dec 2024 Stitching Sparse Autoencoders of Different Sizes
Analysis
17 Aug 2024 Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Workshop
6 May 2024 AI Forensics IRL Meeting
Preprint
1 Nov 2023 CoinRun: Solving Goal Misgeneralisation