SAE
Stitching Sparse Autoencoders of Different Sizes
Patrick Leask and Noura Al Moubayed introduce SAE stitching, a new method for mechanistic interpretability, in a poster at NeurIPS 2024.
Latent mechanistic interpretability
This project seeks to provide a benchmark for evaluating the extent to which a model is mechanistically interpretable.
BatchTopK Sparse Autoencoders
Patrick Leask contributes to BatchTopK, a new SAE architecture introduced in a NeurIPS 2024 poster.