Mechanistic Interpretability
Stitching Sparse Autoencoders of Different Sizes
SAE Stitching Stitching SAE Sparsity Autoencoders Latents Mechanistic Interpretability
Patrick Leask and Noura Al Moubayed introduce SAE stitching, a new method for mechanistic interpretability, in a poster at NeurIPS 2024.
Latent mechanistic interpretability
Mechanistic interpretability Benchmarking SAE
This project seeks to provide a benchmark for evaluating the extent to which a model is mechanistically interpretable.
BatchTopK Sparse Autoencoders
BatchTopK SAE Sparsity Autoencoders Mechanistic Interpretability Architecture
Patrick Leask contributes to BatchTopK, a new SAE architecture introduced in a NeurIPS'24 poster.
Worldbuilding
Worldbuilding Technology Design Intelligence Mechanistic interpretability
Eleanor Dare creates an intra-active arts installation for the 4th International Conference on Possibility Studies.