SAE

Latent mechanistic interpretability

Jul 16, 2025 Mechanistic interpretability Benchmarking SAE

This project aims to provide a benchmark for evaluating how mechanistically interpretable a model is.

Stitching Sparse Autoencoders of Different Sizes

Dec 1, 2024 SAE Stitching Stitching SAE Sparsity Autoencoders Latents Mechanistic Interpretability

Patrick Leask and Noura Al Moubayed introduce SAE stitching, a new method for mechanistic interpretability, in a poster at NeurIPS 2024.

BatchTopK Sparse Autoencoders

Dec 1, 2024 BatchTopK SAE Sparsity Autoencoders Mechanistic Interpretability Architecture

Patrick Leask contributes to BatchTopK, a new SAE architecture introduced in a poster at NeurIPS 2024.