Mechanistic Interpretability

Stitching Sparse Autoencoders of Different Sizes

SAE Stitching Stitching SAE Sparsity Autoencoders Latents Mechanistic Interpretability

Patrick Leask and Noura Al Moubayed introduce SAE stitching, a new method for mechanistic interpretability, in a poster at NeurIPS 2024.

Latent Mechanistic Interpretability

Mechanistic interpretability Benchmarking SAE

This project aims to provide a benchmark for evaluating how mechanistically interpretable a model is.

BatchTopK Sparse Autoencoders

BatchTopK SAE Sparsity Autoencoders Mechanistic Interpretability Architecture

Patrick Leask contributes to BatchTopK, a new SAE architecture introduced in a poster at NeurIPS 2024.

Worldbuilding

Worldbuilding Technology Design Intelligence Mechanistic interpretability

Eleanor Dare creates an intra-active arts installation for the 4th International Conference on Possibility Studies.