Mechanistic Interpretability Resources
Mechanistic interpretability (MI) aims to reverse-engineer a neural network into human-understandable mechanisms. Most current MI work focuses on transformers (LLMs in particular), but the approach is not limited to that architecture.
People
Primer on LLMs
Transformers
Quick Guides to MI
- What is Mechanistic Interpretability and where did it come from?
- Introduction to Mechanistic Interpretability
- “Mechanistic interpretability” for LLMs, explained
How to get started with MI?
Relevant Papers
- Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
- A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
- Mechanistic Interpretability for AI Safety: A Review
Straight from Anthropic
- Mapping the Mind of a Large Language Model
- Interpretability Dreams
- Golden Gate Claude
- Toy Models of Superposition (a minimal sketch of the paper's setup follows this list)
- Transformer Circuits Thread
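
To make the "Toy Models of Superposition" entry concrete: the paper studies how a model with fewer hidden dimensions than features can still represent sparse features by packing them into non-orthogonal directions. Below is a minimal PyTorch sketch of that setup; the dimensions, sparsity level, plain (unweighted) MSE loss, and training loop are my illustrative choices, not the paper's exact configuration.

```python
import torch

# Toy setup: n sparse features squeezed through an m < n hidden bottleneck.
# The paper's model is h = W x, followed by x_hat = ReLU(W^T h + b).
n_features, d_hidden = 5, 2
W = torch.nn.Parameter(torch.randn(d_hidden, n_features) * 0.1)
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(5000):
    # Synthetic sparse data: each feature is active with probability 0.1.
    x = torch.rand(256, n_features) * (torch.rand(256, n_features) < 0.1).float()
    h = x @ W.T                         # project into the bottleneck
    x_hat = torch.relu(h @ W + b)       # reconstruct the features
    loss = ((x - x_hat) ** 2).mean()    # simplified, unweighted reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# With sparse features, the learned columns of W typically end up
# non-orthogonal: more features than dimensions, i.e. superposition.
print(W.T @ W)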
Blogs
- Neel Nanda’s case for why we need interpretability research
- A Microscope into the Dark Matter of Interpretability
Libraries
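
One widely used library here is TransformerLens (by Neel Nanda), which wraps GPT-style models so every internal activation can be cached and inspected. Below is a minimal sketch, assuming the `transformer_lens` package is installed; the prompt is an arbitrary example.

```python
from transformer_lens import HookedTransformer

# Load a small model with hooks on every internal activation.
model = HookedTransformer.from_pretrained("gpt2")

# Run a prompt and cache all intermediate activations.
logits, cache = model.run_with_cache(
    "When Mary and John went to the store, John gave a drink to"
)

# Inspect layer 0's attention patterns:
# shape (batch, head, query_pos, key_pos).
attn = cache["pattern", 0]
print(attn.shape)
```

From the cache you can read off residual-stream states, MLP activations, and attention patterns at any layer, which is the starting point for most circuit-style analyses.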