Library
Curated reading list with quick filters and links to notes.
A living bookshelf of papers, kernels, and system guides that influence my research. Use the search box to filter by title, author, venue, or tag.
Attention Is All You Need
Introduces the Transformer architecture, demonstrating that self-attention alone can outperform recurrent and convolutional models for sequence transduction.
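A minimal NumPy sketch of the paper's core operation, scaled dot-product attention softmax(QK^T / sqrt(d_k))V; the function and variable names are my own shorthand, not the paper's reference code:

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # softmax(Q K^T / sqrt(d_k)) V, computed row-wise over queries
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)               # query-key similarities
        scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V                            # attention-weighted values

    # Self-attention over 4 toy tokens of dimension 8: Q = K = V = x.
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    print(scaled_dot_product_attention(x, x, x).shape)  # (4, 8)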
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Proposes an IO-aware tiled attention kernel that reduces memory traffic and speeds up training while remaining exact.
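The accumulation trick behind the tiling is an online softmax: process K and V in blocks while tracking a running row maximum and normalizer, so the full n-by-n score matrix is never materialized. A NumPy sketch of just that scheme, with illustrative names; the real kernel fuses this loop into GPU SRAM tiles:

    import numpy as np

    def tiled_attention(Q, K, V, block=2):
        # Exact attention, accumulated one key/value block at a time.
        n, d = Q.shape
        out = np.zeros((n, d))
        row_max = np.full(n, -np.inf)   # running max of each query's scores
        row_sum = np.zeros(n)           # running softmax normalizer
        for s in range(0, n, block):
            Kb, Vb = K[s:s+block], V[s:s+block]
            scores = Q @ Kb.T / np.sqrt(d)            # one (n, block) tile
            new_max = np.maximum(row_max, scores.max(axis=1))
            scale = np.exp(row_max - new_max)         # rescale old partials
            p = np.exp(scores - new_max[:, None])
            out = out * scale[:, None] + p @ Vb
            row_sum = row_sum * scale + p.sum(axis=1)
            row_max = new_max
        return out / row_sum[:, None]

The result matches the naive softmax(QK^T / sqrt(d))V to numerical precision, which is why the method stays exact while cutting memory traffic.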
Efficient Attention Mechanisms for Large Language Models: A Survey
Surveys the design space of efficient attention variants for LLMs, covering algorithmic approaches and hardware implications.
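As a taste of that design space, a toy NumPy sketch of one family the survey covers, sliding-window (local) attention, where each query attends only to keys within a fixed distance; a practical kernel would compute only the banded scores instead of masking a full matrix, which is where the hardware implications come in:

    import numpy as np

    def sliding_window_attention(Q, K, V, window=2):
        # Each query i attends only to keys j with |i - j| <= window.
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)
        i = np.arange(n)
        scores[np.abs(i[:, None] - i[None, :]) > window] = -np.inf  # mask far keys
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V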
Bookshelf
Whatever I am, I am because of them
Deep Learning for Vision Systems
AI Engineering
Deep Learning Foundations and Concepts
Build a Large Language Model (From Scratch)
Ensemble Methods for Machine Learning
A Simple Guide to Retrieval Augmented Generation
Machine Learning for Tabular Data
Math and Architectures of Deep Learning
Inside Deep Learning: Math, Algorithms, Models
Getting Started with Natural Language Processing
Natural Language Processing in Action
Hands-On Large Language Models
Designing Large Language Model Applications
How Large Language Models Work
LLMs in Production
GPU Programming with C++ and CUDA