My AI Journey
Just got done with my GATE exam (phew!). Onwards to a journey of learning everything from Natural Language Processing to LLMs, with a garnish of Reinforcement Learning (thanks, DeepSeek).
All of it in the following order:
NLP Roadmap
- Prerequisites
  - Mathematics
    - Linear algebra
    - Probability and statistics
  - Programming
    - Proficiency in a programming language (e.g., Python)
- Introduction to NLP
  - Definition and scope of NLP
  - Historical development of NLP
  - Key challenges and applications
- Text Analysis (code sketch below)
  - Lexical Analysis
    - Word meaning and structure
    - Morphology (word formation)
    - Lemmatization (base form identification)
  - Syntactic Analysis
    - Parts-of-speech tagging
    - Dependency parsing
    - Constituency parsing
  - Semantic Analysis
    - Extracting meaning
    - Word embedding models (e.g., Word2Vec, GloVe)
    - Topic modeling
  - Additional Semantic Analysis
    - Coreference resolution
    - Discourse analysis
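
As a first taste of semantic analysis, here is a minimal topic-modeling sketch using gensim's LDA implementation; the tiny pre-tokenized corpus, the number of topics, and the training settings are purely illustrative.

```python
# Minimal LDA topic-modeling sketch with gensim (toy corpus, illustrative settings).
from gensim import corpora
from gensim.models import LdaModel

# pretend these documents are already tokenized and cleaned
texts = [["cat", "dog", "vet", "pet", "food"],
         ["stock", "market", "bank", "finance", "trading"],
         ["dog", "pet", "training", "food"],
         ["bank", "interest", "market", "loan"]]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]    # bag-of-words per document

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)                             # top words per discovered topic
```
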
- Text Processing (code sketch below)
  - Tokenization
    - Sentence tokenization
    - Word tokenization
    - Subword tokenization (Byte Pair Encoding, SentencePiece)
  - Stop Words Removal
    - Importance and impact on NLP tasks
    - Customizing stop word lists
  - Stemming and Lemmatization
    - Porter stemming algorithm
    - Snowball stemming algorithm
    - Lemmatization techniques and challenges
  - Part-of-Speech Tagging
    - POS tagging algorithms (HMM-based, rule-based, neural-based)
    - Fine-grained POS tagging
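
To make the text-processing steps above concrete, here is a minimal sketch using NLTK (one possible library choice); it assumes the relevant NLTK data packages (the punkt tokenizer models, stopword lists, WordNet, and the POS tagger) have already been fetched with `nltk.download()`.

```python
# Tokenization, stop-word removal, stemming, lemmatization, and POS tagging with NLTK.
# Assumes the required data packages were downloaded first, e.g.:
#   nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
#   nltk.download("averaged_perceptron_tagger")
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The striped bats were hanging on their feet. They flew away quickly."

sentences = sent_tokenize(text)                        # sentence tokenization
tokens = word_tokenize(text)                           # word tokenization

stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.isalpha() and t.lower() not in stop_words]

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content])              # crude suffix stripping: 'hanging' -> 'hang'
print([lemmatizer.lemmatize(t, pos="v") for t in content])  # dictionary lookup: 'flew' -> 'fly'
print(nltk.pos_tag(tokens))                            # part-of-speech tags, e.g. ('bats', 'NNS')
```
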
- Text Representation (code sketch below)
  - Bag of Words (BoW)
    - Term Frequency (TF) and Inverse Document Frequency (IDF)
    - Bag of N-grams
  - TF-IDF
    - Calculating TF-IDF scores
    - Applications in information retrieval
  - Word Embeddings
    - Word2Vec
      - Continuous Bag of Words (CBOW) model
      - Skip-gram model
    - GloVe
      - Global Vectors for Word Representation
  - Contextual Embeddings
    - ELMo (Embeddings from Language Models)
    - ULMFiT (Universal Language Model Fine-tuning)
    - OpenAI GPT (Generative Pre-trained Transformer)
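
A minimal sketch of these representations, assuming scikit-learn for Bag-of-Words/TF-IDF and gensim for a toy skip-gram Word2Vec model; the corpus and hyperparameters are only illustrative, and real embeddings are trained on far larger corpora (or taken pre-trained, as with GloVe).

```python
# Bag-of-Words, TF-IDF, and a toy Word2Vec model (scikit-learn + gensim assumed).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

corpus = ["the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs make good pets"]

bow = CountVectorizer().fit_transform(corpus)                        # raw term counts
tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(corpus)    # unigrams + bigrams, TF-IDF weighted
print(bow.shape, tfidf.shape)                                        # documents x vocabulary

tokenized = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram
print(w2v.wv["cat"].shape)                                           # 50-dimensional word vector
```
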
- NLP Libraries and Tools (code sketch below)
  - NLTK (Natural Language Toolkit)
  - SpaCy
  - scikit-learn
  - Transformers library (Hugging Face)
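
The quickest way to see these libraries in action is the Hugging Face pipeline API; the sketch below runs a sentiment classifier with the library's default checkpoint for the task (downloaded on first use), which you can pin explicitly via the `model` argument.

```python
# One-liner inference with the Hugging Face Transformers pipeline API.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")   # uses the task's default pretrained checkpoint
print(sentiment("Following an NLP roadmap is surprisingly fun."))
# expected output shape: [{'label': 'POSITIVE', 'score': 0.99...}]
```
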
- Statistical Language Models (code sketch below)
  - N-grams
    - Unigrams, bigrams, and trigrams
    - N-gram language models
  - Hidden Markov Models (HMM)
    - Basics of HMMs
    - Applications in part-of-speech tagging
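
Here is a from-scratch toy bigram language model with add-one (Laplace) smoothing, just to make the n-gram idea concrete; the two-sentence corpus is obviously far too small for meaningful estimates.

```python
# Toy bigram language model: P(w_i | w_{i-1}) from counts, with add-one smoothing.
from collections import Counter, defaultdict

corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigram_counts = Counter()
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    unigram_counts.update(sentence)
    for prev, cur in zip(sentence, sentence[1:]):
        bigram_counts[prev][cur] += 1

V = len(unigram_counts)                        # vocabulary size for smoothing

def bigram_prob(prev, cur):
    # add-one smoothed conditional probability
    return (bigram_counts[prev][cur] + 1) / (unigram_counts[prev] + V)

print(bigram_prob("the", "cat"))               # seen bigram -> relatively high
print(bigram_prob("cat", "dog"))               # unseen bigram -> small but non-zero
```
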
- Machine Learning for NLP (code sketch below)
  - Supervised Learning
    - Text classification algorithms (Naive Bayes, Support Vector Machines)
    - Evaluation metrics (precision, recall, F1-score)
  - Named Entity Recognition (NER)
    - Rule-based NER
    - Machine learning-based NER
    - Evaluation metrics for NER
  - Sentiment Analysis
    - Sentiment lexicons
    - Machine learning approaches for sentiment analysis
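
A minimal supervised text-classification sketch, assuming scikit-learn: TF-IDF features feeding a Multinomial Naive Bayes classifier, evaluated with precision, recall, and F1. The four-example dataset is only there to show the API shape.

```python
# Text classification with TF-IDF + Multinomial Naive Bayes, plus precision/recall/F1.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

train_texts = ["great movie, loved it", "terrible plot and acting",
               "what a fantastic film", "boring and way too long"]
train_labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

test_texts = ["loved the acting", "what a boring film"]
test_labels = ["pos", "neg"]
predictions = model.predict(test_texts)
print(predictions)
print(classification_report(test_labels, predictions))   # precision, recall, F1 per class
```
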
- Sequence-to-Sequence Models (code sketch below)
  - Recurrent Neural Networks (RNN)
    - Vanishing and exploding gradient problems
    - Bidirectional RNNs
  - Long Short-Term Memory (LSTM)
    - Architecture and key components
    - Gating mechanisms
  - Gated Recurrent Unit (GRU)
    - Simplified gating compared to LSTM
    - Applications and advantages
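
To make the recurrent-architecture ideas concrete, here is a shape-level sketch of a bidirectional LSTM encoder; PyTorch is assumed purely as one common framework choice, and the sizes are arbitrary.

```python
# Bidirectional LSTM over a batch of token-id sequences (PyTorch assumed; sizes arbitrary).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, vocab_size, (8, 20))     # batch of 8 sequences, 20 tokens each
embedded = embedding(token_ids)                       # (8, 20, 64)
outputs, (h_n, c_n) = lstm(embedded)                  # outputs: (8, 20, 2 * 128)
print(outputs.shape, h_n.shape)                       # h_n: (2, 8, 128), one state per direction
```

Swapping in `nn.GRU` gives the simpler gating scheme with an almost identical interface (it returns only a hidden state, with no separate cell state).
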
- Deep Learning Architectures for NLP (code sketch below)
  - Convolutional Neural Networks (CNN) for Text
    - Text classification with CNNs
    - Hierarchical and multi-channel CNNs
  - Transfer Learning in NLP
    - Fine-tuning pre-trained models
    - Universal Sentence Encoder
  - Transformer Architecture
    - Self-attention mechanism
    - Multi-head attention
    - Positional encoding
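
Since the Transformer has no recurrence, it injects order information through positional encodings; below is a small NumPy sketch of the sinusoidal variant from "Attention Is All You Need" (one of several possible schemes).

```python
# Sinusoidal positional encoding:
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
import numpy as np

def positional_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, None]                     # (max_len, 1)
    dims = np.arange(d_model)[None, :]                          # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                       # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                       # odd dimensions: cosine
    return pe

print(positional_encoding(50, 16).shape)   # (50, 16); added element-wise to token embeddings
```
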
- Transduction and Recurrency
  - Transduction in NLP
    - Definition and applications
    - Challenges in sequence-to-sequence transduction
  - Recurrent Neural Networks (RNN)
    - Applications beyond sequence-to-sequence tasks
    - Challenges in training RNNs
- Advanced Topics in Sequence Modeling (code sketch below)
  - Attention Mechanism
    - Scaled Dot-Product Attention
    - Position-wise Feedforward Networks
  - Self-Attention Mechanism
    - The concept of self-attention
    - Layer normalization in self-attention
  - Multi-Head Attention
    - Motivation and benefits
    - Combining multiple attention heads
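
The core computation behind all of these is scaled dot-product attention; here is a NumPy sketch of the single-head case, used here as self-attention (queries, keys, and values all derived from the same sequence).

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                              # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)                 # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)               # row-wise softmax
    return weights @ V                                           # weighted sum of values

seq_len, d_k = 4, 8
x = np.random.randn(seq_len, d_k)
out = scaled_dot_product_attention(x, x, x)                      # self-attention: Q = K = V source
print(out.shape)                                                 # (4, 8)

# Multi-head attention applies this in parallel to several learned projections of
# Q, K, V and concatenates the heads before a final linear projection.
```
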
- Syntax and Parsing (code sketch below)
  - Dependency Parsing
    - Dependency tree representation
    - Transition-based and graph-based parsing
  - Constituency Parsing
    - Treebank representation
    - Earley parsing algorithm
  - Parsing Techniques
    - Chart parsing (CYK parser)
    - Shift-Reduce parsing
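
For a quick look at dependency parsing in practice, here is a spaCy sketch; it assumes the small English model has been installed with `python -m spacy download en_core_web_sm`.

```python
# Dependency parsing with spaCy: each token gets a head and a dependency label.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog")

for token in doc:
    print(f"{token.text:10} {token.dep_:10} head={token.head.text}")
# e.g. 'fox' is the nominal subject (nsubj) of 'jumps', the sentence root
```

spaCy does not ship a constituency parser; for tree structures, plug-ins such as benepar or NLTK's chart parsers are common choices.
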
- Semantic Role Labeling (SRL) and Coreference Resolution
  - Semantic Role Labeling
    - PropBank and FrameNet
    - Neural approaches to SRL
  - Coreference Resolution
    - Mention detection
    - End-to-end coreference resolution models
- Evaluation Metrics (code sketch below)
  - Precision, Recall, F1-score
  - BLEU score for machine translation
  - Perplexity for language models
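
A small sketch tying these together: precision/recall/F1 via scikit-learn, and perplexity computed from a language model's average negative log-likelihood (the per-token log-probabilities below are made up for illustration).

```python
# Classification metrics with scikit-learn, and perplexity from per-token log-probabilities.
import math
from sklearn.metrics import precision_recall_fscore_support

y_true = ["pos", "neg", "pos", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary", pos_label="pos")
print(p, r, f1)

# perplexity = exp(mean negative log-likelihood); lower is better
token_log_probs = [-1.2, -0.7, -2.3, -0.9]          # hypothetical natural-log probabilities
perplexity = math.exp(-sum(token_log_probs) / len(token_log_probs))
print(perplexity)
```
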
- NLP in Industry and Research
  - Case studies and applications in various domains (healthcare, finance, legal, etc.)
  - Emerging research trends in NLP
- Ethical Considerations and Bias in NLP
  - Addressing Bias in NLP Models
    - Identifying and mitigating biases in training data
    - Fairness-aware machine learning
  - Ethical Considerations in NLP Research and Deployment
    - Privacy concerns in NLP
    - Responsible AI practices in NLP
- Continuous Learning and Keeping Updated
  - Follow conferences (ACL, NAACL, EMNLP)
  - Engage with the NLP community
  - Explore recent research papers and advancements (Arxiv, NeurIPS)
- Projects and Hands-on Practice
  - Apply knowledge through practical projects
  - Contribute to open-source NLP projects
  - Participate in Kaggle competitions
LLM Roadmap
–Coming Soon–