My AI Journey
Just got done with my GATE exam (phew!). Onwards to a journey of learning everything from Natural Language Processing to LLMs, with a garnish of Reinforcement Learning (thanks, DeepSeek).
All of it in the following order:
NLP Roadmap
- Prerequisites
  - Mathematics
    - Linear algebra
    - Probability and statistics
  - Programming
    - Proficiency in a programming language (e.g., Python)
- Introduction to NLP
  - Definition and scope of NLP
  - Historical development of NLP
  - Key challenges and applications
- Text Analysis (code sketch below)
  - Lexical Analysis
    - Word meaning and structure
    - Morphology (word formation)
    - Lemmatization (base form identification)
  - Syntactic Analysis
    - Parts-of-speech tagging
    - Dependency parsing
    - Constituency parsing
  - Semantic Analysis
    - Extracting meaning
    - Word embedding models (e.g., Word2Vec, GloVe)
    - Topic modeling
  - Additional Semantic Analysis
    - Coreference resolution
    - Discourse analysis
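
As a first taste of semantic analysis, here is a minimal topic-modeling sketch using gensim's LDA implementation; the tiny pre-tokenized corpus, the number of topics, and the training settings are purely illustrative.

```python
# Minimal LDA topic-modeling sketch with gensim (toy corpus, illustrative settings).
from gensim import corpora
from gensim.models import LdaModel

# pretend these documents are already tokenized and cleaned
texts = [["cat", "dog", "vet", "pet", "food"],
         ["stock", "market", "bank", "finance", "trading"],
         ["dog", "pet", "training", "food"],
         ["bank", "interest", "market", "loan"]]

dictionary = corpora.Dictionary(texts)                 # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in texts]    # bag-of-words per document

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20, random_state=0)
for topic_id, words in lda.print_topics():
    print(topic_id, words)                             # top words per discovered topic
```
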
- Text Processing (code sketch below)
  - Tokenization
    - Sentence tokenization
    - Word tokenization
    - Subword tokenization (Byte Pair Encoding, SentencePiece)
  - Stop Words Removal
    - Importance and impact on NLP tasks
    - Customizing stop word lists
  - Stemming and Lemmatization
    - Porter stemming algorithm
    - Snowball stemming algorithm
    - Lemmatization techniques and challenges
  - Part-of-Speech Tagging
    - POS tagging algorithms (HMM-based, rule-based, neural-based)
    - Fine-grained POS tagging
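
To make the text-processing steps above concrete, here is a minimal sketch using NLTK (one possible library choice); it assumes the relevant NLTK data packages (the punkt tokenizer models, stopword lists, WordNet, and the POS tagger) have already been fetched with `nltk.download()`.

```python
# Tokenization, stop-word removal, stemming, lemmatization, and POS tagging with NLTK.
# Assumes the required data packages were downloaded first, e.g.:
#   nltk.download("punkt"); nltk.download("stopwords"); nltk.download("wordnet")
#   nltk.download("averaged_perceptron_tagger")
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

text = "The striped bats were hanging on their feet. They flew away quickly."

sentences = sent_tokenize(text)                        # sentence tokenization
tokens = word_tokenize(text)                           # word tokenization

stop_words = set(stopwords.words("english"))
content = [t for t in tokens if t.isalpha() and t.lower() not in stop_words]

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
print([stemmer.stem(t) for t in content])              # crude suffix stripping: 'hanging' -> 'hang'
print([lemmatizer.lemmatize(t, pos="v") for t in content])  # dictionary lookup: 'flew' -> 'fly'
print(nltk.pos_tag(tokens))                            # part-of-speech tags, e.g. ('bats', 'NNS')
```
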
- Text Representation (code sketch below)
  - Bag of Words (BoW)
    - Term Frequency (TF) and Inverse Document Frequency (IDF)
    - Bag of N-grams
  - TF-IDF
    - Calculating TF-IDF scores
    - Applications in information retrieval
  - Word Embeddings
    - Word2Vec
      - Continuous Bag of Words (CBOW) model
      - Skip-gram model
    - GloVe
      - Global Vectors for Word Representation
  - Contextual Embeddings
    - ELMo (Embeddings from Language Models)
    - ULMFiT (Universal Language Model Fine-tuning)
    - OpenAI GPT (Generative Pre-trained Transformer)
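
A minimal sketch of these representations, assuming scikit-learn for Bag-of-Words/TF-IDF and gensim for a toy skip-gram Word2Vec model; the corpus and hyperparameters are only illustrative, and real embeddings are trained on far larger corpora (or taken pre-trained, as with GloVe).

```python
# Bag-of-Words, TF-IDF, and a toy Word2Vec model (scikit-learn + gensim assumed).
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from gensim.models import Word2Vec

corpus = ["the cat sat on the mat",
          "the dog sat on the log",
          "cats and dogs make good pets"]

bow = CountVectorizer().fit_transform(corpus)                        # raw term counts
tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(corpus)    # unigrams + bigrams, TF-IDF weighted
print(bow.shape, tfidf.shape)                                        # documents x vocabulary

tokenized = [doc.split() for doc in corpus]
w2v = Word2Vec(sentences=tokenized, vector_size=50, window=2, min_count=1, sg=1)  # sg=1 -> skip-gram
print(w2v.wv["cat"].shape)                                           # 50-dimensional word vector
```
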
- NLP Libraries and Tools (code sketch below)
  - NLTK (Natural Language Toolkit)
  - SpaCy
  - scikit-learn
  - Transformers library (Hugging Face)
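
The quickest way to see these libraries in action is the Hugging Face pipeline API; the sketch below runs a sentiment classifier with the library's default checkpoint for the task (downloaded on first use), which you can pin explicitly via the `model` argument.

```python
# One-liner inference with the Hugging Face Transformers pipeline API.
from transformers import pipeline

sentiment = pipeline("sentiment-analysis")   # uses the task's default pretrained checkpoint
print(sentiment("Following an NLP roadmap is surprisingly fun."))
# expected output shape: [{'label': 'POSITIVE', 'score': 0.99...}]
```
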
- Statistical Language Models (code sketch below)
  - N-grams
    - Unigrams, bigrams, and trigrams
    - N-gram language models
  - Hidden Markov Models (HMM)
    - Basics of HMMs
    - Applications in part-of-speech tagging
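
Here is a from-scratch toy bigram language model with add-one (Laplace) smoothing, just to make the n-gram idea concrete; the two-sentence corpus is obviously far too small for meaningful estimates.

```python
# Toy bigram language model: P(w_i | w_{i-1}) from counts, with add-one smoothing.
from collections import Counter, defaultdict

corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigram_counts = Counter()
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    unigram_counts.update(sentence)
    for prev, cur in zip(sentence, sentence[1:]):
        bigram_counts[prev][cur] += 1

V = len(unigram_counts)                        # vocabulary size for smoothing

def bigram_prob(prev, cur):
    # add-one smoothed conditional probability
    return (bigram_counts[prev][cur] + 1) / (unigram_counts[prev] + V)

print(bigram_prob("the", "cat"))               # seen bigram -> relatively high
print(bigram_prob("cat", "dog"))               # unseen bigram -> small but non-zero
```
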
- Machine Learning for NLP (code sketch below)
  - Supervised Learning
    - Text classification algorithms (Naive Bayes, Support Vector Machines)
    - Evaluation metrics (precision, recall, F1-score)
  - Named Entity Recognition (NER)
    - Rule-based NER
    - Machine learning-based NER
    - Evaluation metrics for NER
  - Sentiment Analysis
    - Sentiment lexicons
    - Machine learning approaches for sentiment analysis
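
A minimal supervised text-classification sketch, assuming scikit-learn: TF-IDF features feeding a Multinomial Naive Bayes classifier, evaluated with precision, recall, and F1. The four-example dataset is only there to show the API shape.

```python
# Text classification with TF-IDF + Multinomial Naive Bayes, plus precision/recall/F1.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

train_texts = ["great movie, loved it", "terrible plot and acting",
               "what a fantastic film", "boring and way too long"]
train_labels = ["pos", "neg", "pos", "neg"]

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

test_texts = ["loved the acting", "what a boring film"]
test_labels = ["pos", "neg"]
predictions = model.predict(test_texts)
print(predictions)
print(classification_report(test_labels, predictions))   # precision, recall, F1 per class
```
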
- Sequence-to-Sequence Models (code sketch below)
  - Recurrent Neural Networks (RNN)
    - Vanishing and exploding gradient problems
    - Bidirectional RNNs
  - Long Short-Term Memory (LSTM)
    - Architecture and key components
    - Gating mechanisms
  - Gated Recurrent Unit (GRU)
    - Simplified gating compared to LSTM
    - Applications and advantages
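
To make the recurrent-architecture ideas concrete, here is a shape-level sketch of a bidirectional LSTM encoder; PyTorch is assumed purely as one common framework choice, and the sizes are arbitrary.

```python
# Bidirectional LSTM over a batch of token-id sequences (PyTorch assumed; sizes arbitrary).
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 64, 128

embedding = nn.Embedding(vocab_size, embed_dim)
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

token_ids = torch.randint(0, vocab_size, (8, 20))     # batch of 8 sequences, 20 tokens each
embedded = embedding(token_ids)                       # (8, 20, 64)
outputs, (h_n, c_n) = lstm(embedded)                  # outputs: (8, 20, 2 * 128)
print(outputs.shape, h_n.shape)                       # h_n: (2, 8, 128), one state per direction
```

Swapping in `nn.GRU` gives the simpler gating scheme with an almost identical interface (it returns only a hidden state, with no separate cell state).
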
- Deep Learning Architectures for NLP (code sketch below)
  - Convolutional Neural Networks (CNN) for Text
    - Text classification with CNNs
    - Hierarchical and multi-channel CNNs
  - Transfer Learning in NLP
    - Fine-tuning pre-trained models
    - Universal Sentence Encoder
  - Transformer Architecture
    - Self-attention mechanism
    - Multi-head attention
    - Positional encoding
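
Since the Transformer has no recurrence, it injects order information through positional encodings; below is a small NumPy sketch of the sinusoidal variant from "Attention Is All You Need" (one of several possible schemes).

```python
# Sinusoidal positional encoding:
#   PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
#   PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
import numpy as np

def positional_encoding(max_len, d_model):
    positions = np.arange(max_len)[:, None]                     # (max_len, 1)
    dims = np.arange(d_model)[None, :]                          # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                       # even dimensions: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])                       # odd dimensions: cosine
    return pe

print(positional_encoding(50, 16).shape)   # (50, 16); added element-wise to token embeddings
```
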
- Transduction and Recurrency
  - Transduction in NLP
    - Definition and applications
    - Challenges in sequence-to-sequence transduction
  - Recurrent Neural Networks (RNN)
    - Applications beyond sequence-to-sequence tasks
    - Challenges in training RNNs
- Advanced Topics in Sequence Modeling (code sketch below)
  - Attention Mechanism
    - Scaled Dot-Product Attention
    - Position-wise Feedforward Networks
  - Self-Attention Mechanism
    - The concept of self-attention
    - Layer normalization in self-attention
  - Multi-Head Attention
    - Motivation and benefits
    - Combining multiple attention heads
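
The core computation behind all of these is scaled dot-product attention; here is a NumPy sketch of the single-head case, used here as self-attention (queries, keys, and values all derived from the same sequence).

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                              # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)                 # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)               # row-wise softmax
    return weights @ V                                           # weighted sum of values

seq_len, d_k = 4, 8
x = np.random.randn(seq_len, d_k)
out = scaled_dot_product_attention(x, x, x)                      # self-attention: Q = K = V source
print(out.shape)                                                 # (4, 8)

# Multi-head attention applies this in parallel to several learned projections of
# Q, K, V and concatenates the heads before a final linear projection.
```
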
- Syntax and Parsing (code sketch below)
  - Dependency Parsing
    - Dependency tree representation
    - Transition-based and graph-based parsing
  - Constituency Parsing
    - Treebank representation
    - Earley parsing algorithm
  - Parsing Techniques
    - Chart parsing (CYK parser)
    - Shift-Reduce parsing
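
For a quick look at dependency parsing in practice, here is a spaCy sketch; it assumes the small English model has been installed with `python -m spacy download en_core_web_sm`.

```python
# Dependency parsing with spaCy: each token gets a head and a dependency label.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The quick brown fox jumps over the lazy dog")

for token in doc:
    print(f"{token.text:10} {token.dep_:10} head={token.head.text}")
# e.g. 'fox' is the nominal subject (nsubj) of 'jumps', the sentence root
```

spaCy does not ship a constituency parser; for tree structures, plug-ins such as benepar or NLTK's chart parsers are common choices.
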
- Semantic Role Labeling (SRL) and Coreference Resolution
  - Semantic Role Labeling
    - PropBank and FrameNet
    - Neural approaches to SRL
  - Coreference Resolution
    - Mention detection
    - End-to-end coreference resolution models
- Evaluation Metrics (code sketch below)
  - Precision, Recall, F1-score
  - BLEU score for machine translation
  - Perplexity for language models
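
A small sketch tying these together: precision/recall/F1 via scikit-learn, and perplexity computed from a language model's average negative log-likelihood (the per-token log-probabilities below are made up for illustration).

```python
# Classification metrics with scikit-learn, and perplexity from per-token log-probabilities.
import math
from sklearn.metrics import precision_recall_fscore_support

y_true = ["pos", "neg", "pos", "pos", "neg"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary", pos_label="pos")
print(p, r, f1)

# perplexity = exp(mean negative log-likelihood); lower is better
token_log_probs = [-1.2, -0.7, -2.3, -0.9]          # hypothetical natural-log probabilities
perplexity = math.exp(-sum(token_log_probs) / len(token_log_probs))
print(perplexity)
```
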
- NLP in Industry and Research
  - Case studies and applications in various domains (healthcare, finance, legal, etc.)
  - Emerging research trends in NLP
- Ethical Considerations and Bias in NLP
  - Addressing Bias in NLP Models
    - Identifying and mitigating biases in training data
    - Fairness-aware machine learning
  - Ethical Considerations in NLP Research and Deployment
    - Privacy concerns in NLP
    - Responsible AI practices in NLP
- Continuous Learning and Keeping Updated
  - Follow conferences (ACL, NAACL, EMNLP)
  - Engage with the NLP community
  - Explore recent research papers and advancements (Arxiv, NeurIPS)
- Projects and Hands-on Practice
  - Apply knowledge through practical projects
  - Contribute to open-source NLP projects
  - Participate in Kaggle competitions
LLM Roadmap
–Coming Soon–