linguist.page@gmail.com
Machine Learning
Foundations of ML (9)
What is ML? (vs. traditional programming)
Types: Supervised, Unsupervised, Semi-supervised, Self-supervised, Reinforcement
Training, Validation, Test sets
Overfitting & Underfitting
Bias-Variance tradeoff
Feature engineering
Feature scaling (normalization, standardization)
One-hot encoding
Data augmentation
Classical ML Algorithms (13)
Linear Regression
Logistic Regression
Naive Bayes
k-Nearest Neighbors (k-NN)
Support Vector Machines (SVM)
Decision Trees
Random Forests
Gradient Boosting (XGBoost, LightGBM)
k-Means Clustering
Hierarchical Clustering
Expectation-Maximization (EM)
Hidden Markov Models (HMM)
Conditional Random Fields (CRF)
Model Evaluation (7)
Accuracy, Precision, Recall, F1-Score
Macro vs. Micro vs. Weighted averages
Confusion matrix
ROC curve & AUC
Cross-validation (k-fold)
Train/test split strategies
Learning curves
Neural Networks & Deep Learning (36)
Foundations (11)
Biological inspiration (neuron analogy)
Perceptron
Activation functions (sigmoid, tanh, ReLU, GELU, softmax)
Multi-layer perceptron (MLP)
Forward propagation
Loss functions (MSE, cross-entropy, NLL)
Backpropagation (chain rule in action)
Weight initialization
Batch normalization
Dropout regularization
Epochs, batches, iterations
Architectures Relevant to NLP (17)
Convolutional Neural Networks (CNN) — for text classification
Recurrent Neural Networks (RNN)
Vanishing / Exploding gradient problem
Long Short-Term Memory (LSTM)
Gated Recurrent Unit (GRU)
Bidirectional RNNs
Sequence-to-Sequence (Seq2Seq) models
Encoder-Decoder architecture
Attention mechanism (Bahdanau, Luong)
Self-attention
Multi-head attention
Positional encoding
The Transformer architecture
BERT & variants (RoBERTa, ALBERT, DistilBERT)
GPT & autoregressive language models
T5, BART (seq2seq transformers)
Large Language Models (LLMs) — architecture level
Training Deep Models (8)
Transfer learning & fine-tuning
Pre-training objectives (MLM, CLM, NSP)
Learning rate scheduling
Mixed precision training
Gradient clipping
Early stopping
Model checkpointing
Distributed training concepts