linguist.page@gmail.com
Computational Linguistics
Disclaimer
✓
Computing Fundamentals
(79)
What Is a Computer?
✓
(4)
Data vs Information vs Knowledge
✓
Analog vs Digital
✓
Hardware vs Software
✓
Input / Process / Output / Storage
✓
Number Systems & Data Representation
(11)
Decimal System
✓
Binary System
✓
Octal System
✓
Hexadecimal System
✓
Conversions Between Systems
✓
Binary Arithmetic
✓
Two's Complement
✓
Floating Point Representation (IEEE 754)
✓
ASCII - American Standard Code for Information Interchange
Unicode (UTF-8, UTF-16, UTF-32)
Encoding vs Decoding
Hardware Components
(9)
CPU (cores, clock speed, cache, ALU, CU)
RAM (volatile memory, addressing)
ROM
Storage (HDD, SSD, NVMe)
Motherboard & Bus Systems
GPU (why it matters for NLP/AI)
TPU (Tensor Processing Units)
Input/Output devices
Network Interface Card
Operating Systems
(10)
What an OS does
Kernel
Processes & Threads
Memory management
File systems (FAT32, NTFS, ext4)
Directories & Paths (absolute vs. relative)
Permissions & Users
System calls
Linux vs. Windows vs. macOS
Shell / Terminal concept
The Linux Command Line (CLI)
✓
(29)
Powering Off, Restarting, and Exiting
✓
Finding Your Bearings: Who, When, and Where Am I
✓
Getting Help: Manuals and Command Lookup
✓
Moving Around: Navigating the Filesystem
✓
Managing Files and Folders
✓
Reading File Contents
✓
Searching Text with grep
✓
Finding Files with find
✓
Counting, Sorting, and Deduplicating Lines
✓
Cutting, Pasting, and Splitting Files
✓
Editing Text in Place with sed
✓
Processing Fields with awk
✓
Comparing Files
✓
Character Encoding and Locale
✓
Piping, Redirection, and Chaining Commands
✓
Compressing and Extracting Archives
✓
Checking System Resources and Hardware
✓
Managing Processes
✓
Networking and Downloading Data
✓
Remote Access and File Transfer (SSH)
✓
Permissions, Ownership, and Superuser Access
✓
Installing Software (Package Managers)
✓
Environment Variables
✓
Scheduling, Timing, and Calendars
✓
Shell Scripting Basics
✓
Text Editors
✓
Python and Virtual Environments
✓
Running Scripts in the Background
✓
Version Control with Git
✓
Networks & the Internet
(9)
IP addresses (IPv4, IPv6)
DNS
HTTP / HTTPS
TCP/IP model
Client-Server model
APIs & REST
JSON & XML formats
How browsers work
Localhost & Ports
File Formats Relevant to NLP
(7)
Plain text (.txt)
CSV & TSV
JSON
XML / HTML
PDF (and its complexity)
CoNLL format
Annotation formats (BIO, IOB2)
Mathematics
(90)
Foundations of Mathematics
(5)
Sets, subsets, union, intersection, complement
Relations & functions
Proof techniques (direct, contradiction, induction)
Logic (propositions, connectives, quantifiers)
Number types (natural, integer, rational, real, complex)
Arithmetic & Algebra Review
(7)
Fractions, percentages, ratios
Exponents & logarithms
Summation notation (Σ)
Product notation (Π)
Absolute value
Polynomials
Solving equations & inequalities
Discrete Mathematics
(7)
Graph theory (nodes, edges, directed/undirected, weighted)
Trees (binary trees, parse trees, dependency trees)
Paths, cycles, connected components
Combinatorics (permutations, combinations)
Pigeonhole principle
Recurrence relations
Formal languages & Automata
Linear Algebra
(15)
Scalars, vectors, matrices, tensors
Vector spaces
Vector operations (addition, scalar multiplication, dot product, cross product)
Matrix operations (addition, multiplication, transpose)
Identity matrix & inverse matrix
Determinants
Systems of linear equations (Gaussian elimination)
Eigenvalues & Eigenvectors
Singular Value Decomposition (SVD)
Principal Component Analysis (PCA)
Norms (L1, L2, Frobenius)
Cosine similarity
Orthogonality & projections
Span, basis, rank, nullity
Change of basis
Calculus
(11)
Functions & limits
Derivatives (definition, rules: chain, product, quotient)
Partial derivatives
Gradients & gradient vectors
The Jacobian matrix
The Hessian matrix
Integrals (definite & indefinite)
The chain rule (critical for backpropagation)
Multivariable calculus
Taylor series & approximation
Optimization basics (minima, maxima, saddle points)
Probability Theory
(17)
Sample spaces & events
Axioms of probability
Conditional probability
Independence
Joint, marginal, conditional distributions
Bayes' theorem
Random variables (discrete & continuous)
PMF, PDF, CDF
Expected value & variance
Common distributions
Law of Large Numbers
Central Limit Theorem
Entropy (Shannon)
Cross-entropy
KL Divergence
Mutual Information
Likelihood & log-likelihood
Statistics
(11)
Descriptive statistics (mean, median, mode, std, variance)
Inferential statistics
Hypothesis testing
p-values & significance
Confidence intervals
Correlation vs. causation
Regression (linear, logistic)
Maximum Likelihood Estimation (MLE)
Maximum A Posteriori (MAP)
Bayesian inference
Sampling methods
Information Theory
(8)
Bit as a unit of information
Entropy (H)
Conditional entropy
Mutual information
Information gain
Perplexity (key NLP metric)
Compression basics
Huffman coding concept
Optimization
(9)
Loss functions
Gradient descent (batch, stochastic, mini-batch)
Learning rate
Momentum
Adaptive methods (AdaGrad, RMSprop, Adam)
Convex vs. non-convex optimization
Regularization (L1 / Lasso, L2 / Ridge)
Lagrange multipliers
Constrained optimization
Programming
(89)
Programming Concepts (Language-Agnostic)
(14)
What is a program?
Source code vs. compiled code vs. interpreted code
Variables & data types
Operators
Control flow (if/else, switch)
Loops (for, while, do-while)
Functions / procedures
Scope & lifetime of variables
Recursion
Stack vs. Heap memory
Pointers & references (conceptually)
Error handling & exceptions
Debugging basics
Comments & documentation
Python (Primary Language for NLP)
(41)
Python Basics
(15)
Installation & environment setup
Python interpreter
Variables & dynamic typing
Data types: int, float, str, bool, None
String operations & formatting (f-strings)
Lists, Tuples, Sets, Dictionaries
List comprehensions
Slicing
Functions (def, return, *args, **kwargs)
Lambda functions
Built-in functions (map, filter, zip, enumerate, sorted)
Modules & imports
File I/O (open, read, write, with statement)
Exception handling (try, except, finally)
Iterators & generators
Intermediate Python
(15)
Classes & objects
Attributes & methods
Constructors (__init__)
Inheritance
Polymorphism
Encapsulation
Magic/dunder methods
Decorators
Context managers
Regular expressions (re module)
Functional programming concepts
Comprehensions (dict, set)
Type hints
Virtual environments (venv, conda)
Package management (pip)
Python for Data & NLP
(11)
NumPy (arrays, broadcasting, vectorized ops)
Pandas (DataFrames, Series, data manipulation)
Matplotlib & Seaborn (visualization)
Scikit-learn (ML pipeline, feature extraction)
NLTK (classic NLP toolkit)
spaCy (industrial NLP)
Gensim (topic modeling, word vectors)
Hugging Face Transformers (modern NLP)
PyTorch (deep learning framework)
TensorFlow / Keras (alternative deep learning)
Jupyter Notebooks
Data Structures
(8)
Arrays
Linked lists
Stacks
Queues
Hash tables / Dictionaries
Trees (binary, BST, trie)
Graphs (adjacency list, adjacency matrix)
Heaps
Algorithms
(13)
Time complexity (Big-O notation)
Space complexity
Searching (linear, binary)
Sorting (bubble, merge, quicksort)
Recursion & divide and conquer
Dynamic programming
Greedy algorithms
Graph traversal (BFS, DFS)
String matching (Naive, KMP, Boyer-Moore)
Edit distance / Levenshtein distance
Longest Common Subsequence
Viterbi algorithm
CYK / Earley parsing algorithms
Version Control
(6)
What is version control?
Git basics (init, add, commit, push, pull)
Branching & merging
GitHub / GitLab
.gitignore
Collaborative workflows
Software Engineering Basics
(7)
Modular programming
Writing clean, readable code
Unit testing (pytest)
Documentation (docstrings)
Working with APIs
JSON parsing
Configuration files
Machine Learning
(65)
Foundations of ML
(9)
What is ML? (vs. traditional programming)
Types: Supervised, Unsupervised, Semi-supervised, Self-supervised, Reinforcement
Training, Validation, Test sets
Overfitting & Underfitting
Bias-Variance tradeoff
Feature engineering
Feature scaling (normalization, standardization)
One-hot encoding
Data augmentation
Classical ML Algorithms
(13)
Linear Regression
Logistic Regression
Naive Bayes
k-Nearest Neighbors (k-NN)
Support Vector Machines (SVM)
Decision Trees
Random Forests
Gradient Boosting (XGBoost, LightGBM)
k-Means Clustering
Hierarchical Clustering
Expectation-Maximization (EM)
Hidden Markov Models (HMM)
Conditional Random Fields (CRF)
Model Evaluation
(7)
Accuracy, Precision, Recall, F1-Score
Macro vs. Micro vs. Weighted averages
Confusion matrix
ROC curve & AUC
Cross-validation (k-fold)
Train/test split strategies
Learning curves
Neural Networks & Deep Learning
(36)
Foundations
(11)
Biological inspiration (neuron analogy)
Perceptron
Activation functions (sigmoid, tanh, ReLU, GELU, softmax)
Multi-layer perceptron (MLP)
Forward propagation
Loss functions (MSE, cross-entropy, NLL)
Backpropagation (chain rule in action)
Weight initialization
Batch normalization
Dropout regularization
Epochs, batches, iterations
Architectures Relevant to NLP
(17)
Convolutional Neural Networks (CNN) — for text classification
Recurrent Neural Networks (RNN)
Vanishing / Exploding gradient problem
Long Short-Term Memory (LSTM)
Gated Recurrent Unit (GRU)
Bidirectional RNNs
Sequence-to-Sequence (Seq2Seq) models
Encoder-Decoder architecture
Attention mechanism (Bahdanau, Luong)
Self-attention
Multi-head attention
Positional encoding
The Transformer architecture
BERT & variants (RoBERTa, ALBERT, DistilBERT)
GPT & autoregressive language models
T5, BART (seq2seq transformers)
Large Language Models (LLMs) — architecture level
Training Deep Models
(8)
Transfer learning & fine-tuning
Pre-training objectives (MLM, CLM, NSP)
Learning rate scheduling
Mixed precision training
Gradient clipping
Early stopping
Model checkpointing
Distributed training concepts
Linguistics Knowledge (Formalized for CL)
(46)
Phonetics & Phonology (Computational)
(5)
Phoneme representation
International Phonetic Alphabet (IPA) encoding
Feature matrices
Speech sounds as signals
Acoustic phonetics basics (for speech processing)
Morphology (Computational)
(8)
Morphemes, affixes, roots
Inflection vs. derivation
Finite-State Morphology
Morphological analyzers
Stemming vs. Lemmatization
Tokenization (word, subword, character)
Byte-Pair Encoding (BPE)
WordPiece & Unigram tokenization
Syntax (Computational)
(8)
Phrase structure grammars (CFG, PCFG)
Chomsky Normal Form (CNF)
Dependency grammars
Constituency parsing
Dependency parsing
Treebanks (Penn Treebank, Universal Dependencies)
Parse trees as data structures
Ambiguity and probabilistic parsing
Semantics (Computational)
(12)
Word meaning representation
Lexical semantics (synonymy, polysemy, hypernymy)
WordNet & ontologies
Distributional semantics
Vector space models
Word embeddings (Word2Vec, GloVe, FastText)
Contextual embeddings
Compositional semantics
Formal semantics (lambda calculus basics)
Semantic role labeling
Frame semantics (FrameNet)
Abstract Meaning Representation (AMR)
Pragmatics & Discourse (Computational)
(7)
Coreference resolution
Anaphora resolution
Discourse structure
Rhetorical Structure Theory (RST)
Coherence & cohesion
Dialogue acts
Speech acts
Typology & Multilingual NLP
(6)
Language families & their computational implications
Agglutinative vs. fusional vs. isolating
Low-resource languages
Cross-lingual transfer
Multilingual models (mBERT, XLM-R)
Code-switching
Core NLP Tasks & Methods
(69)
Text Preprocessing
(6)
Sentence segmentation
Tokenization
Normalization (lowercasing, punctuation)
Stop word removal
Spelling correction
Text cleaning
Text Representation
(6)
Bag of Words (BoW)
TF-IDF
N-gram models
Word embeddings
Sentence embeddings
Document embeddings
Language Modeling
(6)
N-gram language models
Smoothing techniques
Neural language models
Autoregressive language modeling
Masked language modeling
Perplexity as evaluation
Sequence Labeling
(8)
Part-of-Speech (POS) tagging
Named Entity Recognition (NER)
Chunking
Slot filling
BIO/IOB2 tagging scheme
HMM-based taggers
CRF-based taggers
Neural taggers
Parsing
(4)
Constituency parsing
Dependency parsing
Semantic parsing
Grammar induction
Classification Tasks
(5)
Text classification
Sentiment analysis
Topic classification
Authorship identification
Language identification
Sequence-to-Sequence Tasks
(5)
Machine Translation (MT)
Text summarization
Paraphrase generation
Text simplification
Grammatical Error Correction
Information Extraction
(5)
Named Entity Recognition (NER)
Relation extraction
Event extraction
Open Information Extraction (OpenIE)
Template filling
Question Answering
(5)
Extractive QA (SQuAD-style)
Abstractive QA
Open-domain QA
Closed-book QA
Reading comprehension
Dialogue & Conversational AI
(7)
Task-oriented dialogue
Open-domain dialogue
Intent detection
Slot filling
Dialogue state tracking
Response generation
Retrieval-augmented generation
Lexical & Semantic Tasks
(5)
Word sense disambiguation (WSD)
Semantic textual similarity
Textual entailment / Natural Language Inference (NLI)
Semantic role labeling (SRL)
Coreference resolution
Speech & Audio (Spoken Language Processing)
(7)
Waveforms & sampling
Fourier Transform & spectrograms
Mel-frequency cepstral coefficients (MFCCs)
Automatic Speech Recognition (ASR)
Text-to-Speech (TTS)
Speaker diarization
Prosody modeling
Formal Language Theory & Automata
(10)
Formal grammars (Type 0, 1, 2, 3 — Chomsky hierarchy)
Regular languages & expressions
Finite-State Automata (FSA)
Finite-State Transducers (FST)
Context-Free Grammars (CFG)
Pushdown Automata (PDA)
Context-Sensitive Grammars
Turing Machines (concept)
Decidability & computational limits
Relation to linguistic levels
Corpus Linguistics & Annotation
(11)
What is a corpus?
Corpus design & sampling
Annotation schemes
Inter-annotator agreement
Corpus tools (AntConc, SketchEngine)
Concordancing
Frequency lists
Collocation & association measures
Major corpora (BNC, COCA, Penn Treebank, Universal Dependencies)
Crowdsourced annotation
Active learning for annotation
Research Methods & Scientific Practice
(13)
Reading research papers
Experimental design
Ablation studies
Baseline vs. proposed system
Statistical significance testing
Effect sizes
Writing & reporting results
Reproducibility
Citation practices
Peer review
Key Venues to Know
(3)
ACL, EMNLP, NAACL, COLING
ICLR, NeurIPS, ICML
arXiv (cs.CL, cs.LG)
Specialized & Advanced NLP Topics
(41)
Multilinguality & Low-Resource NLP
(4)
Transfer learning across languages
Zero-shot & few-shot cross-lingual transfer
Data augmentation for low-resource
Pivot-based approaches
Multimodal NLP
(4)
Vision-Language models
Image captioning
Visual QA
Audio-visual speech recognition
Evaluation & Benchmarking
(4)
Intrinsic vs. extrinsic evaluation
Standard benchmarks (GLUE, SuperGLUE, BIG-Bench)
Human evaluation design
Error analysis
Ethics, Bias & Fairness in NLP
(7)
Data bias
Model bias (gender, racial, cultural)
Fairness metrics
Debiasing techniques
Privacy (differential privacy, federated learning)
Environmental cost of large models
Responsible AI & NLP
Large Language Models (Advanced)
(12)
Scaling laws
Emergent abilities
Instruction tuning
Reinforcement Learning from Human Feedback (RLHF)
Prompt engineering
Chain-of-thought prompting
In-context learning
Retrieval-Augmented Generation (RAG)
Hallucination & factuality
Model interpretability & explainability
Parameter-efficient fine-tuning (LoRA, adapters, prefix tuning)
Quantization & model compression
Computational Psycholinguistics
(5)
Surprisal theory
Reading time models
Garden-path sentences (computational)
Eye-tracking & neural correlates
LLMs as cognitive models
Computational Sociolinguistics
(5)
Dialect identification
Stance detection
Hate speech & toxicity detection
Computational analysis of language change
Social media language processing
Free Arabic Books (كُتُبٌ عَرَبِيَّةٌ مَجَّانِيَّةٌ)
(35)
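أساسيات الحوسبة والبرمجة (Computing and Programming Fundamentals)
(6)
أساسيات الحوسبة (Computing Fundamentals)
أساسيات النظم الرقمية (Fundamentals of Digital Systems)
البرمجة بلغة بايثون (Programming in Python)
السلاسل الزمنية التحليل والتنبؤ عن طريق الامثلة (Time Series: Analysis and Forecasting by Example)
بايثون عن طريق الامثلة (Python by Example)
تصميم قواعد البيانات (Database Design)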
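التعلم العميق (Deep Learning)
(7)
20 مشروعا للتعلم العميق باستخدام بايثون (20 Deep Learning Projects Using Python)
التعلم العميق المبادئ والمفاهيم والاساليب (Deep Learning: Principles, Concepts, and Methods)
التعلم العميق عن طريق الأمثلة (Deep Learning by Example)
التعلم العميق من الأساسيات إلى بناء شبكة عصبية عميقة بلغة البايثون (Deep Learning: From the Basics to Building a Deep Neural Network in Python)
التعلم العميق واستخداماته في الرعاية الصحية (Deep Learning and Its Uses in Healthcare)
التعمق في التعلم العميق الجزء الثالث (Going Deeper into Deep Learning, Part Three)
التعمق في التعلم العميق الجزء الثاني (Going Deeper into Deep Learning, Part Two)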
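الذكاء الإصطناعي والخوارزميات (Artificial Intelligence and Algorithms)
(7)
الخوارزميات (Algorithms)
العربية والذكاء الاصطناعي (Arabic and Artificial Intelligence)
تطبيقات الذكاء الاصطناعي في خدمة اللغة العربية (Applications of Artificial Intelligence in the Service of the Arabic Language)
خوارزميات الذكاء الاصطناعي في تحليل النص العربي (Artificial Intelligence Algorithms in Arabic Text Analysis)
عشرة مشاريع عملية عن الذكاء الاصطناعي (Ten Practical Projects on Artificial Intelligence)
علم البيانات عن طريق الامثلة (Data Science by Example)
نقل التعلم في الرؤية الحاسوبية (Transfer Learning in Computer Vision)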
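اللسانيات الحاسوبية (Computational Linguistics)
(8)
المعالجة الآلية للغة العربية - المشاكل والحلول (Automatic Processing of the Arabic Language: Problems and Solutions)
المعالجة اﻵلية للنصوص العربية (Automatic Processing of Arabic Texts)
الموارد اللغوية الحاسوبية (Computational Language Resources)
تحليل المشاعر و التنقيب في الآراء عن طريق الامثلة (Sentiment Analysis and Opinion Mining by Example)
تطبيقات أساسية في المعالجة اﻵلية للغة العربية (Basic Applications in the Automatic Processing of the Arabic Language)
تقنيات اللغة العربية الحاسوبية (Computational Arabic Language Technologies)
معالجة اللغات الطبيعية للويب الدلالي (Natural Language Processing for the Semantic Web)
مقدمة في حوسبة اللغة العربية (An Introduction to Arabic Language Computing)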
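تعلم الآلة (Machine Learning)
(6)
التعلم الآلي عن طريق الامثلة (Machine Learning by Example)
تعلم الآلة وعلم البيانات (Machine Learning and Data Science)
علم البيانات وتعلم الآلة عن طريق الامثلة (Data Science and Machine Learning by Example)
مدخل إلى الذكاء الاصطناعي وتعلم اﻵلة (An Introduction to Artificial Intelligence and Machine Learning)
مشاريع تعلم الآلة باستخدام بايثون (Machine Learning Projects Using Python)
معجم مصطلحات التعلم الآلي والتعلم العميق وعلم البيانات (Glossary of Machine Learning, Deep Learning, and Data Science Terms)
مجلة اللغويات الحاسوبية والمعالجة الآلية للغة العربية (Journal of Computational Linguistics and Automatic Processing of the Arabic Language)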
Video Tutorials Archive
(96)
إسترجاع البيانات (Information Retrieval)
(2)
Information Retrieval 1
Information Retrieval 2
التعابير النمطية (Regular Expressions)
(6)
Regex 1
Regex 2
Regex 3
Regex 4
Regex 5 | Pipes & Anchors
Regex 6 | Character Classes
بايثون (Python)
(12)
Python 1 | String Types
Python 10 | Word Context 1
Python 11 | Word Context 2
Python 2 | Methods & Formatting
Python 3 | ID-Type-Value
Python 4 | Basic Operations
Python 5 | Basic Operations 2
Python 6 | Indexing
Python 7 | Mutability
Python 8 | Importing text
Python 9 | Search in Corpus
Search Datasets
تجريف المواقع (Scraping)
(12)
Web Scraping 1
Web Scraping 2
Web Scraping 3
Web Scraping 4
تجريف تويتر (Twitter Scraping)
(8)
Scraping Twitter 1
Scraping Twitter 2
Scraping Twitter 3
Scraping Twitter 4
Scraping Twitter 5
Scraping Twitter 6
Scraping Twitter 7
Scraping Twitter 8
تصوير وعرض النصوص (Text Visualization)
(2)
Text Visualization 1
Text Visualization 2
تضخيم البيانات (Data Augmentation)
(6)
Data Augmentation 1 | Back Translation
Data Augmentation 1 | Back Translation 2
Data Augmentation 2 | Synonym Replacement
Data Augmentation 3 | Bi-gram Flipping
Data Augmentation 4 | TF-IDF Word Replacement
Data Augmentation 5 | Entity Replacement
تعلم الآلة (Machine Learning)
(5)
Algorithms
Classification
Semi-Supervised Machine Learning
Supervised Machine Learning
Unsupervised Machine Learning
جدول\إطار البيانات (Dataframes)
(2)
Dataframes 1
Dataframes 2
شرح إمتدادات الملفات (File Extensions)
(12)
.CSV 1
.CSV 2
.DOC/.DOCX
.HTML
.JSON 1
.JSON 2
.PDF
.RAR
.TXT
.XLS/.XLSX
.ZIP
File Extensions
كتاب (NLP With Python)
(3)
NLP With Python 1
NLP With Python 2
NLP With Python 3
لسانيات المدونة (Corpus Linguistics)
(7)
Common Collocations
Corpus Linguistics 1 | Lecture 1
Corpus Linguistics 2 | Lecture 2
Corpus Linguistics 3 | Corpus Cleaning
NLTK Text
Statistics & Corpus Linguistics 1
Statistics & Corpus Linguistics 2 | Context
لسانيات المدونة (Corpus Linguistics)
(5)
Parallel Corpus 3
Parallel Corpus 1
Parallel Corpus 2
Parallel Corpus 5
Parallel Corpus 6
معالجة اللغات الطبيعية (NLP)
(22)
Bag of Words
Cosine Similarity 1
Cosine Similarity 2
Datasets
General & Specific Features
Named Entity Recognition (NER) 2
Named Entity Recognition (NER) 3
Named Entity Recognition (NER) 1
NLP Introduction
PoS Tagging 1
PoS Tagging 2
PoS Tagging 3
Stemming vs Lemmatization
Stopwords 1
Stopwords 2
Text Forms
TF-IDF 1
TF-IDF 2
Tokenization 1
Tokenization 2
Tokenization 3
Wordvector