09 — Machine Learning & AI MOC

← Back to Software Engineering - Map of Content

Teaching machines to learn from data. Increasingly core to software engineering — not just for ML specialists, but for every engineer building modern products.


ML Fundamentals

Learning Types

  • Supervised Learning — Labeled data, learn input → output mapping (minimal sketch after this list)
    • Classification — Discrete outputs (spam/not spam, image labels)
    • Regression — Continuous outputs (price prediction, temperature)
  • Unsupervised Learning — No labels, discover structure in data
    • Clustering — K-Means, DBSCAN, hierarchical, Gaussian mixture models
    • Dimensionality Reduction — PCA, t-SNE, UMAP, autoencoders
    • Anomaly Detection — Isolation Forest, one-class SVM
  • Semi-Supervised Learning — Small labeled + large unlabeled dataset
  • Self-Supervised Learning — Generate labels from data itself (masked language modeling, contrastive learning)
  • Reinforcement Learning — Agent, environment, reward signal, policy optimization
    • Key Concepts — States, actions, rewards, Q-learning, policy gradient
    • RLHF — Reinforcement Learning from Human Feedback (used to align LLMs after pretraining)
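
A minimal sketch of the supervised workflow above, using scikit-learn with a synthetic dataset; the model choice is arbitrary and only meant to show the labeled-data, fit, evaluate shape of the loop:

```python
# Supervised learning in a nutshell: labeled pairs (X, y) in, a fitted
# input -> output mapping out. Dataset and model are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn the mapping
print("held-out accuracy:", clf.score(X_test, y_test))         # measure on unseen data
```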

Classical ML Algorithms

  • Linear Models — Linear regression, logistic regression, regularization (L1/Lasso, L2/Ridge)
  • Decision Trees — Splits, information gain, Gini impurity, pruning (impurity sketch after this list)
  • Ensemble Methods — Random Forest, Gradient Boosting (XGBoost, LightGBM, CatBoost)
  • Support Vector Machines — Kernel trick, margin maximization
  • K-Nearest Neighbors — Instance-based, distance metrics
  • Naive Bayes — Probabilistic, strong independence assumption, good for text
  • Bayesian Methods — Prior/posterior, Gaussian processes, probabilistic programming
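
A pure-Python sketch of how a decision tree scores candidate splits with Gini impurity; the labels are made up:

```python
# Gini impurity of a node, and the weighted impurity of a candidate split.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Weighted average impurity of the two children; lower is a better split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

labels = ["spam"] * 4 + ["ham"] * 6
print(gini(labels))                               # 0.48: a mixed node
print(split_impurity(["spam"] * 4, ["ham"] * 6))  # 0.0: a perfect split
```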

Model Evaluation

  • Train/Validation/Test Split — Holdout method, cross-validation (k-fold, stratified)
  • Classification Metrics — Accuracy, Precision, Recall, F1-Score, AUC-ROC, confusion matrix (worked example after this list)
  • Regression Metrics — MSE, RMSE, MAE, R², adjusted R²
  • Bias-Variance Tradeoff — Underfitting (high bias) vs overfitting (high variance)
  • Regularization — L1, L2, dropout, early stopping, data augmentation
  • Hyperparameter Tuning — Grid search, random search, Bayesian optimization (Optuna)
  • Feature Importance — SHAP values, permutation importance, feature selection
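
The headline classification metrics computed directly from confusion-matrix counts (illustrative numbers); note how accuracy can look flattering on imbalanced data while recall tells a different story:

```python
# Precision, recall, F1, and accuracy straight from confusion-matrix counts.
def classification_metrics(tp, fp, fn, tn):
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

p, r, f1, acc = classification_metrics(tp=80, fp=20, fn=10, tn=890)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f} accuracy={acc:.2f}")
# precision=0.80 recall=0.89 f1=0.84 accuracy=0.97
```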

Feature Engineering

  • Numerical Features — Scaling (standardization, normalization), binning, log transforms (sketch after this list)
  • Categorical Features — One-hot encoding, label encoding, target encoding, embeddings
  • Text Features — TF-IDF, bag of words, n-grams, word embeddings
  • Time Features — Lag features, rolling statistics, cyclical encoding
  • Feature Selection — Correlation analysis, mutual information, recursive feature elimination
  • Feature Stores — Centralized feature repository (Feast, Tecton, Hopsworks)
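
Two workhorse transforms from the list above as a NumPy sketch, standardization for numerical features and one-hot encoding for categorical ones; the values are made up:

```python
import numpy as np

# Numerical: standardization to zero mean, unit variance.
ages = np.array([22.0, 35.0, 58.0, 41.0])
standardized = (ages - ages.mean()) / ages.std()

# Categorical: one-hot encoding, one indicator column per category.
colors = ["red", "green", "red", "blue"]
categories = sorted(set(colors))  # ["blue", "green", "red"]
one_hot = np.array([[float(c == cat) for cat in categories] for c in colors])

print(standardized)
print(one_hot)  # each row contains exactly one 1
```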

Deep Learning

Neural Network Fundamentals

  • Perceptron — Weighted sum + activation, basic building block
  • Activation Functions — ReLU, Sigmoid, Tanh, GELU, Swish, Softmax
  • Backpropagation — Chain rule, gradient computation, weight updates (training-step sketch after this list)
  • Loss Functions — Cross-entropy, MSE, hinge loss, contrastive loss
  • Optimizers — SGD, Adam, AdamW, learning rate scheduling (cosine, warmup)
  • Batch Normalization — Normalize activations, stabilize training
  • Layer Normalization — Normalize across features (used in Transformers)
  • Dropout — Regularization by randomly dropping neurons during training
  • Residual Connections — Skip connections, enable very deep networks (ResNet)
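
One training step of a small PyTorch MLP, tying together several items above (ReLU, dropout, cross-entropy loss, backpropagation, Adam); shapes and layer sizes are arbitrary:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),  # weighted sums + nonlinearity
    nn.Dropout(p=0.1),             # randomly zero 10% of activations in training
    nn.Linear(64, 3),              # logits for 3 classes
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 20)         # dummy batch of 32 examples
y = torch.randint(0, 3, (32,))  # dummy integer class labels

loss = loss_fn(model(x), y)     # forward pass
loss.backward()                 # backprop: chain rule fills .grad on every weight
optimizer.step()                # weight update
optimizer.zero_grad()
```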

Convolutional Neural Networks (CNNs)

  • Convolution Layers — Filters/kernels, feature maps, stride, padding (minimal example after this list)
  • Pooling — Max pooling, average pooling, global average pooling
  • Architectures — LeNet, AlexNet, VGG, ResNet, EfficientNet, ConvNeXt
  • Applications — Image classification, object detection (YOLO, Faster R-CNN), segmentation
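
A minimal CNN forward pass in PyTorch showing convolution, pooling, and a classifier head; the layer sizes are illustrative, not a reference architecture:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 16 learned 3x3 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # halve spatial resolution
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                     # global average pooling
    nn.Flatten(),
    nn.Linear(32, 10),                           # 10-class logits
)

images = torch.randn(8, 3, 32, 32)  # batch of 8 RGB 32x32 images
print(cnn(images).shape)            # torch.Size([8, 10])
```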

Recurrent Neural Networks (RNNs)

  • Vanilla RNN — Sequential processing, vanishing gradient problem
  • LSTM (Long Short-Term Memory) — Gates (forget, input, output), cell state (usage sketch after this list)
  • GRU (Gated Recurrent Unit) — Simplified LSTM
  • Bidirectional RNNs — Process sequence in both directions
  • Applications — Time series, speech (largely superseded by Transformers)
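
LSTM usage in PyTorch (sizes are illustrative); the bidirectional flag runs the sequence both ways and doubles the output feature dimension:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True, bidirectional=True)
x = torch.randn(4, 50, 16)  # 4 sequences, 50 time steps, 16 features each
out, (h_n, c_n) = lstm(x)   # h_n: final hidden states, c_n: final cell states
print(out.shape)            # torch.Size([4, 50, 64]) -- 2 directions x 32 units
```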

Transformers

  • Self-Attention — Query, Key, Value matrices, attention scores (NumPy sketch after this list)
  • Multi-Head Attention — Parallel attention with different projections
  • Positional Encoding — Inject sequence order information
  • Encoder-Decoder Architecture — Original Transformer (Vaswani et al., 2017)
  • Encoder-Only — BERT, RoBERTa — bidirectional, good for classification/NER
  • Decoder-Only — GPT, LLaMA, Claude — autoregressive, text generation
  • Encoder-Decoder — T5, BART — sequence-to-sequence tasks
  • Scaling Laws — Model size, data size, compute — predictable performance scaling
  • Flash Attention — IO-aware attention, reduced memory, faster training
  • Mixture of Experts (MoE) — Sparse activation, conditional computation
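
Scaled dot-product attention, the computation at the heart of self-attention, as a NumPy sketch: softmax(QKᵀ/√d_k)V. In a real layer, Q, K, and V are learned linear projections of the same input:

```python
import numpy as np

def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # how much each query attends to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V               # attention-weighted sum of values

seq_len, d_k = 5, 8
Q, K, V = (np.random.randn(seq_len, d_k) for _ in range(3))
print(attention(Q, K, V).shape)      # (5, 8)
```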

Generative Models

  • GANs (Generative Adversarial Networks) — Generator vs discriminator, adversarial training
  • VAEs (Variational Autoencoders) — Latent space, reconstruction + KL divergence loss (loss sketch after this list)
  • Diffusion Models — Gradually denoise, state-of-the-art image generation (Stable Diffusion, DALL-E 2/3)
  • Autoregressive Models — Generate token by token (GPT, LLMs)
  • Flow Models — Invertible transformations, exact likelihood
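
A sketch of the VAE training objective in PyTorch: reconstruction loss plus the closed-form KL divergence between the approximate posterior N(mu, sigma²) and a standard normal prior; tensor shapes are made up:

```python
import torch
import torch.nn.functional as F

def vae_loss(x_recon, x, mu, logvar):
    recon = F.mse_loss(x_recon, x, reduction="sum")  # reconstruction term
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ):
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

# Dummy tensors with the shapes an encoder/decoder pair would produce.
x, x_recon = torch.randn(16, 784), torch.randn(16, 784)
mu, logvar = torch.randn(16, 32), torch.randn(16, 32)
print(vae_loss(x_recon, x, mu, logvar).item())
```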

Training at Scale

  • Data Parallelism — Replicate model, split data across GPUs
  • Model Parallelism — Split model across GPUs (tensor, pipeline parallelism)
  • Mixed Precision Training — FP16/BF16 + FP32, faster with minimal accuracy loss
  • Gradient Accumulation — Simulate larger batch sizes (sketch after this list)
  • Distributed Training — DeepSpeed, FSDP, Megatron-LM
  • Frameworks — PyTorch, JAX, TensorFlow, Keras
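
A gradient-accumulation sketch in PyTorch with dummy data; dividing each micro-batch loss by the accumulation count makes the summed gradients match one large-batch step:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
accum = 4  # effective batch size = 4 x micro-batch size

optimizer.zero_grad()
for step in range(16):
    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))  # one micro-batch
    loss = loss_fn(model(x), y) / accum  # scale so accumulated grads average out
    loss.backward()                      # gradients add up in .grad across steps
    if (step + 1) % accum == 0:
        optimizer.step()                 # one update per accum micro-batches
        optimizer.zero_grad()
```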

Natural Language Processing

Text Processing Pipeline

  • Tokenization — Word-level, subword (BPE, WordPiece, SentencePiece, Unigram)
  • Vocabulary — Token-to-ID mapping, special tokens ([CLS], [SEP], [PAD]) (toy sketch after this list)
  • Embeddings — Word2Vec, GloVe, FastText, contextual embeddings (BERT, GPT)
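
A toy whitespace tokenizer and vocabulary showing the token-to-ID mapping with special tokens; real pipelines learn subword vocabularies (BPE, WordPiece) rather than splitting on spaces:

```python
# Token-to-ID mapping with special tokens; unknown words fall back to [UNK].
specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]"]
corpus = "the cat sat on the mat".split()
vocab = {tok: i for i, tok in enumerate(specials + sorted(set(corpus)))}

def encode(text):
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in text.split()]
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]

print(encode("the dog sat"))  # "dog" is out-of-vocabulary -> [UNK] id
```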

Language Model Applications

  • Text Classification — Sentiment analysis, topic categorization, intent detection
  • Named Entity Recognition (NER) — Extract entities (people, places, organizations)
  • Question Answering — Extractive (span selection), generative (free-form answer)
  • Summarization — Extractive vs abstractive summarization
  • Machine Translation — Sequence-to-sequence, multilingual models
  • Text Generation — Creative writing, code generation, dialogue systems

Large Language Models (LLMs)

  • Foundation Models — GPT-4, Claude, LLaMA, Gemini, Mistral
  • Prompt Engineering — Zero-shot, few-shot, chain-of-thought, system prompts
  • In-Context Learning — Learning from examples in the prompt
  • Retrieval-Augmented Generation (RAG) — Retrieve relevant documents, inject into context (retrieval skeleton after this list)
    • Vector Databases — Pinecone, Weaviate, Chroma, Milvus, pgvector
    • Embedding Models — Sentence transformers, OpenAI embeddings
    • Chunking Strategies — Fixed-size, semantic, recursive splitting
    • Retrieval Methods — Dense retrieval, sparse (BM25), hybrid
  • Fine-Tuning — Full fine-tuning, LoRA, QLoRA, adapter methods
  • RLHF / Constitutional AI — Alignment techniques, reward models
  • Agent Frameworks — LangChain, LlamaIndex, AutoGPT, function calling, tool use
  • Evaluation — Perplexity, BLEU, ROUGE, human evaluation, benchmarks (MMLU, HumanEval)
  • Inference Optimization — Quantization (INT8, INT4), KV-cache, speculative decoding, vLLM
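
A skeleton of the RAG retrieval step: embed chunks, rank by cosine similarity, inject the top hits into the prompt. The embed function below is a hypothetical stand-in (hash-seeded random vectors), not a real embedding model, so the ranking is structural rather than meaningful:

```python
import numpy as np

def embed(text):
    # Hypothetical stand-in: swap in a real embedding model in practice.
    rng = np.random.default_rng(abs(hash(text)) % 2**32)
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)  # unit vector, so dot product = cosine sim

chunks = ["Postgres supports JSONB.",
          "Redis is an in-memory store.",
          "pgvector adds vector search to Postgres."]
index = np.stack([embed(c) for c in chunks])  # the "vector database"

query = "How do I do vector search in Postgres?"
scores = index @ embed(query)                 # cosine similarity per chunk
top = [chunks[i] for i in np.argsort(scores)[::-1][:2]]
prompt = "Context:\n" + "\n".join(top) + f"\n\nQuestion: {query}"
print(prompt)                                 # context injected for the LLM
```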

MLOps

Model Lifecycle

  • Experiment Tracking — MLflow, Weights & Biases, Neptune, CometML
  • Model Registry — Version models, stage transitions (dev → staging → production)
  • Reproducibility — Seed fixing, environment pinning, data versioning (DVC) (seed-fixing sketch below)
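
Typical seed-fixing boilerplate, as a sketch; full determinism can additionally require deterministic CUDA kernels and pinned library versions:

```python
import os
import random

import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Pin every common source of randomness."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)

set_seed(42)
```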

Model Serving

  • Batch Inference — Offline predictions on batches of data
  • Real-Time Inference — Online prediction APIs, low latency requirements
  • Serving Frameworks — TorchServe, TensorFlow Serving, Triton Inference Server, BentoML
  • Model Optimization — ONNX runtime, TensorRT, quantization, pruning, distillation
  • Edge Deployment — TensorFlow Lite, Core ML, ONNX Runtime Mobile

Data Management for ML

  • Data Versioning — DVC, LakeFS, Delta Lake
  • Data Labeling — Label Studio, Scale AI, Amazon SageMaker Ground Truth
  • Data Validation — Schema validation, distribution drift detection, Great Expectations
  • Synthetic Data — Generated training data, privacy preservation

Model Monitoring

  • Data Drift — Input distribution changes over time (detection sketch after this list)
  • Concept Drift — Relationship between input and output changes
  • Model Performance Degradation — Accuracy decay, latency increase
  • Monitoring Tools — Evidently AI, Arize, WhyLabs, NannyML
  • Retraining Triggers — Scheduled, performance-based, drift-based
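
A simple data-drift check using a two-sample Kolmogorov-Smirnov test from SciPy; the data here is simulated, and in practice you would run a check like this per feature over sliding windows:

```python
import numpy as np
from scipy.stats import ks_2samp

train_feature = np.random.normal(0.0, 1.0, size=5000)  # training-time reference
live_feature = np.random.normal(0.4, 1.0, size=5000)   # production window, shifted

stat, p_value = ks_2samp(train_feature, live_feature)
if p_value < 0.01:
    print(f"drift detected (KS={stat:.3f}, p={p_value:.1e}): consider retraining")
```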

ML Infrastructure

  • GPU Management — CUDA, GPU scheduling, multi-GPU training
  • ML Platforms — SageMaker, Vertex AI, Azure ML, Databricks
  • Pipeline Orchestration — Kubeflow, Airflow, Prefect, Metaflow
  • Cost Management — Spot instances, auto-scaling, right-sizing GPU instances

ML System Design

ML System Design Patterns

  • Batch Prediction — Pre-compute predictions offline, serve from cache/DB. Low latency at serving time but stale predictions.
  • Real-Time Prediction — Model serves predictions on demand. Fresh results but latency/cost constrained.
  • Online Learning — Model updates continuously from incoming data. Good for rapidly changing patterns (recommendations, fraud).
  • Embedding-Based Retrieval — Encode items as vectors, use approximate nearest neighbor (ANN) for retrieval. Powers recommendation and search systems.
  • Two-Tower Model — Separate encoders for query and item, dot product for scoring. Used in recommendation/search at scale (YouTube, Google). Sketch after this list.
  • Feature Pipeline Pattern — Separate feature computation from model serving. Feature store provides consistency between training and inference.
  • Human-in-the-Loop — Active learning, model suggests + human reviews, continuous improvement. Used in content moderation, medical diagnosis.
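
A two-tower sketch in PyTorch: independent encoders for query and item, scored by dot product; layer sizes are illustrative. In production the item embeddings are precomputed and served from an ANN index rather than scored exhaustively:

```python
import torch
import torch.nn as nn

def tower(in_dim, emb_dim=32):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

query_tower = tower(in_dim=20)  # encodes user/query features
item_tower = tower(in_dim=50)   # encodes item features

queries = torch.randn(4, 20)
items = torch.randn(100, 50)                         # candidate pool
scores = query_tower(queries) @ item_tower(items).T  # (4, 100) dot products
top_items = scores.topk(k=5, dim=-1).indices         # top-5 candidates per query
print(top_items.shape)                               # torch.Size([4, 5])
```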

ML System Design Interview Problems

  • Recommendation System — Candidate generation → ranking → re-ranking, collaborative filtering, content-based, hybrid
  • Search Ranking — Query understanding, retrieval, ranking (learning-to-rank), re-ranking, evaluation (NDCG, MRR)
  • Fraud Detection — Imbalanced classes, real-time scoring, feature engineering from transaction graphs, false positive cost
  • Ads Click Prediction — Massive scale, real-time bidding, calibration, explore/exploit
  • Content Moderation — Multi-modal (text + image + video), multi-label classification, latency constraints, human review workflow
  • Newsfeed Ranking — Multi-objective optimization (engagement + quality + diversity), freshness, personalization

Responsible AI

  • Bias & Fairness — Training data bias, demographic parity, equalized odds, fairness metrics (parity sketch after this list)
  • Explainability — SHAP, LIME, attention visualization, feature importance, model cards
  • Robustness — Adversarial examples, distribution shift, out-of-distribution detection
  • Privacy in ML — Federated learning (train on device, aggregate gradients), differential privacy, membership inference attacks
  • AI Safety — Alignment problem, reward hacking, specification gaming, constitutional AI, RLHF guardrails
  • Model Cards — Documentation of model purpose, performance across groups, limitations, intended use
  • EU AI Act — Risk-based classification (unacceptable, high, limited, minimal), transparency requirements
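
A sketch of one fairness metric, the demographic parity gap (difference in positive-prediction rates between groups); the decisions and group labels are made up:

```python
def demographic_parity_gap(preds, groups):
    rates = {}
    for g in sorted(set(groups)):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rates[g] = sum(members) / len(members)  # positive rate within group
    return max(rates.values()) - min(rates.values()), rates

preds = [1, 0, 1, 1, 0, 1, 0, 0]  # model decisions (1 = approve)
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap, rates = demographic_parity_gap(preds, groups)
print(rates, f"gap={gap:.2f}")    # {'a': 0.75, 'b': 0.25} gap=0.50
```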

Computer Vision (Beyond CNNs)

  • Object Detection — YOLO (real-time), Faster R-CNN (two-stage), DETR (Transformer-based), anchor-free methods
  • Semantic Segmentation — Per-pixel classification, U-Net, DeepLab, Mask R-CNN (instance segmentation)
  • Vision Transformers (ViT) — Apply Transformer architecture to image patches, competitive with CNNs at scale
  • Multimodal Models — CLIP (text + image), Flamingo, GPT-4V — align vision and language representations
  • Video Understanding — Temporal modeling, action recognition, video captioning
  • 3D Vision — NeRF (Neural Radiance Fields), depth estimation, point clouds, 3D reconstruction
  • Applications — Autonomous driving, medical imaging, satellite imagery, AR/VR, quality inspection

ml ai deep-learning nlp llm mlops system-design responsible-ai computer-vision