Created: Nov 9, 2021 07:18 PM
Topics: Bagging & Boosting
Bagging (Bootstrap Aggregating)
Reduce variance w/o increasing bias
$\mathrm{Var}\big(\tfrac{1}{B}\sum_{b=1}^{B} X_b\big) = \tfrac{\sigma^2}{B}$ for i.i.d. $X_b$: averaging reduces variance
E.g. cross-validation: gives a more stable error estimate by averaging over multiple folds
Bootstrap Sampling
A method of resampling (with replacement) from our sample to simulate sampling from the population
- Use the empirical distribution of our data as an estimate of the true, unknown data-generating distribution
 
- Sampling is with replacement: not every sample will be chosen, and some may be chosen more than once
 
- $P(\text{chosen}) = 1 - (1 - \tfrac{1}{n})^n \to 1 - e^{-1} \approx 0.632$ as $n \to \infty$ (see the sketch below)
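A minimal NumPy sketch (the sample size and number of bootstrap draws are made-up illustration values) that draws bootstrap samples and checks the ≈0.632 inclusion probability empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                      # original sample size (illustrative)
n_bootstraps = 200

chosen_fraction = []
for _ in range(n_bootstraps):
    # sample n indices with replacement -> one bootstrap sample
    idx = rng.integers(0, n, size=n)
    # fraction of distinct original points that appear at least once
    chosen_fraction.append(len(np.unique(idx)) / n)

# Empirically close to 1 - (1 - 1/n)^n  ->  1 - 1/e  ≈ 0.632
print(np.mean(chosen_fraction))
```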
 
Classification Tree Example
Average the predictions over a collection of bootstrap samples
Setup & Training
Number of trees = $B$
Bootstrap sample size = original sample size $n$
Train one tree on each of the $B$ bootstrap samples
Making Predictions
1. Plurality vote over the trees' predicted classes

2. Predict with the combined (averaged) class probabilities
 - More well behaved than option 1 (see the sketch below)
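A hedged sketch of bagging classification trees by hand with scikit-learn; the toy dataset and $B = 50$ are assumptions for illustration. It shows both prediction options:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
B, n = 50, len(X)

trees = []
for _ in range(B):
    idx = rng.integers(0, n, size=n)          # bootstrap sample of size n
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# 1. plurality vote over the B predicted labels (binary case: majority of 1s)
votes = np.stack([t.predict(X) for t in trees])           # shape (B, n)
vote_pred = (votes.mean(axis=0) > 0.5).astype(int)

# 2. average the predicted class probabilities, then take the argmax
proba = np.mean([t.predict_proba(X) for t in trees], axis=0)
proba_pred = proba.argmax(axis=1)

print((vote_pred == y).mean(), (proba_pred == y).mean())
```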
 
Out-of-Bag (OOB) Error Estimation
OOB samples: the remaining $\approx 1/3$ of the samples not contained in a given bootstrap sample
⇒ Can be treated as test data for that tree
- Each sample is OOB for roughly $B/3$ of the $B$ trees
 
- Predict the response for the $i$-th observation using only the trees for which it was OOB, then aggregate those predictions (see the sketch below)
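A rough sketch of OOB error estimation under the same manual-bagging setup as above; the trick is simply to record each tree's in-bag indices at training time (which also speaks to the open question noted below):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
B, n = 50, len(X)

trees, in_bag = [], []
for _ in range(B):
    idx = rng.integers(0, n, size=n)
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    in_bag.append(np.zeros(n, dtype=bool))    # record who was in this bootstrap sample
    in_bag[-1][idx] = True

# For each observation i, vote only over the trees for which i was out of bag
oob_votes = np.full(n, np.nan)
for i in range(n):
    preds = [t.predict(X[i:i + 1])[0] for t, bag in zip(trees, in_bag) if not bag[i]]
    if preds:                                  # i is OOB for roughly B/3 trees
        oob_votes[i] = round(np.mean(preds))

valid = ~np.isnan(oob_votes)
print(np.mean(oob_votes[valid] != y[valid]))   # OOB error estimate
```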
 
? When do we record whether a sample is OOB?
Random Forests
Decorrelates the individual bagged trees via a small tweak (random feature selection)
Random Feature Selection
Essentially drops a random subset of the input features at each split
At each node, select a random subset of $m$ of the $p$ predictors
Split on the best predictor within that subset
In practice: $m \approx \sqrt{p}$ for classification ($\approx p/3$ for regression); see the sketch below
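For reference, a short sketch using scikit-learn's RandomForestClassifier, where `max_features="sqrt"` corresponds to the $m \approx \sqrt{p}$ rule of thumb (the dataset and tree count are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=16, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,        # B trees, each grown on a bootstrap sample
    max_features="sqrt",     # random subset of m = sqrt(p) features per split
    oob_score=True,          # reuse the OOB idea from the previous section
    random_state=0,
).fit(X, y)

print(forest.oob_score_)     # OOB accuracy estimate
```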
Summary
Analysis
Random forests are generally better predictors than bagged trees, because decorrelating the trees further reduces the variance of their average
Boosting
A sequential (iterative) process:
Combine multiple weak classifiers to classify non-linearly-separable data
Weak learner: a classification model w/ accuracy only a little better than 50% (chance)
- If we used a strong learner instead, it would fit the training data perfectly and overfit
 
- Associate a weight $w_i$ with each training sample (initialized uniformly, $w_i = 1/n$):
 
- Loop until convergence:
 - Train the weak learner on a bootstrap sample of the weighted training set
 - Increase $w_i$ for misclassified samples; decrease it otherwise
 
Prediction
Do a weighted majority vote over the trained classifiers
Intuition
AdaBoost
- $\alpha_t$ = weight for model $t$
 
- $\epsilon_t$ = weighted classification error for model $t$
 
- $w_i$ = weight of sample $i$, computed from model $t$'s classification errors (see the sketch below)
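A compact AdaBoost-style sketch with decision stumps as weak learners; the update formulas are the standard textbook ones (exponential-loss AdaBoost), not copied from these notes, and the dataset and number of rounds are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
y = np.where(y == 1, 1, -1)          # use labels in {-1, +1}
n, T = len(X), 50

w = np.full(n, 1.0 / n)              # start with uniform sample weights
alphas, stumps = [], []
for t in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    eps = np.sum(w[pred != y])                        # weighted error of model t
    alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))   # model weight alpha_t
    # increase w_i for misclassified samples, decrease otherwise, then renormalize
    w *= np.exp(-alpha * y * pred)
    w /= w.sum()
    alphas.append(alpha)
    stumps.append(stump)

# Prediction: weighted "majority vote" = sign of the weighted sum of votes
def predict(Xq):
    return np.sign(sum(a * s.predict(Xq) for a, s in zip(alphas, stumps)))

print(np.mean(predict(X) == y))      # training accuracy
```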
 
Gradient Boosting
Fit a model to the residuals & add it to the original model
Intuition
Given a non-perfect model $F$
- Cannot delete anything from model F
 
- Can only add an additional model $h$ to $F$ to get $F + h$
 
$F(x_i) + h(x_i) = y_i \;\Rightarrow\; h(x_i) = y_i - F(x_i)$ by intuition
Use regression to find $h$
Procedure
Training set = $\{(x_i,\; y_i - F(x_i))\}$, the residuals of the current model
Train $h$ on this residual training set
New model = $F + h$
Repeat until satisfied (see the sketch below)
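A minimal gradient-boosting sketch with squared loss, following the procedure above; the learning rate, tree depth, and number of rounds are assumed illustration values:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
M, lr = 100, 0.1                     # number of rounds, learning rate (assumed)

F = np.full(len(y), y.mean())        # initial model: predict the mean
trees = []
for _ in range(M):
    residuals = y - F                # with squared loss, residuals = -gradient
    h = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    trees.append(h)
    F = F + lr * h.predict(X)        # new model = F + lr * h

def predict(Xq):
    return y.mean() + lr * sum(h.predict(Xq) for h in trees)

print(np.mean((predict(X) - y) ** 2))   # training MSE
```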
Theory: Relation to Gradients
Squared loss function: $L(y, F(x)) = \tfrac{1}{2}\,(y - F(x))^2$
Minimizing $L$ w.r.t. $F(x_i)$ gives $\partial L / \partial F(x_i) = F(x_i) - y_i$
⇒ Residuals = negative gradients: $y_i - F(x_i) = -\,\partial L / \partial F(x_i)$
Update Rule
Can use other (differentiable) loss functions; the general step is sketched below
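For completeness, the general functional-gradient step (the standard textbook form, stated here as a hedged reference rather than something spelled out in these notes):

```latex
% Pseudo-residuals: fit h_m to the negative gradient of the loss at the current model
r_{im} = -\left.\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right|_{F = F_{m-1}}
% Update rule: add the new model with a step size \rho_m
F_m(x) = F_{m-1}(x) + \rho_m\, h_m(x)
% For squared loss, r_{im} = y_i - F_{m-1}(x_i), recovering the residual-fitting view above.
```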
Gradient Boosted Decision Trees (if time permits)
