Created: Nov 2, 2021 06:10 PM
Topics: Random Forest
Decision Trees
Motivation and intuition behind trees
Composition
- Node: tests a single attribute
- Branch: one for each value of the attribute
- Leaf: assigns a class to the sample
Function Class
A decision tree can represent any function of the input attributes
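For instance, a depth-2 tree already captures XOR of two binary inputs (one path per input combination). A tiny illustrative sketch (the function name and data are made up, not from the notes):

```python
# A depth-2 "tree" written as nested ifs: root tests x1, children test x2.
def xor_tree(x1, x2):
    if x1 == 0:
        return 1 if x2 == 1 else 0
    else:
        return 0 if x2 == 1 else 1

# The tree reproduces XOR exactly on all four inputs.
assert [xor_tree(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```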
What Makes a Good Decision Tree
Principle of Occam’s Razor: the simpler the better
- Learning the simplest & smallest decision tree ⇒ NP-complete problem (exponential computational complexity, hard to solve)
The Greedy Learning Algorithm
- Start with an empty tree
- Select the best possible attribute to split on
- Generate child nodes: one for each value
- Partition samples according to their values & assign to appropriate child node
- Recurse: repeat for each child node; return if all samples at a node are in the same class
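A minimal sketch of this greedy recursion for categorical features; the helper names (`entropy`, `information_gain`, `build_tree`) and the toy data are illustrative assumptions, not from the notes:

```python
from collections import Counter
import math

def entropy(labels):
    """H(Y) = -sum_y p(y) log2 p(y) over the labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Decrease in entropy from splitting on attribute `attr`."""
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    cond = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - cond

def build_tree(rows, labels, attrs):
    # Return a leaf if all samples share a class (or no attributes remain).
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Select the best possible attribute to split on (highest information gain).
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {"attr": best, "children": {}}
    # One child per value; partition the samples and recurse on each child.
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best])
    return tree

# Toy usage: two categorical attributes, binary class.
rows = [{"outlook": "sunny", "wind": "weak"},
        {"outlook": "sunny", "wind": "strong"},
        {"outlook": "rain",  "wind": "weak"},
        {"outlook": "rain",  "wind": "strong"}]
labels = ["yes", "no", "yes", "no"]
print(build_tree(rows, labels, ["outlook", "wind"]))
```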
Choosing the Best Variable to Split
Pick the variable that gives more certainty about the classification
- Good split: samples of different classes go to different child nodes
- Bad split: an equal number of samples from each class falls into every child node
Theorizing Choosing the Best Variable
Entropy
The expected number of bits needed to encode a randomly drawn value of X under the most efficient code
High: uniform-like distribution, less predictable
Low: has peaks, more predictable
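In standard notation (writing p(x) for P(X = x) and using base-2 logs; the symbols are assumed):

$$H(X) = -\sum_{x} p(x)\,\log_2 p(x)$$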
Conditional Entropy
Entropy of the class label Y conditioned on knowing the value of a feature X
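With the same assumed notation:

$$H(Y \mid X) = \sum_{x} p(x)\, H(Y \mid X = x) = -\sum_{x} p(x) \sum_{y} p(y \mid x)\,\log_2 p(y \mid x)$$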
Information Gain
Decrease in entropy of the class label after splitting on feature X_i
Choose a variable that maximizes information gain
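With Y the class label and X_i a candidate feature (assumed notation), the gain and the selection rule are:

$$IG(X_i) = H(Y) - H(Y \mid X_i), \qquad i^{*} = \arg\max_i \, IG(X_i)$$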
Know When to Stop
- Stopping early biases the search toward a simpler tree
- If we stop too late: every sample ends up in its own leaf (overfitting)
Case When Information Gain Is Zero
No single split improves the information gain on its own (e.g., XOR-like data), so introduce an over-split hyperparameter
- Over-split: randomly split & explore a few more levels
- Then prune unused nodes
Choosing the Best Decision Tree
Decision trees will overfit!
- Standard (fully grown) D-Tree: bias = 0 (it can fit the training data exactly)
- High variance: a slight change in the data → a hugely different tree
Heuristics Toward Obtaining Simpler Trees
- Lower bounding # of samples per leaf node
- Upper bounding depth
- Ensemble Learning (Random Forests)
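A minimal scikit-learn sketch of the first two heuristics (assuming sklearn is available; the hyperparameter values are arbitrary illustrations, and the ensemble case is shown under Ensemble Learning below):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Upper-bound the depth and lower-bound the number of samples per leaf node.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```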
Continuous Variable Decision Tree
Decision Stumps
Determine a threshold t (a decision stump) to split on for a continuous variable X_j
- Branch 1: X_j < t
- Branch 2: X_j ≥ t
Can allow repeated splits on the same variable w/ different thresholds
Choosing t
❌ search through all possible values
✔️ Sort data
- Consider candidate splits between consecutive sorted values (e.g., at their midpoints)
- Only split at the t with the highest IG
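A minimal sketch of this threshold search: sort by the feature, try midpoints between consecutive distinct values, keep the t with the highest information gain. Function names and the toy data are assumptions:

```python
from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    pairs = sorted(zip(values, labels))          # sort data by the feature value
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    base, n = entropy(ys), len(ys)
    best_t, best_gain = None, -1.0
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue                             # no boundary between equal values
        t = (xs[i] + xs[i - 1]) / 2              # candidate split point (midpoint)
        left, right = ys[:i], ys[i:]
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy usage: one continuous feature, binary labels -> best split at t = 2.5.
print(best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```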
Formulas
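One standard way to write the objective for a stump split on a continuous feature X_j at threshold t (this notation is an assumption):

$$IG(X_j, t) = H(Y) - \big[\,P(X_j < t)\,H(Y \mid X_j < t) + P(X_j \ge t)\,H(Y \mid X_j \ge t)\,\big]$$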
Interpretation of Decision Trees
- May be readable, but the splits may not follow human logic
- Need to use heuristics to avoid overfitting (depth limit, leaf bin size threshold, etc.)
- Very high variance: a small change in data will result in a completely different tree
Ensemble Learning
Reduce variance in models by:
- Training multiple models on the same data set
- Averaging their predictions
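A minimal sketch of this averaging idea with scikit-learn (assuming sklearn is available): RandomForestClassifier trains many trees, each on a bootstrap resample of the data, and averages their predictions internally. The n_estimators value is an arbitrary illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Ensemble of 100 trees; predictions are averaged across the trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_tr, y_tr)
print(forest.score(X_te, y_te))   # accuracy of the averaged ensemble
```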