Machine & Deep Learning Compendium

Decision Trees

Using the Hellinger distance instead of Gini or entropy as the split criterion for supervised datasets; reportedly gives better results.
Visualizing decision trees and forests
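A minimal sketch of the Hellinger split criterion for a binary split, assuming binary labels and the formulation usually cited for Hellinger distance decision trees: for each branch, compare the square root of the fraction of all positives that land in it with the square root of the fraction of all negatives that land in it. The function name `hellinger_split` and the `positive` parameter are hypothetical.

```python
import math

def hellinger_split(left, right, positive=1):
    """Hellinger distance of a binary split (higher = better split).

    left, right: class labels of the samples sent to each branch.
    Only per-class fractions are used, so the class priors cancel out.
    """
    labels = left + right
    n_pos = sum(1 for t in labels if t == positive)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        return 0.0  # undefined with a single class; treat as an uninformative split
    total = 0.0
    for branch in (left, right):
        p = sum(1 for t in branch if t == positive)  # positives in this branch
        q = len(branch) - p                          # negatives in this branch
        total += (math.sqrt(p / n_pos) - math.sqrt(q / n_neg)) ** 2
    return math.sqrt(total)

# Perfect separation gives the maximum, sqrt(2); identical branches give 0.
perfect = hellinger_split([1, 1], [0, 0])
useless = hellinger_split([1, 0], [1, 0])

# Adding 10x more negatives in the same proportions leaves the score unchanged.
balanced = hellinger_split([1, 1, 1, 0], [1, 0, 0, 0])
skewed = hellinger_split([1, 1, 1] + [0] * 10, [1] + [0] * 30)
```

The last two calls illustrate why the criterion is said to suit imbalanced data: multiplying the majority class leaves the score untouched, whereas Gini and entropy would change.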

CART TREES

Explains how to measure similarity (impurity) within a node and how to pick the best split, based on SSE for regression and Gini for classification (good info about Gini here).
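A minimal sketch of the CART regression split search on a single feature: sort by the feature, try a threshold between every pair of consecutive distinct values, and keep the one minimizing SSE(left) + SSE(right). The names `sse`, `best_split_sse` and the `min_leaf` parameter are hypothetical; `min_leaf` mirrors the usual "minimum samples per leaf" stopping rule.

```python
def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split_sse(x, y, min_leaf=1):
    """Return (threshold, total_sse) of the best split of one feature x for targets y.

    Candidate splits leaving fewer than min_leaf samples on either side are skipped.
    Returns (None, inf) if no valid split exists.
    """
    pairs = sorted(zip(x, y))
    best = (None, float("inf"))
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot put a threshold between equal feature values
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, cost)  # midpoint threshold
    return best

x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
thr, cost = best_split_sse(x, y)  # splits between 3 and 10
```

For classification the same search applies, with the weighted Gini of the two children in place of the SSE.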

For classification, the Gini cost function is used; it indicates how “pure” the leaf nodes are (how mixed the classes of the training data assigned to each node are).

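$$\text{Gini} = \sum_{k} p_k (1 - p_k)$$
where $p_k$ is the proportion of the node's training samples that belong to class $k$.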
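A short sketch of the formula, plus the size-weighted Gini of a candidate split that CART minimizes. The function names are hypothetical. Note that $\sum_k p_k (1 - p_k) = 1 - \sum_k p_k^2$, the form many libraries document.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: sum_k p_k * (1 - p_k); 0 for a pure (or empty) node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def split_gini(left, right):
    """Gini of a split: children's impurities weighted by their sample counts."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

mixed = gini([0, 0, 1, 1])              # maximally mixed binary node
pure = gini([0, 0, 0])                  # pure node
good = split_gini([0, 0], [1, 1])       # split separating the classes
bad = split_gini([0, 1], [0, 1])        # split that separates nothing
```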

Early stopping - allowing as few as 1 sample per leaf overfits; a minimum of 5-10 samples per leaf works well.
Pruning - evaluate what happens if leaf nodes are removed; if performance drops significantly, the node is needed, otherwise it can be pruned.
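One common way to implement this pruning idea is reduced-error pruning. The sketch below assumes a held-out validation set and a hypothetical dict-based tree in which every node stores `pred`, its majority training label. Working bottom-up, a subtree is replaced by a leaf whenever that does not increase the number of validation errors. All names are hypothetical, and the tree is modified in place.

```python
def predict(node, x):
    """Follow thresholds from the root down to a leaf and return its prediction."""
    while not node["leaf"]:
        node = node["left"] if x[node["feat"]] <= node["thr"] else node["right"]
    return node["pred"]

def errors(node, X, y):
    """Number of misclassified samples."""
    return sum(predict(node, x) != t for x, t in zip(X, y))

def prune(node, X, y):
    """Reduced-error pruning on the validation samples (X, y) that reach `node`."""
    if node["leaf"]:
        return node
    go_left = [x[node["feat"]] <= node["thr"] for x in X]
    node["left"] = prune(node["left"],
                         [x for x, g in zip(X, go_left) if g],
                         [t for t, g in zip(y, go_left) if g])
    node["right"] = prune(node["right"],
                          [x for x, g in zip(X, go_left) if not g],
                          [t for t, g in zip(y, go_left) if not g])
    as_leaf = {"leaf": True, "pred": node["pred"]}
    # Keep the subtree only if removing it would make validation error strictly worse.
    return as_leaf if errors(as_leaf, X, y) <= errors(node, X, y) else node

# The right subtree splits on feature 1 and overfits; the root split is useful.
tree = {"leaf": False, "feat": 0, "thr": 0.5, "pred": 1,
        "left": {"leaf": True, "pred": 0},
        "right": {"leaf": False, "feat": 1, "thr": 0.5, "pred": 1,
                  "left": {"leaf": True, "pred": 1},
                  "right": {"leaf": True, "pred": 0}}}
X_val = [[0.2, 0.1], [0.8, 0.2], [0.9, 0.9], [0.7, 0.8]]
y_val = [0, 1, 1, 1]
pruned = prune(tree, X_val, y_val)
```

Here the right subtree collapses into a single leaf, while the root split survives because removing it would cost validation accuracy.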

KDTREE

RANDOM FOREST

Using an ensemble of trees to build a high-dimensional, sparse representation of the data (each sample is encoded by the leaves it lands in) and then classifying with a linear classifier.
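A minimal stdlib sketch of this idea. It assumes totally random trees (a random feature and a random threshold per node, similar in spirit to scikit-learn's `RandomTreesEmbedding`) and uses a hand-rolled SGD logistic regression as the linear classifier. Each sample becomes a sparse one-hot vector with exactly one active leaf per tree. All function names, the tree depth, and the toy data are hypothetical choices for illustration.

```python
import math
import random

def make_random_tree(n_features, depth, rng, lo=0.0, hi=1.0):
    """Complete binary tree as a list: internal node i has children 2i+1 and 2i+2.
    Each internal node gets a random feature and a random threshold in [lo, hi]."""
    return [(rng.randrange(n_features), rng.uniform(lo, hi)) for _ in range(2 ** depth - 1)]

def leaf_index(tree, x, depth):
    """Route x down the tree; return its leaf number in 0 .. 2**depth - 1."""
    i = 0
    for _ in range(depth):
        f, thr = tree[i]
        i = 2 * i + 1 if x[f] <= thr else 2 * i + 2
    return i - (2 ** depth - 1)

def embed(trees, x, depth):
    """Sparse one-hot code: the indices of the active features, one per tree."""
    n_leaves = 2 ** depth
    return [t * n_leaves + leaf_index(tree, x, depth) for t, tree in enumerate(trees)]

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(codes, y, dim, epochs=100, lr=0.1):
    """Plain SGD logistic regression on sparse binary features."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for active, t in zip(codes, y):
            g = sigmoid(b + sum(w[j] for j in active)) - t  # d(log-loss)/d(logit)
            b -= lr * g
            for j in active:
                w[j] -= lr * g
    return w, b

def predict(w, b, active):
    return int(b + sum(w[j] for j in active) > 0)

# Toy data: the label depends only on feature 0.
rng = random.Random(0)
X = [[a / 10, c / 10] for a in (1, 2, 3, 7, 8, 9) for c in (1, 5, 9)]
y = [int(x[0] > 0.5) for x in X]
depth, n_trees = 3, 50
trees = [make_random_tree(2, depth, rng) for _ in range(n_trees)]
codes = [embed(trees, x, depth) for x in X]
w, b = train_logreg(codes, y, dim=n_trees * 2 ** depth)
acc = sum(predict(w, b, c) == t for c, t in zip(codes, y)) / len(y)
```

The trees never see the labels here. All supervision happens in the linear model, which learns one weight per leaf.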

How to deal with imbalanced data in random forests -

One approach is based on cost-sensitive learning.
The other is based on a sampling technique.
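A sketch of both approaches, often called weighted and balanced random forests. The cost-sensitive side computes inverse-frequency class weights with the same formula as scikit-learn's `class_weight="balanced"`, `n_samples / (n_classes * n_c)`. The sampling side draws, for each tree, a bootstrap with the same number of samples from every class (the minority-class size) instead of a plain bootstrap. The function names are hypothetical.

```python
import random
from collections import Counter

def balanced_class_weights(y):
    """Cost-sensitive option: weight_c = n / (k * n_c), so rare classes cost more."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def balanced_bootstrap(y, rng):
    """Sampling option: indices for one tree, drawn with replacement,
    with as many samples from every class as the minority class has."""
    by_class = {}
    for i, c in enumerate(y):
        by_class.setdefault(c, []).append(i)
    m = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choice(idx) for _ in range(m))
    return sample

y = [0] * 8 + [1] * 2
weights = balanced_class_weights(y)              # minority class weighted 4x higher
idx = balanced_bootstrap(y, random.Random(0))    # training indices for one tree
```

In a full forest, the weights would enter the impurity computation and the leaf votes, and `balanced_bootstrap` would be called once per tree.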

EXTRA TREES

Difference between RF and ET
Differences #2
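The core difference at a single node can be sketched as follows. A random forest draws a random feature subset and then searches every observed value for the best threshold. Extra-Trees draws a random feature subset, picks one cut-point uniformly at random between the node's min and max for each feature, and keeps the best of those. Extra-Trees also usually grows each tree on the full training set instead of a bootstrap sample (scikit-learn's `ExtraTreesClassifier` defaults to `bootstrap=False`). The function names below are hypothetical, and weighted Gini is assumed as the score.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())

def split_score(X, y, f, thr):
    """Weighted Gini of splitting on feature f at thr; inf if one side is empty."""
    left = [t for x, t in zip(X, y) if x[f] <= thr]
    right = [t for x, t in zip(X, y) if x[f] > thr]
    if not left or not right:
        return float("inf")
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def rf_split(X, y, max_features, rng):
    """Random forest node: random feature subset, best threshold among observed values."""
    feats = rng.sample(range(len(X[0])), max_features)
    return min((split_score(X, y, f, x[f]), f, x[f]) for f in feats for x in X)

def et_split(X, y, max_features, rng):
    """Extra-Trees node: random feature subset, ONE uniform random cut-point
    per feature, then the best of those few candidates."""
    feats = rng.sample(range(len(X[0])), max_features)
    candidates = []
    for f in feats:
        lo, hi = min(x[f] for x in X), max(x[f] for x in X)
        thr = rng.uniform(lo, hi)
        candidates.append((split_score(X, y, f, thr), f, thr))
    return min(candidates)

rng = random.Random(42)
X = [[0.1, 5], [0.2, 3], [0.8, 4], [0.9, 6]]
y = [0, 0, 1, 1]
rf = rf_split(X, y, 2, rng)  # (score, feature, threshold)
et = et_split(X, y, 2, rng)
```

Because Extra-Trees does not optimize the threshold, its individual splits are never better than the exhaustive search on the same features. That extra randomness makes each node cheaper to compute and the trees less correlated, in exchange for slightly weaker individual trees.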

