Gini Index & Information Gain in Machine Learning

What is the Gini index? The Gini index (also called Gini impurity) is a measure of impurity in a set of data. It is calculated as one minus the sum of the squared probabilities of each class: Gini = 1 - Σ p_k², where p_k is the fraction of samples in class k. A lower Gini index indicates a purer set of data, and a value of 0 means every sample belongs to the same class.

What is information gain? Information gain is a measure of how much uncertainty is removed by splitting a set of data on a particular feature. It is calculated by subtracting the size-weighted average entropy of the child sets from the entropy of the original set of data. A higher information gain indicates that the feature is more effective at splitting the data.

What is impurity? Impurity is a measure of how mixed up the classes are in a set of data. A more impure set of data will have a higher Gini index.

How are Gini index and information gain related? Gini index and information gain are both used to evaluate candidate splits, but they are calculated differently. The Gini index measures impurity directly, whereas information gain measures the reduction in entropy, which is itself an impurity measure. Gini index is calculated as one minus the sum of the squared probabilities of each class, while information gain is calculated by comparing the entropy of the original ...
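As a rough illustration of the two calculations, here is a minimal Python sketch. It assumes the common textbook definitions: Gini = 1 - Σ p_k² and entropy = -Σ p_k log2(p_k), with information gain taken as the parent entropy minus the size-weighted entropy of the child sets. The function names (`gini`, `entropy`, `information_gain`) and the toy "yes"/"no" labels are made up for this example and do not come from any particular library.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical toy data: a perfectly mixed parent set and one candidate split.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(gini(parent))                             # impurity of the parent set
print(entropy(parent))                          # entropy of the parent set
print(information_gain(parent, [left, right]))  # gain from this split
```

For the evenly mixed parent set the Gini index is 0.5 and the entropy is 1 bit, the maximum for two classes. The split shown leaves each child mostly one class, so the information gain comes out to roughly 0.278 bits. A split that produced perfectly pure children would instead recover the full 1 bit.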