# Evaluation

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn_evaluation import plot, table
```


sklearn-evaluation has two main modules for evaluating classifiers: `sklearn_evaluation.plot` and `sklearn_evaluation.table`. Let’s see an example of how to use them.

## Train a model

First, let’s load some data and split it into training and test sets.

```python
# 200 samples with 10 features, 5 of them informative
X, y = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)
# shuffle and split training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
```
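The split above is random, so the numbers and plots in the rest of this page change from run to run. A minimal sketch of a reproducible, class-balanced variant: `random_state` fixes the data generation and the shuffle, and `stratify=y` keeps the class proportions the same in both sets. Both are standard scikit-learn parameters; the seed value 0 is an arbitrary choice of ours.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Same synthetic data as above, but seeded so reruns give identical results
X, y = datasets.make_classification(
    200, 10, n_informative=5, class_sep=0.65, random_state=0
)

# stratify=y keeps the class ratio of y in both the training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```

Running it prints `(140, 10) (60, 10)`: 30% of the 200 samples go to the test set.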


Now, let’s train one of the scikit-learn classifiers on the training data.

```python
# a small random forest with 5 trees
est = RandomForestClassifier(n_estimators=5)
est.fit(X_train, y_train)
```


## Input arguments

Most of the functions take the ground truth classes (`y_true`), the predicted classes for the test set (`y_pred`), and the scores assigned by the model (`y_score`, here the class probabilities returned by `predict_proba`). Let’s define these variables.

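```python
y_pred = est.predict(X_test)         # predicted class labels
y_score = est.predict_proba(X_test)  # class probabilities, one column per class
y_true = y_test                      # ground truth labels
```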
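For a binary problem, `predict_proba` returns one column per class (in the order of `est.classes_`), so `y_score` has shape `(n_samples, 2)` and each row sums to 1. The sketch below uses a small, made-up probability array in place of real model output to show how the hard predictions relate to the scores; for `RandomForestClassifier`, `predict` returns the class with the highest mean probability.

```python
import numpy as np

# Hypothetical predict_proba output for four test samples:
# column 0 = P(class 0), column 1 = P(class 1)
y_score = np.array([
    [0.8, 0.2],
    [0.4, 0.6],
    [0.0, 1.0],
    [0.6, 0.4],
])
classes = np.array([0, 1])  # stands in for est.classes_

# Each row is a probability distribution over the classes
row_sums = y_score.sum(axis=1)

# Hard predictions: the class with the highest probability in each row
y_pred = classes[np.argmax(y_score, axis=1)]

# Score of the positive class only, which many binary metrics expect
positive_score = y_score[:, 1]

print(y_pred)  # [0 1 1 0]
print(positive_score)
```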


## Confusion Matrix

We can now start evaluating our model. The following example shows how to plot a confusion matrix, which summarizes the performance of a classification algorithm by counting how often each actual class was assigned to each predicted class. For a two-class problem it contains four combinations of predicted and actual values:

- True Positive (TP): the model correctly classified a sample as positive.
- False Positive (FP): the model incorrectly classified a sample as positive (it is actually negative).
- True Negative (TN): the model correctly classified a sample as negative.
- False Negative (FN): the model incorrectly classified a sample as negative (it is actually positive).

```python
plot.ConfusionMatrix.from_raw_data(y_true, y_pred)
```

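To make the four counts concrete, the sketch below tallies them by hand for a small, made-up set of labels (class 1 is treated as positive) and checks them against `sklearn.metrics.confusion_matrix`, which puts actual classes on rows and predicted classes on columns. How sklearn-evaluation lays out its own plot is not assumed here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions (1 = positive, 0 = negative)
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# Count each combination of actual and predicted class
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # positives predicted as positive
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # negatives predicted as positive
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # negatives predicted as negative
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # positives predicted as negative

# For labels [0, 1] scikit-learn's matrix is [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[3 1]
#  [1 3]]
```

Flattening the matrix with `cm.ravel()` therefore yields the counts in the order TN, FP, FN, TP.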


## Feature Importances

Some classifiers (such as `sklearn.ensemble.RandomForestClassifier`) expose feature importances. We can plot them by passing the fitted estimator to the `feature_importances` function; `top_n=5` limits the plot to the five most important features.

```python
plot.feature_importances(est, top_n=5)
```



A feature importances function is also available in the table module.

```python
print(table.feature_importances(est))
```

```text
+----------------+--------------+-----------+
| feature_name   |   importance |      std_ |
+================+==============+===========+
| Feature 2      |    0.16338   | 0.139953  |
+----------------+--------------+-----------+
| Feature 3      |    0.160281  | 0.100635  |
+----------------+--------------+-----------+
| Feature 5      |    0.154491  | 0.0500466 |
+----------------+--------------+-----------+
| Feature 6      |    0.124345  | 0.0474126 |
+----------------+--------------+-----------+
| Feature 7      |    0.113711  | 0.0585814 |
+----------------+--------------+-----------+
| Feature 10     |    0.0938143 | 0.0947334 |
+----------------+--------------+-----------+
| Feature 9      |    0.0751288 | 0.0689658 |
+----------------+--------------+-----------+
| Feature 1      |    0.0518163 | 0.0565625 |
+----------------+--------------+-----------+
| Feature 4      |    0.041304  | 0.0220686 |
+----------------+--------------+-----------+
| Feature 8      |    0.0217287 | 0.0165079 |
+----------------+--------------+-----------+
```
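For a random forest, scikit-learn's `feature_importances_` is the mean of the individual trees' impurity-based importances, normalized to sum to 1. The sketch below recomputes a ranking like the one above straight from a fitted forest. It rests on assumptions we have not verified against the sklearn-evaluation source: that the `std_` column is the standard deviation of each feature's importance across the trees in `est.estimators_`, and that `top_n` keeps the highest-ranked features. For brevity it fits on the whole synthetic dataset with a fixed seed and prints 0-based column indices.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

X, y = datasets.make_classification(
    200, 10, n_informative=5, class_sep=0.65, random_state=0
)
est = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)

# Mean importance per feature (sums to 1 across features)
importances = est.feature_importances_

# Spread of each feature's importance across the individual trees
per_tree = np.array([tree.feature_importances_ for tree in est.estimators_])
std = per_tree.std(axis=0)

# Indices of the five most important features, highest first
top_n = 5
top = np.argsort(importances)[::-1][:top_n]

for i in top:
    print(f"feature {i}: importance={importances[i]:.3f}, std={std[i]:.3f}")
```

A large standard deviation relative to the mean, as for the first row of the table above, suggests the trees disagree about that feature, which is common with only five trees.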


## Classification Report

Precision is the fraction of samples predicted as positive that are actually positive, TP / (TP + FP). Recall is the fraction of actual positives that the model correctly identifies, TP / (TP + FN). The F1 score is the harmonic mean of precision and recall, 2PR / (P + R).

```python
plot.ClassificationReport.from_raw_data(y_true, y_pred)
```

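The three formulas can be checked by hand from confusion-matrix counts. This is a minimal sketch with made-up counts; the helper name `precision_recall_f1` is ours, and it assumes there is at least one predicted positive and one actual positive (otherwise a denominator is zero).

```python
def precision_recall_f1(tp, fp, fn):
    """Compute binary precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    # Harmonic mean: pulled toward the smaller of the two values
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 3 true positives, 1 false positive, 3 false negatives
p, r, f1 = precision_recall_f1(tp=3, fp=1, fn=3)
print(p, r, f1)  # 0.75 0.5 0.6
```

Here the arithmetic mean of precision and recall would be 0.625, while F1 is 0.6: the harmonic mean penalizes the gap between the two.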


Now, let’s see how to generate two of the most common plots for evaluating classifiers: Precision-Recall and ROC.

## Precision-Recall

A Precision-Recall curve summarizes the trade-off between precision (positive predictive value) and recall (true positive rate) as the classifier's probability threshold varies. It is often used when the classes are imbalanced.

```python
plot.PrecisionRecall.from_raw_data(y_true, y_score)
```

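Each point on the curve comes from thresholding the positive-class score: a sample is predicted positive when its score is at or above the threshold. The sketch below computes those points with NumPy for a tiny, made-up example; with the two-column `y_score` from `predict_proba`, the positive-class scores would be `y_score[:, 1]`. The function name is ours, and it assumes at least one positive label.

```python
import numpy as np


def precision_recall_at_thresholds(y_true, scores):
    """Return (threshold, precision, recall) for every distinct score."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    results = []
    for t in np.unique(scores)[::-1]:  # highest threshold first
        pred = scores >= t  # never empty, since t is one of the scores
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        results.append((float(t), float(tp / (tp + fp)), float(tp / (tp + fn))))
    return results


curve = precision_recall_at_thresholds([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
for threshold, precision, recall in curve:
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Lowering the threshold from 0.8 to 0.1 raises recall from 0.5 to 1.0, while precision moves non-monotonically (1.0, 0.5, 0.67, 0.5), which is why PR curves often look jagged.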


## ROC

An ROC (receiver operating characteristic) curve shows a classification model’s performance across all classification thresholds by plotting the true positive rate (recall) against the false positive rate. Lowering the classification threshold classifies more samples as positive, so the numbers of both true positives and false positives can only increase or stay the same.

```python
plot.ROC.from_raw_data(y_true, y_score)
```

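The ROC curve can be built the same way: sweep the threshold from high to low, record (false positive rate, true positive rate) at each step, and take the area under the resulting polyline with the trapezoidal rule. This is a minimal NumPy sketch on a made-up example; the function names are ours, and it assumes both classes are present in `y_true`.

```python
import numpy as np


def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays, sweeping the threshold from high to low."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing is positive
    for t in np.unique(scores)[::-1]:
        pred = scores >= t  # lowering t can only add predicted positives
        tpr.append(np.sum(pred & (y_true == 1)) / n_pos)
        fpr.append(np.sum(pred & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)


def auc_trapezoid(x, y):
    """Area under a piecewise-linear curve via the trapezoidal rule."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2))


fpr, tpr = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_trapezoid(fpr, tpr))  # 0.75
```

Both rates are non-decreasing as the threshold drops, so the curve runs from (0, 0) to (1, 1); an area of 0.5 corresponds to random ranking and 1.0 to a perfect one.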