Optimization

Evaluating Grid Search Results

A common practice in machine learning is to train several models with different hyperparameters and compare their performance across hyperparameter sets. scikit-learn provides a tool for this: sklearn.model_selection.GridSearchCV, which trains the same model with different parameters. When doing grid search, it is tempting to just take the ‘best model’ and carry on, but analyzing the full results can reveal how performance changes with each hyperparameter, so it’s worth taking a look.

sklearn-evaluation includes a plotting function to evaluate grid search results. With it, we can see how the model performs when changing one (or two) hyperparameter(s) while keeping the rest constant.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn import datasets
from sklearn_evaluation import plot

Prepare data

First, let’s generate a synthetic classification dataset.

data = datasets.make_classification(
    n_samples=200, n_features=10, n_informative=4, class_sep=0.5
)

X = data[0]
y = data[1]
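
The plots in the next section read from clf.cv_results_, so a fitted grid search is needed first. The following is a minimal sketch and not necessarily the original setup: the hyperparameter grid, cv=3, and the random_state values are assumptions, chosen so that the values used later in subset (n_estimators of 10, 50 and 100, criterion "gini", max_features "sqrt") are present in the results.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Same synthetic data as above; random_state is an assumption added for reproducibility
X, y = datasets.make_classification(
    n_samples=200, n_features=10, n_informative=4, class_sep=0.5, random_state=0
)

# Assumed grid: 4 x 2 x 2 = 16 hyperparameter combinations
hyperparameters = {
    "n_estimators": [1, 10, 50, 100],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
}

est = RandomForestClassifier(random_state=0)
clf = GridSearchCV(est, hyperparameters, cv=3)
clf.fit(X, y)

# cv_results_ holds one entry per hyperparameter combination
print(len(clf.cv_results_["params"]))
print(clf.best_params_)
```

After fitting, clf.best_params_ gives the single winning combination, while clf.cv_results_ keeps the scores for every combination, which is what the plotting function consumes.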

Visualise results

To generate the plot, we need to pass the grid search results (the cv_results_ attribute of a fitted GridSearchCV) and the parameter(s) to change. Let’s see how the number of trees in the Random Forest affects the performance of the model. We can also restrict which results are plotted by using the subset parameter (note that the hyperparameter in change can also appear in subset).

plot.grid_search(
    clf.cv_results_,
    change="n_estimators",
    subset={"n_estimators": [10, 50, 100], "criterion": "gini"},
    kind="bar",
)
<Axes: title={'center': 'Grid search results'}, xlabel='n_estimators', ylabel='Mean score'>
[Figure: bar chart "Grid search results", mean score by n_estimators]
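
To make the role of subset more concrete, the selection can be reproduced by hand on cv_results_. This is an illustration of the filtering idea only, not sklearn-evaluation's implementation; the grid, cv=3, the random_state values, and the helper name matches are assumptions introduced for the example.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumed setup: synthetic data and a 4 x 2 x 2 grid
X, y = datasets.make_classification(
    n_samples=200, n_features=10, n_informative=4, class_sep=0.5, random_state=0
)
grid = {
    "n_estimators": [1, 10, 50, 100],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
}
clf = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3).fit(X, y)


def matches(params, subset):
    """Hypothetical helper: True if every hyperparameter named in subset
    takes one of the allowed values in this parameter combination."""
    for name, allowed in subset.items():
        allowed = allowed if isinstance(allowed, (list, tuple)) else [allowed]
        if params[name] not in allowed:
            return False
    return True


subset = {"n_estimators": [10, 50, 100], "criterion": "gini"}
results = clf.cv_results_

# Keep only the combinations allowed by subset, paired with their mean CV score
selected = [
    (params, score)
    for params, score in zip(results["params"], results["mean_test_score"])
    if matches(params, subset)
]

for params, score in selected:
    print(params["n_estimators"], params["max_features"], round(score, 3))
```

Because max_features is not constrained here, each value of n_estimators still appears once per max_features value, so the remaining rows fall into one group per unconstrained setting.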

To evaluate the effect of two hyperparameters, we pass both of them in change. For this to work, we need to subset the grid search results so that they match only one group. In this case we’ll plot n_estimators and criterion, so we need to subset max_features to a single value.

plot.grid_search(
    clf.cv_results_,
    change=("n_estimators", "criterion"),
    subset={"max_features": "sqrt"},
)
<Axes: >
[Figure: grid search results for n_estimators and criterion, with max_features fixed to "sqrt"]
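
The reason the remaining hyperparameters must be pinned to one value can be seen by arranging the scores in a two-dimensional table by hand. The sketch below is an illustration and not sklearn-evaluation's code; the grid, cv=3, and the random_state values are assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Assumed setup: synthetic data and a 4 x 2 x 2 grid
X, y = datasets.make_classification(
    n_samples=200, n_features=10, n_informative=4, class_sep=0.5, random_state=0
)
grid = {
    "n_estimators": [1, 10, 50, 100],
    "criterion": ["gini", "entropy"],
    "max_features": ["sqrt", "log2"],
}
clf = GridSearchCV(RandomForestClassifier(random_state=0), grid, cv=3).fit(X, y)

n_estimators = grid["n_estimators"]
criteria = grid["criterion"]

# One cell per (n_estimators, criterion) pair; NaN marks cells not yet filled
scores = np.full((len(n_estimators), len(criteria)), np.nan)

results = clf.cv_results_
for params, score in zip(results["params"], results["mean_test_score"]):
    # Fix max_features to a single value so each cell receives exactly one score
    if params["max_features"] != "sqrt":
        continue
    i = n_estimators.index(params["n_estimators"])
    j = criteria.index(params["criterion"])
    scores[i, j] = score

print(scores)
```

Without the max_features filter, every cell would receive two candidate scores (one for "sqrt" and one for "log2"), and there would be no single value to show for each pair.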