
The Plot API supports both functional and object-oriented (OOP) interfaces. The functional API lets you quickly generate out-of-the-box plots and is the easiest to get started with, while the OOP API offers more flexibility: you can compare models with a simple syntax (e.g., plot1 + plot2) or customize the style and elements of the plot.
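The plot1 + plot2 syntax relies on Python operator overloading. The following is a minimal sketch of that idea, not the library's implementation: ToyReport is a hypothetical class that stores per-model metrics and returns a combined or differenced result from + and -.

```python
class ToyReport:
    """Hypothetical stand-in for a plot object; holds metrics instead of drawing."""

    def __init__(self, metrics, label):
        self.metrics = dict(metrics)
        self.label = label

    def __add__(self, other):
        # "Compare": keep both models' values side by side
        return {
            name: (self.metrics[name], other.metrics[name])
            for name in self.metrics
        }

    def __sub__(self, other):
        # "Diff": element-wise difference between the two models
        return {
            name: round(self.metrics[name] - other.metrics[name], 6)
            for name in self.metrics
        }


forest = ToyReport({"precision": 0.90, "recall": 0.80}, label="forest")
tree = ToyReport({"precision": 0.85, "recall": 0.70}, label="tree")

compared = forest + tree
diff = forest - tree
```

In the real API the result of + or - is another plot rather than a dictionary; the sketch only shows where the two operators hook in.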

Object Oriented API#


class sklearn_evaluation.plot.ConfusionMatrix(cm, *, target_names=None, normalize=False, cmap=None)#

Plot confusion matrix.


Plot and Compare Confusion Matrix for multiple classifiers:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(
    1000, 20, n_informative=10, class_sep=0.80, n_classes=3, random_state=0
)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier().fit(X_train, y_train)
y_pred = est.predict(X_test)

# plot for classifier 1 (random forest)
forest_cm = plot.ConfusionMatrix.from_raw_data(y_test, y_pred)

est = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = est.predict(X_test)

# plot for classifier 2 (decision tree)
tree_cm = plot.ConfusionMatrix.from_raw_data(y_test, y_pred)

# Compare
tree_cm + forest_cm

# Diff
forest_cm - tree_cm


Changed in version 0.9: Added cmap argument

classmethod from_dump(path)#

Instantiates a plot object from a path to a JSON file. A default implementation is provided, but you might override it.
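A JSON round trip of this kind can be sketched as follows. ToyMatrix, its dump method, and the JSON layout are hypothetical and chosen only for illustration; the file format the library actually reads may differ.

```python
import json
import tempfile
from pathlib import Path


class ToyMatrix:
    """Hypothetical plot-like object that can be saved to and loaded from JSON."""

    def __init__(self, cm, *, target_names=None):
        self.cm = cm
        self.target_names = target_names

    def dump(self, path):
        # Store the constructor arguments so the object can be rebuilt later
        payload = {"cm": self.cm, "target_names": self.target_names}
        Path(path).write_text(json.dumps(payload))

    @classmethod
    def from_dump(cls, path):
        # Read the stored arguments back and call the constructor
        payload = json.loads(Path(path).read_text())
        return cls(payload["cm"], target_names=payload["target_names"])


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "cm.json"
    ToyMatrix([[5, 1], [2, 7]], target_names=["neg", "pos"]).dump(path)
    restored = ToyMatrix.from_dump(path)
```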

classmethod from_raw_data(y_true, y_pred, target_names=None, normalize=False, cmap=None)#


Changed in version 0.9: Added cmap argument.


All plotting-related code must live in the plot method, which takes a single optional argument, ax=None. The method must assign the self.ax_ and self.figure_ attributes and return self.


class sklearn_evaluation.plot.InteractiveConfusionMatrix(cm, *, target_names=None, interactive_data=None)#

Plot interactive confusion matrix.


New in version 0.11.3.

classmethod from_dump(path)#

Instantiates a plot object from a path to a JSON file. A default implementation is provided, but you might override it.

classmethod from_raw_data(y_true, y_pred, X_test=None, feature_names=None, feature_subset=None, nsample=5, target_names=None, normalize=False)#

Plot confusion matrix.


  • y_true (array-like, shape = [n_samples]) – Correct target values (ground truth).

  • y_pred (array-like, shape = [n_samples]) – Target predicted classes (estimator predictions).

  • X_test (array-like, shape = [n_samples, n_features], optional) – Defaults to None. If X_test is passed, interactive data is displayed upon clicking on each quadrant of the confusion matrix.

  • feature_names (list of feature names, optional) – feature_names can be passed if X_test is a numpy array. If not passed, feature names are generated as [Feature 0, Feature 1, ..., Feature N].

  • feature_subset (list of features, optional) – Subset of features to display in the tables. If not passed, the first 5 columns are selected.

  • nsample (int, optional) – Defaults to 5. Number of sample observations to display in the interactive table if X_test is passed.

  • target_names (list) – List containing the names of the target classes. List must be in order e.g. ['Label for class 0', 'Label for class 1']. If None, generic labels will be generated e.g. ['Class 0', 'Class 1']

  • normalize (bool) – Normalize the confusion matrix




All plotting-related code must live in the plot method, which takes a single optional argument, ax=None. The method must assign the self.ax_ and self.figure_ attributes and return self.


class sklearn_evaluation.plot.PrecisionRecall(precision, recall, label=None)#

Plot precision recall curve.

  • precision (array-like) – Shape = [n_samples] when the task is binary classification, or shape = [n_classes, n_samples] when the task is multiclass classification.

  • recall (array-like) – Shape = [n_samples] when the task is binary classification, or shape = [n_classes, n_samples] when the task is multiclass classification.

  • label (string or list of strings, optional) – A string when the task is binary classification, or a list of strings when the task is multiclass classification. Used for labelling the curves. Defaults to precision recall. Make sure that the order of the labels corresponds to the order in which the recall/precision arrays are passed to the constructor.

  • ax (matplotlib Axes) – Axes object to draw the plot onto, otherwise uses current Axes


Plot a Precision-Recall Curve:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(
    n_samples=2000, n_features=6, n_informative=4, class_sep=0.1
)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

est = RandomForestClassifier().fit(X_train, y_train)

# y_pred = est.predict(X_test)
y_score = est.predict_proba(X_test)
y_true = y_test

# plot precision recall curve
pr = plot.PrecisionRecall.from_raw_data(y_true, y_score)

Compare Precision-Recall Curves of two classifiers:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(
    n_samples=200, n_features=10, n_informative=5, class_sep=0.65
)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

est = RandomForestClassifier().fit(X_train, y_train)

y_pred = est.predict(X_test)
y_score = est.predict_proba(X_test)
y_true = y_test

# precision recall plot for random forest
forest_pr = plot.PrecisionRecall.from_raw_data(
    y_true, y_score, label="Random forest classifier"
)

est = DecisionTreeClassifier().fit(X_train, y_train)
y_pred = est.predict(X_test)
y_score = est.predict_proba(X_test)
y_true = y_test

# precision recall plot for decision tree
tree_pr = plot.PrecisionRecall.from_raw_data(
    y_true, y_score, label="Decision Tree classifier"
)

# compare two precision recall curves
forest_pr + tree_pr


New in version 0.10.1.

classmethod from_raw_data(y_true, y_score, *, label=None)#

Plot precision-recall curve from raw data.

  • y_true (array-like, shape = [n_samples]) – Correct target values (ground truth).

  • y_score (array-like, shape = [n_samples] or [n_samples, 2] for binary classification, or [n_samples, n_classes] for multiclass) – Target scores (estimator predictions).

  • label (string or list, optional) – labels for the curves


It is assumed that the y_score parameter columns are in order. For example, if y_true = [2, 2, 1, 0, 0, 1, 2], then the first column in y_score must contain the scores for class 0, second column for class 1 and so on.
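To make the column convention concrete, here is a small sketch that maps each class label to its expected column in y_score, assuming columns follow the sorted order of the labels (which is how scikit-learn orders an estimator's classes_ attribute). The helper name column_for_class and the score values are made up for this example.

```python
def column_for_class(y_true):
    """Return {class label: column index}, assuming sorted-label column order."""
    return {label: idx for idx, label in enumerate(sorted(set(y_true)))}


y_true = [2, 2, 1, 0, 0, 1, 2]
columns = column_for_class(y_true)

# One row of scores per sample; column j holds the score for class j
y_score = [
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
    [0.2, 0.7, 0.1],
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.1, 0.8],
]

# Score assigned to the true class of each sample
true_class_scores = [row[columns[label]] for row, label in zip(y_score, y_true)]
```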


All plotting-related code must live in the plot method, which takes a single optional argument, ax=None. The method must assign the self.ax_ and self.figure_ attributes and return self.


class sklearn_evaluation.plot.ROC(fpr, tpr, label=None)#

Plot ROC curve

  • fpr (ndarray of shape (>2,), list of lists or list of numbers) – Increasing false positive rates such that element i is the false positive rate of predictions with score >= thresholds[i].

  • tpr (ndarray of shape (>2,), list of lists or list of numbers) – Increasing true positive rates such that element i is the true positive rate of predictions with score >= thresholds[i].

  • label (list of str, default: None) – Set curve labels

  • ax (matplotlib Axes, default: None) – Axes object to draw the plot onto, otherwise uses current Axes

See also: roc()
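As an illustration of what fpr and tpr contain, the sketch below computes both rates by hand for a binary problem, counting a sample as predicted positive when its score is >= the threshold. The function roc_points and the choice of thresholds (each distinct score, in decreasing order) are assumptions for this example; in practice these arrays usually come from scikit-learn's roc_curve.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr, thresholds) for binary labels in {0, 1}."""
    thresholds = sorted(set(y_score), reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [], []
    for t in thresholds:
        predicted_pos = [s >= t for s in y_score]
        tp = sum(1 for p, y in zip(predicted_pos, y_true) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted_pos, y_true) if p and y == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr, thresholds


y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr, thresholds = roc_points(y_true, y_score)
```

Because the thresholds decrease, both rates can only grow from one element to the next, which matches the "increasing" requirement stated for the constructor arguments.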


Plot a ROC Curve for binary classification:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
data = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)
X = data[0]
y = data[1]

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier().fit(X_train, y_train)

y_score = est.predict_proba(X_test)
y_true = y_test

# plot the roc curve
roc = plot.ROC.from_raw_data(y_true, y_score)

Compare ROC Curves of two binary classifiers:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier().fit(X_train, y_train)

y_score = est.predict_proba(X_test)
y_true = y_test

roc = plot.ROC.from_raw_data(y_true, y_score)

# create another dataset
X_, y_ = datasets.make_classification(200, 10, n_informative=5, class_sep=0.15)

# split data into train and test
X_train_, X_test_, y_train_, y_test_ = train_test_split(X_, y_, test_size=0.3)

est_ = RandomForestClassifier().fit(X_train_, y_train_)

y_score_ = est_.predict_proba(X_test_)
y_true_ = y_test_

roc2 = plot.ROC.from_raw_data(y_true_, y_score_)

# Compare both classifiers
roc + roc2

Plot a ROC Curve for multi-class classification:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# load data
iris = load_iris()
X, y = iris.data, iris.target
y = iris.target_names[y]

random_state = np.random.RandomState(0)
n_samples, n_features = X.shape

# add noisy features to make the problem harder
X = np.concatenate([X, random_state.randn(n_samples, 200 * n_features)], axis=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

classifier = LogisticRegression()
y_score = classifier.fit(X_train, y_train).predict_proba(X_test)

# plot roc curve
plot.ROC.from_raw_data(y_test, y_score)


New in version 0.8.4.

classmethod from_dump(path)#

Instantiates a plot object from a path to a JSON file. A default implementation is provided, but you might override it.

classmethod from_raw_data(y_true, y_score, ax=None)#

Takes raw unaggregated data (for an example of aggregated vs. unaggregated data, see the constructor docstring), computes statistics, and initializes the object. This is the method that users typically use (e.g., they pass y_true and y_pred here; we aggregate and call the constructor).

Apart from the input data, this method must have the same arguments as the constructor.

All arguments beyond the input data must be keyword-only (add a * argument between the input and the rest of the arguments).
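The contract described above can be sketched with a toy class. ToyAccuracy is hypothetical and exists only to show the shape of the pattern: the constructor receives aggregated data, from_raw_data receives raw inputs and aggregates them, and everything after the input data is keyword-only.

```python
class ToyAccuracy:
    """Hypothetical plot class following the constructor/from_raw_data contract."""

    def __init__(self, accuracy, *, label=None):
        # The constructor takes already-aggregated data
        self.accuracy = accuracy
        self.label = label

    @classmethod
    def from_raw_data(cls, y_true, y_pred, *, label=None):
        # Raw inputs come first; the bare * makes the remaining arguments keyword-only
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
        return cls(hits / len(y_true), label=label)


report = ToyAccuracy.from_raw_data([1, 0, 1, 1], [1, 0, 0, 1], label="model A")

# Passing label positionally is rejected because it is keyword-only
try:
    ToyAccuracy.from_raw_data([1, 0], [1, 0], "model B")
    positional_label_allowed = True
except TypeError:
    positional_label_allowed = False
```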


All plotting-related code must live in the plot method, which takes a single optional argument, ax=None. The method must assign the self.ax_ and self.figure_ attributes and return self.


class sklearn_evaluation.plot.ClassificationReport(matrix, keys, *, target_names=None)#


Plot a Classification Report:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

y_pred_rf = RandomForestClassifier().fit(X_train, y_train).predict(X_test)
y_pred_lr = LogisticRegression().fit(X_train, y_train).predict(X_test)

target_names = ["Not spam", "Spam"]

# report for random forest
cr_rf = plot.ClassificationReport.from_raw_data(
    y_test, y_pred_rf, target_names=target_names
)

# report for logistic regression
cr_lr = plot.ClassificationReport.from_raw_data(
    y_test, y_pred_lr, target_names=target_names
)

# how much better is the random forest?
cr_rf - cr_lr

# compare both reports
cr_rf + cr_lr

classmethod from_dump(path)#

Instantiates a plot object from a path to a JSON file. A default implementation is provided, but you might override it.

classmethod from_raw_data(y_true, y_pred, *, target_names=None, sample_weight=None, zero_division=0)#

Takes raw unaggregated data (for an example of aggregated vs. unaggregated data, see the constructor docstring), computes statistics, and initializes the object. This is the method that users typically use (e.g., they pass y_true and y_pred here; we aggregate and call the constructor).

Apart from the input data, this method must have the same arguments as the constructor.

All arguments beyond the input data must be keyword-only (add a * argument between the input and the rest of the arguments).


All plotting-related code must live in the plot method, which takes a single optional argument, ax=None. The method must assign the self.ax_ and self.figure_ attributes and return self.


class sklearn_evaluation.plot.CalibrationCurve(mean_predicted_value, fraction_of_positives, label=None, cmap=None)#

  • mean_predicted_value (ndarray of shape (n_bins,) or smaller) – The mean predicted probability in each bin.

  • fraction_of_positives (ndarray of shape (n_bins,) or smaller) – The proportion of samples whose class is the positive class, in each bin.

  • label (list of str, optional) – A list of strings, where each string refers to the name of the classifier that produced the corresponding probability estimates in probabilities. If None, the names “Classifier 1”, “Classifier 2”, etc. will be used.

  • cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the projection. View Matplotlib Colormap documentation for available options.


from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

from sklearn_evaluation import plot

X, y = make_classification(
    n_samples=20000, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

rf = RandomForestClassifier()
lr = LogisticRegression()
nb = GaussianNB()

rf_probas = rf.fit(X_train, y_train).predict_proba(X_test)
lr_probas = lr.fit(X_train, y_train).predict_proba(X_test)
nb_probas = nb.fit(X_train, y_train).predict_proba(X_test)

probabilities = [rf_probas, lr_probas, nb_probas]

clf_names = [
    "Random Forest",
    "Logistic Regression",
    "Gaussian Naive Bayes",
]

plot.CalibrationCurve.from_raw_data(y_test, probabilities, label=clf_names)

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from sklearn_evaluation import plot

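def make_dataset(n_samples):
    # settings mirror the other calibration examples on this page
    X, y = make_classification(
        n_samples=n_samples,
        n_features=2,
        n_informative=2,
        n_redundant=0,
        random_state=0,
    )
    return train_test_split(X, y, test_size=0.33, random_state=0)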

# sample size 1k
X_train, X_test, y_train, y_test1 = make_dataset(n_samples=1000)
probs1 = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)

# sample size 10k
X_train, X_test, y_train, y_test2 = make_dataset(n_samples=10000)
probs2 = LogisticRegression().fit(X_train, y_train).predict_proba(X_test)

# if you want to plot probability curves for different sample sizes, pass
# a list with the true labels for each element in the probabilities
# argument
plot.CalibrationCurve.from_raw_data(
    [y_test1, y_test2], [probs1, probs2], label=["1k samples", "10k samples"]
)

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

from sklearn_evaluation import plot

X, y = make_classification(
    n_samples=20000, n_features=2, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

rf = RandomForestClassifier()
lr = LogisticRegression()
nb = GaussianNB()

rf_probas = rf.fit(X_train, y_train).predict_proba(X_test)
lr_probas = lr.fit(X_train, y_train).predict_proba(X_test)
nb_probas = nb.fit(X_train, y_train).predict_proba(X_test)

probabilities = [rf_probas, lr_probas, nb_probas]

clf_names = [
    "Random Forest",
    "Logistic Regression",
    "Gaussian Naive Bayes",
]

cc1 = plot.CalibrationCurve.from_raw_data(y_test, [rf_probas], label=["Random Forest"])
cc2 = plot.CalibrationCurve.from_raw_data(
    y_test,
    [lr_probas, nb_probas],
    label=["Logistic Regression", "Gaussian Naive Bayes"],
)
cc1 + cc2


New in version 0.11.1.

classmethod from_raw_data(y_true, probabilities, *, label=None, n_bins=10, cmap=None)#

Plots calibration curves for a set of classifier probability estimates. Calibration curves help determine whether you can interpret predicted probabilities as a confidence level. For example, if we take a well-calibrated classifier and select the instances where the score is 0.8, 80% of those instances should be from the positive class. This function only works for binary classifiers.

  • y_true (array-like, shape = [n_samples], or list of array-like) – Ground truth (correct) target values. If passed a single array-like object, it assumes all the probabilities have the same shape as y_true. If passed a list, it expects y_true[i] to have the same size as probabilities[i].

  • probabilities (list of array-like, shape (n_samples, 2) or (n_samples,)) – A list containing the outputs of binary classifiers’ predict_proba() method or decision_function() method.

  • label (list of str, optional) – A list of strings, where each string refers to the name of the classifier that produced the corresponding probability estimates in probabilities. If None, the names “Classifier 1”, “Classifier 2”, etc. will be used.

  • n_bins (int, optional, default=10) – Number of bins. A bigger number requires more data.

  • cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the projection. View Matplotlib Colormap documentation for available options.
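To show what the binned quantities mean, the sketch below computes the mean predicted value and the fraction of positives per bin in plain Python. It assumes equal-width bins on [0, 1] and drops empty bins, which is one way the result can end up with fewer than n_bins entries; calibration_points is a made-up helper, and the library's exact binning may differ.

```python
def calibration_points(y_true, prob_pos, n_bins=10):
    """Return (mean_predicted_value, fraction_of_positives) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for label, p in zip(y_true, prob_pos):
        # Clamp the index so that p == 1.0 falls in the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((label, p))

    mean_predicted_value, fraction_of_positives = [], []
    for members in bins:
        if not members:
            continue  # empty bins are skipped
        mean_predicted_value.append(sum(p for _, p in members) / len(members))
        fraction_of_positives.append(sum(lab for lab, _ in members) / len(members))
    return mean_predicted_value, fraction_of_positives


y_true = [0, 0, 1, 1, 1, 0]
prob_pos = [0.125, 0.125, 0.625, 0.875, 0.875, 0.625]
mean_pred, frac_pos = calibration_points(y_true, prob_pos, n_bins=4)
```

For a well-calibrated classifier the two lists track each other closely, so the plotted points sit near the diagonal.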


Create the plot.

  • ax (matplotlib.Axes) – An Axes object to add the plot to.


class sklearn_evaluation.plot.Rank1D(algorithm='shapiro', features=None, figsize=(7, 7), orient='h', color=None, ax=None)#

Rank1D computes a score for each feature in the data set with a specific metric or algorithm (e.g. Shapiro-Wilk) then returns the features ranked as a bar plot.

  • algorithm (one of {'shapiro'}, default: 'shapiro') – The ranking algorithm to use. The default is 'shapiro' (Shapiro-Wilk).

  • features (list) – A list of feature names to use. If a DataFrame is passed and features is None, feature names are selected as the columns of the DataFrame.

  • figsize (tuple, optional) – (width, height) for specifying the size of the plot.

  • orient ('h' or 'v', default='h') – Specifies a horizontal or vertical bar chart.

  • color (string) – Specify color for barchart

  • ax (matplotlib Axes, default: None) – The axis to plot the figure on. If None is passed in the current axes will be used (or generated if required).


An array of rank scores with shape (n,), where n is the number of features.




Visualize the Feature Rankings:

import matplotlib.pyplot as plt
from sklearn_evaluation.plot import Rank1D

from sklearn.datasets import load_breast_cancer as load_data

# load some data
X, y = load_data(return_X_y=True)

features = [
    "mean radius",
    "mean texture",
    "mean perimeter",
    "mean area",
    "mean smoothness",
    "mean compactness",
    "mean concavity",
    "mean concave points",
    "mean symmetry",
    "mean fractal dimension",
    "radius error",
    "texture error",
    "perimeter error",
    "area error",
    "smoothness error",
    "compactness error",
    "concavity error",
    "concave points error",
    "symmetry error",
    "fractal dimension error",
    "worst radius",
    "worst texture",
    "worst perimeter",
    "worst area",
    "worst smoothness",
    "worst compactness",
    "worst concavity",
    "worst concave points",
    "worst symmetry",
    "worst fractal dimension",
]

# plot feature rankings
rank1d = Rank1D(features=features, figsize=(14, 7))


New in version 0.8.4.


X (array-like, shape (n_samples, n_features)) – Feature dataset to be ranked.


ax – Axes containing the plot

Return type

matplotlib Axes


This method is useful if the user wants to use a custom algorithm for feature ranking.


ranks (ndarray) – An n-dimensional, symmetric array of rank scores, where n is the number of features. E.g. for 1D ranking, it is (n,), for a 2D ranking it is (n,n).


ax – Axes containing the plot

Return type

matplotlib Axes


class sklearn_evaluation.plot.Rank2D(algorithm='pearson', features=None, colormap='RdBu_r', figsize=(7, 7), ax=None)#

Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm (e.g. Pearson correlation) then returns them ranked as a lower left triangle diagram.

  • algorithm (str, default: 'pearson') – The ranking algorithm to use, one of: ‘pearson’, ‘covariance’, ‘spearman’, or ‘kendalltau’.

  • features (list) – A list of feature names to use. If a DataFrame is passed and features is None, feature names are selected as the columns of the DataFrame.

  • colormap (string or cmap, default: 'RdBu_r') – Optional string or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per class basis or colormap to color them on a continuous scale.

  • figsize (tuple, optional) – (width, height) for specifying the size of the plot

  • ax (matplotlib Axes, default: None) – The axis to plot the figure on. If None is passed in the current axes will be used (or generated if required).


An array of rank scores with shape (n,n), where n is the number of features.
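As a rough picture of the pairwise ranking, the following sketch computes Pearson correlations for every pair of features and keeps only the lower-left triangle. It is written in plain Python for illustration; pearson and lower_triangle_ranks are hypothetical helpers, not part of sklearn_evaluation, and leaving the diagonal empty is an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lower_triangle_ranks(X):
    """Return an n x n list with correlations below the diagonal, None elsewhere."""
    columns = list(zip(*X))  # one tuple of values per feature
    n = len(columns)
    return [
        [pearson(columns[i], columns[j]) if j < i else None for j in range(n)]
        for i in range(n)
    ]


# Three samples, three features: feature 1 doubles feature 0, feature 2 reverses it
X = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 2.0],
    [3.0, 6.0, 1.0],
]
ranks = lower_triangle_ranks(X)
```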




Visualize the Feature Rankings by Pairwise Comparison:

import matplotlib.pyplot as plt
from sklearn_evaluation.plot import Rank2D

from sklearn.datasets import load_breast_cancer as load_data

# load some data
X, y = load_data(return_X_y=True)

features = [
    "mean radius",
    "mean texture",
    "mean perimeter",
    "mean area",
    "mean smoothness",
    "mean compactness",
    "mean concavity",
    "mean concave points",
    "mean symmetry",
    "mean fractal dimension",
    "radius error",
    "texture error",
    "perimeter error",
    "area error",
    "smoothness error",
    "compactness error",
    "concavity error",
    "concave points error",
    "symmetry error",
    "fractal dimension error",
    "worst radius",
    "worst texture",
    "worst perimeter",
    "worst area",
    "worst smoothness",
    "worst compactness",
    "worst concavity",
    "worst concave points",
    "worst symmetry",
    "worst fractal dimension",
]

# plot feature rankings
rank2d = Rank2D(features=features, figsize=(14, 14))


New in version 0.8.4.


X (array-like, shape (n_samples, n_features)) – Feature dataset to be ranked.


ax – Axes containing the plot

Return type

matplotlib Axes


This method is useful if the user wants to use a custom algorithm for feature ranking.


ranks (ndarray) – An n-dimensional, symmetric array of rank scores, where n is the number of features. E.g. for 1D ranking, it is (n,), for a 2D ranking it is (n,n).


ax – Axes containing the plot

Return type

matplotlib Axes

Functional API#


sklearn_evaluation.plot.calibration_curve(y_true, probabilities, clf_names=None, n_bins=10, cmap='nipy_spectral', ax=None)#

Plots calibration curves for a set of classifier probability estimates. Calibration curves help determine whether you can interpret predicted probabilities as a confidence level. For example, if we take a well-calibrated classifier and select the instances where the score is 0.8, 80% of those instances should be from the positive class. This function only works for binary classifiers.

  • y_true (array-like, shape = [n_samples], or list of array-like) – Ground truth (correct) target values. If passed a single array-like object, it assumes all the probabilities have the same shape as y_true. If passed a list, it expects y_true[i] to have the same size as probabilities[i].

  • probabilities (list of array-like, shape (n_samples, 2) or (n_samples,)) – A list containing the outputs of binary classifiers’ predict_proba() method or decision_function() method.

  • clf_names (list of str, optional) – A list of strings, where each string refers to the name of the classifier that produced the corresponding probability estimates in probabilities. If None, the names “Classifier 1”, “Classifier 2”, etc. will be used.

  • n_bins (int, optional, default=10) – Number of bins. A bigger number requires more data.

  • cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the projection. View Matplotlib Colormap documentation for available options.

  • ax (matplotlib Axes) – Axes object to draw the plot onto, otherwise uses current Axes


ax – Axes containing the plot

Return type

matplotlib Axes


from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

from sklearn_evaluation import plot

# generate data
X, y = make_classification(
    n_samples=20000, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

rf = RandomForestClassifier()
lr = LogisticRegression()
nb = GaussianNB()

rf_probas = rf.fit(X_train, y_train).predict_proba(X_test)
lr_probas = lr.fit(X_train, y_train).predict_proba(X_test)
nb_probas = nb.fit(X_train, y_train).predict_proba(X_test)

# list of probabilities for different classifier
probabilities = [rf_probas, lr_probas, nb_probas]

clf_names = [
    "Random Forest",
    "Logistic Regression",
    "Gaussian Naive Bayes",
]

# plot calibration curve
plot.calibration_curve(y_test, probabilities, clf_names=clf_names)


sklearn_evaluation.plot.classification_report(*args, **kwargs)#


sklearn_evaluation.plot.confusion_matrix(y_true, y_pred, target_names=None, normalize=False, cmap=None, ax=None)#

Plot confusion matrix.


  • y_true (array-like, shape = [n_samples]) – Correct target values (ground truth).

  • y_pred (array-like, shape = [n_samples]) – Target predicted classes (estimator predictions).

  • target_names (list) – List containing the names of the target classes. List must be in order e.g. ['Label for class 0', 'Label for class 1']. If None, generic labels will be generated e.g. ['Class 0', 'Class 1']

  • ax (matplotlib Axes) – Axes object to draw the plot onto, otherwise uses current Axes

  • normalize (bool) – Normalize the confusion matrix

  • cmap (matplotlib Colormap) – If None uses a modified version of matplotlib’s OrRd colormap.


ax – Axes containing the plot

Return type

matplotlib Axes
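For intuition about what is being plotted, the sketch below builds the matrix by hand: rows are true classes and columns are predicted classes. The normalize branch assumes row-wise normalization (each row sums to 1), which is an assumption about the library's behaviour rather than something stated here; toy_confusion_matrix is a made-up helper.

```python
def toy_confusion_matrix(y_true, y_pred, normalize=False):
    """Count (true, predicted) pairs; optionally scale each row to sum to 1."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    if normalize:
        # Assumes every class appears at least once in y_true
        cm = [[count / sum(row) for count in row] for row in cm]
    return cm


y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]
counts = toy_confusion_matrix(y_true, y_pred)
rates = toy_confusion_matrix(y_true, y_pred, normalize=True)
```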


Plot a Confusion Matrix for binary classifier:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier().fit(X_train, y_train)

y_pred = est.predict(X_test)
y_true = y_test

# plot confusion matrix
plot.confusion_matrix(y_true, y_pred)


sklearn_evaluation.plot.cumulative_gain(*args, **kwargs)#


sklearn_evaluation.plot.elbow_curve(X, clf, range_n_clusters=None, n_jobs=1, show_cluster_time=True, ax=None)#

Plots elbow curve of different values of K of a clustering algorithm.

  • X (array-like, shape = [n_samples, n_features]) – Data to cluster, where n_samples is the number of samples and n_features is the number of features.

  • clf – Clusterer instance that implements fit, fit_predict, and score methods, and an n_clusters hyperparameter, e.g. a sklearn.cluster.KMeans instance.

  • range_n_clusters (None or list of int, optional) – List of n_clusters for which to plot the explained variances. Defaults to [1, 3, 5, 7, 9, 11].

  • n_jobs (int, optional) – Number of jobs to run in parallel. Defaults to 1.

  • show_cluster_time (bool, optional) – Include plot of time it took to cluster for a particular K.

  • ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the curve. If None, the plot is drawn on the current Axes


ax – Axes containing the plot

Return type

matplotlib Axes


Plot the Elbow Curve:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

from sklearn_evaluation import plot

# generate data
X, _ = make_blobs(n_samples=100, centers=3, n_features=5, random_state=0)
kmeans = KMeans(random_state=1, n_init=5)

# plot elbow curve
plot.elbow_curve(X, kmeans, range_n_clusters=range(1, 30))


sklearn_evaluation.plot.elbow_curve_from_results(*args, **kwargs)#


sklearn_evaluation.plot.feature_importances(data, top_n=None, feature_names=None, orientation='horizontal', ax=None)#

Get and order feature importances from a scikit-learn model or from an array-like structure. If data is a scikit-learn model with sub-estimators (e.g. RandomForest, AdaBoost) the function will compute the standard deviation of each feature.

  • data (sklearn model or array-like structure) – Object to get the data from.

  • top_n (int) – Only get results for the top_n features.

  • feature_names (array-like) – Feature names

  • orientation (('horizontal', 'vertical')) – Bar plot orientation

  • ax (matplotlib Axes) – Axes object to draw the plot onto, otherwise uses current Axes


ax – Axes containing the plot

Return type

matplotlib Axes
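The ordering and the standard deviation across sub-estimators can be sketched without scikit-learn. Here per_estimator is a hypothetical list with one importance vector per sub-estimator (for a fitted forest, roughly what you would collect from each tree's feature_importances_), and rank_importances is a made-up helper. Using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev


def rank_importances(per_estimator, feature_names, top_n=None):
    """Return (name, mean importance, std) tuples sorted by mean, descending."""
    per_feature = list(zip(*per_estimator))  # regroup values by feature
    rows = [
        (name, mean(values), pstdev(values))
        for name, values in zip(feature_names, per_feature)
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top_n] if top_n is not None else rows


# One importance vector per (hypothetical) sub-estimator
per_estimator = [
    [0.5, 0.25, 0.25],
    [0.25, 0.5, 0.25],
    [0.75, 0.0, 0.25],
]
ranked = rank_importances(per_estimator, ["a", "b", "c"], top_n=2)
everything = rank_importances(per_estimator, ["a", "b", "c"])
```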


Plot Feature Importances:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(200, 20, n_informative=5, class_sep=0.65)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

model = RandomForestClassifier(n_estimators=1).fit(X_train, y_train)

# plot all features
ax = plot.feature_importances(model)

# only top 5
plot.feature_importances(model, top_n=5)


sklearn_evaluation.plot.ks_statistic(*args, **kwargs)#


sklearn_evaluation.plot.learning_curve(*args, **kwargs)#


sklearn_evaluation.plot.lift_curve(*args, **kwargs)#


sklearn_evaluation.plot.metrics_at_thresholds(fn, y_true, y_score, n_thresholds=10, start=0.0, ax=None)#

Plot metrics at increasing thresholds
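The signature suggests the following procedure, shown here as a sketch under stated assumptions rather than as the library's code: pick n_thresholds evenly spaced thresholds between start and 1.0, binarize y_score at each one, and evaluate the metric fn(y_true, y_pred). The helper names, the even spacing, and the >= comparison are all assumptions.

```python
def precision(y_true, y_pred):
    """Precision for binary labels; 0.0 when nothing is predicted positive."""
    predicted_pos = sum(y_pred)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return true_pos / predicted_pos if predicted_pos else 0.0


def metric_at_thresholds(fn, y_true, y_score, n_thresholds=10, start=0.0):
    """Evaluate fn at evenly spaced thresholds in [start, 1.0]."""
    step = (1.0 - start) / (n_thresholds - 1)
    thresholds = [start + i * step for i in range(n_thresholds)]
    values = []
    for t in thresholds:
        # A sample counts as positive when its score reaches the threshold
        y_pred = [1 if s >= t else 0 for s in y_score]
        values.append(fn(y_true, y_pred))
    return thresholds, values


y_true = [0, 0, 1, 1]
y_score = [0.2, 0.6, 0.4, 0.9]
thresholds, values = metric_at_thresholds(
    precision, y_true, y_score, n_thresholds=5, start=0.0
)
```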


sklearn_evaluation.plot.pca(*args, **kwargs)#


sklearn_evaluation.plot.precision_at_proportions(y_true, y_score, ax=None)#

Plot precision values at different proportions.

  • y_true (array-like) – Correct target values (ground truth).

  • y_score (array-like) – Target scores (estimator predictions).

  • ax (matplotlib Axes) – Axes object to draw the plot onto, otherwise uses current Axes


ax – Axes containing the plot

Return type

matplotlib Axes
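One common reading of "precision at a proportion" is: sort samples by score, flag the top fraction as positive, and measure precision on that slice. The sketch below follows that reading; precision_at is a hypothetical helper, and the way the cutoff is rounded is an assumption.

```python
def precision_at(y_true, y_score, proportion):
    """Precision among the top `proportion` of samples ranked by score."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, int(len(ranked) * proportion))
    top = ranked[:cutoff]
    return sum(label for _, label in top) / cutoff


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
curve = [precision_at(y_true, y_score, p) for p in (0.25, 0.5, 1.0)]
```

At a proportion of 1.0 every sample is flagged, so the value equals the share of positives in the data.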


sklearn_evaluation.plot.precision_recall(y_true, y_score, ax=None)#

Plot precision-recall curve.

  • y_true (array-like, shape = [n_samples]) – Correct target values (ground truth).

  • y_score (array-like, shape = [n_samples] or [n_samples, 2] for binary classification, or [n_samples, n_classes] for multiclass) – Target scores (estimator predictions).

  • ax (matplotlib Axes) – Axes object to draw the plot onto, otherwise uses current Axes


It is assumed that the y_score parameter columns are in order. For example, if y_true = [2, 2, 1, 0, 0, 1, 2], then the first column in y_score must contain the scores for class 0, second column for class 1 and so on.


ax – Axes containing the plot

Return type

matplotlib Axes


Plot a Precision-Recall Curve for binary classification:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
data = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)
X = data[0]
y = data[1]

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier()
est.fit(X_train, y_train)

y_score = est.predict_proba(X_test)
y_true = y_test

# plot precision recall curve
plot.precision_recall(y_true, y_score)

Plot a Precision-Recall Curve for multi-class classification:

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data for multiclass classification
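# arguments mirror the binary example; n_classes=3 is an illustrative choice
data = datasets.make_classification(
    200, 10, n_informative=5, n_classes=3, class_sep=0.65
)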
X = data[0]
y = data[1]

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier()
est.fit(X_train, y_train)

y_score = est.predict_proba(X_test)
y_true = y_test

# plot precision recall curve
plot.precision_recall(y_true, y_score)


sklearn_evaluation.plot.prediction_error(*args, **kwargs)#


sklearn_evaluation.plot.residuals(*args, **kwargs)#


sklearn_evaluation.plot.roc(y_true, y_score, ax=None)#

Plot ROC curve

  • y_true (array-like, shape = [n_samples]) –

    Correct target values (ground truth). Accepted formats:

    "classes" format: [0, 1, 2, 0, 1, …] or ['virginica', 'versicolor', 'virginica', 'setosa', …]

    one-hot encoded classes: [[0, 0, 1], [1, 0, 0]]

  • y_score (array-like, shape = [n_samples] or [n_samples, 2] for binary classification or [n_samples, n_classes] for multiclass) –

    Target scores (estimator predictions).

    "scores" format: [[0.1, 0.1, 0.8], [0.7, 0.15, 0.15]]

  • ax (matplotlib Axes, default: None) – Axes object to draw the plot onto, otherwise uses current Axes
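
The two label formats carry the same information. The following sketch converts a "classes" sequence into one-hot rows, assuming columns follow the sorted order of the unique labels; the helper name is hypothetical.

```python
def one_hot(labels):
    """Convert a "classes" sequence into one-hot rows (columns in sorted label order)."""
    classes = sorted(set(labels))
    rows = [[1 if label == cls else 0 for cls in classes] for label in labels]
    return rows, classes


encoded, classes = one_hot(["virginica", "versicolor", "virginica", "setosa"])
print(classes)
for row in encoded:
    print(row)
```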


It is assumed that the y_score parameter columns are in order. For example, if y_true = [2, 2, 1, 0, 0, 1, 2], then the first column in y_score must contain the scores for class 0, second column for class 1 and so on.


Plot a ROC Curve for binary classification:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from sklearn_evaluation import plot

# generate data
X, y = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)

# split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

est = RandomForestClassifier()
est.fit(X_train, y_train)

# y_pred = est.predict(X_test)
y_score = est.predict_proba(X_test)

# plot roc curve
plot.roc(y_test, y_score)
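
For intuition about what the curve contains, here is a minimal sketch that computes the (false positive rate, true positive rate) pairs for a binary problem by sweeping a threshold over the positive-class scores. It is a hand-rolled illustration with made-up data, not the routine the library uses.

```python
def roc_points(y_true, pos_scores):
    """Return (fpr, tpr) pairs, one per distinct threshold, strictest first."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(pos_scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, pos_scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, pos_scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


y_true = [0, 0, 1, 1]
pos_scores = [0.1, 0.4, 0.35, 0.8]  # e.g. the second column of predict_proba

points = roc_points(y_true, pos_scores)
print(points)
```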


sklearn_evaluation.plot.scores_distribution(*args, **kwargs)#


sklearn_evaluation.plot.silhouette_analysis(X, clf, range_n_clusters=None, metric='euclidean', figsize=None, cmap=None, text_fontsize='medium', ax=None)#

Plots silhouette analysis of clusters provided.

  • X (array-like, shape = [n_samples, n_features]) – Cluster data, where n_samples is the number of samples and n_features is the number of features.

  • clf – Clusterer instance that implements the fit, fit_predict, and score methods and has an n_clusters hyperparameter, e.g. a sklearn.cluster.KMeans instance.

  • range_n_clusters (None or list of int, optional) – List of n_clusters for which to plot the silhouette scores. Defaults to [2, 3, 4, 5, 6].

  • metric (string or callable, optional) – The metric to use when calculating distance between instances in a feature array. If metric is a string, it must be one of the options allowed by sklearn.metrics.pairwise.pairwise_distances. If X is the distance array itself, use "precomputed" as the metric.

  • figsize (2-tuple, optional) – Tuple denoting figure size of the plot, e.g. (6, 6). Defaults to None.

  • cmap (string or matplotlib.colors.Colormap instance, optional) – Colormap used for plotting the projection. See the Matplotlib colormap documentation for available options.

  • text_fontsize (string or int, optional) – Matplotlib-style font size. Use e.g. "small", "medium", "large" or an integer value. Defaults to "medium".

  • ax (matplotlib.axes.Axes, optional) – The axes upon which to plot the curve. If None, the plot is drawn on a new set of axes.


Returns

ax – Axes containing the plot

Return type

matplotlib Axes
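
As background for reading the plot, the sketch below computes the silhouette coefficient of each sample using the standard definition s = (b - a) / max(a, b), where a is the mean distance to the other members of the sample's own cluster and b is the mean distance to the nearest other cluster. It is a plain-Python illustration with Euclidean distance that assumes at least two clusters; the function name and toy data are hypothetical.

```python
import math


def silhouette_values(X, labels):
    """Silhouette coefficient s = (b - a) / max(a, b) for every sample."""
    values = []
    for i, (point, own) in enumerate(zip(X, labels)):
        # distances from this sample to every other sample, grouped by cluster
        by_cluster = {}
        for j, (other, label) in enumerate(zip(X, labels)):
            if i != j:
                by_cluster.setdefault(label, []).append(math.dist(point, other))
        if own not in by_cluster:  # singleton cluster: coefficient defined as 0
            values.append(0.0)
            continue
        a = sum(by_cluster[own]) / len(by_cluster[own])
        b = min(sum(d) / len(d) for label, d in by_cluster.items() if label != own)
        values.append((b - a) / max(a, b))
    return values


X = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]

scores = silhouette_values(X, labels)
print([round(s, 3) for s in scores])
```

Values close to 1 indicate samples that sit well inside their cluster, while values near 0 or below suggest overlapping or misassigned clusters.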


Plot the Silhouette Analysis:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

from sklearn_evaluation import plot

# generate data
X, y = make_blobs(
    center_box=(-10.0, 10.0),
)

kmeans = KMeans(random_state=1, n_init=5)

# plot silhouette analysis of provided clusters
plot.silhouette_analysis(X, kmeans, range_n_clusters=[3])


New in version 0.8.3.


sklearn_evaluation.plot.silhouette_analysis_from_results(X, cluster_labels, metric='euclidean', figsize=None, cmap=None, text_fontsize='medium', ax=None)#

Same as silhouette_analysis, but takes a list of cluster_labels as input. Useful if you want to train the model yourself.


Plot the Silhouette Analysis from the results:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

from sklearn_evaluation import plot

# generate data
X, y = make_blobs(
    center_box=(-10.0, 10.0),
)

cluster_labels = []

# Cluster labels for four clusters
kmeans = KMeans(n_clusters=4, n_init=5)
cluster_labels.append(kmeans.fit_predict(X))

# Cluster labels for five clusters
kmeans = KMeans(n_clusters=5, n_init=5)
cluster_labels.append(kmeans.fit_predict(X))

# plot silhouette analysis from provided list of cluster labels
plot.silhouette_analysis_from_results(X, cluster_labels)


sklearn_evaluation.plot.target_analysis(*args, **kwargs)#


sklearn_evaluation.plot.validation_curve(*args, **kwargs)#


sklearn_evaluation.plot.cooks_distance(*args, **kwargs)#

sklearn_evaluation.report.evaluate_model(model, y_true, y_pred, X_test=None, y_score=None, report_title=None)#

Evaluates a given model and generates an HTML report

  • model (estimator) – An estimator to evaluate.

  • y_true (array-like) – Correct target values (ground truth).

  • y_pred (array-like) – Target predicted classes (estimator predictions).

  • y_score (array-like, default None) – Target scores (estimator predictions).

  • report_title (str, default "Model evaluation - {model_name}") –
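
The difference between y_pred (predicted classes) and y_score (per-class scores) can be shown with a small sketch. It assumes a classifier whose predicted class is the one with the highest score, which is typical for scikit-learn's probabilistic classifiers; the class labels, scores, and helper name are hypothetical.

```python
classes = [0, 1]  # hypothetical class labels, in column order

# One row per sample, as returned by a predict_proba-style method
y_score = [
    [0.90, 0.10],
    [0.30, 0.70],
    [0.45, 0.55],
]


def predict_from_scores(y_score, classes):
    """Pick, for each row, the class whose column has the highest score."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in y_score]


y_pred = predict_from_scores(y_score, classes)
print(y_pred)
```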


Generate evaluation report for RandomForestClassifier

import urllib.request

from sklearn.ensemble import RandomForestClassifier
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn_evaluation.report import evaluate_model

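url = "..."  # location of the heart.csv dataset (not given here)
urllib.request.urlretrieve(url, filename="heart.csv")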

column = "fbs"
data = pd.read_csv("heart.csv")
X = data.drop(column, axis=1)
y = data[column]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)

model = RandomForestClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)

report = evaluate_model(model, y_test, y_pred, y_score=y_score)


New in version 0.11.4.

sklearn_evaluation.report.compare_models(model_a, model_b, X_test, y_true, report_title=None)#

Compares two models and generates an HTML report

  • model_a (estimator) – An estimator to compare.

  • model_b (estimator) – An estimator to compare.

  • X_test (array-like of shape (n_samples, n_features)) – Test data, where n_samples is the number of samples and n_features is the number of features.

  • y_true (array-like) – Correct target values (ground truth).

  • report_title (str, default "Compare models - {model_a} vs {model_b}") –


Compare DecisionTreeClassifier and RandomForestClassifier

import pandas as pd
import urllib.request
from sklearn.model_selection import train_test_split
from sklearn_evaluation.report import compare_models
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

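url = "..."  # location of the heart.csv dataset (not given here)
urllib.request.urlretrieve(url, filename="heart.csv")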

data = pd.read_csv("heart.csv")

column = "target"
X = data.drop(column, axis=1)
y = data[column]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)

model_a = RandomForestClassifier()
model_a.fit(X_train, y_train)

model_b = DecisionTreeClassifier()
model_b.fit(X_train, y_train)

report = compare_models(model_a, model_b, X_test, y_test)


New in version 0.11.4.