Interactive Confusion Matrix#

In this tutorial, we’ll demonstrate how to plot an interactive confusion matrix using the penguins dataset.

import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

Load the dataset#

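df = sns.load_dataset("penguins")
df.dropna(inplace=True)  # drop rows with missing values

# Encode the target species as integers
Y = df.species
Y = Y.map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
df.drop("species", inplace=True, axis=1)

# One-hot encode sex; drop_first=True keeps a single "Male" column.
# dtype=int keeps the column as 0/1 on pandas versions that default to bool.
se = pd.get_dummies(df["sex"], drop_first=True, dtype=int)
df = pd.concat([df, se], axis=1)
df.drop("sex", axis=1, inplace=True)

# Encode island names as integer codes
le = LabelEncoder()
df["island"] = le.fit_transform(df["island"])

# Hold out 30% of the rows for testing
X = df
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=40
)
df.head()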

	island	bill_length_mm	bill_depth_mm	flipper_length_mm	body_mass_g	Male
0	2	39.1	18.7	181.0	3750.0	1
1	2	39.5	17.4	186.0	3800.0	0
2	2	40.3	18.0	195.0	3250.0	0
4	2	36.7	19.3	193.0	3450.0	0
5	2	39.3	20.6	190.0	3650.0	1

Train a model#

from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
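
Before plotting, it may help to recall what a confusion matrix counts: the cell in row i, column j holds the number of test observations whose true class is i and whose predicted class is j. The following is a minimal sketch in plain Python. It uses made-up labels rather than the penguin predictions, and the helper name `count_confusion` is hypothetical, not part of any library.

```python
from collections import Counter


def count_confusion(y_true, y_pred, n_classes):
    """Build an n_classes x n_classes table: rows are true labels, columns are predictions."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in range(n_classes)] for t in range(n_classes)]


# Made-up labels for illustration only (0=Adelie, 1=Chinstrap, 2=Gentoo)
toy_true = [0, 0, 1, 1, 2, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0, 2]

toy_matrix = count_confusion(toy_true, toy_pred, n_classes=3)
for row in toy_matrix:
    print(row)
```

On the real data, `sklearn.metrics.confusion_matrix(y_test, y_pred)` returns the same kind of table, with true labels on the rows and predictions on the columns.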

Interactive confusion matrix#

from sklearn_evaluation import plot

cm = plot.InteractiveConfusionMatrix.from_raw_data(
    y_test.tolist(),
    y_pred.tolist(),
    X_test=X_test,
    feature_subset=[
        "Male",
        "body_mass_g",
        "bill_depth_mm",
        "bill_length_mm",
        "flipper_length_mm",
    ],
    nsample=6,
)

Clicking a quadrant (a cell of the matrix) displays two tables: Sample Observations and Quadrant Statistics. Sample Observations shows random samples from that quadrant, with the number set by nsample (6 in the call above). Quadrant Statistics shows summary statistics computed over all observations in that quadrant.
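
To make the two tables more concrete, here is a rough pandas sketch of the kind of information they summarise for a single quadrant. It is not the library's implementation: the data is made up, and using `describe()` for the statistics is an assumption about which summaries are shown.

```python
import pandas as pd

# Made-up test rows; the feature names mirror two of the tutorial's columns
toy = pd.DataFrame(
    {
        "actual": [0, 0, 0, 1, 1, 2],
        "predicted": [0, 0, 1, 1, 1, 2],
        "body_mass_g": [3750.0, 3800.0, 3250.0, 3700.0, 3900.0, 5000.0],
        "bill_depth_mm": [18.7, 17.4, 18.0, 17.9, 19.5, 14.3],
    }
)

# Pick one quadrant: true class 0 that was also predicted as class 0
cell = toy[(toy["actual"] == 0) & (toy["predicted"] == 0)]
features = cell[["body_mass_g", "bill_depth_mm"]]

# Analogue of "Sample Observations": up to n random rows from the quadrant
n = 6
sample_rows = features.sample(n=min(n, len(features)), random_state=0)

# Analogue of "Quadrant Statistics": summaries over every row in the quadrant
# (describe() is an assumed stand-in for whatever statistics the chart reports)
cell_stats = features.describe()

print(sample_rows)
print(cell_stats)
```

Capping the sample size at the number of rows in the quadrant avoids an error when a quadrant holds fewer observations than requested.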

import altair as alt

alt.renderers.enable("html")

RendererRegistry.enable('html')

cm.chart