Interactive Confusion Matrix#

In this tutorial, we’ll demonstrate how to plot an interactive confusion matrix using the penguins dataset.

import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

Load the dataset#

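# Load the penguins dataset and drop rows with missing values
df = sns.load_dataset("penguins")
df.dropna(inplace=True)

# Encode the target (species) as integer class labels
Y = df.species
Y = Y.map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
df.drop("species", inplace=True, axis=1)

# One-hot encode sex into a single "Male" column (dtype=int yields 0/1)
se = pd.get_dummies(df["sex"], drop_first=True, dtype=int)
df = pd.concat([df, se], axis=1)
df.drop("sex", axis=1, inplace=True)

# Encode island names as integer codes
le = LabelEncoder()
df["island"] = le.fit_transform(df["island"])

# Hold out 30% of the rows as a test set
X = df
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=40
)
df.head()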
   island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g  Male
0       2            39.1           18.7              181.0       3750.0     1
1       2            39.5           17.4              186.0       3800.0     0
2       2            40.3           18.0              195.0       3250.0     0
4       2            36.7           19.3              193.0       3450.0     0
5       2            39.3           20.6              190.0       3650.0     1

Train a model#

from sklearn.tree import DecisionTreeClassifier

# Fit a decision tree on the training split, then predict the test split
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
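
Before plotting, it may help to recall what a confusion matrix tallies: one count per (actual, predicted) label pair. The following is a minimal sketch using only the standard library. The toy labels and the helper name `confusion_counts` are hypothetical and chosen for illustration; they are not the model's real predictions and not part of sklearn-evaluation.

```python
from collections import Counter

# Toy labels for illustration only (0=Adelie, 1=Chinstrap, 2=Gentoo).
# These are NOT the predictions produced by the model above.
y_true_toy = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred_toy = [0, 1, 1, 1, 2, 0, 2, 0]


def confusion_counts(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes matrix of counts.

    Rows index the actual class and columns index the predicted class.
    """
    pair_counts = Counter(zip(y_true, y_pred))
    return [
        [pair_counts[(actual, predicted)] for predicted in range(n_classes)]
        for actual in range(n_classes)
    ]


matrix = confusion_counts(y_true_toy, y_pred_toy, n_classes=3)
for row in matrix:
    print(row)
```

Diagonal entries count correct predictions; off-diagonal entries count one class being mistaken for another.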

Interactive confusion matrix#

from sklearn_evaluation import plot

cm = plot.InteractiveConfusionMatrix.from_raw_data(
    y_test.tolist(),
    y_pred.tolist(),
    X_test=X_test,
    feature_subset=[
        "Male",
        "body_mass_g",
        "bill_depth_mm",
        "bill_length_mm",
        "flipper_length_mm",
    ],
    nsample=6,
)

Clicking a quadrant displays two tables: Sample Observations and Quadrant Statistics. Sample Observations shows a random sample of observations from that quadrant (the sample size is set by nsample, which is 6 in the call above). Quadrant Statistics shows summary statistics computed over all the observations that fall in that quadrant.
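
To make the two tables concrete, here is a rough sketch of the underlying idea: group observations by their (actual, predicted) pair, draw a small random sample from one group, and summarize a feature over the whole group. This is an illustration under assumptions, not how sklearn-evaluation implements it. The rows are made-up values, the helpers `group_by_cell` and `describe_cell` are hypothetical names, and the statistics chosen (count, mean, min, max) are a guess at what a summary might contain.

```python
import random
from collections import defaultdict
from statistics import mean

# Made-up observations for illustration; not taken from the penguins dataset.
rows = [
    {"actual": 0, "predicted": 0, "body_mass_g": 3700.0},
    {"actual": 0, "predicted": 0, "body_mass_g": 3800.0},
    {"actual": 0, "predicted": 1, "body_mass_g": 3300.0},
    {"actual": 1, "predicted": 1, "body_mass_g": 3600.0},
    {"actual": 2, "predicted": 2, "body_mass_g": 5000.0},
    {"actual": 2, "predicted": 2, "body_mass_g": 5400.0},
]


def group_by_cell(rows):
    """Bucket rows by their (actual, predicted) confusion-matrix cell."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["actual"], row["predicted"])].append(row)
    return cells


def describe_cell(cell_rows, feature, nsample, seed=0):
    """Return (random sample of at most nsample rows, stats over all rows)."""
    rng = random.Random(seed)
    sample = rng.sample(cell_rows, min(nsample, len(cell_rows)))
    values = [row[feature] for row in cell_rows]
    stats = {
        "count": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }
    return sample, stats


cells = group_by_cell(rows)
sample, stats = describe_cell(cells[(0, 0)], "body_mass_g", nsample=6)
print(stats)
```

The sample is capped at the group size, so a cell with fewer observations than the requested sample size simply returns all of them.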

import altair as alt

alt.renderers.enable("html")  # returns RendererRegistry.enable('html')
cm.chart