Interactive Confusion Matrix#
In this tutorial, we’ll demonstrate how to plot an interactive confusion matrix using the penguins
dataset.
import pandas as pd
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
Load the dataset#
df = sns.load_dataset("penguins")
df.dropna(inplace=True)
Y = df.species
Y = Y.map({"Adelie": 0, "Chinstrap": 1, "Gentoo": 2})
df.drop("species", inplace=True, axis=1)
se = pd.get_dummies(df["sex"], drop_first=True)
df = pd.concat([df, se], axis=1)
df.drop("sex", axis=1, inplace=True)
le = LabelEncoder()
df["island"] = le.fit_transform(df["island"])
X = df
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.3, random_state=40
)
df.head()
| | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | Male |
|---|---|---|---|---|---|---|
| 0 | 2 | 39.1 | 18.7 | 181.0 | 3750.0 | 1 |
| 1 | 2 | 39.5 | 17.4 | 186.0 | 3800.0 | 0 |
| 2 | 2 | 40.3 | 18.0 | 195.0 | 3250.0 | 0 |
| 4 | 2 | 36.7 | 19.3 | 193.0 | 3450.0 | 0 |
| 5 | 2 | 39.3 | 20.6 | 190.0 | 3650.0 | 1 |
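The island column in the output above holds integer codes rather than island names. As a minimal sketch of how those codes can be mapped back, the snippet below fits a LabelEncoder on a small hand-written list (toy values chosen for illustration, not loaded from the dataset) and inspects its classes_ attribute. LabelEncoder assigns codes in sorted order of the labels it sees.

```python
from sklearn.preprocessing import LabelEncoder

# Toy island labels (illustrative values, not read from the dataset).
islands = ["Torgersen", "Biscoe", "Dream", "Torgersen"]

encoder = LabelEncoder()
codes = encoder.fit_transform(islands)

# classes_ lists the labels in sorted order; a label's position is its code.
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform turns integer codes back into the original labels.
decoded = encoder.inverse_transform(codes)
print(list(decoded))
```

In the tutorial itself, the fitted encoder is the le object, so le.classes_ should give the same kind of lookup for the real column.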
Train a model#
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
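Before plotting, it may help to recall what a confusion matrix counts: each cell is the number of observations with a given actual class and a given predicted class. The following is a minimal sketch using only the standard library and made-up labels (y_true and y_hat are hypothetical names and values, not the tutorial's y_test and y_pred). It places actual classes on rows and predicted classes on columns, which is the convention used by sklearn.metrics.confusion_matrix; the orientation of the interactive chart is not confirmed here.

```python
from collections import Counter

# Toy labels (hypothetical), using the same 0/1/2 species coding as above.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_hat = [0, 1, 1, 1, 2, 0, 2]
n_classes = 3

# Count how often each (actual, predicted) pair occurs.
pair_counts = Counter(zip(y_true, y_hat))

# Rows are actual classes, columns are predicted classes.
matrix = [
    [pair_counts[(actual, predicted)] for predicted in range(n_classes)]
    for actual in range(n_classes)
]

for row in matrix:
    print(row)
```

The diagonal cells hold the correct predictions, and every off-diagonal cell holds one specific kind of mistake.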
Interactive confusion matrix#
from sklearn_evaluation import plot
cm = plot.InteractiveConfusionMatrix.from_raw_data(
    y_test.tolist(),
    y_pred.tolist(),
    X_test=X_test,
    feature_subset=[
        "Male",
        "body_mass_g",
        "bill_depth_mm",
        "bill_length_mm",
        "flipper_length_mm",
    ],
    nsample=6,
)
Clicking on a quadrant displays two tables: Sample Observations and Quadrant Statistics. Sample Observations displays random samples drawn from that quadrant (the call above passes nsample=6). Quadrant Statistics displays summary statistics computed over all the data that falls in that quadrant.
cm.chart
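To make the two tables more concrete, here is a rough sketch of how similar output could be reproduced by hand with pandas. It is not the library's implementation: the data are toy stand-ins (hypothetical values) for X_test, y_test and y_pred, and DataFrame.describe() is used as a stand-in for the statistics, which may differ from what the chart reports.

```python
import pandas as pd

# Toy stand-ins (hypothetical values) for X_test, y_test and y_pred.
features = pd.DataFrame(
    {
        "body_mass_g": [3750.0, 3800.0, 3250.0, 4675.0, 5700.0, 3450.0],
        "flipper_length_mm": [181.0, 186.0, 195.0, 210.0, 230.0, 193.0],
    }
)
actual = pd.Series([0, 0, 0, 1, 2, 0])
predicted = pd.Series([0, 1, 0, 1, 2, 0])

# Select the rows in one cell: actual class 0, predicted class 0.
in_cell = (actual == 0) & (predicted == 0)
cell_rows = features[in_cell]

# "Sample Observations": a few random rows from the cell.
n = min(2, len(cell_rows))
sample_rows = cell_rows.sample(n=n, random_state=0)

# "Quadrant Statistics": summary statistics over all rows in the cell.
# describe() is a stand-in; the library may report different statistics.
cell_stats = cell_rows.describe()

print(sample_rows)
print(cell_stats)
```

Changing the two class values in the mask selects a different cell, for example actual 0 and predicted 1 for one type of misclassification.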