Report: Comparison#

New in version 0.11.4.

In this tutorial, we will show how to quickly train, test, and compare two models to find out which one performs better.

We’ll use the heart disease dataset and compare a RandomForestClassifier with a DecisionTreeClassifier.

The code below downloads the dataset directly from GitHub.

Download the data#

import urllib.request
import pandas as pd
from sklearn.model_selection import train_test_split

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/sharmaroshan/"
    + "Heart-UCI-Dataset/master/heart.csv",
    filename="heart.csv",
)

data = pd.read_csv("heart.csv")
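After loading, a quick structural check confirms the read worked. Here is a minimal sketch using an in-memory stand-in CSV (the column names are illustrative; only `target` matches the tutorial's dataset):

```python
import io

import pandas as pd

# Stand-in CSV with a "target" column like the heart dataset's
csv = io.StringIO("age,chol,target\n63,233,1\n37,250,1\n41,204,0\n")
demo = pd.read_csv(csv)

print(demo.shape)                     # (rows, columns)
print(demo["target"].value_counts())  # class balance
```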

Prepare the data#

column = "target"
X = data.drop(column, axis=1)
y = data[column]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)
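Since this is a classification problem, it can also help to stratify the split so both sets keep the same class proportions. This is a sketch on synthetic labels, not the heart data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 70 + [1] * 30)  # imbalanced 70/30 labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2023, stratify=y_demo
)

# stratify=y_demo keeps the 70/30 ratio in both splits
print(np.bincount(y_tr))  # train label counts
print(np.bincount(y_te))  # test label counts
```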

Random forest#

For our first model, we will use a RandomForestClassifier.

from sklearn.ensemble import RandomForestClassifier

model_a = RandomForestClassifier()
model_a.fit(X_train, y_train)

Decision tree#

The second model will be based on a DecisionTreeClassifier.

from sklearn.tree import DecisionTreeClassifier

model_b = DecisionTreeClassifier()
model_b.fit(X_train, y_train)
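Before generating the full report, a quick `score` check (mean accuracy) confirms both models fit and predict. This sketch uses synthetic data from `make_classification` rather than the heart dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=300, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
dt = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)

# Mean accuracy on the held-out split for each model
print("random forest:", rf.score(Xte, yte))
print("decision tree:", dt.score(Xte, yte))
```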

Compare models#

Now, let’s use the compare_models function to generate our report. Note that the more parameters we provide, the more detailed the report becomes.

from sklearn_evaluation.report import compare_models

report = compare_models(model_a, model_b, X_test, y_test)
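The AUC figures the report displays can also be reproduced directly with scikit-learn's `roc_auc_score`; a sketch on synthetic data, not the heart dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=400, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

# ROC AUC is computed from positive-class probabilities, not hard labels
auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
print(auc)
```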

Embed the report#

report

Compare models - RandomForestClassifier vs DecisionTreeClassifier

The embedded report contains the following sections:

  • Precision-recall (per model and combined)
  • AUC (ROC):
    RandomForestClassifier: 0.9355
    DecisionTreeClassifier: 0.7382
  • Prediction time:
    RandomForestClassifier: 0.0127 seconds
    DecisionTreeClassifier: 0.0014 seconds
  • Calibration
  • Combined confusion matrix

Save report as HTML#

report.save("report.html")