Report: Comparison#

New in version 0.11.4.

In this tutorial, we will show how to quickly train, test, and compare two models to find out which one performs better.

We’ll use the heart disease dataset and compare a RandomForestClassifier with a DecisionTreeClassifier.

The code below downloads the dataset directly from GitHub.

Download the data#

import urllib.request
import pandas as pd
from sklearn.model_selection import train_test_split

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/sharmaroshan/"
    + "Heart-UCI-Dataset/master/heart.csv",
    filename="heart.csv",
)

data = pd.read_csv("heart.csv")
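After loading, a quick structural check confirms the read worked. Here is a minimal sketch using an in-memory stand-in CSV (the column names are illustrative; only `target` matches the tutorial's dataset):

```python
import io

import pandas as pd

# Stand-in CSV with a "target" column like the heart dataset's
csv = io.StringIO("age,chol,target\n63,233,1\n37,250,1\n41,204,0\n")
demo = pd.read_csv(csv)

print(demo.shape)                     # (rows, columns)
print(demo["target"].value_counts())  # class balance
```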

Prepare the data#

column = "target"
X = data.drop(column, axis=1)
y = data[column]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)
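Since this is a classification problem, it can also help to stratify the split so both sets keep the same class proportions. This is a sketch on synthetic labels, not the heart data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_demo = np.arange(100).reshape(-1, 1)
y_demo = np.array([0] * 70 + [1] * 30)  # imbalanced 70/30 labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2023, stratify=y_demo
)

# stratify=y_demo keeps the 70/30 ratio in both splits
print(np.bincount(y_tr))  # train label counts
print(np.bincount(y_te))  # test label counts
```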

Random forest#

For our first model, we will use a RandomForestClassifier.

from sklearn.ensemble import RandomForestClassifier

model_a = RandomForestClassifier()
model_a.fit(X_train, y_train)

Decision tree#

The second model will be based on a DecisionTreeClassifier.

from sklearn.tree import DecisionTreeClassifier

model_b = DecisionTreeClassifier()
model_b.fit(X_train, y_train)
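Before generating the full report, a quick `score` check (mean accuracy) confirms both models fit and predict. This sketch uses synthetic data from `make_classification` rather than the heart dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X_demo, y_demo = make_classification(n_samples=300, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)

rf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
dt = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)

# Mean accuracy on the held-out split for each model
print("random forest:", rf.score(Xte, yte))
print("decision tree:", dt.score(Xte, yte))
```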

Compare models#

Now, let’s use the compare_models function to generate our report. Note that the more parameters we provide, the more detailed the report becomes.

from sklearn_evaluation.report import compare_models

report = compare_models(model_a, model_b, X_test, y_test)
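The AUC figures the report displays can also be reproduced directly with scikit-learn's `roc_auc_score`; a sketch on synthetic data, not the heart dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=400, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)

# ROC AUC is computed from positive-class probabilities, not hard labels
auc = roc_auc_score(yte, clf.predict_proba(Xte)[:, 1])
print(auc)
```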

Embed the report#

report

Compare models - RandomForestClassifier vs DecisionTreeClassifier

The embedded report contains the following sections:

  • Precision-recall (per model and combined)
  • AUC (ROC):
    RandomForestClassifier: 0.9355
    DecisionTreeClassifier: 0.7382
  • Prediction time:
    RandomForestClassifier: 0.0127 seconds
    DecisionTreeClassifier: 0.0014 seconds
  • Calibration
  • Combined confusion matrix

Save report as HTML#

report.save("report.html")