Report: Comparison
New in version 0.11.4.
In this tutorial, we will quickly train, test, and compare two models to find out which performs better.
We'll use the heart disease dataset to compare a RandomForestClassifier with a DecisionTreeClassifier.
You can download the dataset from here.
Download the data

```python
import urllib.request

import pandas as pd
from sklearn.model_selection import train_test_split

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/sharmaroshan/"
    + "Heart-UCI-Dataset/master/heart.csv",
    filename="heart.csv",
)

data = pd.read_csv("heart.csv")
```
Prepare the data

```python
column = "target"
X = data.drop(column, axis=1)
y = data[column]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)
```
Random forest

For our first model, we will use the RandomForestClassifier.

```python
from sklearn.ensemble import RandomForestClassifier

model_a = RandomForestClassifier()
model_a.fit(X_train, y_train)
```
Decision tree

The second model will be based on the DecisionTreeClassifier.

```python
from sklearn.tree import DecisionTreeClassifier

model_b = DecisionTreeClassifier()
model_b.fit(X_train, y_train)
```
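Before generating the full comparison report, a quick sanity check with each model's `score` method (which returns mean accuracy on the given split) can catch obvious problems such as a mis-aligned target column. This sketch is self-contained, substituting synthetic data for heart.csv:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the heart data so the snippet runs on its own
X, y = make_classification(n_samples=500, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)

model_a = RandomForestClassifier(random_state=0).fit(X_train, y_train)
model_b = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# score() returns mean accuracy on the held-out split
print(f"forest: {model_a.score(X_test, y_test):.3f}")
print(f"tree:   {model_b.score(X_test, y_test):.3f}")
```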
Compare models

Now, let's use the compare_models function to generate our report. Note that the more parameters we provide, the more detailed the report will be.

```python
from sklearn_evaluation.report import compare_models

report = compare_models(model_a, model_b, X_test, y_test)
```
Embed the report

```python
report
```
The embedded report is titled "Compare models - RandomForestClassifier vs DecisionTreeClassifier" and contains sections for precision-recall, AUC, prediction time, calibration, a combined confusion matrix, and a combined PR curve. In this run, for example:

- RandomForestClassifier ROC AUC: 0.935; prediction time: 0.0127 seconds
- DecisionTreeClassifier ROC AUC: 0.738; prediction time: 0.0014 seconds
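The report's AUC and prediction-time figures can also be reproduced directly with scikit-learn's `roc_auc_score` (which expects predicted probabilities for the positive class) and a simple timer. A self-contained sketch on synthetic data standing in for heart.csv:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the heart data
X, y = make_classification(n_samples=500, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Time the prediction step, as the report's "prediction time" section does
start = time.perf_counter()
proba = model.predict_proba(X_test)[:, 1]  # positive-class probabilities
elapsed = time.perf_counter() - start

auc = roc_auc_score(y_test, proba)
print(f"ROC AUC: {auc:.3f} (predicted in {elapsed:.4f} s)")
```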
Save report as HTML

```python
report.save("report.html")
```