Report: Evaluation#

New in version 0.11.4.

We use different metrics to estimate a machine learning model's performance and to understand its strengths and weaknesses.

In this guide, we'll show you how to easily generate a report with everything you need in one place using our evaluate_model.

We'll use the heart disease dataset, which you can download from here.

Download the data#

import urllib.request
import pandas as pd

urllib.request.urlretrieve(
    "https://raw.githubusercontent.com/sharmaroshan/"
    + "Heart-UCI-Dataset/master/heart.csv",
    filename="heart.csv",
)

data = pd.read_csv("heart.csv")

Prepare the data#

from sklearn.model_selection import train_test_split

column = "fbs"
X = data.drop(column, axis=1)
y = data[column]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2023
)

Define the model#

from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)
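The report below summarizes metrics such as accuracy and AUC, which you could also compute directly with sklearn.metrics. A sketch on small illustrative arrays (not the heart data; for the model above you would pass y_score[:, 1], the positive-class column of predict_proba, to roc_auc_score):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative true labels, hard predictions, and positive-class scores
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_hat = [0, 0, 1, 1, 0, 0, 1, 0]
scores = [0.1, 0.2, 0.7, 0.8, 0.4, 0.3, 0.9, 0.2]

acc = accuracy_score(y_true, y_hat)  # fraction of correct predictions
auc = roc_auc_score(y_true, scores)  # how well scores rank positives above negatives

print(acc)  # → 0.75 (6 of 8 correct)
print(auc)
```

On an imbalanced target, accuracy alone is misleading (predicting the majority class everywhere already scores high), which is why the report pairs it with AUC and a balance check.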

Evaluate the model#

from sklearn_evaluation.report import evaluate_model

report = evaluate_model(model, y_test, y_pred, y_score=y_score)

Embed the report#

report

Model evaluation - RandomForestClassifier

  • balance
      • Your test set is highly imbalanced
      • If you need help understanding these stats, send us a message on slack
  • accuracy
      • Accuracy is 0.9016393442622951
      • Please note your model is unbalanced, so high accuracy could be misleading
  • auc
      • Area under curve is low for class 0
      • If you need help understanding these stats, send us a message on slack
      • Number of classes : 1
      • AUC (roc) is : 0.49464285714285716
  • general stats

Save report as HTML#

report.save("report.html")