
Evaluate Regression#

These plots allow you to visualize your regression model's accuracy.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn_evaluation import plot

Fetch Data#

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

Conduct Learning#

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)
y_true = y_test
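Before turning to the plots, it can help to pair them with numeric metrics. A minimal sketch using scikit-learn's `mean_squared_error` and `r2_score`; note it adds a `random_state` to the split for reproducibility, which the cells above do not set:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Same pipeline as above, with a fixed random_state (an addition for reproducibility)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# Numeric companions to the diagnostic plots below
print(f"MSE: {mean_squared_error(y_test, y_pred):.1f}")
print(f"R^2: {r2_score(y_test, y_pred):.2f}")
```

The plots below then show *where* the model errs, which a single summary number cannot.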

Visualize Evaluations#

Residuals Plot#

This plot shows the distribution of the residuals (the differences between the measured and predicted values) against the predicted values.

plot.residuals(y_true, y_pred)
<Axes: title={'center': 'Residuals Plot'}, xlabel='Predicted Value', ylabel='Residuals'>
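The quantity being plotted is simple to compute by hand. A minimal sketch with hypothetical values standing in for the `y_true` and `y_pred` arrays above:

```python
import numpy as np

# Hypothetical measured and predicted values (not from the diabetes dataset)
y_true = np.array([120.0, 150.0, 95.0, 200.0])
y_pred = np.array([110.0, 160.0, 100.0, 185.0])

# A residual is measured minus predicted; the plot above
# scatters these values against y_pred
residuals = y_true - y_pred
print(residuals)  # [ 10. -10.  -5.  15.]
```

A model with no systematic bias shows residuals scattered randomly around zero; visible structure (a curve or a funnel shape) suggests the linear model is missing something.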

Prediction Error Plot#

This plot shows two lines: the identity line (where y_predicted = y_measured) and the best-fit regression line of y_predicted against y_measured. The gap between the two lines visualizes the prediction error and any systematic bias in the model.

plot.prediction_error(y_true, y_pred)
<Axes: title={'center': 'Prediction Error'}, xlabel='y_true', ylabel='y_pred'>
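The best-fit line in this plot can be sketched directly: fit a degree-1 polynomial of `y_pred` against `y_true` and compare its slope and intercept to the identity line (slope 1, intercept 0). The data below is hypothetical, chosen so the predictions systematically undershoot large values:

```python
import numpy as np

# Hypothetical data: predictions that compress the range of the true values
y_true = np.array([50.0, 100.0, 150.0, 200.0, 250.0])
y_pred = np.array([60.0, 100.0, 140.0, 180.0, 220.0])

# Best-fit line of y_pred vs. y_true; the identity line is slope=1, intercept=0
slope, intercept = np.polyfit(y_true, y_pred, deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=0.80, intercept=20.00
```

A slope below 1 like this one indicates the model overpredicts small values and underpredicts large ones, which is exactly the pattern the prediction error plot makes visible.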

Cook's Distance#

Cook's distance is an effective tool for measuring the influence of outliers in the training dataset on a regression fit. Outliers are data points that differ significantly from the rest of the training set, and their presence can distort the parameters the model learns. This implementation assumes an ordinary least squares (OLS) regression model.

Create a dataset with strong outliers.

from sklearn.datasets import make_regression

X, y = make_regression(
    n_samples=100, n_features=6, n_informative=5, n_targets=1, bias=100.0, noise=30.0
)
plot.cooks_distance(X, y)
<Axes: title={'center': "Cook's Distance Outlier Detection"}, xlabel='instance index', ylabel='influence (I)'>
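For intuition, the underlying OLS computation can be sketched by hand. This is a minimal, assumption-laden illustration (synthetic data, one injected outlier, the textbook Cook's distance formula), not the library's implementation:

```python
import numpy as np

# Synthetic regression data with one deliberately injected outlier
rng = np.random.default_rng(0)
n = 30
X = rng.normal(size=(n, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)
y[0] += 25.0  # the outlier

# OLS with an intercept column; H is the hat (projection) matrix,
# whose diagonal gives each point's leverage
Xd = np.column_stack([np.ones(n), X])
H = Xd @ np.linalg.inv(Xd.T @ Xd) @ Xd.T
leverage = np.diag(H)

# Residuals and the residual mean square
residuals = y - H @ y
p = Xd.shape[1]
mse = residuals @ residuals / (n - p)

# Cook's distance: D_i = r_i^2 * h_ii / (p * MSE * (1 - h_ii)^2)
cooks_d = residuals**2 * leverage / (p * mse * (1 - leverage) ** 2)
print(np.argmax(cooks_d))  # the injected outlier has the largest influence
```

Points whose distance greatly exceeds the rest (often compared against a 4/n threshold) are candidates for inspection or removal before refitting.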