
Experiment tracking#

SQLiteTracker provides a powerful and flexible way to track computational experiments (e.g., Machine Learning) using a SQLite database. It lets you use SQL as the query language, giving you a powerful tool for experiment comparison, and it comes with plotting features to compare plots side-by-side and to combine them for easier comparison.

Read more about the motivations in our blog post, and check out the HN discussion.

This tutorial walks you through the features with a Machine Learning use case; however, the tracker is generic enough to be used in other domains.
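If you don't have the library installed yet, a minimal setup (assuming you install from PyPI) looks like this:

# install the tracker and scikit-learn, which this tutorial uses
pip install sklearn-evaluation scikit-learn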

from pathlib import Path

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# delete our example database, if any
db = Path("my_experiments.db")

if db.exists():
    db.unlink()
from sklearn_evaluation import SQLiteTracker

tracker = SQLiteTracker("my_experiments.db")
X, y = datasets.make_classification(200, 10, n_informative=5, class_sep=0.65)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

models = [RandomForestClassifier(), LogisticRegression(), DecisionTreeClassifier()]

Training and logging models#

for m in models:
    model = type(m).__name__
    print(f"Fitting {model}")

    experiment = tracker.new_experiment()
    m.fit(X_train, y_train)
    y_pred = m.predict(X_test)
    acc = accuracy_score(y_test, y_pred)

    # log a dictionary with log_dict
    experiment.log_dict({"accuracy": acc, "model": model, **m.get_params()})
Fitting RandomForestClassifier
Fitting LogisticRegression
Fitting DecisionTreeClassifier
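Since everything is stored in a SQLite database, you can already compare these runs with plain SQL. Here is a minimal sketch (using the query method, the experiments table, and the parameters column that appear later in this tutorial) that ranks the models we just logged by accuracy:

tracker.query(
    """
SELECT uuid,
       json_extract(parameters, '$.model') AS model,
       json_extract(parameters, '$.accuracy') AS accuracy
FROM experiments
ORDER BY accuracy DESC
"""
)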

Append experiment parameters#

Log initial “metric_a” values for the experiment:

expr = tracker.new_experiment()
expr.log("metric_a", [0.2, 0.3])
tracker.get(expr.uuid)["metric_a"]
[0.2, 0.3]

Update the experiment, appending new “metric_a” values and adding “metric_b” values:

tracker.upsert_append(expr.uuid, {"metric_a": 0.4, "metric_b": [0.8, 0.9]})
df = tracker.query(
    """
SELECT uuid,
       json_extract(parameters, '$.metric_a') AS metric_a,
       json_extract(parameters, '$.metric_b') AS metric_b
FROM experiments
"""
)
df
               metric_a   metric_b
uuid
67ce0516           None       None
c7e41f4e           None       None
4beb2d6a           None       None
0aa740f7  [0.2,0.3,0.4]  [0.8,0.9]
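To confirm that upsert_append appended the new value instead of overwriting it, we can reuse the get call from earlier; given the values logged above, this should return the combined list:

tracker.get(expr.uuid)["metric_a"]
[0.2, 0.3, 0.4]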

Displaying latest experiments#

Display the tracker object to show the latest experiments:

tracker

SQLiteTracker

uuid      created              parameters  comment
c7e41f4e  2023-04-11 16:49:05  {"accuracy": 0.6515151515151515, "model": "LogisticRegression", "C": 1.0, "class_weight": null, "dual": false, "fit_intercept": true, "intercept_scaling": 1, "l1_ratio": null, "max_iter": 100, "multi_class": "auto", "n_jobs": null, "penalty": "l2", "random_state": null, "solver": "lbfgs", "tol": 0.0001, "verbose": 0, "warm_start": false}
4beb2d6a  2023-04-11 16:49:05  {"accuracy": 0.6515151515151515, "model": "DecisionTreeClassifier", "ccp_alpha": 0.0, "class_weight": null, "criterion": "gini", "max_depth": null, "max_features": null, "max_leaf_nodes": null, "min_impurity_decrease": 0.0, "min_samples_leaf": 1, "min_samples_split": 2, "min_weight_fraction_leaf": 0.0, "random_state": null, "splitter": "best"}
0aa740f7  2023-04-11 16:49:05  {"metric_a": [0.2, 0.3, 0.4], "metric_b": [0.8, 0.9]}
67ce0516  2023-04-11 16:49:04  {"accuracy": 0.803030303030303, "model": "RandomForestClassifier", "bootstrap": true, "ccp_alpha": 0.0, "class_weight": null, "criterion": "gini", "max_depth": null, "max_features": "sqrt", "max_leaf_nodes": null, "max_samples": null, "min_impurity_decrease": 0.0, "min_samples_leaf": 1, "min_samples_split": 2, "min_weight_fraction_leaf": 0.0, "n_estimators": 100, "n_jobs": null, "oob_score": false, "random_state": null, "verbose": 0, "warm_start": false}

(Most recent experiments)
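The display above is backed by the same experiments table, so a similar listing can be reproduced with a query; this sketch assumes you only need the uuid, created, and parameters columns shown above:

# list the most recent experiments first
tracker.query(
    """
SELECT uuid, created, parameters
FROM experiments
ORDER BY created DESC
"""
)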

Tip

See the detailed user guide for more.