01
Classification Metrics
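Accuracy is misleading on imbalanced datasets: when only 10% of samples are positive, a model that always predicts the negative class scores 90% accuracy while finding none of the positives. The precision/recall trade-off should therefore be judged by the business cost of false positives versus false negatives.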
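A minimal sketch of that failure mode, assuming scikit-learn and NumPy and using hypothetical labels (100 samples, 10 of them positive): a constant "always negative" predictor looks strong on accuracy but has zero recall.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 100 samples, 10% positive
y_true = np.array([1] * 10 + [0] * 90)

# A "model" that always predicts the majority (negative) class
y_always_neg = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_always_neg)  # 90 of 100 correct
rec = recall_score(y_true, y_always_neg)    # 0 of 10 positives found
# zero_division=0: no positives were predicted, so precision is undefined; report 0
prec = precision_score(y_true, y_always_neg, zero_division=0)

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f}")
```

It prints `accuracy=0.90 precision=0.00 recall=0.00`: a high accuracy that hides a useless classifier.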
[Interactive widget: Confusion Matrix & Threshold Explorer. Controls: class imbalance (% positive, default 10%), model skill/separability (default 0.80), decision threshold (default 0.50). Shows the confusion matrix (TP, FN, FP, TN), accuracy, precision, recall, and F1 score, plus a plot of precision, recall, and F1 vs threshold. Adjust the threshold to explore the precision-recall trade-off.]
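from sklearn.metrics import classification_report, average_precision_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary dataset with roughly 10% positives
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Stratify so the test split keeps the same class ratio; fix the seed for reproducibility
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight='balanced' reweights samples inversely to class frequency
clf = RandomForestClassifier(class_weight='balanced', random_state=42)
clf.fit(Xtr, ytr)

y_pred = clf.predict(Xte)             # hard labels: the class with the higher predicted probability
yprob = clf.predict_proba(Xte)[:, 1]  # predicted probability of the positive class

# Per-class precision, recall, and F1
print(classification_report(yte, y_pred))
# Average precision: a step-wise summary of the area under the precision-recall curve
print(f"AUC-PR: {average_precision_score(yte, yprob):.3f}")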
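One way to act on the business-cost framing is to choose the decision threshold that minimizes total misclassification cost instead of accepting the default. The sketch below repeats the setup of the example above and assumes hypothetical costs (`COST_FP = 1`, `COST_FN = 10`) and a hypothetical helper `expected_cost`; substitute the costs of your own problem.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical business costs: a missed positive (FN) is 10x worse than a false alarm (FP)
COST_FP = 1.0
COST_FN = 10.0

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)
clf = RandomForestClassifier(class_weight="balanced", random_state=42).fit(Xtr, ytr)
yprob = clf.predict_proba(Xte)[:, 1]

def expected_cost(threshold):
    """Total cost when predicting positive for every probability >= threshold."""
    y_hat = (yprob >= threshold).astype(int)
    # labels=[0, 1] guarantees a 2x2 matrix even if one class is never predicted
    tn, fp, fn, tp = confusion_matrix(yte, y_hat, labels=[0, 1]).ravel()
    return COST_FP * fp + COST_FN * fn

# Candidate thresholds 0.05, 0.10, ..., 0.95 (integer division keeps 0.5 exact)
thresholds = np.arange(5, 100, 5) / 100
costs = [expected_cost(t) for t in thresholds]
best = thresholds[int(np.argmin(costs))]

print(f"cost at 0.50: {expected_cost(0.5):.0f}")
print(f"best threshold: {best:.2f} (cost {min(costs):.0f})")
```

Because the grid includes 0.50, the selected threshold is never worse than the default on this data. In practice the threshold should be tuned on a validation split rather than the test set, otherwise the reported test cost is optimistic.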
02
AUC-ROC vs AUC-PR
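AUC-ROC measures how well the model ranks positives above negatives overall. AUC-PR is more informative when positives are rare because neither precision nor recall uses true negatives, so a large pool of easy negatives cannot inflate it. The two also have different random baselines: 0.5 for AUC-ROC, and the positive-class prevalence for AUC-PR.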
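A small experiment makes the true-negative point concrete. Under the assumption of hypothetical Gaussian scores, we append thousands of "easy" negatives scored below every existing sample: AUC-ROC rises because every new negative is ranked correctly, while average precision (scikit-learn's AUC-PR summary) does not change, since those negatives never enter the top of the ranking.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Hypothetical scores: 50 positives and 500 "hard" negatives with overlapping distributions
y = np.r_[np.ones(50, dtype=int), np.zeros(500, dtype=int)]
scores = np.r_[rng.normal(1.0, 1.0, 50), rng.normal(0.0, 1.0, 500)]

# Append 5000 "easy" negatives, all scored strictly below every existing sample
easy = scores.min() - 1.0 - rng.random(5000)
y_big = np.r_[y, np.zeros(5000, dtype=int)]
scores_big = np.r_[scores, easy]

roc_small, roc_big = roc_auc_score(y, scores), roc_auc_score(y_big, scores_big)
ap_small, ap_big = average_precision_score(y, scores), average_precision_score(y_big, scores_big)

print(f"AUC-ROC: {roc_small:.3f} -> {roc_big:.3f}")  # rises with the extra true negatives
print(f"AUC-PR:  {ap_small:.3f} -> {ap_big:.3f}")    # unchanged
```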
[Interactive widget: ROC & PR Curve Explorer. Controls: positive class prevalence (default 5%), model discrimination AUC (default 0.85), highlighted threshold (default 0.50). Panels: ROC curve (TPR vs FPR) and precision-recall curve, with readouts for AUC-ROC, AUC-PR, prevalence, and the random PR baseline. Compare how ROC and PR curves change with class imbalance.]
from sklearn.metrics import roc_auc_score, average_precision_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Rare positive class: roughly 3% of samples
X, y = make_classification(n_samples=10000, weights=[0.97, 0.03], n_features=15, random_state=42)
# Stratify so the rare class is represented in the test split in the same proportion
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(class_weight='balanced', max_iter=500)
model.fit(Xtr, ytr)
yprob = model.predict_proba(Xte)[:, 1]  # predicted probability of the positive class

print(f"AUC-ROC: {roc_auc_score(yte, yprob):.4f}")
print(f"AUC-PR:  {average_precision_score(yte, yprob):.4f}")
# AUC-PR is typically much lower than AUC-ROC here, which reveals the true difficulty:
# its random baseline is the ~3% prevalence, not 0.5
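To see why the two numbers live on different scales, score a hypothetical evaluation set with an uninformative model (random scores independent of the labels), assuming the same ~3% prevalence as above. AUC-ROC lands near 0.5, while average precision lands near the prevalence, which is the "random PR baseline" an AUC-PR value should be compared against.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(42)

# Hypothetical evaluation set: 100,000 samples with ~3% positives
prevalence = 0.03
y = (rng.random(100_000) < prevalence).astype(int)

# An uninformative "model": scores drawn independently of the labels
random_scores = rng.random(y.size)

roc = roc_auc_score(y, random_scores)
ap = average_precision_score(y, random_scores)

print(f"observed prevalence: {y.mean():.4f}")
print(f"AUC-ROC (random): {roc:.3f}")  # close to 0.5
print(f"AUC-PR  (random): {ap:.3f}")   # close to the prevalence
```

An AUC-PR of, say, 0.30 is therefore a large improvement over a 0.03 baseline, even though it looks low next to an AUC-ROC above 0.9.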
03
Regression Metrics
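MAE is less sensitive to outliers because every error counts linearly; RMSE squares errors before averaging, so a few large errors dominate it. R² = 1 − SS_res / SS_tot measures the fraction of variance your model explains relative to a naive baseline that always predicts the mean; it is 0 for that baseline and negative for a model that does worse.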
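The three definitions can be written directly in NumPy. The sketch below uses hypothetical toy data (ten predictions, each off by 1) and then turns a single prediction into an outlier to show how differently MAE and RMSE react; the helper names `mae`, `rmse`, and `r2` are illustrative, not a library API.

```python
import numpy as np

def mae(y, yhat):
    # Mean of absolute errors: each error counts linearly
    return np.mean(np.abs(y - yhat))

def rmse(y, yhat):
    # Square root of the mean squared error: large errors dominate
    return np.sqrt(np.mean((y - yhat) ** 2))

def r2(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)          # model's squared error
    ss_tot = np.sum((y - np.mean(y)) ** 2)    # squared error of always predicting the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: every prediction is off by exactly 1
y_true = np.arange(10, dtype=float)
y_pred = y_true + 1.0

# Same predictions, except one is now off by 21 (a single outlier)
y_pred_outlier = y_pred.copy()
y_pred_outlier[0] += 20.0

print(f"clean:   MAE={mae(y_true, y_pred):.2f} RMSE={rmse(y_true, y_pred):.2f}")
print(f"outlier: MAE={mae(y_true, y_pred_outlier):.2f} RMSE={rmse(y_true, y_pred_outlier):.2f}")
print(f"R^2 of the mean baseline: {r2(y_true, np.full_like(y_true, y_true.mean())):.2f}")
```

The single outlier triples MAE (1.00 to 3.00) but raises RMSE more than sixfold (1.00 to 6.71, since sqrt((21² + 9·1²)/10) = sqrt(45)), and the mean baseline scores an R² of exactly 0.00.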
[Interactive widget: Regression Error Explorer. Controls: model noise/error (default 25), outlier intensity (default 0), number of outliers (default 0). Panels: predicted vs actual (residuals shown), residual distribution, readouts for MAE, RMSE, R², and MAPE, and a plot of MAE vs RMSE as outlier intensity grows. Add outliers and watch RMSE diverge from MAE: RMSE grows faster because it penalizes large errors disproportionately.]
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
import numpy as np

# Synthetic regression data with Gaussian noise (std 25) added to the target
X, y = make_regression(n_samples=1000, n_features=10, noise=25, random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

model = GradientBoostingRegressor(n_estimators=100, random_state=42)
model.fit(Xtr, ytr)
yp = model.predict(Xte)

print(f"MAE:  {mean_absolute_error(yte, yp):.2f}")
print(f"RMSE: {np.sqrt(mean_squared_error(yte, yp)):.2f}")  # RMSE = sqrt(MSE)
print(f"R^2:  {r2_score(yte, yp):.4f}")
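To make R²'s "naive mean baseline" explicit, one option is to fit scikit-learn's `DummyRegressor(strategy="mean")` next to the model, as in this sketch built on the same synthetic setup. The baseline predicts the training-set mean, so on held-out data its R² is at most 0 (the test-set mean is the only constant that scores exactly 0).

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=25, random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

# The baseline R^2 is measured against: always predict the training-set mean
baseline = DummyRegressor(strategy="mean").fit(Xtr, ytr)
model = GradientBoostingRegressor(n_estimators=100, random_state=42).fit(Xtr, ytr)

r2_base = r2_score(yte, baseline.predict(Xte))
r2_model = r2_score(yte, model.predict(Xte))

print(f"R^2 baseline: {r2_base:.4f}")
print(f"R^2 model:    {r2_model:.4f}")
```

A model whose R² does not clearly exceed the baseline's is adding little beyond predicting the average.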
