02 — Machine Learning

Learning from Data

Five core ML algorithms — interactive playgrounds with live visualizations. Drag sliders to see models change in real time. Press ▶ to hear each widget explained.

01
Linear & Logistic Regression
Linear regression predicts continuous values by ordinary least squares (OLS); logistic regression predicts class probabilities by passing a linear score through the sigmoid. Both are interpretable baselines.
y = Xβ, fit by minimizing MSE
sigmoid(z) = 1/(1+e⁻ᶻ)
L1 (Lasso) → sparse weights
L2 (Ridge) → shrinks all weights
e^β = odds ratio
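The sigmoid and odds-ratio formulas above can be checked numerically. The following is a minimal sketch, assuming a one-feature logistic model with hypothetical coefficients `beta0` and `beta1` (illustrative values, not taken from the widget). It shows that raising x by one unit multiplies the odds by e^β.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: maps any real z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# Hypothetical coefficients, chosen only for illustration.
beta0, beta1 = -1.0, 0.7

p_at_2 = sigmoid(beta0 + beta1 * 2.0)
p_at_3 = sigmoid(beta0 + beta1 * 3.0)

# A one-unit increase in x multiplies the odds by exp(beta1).
odds_ratio = odds(p_at_3) / odds(p_at_2)
print(f"odds ratio: {odds_ratio:.4f}  exp(beta1): {math.exp(beta1):.4f}")
```

The two printed numbers should agree, which is why a logistic coefficient is usually read on the odds scale rather than the probability scale.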
Interactive Widget — Regression Explorer
Controls: noise level (0.8), regularization C (1.0)
Displays: scatter + fit line, residuals, MSE, slope β, intercept
Adjust noise and regularization to see how the model adapts.
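import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic binary problem: 10 features, 5 of them informative
X, y = make_classification(n_samples=1000, n_features=10,
                            n_informative=5, random_state=42)
# Fix the split seed and stratify so the reported AUC is reproducible
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2,
                                      stratify=y, random_state=42)

# C is the inverse regularization strength: smaller C = stronger L2 penalty
model = LogisticRegression(C=1.0, penalty='l2', max_iter=200)
model.fit(Xtr, ytr)
prob = model.predict_proba(Xte)[:, 1]  # probability of the positive class
print(f"AUC-ROC: {roc_auc_score(yte, prob):.3f}")
# Rank coefficients by absolute value so "top" means largest effect
top = np.argsort(np.abs(model.coef_[0]))[::-1][:3]
print("Top coefs:", {int(i): round(float(model.coef_[0][i]), 3) for i in top})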
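The contrast between the L1 and L2 penalties listed in this section (sparse weights versus uniformly shrunk weights) is easy to see on a regression problem. This is a rough sketch that swaps in scikit-learn's `Lasso` and `Ridge` regressors instead of the logistic model above; the dataset and the `alpha=5.0` penalty strength are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic regression problem: only 5 of the 20 features carry signal.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=5.0).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=5.0).fit(X, y)  # L2 penalty

# Count coefficients that were driven exactly to zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))
print(f"Lasso: {n_zero_lasso} of {lasso.coef_.size} weights exactly zero")
print(f"Ridge: {n_zero_ridge} of {ridge.coef_.size} weights exactly zero")
```

Lasso typically zeroes out most of the uninformative features, while Ridge keeps every weight nonzero and only reduces their magnitude.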
02
Decision Trees & Splitting
Trees recursively split the data to maximize node purity. They are interpretable and powerful, but prone to overfitting unless depth or minimum-sample constraints are set.
Gini = 1 − Σpᵢ²
Entropy = −Σpᵢ log₂(pᵢ)
MSE for regression trees
No feature scaling needed
Max depth controls overfitting
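The two impurity formulas above can be worked through by hand on a tiny example. Below is a minimal sketch in plain Python; the helper names (`class_proportions`, `weighted_impurity`) and the toy labels are made up for illustration. A split is scored by the size-weighted impurity of its children, and the tree prefers splits that lower it the most.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of samples belonging to each class present in `labels`."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits: -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def weighted_impurity(left, right, impurity):
    """Size-weighted impurity of the two children produced by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)

# Toy node with a 50/50 class mix, and one candidate split of it.
parent = ["a"] * 5 + ["b"] * 5
left, right = ["a"] * 4 + ["b"], ["a"] + ["b"] * 4

print(f"parent: gini={gini(parent):.3f}  entropy={entropy(parent):.3f}")
print(f"after split: gini={weighted_impurity(left, right, gini):.3f}  "
      f"entropy={weighted_impurity(left, right, entropy):.3f}")
```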
Interactive Widget — Decision Boundary Visualizer
Controls: max depth (3), min samples per leaf (5), criterion (gini)
Displays: decision boundary (2D), feature importance, train accuracy, test accuracy, leaf nodes
Watch how deeper trees overfit the training data.
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris

# Load once and reuse both the data and the feature names
iris = load_iris()
X, y = iris.data, iris.target
dt = DecisionTreeClassifier(max_depth=3, criterion='gini',
                             random_state=42)
dt.fit(X, y)
# Text dump of the learned split rules
print(export_text(dt, feature_names=iris.feature_names))
print("Feature importances:", dt.feature_importances_.round(3))
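The example above fits on the full iris data, so it cannot show the overfitting that this section warns about. One possible way to see it is to hold out a test set and compare train and test accuracy as depth grows. The noisy two-moons dataset and the depth values below are assumptions chosen for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy 2D data, where a fully grown tree can memorize the training set.
X, y = make_moons(n_samples=400, noise=0.35, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [1, 3, None]:  # None lets the tree grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(Xtr, ytr)
    results[depth] = (tree.score(Xtr, ytr), tree.score(Xte, yte),
                      tree.get_n_leaves())

for depth, (tr, te, leaves) in results.items():
    print(f"max_depth={str(depth):>4}  train={tr:.3f}  test={te:.3f}  leaves={leaves}")
```

A growing gap between train and test accuracy as depth increases is the usual sign that the tree is fitting noise.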
03
XGBoost & Ensemble Methods
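Bagging (Random Forest) reduces variance by averaging many decorrelated trees, each trained on a bootstrap sample; boosting (XGBoost) reduces bias by fitting trees sequentially to the residuals of the current ensemble, with regularization to limit overfitting.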
Random Forest: bagging + random feature subsets
AdaBoost: reweights hard examples
Gradient Boosting: fits pseudo-residuals
XGBoost: gradient boosting + L1/L2 regularization + subsampling
Bias–variance tradeoff
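The "fit pseudo-residuals" step can be made concrete with a small from-scratch loop. This is a simplified sketch, not how XGBoost is implemented: it assumes squared loss (where the pseudo-residual is simply y minus the current prediction), one-split stumps on a single feature, and a hypothetical helper `fit_stump`.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a one-split regression stump to targets r on 1-D inputs x."""
    best = None
    for t in np.unique(x)[:-1]:  # candidate thresholds; right side is never empty
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, left_value, right_value = best
    return lambda q: np.where(q <= t, left_value, right_value)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 6.0, size=200)
y = np.sin(x) + rng.normal(scale=0.2, size=200)

learning_rate, n_rounds = 0.1, 50
pred = np.full_like(y, y.mean())  # F_0: constant model
mse_history = [float(np.mean((y - pred) ** 2))]

for _ in range(n_rounds):
    residuals = y - pred                      # pseudo-residuals for squared loss
    stump = fit_stump(x, residuals)           # weak learner fitted to the residuals
    pred = pred + learning_rate * stump(x)    # shrunken additive update
    mse_history.append(float(np.mean((y - pred) ** 2)))

print(f"train MSE: start={mse_history[0]:.3f}, "
      f"after {n_rounds} rounds={mse_history[-1]:.3f}")
```

Each round removes a little of the remaining error, which is how a sequence of weak, high-bias learners adds up to a low-bias model. The learning rate controls how much of each stump's correction is kept.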
Interactive Widget — Bias–Variance & Ensemble Power
Controls: number of trees (20), boosting learning rate (0.10), max depth per tree (3)
Displays: bias², variance, total error, OOB score
Compare how bagging and boosting reduce error differently.
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20,
                            random_state=42)
xgb = XGBClassifier(
    n_estimators=100, learning_rate=0.1,
    max_depth=4, subsample=0.8,
    colsample_bytree=0.8,
    random_state=42, verbosity=0,
    eval_metric='logloss'
)
scores = cross_val_score(xgb, X, y, cv=5, scoring='roc_auc')
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
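The XGBoost example covers the boosting side only. For the bagging side, a rough counterpart using scikit-learn's `RandomForestClassifier` might look like the sketch below; the dataset size and hyperparameters are illustrative assumptions. Because each tree sees only a bootstrap sample, the samples it never saw give a built-in validation estimate, the out-of-bag (OOB) score.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=42)

rf = RandomForestClassifier(
    n_estimators=100,
    max_features="sqrt",  # random feature subset considered at each split
    oob_score=True,       # score each sample using only trees that never saw it
    random_state=42,
)
rf.fit(X, y)
print(f"OOB accuracy: {rf.oob_score_:.3f}")
```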
04
SVM — Support Vector Machine
SVMs find the maximum-margin hyperplane; the kernel trick maps data to higher dimensions implicitly, allowing non-linear classification.
Margin = 2/‖w‖
RBF: K(x,z) = exp(−γ‖x−z‖²)
High C → narrower margin, overfitting risk
High γ → tight, local boundary
Support vectors lie on or inside the margin
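Both formulas above are short enough to evaluate directly. Here is a minimal plain-Python sketch, with hypothetical helper names `rbf_kernel` and `margin_width` and made-up points, showing how a larger γ makes the kernel similarity fall off faster with distance.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def margin_width(w):
    """Margin width 2 / ||w|| of a linear SVM with weight vector w."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

x, z = (0.0, 0.0), (1.0, 1.0)  # squared distance between them is 2
for gamma in [0.1, 0.5, 5.0]:
    print(f"gamma={gamma}: K(x, z) = {rbf_kernel(x, z, gamma):.4f}")

print(f"margin width for w=(3, 4): {margin_width((3.0, 4.0)):.2f}")
```

With a large γ, only very close points look similar to each other, so the decision boundary bends tightly around individual training points.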
Interactive Widget — SVM Margin & Kernel Explorer
Controls: C (regularization) 1.0, γ (RBF) 0.50, kernel RBF
Displays: decision boundary + margin, C vs margin width, support vectors, margin width, train accuracy
Try switching kernels to see how the boundary shape changes.
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
pipe = Pipeline([('sc', StandardScaler()), ('svm', SVC())])
grid = GridSearchCV(pipe,
    {'svm__C':[0.1,1,10], 'svm__kernel':['rbf','linear']},
    cv=5, scoring='accuracy')
grid.fit(X, y)
print("Best params:", grid.best_params_)
print("CV accuracy:", f"{grid.best_score_:.3f}")
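The grid search reports only the best parameters. To see what C does to the model itself, one option is to count support vectors at a few C values. The sketch below assumes an RBF kernel with a fixed `gamma=0.5` and three arbitrary C values; it reads the per-class counts from the fitted `SVC`'s `n_support_` attribute.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

n_support = {}
for C in [0.01, 1.0, 100.0]:
    pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C, gamma=0.5))
    pipe.fit(X, y)
    svc = pipe.named_steps["svc"]  # make_pipeline names steps after the class
    n_support[C] = int(svc.n_support_.sum())
    print(f"C={C:>6}: {n_support[C]} support vectors, "
          f"train acc={pipe.score(X, y):.3f}")
```

A small C tolerates many margin violations, so most points end up as support vectors; a large C penalizes violations heavily and usually leaves far fewer.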
05
k-NN & k-Means
k-NN classifies by majority vote among k nearest neighbors (supervised, lazy learner); k-Means clusters by minimizing within-cluster variance (unsupervised, iterative).
k-NN: no training; brute-force prediction costs O(n·d) per query
k-Means: minimizes WCSS (within-cluster sum of squares)
Elbow method for choosing k
Silhouette score (higher = better)
Always scale features!
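The k-NN half of this section has no training step at all, so the whole method fits in a few lines. The following is a brute-force sketch in plain Python with a made-up toy dataset; `knn_predict` is a hypothetical helper, not a library function. Each prediction scans all n training points, which is where the O(n·d) cost per query comes from.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training points closest to `query` (Euclidean)."""
    dists = [(math.dist(x, query), label) for x, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])          # nearest first
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy data: two well-separated groups.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
           (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["red", "red", "red", "blue", "blue", "blue"]

print(knn_predict(train_X, train_y, (2.0, 2.0)))
print(knn_predict(train_X, train_y, (6.0, 7.0)))
```

Because the vote depends entirely on distances, a feature measured on a much larger scale than the others would dominate the result, which is the reason for the scaling advice above.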
Interactive Widget — Clustering & k-NN Explorer
Controls: k (clusters) 3, spread / noise 1.0, points per cluster 60
Displays: cluster assignment, elbow / silhouette curve, WCSS, silhouette, iterations, k selected
Try the "Step k-Means" button to watch centroids move iteratively.
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import numpy as np

np.random.seed(42)
X = np.vstack([np.random.randn(100,2)+[0,0],
               np.random.randn(100,2)+[5,5],
               np.random.randn(100,2)+[0,5]])
X = StandardScaler().fit_transform(X)

for k in [2, 3, 4]:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    sil = silhouette_score(X, labels)
    print(f"k={k}: silhouette={sil:.3f}")
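The scikit-learn call hides the iteration itself. As a rough from-scratch sketch of the assign-then-update loop (Lloyd's algorithm), assuming random initial centroids drawn from the data and a hypothetical `kmeans` helper, the WCSS can be recorded at every step to confirm that it never increases.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns centroids, labels and WCSS per iteration."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    wcss_history = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wcss_history.append(float(d2[np.arange(len(X)), labels].sum()))
        # Update step: move each centroid to the mean of its points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels, wcss_history

# Three Gaussian blobs, similar in spirit to the example above.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=c, size=(100, 2)) for c in [(0, 0), (5, 5), (0, 5)]])

centroids, labels, wcss_history = kmeans(X, k=3)
print(f"converged after {len(wcss_history)} iterations, "
      f"final WCSS={wcss_history[-1]:.1f}")
```

Plain random initialization can settle in a poor local minimum, which is why the scikit-learn example above uses `n_init=10` restarts.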