Machine Learning — Interactive Guide

01

Linear & Logistic Regression

Linear regression predicts continuous values by ordinary least squares (OLS); logistic regression predicts class probabilities by passing a linear score through the sigmoid function. Both are interpretable baselines.

y = Xβ
minimize MSE
sigmoid(z) = 1/(1+e⁻ᶻ)
L1 (Lasso) → sparse weights
L2 (Ridge) → shrinks all weights
e^β = odds ratio
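As a minimal sketch of what "minimize MSE" and "L2 Ridge → shrinks" mean for a single feature: the closed-form OLS slope is cov(x, y)/var(x), and an L2 penalty on the slope adds a constant to the denominator, pulling the slope toward zero. The data points and the helper names `fit_line` and `mse` are illustrative assumptions, not part of the guide.

```python
def fit_line(xs, ys, l2=0.0):
    """Least-squares line y = slope * x + intercept.

    l2 > 0 adds a ridge penalty on the slope only (the intercept is unpenalized).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + l2)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mse(xs, ys, slope, intercept):
    """Mean squared error of the line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data, roughly y = 2x + 1 with small noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

ols_slope, ols_intercept = fit_line(xs, ys)
ridge_slope, ridge_intercept = fit_line(xs, ys, l2=5.0)

print(f"OLS:   slope={ols_slope:.3f}  MSE={mse(xs, ys, ols_slope, ols_intercept):.4f}")
print(f"Ridge: slope={ridge_slope:.3f}  MSE={mse(xs, ys, ridge_slope, ridge_intercept):.4f}")
```

The ridge fit trades a higher training MSE for a smaller coefficient, which is the shrinkage the L2 entry above refers to.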

Interactive widget: Regression Explorer (with audio narration). Controls: noise level (shown at 0.8), regularization C (shown at 1.0). Panels: scatter plot with fit line, residuals. Readouts: MSE, R², slope β, intercept.

Adjust noise and regularization to see how the model adapts.

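import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic binary classification problem: 10 features, 5 of them informative
X, y = make_classification(n_samples=1000, n_features=10,
                            n_informative=5, random_state=42)
# Fix random_state so the split (and the reported AUC) is reproducible
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

# C is the inverse regularization strength; the L2 penalty is scikit-learn's default
model = LogisticRegression(C=1.0, max_iter=200)
model.fit(Xtr, ytr)
prob = model.predict_proba(Xte)[:, 1]  # probability of the positive class
print(f"AUC-ROC: {roc_auc_score(yte, prob):.3f}")
# Rank coefficients by absolute size; slicing [:3] alone would only take the first three features
top = np.argsort(np.abs(model.coef_[0]))[::-1][:3]
print("Top coefs:", {int(i): round(float(model.coef_[0][i]), 3) for i in top})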
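The "e^β = odds ratio" entry can be checked numerically. The sketch below uses a hypothetical one-feature logistic model (the values of `beta0` and `beta1` are made up) and shows that raising the feature by one unit multiplies the odds p/(1−p) by e^β, whatever the starting point.

```python
import math


def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)


# Hypothetical intercept and coefficient, chosen only for illustration
beta0, beta1 = -1.0, 0.7

p_at_2 = sigmoid(beta0 + beta1 * 2.0)  # predicted probability at x = 2
p_at_3 = sigmoid(beta0 + beta1 * 3.0)  # predicted probability at x = 3

odds_ratio = odds(p_at_3) / odds(p_at_2)
print(f"odds ratio: {odds_ratio:.4f}   exp(beta1): {math.exp(beta1):.4f}")
```

Note that the probability itself does not change by a fixed amount per unit of x; only the odds scale by a constant factor.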

02

Decision Trees & Splitting

Trees recursively split the data to maximize node purity. They are interpretable and powerful, but prone to overfitting unless depth or minimum-sample constraints are applied.

Gini = 1 − Σpᵢ²
Entropy = −Σpᵢ log₂(pᵢ)
MSE for regression trees
No feature scaling needed
Max depth controls overfitting
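The two impurity formulas above are short enough to write out directly. This is a minimal sketch; the function names `gini` and `entropy` are illustrative, and each takes a list of class proportions that sum to 1.

```python
import math


def gini(probs):
    """Gini impurity: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in probs)


def entropy(probs):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)), skipping empty classes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


for probs in ([1.0, 0.0], [0.9, 0.1], [0.5, 0.5]):
    print(probs, f"gini={gini(probs):.3f}", f"entropy={entropy(probs):.3f}")
```

Both measures are zero for a pure node and largest for an even class mix, which is why a tree that minimizes either one ends up with similar splits in most cases.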

Interactive widget: Decision Boundary Visualizer (with audio narration). Controls: max depth (shown at 3), min samples per leaf (shown at 5), criterion (gini). Panels: decision boundary (2D), feature importance. Readouts: train accuracy, test accuracy, leaf nodes.

Watch how deeper trees overfit the training data.

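from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris

iris = load_iris()  # load once and reuse for both the data and the feature names
X, y = iris.data, iris.target
# max_depth=3 caps tree growth, the main guard against overfitting
dt = DecisionTreeClassifier(max_depth=3, criterion='gini',
                             random_state=42)
dt.fit(X, y)
# Print the learned split rules as indented text
print(export_text(dt, feature_names=iris.feature_names))
print("Feature importances:", dt.feature_importances_.round(3))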
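To make "split data to maximize purity" concrete, here is a rough sketch of the search a tree performs at a single node for one numeric feature: try each midpoint between adjacent values and keep the threshold with the lowest weighted Gini impurity. The helper names and the tiny dataset are assumptions for illustration; a real implementation such as scikit-learn's is far more optimized, and a full tree repeats this search over every feature and then recurses into each child.

```python
def gini_from_labels(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split 'value <= threshold'.

    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini_from_labels(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini_from_labels(left)
                 + len(right) * gini_from_labels(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Made-up feature values and class labels
feature = [1.4, 1.3, 1.5, 4.7, 4.5, 4.9, 6.0, 5.1]
classes = ["a", "a", "a", "b", "b", "b", "c", "c"]

threshold, impurity = best_split(feature, classes)
print(f"best threshold: {threshold}, weighted Gini after split: {impurity:.3f}")
```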

03

XGBoost & Ensemble Methods

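Bagging (Random Forest) reduces variance by averaging many decorrelated trees trained on bootstrap samples; boosting (XGBoost) reduces bias by sequentially fitting each new tree to the residuals of the current ensemble, with regularization to limit overfitting.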

Random Forest: bagging + random features
AdaBoost: reweight hard examples
Gradient Boosting: fit pseudo-residuals
XGBoost: GB + L1/L2 + subsampling
Bias–Variance tradeoff
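"Fit pseudo-residuals" can be sketched in a few lines for the squared-error case, where the pseudo-residual is simply y minus the current prediction. The sketch below is a toy gradient-boosting loop with one-split regression stumps on made-up 1-D data; the names `fit_stump` and `boost` are hypothetical, and XGBoost itself adds second-order information, regularization terms, and subsampling that are not shown here.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals under squared error."""
    best = None
    ordered = sorted(set(xs))
    for lo, hi in zip(ordered, ordered[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        sse = (sum((r - left_value) ** 2 for r in left)
               + sum((r - right_value) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_value, right_value)
    _, t, left_value, right_value = best
    return lambda x: left_value if x <= t else right_value


def boost(xs, ys, n_rounds=20, learning_rate=0.1):
    """Gradient boosting for squared loss; returns predictions and per-round MSE."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    preds = [base] * len(xs)
    history = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]   # pseudo-residuals
        stump = fit_stump(xs, residuals)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return preds, history


# Made-up step-like data
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
ys = [0.1, 0.3, 0.2, 1.9, 2.1, 2.0, 4.2, 3.9]

preds, history = boost(xs, ys)
print(f"training MSE after round 1: {history[0]:.4f}, after round 20: {history[-1]:.4f}")
```

A smaller `learning_rate` makes each round's correction more cautious, so more rounds are needed to reach the same training error.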

Interactive widget: Bias–Variance & Ensemble Power (with audio narration). Controls: number of trees (shown at 20), learning rate for boosting (shown at 0.10), max depth per tree (shown at 3). Readouts: bias², variance, total error, OOB score.

Compare how bagging and boosting reduce error differently.

from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20,
                            random_state=42)
xgb = XGBClassifier(
    n_estimators=100, learning_rate=0.1,   # 100 boosting rounds, each step shrunk by 0.1
    max_depth=4, subsample=0.8,            # shallow trees, 80% of rows sampled per tree
    colsample_bytree=0.8,                  # 80% of features sampled per tree
    random_state=42, verbosity=0,
    eval_metric='logloss'
)
# 5-fold cross-validated ROC AUC, reported as mean +/- standard deviation
scores = cross_val_score(xgb, X, y, cv=5, scoring='roc_auc')
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
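The variance-reduction side of the comparison can be illustrated with a small simulation. This is a simplified stand-in, not a real forest: each "model" is just a unit-variance random error, and `rho` is an assumed pairwise correlation between the errors. For such errors the variance of the average is ρ + (1 − ρ)/n, so correlation sets a floor that adding more models cannot remove. That is the motivation for the random feature selection in Random Forest.

```python
import random
import statistics


def ensemble_variance(n_models, rho, n_trials=4000, seed=0):
    """Empirical variance of the mean of n_models unit-variance errors with correlation rho."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, 1.0)  # error component common to every model
        errors = [
            (rho ** 0.5) * shared + ((1.0 - rho) ** 0.5) * rng.gauss(0.0, 1.0)
            for _ in range(n_models)
        ]
        averages.append(sum(errors) / n_models)
    return statistics.pvariance(averages)


single = ensemble_variance(n_models=1, rho=0.0)
independent = ensemble_variance(n_models=20, rho=0.0)
correlated = ensemble_variance(n_models=20, rho=0.5)

print(f"1 model:                  {single:.3f}")
print(f"20 models, uncorrelated:  {independent:.3f}")
print(f"20 models, rho = 0.5:     {correlated:.3f}")
```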

04

SVM — Support Vector Machine

SVMs find the maximum-margin hyperplane; the kernel trick maps data to higher dimensions implicitly, allowing non-linear classification.

Margin = 2/‖w‖
RBF: K(x,z) = exp(−γ‖x−z‖²)
C high → overfit risk
γ high → tight local boundary
Support vectors lie on or inside the margin
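The margin and RBF formulas above can be evaluated directly. In this small sketch the function names, the two points, and the weight vector are made up for illustration; it shows why a large γ gives a "tight local" boundary: the kernel similarity between two points at a fixed distance collapses toward zero as γ grows.

```python
import math


def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def margin_width(w):
    """Margin width 2 / ||w|| for a separating hyperplane with weight vector w."""
    return 2.0 / math.sqrt(sum(c * c for c in w))


x, z = (0.0, 0.0), (1.0, 1.0)          # two points at squared distance 2
k_smooth = rbf_kernel(x, z, gamma=0.1)  # small gamma: distant points still look similar
k_tight = rbf_kernel(x, z, gamma=5.0)   # large gamma: similarity is almost zero

width = margin_width((3.0, 4.0))        # ||w|| = 5

print(f"K with gamma=0.1: {k_smooth:.4f}")
print(f"K with gamma=5.0: {k_tight:.6f}")
print(f"margin width for w=(3, 4): {width:.2f}")
```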

Interactive widget: SVM Margin & Kernel Explorer (with audio narration). Controls: C regularization (shown at 1.0), γ for the RBF kernel (shown at 0.50), kernel (RBF). Panels: decision boundary with margin, C vs margin width. Readouts: support vectors, margin width, train accuracy.

Try switching kernels to see how the boundary shape changes.

from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

# Two interleaving half-moons: not linearly separable
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
# Scaling lives inside the pipeline so each CV fold is standardized on its own training split
pipe = Pipeline([('sc', StandardScaler()), ('svm', SVC())])
# Search over regularization strength and kernel; 'svm__' routes each parameter to the SVC step
grid = GridSearchCV(pipe,
    {'svm__C':[0.1,1,10], 'svm__kernel':['rbf','linear']},
    cv=5, scoring='accuracy')
grid.fit(X, y)
print("Best params:", grid.best_params_)
print("CV accuracy:", f"{grid.best_score_:.3f}")

05

k-NN & k-Means

k-NN classifies by majority vote among k nearest neighbors (supervised, lazy learner); k-Means clusters by minimizing within-cluster variance (unsupervised, iterative).

k-NN: no training, O(n·d) per prediction (brute force)
k-Means: minimize WCSS
Elbow method for k
Silhouette score (higher = better)
Always scale features!
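A brute-force k-NN classifier fits in a few lines, which makes the "no training, O(n·d) predict" entry concrete: nothing is learned up front, and every prediction measures the distance from the query to all n stored points. The function name `knn_predict` and the toy points are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance to every stored point: n distance computations of d dimensions each
    dists = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D points in two well-separated groups
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["blue", "blue", "blue", "red", "red", "red"]

print(knn_predict(points, labels, (0.1, 0.1)))
print(knn_predict(points, labels, (5.0, 5.2)))
```

Because the vote depends on raw Euclidean distance, a feature measured on a larger scale would dominate the neighbor search, which is why the list above insists on scaling.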

Interactive widget: Clustering & k-NN Explorer (with audio narration). Controls: k clusters (shown at 3), spread / noise (shown at 1.0), points per cluster (shown at 60). Panels: cluster assignment, elbow / silhouette curve. Readouts: WCSS, silhouette, iterations, k selected.

Try the "Step k-Means" button to watch centroids move iteratively.

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import numpy as np

# Three Gaussian blobs of 100 points each, centered at (0,0), (5,5) and (0,5)
np.random.seed(42)
X = np.vstack([np.random.randn(100,2)+[0,0],
               np.random.randn(100,2)+[5,5],
               np.random.randn(100,2)+[0,5]])
X = StandardScaler().fit_transform(X)  # k-Means is distance-based, so scale first

# Compare candidate k values; the highest silhouette suggests the best-separated clustering
for k in [2, 3, 4]:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    sil = silhouette_score(X, labels)
    print(f"k={k}: silhouette={sil:.3f}")
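The iterative centroid movement can also be written out by hand. The following is a minimal sketch of the standard assign-then-update loop (Lloyd's algorithm) with a simple random initialization; it is not scikit-learn's implementation, which uses k-means++ seeding and multiple restarts. The function name `kmeans`, the seeds, and the generated blobs are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, n_iters=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, WCSS per iteration)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    history = []
    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        # Within-cluster sum of squares for the current assignment
        wcss = sum(math.dist(p, new_centroids[lab]) ** 2
                   for p, lab in zip(points, labels))
        history.append(wcss)
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, labels, history


# Three made-up Gaussian blobs of 30 points each
data_rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in centers for _ in range(30)]

centroids, labels, history = kmeans(points, k=3)
print("WCSS per iteration:", [round(w, 2) for w in history])
```

WCSS never increases from one iteration to the next, but the loop can still settle in a poor local minimum depending on the starting centroids, which is why library implementations rerun it from several initializations.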
