01
Linear & Logistic Regression
Linear regression predicts continuous values by fitting coefficients with ordinary least squares (OLS). Logistic regression predicts class probabilities by passing a linear score through the sigmoid function. Both are interpretable baselines.
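To make the OLS fit concrete, here is a minimal NumPy sketch on synthetic data. The true slope 2.0, intercept 1.0 and noise level 0.8 are arbitrary illustrative choices. It solves the least-squares problem with `np.linalg.lstsq`, which gives the same coefficients as the normal equations X^T X beta = X^T y when X^T X is invertible, and then computes MSE and R².

```python
import numpy as np

# Synthetic data (illustrative): y = 2.0 * x + 1.0 + Gaussian noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.8, size=200)

# Design matrix: a column of ones (intercept) next to the feature
X = np.column_stack([np.ones_like(x), x])

# OLS minimizes ||y - X @ beta||^2; lstsq solves it in a numerically stable way
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = beta

# Goodness of fit
y_hat = X @ beta
mse = np.mean((y - y_hat) ** 2)
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"slope={slope:.3f} intercept={intercept:.3f} MSE={mse:.3f} R2={r2:.3f}")
```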
```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Synthetic binary classification problem
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=42)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=42)

# C is the inverse regularization strength: smaller C means a stronger L2 penalty
model = LogisticRegression(C=1.0, penalty='l2', max_iter=200)
model.fit(Xtr, ytr)

prob = model.predict_proba(Xte)[:, 1]  # P(y = 1) for each test row
print(f"AUC-ROC: {roc_auc_score(yte, prob):.3f}")
print("First 3 coefs:", model.coef_[0][:3].round(3))
```
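In the binary case, the probabilities from `predict_proba` should be the sigmoid of the linear score `X @ coef_ + intercept_`. The sketch below checks this numerically on a small synthetic dataset (sizes and seed are arbitrary); `sigmoid` is a helper defined here, not a library function.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = LogisticRegression(C=1.0, max_iter=200).fit(X, y)

def sigmoid(z):
    # Logistic function: maps any real score into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Linear score z = X w + b, then squash it with the sigmoid
z = X @ model.coef_[0] + model.intercept_[0]
p_manual = sigmoid(z)
p_sklearn = model.predict_proba(X)[:, 1]
print("max abs difference:", np.max(np.abs(p_manual - p_sklearn)))
```

The coefficients are therefore read on the log-odds scale: a unit increase in feature j changes log(p / (1 - p)) by `coef_[0][j]`.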
02
Decision Trees & Splitting
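Decision trees recursively split the data on feature thresholds to maximize node purity (for example, by minimizing Gini impurity or entropy). They are interpretable and powerful but prone to overfitting unless depth and leaf-size constraints are applied.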
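A minimal sketch of how a single split is chosen with the Gini criterion. `gini` and `best_split` are illustrative helpers, not scikit-learn internals: they try midpoints between sorted distinct values of one feature and keep the threshold with the lowest size-weighted child impurity. A full tree repeats this search over every feature at every node, recursively.

```python
import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum_k p_k^2, which is 0 for a pure node
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Search thresholds on one feature for the lowest weighted child impurity."""
    best_thr, best_score = None, np.inf
    values = np.unique(x)
    # Candidate thresholds: midpoints between consecutive distinct values
    for thr in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= thr], y[x > thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_thr, best_score = thr, score
    return best_thr, best_score

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])
thr, score = best_split(x, y)
print(f"parent gini={gini(y):.3f}, best threshold={thr}, weighted child gini={score:.3f}")
```

On this toy data the parent impurity is 0.5 and the threshold 3.5 separates the classes perfectly, so the weighted child impurity drops to 0.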
```python
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.datasets import load_iris

iris = load_iris()  # load once; reuse for both data and feature names
X, y = iris.data, iris.target

dt = DecisionTreeClassifier(max_depth=3, criterion='gini', random_state=42)
dt.fit(X, y)

# Text rendering of the learned splits
print(export_text(dt, feature_names=iris.feature_names))
print("Feature importances:", dt.feature_importances_.round(3))
```
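To see the overfitting described above, compare train and test accuracy across `max_depth` values. This sketch uses a synthetic dataset with about 10% of labels flipped (`flip_y=0.1`, an arbitrary choice) so that memorizing noise hurts; exact scores will vary with the data and seed.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           flip_y=0.1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

results = {}
for depth in [2, 4, 8, None]:  # None: grow until every leaf is pure
    dt = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(Xtr, ytr)
    train_acc, test_acc = dt.score(Xtr, ytr), dt.score(Xte, yte)
    results[depth] = (train_acc, test_acc, dt.get_n_leaves())
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}, "
          f"leaves={dt.get_n_leaves()}")
```

With continuous features, the unconstrained tree reaches 100% training accuracy by giving noisy points their own leaves, while test accuracy usually peaks at a moderate depth.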
03
XGBoost & Ensemble Methods
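Bagging (e.g., Random Forest) reduces variance by averaging many decorrelated trees, each trained on a bootstrap sample of the data. Boosting (e.g., XGBoost) reduces bias by fitting trees sequentially, each one correcting the residual errors (gradients of the loss) of the ensemble so far, with regularization to limit overfitting.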
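The idea of sequentially fitting residuals can be sketched with plain gradient boosting for squared loss, using scikit-learn regression trees as weak learners. This is a simplification, not XGBoost's algorithm: XGBoost additionally uses second-order gradient information and regularized leaf weights. The sine-shaped data, learning rate 0.1, 50 trees and depth 3 are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

learning_rate, n_trees = 0.1, 50
pred = np.full_like(y, y.mean())          # start from a constant model
train_mse = [np.mean((y - pred) ** 2)]
trees = []
for _ in range(n_trees):
    # For the loss 1/2 * (y - f)^2, the residual y - f is exactly the negative gradient
    residuals = y - pred
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
    pred += learning_rate * tree.predict(X)   # shrink each tree's contribution
    trees.append(tree)
    train_mse.append(np.mean((y - pred) ** 2))

print(f"train MSE: {train_mse[0]:.3f} -> {train_mse[-1]:.3f}")
```

A small learning rate means each tree corrects only part of the remaining error, so more trees are needed; this shrinkage is one of the regularization levers boosting relies on.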
```python
from xgboost import XGBClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

xgb = XGBClassifier(
    n_estimators=100,
    learning_rate=0.1,       # shrinkage applied to each new tree
    max_depth=4,
    subsample=0.8,           # row subsampling per tree
    colsample_bytree=0.8,    # feature subsampling per tree
    random_state=42,
    verbosity=0,
    eval_metric='logloss',
)

# 5-fold cross-validated ROC AUC
scores = cross_val_score(xgb, X, y, cv=5, scoring='roc_auc')
print(f"AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```
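The XGBoost snippet covers the boosting side. For the bagging side, a Random Forest can report an out-of-bag (OOB) score: each tree is evaluated on the training rows left out of its bootstrap sample, which gives a validation estimate without a separate holdout set. A minimal sketch on the same synthetic data; `n_estimators=100` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)

# bootstrap=True (the default) is required for OOB scoring
rf = RandomForestClassifier(n_estimators=100, bootstrap=True,
                            oob_score=True, random_state=42)
rf.fit(X, y)
print(f"OOB accuracy: {rf.oob_score_:.3f}")
```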
04
SVM — Support Vector Machine
SVMs find the maximum-margin hyperplane. The kernel trick computes inner products in a higher-dimensional feature space without ever constructing that space explicitly, which allows non-linear decision boundaries.
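A small numerical check of the kernel trick, assuming 2-D inputs: the degree-2 polynomial kernel k(a, b) = (a · b)^2 equals the dot product of the explicit feature map phi(x) = [x1^2, sqrt(2) x1 x2, x2^2], so an SVM can use k without computing phi. The RBF kernel exp(-gamma ||a - b||^2), whose implicit feature space is infinite-dimensional, is compared against scikit-learn's `rbf_kernel`. The vectors and gamma = 0.5 are arbitrary.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def phi(v):
    # Explicit degree-2 feature map for a 2-D input (no bias term)
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(a, b):
    # Homogeneous polynomial kernel of degree 2: (a . b)^2
    return np.dot(a, b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])
print("kernel:", poly_kernel(a, b), "explicit:", np.dot(phi(a), phi(b)))  # both 16.0 (up to rounding)

# RBF kernel: computed directly vs. via scikit-learn
gamma = 0.5
K = rbf_kernel(a.reshape(1, -1), b.reshape(1, -1), gamma=gamma)[0, 0]
manual = np.exp(-gamma * np.sum((a - b) ** 2))
print("rbf:", K, manual)
```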
```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

# Scaling inside the pipeline fits the scaler on each CV training split only
pipe = Pipeline([('sc', StandardScaler()), ('svm', SVC())])
param_grid = {'svm__C': [0.1, 1, 10], 'svm__kernel': ['rbf', 'linear']}
grid = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print("Best params:", grid.best_params_)
print(f"CV accuracy: {grid.best_score_:.3f}")
```
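The soft-margin trade-off controlled by C is easiest to see with a linear kernel, where `coef_` exposes the weight vector w and the margin width is 2 / ||w||. A sketch on two overlapping blobs (`cluster_std=2.0` and the C values are illustrative): smaller C tolerates more margin violations and gives a wider margin, while larger C narrows it.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

widths = {}
for C in [0.01, 1, 100]:
    svm = SVC(kernel='linear', C=C).fit(X, y)
    w = svm.coef_[0]                    # coef_ exists only for the linear kernel
    widths[C] = 2 / np.linalg.norm(w)   # distance between the two margin hyperplanes
    print(f"C={C}: margin width={widths[C]:.3f}, "
          f"support vectors={svm.n_support_.sum()}")
```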
05
k-NN & k-Means
k-NN classifies by majority vote among k nearest neighbors (supervised, lazy learner); k-Means clusters by minimizing within-cluster variance (unsupervised, iterative).
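A from-scratch sketch of the k-NN majority vote, assuming Euclidean distance and a tiny hand-made dataset; `knn_predict` is a hypothetical helper, not a library API. There is no training step: all the work happens at query time, which is what makes k-NN a lazy learner.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_query, k=3):
    preds = []
    for q in X_query:
        dists = np.linalg.norm(X_train - q, axis=1)   # distance to every training point
        nearest = np.argsort(dists)[:k]               # indices of the k closest points
        vote = Counter(y_train[nearest]).most_common(1)[0][0]  # majority label
        preds.append(vote)
    return np.array(preds)

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [5.5, 5.5]])
print(knn_predict(X_train, y_train, X_query, k=3))  # [0 1]
```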
```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Three Gaussian blobs centred at (0, 0), (5, 5) and (0, 5)
np.random.seed(42)
X = np.vstack([
    np.random.randn(100, 2) + [0, 0],
    np.random.randn(100, 2) + [5, 5],
    np.random.randn(100, 2) + [0, 5],
])
X = StandardScaler().fit_transform(X)

# Compare candidate k values by silhouette score (higher is better)
for k in [2, 3, 4]:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    sil = silhouette_score(X, labels)
    print(f"k={k}: silhouette={sil:.3f}")
```
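`KMeans` hides its iterations. This sketch of Lloyd's algorithm exposes them: it picks random initial centroids, then alternates an assignment step and a mean-update step, recording the within-cluster sum of squares (WCSS) after each assignment. `kmeans_lloyd` is a hypothetical helper, and scikit-learn's default k-means++ initialization is not reproduced here.

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    wcss_history = []
    for _ in range(n_iter):
        # Assignment step: squared distance to each centroid, pick the nearest
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        wcss_history.append(d2[np.arange(len(X)), labels].sum())
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels, wcss_history

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(size=(100, 2)) + c for c in ([0, 0], [5, 5], [0, 5])])
centroids, labels, wcss = kmeans_lloyd(X, k=3)
print(f"iterations={len(wcss)}, final WCSS={wcss[-1]:.2f}")
```

Neither step can increase the WCSS, so the recorded history never goes up; the algorithm still only reaches a local minimum, which is why `KMeans` runs several initializations (`n_init`).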
