01
A/B Testing — Sample Size & Analysis
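Rigorous A/B testing requires computing the sample size before the experiment starts and running the significance test only once that sample has been collected. Checking results repeatedly and stopping at the first significant p-value (peeking) pushes the false positive rate above the nominal α, and it is a common cause of false positive experiment results in industry.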
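To make the peeking problem concrete, here is a minimal sketch that simulates A/A tests (both variants share the same true rate, so any significant result is a false positive). It compares stopping at the first significant look against testing once at the end. The function names, the number of looks, and the batch size are illustrative assumptions, and it uses a pooled two-proportion z-test from the standard library rather than any particular experimentation platform's method.

```python
import math
import random
from statistics import NormalDist


def two_prop_p_value(x_a, n_a, x_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # identical all-0 or all-1 arms carry no evidence
    z = (x_b / n_b - x_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rates(rate=0.12, n_per_look=400, looks=10, runs=300,
                         alpha=0.05, seed=0):
    """Return (rate with peeking, rate with one final test) over simulated A/A tests."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for run in range(runs):
        x_a = x_b = n = 0
        any_significant = False
        p = 1.0
        for look in range(looks):
            # Collect one more batch of users per arm, then "peek" at the p-value
            x_a += sum(rng.random() < rate for _ in range(n_per_look))
            x_b += sum(rng.random() < rate for _ in range(n_per_look))
            n += n_per_look
            p = two_prop_p_value(x_a, n, x_b, n)
            any_significant = any_significant or p < alpha
        peeking_hits += any_significant  # would have stopped at some look
        fixed_hits += p < alpha          # tests only at the planned sample size
    return peeking_hits / runs, fixed_hits / runs


peek_rate, fixed_rate = false_positive_rates()
print(f"False positive rate with peeking:    {peek_rate:.3f}")
print(f"False positive rate, single test:    {fixed_rate:.3f}")
```

With these assumed settings the single-test rate should stay near α, while the peeking rate typically lands well above it, since every extra look is another chance to cross the threshold by luck.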
Interactive Widget — Sample Size Calculator & Experiment Simulator
Baseline CTR 12%
Min detectable effect (MDE) 2%
Statistical power 80%
Significance α 0.05
Daily traffic per variant 5,000
True lift (simulate) 2%
Charts: simulated experiment p-value trajectory over time (legend: p-value over time, α = significance threshold, required n boundary; the peeking line marks the danger zone); power curve of sample size vs detectable effect; null vs alternative sampling distributions.
Outputs: required n per variant, days to run, simulated p, result, peeking risk.
Adjust baseline rate and MDE to compute required sample size.
import numpy as np
from scipy import stats


def sample_size(baseline, mde, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion test; mde is an absolute lift."""
    p1, p2 = baseline, baseline + mde
    pooled = (p1 + p2) / 2
    za = stats.norm.ppf(1 - alpha / 2)  # two-tailed critical value
    zb = stats.norm.ppf(power)          # quantile for the desired power
    n = (za * np.sqrt(2 * pooled * (1 - pooled))
         + zb * np.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / (p2 - p1) ** 2
    return int(np.ceil(n))


n = sample_size(baseline=0.12, mde=0.02)
print(f"Required n per variant: {n:,}")

# Simulate one experiment at exactly the planned sample size
np.random.seed(42)
ctrl = np.random.binomial(1, 0.12, n)
trt = np.random.binomial(1, 0.14, n)  # +2 percentage points (absolute lift)

# t-test on 0/1 outcomes; at large n it closely approximates a two-proportion z-test
t, p = stats.ttest_ind(ctrl, trt)
print(f"CTR ctrl={ctrl.mean():.3f}, trt={trt.mean():.3f}")
print(f"p-value: {p:.4f} -> {'Significant' if p < 0.05 else 'Not significant'}")
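As a companion to the listing above, the sketch below shows two small pieces the widget reports: the days needed to reach the required sample size at a given daily traffic, and a pooled two-proportion z-test, a common alternative to a t-test for conversion data. The function names and the example counts are hypothetical, and only the Python standard library is used.

```python
import math
from statistics import NormalDist


def days_to_run(required_n, daily_traffic_per_variant):
    """Whole days needed for each variant to reach the required sample size."""
    return math.ceil(required_n / daily_traffic_per_variant)


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) using the pooled standard error."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical experiment: 12% vs 14% CTR observed on 5,000 users per variant
z, p = two_proportion_ztest(conv_a=600, n_a=5000, conv_b=700, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")

# Hypothetical plan: 12,000 users needed per variant at 5,000 users per day
print(f"Days to run: {days_to_run(12_000, 5_000)}")
```

Rounding the duration up matters: stopping on the day the sample is only partly collected reintroduces an underpowered test.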
02
Fraud Detection — ML System Design
The 8-step ML design framework applied to fraud detection, condensed here to formulate → features → model → eval → serve → monitor. Each step carries distinct tradeoffs in a production system.
Interactive Widget — Fraud ML System Design & Live Scoring Simulator
8-step ML design framework
Decision threshold — business cost tradeoff
Fraud score threshold 0.50
FP cost / FN cost ratio 1:10
Feature importance (GBM model)
Live transaction scoring — adjust features to see risk score change
Transaction amount ($) $250
Velocity (txns last 1h) 1
Merchant risk score 0.20
Is international? No
Hour of day 14:00
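To illustrate how feature values turn into a risk score and a decision, here is a minimal sketch. It is not the widget's model: as an assumption, it reuses the logistic coefficients from the synthetic data generator in this section's Python listing, and the `fraud_score` and `decide` names and the BLOCK/ALLOW labels are hypothetical. Hour of day is left out because that generator gives it no weight.

```python
import math


def fraud_score(amount, velocity_1h, is_international, merchant_risk):
    """Logistic score built from the synthetic generator's coefficients (an assumption)."""
    logit = (-5 + 0.02 * amount + 0.5 * velocity_1h
             + 2 * is_international + merchant_risk)
    return 1 / (1 + math.exp(-logit))


def decide(score, threshold=0.50):
    """Flag the transaction when the score reaches the decision threshold."""
    return "BLOCK" if score >= threshold else "ALLOW"


# Default slider values from the widget: $250, 1 txn in the last hour,
# merchant risk 0.20, not international
score = fraud_score(amount=250, velocity_1h=1, is_international=0, merchant_risk=0.20)
print(f"score = {score:.3f}, decision = {decide(score)}")
```

Raising any one input while holding the others fixed increases the score monotonically, which is the behavior the sliders are meant to show; a trained gradient boosting model would not generally be this simple.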
System monitoring signals
Outputs: fraud score, decision, latency SLA, daily FP cost.
Adjust the threshold slider — watch precision and recall trade off.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

np.random.seed(42)
n = 10000

# Synthetic transaction features
df = pd.DataFrame({
    'amount': np.random.exponential(100, n),
    'velocity_1h': np.random.poisson(2, n),
    'is_international': np.random.binomial(1, 0.1, n),
    'merchant_risk': np.random.uniform(0, 1, n),
    'hour_of_day': np.random.randint(0, 24, n),
})

# Ground-truth fraud probability from a logistic model (hour_of_day carries no signal)
fraud_p = 1 / (1 + np.exp(-(-5 + 0.02 * df['amount'] + 0.5 * df['velocity_1h']
                            + 2 * df['is_international'] + df['merchant_risk'])))
df['fraud'] = np.random.binomial(1, fraud_p)
print(f"Fraud rate: {df['fraud'].mean():.2%}")

# Stratified split keeps the fraud rate equal in train and test
X, y = df.drop('fraud', axis=1), df['fraud']
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, test_size=0.2)

clf = GradientBoostingClassifier(n_estimators=100, random_state=42)
clf.fit(Xtr, ytr)

# Score the held-out set; average precision summarizes the precision-recall curve
yp = clf.predict_proba(Xte)[:, 1]
print(f"AUC-PR: {average_precision_score(yte, yp):.4f}")

feat_imp = pd.Series(clf.feature_importances_, index=X.columns)
print("Top features:\n", feat_imp.sort_values(ascending=False).head(3))
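The threshold and cost-ratio controls in this section can be sketched as a small cost sweep: for each candidate threshold, count false positives and false negatives, weight them by their costs, and keep the cheapest threshold. The helper names, the 0.01 grid, and the toy scores and labels below are made up for illustration; the 1:10 cost ratio follows the widget's default.

```python
def evaluate(scores, labels, threshold, fp_cost=1.0, fn_cost=10.0):
    """Return (precision, recall, total cost) when flagging scores >= threshold."""
    tp = fp = fn = 0
    for s, y in zip(scores, labels):
        flagged = s >= threshold
        if flagged and y:
            tp += 1
        elif flagged and not y:
            fp += 1
        elif not flagged and y:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, fp * fp_cost + fn * fn_cost


def best_threshold(scores, labels, fp_cost=1.0, fn_cost=10.0):
    """Lowest-cost threshold on a 0.01 grid (the first one wins on ties)."""
    grid = [i / 100 for i in range(1, 100)]
    return min(grid, key=lambda t: evaluate(scores, labels, t, fp_cost, fn_cost)[2])


# Toy data, made up for illustration: model scores and true fraud labels
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.80, 0.95]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1]

precision, recall, cost = evaluate(scores, labels, threshold=0.50)
print(f"threshold 0.50: precision={precision:.2f}, recall={recall:.2f}, cost={cost:.0f}")

t_star = best_threshold(scores, labels)
print(f"lowest-cost threshold: {t_star:.2f}, cost={evaluate(scores, labels, t_star)[2]:.0f}")
```

When a missed fraud costs ten times a false alarm, the cheapest threshold tends to sit below 0.50, trading precision for recall. With the listing above, `scores` would be `yp` and `labels` would be `yte`.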
