09 — Business & ML Design

Build Systems That Work

Interactive playgrounds for A/B test sample sizing and fraud detection ML system design. Simulate experiments, detect peeking bias, and walk through the 8-step design framework. Press ▶ for full audio explanations.

01
A/B Testing — Sample Size & Analysis
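Rigorous A/B testing means computing the required sample size before the experiment starts and analyzing the result only once that size is reached. Checking the p-value repeatedly and stopping at the first significant reading (peeking) inflates the false positive rate, and it is a common source of spurious wins in industry experiments.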
Sample size from baseline, MDE, α, power
Run full weeks; avoid early stopping
One primary metric + guardrails
Network effects → cluster randomization
Multiple testing → Bonferroni / FDR
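The multiple-testing item above can be made concrete with a short sketch. It assumes a hypothetical experiment that reports five metrics, and compares Bonferroni (which controls the family-wise error rate) with the Benjamini-Hochberg procedure (which controls the false discovery rate). The p-values are invented for illustration, and the function names are not from the page.

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    pvals = np.asarray(pvals, dtype=float)
    return pvals <= alpha / len(pvals)

def benjamini_hochberg(pvals, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = pvals[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # last sorted position that passes
        reject[order[:k + 1]] = True
    return reject

# Hypothetical p-values for five metrics in one experiment
pvals = [0.001, 0.012, 0.025, 0.045, 0.200]
print("Bonferroni:", bonferroni(pvals))
print("BH (FDR):  ", benjamini_hochberg(pvals))
```

With these inputs Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps the three smallest. Bonferroni is the stricter choice when any single false positive is costly; FDR control tends to suit dashboards with many secondary metrics.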
Interactive Widget — Sample Size Calculator & Experiment Simulator
Baseline CTR 12%
Min detectable effect (MDE) 2%
Statistical power 80%
Significance α 0.05
Daily traffic per variant 5,000
True lift (simulate) 2%
Widget panels: simulated p-value trajectory over time (with the α significance threshold and the required-n boundary marked; the peeking zone is the danger zone), power curve (sample size vs detectable effect), and null vs alternative sampling distributions.
Widget outputs: required n/variant, days to run, simulated p, result, peeking risk.
Adjust baseline rate and MDE to compute required sample size.
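The page does not show how the widget computes its "days to run" output, so the following is only one plausible sketch. It divides the required per-variant sample size by daily traffic per variant and rounds up to whole weeks, in line with the "run full weeks" guidance. The function name and the example sample sizes are assumptions.

```python
import math

def days_to_run(n_per_variant, daily_traffic_per_variant, full_weeks=True):
    """Days needed to collect n_per_variant users in each arm.
    With full_weeks=True the result is rounded up to a multiple of 7."""
    days = math.ceil(n_per_variant / daily_traffic_per_variant)
    if full_weeks:
        days = 7 * math.ceil(days / 7)
    return days

# Hypothetical sample sizes at 5,000 users per variant per day
print(days_to_run(4_500, 5_000))   # 1 raw day  -> 7
print(days_to_run(40_000, 5_000))  # 8 raw days -> 14
```

Rounding up to full weeks means each day of the week is represented equally, so weekday and weekend behavior do not bias the comparison.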
import numpy as np
from scipy import stats

def sample_size(baseline, mde, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test; mde is an absolute lift."""
    p1, p2 = baseline, baseline + mde
    pooled = (p1+p2)/2
    za = stats.norm.ppf(1-alpha/2)  # two-tailed
    zb = stats.norm.ppf(power)
    n  = (za*np.sqrt(2*pooled*(1-pooled)) +
          zb*np.sqrt(p1*(1-p1)+p2*(1-p2)))**2 / (p2-p1)**2
    return int(np.ceil(n))

n = sample_size(baseline=0.12, mde=0.02)
print(f"Required n per variant: {n:,}")

np.random.seed(42)
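ctrl = np.random.binomial(1, 0.12, n)
trt  = np.random.binomial(1, 0.14, n)  # +2 percentage points (absolute lift)
# t-test on 0/1 outcomes; at this n it closely approximates a two-proportion z-test
t, p = stats.ttest_ind(ctrl, trt)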
print(f"CTR ctrl={ctrl.mean():.3f}, trt={trt.mean():.3f}")
print(f"p-value: {p:.4f} -> {'Significant' if p<0.05 else 'Not significant'}")
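
The peeking problem described in this section can be checked by simulation. The sketch below runs many A/A tests, where both arms share the same true rate, so every significant result is a false positive. It compares a single test at the end of the experiment with a rule that stops at the first daily look where p < α. The daily sample size, the number of days, and the function name are assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

def peeking_false_positive_rate(rate=0.12, daily_n=500, days=14,
                                alpha=0.05, n_sims=2000, seed=0):
    """Simulate A/A tests (no true difference) under two decision rules:
    one test on the last day vs. stopping at the first daily look with p < alpha."""
    rng = np.random.default_rng(seed)
    # Daily conversions per arm, shape (n_sims, days), turned into running totals
    a = rng.binomial(daily_n, rate, size=(n_sims, days)).cumsum(axis=1)
    b = rng.binomial(daily_n, rate, size=(n_sims, days)).cumsum(axis=1)
    n = daily_n * np.arange(1, days + 1)       # cumulative users per arm
    pooled = (a + b) / (2 * n)
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    z = (a / n - b / n) / se                   # two-proportion z statistic
    p = 2 * stats.norm.sf(np.abs(z))           # two-sided p-value at each look
    fixed = np.mean(p[:, -1] < alpha)          # single test on the last day
    peeked = np.mean((p < alpha).any(axis=1))  # significant at any daily look
    return fixed, peeked

fixed, peeked = peeking_false_positive_rate()
print(f"False positive rate, single final test: {fixed:.3f}")
print(f"False positive rate, daily peeking:     {peeked:.3f}")
```

The single final test should land near α, while the peeking rule typically comes out several times higher. If interim looks are needed, a sequential design with adjusted boundaries is the usual remedy, which is outside the scope of this sketch.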
02
Fraud Detection — ML System Design
The 8-step ML design framework applied to fraud detection, with stages including formulate → features → model → eval → serve → monitor. Each step carries its own tradeoffs in a production system.
Binary classification, 0.1% fraud rate
AUC-PR, not AUC-ROC, for imbalance
<100 ms online inference SLA
Feature store for real-time features
Monitor for concept drift
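To illustrate why AUC-PR is preferred over AUC-ROC at this level of imbalance, the sketch below scores a synthetic population with roughly 0.1% positives. The scorer is hypothetical: fraud scores are drawn from a normal distribution shifted up by two standard deviations. Both metrics are computed on the same scores.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
n = 200_000
y = rng.binomial(1, 0.001, n)              # about 0.1% positives
# Hypothetical scorer: fraud cases score 2 standard deviations higher on average
scores = rng.normal(0, 1, n) + 2.0 * y

roc = roc_auc_score(y, scores)
ap = average_precision_score(y, scores)
print(f"Prevalence: {y.mean():.4f}")
print(f"AUC-ROC:    {roc:.3f}")
print(f"AUC-PR:     {ap:.3f}")
```

The same scorer looks strong on AUC-ROC and weak on AUC-PR. ROC is computed from the false positive rate, which stays small when negatives vastly outnumber positives, whereas precision reflects how many flagged transactions are actually fraud.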
Interactive Widget — Fraud ML System Design & Live Scoring Simulator
8-step ML design framework — click a step to explore
Decision threshold — business cost tradeoff
Fraud score threshold 0.50
FP cost / FN cost ratio 1:10
Feature importance (GBM model)
Live transaction scoring — adjust features to see risk score change
Transaction amount ($) $250
Velocity (txns last 1h) 1
Merchant risk score 0.20
Is international? No
Hour of day 14:00
Widget panels and outputs: system monitoring signals, fraud score, decision, latency SLA, daily FP cost.
Adjust the threshold slider — watch precision and recall trade off.
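The threshold and cost-ratio controls can be tied together with a small sketch. It assumes a false positive costs 1 unit and a false negative costs 10 units (the 1:10 ratio shown in the widget), sweeps thresholds on a grid, and picks the one with the lowest total cost. The data is synthetic and the scores are perfectly calibrated by construction; the function name is an assumption.

```python
import numpy as np

def best_threshold(y_true, scores, fp_cost=1.0, fn_cost=10.0):
    """Return (threshold, cost) on a 0.01 grid minimizing
    fp_cost * false positives + fn_cost * false negatives."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores)
    grid = np.linspace(0.0, 1.0, 101)
    costs = []
    for t in grid:
        flagged = scores >= t
        fp = np.sum(flagged & ~y_true)   # legitimate transactions flagged
        fn = np.sum(~flagged & y_true)   # fraud that was not flagged
        costs.append(fp_cost * fp + fn_cost * fn)
    i = int(np.argmin(costs))
    return float(grid[i]), float(costs[i])

# Synthetic, calibrated scores: each label is drawn as Bernoulli(score)
rng = np.random.default_rng(0)
scores = rng.beta(1, 20, 50_000)
y = rng.binomial(1, scores)

t, cost = best_threshold(y, scores, fp_cost=1, fn_cost=10)
print(f"Cost-minimizing threshold: {t:.2f} (total cost {cost:,.0f})")
print(f"Calibrated-score rule fp/(fp+fn): {1 / (1 + 10):.3f}")
```

For calibrated probabilities the cost-minimizing rule is to flag when the score exceeds fp_cost / (fp_cost + fn_cost), about 0.09 at a 1:10 ratio, so the empirical optimum should land near that value. This is well below the widget's default of 0.50, which shows how much the cost ratio moves the operating point.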
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

np.random.seed(42)
n = 10000
df = pd.DataFrame({
    'amount':           np.random.exponential(100, n),
    'velocity_1h':      np.random.poisson(2, n),
    'is_international': np.random.binomial(1, 0.1, n),
    'merchant_risk':    np.random.uniform(0, 1, n),
    'hour_of_day':      np.random.randint(0, 24, n),
})
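# Logistic model for the synthetic fraud probability
fraud_p = 1/(1+np.exp(-(-5 + 0.02*df['amount'] + 0.5*df['velocity_1h']
    + 2*df['is_international'] + df['merchant_risk'])))
df['fraud'] = np.random.binomial(1, fraud_p)
# Note: with these coefficients the synthetic fraud rate comes out far above 0.1%
print(f"Fraud rate: {df['fraud'].mean():.2%}")

X, y = df.drop('fraud',axis=1), df['fraud']
# Fix random_state so the split (and the reported AUC-PR) is reproducible
Xtr,Xte,ytr,yte = train_test_split(X, y, stratify=y, test_size=0.2, random_state=42)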
clf = GradientBoostingClassifier(n_estimators=100, random_state=42)
clf.fit(Xtr, ytr)
yp = clf.predict_proba(Xte)[:,1]
print(f"AUC-PR: {average_precision_score(yte, yp):.4f}")
feat_imp = pd.Series(clf.feature_importances_, index=X.columns)
print("Top features:\n", feat_imp.sort_values(ascending=False).head(3))
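
For the "monitor for concept drift" step, one common input-drift check is the population stability index (PSI), which compares a feature's distribution in a reference window against a recent window. The page does not say which drift metric it uses, so PSI here is an assumption, as are the function name and the simulated shift in transaction amount.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample and a recent sample of one feature.
    Bin edges are quantiles of the reference sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Use interior edges only, so out-of-range values fall into the end bins
    idx_e = np.searchsorted(edges[1:-1], expected, side="right")
    idx_a = np.searchsorted(edges[1:-1], actual, side="right")
    e = np.bincount(idx_e, minlength=bins) / len(expected)
    a = np.bincount(idx_a, minlength=bins) / len(actual)
    e, a = np.clip(e, eps, None), np.clip(a, eps, None)  # avoid log(0)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train_amount = rng.exponential(100, 10_000)   # reference window
same_amount = rng.exponential(100, 10_000)    # recent window, no drift
drift_amount = rng.exponential(150, 10_000)   # recent window, mean up 50%

psi_same = population_stability_index(train_amount, same_amount)
psi_drift = population_stability_index(train_amount, drift_amount)
print(f"PSI, no drift: {psi_same:.4f}")
print(f"PSI, drifted:  {psi_drift:.4f}")
```

A widely quoted rule of thumb treats PSI above roughly 0.1 as worth investigating and above 0.25 as a significant shift, though these cutoffs are conventions rather than a standard. PSI only detects changes in the inputs; a change in the relationship between features and fraud labels also requires tracking model metrics such as AUC-PR on freshly labeled data.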