Geometric Dilution • SMOTE in High Dimensions

Understand why SMOTE struggles as features grow. Explore the geometry, see the data, and get clear, evidence-based recommendations.

// SMOTE Playground

Move the sliders to simulate how SMOTE creates synthetic points between neighboring minority samples. In high dimensions, those points cover a smaller share of the space. This effect is geometric dilution.

Dimensions (d): 6

k Neighbors

Minority / Synthetic Samples: 120 / 300

Noise (σ): 0.10

Seed

Coverage: how much of the space synthetic points can realistically reach.

Dilution: higher means less meaningful coverage.

Neighbor Reach: average distance to nearest neighbors.

Samples: minority / synthetic.
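
One way the Neighbor Reach metric could be computed is sketched below, using only the Python standard library. The function names, the Gaussian test cloud, and the brute-force neighbor search are assumptions made for illustration; the playground's actual implementation is not shown on this page.

```python
import math
import random


def neighbor_reach(points, k):
    """Mean Euclidean distance from each point to its k nearest neighbors."""
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += sum(dists[:k]) / k
    return total / len(points)


def gaussian_cloud(n, d, sigma, seed):
    """Hypothetical test data: n points in d dimensions, i.i.d. N(0, sigma^2)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    for d in (2, 6, 20):
        cloud = gaussian_cloud(n=120, d=d, sigma=1.0, seed=0)
        print(f"d={d:>2}  reach={neighbor_reach(cloud, k=5):.3f}")
```

With the sample size held fixed, the reach of such a cloud typically grows with d, so each SMOTE segment spans a longer stretch of mostly empty space.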

Coverage formula: CR(d) ≈ ε^(d − 1) with ε ≈ 0.842. As d grows, CR shrinks fast.
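
To make the decay concrete, the formula above can be evaluated directly. This is a minimal sketch that takes ε ≈ 0.842 from the text; the function name `coverage_ratio` is an illustrative assumption, not part of the project's code.

```python
EPSILON = 0.842  # value quoted in the coverage formula above


def coverage_ratio(d: int, epsilon: float = EPSILON) -> float:
    """Approximate coverage CR(d) = epsilon ** (d - 1) for d >= 1."""
    if d < 1:
        raise ValueError("d must be at least 1")
    return epsilon ** (d - 1)


if __name__ == "__main__":
    for d in (1, 2, 5, 10, 15, 30):
        print(f"d={d:>2}  CR={coverage_ratio(d):.4f}")
```

Because each extra dimension multiplies the coverage by the same factor ε, the decay is geometric: CR(1) is 1 and the value falls steadily from there.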

// Coverage vs Dimensions

The curve drops quickly. That’s the core reason SMOTE can underperform in high-d data.

SMOTE: x_new = x + t(x_nbr − x), with t drawn uniformly from [0, 1].

For d > 3, only the first 2 features are shown.
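
The interpolation rule above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the playground's code: the helper names, the brute-force neighbor search, and the uniform choice among the k nearest neighbors are all assumptions.

```python
import random


def smote_point(x, x_nbr, rng):
    """Return x + t * (x_nbr - x) with t drawn uniformly from [0, 1)."""
    t = rng.random()
    return [xi + t * (ni - xi) for xi, ni in zip(x, x_nbr)]


def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    def sq_dist(j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=sq_dist)[:k]


def smote(points, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating toward random near neighbors.

    Assumes at least two minority points are supplied.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(points))            # pick a minority sample
        j = rng.choice(k_nearest(points, i, k))   # pick one of its neighbors
        synthetic.append(smote_point(points[i], points[j], rng))
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for p in smote(minority, n_synthetic=3, k=2, seed=42):
        print(p)
```

Every synthetic point lies on a line segment between two existing minority samples, which is why the generated data cannot leave the one-dimensional segments no matter how many features the dataset has.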

// Data

Dataset Metric

// Score by Method

// Effect Size (SMOTE vs Baseline)

// Training Time

// Inputs

Tell us about your dataset. We’ll recommend when SMOTE helps and when to skip it.

Dimensions

Imbalance Ratio (majority:minority): 10:1

Expected Coverage

Risk Level

Confidence

// Recommendation

Heuristic: use SMOTE confidently when d ≤ 5 and the imbalance ratio (IR) is at least 5; use it cautiously when d ≤ 15; avoid it when d > 15 and try RandomOverSampler or class weights instead.
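
The heuristic can be written as a small decision function. This is a sketch only: the function name and return labels are hypothetical, and it assumes that a low-dimensional dataset with IR below 5 falls into the "cautious" tier, which the rule as stated leaves open.

```python
def recommend_smote(d: int, imbalance_ratio: float) -> str:
    """Map dimensions and imbalance ratio (majority:minority) to a tier.

    Returns "use", "caution", or "avoid" (labels chosen for illustration).
    """
    if d <= 5 and imbalance_ratio >= 5:
        return "use"        # low-d, clearly imbalanced: SMOTE is a good fit
    if d <= 15:
        return "caution"    # assumption: also covers d <= 5 with IR < 5
    return "avoid"          # d > 15: prefer RandomOverSampler or class weights


if __name__ == "__main__":
    for d, ir in [(4, 10), (4, 2), (10, 10), (30, 10)]:
        print(f"d={d:>2}, IR={ir}:1 -> {recommend_smote(d, ir)}")
```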

// Visualizations

These load from your visualizations/ folder if present.

// Summary

SMOTE makes new points between nearby minority samples. In low dimensions, that works well. As dimensions grow, the volume of the feature space grows exponentially, so those linear interpolations cover only a tiny fraction of it. That is geometric dilution, and it explains why SMOTE often stalls or backfires in high-d datasets.

6 oversampling methods • 5 datasets • 5+ classifiers
F1 primary; ROC-AUC, Precision, Recall, Balanced Accuracy secondary
Holm-Bonferroni corrections • Cohen’s d effect sizes
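
For readers unfamiliar with the two statistical tools named above, here is a minimal standard-library sketch of each. It assumes independent samples and the pooled-standard-deviation form of Cohen's d; the analysis in the repo may use a different variant (for example, a paired one), and the function names are illustrative.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled std deviation.

    Raises ZeroDivisionError if both samples have zero variance.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down procedure; returns True where a hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The rank-th smallest p-value is compared against alpha / (m - rank).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; the rest are retained
    return reject


if __name__ == "__main__":
    print(cohens_d([2, 4, 6], [1, 3, 5]))
    print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
```

Holm's procedure controls the family-wise error rate like plain Bonferroni but is never less powerful, which matters when many method-versus-baseline comparisons are tested at once.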

Repo: smote-geometric-analysis

Try the Explorer first

Move the Dimensions slider and watch Coverage shrink. That drop is why SMOTE struggles as features grow.