XGBoost for Classification: The Definitive Guide to Extreme Gradient Boosting

LDS Team
Let's Data Science

Kaggle leaderboards tell a consistent story. Scroll through the winning solutions for any tabular data competition and one algorithm appears again and again: XGBoost. The original paper by Chen and Guestrin (2016) has been cited over 50,000 times, and for good reason. XGBoost (Extreme Gradient Boosting) combines second-order gradient optimization with built-in regularization to produce classification models that outperform most alternatives on structured data, often right out of the box.

But XGBoost isn't just "gradient boosting, but faster." It introduces a specific mathematical framework that gives it measurable advantages over traditional gradient boosting machines and random forests. This article breaks down exactly how XGBoost works for classification, walks through the math that makes it tick, and builds a complete fraud detection model in Python.

Throughout every section, we'll work with one scenario: detecting fraudulent credit card transactions from a synthetic dataset of 5,000 transactions. Every formula, every code block, and every diagram references this same fraud detection problem.

The XGBoost Framework

XGBoost is an optimized gradient boosting library that builds an ensemble of decision trees sequentially, where each new tree corrects the errors left by the previous ensemble. It falls under the broader umbrella of ensemble learning, but its design choices make it fundamentally different from both traditional gradient boosting and bagging methods like random forests.

The distinction between boosting and bagging is the first thing to nail down:

| Property | Bagging (Random Forest) | Boosting (XGBoost) |
|---|---|---|
| Tree construction | Parallel (independent) | Sequential (dependent) |
| Goal of each tree | Reduce variance | Reduce bias (fix errors) |
| Training data | Bootstrap samples | Weighted/residual-focused |
| Final prediction | Average (regression) or vote (classification) | Weighted sum of all trees |
| Overfitting tendency | Low (averaging smooths noise) | Higher (correcting errors can chase noise) |
| Regularization | Built-in via randomness | Explicit penalty in objective function |

[Diagram: Comparison of bagging and boosting ensemble strategies, showing parallel versus sequential tree construction]

Bagging trains many trees on different subsets of the data, then averages their predictions. Each tree is oblivious to the others. Boosting is the opposite: tree number 47 specifically targets the mistakes that trees 1 through 46 still get wrong. This sequential error correction is what gives boosting its power on structured data.

What makes XGBoost different from a vanilla gradient boosting implementation? Three things:

  1. Second-order gradients — XGBoost uses both the gradient (first derivative) and the Hessian (second derivative) of the loss function, enabling more precise step sizes.
  2. Regularization baked into the objective — traditional GBMs don't penalize tree complexity directly. XGBoost does.
  3. Systems-level engineering — column-based data layout, cache-aware access patterns, and sparsity-aware split finding make XGBoost fast on real hardware.

Sequential Error Correction

XGBoost learns by adding trees that specifically target the residual errors of the current ensemble. Each new tree receives a "map" of where the existing model fails and focuses its splits on those regions.

The Golfer Analogy

Picture a golfer trying to sink a putt in complete darkness. The "hole" is the correct classification (fraud or legitimate), and each "swing" is a new decision tree added to the ensemble.

Traditional gradient boosting gives the golfer a compass: "The hole is 12 feet to the left." The golfer takes a swing in that direction. But the compass says nothing about the terrain between here and the hole. Is it uphill? Downhill? Flat? The golfer has to take small, cautious swings to avoid overshooting.

XGBoost gives the golfer both the compass and a topographic map. The compass (gradient) says "go left." The map (Hessian) says "the ground slopes steeply downhill here, so the ball will roll fast." With both pieces of information, the golfer can calibrate the swing precisely: less force on a downhill slope, more on an uphill one. Fewer swings to reach the hole.

In our fraud detection problem, the "hole" is the correct probability for each transaction (0 for legitimate, 1 for fraud). Each tree is a swing that nudges predictions closer to those targets. The gradient tells each tree which direction to push, and the Hessian tells it how far to push.

Key Insight: The Hessian acts like a confidence measure. Where the loss surface is sharply curved (high Hessian), XGBoost takes smaller, more careful steps. Where it's flat (low Hessian), it takes larger steps. This adaptive step sizing is why XGBoost converges in fewer boosting rounds than first-order methods.
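For the binary log-loss that XGBoost uses for classification (binary:logistic), both quantities have simple closed forms with respect to the raw margin: the gradient is p − y and the Hessian is p(1 − p). A minimal numpy sketch:

```python
import numpy as np

def logloss_grad_hess(y_true, p_pred):
    """Gradient and Hessian of binary log-loss w.r.t. the raw margin.

    For l = -[y log p + (1 - y) log(1 - p)] with p = sigmoid(margin):
    gradient g = p - y, Hessian h = p * (1 - p).
    """
    g = p_pred - y_true           # direction: positive means "push prediction down"
    h = p_pred * (1 - p_pred)     # curvature: largest near p = 0.5, tiny near 0 or 1
    return g, h

# Two transactions: a legitimate one wrongly scored at 0.90,
# and a fraud correctly scored at 0.95.
y = np.array([0.0, 1.0])
p = np.array([0.90, 0.95])
g, h = logloss_grad_hess(y, p)
# g = [0.9, -0.05]: a big corrective push for the first, a nudge for the second.
# h = [0.09, 0.0475]: both predictions are confident, so both steps stay small.
```

Notice that the Hessian peaks for uncertain predictions (p near 0.5) and shrinks toward zero for confident ones, which is exactly the "confidence measure" behavior described above.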

The Objective Function and Taylor Expansion

To understand why XGBoost outperforms standard boosting, you need to see the objective function. While standalone decision trees minimize impurity measures like Gini or entropy, XGBoost minimizes a composite objective that balances prediction accuracy against model complexity.

The Composite Objective

At boosting step $t$, the model adds a new tree $f_t$ to the ensemble. The objective function measures total cost:

$$\text{Obj}^{(t)} = \sum_{i=1}^{n} l\bigl(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t)$$

Where:

  • $\text{Obj}^{(t)}$ is the total objective at boosting round $t$
  • $l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i))$ is the loss between the true label $y_i$ and the updated prediction
  • $\hat{y}_i^{(t-1)}$ is the prediction from the first $t-1$ trees combined
  • $f_t(x_i)$ is the prediction of the new tree for sample $i$
  • $\Omega(f_t)$ is the regularization penalty on the new tree's complexity
  • $n$ is the total number of training samples

In Plain English: The objective says "total cost equals how wrong we are plus how complicated the new tree is." For our fraud detector, $l$ measures how far each transaction's predicted fraud probability is from its true label (0 or 1). The term $\Omega$ penalizes overly complex trees that might memorize noise in the training data rather than learning real fraud patterns. Without $\Omega$, the model would happily grow a thousand-leaf tree that perfectly fits the training set but fails on new transactions.

Taylor Expansion: Second-Order Approximation

Optimizing the raw objective above is expensive for complex loss functions like log-loss. XGBoost sidesteps this by approximating the loss with a second-order Taylor expansion:

$$\text{Obj}^{(t)} \approx \sum_{i=1}^{n} \Bigl[ l(y_i, \hat{y}_i^{(t-1)}) + g_i \, f_t(x_i) + \tfrac{1}{2} h_i \, f_t^2(x_i) \Bigr] + \Omega(f_t)$$

Where:

  • $g_i = \frac{\partial \, l(y_i, \hat{y}_i^{(t-1)})}{\partial \, \hat{y}_i^{(t-1)}}$ is the gradient (first derivative of the loss for sample $i$)
  • $h_i = \frac{\partial^2 \, l(y_i, \hat{y}_i^{(t-1)})}{\partial \, (\hat{y}_i^{(t-1)})^2}$ is the Hessian (second derivative of the loss for sample $i$)
  • $l(y_i, \hat{y}_i^{(t-1)})$ is a constant at step $t$ (doesn't depend on the new tree)

In Plain English: Instead of solving a complex nonlinear optimization, XGBoost draws a parabola that locally approximates the loss surface near the current prediction. Parabolas have closed-form minima, so XGBoost can compute the optimal leaf weight for each leaf instantly using just $g_i$ and $h_i$. For our fraud detector, $g_i$ tells the model "this transaction's fraud probability is too low, push it higher" (direction), while $h_i$ tells it "the loss surface is steeply curved here, so push gently" (step size). Without $h_i$, the algorithm would treat all errors the same regardless of curvature, leading to overshooting on some samples and undershooting on others.
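The closed-form minimum is worth writing out. Grouping samples by the leaf $j$ they land in (with $I_j$ the index set of leaf $j$) and including the $\frac{1}{2}\lambda w_j^2$ penalty from the regularization term, minimizing the parabola gives each leaf's optimal weight (Equation 5 in the Chen and Guestrin paper):

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

The regularization coefficient $\lambda$ sits in the denominator, shrinking every leaf weight toward zero.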

[Diagram: XGBoost objective function pipeline, from loss calculation through Taylor expansion to optimal tree construction]

The Regularization Term

XGBoost defines tree complexity with an explicit formula:

$$\Omega(f) = \gamma \, T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$$

Where:

  • $\Omega(f)$ is the complexity penalty for tree $f$
  • $\gamma$ (gamma) is the minimum loss reduction required to justify a new split
  • $T$ is the number of leaf nodes in the tree
  • $\lambda$ (lambda) is the L2 regularization coefficient on leaf weights
  • $w_j$ is the weight (prediction score) assigned to leaf $j$

In Plain English: The regularization says "complexity cost equals a penalty per leaf plus a penalty for extreme leaf scores." In our fraud detector, $\gamma$ controls pruning: if splitting a node doesn't reduce the overall loss by at least $\gamma$, XGBoost won't make that split. This prevents the tree from creating tiny leaf nodes that only contain one or two fraud cases. Meanwhile, $\lambda$ keeps leaf scores moderate so no single leaf can output an extreme probability like 0.999 based on limited evidence. This is similar to the L2 penalty in ridge regression, but applied to tree outputs instead of linear coefficients.
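These penalties enter the algorithm through the split-scoring rule. Writing $G_L, H_L$ and $G_R, H_R$ for the summed gradients and Hessians in a candidate split's left and right children, the gain of the split (from the original paper) is:

$$\text{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$

A split is made only when this gain is positive, which is exactly the pruning behavior of $\gamma$ described above.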

Common Pitfall: Setting $\gamma = 0$ and $\lambda = 0$ turns off regularization entirely, and your XGBoost model will overfit just as badly as an unconstrained decision tree. A good starting point is $\gamma$ between 0 and 1 and $\lambda = 1$.

Automatic Missing Value Handling

XGBoost handles missing values natively through a mechanism called sparsity-aware split finding, described in Section 3.4 of the original XGBoost paper. When the algorithm encounters a missing value in a feature column during tree construction, it doesn't impute or skip the sample. Instead, it tries sending the sample down both the left and right branch, measures the gain from each direction, and picks the better one. The winning direction becomes the "default path" for missing values at that node.

This is more than a convenience feature. In fraud detection, missingness often carries signal. A missing "merchant_risk" score might indicate a new, unrated merchant. A missing "card_age" field might mean the card was just issued. XGBoost learns these patterns automatically.

Pro Tip: If your dataset has meaningful missingness (like missing income implying unemployment, or missing IP geolocation implying VPN usage), don't impute before feeding data to XGBoost. Let the algorithm learn the optimal direction for missing values at each split. You can always compare imputed vs. non-imputed approaches on a validation set, but nine times out of ten, XGBoost's native handling wins.

Contrast this with logistic regression, where missing values must be imputed before training. XGBoost's approach is one less preprocessing step and often one more source of predictive signal.

Python Implementation

The math above is dense, but the code is refreshingly short. XGBoost's scikit-learn compatible API means you can go from data to predictions in about ten lines.

We'll build a fraud detection classifier on a synthetic dataset of 5,000 transactions with roughly 10% fraud rate.

Data Preparation
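The setup can be sketched as follows, assuming scikit-learn's make_classification (which accepts the weights=[0.90, 0.10] argument discussed below). The transaction-style column names are illustrative stand-ins, not a canonical dataset, so your exact numbers will differ slightly:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic fraud data: 5,000 transactions, 10 features, ~10% fraud.
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=6, n_redundant=2,
    weights=[0.90, 0.10], flip_y=0.02, random_state=42
)

# Hypothetical transaction-style column names for readability.
feature_names = [
    'amount', 'merchant_risk', 'card_age', 'device_score', 'num_declines',
    'distance_home', 'avg_amount_30d', 'hour_of_day', 'foreign_merchant',
    'num_tx_24h'
]
X = pd.DataFrame(X, columns=feature_names)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(f"Training samples: {len(X_train)}")
print(f"Test samples:     {len(X_test)}")
print(f"Fraud rate:       {y.mean():.1%}")
print(f"Features:         {X.shape[1]}")
```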

Expected output:

```
Training samples: 4000
Test samples:     1000
Fraud rate:       10.4%
Features:         10
```

The weights=[0.90, 0.10] parameter gives us a realistic class imbalance: about 90% legitimate transactions and 10% fraud. Real fraud rates are typically even lower (0.1% to 2%), but 10% keeps our demonstration readable without needing extreme techniques for class imbalance.

Training and Evaluation

Expected output:

```
Accuracy:  0.9610
ROC AUC:   0.9561

Classification Report:
              precision    recall  f1-score   support

  Legitimate       0.96      1.00      0.98       896
       Fraud       0.96      0.65      0.78       104

    accuracy                           0.96      1000
   macro avg       0.96      0.83      0.88      1000
weighted avg       0.96      0.96      0.96      1000

Sample predictions (first 5):
  P(fraud) = 0.1099  |  Actual: Legit
  P(fraud) = 0.0714  |  Actual: Legit
  P(fraud) = 0.7162  |  Actual: Fraud
  P(fraud) = 0.0306  |  Actual: Legit
  P(fraud) = 0.0173  |  Actual: Legit
```

A few things to notice here. The accuracy is 96.1%, which looks great until you consider that a naive classifier that labels everything "legitimate" would score 89.6% (since only 10.4% of transactions are fraud). The more telling metric is recall for the Fraud class: 0.65. That means 35% of actual frauds slip through. In a production fraud system, you'd want to push recall higher using scale_pos_weight, threshold tuning, or cost-sensitive learning.

The ROC AUC of 0.9561 confirms the model's ranking ability is strong. It correctly assigns higher fraud probabilities to actual fraud cases most of the time.

Handling Missing Values in Practice

Expected output:

```
Missing cells injected: 2478 (5.0% of data)
Accuracy with missing values: 0.9500
Accuracy without missing values: 0.9610
Difference: 0.0110
```

Only a 1.1 percentage point drop in accuracy despite 5% of the entire dataset being replaced with NaN values. XGBoost's sparsity-aware split finding absorbed the missing data without any imputation step. Try doing that with logistic regression.

Feature Importance Visualization

One reason data scientists choose tree-based models over black-box alternatives is interpretability. XGBoost tracks how much each feature contributes to the model's predictions, giving you immediate insight into what's driving classifications.

Expected output:

```
Top 5 features by importance:
  1. device_score: 0.2436
  2. num_declines: 0.1187
  3. distance_home: 0.1104
  4. card_age: 0.1041
  5. avg_amount_30d: 0.1035
```

The device_score feature dominates at 24.4% importance, meaning it appears in the most impactful tree splits across all 100 boosting rounds. In a real fraud system, this would tell your team that device fingerprinting is the strongest fraud signal and deserves investment in data quality.

Pro Tip: XGBoost offers three importance types: weight (how many times a feature is used in splits), gain (average reduction in loss when the feature is used), and cover (average number of samples affected). Gain is usually the most informative because it measures actual predictive contribution, not just usage frequency. The default feature_importances_ attribute uses gain.

Hyperparameter Tuning with GridSearchCV

XGBoost is sensitive to its hyperparameters. Unlike random forests, which often perform well with defaults, XGBoost can overfit or underfit dramatically if learning_rate, max_depth, and regularization parameters aren't balanced.

Key Hyperparameters

| Parameter | What it controls | Typical range | Effect on model |
|---|---|---|---|
| learning_rate (eta) | Step size for each tree's contribution | 0.01 to 0.3 | Lower = more trees needed, but more stable |
| max_depth | Maximum depth of each tree | 3 to 10 | Higher = more complex interactions captured |
| n_estimators | Number of boosting rounds (trees) | 50 to 1000 | More rounds with lower learning rate |
| min_child_weight | Minimum sum of Hessian ($h_i$) in a child node | 1 to 10 | Higher = more conservative splits |
| subsample | Fraction of samples used per tree | 0.5 to 1.0 | Lower = stochastic boosting, reduces overfitting |
| colsample_bytree | Fraction of features used per tree | 0.5 to 1.0 | Similar to max_features in random forests |
| gamma ($\gamma$) | Minimum loss reduction for a split | 0 to 5 | Higher = more aggressive pruning |
| reg_lambda ($\lambda$) | L2 regularization on leaf weights | 0 to 10 | Higher = smoother predictions |

[Diagram: XGBoost hyperparameter tuning decision guide for addressing overfitting and improving accuracy]

Practical Tuning with GridSearchCV

The following code demonstrates systematic hyperparameter search. Note that GridSearchCV with many combinations can be slow, so this block is display-only.

```python
from sklearn.model_selection import GridSearchCV
import xgboost as xgb

param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [100, 200],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}

xgb_model = xgb.XGBClassifier(
    objective='binary:logistic',
    eval_metric='logloss',
    random_state=42
)

grid_search = GridSearchCV(
    estimator=xgb_model,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=3,
    verbose=1,
    n_jobs=-1
)

grid_search.fit(X_train, y_train)

print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best ROC AUC:    {grid_search.best_score_:.4f}")

# Expected Output (approximate):
# Fitting 3 folds for each of 72 candidates, totalling 216 fits
# Best Parameters: {'colsample_bytree': 0.8, 'learning_rate': 0.1, 'max_depth': 5, ...}
# Best ROC AUC:    0.9620
```

Common Pitfall: A low learning_rate (like 0.01) requires many more n_estimators (500 to 1000) to converge. If you drop the learning rate without increasing the number of trees, the model won't have enough boosting rounds to reach its potential. Pair learning_rate=0.01 with n_estimators=500 as a starting point.

For large datasets (100K+ rows) or wide parameter grids, GridSearchCV becomes painfully slow. Consider RandomizedSearchCV for a random subset of combinations, or better yet, use Optuna for Bayesian optimization that converges faster with fewer trials.

Tuning Strategy in Practice

Here's the order that works well:

  1. Fix learning_rate=0.1 and tune n_estimators with early stopping
  2. Tune tree structure: max_depth, min_child_weight
  3. Add stochasticity: subsample, colsample_bytree
  4. Tune regularization: gamma, reg_lambda, reg_alpha
  5. Lower learning_rate to 0.01 to 0.05, increase n_estimators proportionally

This staged approach avoids searching a massive grid all at once.

When to Use XGBoost (and When Not To)

XGBoost isn't always the right tool. Knowing when to reach for it and when to pick something else saves real engineering time.

| Scenario | Best choice | Why |
|---|---|---|
| Tabular data, < 100K rows | XGBoost | Sweet spot for accuracy and speed |
| Tabular data, > 1M rows | LightGBM | Histogram-based splits are faster at scale |
| Heavy categorical features | CatBoost | Native ordered target encoding |
| Need interpretable model | Logistic Regression or Decision Tree | Coefficient weights are easier to explain to regulators |
| Image or text data | Deep learning (CNN/Transformer) | Trees can't learn spatial or sequential structure |
| Need quick baseline | Random Forest | Works well with zero tuning |
| Few features, linear relationships | Logistic Regression | Faster, simpler, and often just as accurate |
| Very small dataset (< 500 rows) | Logistic Regression or SVM | XGBoost may overfit without enough data |

When XGBoost Shines

  • Mixed feature types (numeric + categorical after encoding)
  • Feature interactions matter (e.g., "high transaction amount" alone isn't fraud, but "high amount + new device + overseas merchant" is)
  • Missing data is common (native handling saves preprocessing effort)
  • You need ranking ability, not just classification (XGBoost supports rank:pairwise and rank:ndcg)

When to Avoid XGBoost

  • Your dataset fits in a spreadsheet — a simple model will perform comparably and be easier to maintain
  • Latency matters at inference — 100 trees must be traversed sequentially; for sub-millisecond latency, a single decision tree or linear model is faster
  • You need to explain every prediction to a non-technical audience — SHAP values help, but "this tree split on feature X at threshold 2.3" is harder to explain than "this coefficient is 0.4"

Production Considerations

Training Speed and Scaling

XGBoost's column-based data layout enables feature-level parallelism during split finding. Each feature's values are sorted once and cached, so subsequent trees reuse the same sorted order. On a modern 8-core machine, training 100 trees on 100K rows with 50 features takes roughly 2 to 5 seconds.

| Dataset size | Approximate training time (100 trees, 10 features) | Memory |
|---|---|---|
| 10K rows | < 1 second | ~50 MB |
| 100K rows | 2 to 5 seconds | ~200 MB |
| 1M rows | 20 to 60 seconds | ~2 GB |
| 10M rows | 5 to 15 minutes | ~15 GB |

GPU Acceleration

XGBoost supports GPU training: older releases used tree_method='gpu_hist', while XGBoost 2.0 and later use device='cuda' (alongside the default tree_method='hist'). GPU acceleration provides 3x to 10x speedup on large datasets. For our 5,000-row fraud dataset, CPU is fine. For production models on millions of rows, GPU training can cut iteration time from minutes to seconds.

```python
# GPU-accelerated XGBoost (requires a CUDA-capable GPU; XGBoost 2.0+ API)
clf_gpu = xgb.XGBClassifier(
    device='cuda',
    n_estimators=100,
    max_depth=5,
    learning_rate=0.1,
    random_state=42
)
```

Memory Optimization

XGBoost stores the data matrix in a compressed column format (DMatrix). For very large datasets, you can reduce memory with:

  • max_bin — lowering it below the default of 256 reduces histogram resolution and memory use in histogram mode
  • Sparse data formats — CSR/CSC matrices use less memory than dense arrays
  • External memory mode — for datasets that don't fit in RAM (pass a file path instead of a matrix)

Inference Speed

Each prediction requires traversing all trees in the ensemble. For 100 trees of depth 5, that's 100 sequential tree lookups per sample. Batch prediction on 1,000 samples takes under 1 ms, but if you need single-sample latency under 100 microseconds, consider reducing n_estimators or using model distillation.

Conclusion

XGBoost earns its dominance on tabular data through a specific set of mathematical choices. The second-order Taylor expansion gives it more precise step sizes than first-order gradient boosting. The explicit regularization term ($\gamma T + \frac{1}{2}\lambda \sum w_j^2$) prevents overfitting at the objective level, not as an afterthought. And the sparsity-aware split finding handles missing values as a feature, not a bug.

For our fraud detection problem, XGBoost delivered 96.1% accuracy and a 0.956 ROC AUC with minimal preprocessing and default-like hyperparameters. That's the algorithm's real strength: getting strong results quickly on structured data, then offering enough tuning knobs to push performance further when needed.

If you're building on these concepts, the natural next steps are exploring gradient boosting to understand the first-order foundation that XGBoost extends, and XGBoost for regression to see how the same framework handles continuous targets. For datasets with heavy categorical features, CatBoost offers an alternative that avoids one-hot encoding entirely.

Start with the defaults, get a baseline, then tune systematically. That's the approach that wins competitions and ships production models.

Frequently Asked Interview Questions

Q: What makes XGBoost different from standard gradient boosting?

XGBoost uses a second-order Taylor expansion of the loss function, incorporating both the gradient (first derivative) and the Hessian (second derivative). This allows more precise step sizes when adding new trees. Additionally, XGBoost includes an explicit regularization term in its objective function that penalizes both the number of leaves and the magnitude of leaf weights, which standard GBMs lack.

Q: Why does XGBoost use both the gradient and the Hessian?

The gradient tells each new tree which direction to push predictions, while the Hessian measures how curved the loss surface is at that point. Where the surface is sharply curved (high Hessian), XGBoost takes smaller steps to avoid overshooting. This adaptive step sizing lets XGBoost converge in fewer boosting rounds than methods that only use first-order gradients.

Q: How does XGBoost handle missing values during training?

XGBoost uses sparsity-aware split finding. At each split point, it tries sending samples with missing values to both the left and right child, then picks the direction that yields better loss reduction. This learned default direction means XGBoost can treat missingness as a signal rather than a problem to impute away.

Q: Your XGBoost model has high training accuracy but poor test accuracy. What do you do?

This is overfitting. Reduce max_depth (try 3 to 5 instead of the default 6), increase min_child_weight to require more samples per leaf, add regularization via reg_lambda (L2) or gamma (minimum split gain), and introduce stochasticity with subsample=0.8 and colsample_bytree=0.8. You can also lower learning_rate and increase n_estimators with early stopping.

Q: When would you choose LightGBM or CatBoost over XGBoost?

LightGBM is faster on large datasets (1M+ rows) because its leaf-wise growth strategy and histogram-based split finding scale better. CatBoost handles categorical features natively through ordered target statistics, eliminating the need for one-hot encoding. XGBoost remains the strongest choice for medium-sized tabular data where you want fine-grained control over regularization.

Q: Explain the role of scale_pos_weight in imbalanced classification.

scale_pos_weight multiplies the loss contribution of positive-class samples. Setting it to the ratio of negative to positive examples (e.g., 9.0 for 10% positive rate) makes the model penalize missed positives more heavily. This typically improves recall at the cost of precision. It's equivalent to oversampling the minority class without actually duplicating data.

Q: How does XGBoost's regularization compare to L1/L2 regularization in linear models?

XGBoost's $\lambda$ parameter applies L2 regularization to leaf weights (the prediction values at each terminal node), not to input feature coefficients. The $\gamma$ parameter has no direct analog in linear models. It requires a minimum loss reduction for every split, effectively acting as a pre-pruning threshold. Together, they control model complexity from two complementary angles: tree structure (via $\gamma$) and prediction magnitude (via $\lambda$).

Hands-On Practice

While theoretical knowledge of XGBoost's second-order derivatives and hardware optimization is crucial, true mastery comes from applying it to detect subtle patterns in real-world data. You'll build a production-grade anomaly detection system using XGBoost to classify sensor failures, using the algorithm's unique ability to handle tabular data with high precision. We will use the Sensor Anomalies dataset, which provides a realistic scenario of identifying critical failures (is_anomaly) based on continuous sensor readings and device identifiers, perfectly demonstrating XGBoost's power in handling imbalanced classification tasks.

Dataset: Sensor Anomalies (Detection) Sensor readings with 5% labeled anomalies (extreme values). Clear separation between normal and anomalous data. Precision ≈ 94% with Isolation Forest.

Now that you have a working baseline, experiment by adjusting the scale_pos_weight parameter; try removing it to see how drastically the recall for anomalies drops (likely resulting in missed failures). You can also tune max_depth (try 3 vs. 10) to observe the trade-off between model complexity and overfitting on this noisy sensor data. Finally, try introducing subsample=0.8 to the classifier to enable stochastic gradient boosting, which often improves generalization on unseen data.
