Predicting what happens tomorrow is useful, but predicting what happens next week, next month, or next quarter is where the real business value lies. Supply chain managers don't order inventory one day at a time; they plan weeks in advance. Energy grids need 24-hour horizon forecasts, not just the next hour.
This is the domain of Multi-Step Forecasting.
While single-step forecasting is the default for most tutorials, extending it to multiple future time steps ($\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+H}$) introduces complex trade-offs between error accumulation and model complexity. Should you reuse one model iteratively? Build ten different models for ten days? Or use a complex neural network that predicts everything at once?
This guide breaks down the mathematical engines, Python implementations, and strategic trade-offs of the three dominant multi-step forecasting architectures.
What is the multi-step forecasting problem?
The multi-step forecasting problem involves predicting a sequence of future values $y_{t+1}, y_{t+2}, \dots, y_{t+H}$ given a historical sequence $y_1, y_2, \dots, y_t$. Unlike single-step forecasting, which outputs a scalar, multi-step forecasting must account for the dependencies between future time steps and the propagation of errors over the forecast horizon.
In a standard single-step regression setup, we map input features $X_t$ (the last $w$ lagged values) to a target $y_{t+1}$:

$$\hat{y}_{t+1} = f(y_t, y_{t-1}, \dots, y_{t-w+1})$$
In multi-step settings, we need to solve for a horizon $H > 1$. There are four primary strategies to handle this, each with distinct mathematical properties:
- Recursive Strategy (Iterative)
- Direct Strategy (Independent)
- Multi-Output Strategy (Vector)
- Hybrid Strategies (Direct-Recursive)
💡 Pro Tip: Before attempting multi-step forecasting, ensure your series is stationary or properly differenced. Trends and seasonality wreak havoc on long horizons. We cover this in depth in Time Series Forecasting: Mastering Trends, Seasonality, and Stationarity.
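As a quick sanity check, you can test for stationarity and difference the series before modeling. A minimal sketch using the ADF test from statsmodels on a hypothetical random-walk series:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Hypothetical example: a random walk is non-stationary until differenced
series = pd.Series(np.cumsum(np.random.normal(size=500)))

# Augmented Dickey-Fuller test: a p-value < 0.05 suggests stationarity
p_raw = adfuller(series)[1]
print(f"ADF p-value (raw): {p_raw:.4f}")           # typically > 0.05

p_diff = adfuller(series.diff().dropna())[1]
print(f"ADF p-value (differenced): {p_diff:.4f}")  # typically < 0.05
```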
How does the Recursive Strategy work?
The Recursive (or Iterative) strategy trains a single model to predict one step ahead. To forecast multiple steps, we feed the model's prediction back into itself as an input for the next step.
The Algorithm
- Train a model $f$ to predict $y_{t+1}$ based on history.
- Predict $\hat{y}_{t+1}$.
- Append $\hat{y}_{t+1}$ to the history.
- Use the updated history to predict $\hat{y}_{t+2}$.
- Repeat $H$ times.
The Math
For a horizon $H$, the forecast looks like this:

$$\hat{y}_{t+1} = f(y_t, y_{t-1}, \dots, y_{t-w+1})$$
$$\hat{y}_{t+2} = f(\hat{y}_{t+1}, y_t, \dots, y_{t-w+2})$$
$$\hat{y}_{t+H} = f(\hat{y}_{t+H-1}, \hat{y}_{t+H-2}, \dots)$$
In Plain English: The recursive strategy is like driving a car in heavy fog. You predict where the road is one second ahead, drive there, and then predict the next second based on your new position. If your first prediction is slightly off, you are now starting the second prediction from the wrong spot.
Python Implementation (Recursive)
We can build a simple recursive forecaster using XGBoost.
```python
import numpy as np
from xgboost import XGBRegressor

# Generate synthetic sine wave data
t = np.linspace(0, 50, 500)
data = np.sin(t) + np.random.normal(0, 0.1, 500)

# Create lag features (window size = 10)
def create_lags(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
    return np.array(X), np.array(y)

window_size = 10
X, y = create_lags(data, window_size)

# Split train/test
train_size = int(len(X) * 0.8)
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]

# Train the ONE-STEP model
model = XGBRegressor(n_estimators=100, objective='reg:squarederror')
model.fit(X_train, y_train)

# Recursive forecast logic
def recursive_forecast(model, initial_sequence, horizon):
    current_sequence = initial_sequence.copy()
    predictions = []
    for _ in range(horizon):
        # Reshape for prediction: (1, window_size)
        pred_input = current_sequence[-window_size:].reshape(1, -1)
        pred = model.predict(pred_input)[0]
        predictions.append(pred)
        # Append the prediction to the sequence to use for the next step
        current_sequence = np.append(current_sequence, pred)
    return np.array(predictions)

# Test on the first sample of the test set
initial_seq = X_test[0]  # the last known real values
horizon = 20
forecasts = recursive_forecast(model, initial_seq, horizon)
print(f"First 5 recursive predictions: {forecasts[:5]}")
# Output will show the predicted values drifting over time
```
The Fatal Flaw: Error Propagation
The defining characteristic of the recursive strategy is Error Accumulation. Since $\hat{y}_{t+2}$ depends on $\hat{y}_{t+1}$, any error in the first step is carried over and amplified.

If $\epsilon_1$ is the error at step 1:

$$\hat{y}_{t+2} = f(y_{t+1} + \epsilon_1, y_t, \dots)$$

Step 2 now starts from a corrupted input, so it inherits $\epsilon_1$ on top of its own error, and so on down the horizon. This often causes recursive forecasts to degrade rapidly over long horizons, eventually converging to the mean or a flat line.
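You can observe this empirically by measuring the absolute error at each step of the horizon. A minimal sketch, continuing from the recursive snippet above (it assumes forecasts, y_test, and horizon are still in scope):

```python
# X_test[0] predicts y_test[0], so forecast steps 1..horizon
# line up with y_test[:horizon]
actuals = y_test[:horizon]
step_errors = np.abs(forecasts - actuals)

print(f"Abs. error at step 1:  {step_errors[0]:.4f}")
print(f"Abs. error at step {horizon}: {step_errors[-1]:.4f}")
# On most runs, the late-horizon error is noticeably larger
```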
How does the Direct Strategy differ?
The Direct strategy handles the horizon problem by training $H$ separate models, one for each specific time step. If you need to forecast 7 days out, you train 7 distinct models.
The Algorithm
- Model $f_1$ learns to predict $y_{t+1}$ using $(y_t, \dots, y_{t-w+1})$.
- Model $f_2$ learns to predict $y_{t+2}$ using $(y_t, \dots, y_{t-w+1})$ (skipping $y_{t+1}$).
- Model $f_H$ learns to predict $y_{t+H}$ using $(y_t, \dots, y_{t-w+1})$.
The Math

$$\hat{y}_{t+h} = f_h(y_t, y_{t-1}, \dots, y_{t-w+1}), \quad h = 1, \dots, H$$
In Plain English: The direct strategy is like hiring a team of specialists. One person is an expert at predicting tomorrow's weather. Another person is an expert at predicting the weather specifically for next Tuesday, using only today's data. They don't talk to each other; they just give you their independent answers.
Python Implementation (Direct)
Here, we utilize MultiOutputRegressor from Scikit-Learn, which essentially wraps the Direct strategy logic (training one regressor per target).
```python
from sklearn.multioutput import MultiOutputRegressor

# Prepare data for the Direct strategy:
# y must be a matrix of shape (samples, horizon)
def create_direct_xy(data, window_size, horizon):
    X, y = [], []
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size : i+window_size+horizon])
    return np.array(X), np.array(y)

horizon = 20
X_direct, y_direct = create_direct_xy(data, window_size, horizon)

# Split
train_size = int(len(X_direct) * 0.8)
X_train_dir, y_train_dir = X_direct[:train_size], y_direct[:train_size]
X_test_dir, y_test_dir = X_direct[train_size:], y_direct[train_size:]

# Train 20 separate models (one per step)
direct_model = MultiOutputRegressor(XGBRegressor(n_estimators=100))
direct_model.fit(X_train_dir, y_train_dir)

# Predict: no iterative loop at inference; each sub-model
# outputs its own step of the horizon
predictions_direct = direct_model.predict(X_test_dir[0].reshape(1, -1))
print(f"Direct predictions shape: {predictions_direct.shape}")
# Output: (1, 20)
```
Pros and Cons
- Pros: No error propagation! The prediction for step 10 does not depend on the potentially wrong prediction for step 9.
- Cons: It ignores dependencies between future steps (e.g., if tomorrow is hot, the day after is likely hot). It is also computationally expensive (training $H$ models) and can have higher variance because $f_H$ tries to predict far into the future using old data.
What is the Multi-Output Strategy?
The Multi-Output (or Vector Output) strategy uses a single model that outputs the entire forecast sequence vector simultaneously. This is commonly seen in Deep Learning (LSTMs, Transformers) but can also be done with algorithms like K-Nearest Neighbors.
The Math

$$[\hat{y}_{t+1}, \hat{y}_{t+2}, \dots, \hat{y}_{t+H}] = F(y_t, y_{t-1}, \dots, y_{t-w+1})$$

This looks similar to the Direct strategy in terms of inputs, but the internal weights are shared. The model learns the correlations between $y_{t+i}$ and $y_{t+j}$ during training.
In Plain English: This is the "shotgun" approach. Instead of firing one bullet at a time (Recursive) or using 10 different guns (Direct), you use one model that fires a spread of 10 predictions at once. The model understands that the pellets (predictions) should travel together in a certain shape.
We covered this architecture extensively in Mastering LSTMs for Time Series: When Deep Learning Beats Statistics, where the final dense layer has $H$ neurons.
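Outside of deep learning, some scikit-learn estimators emit vector outputs natively. A minimal sketch using K-Nearest Neighbors, reusing the create_direct_xy data from the Direct section (no MultiOutputRegressor wrapper needed):

```python
from sklearn.neighbors import KNeighborsRegressor

# KNeighborsRegressor natively accepts a 2-D y of shape (samples, horizon):
# one model, one prediction call, the whole forecast vector at once
multi_model = KNeighborsRegressor(n_neighbors=5)
multi_model.fit(X_train_dir, y_train_dir)

predictions_multi = multi_model.predict(X_test_dir[0].reshape(1, -1))
print(f"Multi-output predictions shape: {predictions_multi.shape}")
# Output: (1, 20) -- produced by a single shared model
```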
When to use Multi-Output?
This strategy is ideal when there are strong dependencies between the future time steps. For example, in temperature forecasting, the curve of the temperature throughout the day follows a physics-based shape. A multi-output model can learn this "shape," whereas a Direct strategy might predict a jagged, unrealistic line.
Which strategy should you choose?
The choice depends on your data volume, forecast horizon, and the "cost" of error accumulation versus model variance.
| Feature | Recursive | Direct | Multi-Output |
|---|---|---|---|
| Model Count | 1 Model | $H$ Models | 1 Model |
| Error Pattern | Accumulates over time (bias) | Independent (variance) | Balanced |
| Dependencies | Captures sequential structure | Ignores forecast dependencies | Captures structure via shared weights |
| Computation | Fast training, slow inference | Slow training, fast inference | Fast training, fast inference |
| Best For | Short horizons, simple patterns | Long horizons, complex seasonality | Neural Networks, structured outputs |
🔑 Key Insight: A common "Pro" move is the Direct-Recursive Hybrid. You train separate models for different horizons (Direct), but include the predictions of shorter horizons as inputs for the longer horizons (Recursive). This reduces variance while keeping some dependency structure.
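A minimal sketch of that hybrid idea for a two-step horizon, reusing X_train_dir and y_train_dir from the Direct section (the model names here are hypothetical):

```python
# Direct-Recursive (DirRec) hybrid for a 2-step horizon
# Step 1: a plain direct model for t+1
model_h1 = XGBRegressor(n_estimators=100)
model_h1.fit(X_train_dir, y_train_dir[:, 0])

# Step 2: the t+2 model sees the original lags PLUS model_h1's prediction
h1_train_preds = model_h1.predict(X_train_dir).reshape(-1, 1)
model_h2 = XGBRegressor(n_estimators=100)
model_h2.fit(np.hstack([X_train_dir, h1_train_preds]), y_train_dir[:, 1])

# Inference chains the two models
x_new = X_test_dir[0].reshape(1, -1)
p1 = model_h1.predict(x_new).reshape(-1, 1)
p2 = model_h2.predict(np.hstack([x_new, p1]))
print(f"t+1: {p1[0, 0]:.3f}, t+2: {p2[0]:.3f}")
```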
Common Pitfalls in Multi-Step Forecasting
1. The "Flat Line" Forecast
Newcomers often find that their recursive forecasts (especially with ARIMA or simple RNNs) quickly converge to a straight line or the mean of the data after a few steps.
Why it happens: This is usually due to "mean reversion" in stationary models. If your model doesn't explicitly capture trend or seasonality (or if the window size is too short to "see" the cycle), the safest statistical guess for $\hat{y}_{t+h}$ at large $h$ is the dataset's average.
The Fix:
- Ensure seasonality is modeled explicitly (e.g., using Facebook Prophet; see the sketch after this list).
- Use the Direct strategy, which forces the model to learn the specific value for $y_{t+h}$ rather than iterating its way there.
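A minimal sketch of the first fix, assuming Prophet is installed and a hypothetical DataFrame df with ds (date) and y (value) columns:

```python
from prophet import Prophet

# df: a DataFrame with columns 'ds' (datetime) and 'y' (value), assumed to exist
m = Prophet(yearly_seasonality=True, weekly_seasonality=True)
m.fit(df)

# Prophet forecasts the whole horizon in one shot, with seasonality baked in,
# so long-range predictions keep their cyclical shape instead of flattening
future = m.make_future_dataframe(periods=30)
forecast = m.predict(future)
print(forecast[['ds', 'yhat']].tail())
```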
2. Leaking Future Information
In the Direct strategy, creating the $X$ and $y$ matrices can be tricky. A common bug is accidentally including data from $t+1$ or later in the input features for the model predicting $y_{t+h}$.
The Check: Always verify your timestamps. For a model predicting $y_{t+h}$, the latest allowed input is $y_t$.
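A quick index-level sanity check you can run against the create_direct_xy helper from earlier:

```python
# For sample i: inputs cover data[i : i+window_size] and targets cover
# data[i+window_size : i+window_size+horizon]; they must not overlap
X_chk, y_chk = create_direct_xy(data, window_size, horizon)

i = 0
assert X_chk[i][-1] == data[i + window_size - 1]  # newest feature is y_t
assert y_chk[i][0] == data[i + window_size]       # earliest target is y_{t+1}
print("No overlap between inputs and targets")
```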
3. Improper Validation Scheme
Using standard K-Fold Cross-Validation breaks the temporal order of time series. Even standard TimeSeriesSplit is insufficient for multi-step if you don't account for the "gap."
If you are predicting 7 days out, your validation folds must be separated by at least 7 days, or your model will "peek" at the ground truth of adjacent days, which are highly correlated.
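scikit-learn's TimeSeriesSplit accepts a gap parameter for exactly this purpose. A minimal sketch for a 7-day horizon, reusing X_direct from earlier:

```python
from sklearn.model_selection import TimeSeriesSplit

# gap=7 leaves a 7-sample buffer between each training set and its
# validation fold, so no validation target overlaps the training window
tscv = TimeSeriesSplit(n_splits=5, gap=7)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X_direct)):
    print(f"Fold {fold}: train ends at {train_idx[-1]}, "
          f"validation starts at {val_idx[0]}")
```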
Conclusion
Multi-step forecasting is not merely repeating a single-step prediction loop; it is a structural decision that defines how your model handles uncertainty over time.
- Use Recursive strategies (like ARIMA) when the horizon is short and the underlying physics of the system are stable.
- Use Direct strategies (often with XGBoost or LightGBM) when you have plenty of data and need to avoid the noise amplification of recursive loops.
- Use Multi-Output (like LSTMs or Transformers) when the shape of the future sequence matters as much as the individual points.
The "best" strategy is rarely obvious without experimentation. Start with a Recursive baseline—it's the easiest to implement. If accuracy degrades too fast over the horizon, switch to the Direct strategy to stabilize those long-range predictions.
To deepen your understanding of the models that power these strategies, explore our guide on Gradient Boosting for direct forecasting, or Exponential Smoothing for a classical recursive approach.
Hands-On Practice
Multi-step time series forecasting is a critical skill for real-world applications where planning horizons extend beyond a single day. In this tutorial, we will move beyond simple next-day predictions and implement the two dominant strategies for predicting sequences: the Recursive Strategy (iterative) and the Direct Strategy (independent models). Using a realistic retail sales dataset, you will build forecasting engines that can predict sales 14 days into the future, learning to balance the trade-offs between error accumulation and model complexity.
Dataset: Retail Sales (Time Series) 3 years of daily retail sales data with clear trend, weekly/yearly seasonality, and related features. Includes sales, visitors, marketing spend, and temperature. Perfect for ARIMA, Exponential Smoothing, and Time Series Forecasting.
Try It Yourself
Retail Time Series: Daily retail sales with trend and seasonality
In this tutorial, you implemented both Recursive and Direct forecasting strategies. You likely observed that the Recursive strategy follows the trend but may drift over time as errors compound, while the Direct strategy often captures specific future points better but requires maintaining multiple models. Experiment by changing the HORIZON variable to 30 days to see how drastically the recursive error accumulation degrades performance compared to the direct method.