Multi-Step Time Series Forecasting: Recursive, Direct, and Hybrid Strategies

LDS Team · Let's Data Science

Predicting what happens tomorrow is useful, but predicting what happens next week, next month, or next quarter is where the real business value lies. Supply chain managers don't order inventory one day at a time; they plan weeks in advance. Energy grids need 24-hour horizon forecasts, not just the next hour.

This is the domain of Multi-Step Forecasting.

While single-step forecasting is the default for most tutorials, extending it to multiple future time steps ($h > 1$) introduces complex trade-offs between error accumulation and model complexity. Should you reuse one model iteratively? Build ten different models for ten days? Or use a complex neural network that predicts everything at once?

This guide breaks down the mathematical engines, Python implementations, and strategic trade-offs of the three dominant multi-step forecasting architectures.

What is the multi-step forecasting problem?

The multi-step forecasting problem involves predicting a sequence of future values $[y_{t+1}, y_{t+2}, \dots, y_{t+h}]$ given a historical sequence $[y_t, y_{t-1}, \dots, y_{t-n}]$. Unlike single-step forecasting, which outputs a scalar, multi-step forecasting must account for the dependencies between future time steps and the propagation of errors over the forecast horizon.

In a standard single-step regression setup, we map input features $X$ to a target $y$:

$y_{t+1} = f(y_t, y_{t-1}, \dots, y_{t-n}) + \epsilon$

In multi-step settings, we need to solve for a horizon $H$. There are four primary strategies to handle this, each with distinct mathematical properties:

  1. Recursive Strategy (Iterative)
  2. Direct Strategy (Independent)
  3. Multi-Output Strategy (Vector)
  4. Hybrid Strategies (Direct-Recursive)

💡 Pro Tip: Before attempting multi-step forecasting, ensure your series is stationary or properly differenced. Trends and seasonality wreak havoc on long horizons. We cover this in depth in Time Series Forecasting: Mastering Trends, Seasonality, and Stationarity.


How does the Recursive Strategy work?

The Recursive (or Iterative) strategy trains a single model $f$ to predict one step ahead. To forecast multiple steps, we feed the model's prediction back into itself as an input for the next step.

The Algorithm

  1. Train model $f$ to predict $y_{t+1}$ based on history.
  2. Predict $\hat{y}_{t+1}$.
  3. Append $\hat{y}_{t+1}$ to the history.
  4. Use the updated history to predict $\hat{y}_{t+2}$.
  5. Repeat $H$ times.

The Math

For a horizon $h=2$, the forecast looks like this:

$\hat{y}_{t+1} = f(y_t, y_{t-1}, \dots)$
$\hat{y}_{t+2} = f(\hat{y}_{t+1}, y_t, \dots)$

In Plain English: The recursive strategy is like driving a car in heavy fog. You predict where the road is one second ahead, drive there, and then predict the next second based on your new position. If your first prediction is slightly off, you are now starting the second prediction from the wrong spot.

Python Implementation (Recursive)

We can build a simple recursive forecaster using XGBoost.

python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.metrics import mean_squared_error

# Generate synthetic sine wave data
t = np.linspace(0, 50, 500)
data = np.sin(t) + np.random.normal(0, 0.1, 500)

# Create lag features (Window size = 10)
def create_lags(data, window_size):
    X, y = [], []
    for i in range(len(data) - window_size):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size])
    return np.array(X), np.array(y)

window_size = 10
X, y = create_lags(data, window_size)

# Split Train/Test
train_size = int(len(X) * 0.8)
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]

# Train the ONE-STEP model
model = XGBRegressor(n_estimators=100, objective='reg:squarederror')
model.fit(X_train, y_train)

# Recursive Forecast Logic
def recursive_forecast(model, initial_sequence, horizon, window_size):
    current_sequence = initial_sequence.copy()
    predictions = []
    
    for _ in range(horizon):
        # Reshape for prediction (1, window_size)
        pred_input = current_sequence[-window_size:].reshape(1, -1)
        pred = model.predict(pred_input)[0]
        
        predictions.append(pred)
        # Append prediction to sequence to use for next step
        current_sequence = np.append(current_sequence, pred)
        
    return np.array(predictions)

# Test on the first sample of the test set
initial_seq = X_test[0] # The last known real values
horizon = 20
forecasts = recursive_forecast(model, initial_seq, horizon, window_size)

print(f"First 5 recursive predictions: {forecasts[:5]}")
# Output will show the predicted values drifting over time

The Fatal Flaw: Error Propagation

The defining characteristic of the recursive strategy is Error Accumulation. Since $\hat{y}_{t+2}$ depends on $\hat{y}_{t+1}$, any error in the first step is carried over and amplified.

If $\epsilon$ is the error at step 1: $\hat{y}_{t+2} = f(y_{t+1} + \epsilon, \dots)$

This often causes recursive forecasts to degrade rapidly over long horizons, eventually converging to the mean or a flat line.
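
We can see this drift quantitatively by comparing the recursive forecasts against the actual future values held in y_test. This is a quick sketch reusing the variables defined above; exact numbers will vary with the random noise.

python
# Per-step absolute error of the recursive forecast.
# y_test[:horizon] holds the true values that immediately follow initial_seq.
actual_future = y_test[:horizon]
abs_errors = np.abs(forecasts - actual_future)

for step in (1, 5, 10, 20):
    print(f"Step {step:2d} absolute error: {abs_errors[step - 1]:.3f}")
# Later steps typically show larger errors as the model feeds on its own mistakes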


How does the Direct Strategy differ?

The Direct strategy handles the horizon problem by training separate models for each specific time step. If you need to forecast 7 days out, you train 7 distinct models.

The Algorithm

  1. Model $f_1$ learns to predict $y_{t+1}$ using $[y_t, y_{t-1}, \dots]$.
  2. Model $f_2$ learns to predict $y_{t+2}$ using $[y_t, y_{t-1}, \dots]$ (skipping $y_{t+1}$).
  3. Model $f_H$ learns to predict $y_{t+H}$ using $[y_t, y_{t-1}, \dots]$.

The Math

$\hat{y}_{t+h} = f_h(y_t, y_{t-1}, \dots)$

In Plain English: The direct strategy is like hiring a team of specialists. One person is an expert at predicting tomorrow's weather. Another person is an expert at predicting the weather specifically for next Tuesday, using only today's data. They don't talk to each other; they just give you their independent answers.

Python Implementation (Direct)

Here, we utilize MultiOutputRegressor from Scikit-Learn, which essentially wraps the Direct strategy logic (training one regressor per target).

python
from sklearn.multioutput import MultiOutputRegressor

# Prepare data for Direct Strategy
# We need y to be a matrix of shape (samples, horizon)
def create_direct_xy(data, window_size, horizon):
    X, y = [], []
    for i in range(len(data) - window_size - horizon + 1):
        X.append(data[i:i+window_size])
        y.append(data[i+window_size : i+window_size+horizon])
    return np.array(X), np.array(y)

horizon = 20
X_direct, y_direct = create_direct_xy(data, window_size, horizon)

# Split
train_size = int(len(X_direct) * 0.8)
X_train_dir, y_train_dir = X_direct[:train_size], y_direct[:train_size]
X_test_dir, y_test_dir = X_direct[train_size:], y_direct[train_size:]

# Train 20 separate models (one per step)
direct_model = MultiOutputRegressor(XGBRegressor(n_estimators=100))
direct_model.fit(X_train_dir, y_train_dir)

# Predict
# No iterative loop needed at inference; each horizon step has its own fitted model
predictions_direct = direct_model.predict(X_test_dir[0].reshape(1, -1))

print(f"Direct predictions shape: {predictions_direct.shape}")
# Output: (1, 20)
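
Under the hood, MultiOutputRegressor stores the fitted per-step regressors in its estimators_ attribute, so you can confirm that the Direct strategy really did train one model per horizon step. A quick sketch reusing the objects above:

python
# MultiOutputRegressor keeps one fitted regressor per forecast step
print(f"Number of underlying models: {len(direct_model.estimators_)}")  # 20

# Each estimator can also be used on its own, e.g. the model specialised for step 10
step_10_model = direct_model.estimators_[9]
step_10_pred = step_10_model.predict(X_test_dir[0].reshape(1, -1))
print(f"Step-10 prediction from its dedicated model: {step_10_pred[0]:.3f}")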

Pros and Cons

  • Pros: No error propagation! The prediction for step 10 does not depend on the potentially wrong prediction for step 9.
  • Cons: It ignores dependencies between future steps (e.g., if tomorrow is hot, the day after is likely hot). It is also computationally expensive (training $H$ models) and can have higher variance because $f_H$ tries to predict far into the future using old data.

What is the Multi-Output Strategy?

The Multi-Output (or Vector Output) strategy uses a single model that outputs the entire forecast sequence vector $[y_{t+1}, \dots, y_{t+H}]$ simultaneously. This is commonly seen in Deep Learning (LSTMs, Transformers) but can also be done with algorithms like K-Nearest Neighbors.

The Math

$[\hat{y}_{t+1}, \dots, \hat{y}_{t+H}] = f(y_t, y_{t-1}, \dots)$

This looks similar to the Direct strategy in terms of inputs, but the internal weights are shared. The model learns the correlations between $y_{t+1}$ and $y_{t+2}$ during training.

In Plain English: This is the "shotgun" approach. Instead of firing one bullet at a time (Recursive) or using 10 different guns (Direct), you use one model that fires a spread of 10 predictions at once. The model understands that the pellets (predictions) should travel together in a certain shape.

We covered this architecture extensively in Mastering LSTMs for Time Series: When Deep Learning Beats Statistics, where the final dense layer has $H$ neurons.
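
XGBoost itself emits one scalar per estimator, but several scikit-learn models accept a matrix target natively. Below is a minimal multi-output sketch using KNeighborsRegressor on the windowed matrices already built for the Direct example (the neighbor count is an arbitrary illustration, not a tuned value):

python
from sklearn.neighbors import KNeighborsRegressor

# A single model that maps one input window to the full 20-step output vector
multi_output_model = KNeighborsRegressor(n_neighbors=5)
multi_output_model.fit(X_train_dir, y_train_dir)  # y has shape (samples, horizon)

# One call returns the whole forecast sequence at once
sequence_forecast = multi_output_model.predict(X_test_dir[0].reshape(1, -1))
print(f"Multi-output forecast shape: {sequence_forecast.shape}")  # (1, 20)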

When to use Multi-Output?

This strategy is ideal when there are strong dependencies between the future time steps. For example, in temperature forecasting, the curve of the temperature throughout the day follows a physics-based shape. A multi-output model can learn this "shape," whereas a Direct strategy might predict a jagged, unrealistic line.


Which strategy should you choose?

The choice depends on your data volume, forecast horizon, and the "cost" of error accumulation versus model variance.

| Feature | Recursive | Direct | Multi-Output |
| --- | --- | --- | --- |
| Model Count | 1 model | $H$ models | 1 model |
| Error Pattern | Accumulates over time (bias) | Independent (variance) | Balanced |
| Dependencies | Captures sequential structure | Ignores forecast dependencies | Captures structure via shared weights |
| Computation | Fast training, slow inference | Slow training, fast inference | Fast training, fast inference |
| Best For | Short horizons, simple patterns | Long horizons, complex seasonality | Neural networks, structured outputs |

🔑 Key Insight: A common "Pro" move is the Direct-Recursive Hybrid. You train separate models for different horizons (Direct), but include the predictions of shorter horizons as inputs for the longer horizons (Recursive). This reduces variance while keeping some dependency structure.
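
Below is a minimal sketch of that Direct-Recursive idea for the first two horizon steps, reusing the windowed matrices from the Direct example above (the feature layout is one illustrative design, not a canonical recipe):

python
from xgboost import XGBRegressor

# Step 1 model: predicts t+1 from the lag window (pure Direct)
f1 = XGBRegressor(n_estimators=100)
f1.fit(X_train_dir, y_train_dir[:, 0])

# Step 2 model: predicts t+2 from the lag window PLUS f1's prediction (Recursive flavour)
f1_train_preds = f1.predict(X_train_dir).reshape(-1, 1)
X_train_hybrid = np.hstack([X_train_dir, f1_train_preds])

f2 = XGBRegressor(n_estimators=100)
f2.fit(X_train_hybrid, y_train_dir[:, 1])

# Inference: chain the two models
x_new = X_test_dir[0].reshape(1, -1)
pred_1 = f1.predict(x_new).reshape(-1, 1)
pred_2 = f2.predict(np.hstack([x_new, pred_1]))
print(f"t+1: {pred_1[0, 0]:.3f}, t+2: {pred_2[0]:.3f}")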


Common Pitfalls in Multi-Step Forecasting

1. The "Flat Line" Forecast

Newcomers often find that their recursive forecasts (especially with ARIMA or simple RNNs) quickly converge to a straight line or the mean of the data after a few steps.

Why it happens: This is usually due to "mean reversion" in stationary models. If your model doesn't explicitly capture trend or seasonality (or if the window size is too short to "see" the cycle), the safest statistical guess for $t+\infty$ is the dataset's average.

The Fix:

  • Ensure seasonality is modeled explicitly, e.g., with Facebook Prophet or by adding seasonal features (see the sketch after this list).
  • Use the Direct strategy, which forces the model to learn the specific value for $t+10$ rather than iterating its way there.
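
For tree-based or linear forecasters, one lightweight way to model seasonality explicitly is to add calendar and seasonal-lag features next to the raw lag window. A sketch, assuming daily data with weekly seasonality (the date range and column names are purely illustrative):

python
import pandas as pd

# Hypothetical daily index attached to the synthetic series used earlier
df = pd.DataFrame({"y": data},
                  index=pd.date_range("2022-01-01", periods=len(data), freq="D"))

# Seasonal/calendar features let the model "see" the cycle beyond a short lag window
df["dayofweek"] = df.index.dayofweek   # 0 = Monday ... 6 = Sunday
df["lag_1"] = df["y"].shift(1)         # yesterday
df["lag_7"] = df["y"].shift(7)         # same weekday last week

features = df.dropna()[["lag_1", "lag_7", "dayofweek"]]
target = df.dropna()["y"]
# These columns can now feed the Recursive or Direct setups shown above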

2. Leaking Future Information

In the Direct strategy, creating the $X$ and $y$ matrices can be tricky. A common bug is accidentally including data from $t+1$ in the input features for the model predicting $t+2$.

The Check: Always verify your timestamps. For a model predicting $y_{t+h}$, the latest allowed input is $y_t$.
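
A cheap programmatic check is to compare the index (or timestamp) of the last input observation against the indices of the targets each model is trained on. A small sketch with a hypothetical check_no_leakage helper:

python
# Hypothetical helper: the last input index must be strictly smaller than every target index
def check_no_leakage(last_input_idx, target_indices):
    assert all(last_input_idx < t for t in target_indices), \
        "Input window overlaps the forecast horizon -- future information is leaking"

# Window covering indices 0..9, forecasting indices 10..29: fine
check_no_leakage(9, range(10, 30))

# Window covering indices 0..10, forecasting indices 10..29: index 10 is both input and target
# check_no_leakage(10, range(10, 30))  # would raise an AssertionError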

3. Improper Validation Scheme

Using standard K-Fold Cross-Validation breaks the temporal order of time series. Even standard TimeSeriesSplit is insufficient for multi-step if you don't account for the "gap."

If you are predicting 7 days out, your validation folds must be separated by at least 7 days, or your model will "peek" at the ground truth of adjacent days which are highly correlated.
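
scikit-learn's TimeSeriesSplit supports a gap argument for exactly this situation. A minimal sketch using the lag matrix X built earlier (the split count is arbitrary):

python
from sklearn.model_selection import TimeSeriesSplit

# Exclude 7 observations between each training fold and its validation fold,
# matching a 7-step forecast horizon
tscv = TimeSeriesSplit(n_splits=5, gap=7)

for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    print(f"Fold {fold}: train ends at index {train_idx[-1]}, "
          f"validation starts at index {val_idx[0]}")
# 7 observations are skipped between each training fold and its validation fold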

Conclusion

Multi-step forecasting is not merely repeating a single-step prediction loop; it is a structural decision that defines how your model handles uncertainty over time.

  • Use Recursive strategies (like ARIMA) when the horizon is short and the underlying physics of the system are stable.
  • Use Direct strategies (often with XGBoost or LightGBM) when you have plenty of data and need to avoid the noise amplification of recursive loops.
  • Use Multi-Output (like LSTMs or Transformers) when the shape of the future sequence matters as much as the individual points.

The "best" strategy is rarely obvious without experimentation. Start with a Recursive baseline—it's the easiest to implement. If accuracy degrades too fast over the horizon, switch to the Direct strategy to stabilize those long-range predictions.

To deepen your understanding of the models that power these strategies, explore our guide on Gradient Boosting for direct forecasting, or Exponential Smoothing for a classical recursive approach.


Hands-On Practice

Multi-step time series forecasting is a critical skill for real-world applications where planning horizons extend beyond a single day. In this tutorial, we will move beyond simple next-day predictions and implement the two dominant strategies for predicting sequences: the Recursive Strategy (iterative) and the Direct Strategy (independent models). Using a realistic retail sales dataset, you will build forecasting engines that can predict sales 14 days into the future, learning to balance the trade-offs between error accumulation and model complexity.

Dataset: Retail Sales (Time Series). 3 years of daily retail sales data with clear trend, weekly/yearly seasonality, and related features. Includes sales, visitors, marketing spend, and temperature. Perfect for ARIMA, Exponential Smoothing, and Time Series Forecasting.

Try It Yourself

Retail Time Series: Daily retail sales with trend and seasonality

In this tutorial, you implemented both Recursive and Direct forecasting strategies. You likely observed that the Recursive strategy follows the trend but may drift over time as errors compound, while the Direct strategy often captures specific future points better but requires maintaining multiple models. Experiment by changing the HORIZON variable to 30 days to see how drastically the recursive error accumulation degrades performance compared to the direct method.