Imagine you run a retail chain. The CEO wants a global sales forecast for next year. The regional managers need forecasts for their territories. The store managers need forecasts for individual products on their shelves.
You have a choice. You could forecast the total sales and split them down (Top-Down). You could forecast every single product and add them up (Bottom-Up). Or you could do both.
Here is the problem: They will never match.
The sum of your product forecasts will virtually never equal your independent global forecast. This mismatch—called incoherency—is a logistics nightmare. It means your supply chain orders (bottom-level) won't match your financial budget (top-level).
Hierarchical Time Series (HTS) forecasting solves this. It isn't just about making predictions; it is about mathematically reconciling those predictions so the numbers add up at every level of the business, from the warehouse floor to the boardroom.
What defines a hierarchical time series?
A hierarchical time series consists of multiple time series arranged in a tree-like structure where lower-level series aggregate to form higher-level series. The "root" represents the total (e.g., Global Sales), which splits into branches (e.g., Regions), which split further into leaves (e.g., Individual Stores). The defining constraint is that the value of a parent node at any time must equal the sum of its children.
The Structure of the Hierarchy
To understand how algorithms handle this, we have to look at the Summing Matrix (S).
Imagine a simple hierarchy:
- Total (Top)
- Groups A and B (Middle)
- Bottom Series AA, AB, BA, BB (Bottom)
The mathematical relationship isn't a vague concept; it's a hard linear constraint. We can express the entire system as:

y_t = S b_t

Where:
- y_t is a vector containing all series (Total, A, B, AA, AB, BA, BB).
- b_t is a vector containing only the bottom-level series (AA, AB, BA, BB).
- S is a matrix of 1s and 0s that tells us how to sum the bottom series to get the rest.
In Plain English: The Summing Matrix (S) is like a recipe card. It tells the math model: "To get the number for Region A, take the sales from Store AA and Store AB and just add them together." It formally defines the family tree of your data so the algorithm knows which pieces belong to which puzzle.
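The recipe-card analogy is easy to make concrete. Here is a minimal sketch of the summing matrix for the Total / A, B / AA, AB, BA, BB hierarchy above, using plain NumPy (illustrative numbers, not any particular library's API):

```python
import numpy as np

# Rows: Total, A, B, AA, AB, BA, BB; columns: bottom series AA, AB, BA, BB
S = np.array([
    [1, 1, 1, 1],  # Total = AA + AB + BA + BB
    [1, 1, 0, 0],  # A     = AA + AB
    [0, 0, 1, 1],  # B     = BA + BB
    [1, 0, 0, 0],  # AA
    [0, 1, 0, 0],  # AB
    [0, 0, 1, 0],  # BA
    [0, 0, 0, 1],  # BB
])

b = np.array([10.0, 20.0, 30.0, 40.0])  # bottom-level values (AA, AB, BA, BB)
y = S @ b                               # full vector (Total, A, B, AA, AB, BA, BB)
print(y)  # [100.  30.  70.  10.  20.  30.  40.]
```

Multiplying any bottom-level vector by S reproduces every level of the tree, which is exactly the coherency constraint.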
Why do independent forecasts fail to add up?
Independent forecasts fail to add up because they are generated separately, meaning each model minimizes its own error without regard for the aggregation constraints (coherency). The error variance at the bottom level often cancels out when aggregated, while the top-level forecast smooths over details. Consequently, the sum of the bottom forecasts (ŷ_AA + ŷ_AB + ŷ_BA + ŷ_BB) will almost always differ from the direct top-level forecast (ŷ_Total).
The Coherency Problem
When you forecast each series independently (known as the Base Forecasts), you get a set of predictions we call ŷ ("y-hat").
Because these predictions are probabilistic estimates, they contain error.
If your financial team uses ŷ_Total for budgeting, but your operations team uses ŷ_AA and ŷ_AB for inventory, you have a conflict.
We need a way to adjust these base forecasts so they become coherent forecasts (denoted as ỹ, "y-tilde").
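To see the mismatch concretely, here is a toy simulation (made-up data, plain NumPy) in which each level gets its own independent model, mimicking the common situation where different teams fit different models per level:

```python
import numpy as np

rng = np.random.default_rng(0)
bottom = rng.normal(50, 5, size=(2, 24))   # two bottom series, 24 periods
total = bottom.sum(axis=0)                 # the historical top level IS coherent

# "Independent" base forecasts: different lookback windows per level,
# standing in for different models chosen by different teams
bottom_fcst = bottom[:, -6:].mean(axis=1)  # bottom-level base forecasts
total_fcst = total[-12:].mean()            # top-level base forecast

gap = bottom_fcst.sum() - total_fcst       # the incoherency gap
print(bottom_fcst.sum(), total_fcst, gap)
```

The historical data adds up perfectly, but the forecasts do not: the gap is generically nonzero because each model optimized its own objective.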
How does the Bottom-Up approach work?
The Bottom-Up approach forecasts only the lowest level of the hierarchy (the "leaves") and sums these predictions to generate forecasts for all higher levels. This method captures the nuances and dynamics of individual fine-grained series but can suffer from high noise (variance) at the bottom levels, which accumulates when aggregated to the top.
The Mechanic
- Forecast every single SKU/Store combination (the b̂ vector).
- Multiply by the summing matrix S to get the higher levels.
Pros:
- No information is lost. If Store A is exploding while Store B is dying, Bottom-Up sees it immediately.
- Guaranteed coherency (by definition).
Cons:
- Signal-to-Noise Ratio: Bottom-level data is often messy, sparse, and noisy. Forecasting "Toothbrush sales in Store #102 on Tuesday" is much harder than forecasting "Global Toothbrush Sales."
- Error Accumulation: When you sum up thousands of noisy forecasts, the errors can sometimes compound rather than cancel out.
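Mechanically, Bottom-Up is nothing more than the summing matrix applied to leaf forecasts. A minimal sketch with illustrative numbers (plain NumPy, not a library call):

```python
import numpy as np

# Summing matrix for the Total / A, B / four-leaf hierarchy used earlier
S = np.array([
    [1, 1, 1, 1],   # Total
    [1, 1, 0, 0],   # Group A
    [0, 0, 1, 1],   # Group B
    [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],  # leaves
])

b_hat = np.array([12.0, 18.0, 25.0, 45.0])  # base forecasts for the leaves only
y_tilde = S @ b_hat                          # coherent forecasts at every level

print(y_tilde[:3])  # Total, A, B -> [100.  30.  70.]
```

Coherency is guaranteed by construction: the higher levels are literally defined as sums of the leaf forecasts.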
How does the Top-Down approach work?
The Top-Down approach forecasts only the highest level of the hierarchy (the root) and disaggregates this total down to the lower levels using historical proportions (e.g., average historical sales share). This method produces a stable, reliable aggregate forecast but often fails to capture local trends or shifts in the lower-level series proportions over time.
The Mechanic
- Forecast the Total.
- Calculate proportions for each bottom series (e.g., "Store A historically contributes 20% of sales").
- Distribute the total forecast down.
Pros:
- Stability: Aggregated data is usually smoother and easier to forecast. The Law of Large Numbers works in your favor.
- Simplicity: You only build one reliable model.
Cons:
- Loss of Detail: If a specific region is starting to trend differently than the historical average, Top-Down will miss it completely. It assumes historical proportions (p_i) are constant or slowly changing.
⚠️ Common Pitfall: Do not use Top-Down if your hierarchy is dynamic. If a new store opens or an old one closes, the historical proportions break immediately.
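The historical-proportions mechanic can be sketched in a few lines (illustrative data; real implementations offer several ways to compute the p_i weights):

```python
import numpy as np

history = np.array([            # rows: 4 bottom series, cols: 4 historical periods
    [10, 12, 11, 13],
    [20, 18, 22, 20],
    [30, 31, 29, 30],
    [40, 39, 38, 37],
])

# Average historical share of each leaf (the p_i weights)
p = history.sum(axis=1) / history.sum()

total_fcst = 110.0              # the single top-level base forecast
b_tilde = p * total_fcst        # disaggregated bottom-level forecasts

print(p.round(3), b_tilde.sum())  # the p_i sum to 1, so b_tilde sums to 110
```

Because the proportions sum to 1, the disaggregated forecasts always add back to the total, which is why Top-Down is coherent by construction, and also why it cannot react when a leaf's true share drifts away from its historical average.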
What is Optimal Reconciliation (MinT)?
Optimal Reconciliation, often referred to as MinT (Minimum Trace), is a statistical method that combines forecasts from all levels of the hierarchy to generate a new set of coherent forecasts that minimizes the total forecast error variance. It uses a weighting matrix based on the correlation structure of the forecast errors to mathematically decide how much to "trust" the top-level vs. bottom-level signals.
This is the gold standard. Instead of choosing Bottom-Up OR Top-Down, we forecast all levels independently and then use linear algebra to reconcile them.
The Reconciliation Formula
We want to find a "Reconciliation Matrix" (P) that maps our independent base forecasts (ŷ) to the optimal bottom-level forecasts (b̃), so that the final coherent forecasts are ỹ = S P ŷ.
The genius of the MinT (Minimum Trace) algorithm lies in how it calculates P. It looks at the covariance matrix of the forecast errors (W_h).
In Plain English: Think of this as a "weighted vote" based on reliability.
The math asks: "Who usually makes the biggest mistakes?"
If the Top-level forecast typically has huge errors, the matrix lowers its weight. If the Bottom-level forecasts are noisy, they get down-weighted. The algorithm mathematically finds the "sweet spot" that adjusts all forecasts (Total, Regional, and Store) so that they sum up perfectly AND the overall system error is as low as possible.
It's like a CFO reconciling the budget: "I trust the Regional Manager of the West Coast (Middle level) more than the chaotic store reports (Bottom) or the detached HQ estimate (Top), so I will align the numbers closer to her prediction."
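That "weighted vote" has a closed form: the standard MinT estimator sets P = (Sᵀ W⁻¹ S)⁻¹ Sᵀ W⁻¹, where W is the covariance of the base-forecast errors. Here is a hand-rolled sketch on a tiny two-leaf hierarchy with an assumed diagonal W (real systems estimate W from residuals):

```python
import numpy as np

S = np.array([
    [1, 1],   # Total = A + B  (tiny two-leaf hierarchy)
    [1, 0],   # A
    [0, 1],   # B
])

y_hat = np.array([105.0, 40.0, 70.0])  # incoherent base forecasts (40 + 70 != 105)
W = np.diag([1.0, 4.0, 4.0])           # noisier leaves -> trust the Total more

Winv = np.linalg.inv(W)
P = np.linalg.inv(S.T @ Winv @ S) @ S.T @ Winv  # the reconciliation matrix
y_tilde = S @ (P @ y_hat)

print(y_tilde)  # coherent: y_tilde[0] == y_tilde[1] + y_tilde[2]
```

Note where the answer lands: the reconciled total sits between the direct top-level forecast (105) and the sum of the leaves (110), pulled toward whichever signal W says is more reliable.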
Implementation with Python and Nixtla
Writing the matrix algebra for MinT from scratch is prone to error. Fortunately, the hierarchicalforecast library from Nixtla (the creators of statsforecast) has made this production-ready.
We will use a standard dataset structure where we have columns for our hierarchy identifiers (e.g., Country, State).
Step 1: Install and Setup
pip install hierarchicalforecast statsforecast datasetsforecast
Step 2: Preparing the Data
We need a DataFrame with a hierarchical index. Let's assume a structure of Total -> Country -> State.
import pandas as pd
import numpy as np
from statsforecast.core import StatsForecast
from statsforecast.models import AutoARIMA, Naive
from hierarchicalforecast.core import HierarchicalReconciliation
from hierarchicalforecast.methods import BottomUp, TopDown, MinTrace
# Create dummy hierarchical data
# Dates: 12 monthly observations
dates = pd.date_range(start='2023-01-01', periods=12, freq='M')

# Hierarchy: 2 Countries, 2 States per Country
data = []
for country in ['US', 'UK']:
    for state in ['A', 'B']:
        # Create a synthetic trend + noise
        values = np.arange(12) + np.random.normal(0, 1, 12)
        for d, v in zip(dates, values):
            data.append([country, state, d, v])

df = pd.DataFrame(data, columns=['Country', 'State', 'ds', 'y'])
# We must aggregate the data to create the hierarchy (Total, Country, State)
from hierarchicalforecast.utils import aggregate
# Define the hierarchical structure
# Level 1: Country
# Level 2: Country + State
hierarchy_levels = [['Country'], ['Country', 'State']]
# This utility sums up the data to create the 'Total' and 'Country' rows
Y_df, S_df, tags = aggregate(df, hierarchy_levels)
print("Summing Matrix (S) shape:", S_df.shape)
print(Y_df.head())
Step 3: Generating Base Forecasts
First, we generate independent forecasts for every series in the hierarchy (Total, Countries, and States) using StatsForecast. This creates the incoherent base forecasts ŷ.
# Define base models (independent forecasting for every series)
fcst = StatsForecast(
    models=[AutoARIMA(season_length=12), Naive()],
    freq='M',
    n_jobs=-1
)

# Forecast future steps (horizon = 4 months)
# Note: recent statsforecast versions take `df` here rather than in the constructor
Y_hat_df = fcst.forecast(df=Y_df, h=4)
print("Base Forecasts (Likely Incoherent):")
print(Y_hat_df.head())
Step 4: Reconciling with MinT
Now we apply the reconciliation matrix to force the numbers to align optimally.
# Initialize the reconciliation engine with three competing strategies
reconcilers = [
    BottomUp(),
    TopDown(method='forecast_proportions'),
    MinTrace(method='mint_shrink')  # The optimal MinT method (shrinkage estimator)
]
hrec = HierarchicalReconciliation(reconcilers=reconcilers)

# Reconcile the base forecasts against the summing matrix and hierarchy tags
Y_rec_df = hrec.reconcile(Y_hat_df=Y_hat_df, Y_df=Y_df, S=S_df, tags=tags)
print("Reconciled Forecasts (Coherent):")
print(Y_rec_df.head())
Output Explanation:
The Y_rec_df now contains columns like AutoARIMA/BottomUp, AutoARIMA/TopDown, and AutoARIMA/MinTrace_mint_shrink.
- BottomUp: Will exactly match the sum of the lowest levels.
- MinTrace: Will match the sums, but the values will be adjusted to minimize variance based on the structure of the errors.
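A quick way to sanity-check any reconciled output is to verify that each parent row equals the sum of its children. The sketch below runs on a tiny hand-made frame rather than re-running the pipeline; the `unique_id` labels and the `is_coherent` helper are illustrative, not part of the library:

```python
import pandas as pd

rec = pd.DataFrame({
    'unique_id': ['Total', 'US', 'UK', 'US/A', 'US/B', 'UK/A', 'UK/B'],
    'yhat':      [100.0,   60.0, 40.0,  25.0,  35.0,   15.0,  25.0],
})

def is_coherent(df, parent, children, col='yhat', tol=1e-6):
    """True if the parent's forecast equals the sum of its children's."""
    p = df.loc[df.unique_id == parent, col].iloc[0]
    c = df.loc[df.unique_id.isin(children), col].sum()
    return abs(p - c) < tol

print(is_coherent(rec, 'Total', ['US', 'UK']))   # True
print(is_coherent(rec, 'US', ['US/A', 'US/B']))  # True
```

Run a check like this on your BottomUp and MinTrace columns; both should pass, whereas the raw base-forecast columns generally will not.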
Grouped vs. Hierarchical: What is the difference?
While often used interchangeably, strictly hierarchical time series differ from grouped time series based on the uniqueness of the disaggregation. A hierarchy has a unique "parent" for every node (e.g., a City belongs to only one State). Grouped time series involve attributes that can cross-cut (e.g., Sales by "Color" vs. Sales by "Size"), where the aggregation path is not unique.
- Strict Hierarchy: Geography is the classic example. A store is in a specific city, which is in a specific state. The tree structure is rigid.
- Grouped Time Series: Attributes are the example. You can sum sales by "Red Products" or by "XL Products." You can view the total as Total -> Red -> XL or Total -> XL -> Red.
Modern reconciliation methods like MinT work for both structures perfectly well, provided the Summing Matrix (S) is constructed to reflect all possible aggregations.
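For a grouped structure, the summing matrix simply gains a row for every aggregation, including the cross-cutting ones. A sketch for two crossed attributes (Color in {Red, Blue} x Size in {XL, S}; the labels and numbers are illustrative):

```python
import numpy as np

# Bottom order: Red/XL, Red/S, Blue/XL, Blue/S
aggregations = np.array([
    [1, 1, 1, 1],  # Total
    [1, 1, 0, 0],  # Red  (sum over sizes)
    [0, 0, 1, 1],  # Blue
    [1, 0, 1, 0],  # XL   (sum over colors -- cross-cuts the color rows)
    [0, 1, 0, 1],  # S
])
S = np.vstack([aggregations, np.eye(4, dtype=int)])  # plus identity rows for the leaves

b = np.array([5.0, 7.0, 6.0, 2.0])
y = S @ b
print(y[:5])  # Total, Red, Blue, XL, S -> [20. 12.  8. 11.  9.]
```

Notice that the XL row overlaps both the Red and Blue rows: that overlap is exactly what makes the structure grouped rather than strictly hierarchical, and a reconciliation method that only takes S as input handles it without modification.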
Conclusion
Forecasting at scale is rarely about predicting a single number; it is about predicting a coherent system of numbers. While simple methods like Bottom-Up preserve detail and Top-Down preserve stability, they force you to choose between the two.
Optimal Reconciliation (MinT) removes that trade-off. By using the information from all levels of the hierarchy and weighting them by their reliability, you create a forecast that is not only consistent (it adds up) but often more accurate than any single level could be on its own.
In production systems, this is the difference between a supply chain that fights against itself and one that moves in unison.
To deepen your understanding of the base models used in this hierarchy, check out our guide on ARIMA Models. If you are dealing with complex seasonality at the bottom levels, our article on Facebook Prophet explores how to handle those patterns before reconciliation.
Hands-On Practice
In this tutorial, you will tackle the 'Incoherency Problem' in time series forecasting—where global forecasts don't match the sum of their parts. Using the Retail Sales dataset, we will construct a natural hierarchy by aggregating daily sales into Weekly Total, Weekday, and Weekend components. You will generate independent base forecasts for each level, observe the mathematical mismatch, and apply the Bottom-Up approach to enforce coherency, ensuring your numbers add up perfectly across the business structure.
Dataset: Retail Sales (Time Series) 3 years of daily retail sales data with clear trend, weekly/yearly seasonality, and related features. Includes sales, visitors, marketing spend, and temperature. Perfect for ARIMA, Exponential Smoothing, and Time Series Forecasting.
Try It Yourself
Retail Time Series: Daily retail sales with trend and seasonality
You have successfully demonstrated that independent forecasts rarely sum up correctly and applied the Bottom-Up method to enforce mathematical consistency. Try experimenting with the 'Top-Down' approach by calculating the average historical proportion of Weekday/Weekend sales and distributing the Total Forecast downwards. You can also deepen the hierarchy by splitting the data further (e.g., Total -> Month -> Week) to see how error propagation changes with more levels.