This document presents PLAIO's forecasting methodology and evaluation framework designed to enhance data-driven decision making. The approach employs rolling forecasts that adapt continuously through S&OP cycles, replacing traditional static annual projections with a more responsive planning process.
PLAIO's evaluation system utilizes two complementary metrics, Error and Bias, which together provide a more nuanced analysis of performance than conventional accuracy measures.
Forecasting Cadence
Rather than relying on a single annual forecast, PLAIO utilizes a rolling forecast approach. This method involves generating multiple forecasts throughout the year, ensuring that our projections remain adaptable and responsive to changes in each S&OP cycle. This cadence enables regular evaluations of forecast performance via metrics and provides a framework for improving forecast reliability and supporting data-driven decision making. Each forecast (n, n+1, n+2, n+3) covers specific periods with overlapping timeframes, allowing for continuous monitoring and adjustment.
The graphic below shows the cadence at which forecasts are generated and what constitutes a single forecast. Each forecast contains a value for every period being forecasted across its horizon. Metrics are then calculated for each forecast, using the values within its horizon, which allows forecast performance to be monitored over time.
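As a minimal illustration of this structure (the monthly periods, horizon length, and example values below are assumptions, not PLAIO's actual configuration), each forecast can be thought of as a mapping from future periods to forecast values, with consecutive forecasts sharing overlapping periods:

```python
from datetime import date

# Hypothetical rolling forecasts: each S&OP cycle produces a new forecast
# covering the next few periods, so consecutive forecasts overlap.
forecasts = {
    "n":   {date(2024, 1, 1): 100, date(2024, 2, 1): 110, date(2024, 3, 1): 120},
    "n+1": {date(2024, 2, 1): 105, date(2024, 3, 1): 118, date(2024, 4, 1): 125},
    "n+2": {date(2024, 3, 1): 122, date(2024, 4, 1): 128, date(2024, 5, 1): 130},
}

# Once actuals are known, metrics can be computed per forecast over the
# periods inside its horizon, enabling monitoring over time.
actuals = {date(2024, 1, 1): 95, date(2024, 2, 1): 112, date(2024, 3, 1): 119}

for name, fc in forecasts.items():
    overlap = [p for p in fc if p in actuals]
    if overlap:
        error = sum(abs(fc[p] - actuals[p]) for p in overlap) / sum(actuals[p] for p in overlap)
        print(f"Forecast {name}: Error = {error:.1%} over {len(overlap)} period(s)")
```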
Forecast Metrics
Overview
This section describes the two key metrics PLAIO uses for evaluating forecast performance: Error and Bias. These metrics provide complementary insights into forecast performance, allowing organizations to better understand their forecasting capabilities and identify areas for improvement.
Why These Metrics Matter
Traditional forecast accuracy metrics often fail to distinguish between different types of forecasting errors. Our Error and Bias metrics address this limitation by separating two critical aspects of forecast performance:
Error: Quantifies the overall magnitude of forecast inaccuracy
Bias: Identifies systematic tendencies to over-forecast or under-forecast
Together, these metrics provide a comprehensive view of forecast quality that can guide process improvements and help stakeholders understand the reliability of forecasts.
Benefits of Using These Metrics
Clear diagnostics: Distinguish between random error and systematic bias
Actionable insights: Different types of forecast issues require different solutions
Simple benchmarking: Easy to compare across different products, regions, or time periods
Intuitive communication: Stakeholders can easily understand what the metrics represent
Process improvement: Identify specific forecasting issues to address in your process
How the Metrics Work
Error Metric
The Error metric measures the absolute magnitude of forecasting errors relative to the total of actual values. The result is expressed as a positive percentage.
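A formulation consistent with this description (a weighted absolute percentage error; PLAIO's exact definition may differ in detail) is:

```latex
\mathrm{Error}_n = \frac{\sum_{i} \left| \mathrm{Forecast}_{n,i} - \mathrm{Actual}_{i} \right|}{\sum_{i} \mathrm{Actual}_{i}} \times 100\%
```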
Here, n is the n-th forecast and i is the i-th period for which both a historical forecast value and a historical actual are available.
The screenshot below shows how the evolution of Error over time can be visualized for demand on a finished good.
Bias Metric
The Bias metric measures the systematic tendency to over-forecast or under-forecast. The result is expressed as a positive or negative percentage.
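A formulation consistent with this description (the same denominator as Error, but with signed rather than absolute differences; again, PLAIO's exact definition may differ in detail) is:

```latex
\mathrm{Bias}_n = \frac{\sum_{i} \left( \mathrm{Forecast}_{n,i} - \mathrm{Actual}_{i} \right)}{\sum_{i} \mathrm{Actual}_{i}} \times 100\%
```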
Here, n is the n-th forecast and i is the i-th period for which both a historical forecast value and a historical actual are available.
The screenshot below shows how the evolution of Bias over time can be visualized for demand on a finished good.
Interpretation Guide
Error
What it tells you: The overall magnitude of forecasting errors relative to the actual values
Target value: 0% (perfect forecasts)
Interpretation: Lower values indicate better forecast accuracy. Error is always non-negative (≥ 0%).
Bias
What it tells you: The directional tendency of forecasts
Target value: 0% (no systematic bias)
Interpretation:
Positive bias (e.g., +10%): Systematically over-forecasting (forecasts tend to be higher than actuals)
Negative bias (e.g., -10%): Systematically under-forecasting (forecasts tend to be lower than actuals)
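For example, under the formulation sketched above, forecasts totaling 110 units against actuals totaling 100 units give a Bias of (110 - 100) / 100 = +10%, i.e. systematic over-forecasting.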
Common Scenarios and Business Implications
High-Quality Forecasts
Interpretation: Forecasts closely match actuals with no systematic direction
Business impact: Reliable information for decision-making, optimal resource allocation
High Variation Without Systematic Bias
Interpretation: Large deviations that cancel out directionally
Business impact: Poor predictability, but no systematic resource misallocation
Systematic Under-forecasting
Interpretation: Forecasts are consistently lower than actuals
Business impact: Potential understaffing, stock shortages, or missed revenue opportunities
Systematic Over-forecasting
Interpretation: Forecasts are consistently higher than actuals
Business impact: Potential overstaffing, excess inventory, or inflated expectations
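To make the distinction between these scenarios concrete, here is a small sketch using the formulations assumed earlier (the numbers are purely illustrative), showing how large deviations can produce a high Error while Bias stays near zero:

```python
# Illustrative actuals and two forecast series (numbers are made up).
actuals = [100, 100, 100, 100]
noisy_unbiased = [150, 50, 140, 60]    # large misses in both directions
systematic_low = [80, 85, 80, 75]      # consistently under the actuals

def error_metric(forecast, actual):
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / sum(actual)

def bias_metric(forecast, actual):
    return sum(f - a for f, a in zip(forecast, actual)) / sum(actual)

# Error of 45% but Bias of 0%: poor predictability, no systematic skew.
print(error_metric(noisy_unbiased, actuals), bias_metric(noisy_unbiased, actuals))
# Error of 20% and Bias of -20%: systematic under-forecasting.
print(error_metric(systematic_low, actuals), bias_metric(systematic_low, actuals))
```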
Benchmarking Forecast Performance
The Importance of Benchmark Forecasts
Evaluating forecast performance solely on absolute metrics like Error and Bias can be misleading without proper context. A forecast with a 30% Error might be considered poor in some contexts but excellent in others. This is where benchmark forecasts become essential.
A benchmark forecast is a simple, easily implementable forecasting method that serves as a reference point for evaluating more sophisticated methods. PLAIO utilizes a simple moving average.
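As a sketch of how such a benchmark could be produced (the window length of 3 periods is an illustrative assumption, not necessarily PLAIO's setting):

```python
def moving_average_benchmark(history, window=3):
    """Forecast the next period as the mean of the last `window` actuals.

    `history` is a list of actual values ordered oldest to newest.
    The window length is illustrative; the one PLAIO uses may differ.
    """
    if len(history) < window:
        window = len(history)  # fall back to whatever history exists
    return sum(history[-window:]) / window

# Example: benchmark forecast for the next period given recent actuals
print(moving_average_benchmark([120, 135, 128]))  # -> ~127.67
```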
Relative Performance Evaluation
The true value of Error and Bias metrics emerges when they are compared against benchmark forecasts. This relative evaluation provides crucial context for interpreting forecast quality:
Contextualizing Performance: An Error of 60% might be excellent if the benchmark forecast has consistently shown 120% Error for the same items or periods.
Identifying Easy vs. Difficult Forecasts: For highly predictable items, even simple models might achieve 20% Error, making this level of performance merely average. In such cases, a good model would achieve less than 5% Error.
Setting Appropriate Expectations: Different product families, demand segments, or business units may have inherently different levels of forecast difficulty. Benchmarks help set realistic expectations for each.
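The 60% vs. 120% comparison above, for instance, can be expressed as a relative improvement over the benchmark. A minimal sketch (the function name and figures are illustrative assumptions):

```python
def relative_improvement(model_error, benchmark_error):
    """Fraction by which the model's Error beats the benchmark's Error.

    Positive values mean the model outperforms the benchmark;
    negative values mean it does worse.
    """
    return 1 - model_error / benchmark_error

# A 60% Error against a 120% benchmark Error is a 50% improvement,
# while a 20% Error against an 18% benchmark is actually worse (about -11%).
print(relative_improvement(0.60, 1.20))  # 0.5
print(relative_improvement(0.20, 0.18))  # ~ -0.111
```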
Comparing Machine Learning, Market, and Benchmark Forecasts
When evaluating multiple forecast types, we typically expect to see a pattern of performance where machine learning forecasts outperform market (manual) forecasts, which in turn outperform simple benchmarks. However, several interesting patterns can emerge:
Expected Pattern: ML Forecast < Market Forecast < Benchmark Forecast (where < means "has less error than")
This indicates that both human forecasters and ML models are adding value beyond simple methods.
ML Underperformance: Market Forecast < Benchmark Forecast < ML Forecast
This may indicate that the ML model is missing important contextual factors or is overfitting to historical patterns.
It could also indicate very complex or novel market conditions that humans can intuitively adjust for but ML models struggle with.
Market Insight: Market Forecast < ML Forecast < Benchmark Forecast
When market forecasts outperform ML models, this suggests valuable nuanced insights from sales teams or country managers.
These insights should be documented and potentially incorporated into ML models.
Need for Improvement: Benchmark Forecast < ML Forecast < Market Forecast
When market forecasts perform worse than even simple benchmarks, this clearly indicates a need for better training, insights, or processes for the human forecasters.
It may also indicate incentive problems if forecasters are biased toward optimistic or conservative forecasts.
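A small sketch of how the patterns above could be detected automatically (the function and label names are illustrative assumptions, not part of PLAIO's product):

```python
# Rank three forecast types by their Error metric and report which of the
# patterns described above applies.
def classify_pattern(ml_error, market_error, benchmark_error):
    ranking = sorted(
        [("ML", ml_error), ("Market", market_error), ("Benchmark", benchmark_error)],
        key=lambda pair: pair[1],
    )
    order = [name for name, _ in ranking]
    labels = {
        ("ML", "Market", "Benchmark"): "Expected Pattern",
        ("Market", "Benchmark", "ML"): "ML Underperformance",
        ("Market", "ML", "Benchmark"): "Market Insight",
        ("Benchmark", "ML", "Market"): "Need for Improvement",
    }
    return labels.get(tuple(order), "Other"), order

# Example: ML at 25% Error, Market at 32%, Benchmark at 45%
print(classify_pattern(0.25, 0.32, 0.45))  # ('Expected Pattern', ['ML', 'Market', 'Benchmark'])
```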
Leveraging Benchmark Comparisons
Organizations can use benchmark comparisons to:
Target Improvement Efforts: Focus on areas where forecasts significantly underperform benchmarks.
Capture Market Insights: Identify and learn from cases where market forecasts excel.
Optimize Forecast Selection: Potentially use different forecast types for different products or markets based on relative performance.
Set Realistic Targets: Establish achievable forecast error targets based on the demonstrated forecast difficulty of each item or category.