When fitting a model function to experimental data, one is often confronted with a subtle but important conceptual issue: the uncertainty of the fitted model is frequently much smaller than the apparent measurement uncertainty of the individual data points. At first glance, this may appear contradictory. How can a model, fitted to noisy data, exhibit smaller uncertainty than the data itself?
This apparent paradox often leads to misinterpretation. Observers may assume that the narrow confidence bands of the fitted model represent the measurement uncertainty, and consequently judge the data against these bands, leading to incorrect conclusions about data quality or model validity.
This article clarifies the distinction between:
- the measurement error of individual data points,
- the uncertainty of the fitted model (confidence band),
- the residual scatter of the data around the model, and
- the prediction error for future observations.
We will demonstrate how these quantities arise via a simulated measurement, how they should be interpreted, and how they can be computed in practice.
We close with a short summary and conclusion.
Measurement Error vs Model Uncertainty
Measurement Error
Measurement error describes the uncertainty associated with each observed data point. Formally, we write:
[
\begin{aligned}
y_i &= f(x_i, \theta) + \epsilon_i
\end{aligned}
]
Here
- $y_i$ is the measured value
- $f(x_i, \theta)$ is the underlying model at the position $x_i$ and for the parameter set $\theta$
- $\epsilon_i$ is the random error term with a variance $\sigma_y^2$
The error $\epsilon_i$ is typically determined by the measurement process itself (measurement noise, environmental fluctuations, discretization, etc.); its standard deviation $\sigma_y$ characterizes the measurement process. In practice, $y_i$ is often the mean obtained from repeated measurements.
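This data model can be sketched in a few lines of Python. The straight-line model $f$, the noise level, and the seed are illustrative assumptions, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(42)

def f(x, a, b):
    """Illustrative model: a straight line f(x, theta) = a*x + b."""
    return a * x + b

theta_true = (2.0, 1.0)   # assumed "true" parameters
sigma_y = 0.5             # assumed standard deviation of the measurement noise

x = np.linspace(0.0, 10.0, 50)
eps = rng.normal(0.0, sigma_y, size=x.size)  # epsilon_i ~ N(0, sigma_y^2)
y = f(x, *theta_true) + eps                  # y_i = f(x_i, theta) + epsilon_i
```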
Model (Fit) Uncertainty
When fitting a model $f(x, \theta)$ to data, the parameters $\theta$ are estimated from all observations. The uncertainty of these parameters is given by the covariance matrix:
[
C := \mathrm{Cov}(\theta)
]
The matrix encodes how precisely the parameters are determined by the fitting / regression procedure. The uncertainty of the model prediction at a given point $x$ is obtained by propagating the covariance:
[
\begin{aligned}
\sigma_f^2(x) &= \left(\nabla_\theta f(x, \theta)\right)^T C \nabla_\theta f(x, \theta)
\end{aligned}
]
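This propagation can be sketched with a numerical gradient. The linear model and the synthetic data below are illustrative assumptions; `scipy.optimize.curve_fit` returns the covariance matrix $C$ directly:

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    """Illustrative model function."""
    return a * x + b

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = f(x, 2.0, 1.0) + rng.normal(0.0, 0.5, x.size)

theta_hat, C = curve_fit(f, x, y)  # C = Cov(theta)

def sigma_f(x0, theta, C, h=1e-6):
    """Propagate the parameter covariance: sigma_f^2 = g^T C g,
    with g the gradient of f w.r.t. theta (central differences)."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty_like(theta)
    for i in range(theta.size):
        dt = np.zeros_like(theta)
        dt[i] = h
        g[i] = (f(x0, *(theta + dt)) - f(x0, *(theta - dt))) / (2 * h)
    return np.sqrt(g @ C @ g)

band = np.array([sigma_f(xi, theta_hat, C) for xi in x])
```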
The quantity $\sigma_f$ represents the confidence band of the fitted model. The width of this band decreases with the number of data points, similar to the standard error of the mean. For well-conditioned problems with independent observations, the scaling can often be estimated as:
[
\sigma_f \sim \frac{\sigma_y}{\sqrt{N}}
]
Even if individual measurements are noisy, the estimated parameters of the assumed model can be determined very precisely.
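This $1/\sqrt{N}$ scaling can be checked with a quick simulation using the simplest possible model, a constant $f(x) = \mu$, whose least-squares estimate is the sample mean. The values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_y = 1.0  # assumed noise level

# The standard error of the fitted mean scales as sigma_y / sqrt(N).
se = {}
for N in (10, 100, 1000):
    y = rng.normal(0.0, sigma_y, size=N)
    se[N] = y.std(ddof=1) / np.sqrt(N)  # estimated sigma_f for the mean
```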
Fit uncertainty $\sigma_f$: How confident can we be in the fitted model?
Note that a small $\sigma_f$ does not imply that the model is correct; proper statistical tests of the hypothesis are still required.
Residuals and Data-Driven Variance
The residuals quantify how well the model describes the observed data:
[
r_i = y_i - \hat{y_i}
]
Here $\hat{y_i} = f(x_i, \hat{\theta})$ is the prediction of the data value by the fitted model. From these residuals one can estimate the variance of the data around the model:
[
\sigma_r^2 = \frac{1}{N-p} \sum_{i=1}^{N} r_i^2
]
Here:
- $N$ is the number of observations (measurement points)
- $p$ is the number of fitted parameters
- $N-p$ is thus the number of degrees of freedom
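A minimal sketch of this estimate (the linear model and noise level are illustrative):

```python
import numpy as np
from scipy.optimize import curve_fit

def f(x, a, b):
    """Illustrative model function."""
    return a * x + b

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 100)
y = f(x, 2.0, 1.0) + rng.normal(0.0, 0.5, x.size)

theta_hat, _ = curve_fit(f, x, y)
r = y - f(x, *theta_hat)                   # residuals r_i
N, p = x.size, len(theta_hat)
sigma_r = np.sqrt(np.sum(r**2) / (N - p))  # dof-corrected scatter
```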
The quantity $\sigma_r$ represents the intrinsic scatter of the data and is typically comparable to the measurement noise, though not identical to it. In the case of correlated noise, $\sigma_r$ underestimates the true uncertainty.
Residual error / intrinsic scatter $\sigma_r$: How much does the measurement process scatter? (Typically comparable to the measurement noise, though $\sigma_r$ also includes model mismatch, unmodeled systematics, etc.)
Prediction Error
The prediction error describes where, at a given position $x$, the next measurement is expected to lie with a given confidence. It must account for two contributions that are typically treated as independent:
- Uncertainty of the model parameters (given by $\sigma_f$)
- Scatter of the data around the model (given by $\sigma_r$)
This corresponds to the classical distinction between confidence intervals (the uncertainty of the fitted mean model) and prediction intervals (the uncertainty of individual observations).
Under the assumption of independence this yields a total error $\sigma$:
[
\begin{aligned}
\sigma^2(x) &= \sigma_f^2(x) + \sigma_r^2 \\
\sigma(x) &= \sqrt{\sigma_f^2(x) + \sigma_r^2}
\end{aligned}
]
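Under that independence assumption the combination is straightforward; the numeric inputs below are illustrative:

```python
import numpy as np

def prediction_sigma(sigma_f_x, sigma_r):
    """Total prediction uncertainty sigma(x) = sqrt(sigma_f(x)^2 + sigma_r^2),
    assuming the two contributions are independent."""
    sigma_f_x = np.asarray(sigma_f_x, dtype=float)
    return np.sqrt(sigma_f_x**2 + sigma_r**2)

# Illustrative values: a narrow model band and a larger residual scatter
sigma = prediction_sigma([0.05, 0.08, 0.12], 0.5)
```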
Prediction error $\sigma$: How well can the model predict a new measurement at position $x$
Keep in mind that the assumption of independence breaks down in the case of heteroscedastic errors or correlated noise!
A Practical Example
To illustrate the concepts, we simulate a derivative Lorentzian (Cauchy) shaped signal, add noise in both axes, perform a fit and then compute the parameter uncertainties, the model confidence band, the residual variance and the full prediction uncertainty.
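The steps of this example can be sketched as follows. For simplicity this version adds noise only on the y axis, and the noise level, seed, and starting values are illustrative assumptions, so the fitted numbers will differ from those quoted in the plot discussion:

```python
import numpy as np
from scipy.optimize import curve_fit

def dlorentz(x, A, x0, gamma):
    """Derivative of a Lorentzian with amplitude A, center x0, HWHM gamma."""
    u = x - x0
    return -2.0 * A * gamma**2 * u / (u**2 + gamma**2) ** 2

rng = np.random.default_rng(7)
x = np.linspace(390.0, 410.0, 200)
y_clean = dlorentz(x, 120.0, 400.0, 1.5)  # A=120, x0=400, FWHM=2*gamma=3.0
sigma_y = 0.07 * np.abs(y_clean).max()    # assumed noise level (illustrative)
y = y_clean + rng.normal(0.0, sigma_y, x.size)

# Least-squares fit; without bounds, curve_fit uses Levenberg-Marquardt
theta_hat, C = curve_fit(dlorentz, x, y, p0=(100.0, 399.0, 1.0))

# Residual scatter with degrees-of-freedom correction
r = y - dlorentz(x, *theta_hat)
sigma_r = np.sqrt(np.sum(r**2) / (x.size - len(theta_hat)))

def grad(xp, theta, h=1e-6):
    """Gradient of the model w.r.t. theta at xp (central differences)."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty(theta.size)
    for i in range(theta.size):
        d = np.zeros(theta.size)
        d[i] = h
        g[i] = (dlorentz(xp, *(theta + d)) - dlorentz(xp, *(theta - d))) / (2 * h)
    return g

# Model confidence band and total prediction band
sigma_f = np.array([np.sqrt(grad(xp, theta_hat) @ C @ grad(xp, theta_hat)) for xp in x])
sigma_tot = np.sqrt(sigma_f**2 + sigma_r**2)
```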

In this plot one can see:
- The blue (simulated) data points. The simulation assumes an amplitude of $A=120$, $x_0=400$, $\mathrm{FWHM}=3.0$ (i.e. $\gamma=1.5$), $\sigma_x = 0.3$ and $\sigma_y = 0.7 * \mathrm{max}(y_i)$.
- The orange fit, a least-squares fit performed with the Levenberg-Marquardt algorithm against the same model function that was used to synthesize the data. It yields $\hat{x_0} = 399.892560 \pm 0.097864 \, \mathrm{MHz}$ and $\mathrm{FWHM} = 2.369803 \pm 0.385163 \, \mathrm{MHz}$.
- The narrow blue region around the fit, the fit uncertainty $\sigma_f$. It is extremely narrow and does not reflect the scatter of the individual data points; comparing new data points against this band alone would wrongly suggest that the measurements do not confirm the model.
- The orange band, the total error $\sigma(x) = \sqrt{\sigma_f(x)^2 + \sigma_r^2}$, obtained by adding the residual measurement error $\sigma_r$. It is much wider and contains roughly 68 percent of all data points.
- The blue error-bar-like line on top of the points, which again shows $\sigma_r$, the expected scatter of individual measurements.
As one can see, the confidence band of the model is much narrower than the prediction band for individual measurements.
Conclusion and the Common Interpretation Pitfall
Comparing measurement data directly to the confidence band $\sigma_f(x)$ instead of the prediction interval $\sigma(x)$ is a common mistake and leads to systematic overestimation of discrepancies between model and data.
The correct interpretation is:
- $\sigma_f(x)$: Confidence in the mean fitted model
- $\sigma_r$: Scatter of individual measurements
- $\sigma(x)$: Uncertainty of predicted future observations
This implies that, for sufficiently large datasets:
[
\sigma_f(x) \ll \sigma_r
]
This leads to the conclusions:
- A fitted model can be known much more precisely than individual measurements
- The covariance matrix $\mathrm{Cov}(\theta)$ describes the parameter uncertainty, not measurement noise
- The model (fit) uncertainty $\sigma_f$ describes the confidence of the model. This corresponds to the posterior uncertainty of the model.
- Residuals $\sigma_r$ capture the intrinsic data scatter or the measurement noise
- The correct uncertainty for future predictions is the combination of both effects $\sigma(x) = \sqrt{\sigma_f(x)^2 + \sigma_r^2}$. This corresponds to the posterior predictive distribution.
This article is tagged: Physics, School math, Math, Basics, Tutorial, Statistics, Measurements