Why Your Mean Is More Stable Than Your Standard Error Suggests

18 Jul 2025 - tsp
Last update 18 Jul 2025
Reading time 7 mins

Have you ever wondered why your measured mean appears more stable than your calculated standard deviation or standard error would suggest? If you’ve been averaging measurements and your results don’t “jump around” as much as you expect, you’re not alone. The answer lies in the nature of noise.

When analyzing repeated measurements, it's common practice to report:

- The arithmetic mean $\bar{x}$
- The sample standard deviation $\sigma$
- The standard error of the mean $SE$

Those are defined as:

[ \begin{aligned} \bar{x} = \mu &= \frac{1}{N} \sum_{i=1}^{N} x_i \\ \sigma &= \sqrt{\frac{1}{N-1} \sum_{i=1}^{N} \left(\bar{x} - x_i\right)^2} \\ SE &= \frac{\sigma}{\sqrt{N}} \end{aligned} ]

Beyond summarizing past data, the mean and standard deviation (SD) also serve as point estimators for predicting future measurements — assuming the underlying statistical properties remain consistent.
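
As a minimal sketch of these definitions (NumPy-based; the helper name `summarize` is my own choice, not from the article):

```python
import numpy as np

def summarize(x):
    """Return mean, sample standard deviation (SD) and standard error (SE)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    sd = x.std(ddof=1)            # 1/(N-1) normalization, matching the formula above
    se = sd / np.sqrt(len(x))
    return mean, sd, se

# Example: 100 noisy readings around a true value of 5.0
rng = np.random.default_rng(42)
mean, sd, se = summarize(5.0 + 0.1 * rng.standard_normal(100))
print(f"mean={mean:.4f}  SD={sd:.4f}  SE={se:.4f}")
```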

For uncorrelated noise ("white noise", often also assumed to be Gaussian), the standard error shrinks as you take more samples:

[ SE = \frac{\sigma}{\sqrt{N}} ]

In the limit of infinitely many samples one would then expect

[ \lim_{N\to\infty} SE = \lim_{N\to\infty} \frac{\sigma}{\sqrt{N}} = 0 ]

This assumes each measurement is statistically independent. But what if they’re not?
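
A quick numerical illustration of this assumption (again just a sketch of mine; the random walk serves as an extreme stand-in for correlated measurements): for independent samples the SE keeps shrinking with $N$, while for the correlated series it stalls.

```python
import numpy as np

def standard_error(x):
    return x.std(ddof=1) / np.sqrt(len(x))

rng = np.random.default_rng(0)
for n in (100, 1_000, 10_000):
    white = rng.standard_normal(n)             # independent samples
    walk = np.cumsum(rng.standard_normal(n))   # strongly correlated samples
    print(f"N={n:>6}  SE(white)={standard_error(white):.4f}  "
          f"SE(random walk)={standard_error(walk):.4f}")
```

For the random walk the sample SD itself grows roughly as $\sqrt{N}$, which cancels the $1/\sqrt{N}$ in the SE - exactly the plateau behaviour discussed below.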

The reason: Correlated Noise

In real-world experiments, especially with precise instruments or long averaging times, measurements often contain correlated noise. A few examples include:

- Flicker ($1/f$) noise in electronic components
- Slow temperature or other environmental drifts
- Aging of components and reference standards

These sources introduce long-term correlations. Even if you repeat a measurement a large number of times, the effective number of independent samples is far smaller. That means:

- The standard error no longer shrinks as $1/\sqrt{N}$
- Beyond some point, additional averaging no longer improves your estimate

You might notice that the standard deviation (SD) and standard error (SE) seem to reach a plateau, while your measured mean hovers consistently around the same value. This can feel counterintuitive - but it’s actually expected behavior in the presence of low-frequency noise.

The role of Allan Deviation

To detect and quantify this kind of noise behavior, especially in time-series data, physicists and engineers often use Allan deviation. Instead of assuming all samples are uncorrelated, it measures how the average changes over increasing timescales.

The Allan variance of a time series $x(t)$ at a fixed averaging time $\tau$ (keep in mind this is an estimator that is only valid for fixed $\tau$; different forms exist for other conditions - this is not a tutorial on Allan deviations and variances) is defined as:

[ \sigma_x^2(\tau) = \frac{1}{2(M-1)} \sum_{i=1}^{M-1} \left(\bar{x}_{i+1} - \bar{x}_{i}\right)^2 ]

The $\bar{x}_i$ are the averages of $x(t)$ over $M$ successive, non-overlapping time intervals of a fixed duration $\tau$ (imagine chopping the series into consecutive windows of length $\tau$). The square root of this variance gives the Allan deviation $\sigma_x(\tau)$.
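
A direct NumPy translation of this estimator might look like the following sketch (non-overlapping windows of $m$ samples each, so $\tau = m \cdot t_{sample}$; the function name and interface are my own):

```python
import numpy as np

def allan_deviation(x, m):
    """Non-overlapping Allan deviation of x for windows of m samples.

    m corresponds to tau = m * t_sample; implements the estimator above.
    """
    x = np.asarray(x, dtype=float)
    M = len(x) // m                    # number of complete windows
    if M < 2:
        raise ValueError("need at least two complete averaging windows")
    bars = x[:M * m].reshape(M, m).mean(axis=1)   # the window averages xbar_i
    # sigma_x^2(tau) = 1/(2(M-1)) * sum_i (xbar_{i+1} - xbar_i)^2
    return np.sqrt(np.mean(np.diff(bars) ** 2) / 2.0)
```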

Different types of noise yield characteristic dependencies of the Allan deviation on $\tau$:

- White noise: $\sigma_x(\tau) \propto \tau^{-1/2}$ - longer averaging keeps helping
- Flicker ($1/f$) noise: $\sigma_x(\tau) \approx \text{const.}$ - a plateau
- Random walk: $\sigma_x(\tau) \propto \tau^{1/2}$
- Linear drift: $\sigma_x(\tau) \propto \tau$

By plotting Allan deviation across a range of $\tau$, one can visually identify which type of noise dominates at which timescales.

Allan deviation helps answer:

- Which noise type dominates at a given timescale?
- How long can I usefully average before flicker noise or drift takes over?
- What is the optimal averaging time for my measurement?

The following graph of simulated data shows the three common cases of white noise, flicker noise and constant drift, together with the behaviour of the Allan deviation, the standard deviation (SD) and the standard error (SE) over time:

In this graph we can see:

- For white noise, the SD converges to the true $\sigma$, the SE keeps shrinking as $\sigma/\sqrt{N}$ and the Allan deviation falls as $\tau^{-1/2}$.
- For flicker noise, the SE stops shrinking and the Allan deviation levels off into a plateau.
- For constant drift, the SD grows with time and the SE becomes misleading, while the Allan deviation rises with $\tau$.
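
The plot itself is not reproduced here, but a sketch of how such simulated data could be generated and checked numerically (reusing the allan_deviation helper from above; the FFT-based $1/f$ synthesis is one common recipe, not necessarily the one used for the original figure):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2 ** 14

white = rng.standard_normal(n)

# Flicker (1/f) noise via FFT shaping of white noise (one common recipe)
spec = np.fft.rfft(rng.standard_normal(n))
freqs = np.fft.rfftfreq(n)
freqs[0] = freqs[1]                      # avoid division by zero at DC
flicker = np.fft.irfft(spec / np.sqrt(freqs), n)

drift = white + 1e-3 * np.arange(n)      # white noise plus a slow linear drift

for name, x in (("white", white), ("flicker", flicker), ("drift", drift)):
    sd = x.std(ddof=1)
    se = sd / np.sqrt(n)
    adev = ", ".join(f"{allan_deviation(x, m):.3f}" for m in (8, 64, 512))
    print(f"{name:>7}: SD={sd:.3f}  SE={se:.5f}  ADEV(m=8,64,512)={adev}")
```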

Apparent Stability Is Not the Same as Accuracy

A crucial caveat is that a stable mean and small standard error might tempt one to report overly precise results - especially if the underlying noise is correlated. In such cases, standard error underestimates the true uncertainty, and any conclusions drawn from the illusion of precision may be misleading. It’s important to remember that a narrow confidence interval computed from standard error is only valid under the assumption of independent samples. When this assumption fails, reported uncertainty becomes artificially optimistic. Always assess the nature of your noise before trusting the digits after the decimal point.

Conversely, long-term drifts or correlated offsets can also inflate the standard deviation and standard error, overestimating the variability of individual measurements. This can make your system appear noisier than it really is on short timescales - especially if the underlying random noise is low and the dominant effect is slow drift. In this case, while the reported mean itself may be inaccurate due to bias from the drift, the apparent per-measurement noise is exaggerated. Correlated noise can thus produce both overconfidence and underconfidence, depending on how you interpret your statistics.

Takeaways

| Noise Type | Standard Deviation (SD) | Standard Error (SE) |
|---|---|---|
| White noise | ✅ Converges to a constant value (the true $\sigma$) | ✅ Decreases as $\frac{\sigma}{\sqrt{N}}$ |
| Flicker noise | ⚠️ Slowly increasing or saturating | ❌ Does not decrease as $1/\sqrt{N}$; often saturates |
| Linear drift | 🔺 Increases linearly with time | ❌ Misleading - may shrink briefly, but becomes invalid due to nonstationarity |

This article is tagged: Physics, School math, Math, Tutorial, Basics, Measurement, Statistics

