Chapter 3 Measures of Uncertainty
The difference between a population parameter and a sample statistic is known as sampling error. Various measures of uncertainty are used to describe how estimates differ from the true value of the population, including the following (Office for National Statistics 2022):
3.1 Standard Error
The standard error is a commonly used measure of sampling error.
The standard deviation is a descriptive statistic that details variability in a single sample statistic while the standard error is an inferential statistic that estimates the variability across multiple samples of a population (Lee D. K., In J. and Lee S. 2015).
The standard error shows how close the estimate based on sample data might be to the value that would have been taken from the whole population (Office for National Statistics 2022).
The standard error of the mean (SEM) is the most commonly reported type of standard error but the standard error can be calculated for other statistics as well.
The standard error is calculated by dividing the standard deviation of a set of measurements by the square root of the number of measurements.
Information
The standard error is given by (Office for National Statistics 2022):
\[SE=\frac{\sigma}{\sqrt{n}},\]
where \(SE\) is the standard error, \(\sigma\) is the population standard deviation and \(n\) is the number of elements in the sample.
In practice the population standard deviation is rarely known so instead the formula takes the sample standard deviation as a point estimate for the population standard deviation (Office for National Statistics 2022):
\[SE=\frac{s}{\sqrt{n}},\]
where \(SE\) is the standard error, \(s\) is the sample standard deviation and \(n\) is the number of elements in the sample.
The standard error decreases as sample size increases as the extent of chance variation is reduced. This idea underpins sample size calculations of drug trials in medical research (Altman D. G. and Bland J. M. 2005). In contrast, the standard deviation will not tend to change as we increase the size of our sample.
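The calculation described above can be sketched in a few lines of Python (the sample values are illustrative, not from the source):

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)        # sample standard deviation
    return s / math.sqrt(len(sample))

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
print(round(standard_error(sample), 3))   # -> 0.087
```

Quadrupling the number of measurements would roughly halve the standard error, reflecting the \(\sqrt{n}\) in the denominator.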
3.2 Coefficient of Variation
The coefficient of variation makes it easier to understand whether a standard error is large compared with the estimate itself. It allows researchers to measure variation in a way which enables comparisons between data with different means (Martin J. D. and Louis N. G. 1997).
The coefficient of variation is also known as the relative standard error, as it is a relative measure of dispersion (in contrast with the standard deviation and interquartile range, which are absolute measures). It is calculated by dividing the standard error of an estimate by the estimate itself, and the result indicates the relative spread of the data. An advantage of the coefficient of variation is that, unlike other dispersion measures, it takes central tendency into account (Martin J. D. and Louis N. G. 1997).
Similar to the standard error, the closer the coefficient of variation is to zero, the more precise the estimate. Higher values indicate the standard error is large compared with the estimate. Where it is above 50%, the estimate is considered to lack precision (Office for National Statistics 2022).
The coefficient of variation should not be used for estimates of values that are close to zero or for percentages.
3.2.1 Example
Imagine a study that measures household expenditures where we want to compare the variability of spending in households of different incomes.
Expenditure | Q1 (least deprived) | Q2 | Q3 | Q4 | Q5 (most deprived) |
---|---|---|---|---|---|
Mean | £120,000 | £60,000 | £40,000 | £25,000 | £12,000 |
Standard Error | £24,000 | £12,000 | £6,000 | £7,500 | £4,800 |
The absolute variability is much higher in the high-income households than in the low-income households, which is unsurprising given the substantial differences in the means. To account for these differences, a measure of relative variability such as the coefficient of variation is used.
Calculating the coefficient of variation shows that, once the differences in mean expenditure are accounted for, the first two quintiles (Q1 and Q2) have equal variability and the greatest variability is seen in the most deprived quintile (Q5).
Expenditure | Q1 (least deprived) | Q2 | Q3 | Q4 | Q5 (most deprived) |
---|---|---|---|---|---|
Coefficient of Variation | 20% | 20% | 15% | 30% | 40% |
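As a minimal sketch, the coefficients of variation can be recomputed directly from the means and standard errors in the expenditure table above:

```python
means  = [120_000, 60_000, 40_000, 25_000, 12_000]   # Q1..Q5 mean expenditure (£)
errors = [24_000, 12_000, 6_000, 7_500, 4_800]       # standard errors (£)

# coefficient of variation = standard error / estimate
cvs = [f"{se / mean:.0%}" for se, mean in zip(errors, means)]
print(cvs)
```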
3.3 Confidence Intervals
In inferential statistics the key goal is to estimate population parameters. Confidence intervals incorporate the uncertainty and sample error to create a range of values the true population value is likely to fall within (Frost J. 2019).
Consider a study to estimate the mean weight of all 10 year old boys in Northern Ireland. It would be impractical to weigh them all so a sample of 16 might be taken. The mean weight of the sample might be 45 kg. This is a point estimate of the population mean.
Point estimation uses sample data to calculate a single value as a best guess of an unknown population parameter.
This point estimate has limited utility because it does not reveal uncertainty associated with the estimate. Is there confidence that the population mean is within 5 kg of 45 kg? It’s not possible to know with this information. That is why confidence intervals are calculated.
A 95% confidence level is frequently used in official reporting. If we drew twenty random samples and calculated a 95% confidence interval for each sample, we would expect that, on average, 19 out of the 20 (95%) resulting confidence intervals would contain the true population value while 1 in 20 (5%) would not (Office for National Statistics 2022).
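The "19 out of 20" interpretation can be illustrated with a small simulation. The population values below are assumed for illustration (loosely echoing the weight example), and the population standard deviation is treated as known:

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n, z = 45, 8, 16, 1.96      # assumed population mean/sd, sample size, z-score
half_width = z * sigma / math.sqrt(n)  # 95% margin of error with sigma known

trials = 1000
covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mean = statistics.fmean(sample)
    # does this sample's interval contain the true population mean?
    if mean - half_width <= mu <= mean + half_width:
        covered += 1

print(f"{covered / trials:.1%} of intervals contain the true mean")
```

With enough repetitions the proportion of intervals containing the true mean settles close to 95%.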
Example 3.3.1 illustrates how confidence intervals can be interpreted. Example 3.3.2 illustrates how confidence intervals are calculated. Some of the concepts involved (standard deviation and z-scores) have yet to be introduced, but the example is provided regardless to show where these intervals come from and how the accuracy of our measurements can be improved through sampling. It is not necessary to understand it in great detail to achieve a good foundation in statistics; the concepts needed to follow it fully will be introduced in later chapters.
3.3.1 Example
Confidence intervals for religious composition of the economically active (Working age) 2011 are shown below:
Religious Denomination | Gender | Rate (%) | Confidence Interval (%) | Lower Limit (%) | Upper Limit (%) |
---|---|---|---|---|---|
Protestant | Male | 53.3 | +/- 2.6 | 50.7 | 55.9 |
Roman Catholic | Male | 46.7 | +/- 2.6 | 44.1 | 49.3 |
Protestant | Female | 52.6 | +/- 2.7 | 49.9 | 55.3 |
Roman Catholic | Female | 47.4 | +/- 2.7 | 44.7 | 50.1 |
Protestant | All | 53.0 | +/- 1.9 | 51.1 | 54.9 |
Roman Catholic | All | 47.0 | +/- 1.9 | 45.1 | 48.9 |
Based on a sample, the table above shows that 52.6% of Protestant females (C.I. = +/- 2.7) were estimated to be economically active in 2011.
This means that there is 95% confidence that the ‘true value’ lies somewhere between 49.9% and 55.3%.
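The lower and upper limits in the table are simply the rate plus or minus the stated half-width; for instance, using the values above:

```python
rate, margin = 52.6, 2.7   # Protestant females: rate (%) and confidence interval (%)
lower, upper = rate - margin, rate + margin
print(f"{lower:.1f}% to {upper:.1f}%")   # -> 49.9% to 55.3%
```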
To calculate a confidence interval we need to know how many measurements we have, the mean and standard deviation of those measurements and a z-score.
3.3.2 Example
We measure the heights of ten people in the office and get a mean height of 172 cm and a standard deviation of 15 cm.
We need to decide the confidence interval we want. 95% is the most common choice.
We need to know the Z-score for that confidence interval. For a 95% confidence interval the Z-score is 1.96.
The confidence interval is then given by multiplying the Z-score by the standard deviation and dividing by the square root of the number of observations or measurements.
Information
The Z-score indicates how many standard deviations an element is from the mean. A standard score can be calculated from the following formula:
\[z=\frac{(X-\mu)}{\sigma},\]
where \(z\) is the z-score, \(X\) is the value of the element, \(\mu\) is the mean of the population, and \(\sigma\) is the standard deviation.
A Z-score of zero would indicate that a value is identical to the mean value while a Z-score of 1 would indicate a distance of one standard deviation from the mean.
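The formula translates directly into code (the heights below are illustrative):

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

print(z_score(172, 172, 15))   # value equal to the mean -> 0.0
print(z_score(187, 172, 15))   # one standard deviation above the mean -> 1.0
```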
The formula for calculating a confidence interval is given by:
\[CI= \pm Z\frac{\sigma}{\sqrt{n}},\]
where the Greek letter sigma (\(\sigma\)) is the standard deviation, \(n\) is the number of observations or measurements and \(Z\) is the \(Z\)-score.
For now, assume the standard deviation is known and takes the value, \(\sigma = 15\). The mean and its associated confidence interval can then be calculated:
\[172 \pm 1.96 \frac{15}{\sqrt{10}},\] Plugging in the numbers gives:
\[172 \textrm{ cm}\pm 9.30 \textrm{ cm}.\]
In other words, the lower bound of the confidence interval is 162.7 cm and the upper bound is 181.3 cm. The true mean is likely between these two values. The confidence interval can be narrowed by increasing the number of measurements taken. With 100 measurements of height (and the same mean and standard deviation) the mean and its associated confidence interval would be stated as:
\[172 \pm 1.96 \frac{15}{\sqrt{100}},\] \[172 \textrm{ cm} \pm 2.94 \textrm{ cm}.\] This would make the range 169.1 cm to 174.9 cm.
The more observations that are collected the more accurate the measurement of the mean height becomes.
If it was somehow possible to measure the heights of a million people the mean and the associated confidence interval would become:
\[172 \textrm{ cm} \pm 0.03 \textrm{ cm}.\]
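The three interval widths above can be reproduced directly (assuming, as in the example, that \(\sigma = 15\) is known):

```python
import math

def ci_half_width(sigma, n, z=1.96):
    """Half-width of a confidence interval for the mean with known sigma."""
    return z * sigma / math.sqrt(n)

for n in (10, 100, 1_000_000):
    print(f"n={n}: 172 cm ± {ci_half_width(15, n):.2f} cm")
# n=10:        ± 9.30 cm
# n=100:       ± 2.94 cm
# n=1,000,000: ± 0.03 cm
```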
3.4 Statistical Significance
Statistical significance measures how likely it is that differences in outcomes between different groups are not due to chance. P-values and confidence intervals are the most commonly used measures. The p-value gives the probability that a particular outcome would have arisen by chance, while the confidence interval incorporates the uncertainty and sampling error to create a range of values within which the true population value is likely to fall (Leung W. C. 2001).
Statistical significance can be used to help decide whether a difference between two survey-based estimates reflects a true change in the population rather than random variation in our sample selection. A result is statistically significant if it is not likely to be caused by chance. A 5% standard is often used when testing for statistical significance. The observed change is statistically significant at the 5% level if there is less than a 1 in 20 chance of it occurring by chance when there is actually no underlying change (Office for National Statistics 2022).
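As an illustrative sketch of such a test, the following compares two hypothetical survey proportions (the figures are invented, not from the source) using a two-proportion z-test:

```python
import math
from statistics import NormalDist

# hypothetical survey results: successes and sample sizes in two years
x1, n1 = 540, 1000
x2, n2 = 480, 1000

p1, p2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)                        # pooled proportion under H0
se = math.sqrt(pooled * (1 - pooled) * (1/n1 + 1/n2))
z = (p1 - p2) / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))          # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

Here the p-value falls below 0.05, so the difference between the two estimates would be reported as statistically significant at the 5% level.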
Summary
Standard Error
The standard deviation details variability in a single sample statistic while the standard error estimates the variability across multiple samples of a population.
It is calculated by dividing the standard deviation by the square root of the number of elements in a sample.
Coefficient of Variation
The coefficient of variation makes it easier to understand whether a standard error is large compared with the estimate itself. It is calculated by dividing the standard error of an estimate by the estimate itself.
Confidence Intervals
Confidence Intervals describe a range of values that likely contain the true value for a measurement.
The half-width of a 95% confidence interval is calculated by multiplying the Z-score by the standard deviation and dividing by the square root of the number of observations or measurements; the interval is the estimate plus or minus this value.
Statistical Significance
Statistical significance measures how likely it is that differences in outcomes between different groups are real and not due to chance. P-values and confidence intervals are the most commonly used measures.