Chapter 9 Inferential Statistics


9.1 What is Inferential Statistics?

Inferential statistics is used to explain phenomena, predict trends, and make decisions about whether results are statistically significant. This is in contrast to descriptive statistics, which is restricted to describing the important characteristics of data using measures of central tendency and measures of dispersion.

9.2 Hypothesis Testing

People can form different opinions by looking at the same data, but hypothesis testing provides a consistent framework for making decisions about different assumptions: it applies a statistical set of rules rather than relying on subjective impressions (Pereira S. M. C. and Leslie G. R. N. 2009).

Hypothesis testing is used to test assumptions relating to a population parameter based on a sample taken from that population. It involves formulating hypotheses, including a null hypothesis and an alternative hypothesis, and collecting data before using statistical methods to determine whether or not the null hypothesis can be rejected.

The null hypothesis describes the assumption that there is no difference between observations, while the alternative hypothesis describes the assumption that there is a difference that is not due to chance.

The decision to reject the null hypothesis is based on the strength of the evidence provided by the sample data. The strength of the evidence provided by the sample data is described by the p-value which is the probability of observing a test statistic as extreme or more extreme than the one computed from the sample, assuming that the null hypothesis is true.
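
Computing a p-value can be sketched in a few lines using Python's scipy library. The sample values and hypothesised population mean below are made up purely for illustration: a one-sample t-test returns the test statistic together with the probability of observing a value at least that extreme under the null hypothesis.

```python
# Sketch: computing a p-value with a one-sample t-test (scipy assumed available).
# The sample values and hypothesised mean of 5.0 are illustrative only.
from scipy import stats

sample = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3]

# Null hypothesis: the population mean equals 5.0.
t_stat, p_value = stats.ttest_1samp(sample, popmean=5.0)

print(f"t statistic: {t_stat:.3f}")
print(f"p-value:     {p_value:.3f}")
```

The smaller the p-value, the stronger the evidence against the null hypothesis.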

Researchers will construct hypotheses with the expectation that their findings will contradict the null hypothesis.

When the null hypothesis is rejected, it is important to avoid stating that the alternative hypothesis is accepted. In general, studies provide evidence for or against a hypothesis rather than conclusively proving one or the other to be true.

Similarly, when the null hypothesis cannot be rejected, it is important not to state that the null hypothesis is accepted; instead, it should be stated that the null hypothesis cannot be rejected.

This is often compared to how verdicts are made in a court of law (Banerjee A., Chitnis U. B., Jadhav S. L., Bhawalkar J. S., Chaudhury S. 2009). A person can be found guilty or not guilty. A not guilty verdict means that the prosecution was unable to prove beyond a reasonable doubt that the person committed the crime, but it does not necessarily mean the person is innocent. The court can provide evidence of guilt, but it cannot prove innocence.

In the same way, statistical tests cannot prove that either hypothesis is true.

9.3 Statistical Significance

Statistical significance is a concept in hypothesis testing that describes whether the results of a study are meaningful. It is used to assess how likely the observed results would be if they were due to random chance alone (Tenny S. and Abdelgawad I. 2022).

A result is statistically significant if the p-value is less than a pre-defined significance level (Tenny S. and Abdelgawad I. 2022). Results that are unlikely to have occurred by chance are considered to be statistically significant. The significance level is typically set at 0.05 (Office for National Statistics 2022). A p-value below this threshold means that results at least as extreme as those observed would occur less than 5% of the time if the null hypothesis were true; the null hypothesis is rejected and the result is considered statistically significant.
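
The decision rule above can be sketched with a two-sample t-test in scipy; the two groups of measurements are illustrative data, not taken from a real study.

```python
# Sketch: applying the 0.05 significance threshold to a two-sample t-test.
# The two groups below are illustrative data only.
from scipy import stats

group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3]
group_b = [13.0, 12.8, 13.4, 12.9, 13.1, 13.2]

alpha = 0.05  # pre-defined significance level
t_stat, p_value = stats.ttest_ind(group_a, group_b)

if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject the null hypothesis")
```

Note that the conclusion is phrased as "reject" or "fail to reject" the null hypothesis, following the guidance in Section 9.2.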

Statistical significance does not always imply practical significance, however, and a result may be statistically significant but not have a large enough effect to be of any practical use.

9.4 Errors in Hypothesis Testing

In hypothesis testing, there are two types of errors that can occur: Type I errors and Type II errors.

9.4.1 Type I Errors

A Type I error is known as a false positive and occurs when the null hypothesis is rejected despite being true (Banerjee A., Chitnis U. B., Jadhav S. L., Bhawalkar J. S., Chaudhury S. 2009). A significant result is obtained by chance but is interpreted as a real effect. The probability of making a Type I error is represented by alpha, \(\alpha\):

\[\alpha= \textrm{P}(\textrm{Null hypothesis rejected} | \textrm{Null hypothesis is true}).\]
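
The meaning of \(\alpha\) can be checked by simulation: if the null hypothesis really is true, a test conducted at \(\alpha = 0.05\) should reject it in roughly 5% of repeated samples. The sketch below, using numpy and scipy with arbitrary simulation settings, draws both samples from the same distribution so that every rejection is a false positive.

```python
# Sketch: a Monte Carlo check that the Type I error rate matches alpha.
# Sample sizes, trial count, and distribution parameters are arbitrary choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_trials = 5000

false_positives = 0
for _ in range(n_trials):
    # Both samples come from the same distribution, so the null is true.
    a = rng.normal(loc=0.0, scale=1.0, size=30)
    b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        false_positives += 1

print(f"Observed Type I error rate: {false_positives / n_trials:.3f}")
```

The observed rate should sit close to 0.05, illustrating that \(\alpha\) is the long-run false positive rate of the test.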

9.4.2 Type II Errors

A Type II error, known as a false negative, occurs when the null hypothesis is not rejected despite being false (Banerjee A., Chitnis U. B., Jadhav S. L., Bhawalkar J. S., Chaudhury S. 2009). The real effect goes undetected and is interpreted as the result of chance. The probability of making a Type II error is represented by beta, \(\beta\):

\[\beta= \textrm{P}(\textrm{Null hypothesis not rejected} | \textrm{Null hypothesis is false}).\]

Minimizing the risk of both types of errors is important, but there is a trade-off: reducing the significance level lowers the chance of a Type I error but increases the chance of a Type II error. Increasing the sample size, on the other hand, can reduce both types of error.

9.4.3 Test Power

Test power is another useful measure in hypothesis testing. It is a measure of the ability of a hypothesis test to detect an effect or a difference if it actually exists. It represents the probability of correctly rejecting the null hypothesis when it is false. The formula for test power is (Swinscow T.D.V 1997):

\[\textrm{Power}= 1-\beta,\]

where \(\beta\) is the probability of a Type II error.
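
Power can also be estimated by simulation. In the sketch below, the samples are drawn from distributions whose means really do differ (a difference of 0.5, chosen arbitrarily for illustration), so the proportion of trials in which the null hypothesis is rejected estimates \(1-\beta\).

```python
# Sketch: estimating test power by Monte Carlo simulation.
# The true mean difference (0.5), sample size, and trial count are
# illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha = 0.05
n_trials = 2000
n_per_group = 30

rejections = 0
for _ in range(n_trials):
    # The alternative hypothesis is true: the population means differ by 0.5.
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    b = rng.normal(loc=0.5, scale=1.0, size=n_per_group)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        rejections += 1

power = rejections / n_trials  # estimate of 1 - beta
print(f"Estimated power: {power:.3f}")
```

Rerunning the simulation with a larger sample size or a larger true difference would show the power increasing, which is one way such simulations are used when planning studies.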

9.5 Parametric and Non-Parametric Testing

Parametric tests and non-parametric tests are statistical tests used to compare groups of data.

They differ in that parametric tests generally assume the data are normally distributed with equal variances. There are a range of parametric tests, including t-tests, ANOVA, and regression analysis. Parametric tests are typically more powerful, but for the results to be reliable the assumptions must be valid.
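
As a sketch of a parametric test beyond the t-tests above, a one-way ANOVA comparing three groups can be run with scipy's f_oneway; the measurements below are illustrative only.

```python
# Sketch: a one-way ANOVA (parametric) comparing three groups with scipy.
# The data are illustrative, not from a real study.
from scipy import stats

group_1 = [20.1, 19.8, 20.5, 20.2, 19.9]
group_2 = [21.0, 20.8, 21.3, 21.1, 20.9]
group_3 = [22.2, 21.9, 22.4, 22.1, 22.0]

# Null hypothesis: all three group means are equal.
f_stat, p_value = stats.f_oneway(group_1, group_2, group_3)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here indicates that at least one group mean differs from the others, though the test does not say which.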

Non-parametric tests are often used when data fail to meet the assumptions of parametric tests, although non-parametric tests come with their own sets of assumptions. There are a range of non-parametric tests, including the Wilcoxon rank-sum test (also known as the Mann-Whitney U test) and the Kruskal-Wallis test. These tests are generally more flexible, but this comes at the expense of being less powerful than parametric tests.
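
The two non-parametric tests named above are available in scipy; the sketch below applies the Mann-Whitney U test to two groups and the Kruskal-Wallis test to three, using illustrative data.

```python
# Sketch: non-parametric tests with scipy. The data are illustrative only.
from scipy import stats

group_a = [3, 5, 4, 6, 7, 5, 4]
group_b = [8, 9, 7, 10, 9, 8, 11]

# Mann-Whitney U (Wilcoxon rank-sum) test for two independent groups.
u_stat, p_mw = stats.mannwhitneyu(group_a, group_b)
print(f"Mann-Whitney U: U = {u_stat}, p = {p_mw:.4f}")

# Kruskal-Wallis test for three or more independent groups.
group_c = [12, 14, 13, 15, 14, 13, 16]
h_stat, p_kw = stats.kruskal(group_a, group_b, group_c)
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.4f}")
```

Both tests operate on ranks rather than raw values, which is what frees them from the normality assumption of their parametric counterparts.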

References

Banerjee A., Chitnis U. B., Jadhav S. L., Bhawalkar J. S., Chaudhury S. 2009. Hypothesis testing, type I and type II errors. Ind Psychiatry J. 18(2): 127–31. https://doi.org/10.4103/0972-6748.62274.
Office for National Statistics. 2022. “Uncertainty and How We Measure It for Our Surveys.” https://www.ons.gov.uk/methodology/methodologytopicsandstatisticalconcepts/uncertaintyandhowwemeasureit.
Pereira S. M. C. and Leslie G. R. N. 2009. "Hypothesis Testing." Australian Critical Care 22 (4): 187–91. https://doi.org/10.1016/j.aucc.2009.08.003.
Swinscow T.D.V. 1997. Statistics at Square One: Differences between means: type I and type II errors and power. 9th ed. BMJ Publishing Group. https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/5-differences-between-means-type-i-an.
Tenny S. and Abdelgawad I. 2022. Statistical Significance. StatPearls [Internet] StatPearls Publishing, Treasure Island (FL). https://www.ncbi.nlm.nih.gov/books/NBK459346/.