Chapter 4 Data Types and Levels of Measurement


Understanding data is key to analysing its contents. The types of data we work with will inform our use of inferential and descriptive statistics.

4.1 Variables

A variable is an attribute that describes a person, place or thing. The value of the variable can vary from one observation to the next. For example, a person’s hair colour is a variable that can take values like “blond” or “brown”.

Variables can be classified as qualitative (categorical) or quantitative (numerical). Sometimes ranked data consisting of numbers (1st, 2nd,… 30th place) is included as a third category (Witte R. S. and Witte J. S. 2017).

To obtain data we have to observe or measure something; that something is a variable. For example, height, shoe size, weight and nationality are all variables because we can obtain observations or measurements for each of them (Campbell M. J. 2021):

Variables and Measurement
Variable    | Measurement or Observation     | Type
Height      | 170 cm, 173 cm, 182 cm, 179 cm | Quantitative
Shoe Size   | 6, 6.5, 7, 7.5                 | Quantitative
Weight      | 35 kg, 40 kg, 75 kg, 0.57 kg   | Quantitative
Nationality | Canadian, German, Spanish      | Qualitative

Variables can also be described as dependent or independent.

4.1.1 Independent Variables

When looking at variables it is common to consider whether there are relationships between them. An independent variable isn’t changed by other variables. It is the variable that a researcher might change or control in a scientific experiment to test the effects on the dependent variable (Witte R. S. and Witte J. S. 2017).

4.1.2 Dependent Variables

When a variable is believed to have been influenced by changes in an independent variable, it is called a dependent variable. This is the variable that is usually observed or measured by a researcher. The researcher does not change or manipulate it directly during the course of a study (Witte R. S. and Witte J. S. 2017).

Statistical data is often classified according to the number of variables which are being studied.

4.1.3 Univariate and Bivariate Data

Univariate data involves only one variable. An example of this might be conducting a survey to estimate the average height of primary school children.

Bivariate data involves two variables. An example of this would be a study to determine whether there is a relationship between the height and weight of primary school children.

Multivariate data involves more than two variables. This type of data is also sometimes referred to as multidimensional.
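As a rough illustration, the three cases can be pictured as records that keep one, two, or more than two variables per child. The values, the field names and the `select` helper below are invented for this sketch.

```python
# Hypothetical measurements for three primary school children (invented values).
children = [
    {"height_cm": 121.0, "weight_kg": 23.5, "age_years": 7},
    {"height_cm": 128.5, "weight_kg": 26.0, "age_years": 8},
    {"height_cm": 134.0, "weight_kg": 30.2, "age_years": 9},
]


def select(records, variables):
    """Keep only the named variables from each record."""
    return [tuple(record[v] for v in variables) for record in records]


univariate = select(children, ["height_cm"])                # one variable
bivariate = select(children, ["height_cm", "weight_kg"])    # two variables
multivariate = select(children, ["height_cm", "weight_kg", "age_years"])  # more than two

for name, data in [("univariate", univariate),
                   ("bivariate", bivariate),
                   ("multivariate", multivariate)]:
    print(name, "->", len(data[0]), "variable(s) per observation")
```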

4.2 Types of Data

Data can be broadly categorised as qualitative (data relating to qualities or characteristics) or quantitative (numerical data relating to sizes or quantities of things).

4.2.1 Qualitative and Quantitative

Qualitative data is data for which the value of the qualitative variable is a name or a label. The colour of hair (blonde, brown, red) or a location (Belfast, Bangor, Lisburn) are examples of qualitative (or categorical) variables that take values which are names or labels.

Quantitative data is data for which the value of the quantitative variable is a number. For example, the population of a country is the number of people in that country. It is a numerical attribute of the country. Population is a quantitative variable that takes numerical values.

We can further categorise quantitative data as being continuous or discrete.

4.2.2 Continuous or Discrete

Discrete data involves whole numbers that cannot be divided because of what the numbers represent (the number of students in a class, the number of cars owned, the number of fish in a lake). The number of students in a class cannot be 10.5 or 3.14.

Continuous data can be divided and measured to some number of decimal places (height, weight, speed). Height, for example, can take any value within the range of possible human heights and can be reported to any number of decimal places (150 cm, 150.1 cm or 150.12 cm), depending on how precise the measurement tool is.
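A small sketch of the distinction, assuming a count is stored as a whole number and a measurement as a decimal value. The variable names and the height value are invented for illustration.

```python
# Discrete: a count of students can only be a whole number.
students_in_class = 28

# Continuous: the same (hypothetical) height reported at different resolutions,
# as if measured with increasingly fine instruments.
height_cm = 150.1234
reported = [round(height_cm, decimals) for decimals in (0, 1, 2)]

print(students_in_class)
print(reported)
```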

Information

Accuracy and precision are terms used to refer to the quality of measurements. The accuracy of a measurement describes how close the measurement is to its true value. Precision describes the degree to which an instrument or process will repeat the same value or how close measurements of the same item or quality are to one another.
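One way to make the distinction concrete is to compare repeated readings against a known reference value. The sketch below assumes two hypothetical scales weighing a 500 g reference mass. It uses the offset of the mean from the reference as a stand-in for accuracy and the sample standard deviation as a stand-in for precision. All readings are invented.

```python
from statistics import mean, stdev

true_weight_g = 500.0  # assumed known reference value

# Hypothetical repeated readings from two scales.
scale_a = [499.8, 500.3, 499.9, 500.1, 499.9]  # centred on the true value, more scatter
scale_b = [510.1, 510.0, 509.9, 510.0, 510.0]  # tightly grouped but offset


def summarise(readings, truth):
    """Return (bias, spread): bias reflects accuracy, spread reflects precision."""
    bias = mean(readings) - truth
    spread = stdev(readings)
    return bias, spread


bias_a, spread_a = summarise(scale_a, true_weight_g)
bias_b, spread_b = summarise(scale_b, true_weight_g)

print(f"scale A: bias = {bias_a:+.2f} g, spread = {spread_a:.2f} g")
print(f"scale B: bias = {bias_b:+.2f} g, spread = {spread_b:.2f} g")
```

With these invented readings, scale A is the more accurate of the two (its mean sits on the reference value) while scale B is the more precise (its readings are closer to one another).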

There are also different levels of measurement.

4.3 Levels of Measurement

The levels of measurement describe how precisely variables are recorded. The different levels of measurement limit which statistics can be used to summarise data and which inferential statistics can be performed. These levels are:

  • Nominal

  • Ordinal

  • Interval

  • Ratio

4.3.1 Nominal

To be measured at the nominal level, a variable must be able to be divided up into separate or discrete categories, which are then named (McHugh M. L. and Villarruel A. M. 2003). The nominal level of measurement is the simplest form of a scale of measurement.

Nominal data is also called categorical data since the subjects are allocated to different categories. It can be categorised but not ranked, because the categories have no meaningful order. It is not possible, for instance, to form a meaningful hierarchy of gender, hair colour, eye colour or marital status.

It is fairly common to represent nominal data using bar charts. The bar chart below shows the marital status of people in Northern Ireland based on the Census 2011 data (Census 2011b).

Nominal data can be analysed by grouping variables into categories. For each category, the frequency or the relative frequency can be calculated. The data can be presented visually and usually is illustrated using bar charts or pie charts. The only measure of central tendency used with nominal data is the mode.
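The frequency, relative frequency and mode described above can be sketched with Python's standard library. The responses below are invented and are not census figures.

```python
from collections import Counter
from statistics import mode

# Hypothetical marital-status responses (invented, not census data).
responses = ["single", "married", "married", "divorced",
             "single", "married", "widowed", "married"]

frequencies = Counter(responses)
relative = {category: count / len(responses)
            for category, count in frequencies.items()}
most_common_category = mode(responses)

for category, count in frequencies.most_common():
    print(f"{category}: {count} ({relative[category]:.0%})")
print("mode:", most_common_category)
```

A mean or median is not computed here because the categories have no order or magnitude to average.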

Inferential statistics can also be used with nominal data. Chi-square tests, for example, are non-parametric tests for categorical variables.
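As a minimal sketch of a chi-square test of independence, the statistic can be computed by hand from a contingency table by comparing observed counts with the counts expected if the two variables were unrelated. The table below is invented, and no continuity correction is applied.

```python
# Hypothetical 2x2 contingency table (invented counts):
# rows = two groups, columns = two response categories.
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if rows and columns were independent.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print("chi-square:", chi_square)
print("degrees of freedom:", degrees_of_freedom)
```

For this table every expected count is 25, so the statistic is 4.0 on 1 degree of freedom. That exceeds the usual 5% critical value of 3.841, which would suggest the two variables are associated in this invented sample.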

4.3.2 Ordinal

The ordinal level of measurement is the second level of measurement. Ordinal data is another type of qualitative data that groups variables into descriptive categories. The categories used for ordinal data are ordered in some kind of hierarchical scale although the distance between those categories may be uneven or even unknown. For example, measuring economic status using a social class hierarchy involves the use of categories with no clearly identifiable or evenly spaced interval between them.

The pie chart below shows the highest level of qualification of usual residents in households in Northern Ireland aged 16-64 in 2011 (Census 2011a). Ordinal data is often illustrated through pie charts or bar charts.

Ordinal variables often include ratings about opinions that can be categorised (strongly agree, agree, don’t know, disagree, strongly disagree).

The descriptive statistics which can be used with ordinal data are the mode and the median. Ordinal data can also be described with a measure of dispersion, namely, range.
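A sketch of these statistics for opinion ratings, assuming the categories are ordered as listed above, with "don't know" treated as the midpoint. The responses are invented. `median_low` is used so that the median is always one of the observed categories.

```python
from statistics import median_low, mode

# Ordered categories, from lowest to highest agreement (assumed ordering).
scale = ["strongly disagree", "disagree", "don't know", "agree", "strongly agree"]
rank = {label: position for position, label in enumerate(scale)}

# Hypothetical survey responses.
responses = ["agree", "strongly agree", "disagree", "agree",
             "don't know", "agree", "strongly disagree"]

ranks = sorted(rank[r] for r in responses)

median_label = scale[median_low(ranks)]           # middle response by rank
modal_label = mode(responses)                     # most frequent response
range_labels = (scale[ranks[0]], scale[ranks[-1]])  # lowest and highest response

print("median:", median_label)
print("mode:", modal_label)
print("range:", range_labels)
```

The ranks are used only for ordering. No mean is taken, since the distance between neighbouring categories is not assumed to be equal.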

There are a number of possible statistical tests that can be used with ordinal data. Which one is used depends on the aims of the researcher and the number and type of samples.

Non-parametric tests
Non-parametric test | Aim | Samples or variables
Mood’s median test | Compares medians | 2 or more samples
Mann-Whitney U test (also referred to as the Wilcoxon rank sum test) | Compares sums of rankings of scores | 2 independent samples
Wilcoxon matched-pairs signed-rank test | Compares the magnitude and direction of the difference between distributions of scores | 2 dependent samples
Kruskal-Wallis H test (also referred to as the one-way ANOVA on ranks) | Compares the mean rankings of scores | 3 or more samples
Spearman’s rho or Kendall’s Tau | Measures correlation between 2 variables | 2 ordinal variables
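To give a feel for how one of these rank-based measures works, here is a minimal sketch of Spearman's rho using the shortcut formula rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the difference between paired ranks. The shortcut is only valid when there are no tied values, so the sketch rejects ties rather than handling them. The function names and the judges' rankings are invented.

```python
def ranks_of(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho via the no-ties shortcut formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if len(set(x)) != len(x) or len(set(y)) != len(y):
        raise ValueError("this sketch does not handle tied values")
    n = len(x)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(ranks_of(x), ranks_of(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five essays by two judges.
judge_1 = [1, 2, 3, 4, 5]
judge_2 = [2, 1, 4, 3, 5]

rho = spearman_rho(judge_1, judge_2)
print(f"Spearman's rho: {rho:.2f}")
```

Here the squared rank differences sum to 4, giving a rho of about 0.8, so the two judges' rankings agree strongly in this invented example.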

4.3.3 Interval

The next level of measurement is interval measurement. Interval measures have categories and magnitude, like the levels before them, and add the concept that the intervals between adjacent values are exactly equal (McHugh M. L. and Villarruel A. M. 2003).

Interval data is a type of quantitative data that groups variables into categories. Values can be ordered and separated using an equal measure of distance.

An example of interval level data is temperature recorded in Celsius or Fahrenheit. The values on either scale are ordered and separated by an equal distance (the notches on a thermometer are always equally spaced). The distance between 0°C and 1°C, for example, is the same as the distance between 2°C and 3°C.

The line chart below shows some simulated temperature data for Belfast. Interval data is often visualised using line charts.

Mathematical operations can be carried out on this type of data, for instance, subtracting one value from another to find the difference.

Interval data lacks a true zero. True zero indicates a lack of whatever is being measured. The Celsius scale doesn’t qualify as having a true zero since the zero point in a thermometer is arbitrary.

Information

When Anders Celsius first created the Celsius scale, 0°C was chosen to match the boiling point of water and 100°C its freezing point. A different liquid with a different boiling point and freezing point could just as easily have been used, which is what makes the zero point arbitrary. The scale was later reversed. At 0°C there is still heat present, maybe not a great deal of it, but it is still measurable, meaning 0°C is not a true zero. The thermodynamic Kelvin scale has a true zero: the point at which particles have the least possible thermal motion and nothing can become any colder.
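The practical consequence of lacking a true zero can be sketched numerically: differences between Celsius values are meaningful, but ratios are not. The example below converts two invented temperatures to Kelvin (by adding 273.15) and compares the results.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15


cool_c, warm_c = 10.0, 20.0  # hypothetical temperatures in °C

# Differences agree on both scales, because the intervals are equal.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios disagree, because only the Kelvin scale has a true zero.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.1f} °C vs {difference_k:.1f} K")
print(f"ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 would suggest that 20°C is "twice as hot" as 10°C, but on the Kelvin scale the ratio is only about 1.035, so the statement has no physical meaning.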

A range of descriptive statistics can be used to describe interval data. The measures of central tendency applicable to interval data are the mode, median and the mean. The measures of dispersion applicable to interval data are the range, standard deviation and the variance.
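These measures can be sketched with Python's `statistics` module. The temperatures below are invented, and the variance and standard deviation are the sample versions (dividing by n - 1).

```python
from statistics import mean, median, mode, stdev, variance

# Hypothetical daily maximum temperatures in °C (invented values).
temperatures_c = [12.0, 14.0, 14.0, 15.0, 17.0, 18.0, 22.0]

summary = {
    "mode": mode(temperatures_c),
    "median": median(temperatures_c),
    "mean": mean(temperatures_c),
    "range": max(temperatures_c) - min(temperatures_c),
    "variance": variance(temperatures_c),  # sample variance (n - 1 denominator)
    "std_dev": stdev(temperatures_c),      # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```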

A number of parametric tests can be applied to interval data.

Parametric Tests
Parametric test | Aim | Samples or variables
T-test | Compares means of two samples | 2 samples
Analysis of Variance (ANOVA) | Compares means of several samples | 3 or more samples
Pearson’s r | Measures correlation between two variables | 2 variables
Simple linear regression | Estimates the relationship between a dependent variable and an independent variable using a straight line | 2 variables
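As a hedged sketch of the last two rows, Pearson's r and a least-squares straight line can both be computed from the same sums of squares and cross-products. The function name and the height and weight values below are invented. A real analysis would also check the assumptions behind these methods and would normally use a statistics package.

```python
from math import sqrt


def pearson_and_line(x, y):
    """Return (r, slope, intercept) for paired observations x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)           # Pearson's correlation coefficient
    slope = sxy / sxx                   # least-squares slope
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical heights (cm, independent) and weights (kg, dependent).
heights = [120.0, 125.0, 130.0, 135.0, 140.0]
weights = [22.0, 25.0, 27.0, 30.0, 31.0]

r, slope, intercept = pearson_and_line(heights, weights)
print(f"r = {r:.3f}")
print(f"weight = {slope:.2f} * height + {intercept:.1f}")
```

For these invented values the fitted slope is 0.46 kg per cm and r is close to 1, reflecting a strong positive linear relationship.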

4.3.4 Ratio

Ratio data measures variables on a continuous scale and has a true zero.

Ratio data is a form of quantitative data. It measures variables on a continuous scale with an equal distance between adjacent values (weight, height). Ratio data has a true zero unlike interval data. Ratio data is the most complex of the four data types.

Ratio data can be analysed with descriptive statistics including the mode, median and mean. Range, standard deviation, variance and the coefficient of variation can all be used to describe the dispersion of ratio data.
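The coefficient of variation is the standard deviation divided by the mean, which is only meaningful when the scale has a true zero. A minimal sketch, using invented weights and the sample standard deviation:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (mean must be non-zero)."""
    return stdev(values) / mean(values)


# Hypothetical adult weights, recorded in kilograms and again in grams.
weights_kg = [60.0, 65.0, 70.0, 75.0, 80.0]
weights_g = [w * 1000 for w in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

print(f"CV in kg: {cv_kg:.4f}")
print(f"CV in g:  {cv_g:.4f}")
```

Because both the standard deviation and the mean scale with the unit, the result is the same in kilograms and in grams, which is what makes the coefficient of variation useful for comparing spread across differently scaled ratio data.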

The tests used with interval data can be used with ratio data as well.

Summary

Data can be broadly categorised as qualitative (data relating to qualities or characteristics) or quantitative (numerical data relating to sizes or quantities of things).

Qualitative data deals with names or labels.

Quantitative data is numerical.

Discrete data involves whole numbers that cannot be divided because of what the numbers represent (the number of people in a class, the number of cars owned, the number of fish in a lake).

Continuous data can be divided and measured to some number of decimal places (height, weight, speed).

There are four levels of measurement:

  • Nominal - Used to label variables. Frequencies can be calculated, as can the mode.

  • Ordinal - Groups variables into descriptive categories with some sort of hierarchy. The mode and the median can be calculated, as can the range.

  • Interval - Groups variables into categories where the values are ordered and separated by equal distances. The mode, median and mean can all be calculated as can the range, standard deviation and variance.

  • Ratio - Measures variables on a continuous scale and has a true zero. The mode, median and mean can all be calculated as can the range, standard deviation, variance and the coefficient of variation.

References

Campbell M. J. 2021. Statistics at Square One. John Wiley & Sons. https://www.wiley.com/en-ie/Statistics+at+Square+One,+12th+Edition-p-9781119401308.
Census. 2011a. “Highest Level of Qualification of Usual Residents in households aged 16 to 64 (Northern Ireland).” https://www.nisra.gov.uk/statistics/census/2011-census.
———. 2011b. “Marital and Civil Partnership Status (Northern Ireland).” https://www.nisra.gov.uk/statistics/census/2011-census.
McHugh M. L. and Villarruel A. M. 2003. “Descriptive Statistics, Part I: Level of Measurement.” Journal for Specialists in Pediatric Nursing 8 (1): 35–37. https://doi.org/10.1111/j.1744-6155.2003.tb00182.x.
Witte R. S. and Witte J. S. 2017. Statistics. John Wiley & Sons. https://www.wiley.com/en-us/Statistics,+11th+Edition-p-9781119254515.