Appendices
Excel Functions
Frequency
=FREQUENCY(start:end,bins)
The frequency() function will return a frequency table describing your data. It takes two arguments, the first being the array of values and the second being an array describing the upper boundary of the bins used.
Average
=AVERAGE(start:end)
The mean is calculated using the average() function. There are several other functions relating to means: geomean(), harmean() and trimmean(). Take care not to use these as they are quite different from calculating the mean that has been described here.
Median
=MEDIAN(start:end)
The median is calculated using the median() function.
Mode
=MODE.SNGL(start:end)
=MODE.MULT(start:end)
There are several functions for calculating the mode: mode(), mode.sngl() and mode.mult().
mode() was used in Excel 2007 and may still appear as an option in some versions of Excel.
mode.sngl() will return one mode and mode.mult() will return multiple modes (if there are multiple modes).
Neither mode() nor mode.sngl() will provide a warning if there are multiple modes so mode.mult() is usually the safest option.
Range
=MAX(start:end)-MIN(start:end)
There is no single function for calculating the range in Excel but the formula above will subtract the smallest value from the largest value in an array.
Square Root
=SQRT(number)
Takes the square root of a number.
Percentile
=PERCENTILE.EXC(array,k)
=PERCENTILE.INC(array,k)
These two percentile functions will return the k-th percentile of values in a range where k is in the range 0 to 1 (exclusive and inclusive respectively).
Standard Deviation
=STDEV.S(start:end)
=STDEV.P(start:end)
stdev.s() estimates standard deviation based on a sample. stdev.p() calculates standard deviation based on the entire population given as arguments.
Variance
=VAR.S(start:end)
=VAR.P(start:end)
var.s() estimates variance based on a sample. var.p() calculates variance based on the entire population given as arguments.
Pearson Correlation Coefficient
=PEARSON(array1, array2)
Calculating the correlation coefficient is not simple and the best way to calculate it is to use R, Python, Excel, SPSS or some equivalent software.
Chi-Square Distribution
=CHISQ.INV.RT(p,df)
Returns the critical values of the chi-square distribution where p is the probability and df is the number of degrees of freedom.
Glossary of Mathematical Symbols
A mathematical symbol is a figure used to represent a mathematical object, an action which is performed on mathematical object or a relationship between mathematical objects. The most basic symbols used are the decimal digits (0,1,2,3… etc) and lower-case letters which are generally used to denote variables (\(x,y,z\)). The Greek alphabet is also sometimes to used to denote variables (\(\alpha\),\(\beta\), \(\gamma\), \(\delta\)… etc) or constants although upper case Greek letters can also be used to denote operations which are performed on objects (\(\large\Sigma\)).
Arithmetic Operators
- \(+\) denotes addition.
- \(-\) denotes subtraction.
- \(/\) denotes division.
- \(\times\) denotes multiplication although sometimes multiplication is implied and the operator will be omitted for instance:
\[(x-y)(y+x)=xy+x^2-y^2-xy=x^2-y^2.\]
- \(\pm\) is the “plus or minus” sign and is usually used in reporting a result to show that the result lies within a range. For example, 176 \(\pm\) 4 denotes that the measured value is 176 but the true value could lie between 172 and 180.
Equality, equivalence, approximation and comparison
- \(=\) denotes two values are equal to one another.
- \(\approx\) denotes that two values are approximately equal to one another for instance \(\pi \approx 3.14.\)
- \(<\) means less than, for example \(x <\) 4 is a statement that the variable \(x\) is less than the value 4.
- \(>\) means greater than.
- \(\le\) means less than or equal to.
- \(\ge\) means greater than or equal to.
Linear algebra
- \(\large\Sigma\) denotes the sum of some number of terms. For instance:
\[\large\Sigma x,\]
should be read as an instruction to sum all the measured values of the variable denoted by \(x\).
\[\large\Sigma_{i=0}^{n} x_i^2,\]
is more specific and indicates that a sum should be taken between the 0\(^{th}\) and the n\(^{th}\) value of the variable \(x^2\).
For example, x might represent test scores at GCSE and in a class of 8 students those scores might be:
Index (i) | Test scores (x) |
---|---|
0 | 99.0 |
1 | 91.2 |
2 | 84.6 |
3 | 73.1 |
4 | 72.4 |
5 | 55.6 |
6 | 44.3 |
7 | 32.2 |
8 | 28.6 |
An instruction to sum the first three terms squared would be written as:
\[\Sigma_{i=0}^{3} x^2 = (99.0)^2+(91.2)^2+(84.6)^2 = 25,275.6.\]
- \(\bar{x}\) A variable with a line written above it denotes the mean of the variable. Using the data from the table above the mean test score (\(\bar{x}\)) is given by the sum of the test scores divided by the number of scores recorded (9) which is \(\bar{x}=64.6.\)