Managers-Net

Standard deviation (and variance)

Frequency distributions (see Related topics) show graphically, as a shape, how the values in a population of data are dispersed. To make use of a frequency distribution we need more information than its shape alone.

For example, one important parameter is where the centre of the distribution is located, known as central tendency, i.e. the averages, notably the arithmetic mean, median and mode amongst others.

We also need to know the dispersion of the data, that is, how far the values are spread out from the mean (e.g. whether they are all closely clustered around the mean or well scattered). The three most usual measures of dispersion are:

range: the distance between the smallest value and the largest value.
variance: the most sophisticated and useful measure, leading to:
standard deviation: which is the square root of the variance.

Uses of standard deviation

The standard deviation is essential for:

assessing the degree of dispersion of the values around their mean,
assessing the error to which the mean of a sample is subject when estimating the mean of the population from which the sample was taken (see the note at the end),
finding probabilities of events occurring in a given

Various forms of standard deviation

Standard deviation is usually denoted by the Greek symbol sigma (σ). The calculation of σ depends on the format of the data or variables, which can be divided into three categories:

Continuous variables, which are numerical values in units of length, mass, time, electrical measures etc. on a continuous scale,
Discrete variables, which are also numerical but can take only particular values, such as numbers of employees in companies or shoe sizes (7, 7½, 8, 8½ etc.),
Attribute variables, which are descriptive, like defective products, scratches or other damage on a surface, proportions of people voting or not voting, or activity sampling (e.g. "operator working" or "not working").

Calculating variance and standard deviation

Standard deviation for continuous variables and discrete variables

The variance is: (sum of the squared deviations of the values from their mean) divided by (sample size)

In symbolic form this is:

var = Σ(x - mean)² ÷ n, hence the standard deviation is:
σ = √[Σ(x - mean)² ÷ n] where n is the sample size

Example: Calculate the standard deviation for the following ten lengths:

Values: 12, 9, 3, 10, 12, 22, 7, 11, 15 and 19cm.

Mean = 120 ÷ 10 = 12

	1	2	3	4	5	6	7	8	9	10	sum
Values (cm)	12	9	3	10	12	22	7	11	15	19	120
deviations	0	-3	-9	-2	0	10	-5	-1	3	7	0
Dev.squared	0	9	81	4	0	100	25	1	9	49	278

Sum of the deviations squared = 278

so the variance = 278 ÷ 10 = 27.8 cm²

and standard deviation = √27.8 = 5.27 cm
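
The worked example can be checked with a short Python sketch. It follows the same steps as the table (mean, deviations, squared deviations) and divides by n as the formula in this article does. The variable names are illustrative only.

```python
import math

# The ten lengths from the worked example, in cm.
lengths_cm = [12, 9, 3, 10, 12, 22, 7, 11, 15, 19]

n = len(lengths_cm)
mean = sum(lengths_cm) / n                    # 120 / 10

# Deviation of each value from the mean; these always sum to zero.
deviations = [x - mean for x in lengths_cm]
squared = [d ** 2 for d in deviations]

value_range = max(lengths_cm) - min(lengths_cm)
variance = sum(squared) / n                   # divide by n, as in the text
std_dev = math.sqrt(variance)

print(f"mean = {mean} cm, range = {value_range} cm")
print(f"variance = {variance} cm^2, standard deviation = {std_dev:.2f} cm")
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev`, which also divide by n and should give the same figures for this data.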

Standard deviation for attributes data:
1. Binomial: σ = √[p(1-p) x n] where p is the observed proportion, n is the number of observations and σ is the absolute standard deviation (in numbers of observations)
  
  also σ = √[p(1-p) ÷ n] where p and n are as above and σ is the proportional standard deviation (the standard deviation of the proportion itself)
  
  Example: An activity sampling study shows that the subject was observed to be working on 36 out of a total of 50 random observations during the day. Estimate the probable proportion of the day the subject was actually working.
  
  Using the second, proportional, formula:
  
  p = 36 out of 50 = 0.72, (or 72%).
  
  So, σ = √[0.72*(1-0.72) ÷ 50] = √(0.2016 ÷ 50) = 0.063 or 6.3%
  
  Therefore, our estimate of the proportion of a day the subject was working
  
  = p ± standard deviation = 0.72 ± 0.063 or 72% ± 6.3%
  
  i.e. somewhere between 65.7% and 78.3%.
  
  Note on significance: as we have taken only one standard deviation in the calculation, this result is reliable only to about 68%. In other words we are only 68% confident that the result for a whole day actually IS between 65.7% and 78.3%. To be more accurate we need to take a larger sample; to be more confident in the result we need to take more standard deviations. Statistical tables tell us that for 95.4% confidence we must take 2 s.d. and for 99.7% confidence we must take 3 s.d.
  
  So in the above calculations the estimates become, respectively:
  
  95.4% confidence: estimated proportion = 0.72 ± (2 x 0.063) or 72% ± 12.6%
  
  99.7% confidence: estimated proportion = 0.72 ± (3 x 0.063) or 72% ± 18.9%
  
  It is clear that the more confident we wish to be that the result is reliable, the bigger the possible error. (What you gain on the swings you lose on the roundabouts.)
2. Where n is not known
  
  Example. A company calculates that the mean number of orders placed per week is 400, but obviously it cannot know the number of orders not placed.
  
  This is a case of the Poisson distribution, the standard deviation for which is simply: σ = √mean. So in this example, σ = √400 = 20 orders.
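
The attribute formulas above can be sketched in Python as follows. The function names are hypothetical, chosen for this illustration, and the figures are those of the activity sampling and orders examples.

```python
import math

def proportion_std_dev(p, n):
    """Proportional standard deviation: sqrt(p(1-p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def count_std_dev(p, n):
    """Absolute standard deviation (in observations): sqrt(p(1-p) * n)."""
    return math.sqrt(p * (1 - p) * n)

def poisson_std_dev(mean):
    """Standard deviation when n is not known (Poisson): sqrt(mean)."""
    return math.sqrt(mean)

# Activity sampling: working on 36 of 50 random observations.
p = 36 / 50
sigma = proportion_std_dev(p, 50)

# Estimates at 1, 2 and 3 standard deviations either side of p.
for k in (1, 2, 3):
    low, high = p - k * sigma, p + k * sigma
    print(f"{k} s.d.: {low:.1%} to {high:.1%}")

# Orders example: a mean of 400 orders per week.
print(f"Poisson s.d. = {poisson_std_dev(400)} orders")
```

Because the sketch keeps σ unrounded (about 0.0635), the 2 s.d. and 3 s.d. margins come out at roughly 12.7% and 19.0%, slightly wider than the 12.6% and 18.9% obtained above from the rounded 0.063.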

Extension of σ to other distributions

Each distribution (such as Beta, Gamma, exponential, Weibull among others) has its own particular standard deviation formula.

The standard deviation of all other types of data, such as continuous and discrete data, can be used similarly to assess errors on sample means. However, standard deviations must first be converted into standard errors - but that is another story!
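
As a brief pointer only, the conversion usually meant here is the standard error of the mean, σ ÷ √n. The sketch below assumes that formula and reuses the figures from the ten-lengths example. The function name is hypothetical, and many texts would use a sample standard deviation (dividing by n - 1) rather than the σ calculated in this article.

```python
import math

def standard_error(std_dev, n):
    """Standard error of the mean, assuming the usual formula std_dev / sqrt(n)."""
    return std_dev / math.sqrt(n)

# Ten lengths example: standard deviation 5.27 cm from a sample of 10.
se = standard_error(5.27, 10)
print(f"standard error of the mean = {se:.2f} cm")
```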

For more information, contact: Managers-Net.