When it is possible to collect all the data for a population, the results (for example parameters such as the average (mean) or the dispersion of the data values) will accurately represent the situation. However, because the sampling frame from which the sample is taken is usually large, it is often impractical to measure all the data, so a sample must be obtained. Unfortunately, because we cannot measure all of the data, the values calculated from the sample probably will not exactly match those of the whole population. This gives rise to what are known as statistical, or sampling, errors.
The reader's attention is drawn to other topics on this Web-site e.g. Data Validation.
Two important points about sampling are that the sample must be (a) representative of the situation and (b) usually random, in order to avoid the effects of bias. Random sampling is the most usual method of obtaining a representative sample.
As already mentioned above, when taking a sample something within the sampling frame must be random in order to avoid the effects of bias. Either the situation must be random or the sampling must be on a random basis.
One of the most common methods, though not the simplest, is random sampling as used in lotteries. Random samples may be taken by several methods, including thoroughly mixing up the items in the sampling frame and then picking the required number of items at random (i.e. without consciously selecting). Another method is to number each item in the population and then use randomly generated numbers to obtain the random sample. Many items are already numbered, for example by serial numbers on equipment, passport numbers or National Insurance numbers. Random numbers may be found in textbooks and statistical tables or generated by computer programs.
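The two methods just described (mixing the items and picking, or numbering the items and drawing random numbers) can be sketched in Python as follows. This is only an illustration: the function names and the `EQ-` serial numbers are hypothetical, and the fixed seed is there simply to make the run repeatable.

```python
import random


def sample_by_mixing(items, sample_size, rng):
    """Mix up the items thoroughly, then take the first `sample_size` of them."""
    mixed = list(items)  # copy, so the caller's list is left untouched
    rng.shuffle(mixed)
    return mixed[:sample_size]


def sample_by_random_numbers(items, sample_size, rng):
    """Number the items 1..N, draw distinct random numbers, return the matching items."""
    numbers = rng.sample(range(1, len(items) + 1), sample_size)
    return [items[number - 1] for number in numbers]


rng = random.Random(2024)  # fixed seed (an assumption) so the example is repeatable
serial_numbers = [f"EQ-{n:04d}" for n in range(1, 501)]  # hypothetical serial numbers

mixed_sample = sample_by_mixing(serial_numbers, 5, rng)
numbered_sample = sample_by_random_numbers(serial_numbers, 5, rng)
print(mixed_sample)
print(numbered_sample)
```

Both functions give every item the same chance of selection and never pick the same item twice, which is what a lottery draw does.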
The following example is not necessarily how it is done in practice but is one method of sampling to illustrate the method in general terms.
Suppose an electricity supply organisation needs to assess the degree of corrosion of its main power lines in various areas of the country in order to find those areas which are prone to the worst corrosion and hence might need more attention than other areas. It is an impossibly time-consuming task to inspect every power line between every tower in every area and, indeed, not necessary. Sampling can provide a sufficiently "accurate" or reliable answer with a known degree of error. (For methods of assessing the degree of error please see the List of Topics below.)
Using a map of the grid system, the researcher could divide the territory into areas and the areas into smaller locations. Each power line could be divided into smaller lengths (possibly "between each tower") and each smaller length would be identified in some way (e.g. by numbering or coding).
In order to decide which of the thousands of lengths of cable are to be examined, first of all the sample size (i.e. how many lengths to be inspected) must be determined. It is the sample size that eventually determines the degree of error in the result, when this is applied to the whole network including those thousands of lengths which were not checked. Basically, the larger the sample size the smaller is the statistical error. These statistical errors are not to be confused with human error nor with measuring equipment error. Statistical errors are dealt with in other Topics in this series.
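As a rough illustration of why a larger sample gives a smaller statistical error, the sketch below uses the standard textbook formula for the standard error of a sample mean, which is the standard deviation divided by the square root of the sample size. It assumes simple random sampling from a large population; the standard deviation of 10.0 is a made-up figure, and this is not necessarily the method used in the later Topics.

```python
import math


def standard_error_of_mean(std_dev, sample_size):
    """Standard error of a sample mean under simple random sampling."""
    return std_dev / math.sqrt(sample_size)


# Hypothetical standard deviation of 10.0 units, three different sample sizes.
for n in (25, 100, 400):
    print(n, standard_error_of_mean(10.0, n))
```

Under these assumptions, quadrupling the sample size only halves the error, so accuracy becomes progressively more expensive to buy.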
When the sample size has been calculated (as dealt with in a later Topic), the next stage is to identify which of the lengths are to be inspected.
For this purpose it is necessary to generate random numbers, either from the tables available in many books on statistical methods or from computer spreadsheets (e.g. Lotus 1-2-3 or Excel). When the required number of random numbers has been obtained, these are used to identify the corresponding numbers on the grid map as the ones to be inspected.
Figure 1 illustrates a very simplified, abridged example of this method in diagrammatic form showing only 30 lengths of cable. These are numbered 1 to 30.
A sample size of eight is used in this instance. The random numbers, taken from a random number table, are 18, 28, 5, 13, 16, 9, 26 and 21. These are shown in square brackets on the "map" below. These numbered cables would be used as the sample:
1 | 2 | 3 | 4 | [5] | 6 | 7 | 8 | [9] | 10 |
---|---|---|---|---|---|---|---|---|---|
11 | 12 | [13] | 14 | 15 | [16] | 17 | [18] | 19 | 20 |
[21] | 22 | 23 | 24 | 25 | [26] | 27 | [28] | 29 | 30 |
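The procedure in this example (code each length of cable, fix the sample size, draw random numbers, then look up the matching lengths) can be sketched as follows. The area names, the numbers of lines and spans, and the coding scheme are all hypothetical, chosen only so that the frame holds 30 lengths as in the simplified figure.

```python
import random
from itertools import product

# Hypothetical grid: 2 areas x 3 power lines x 5 spans between towers = 30 lengths.
areas = ["North", "South"]
lines_per_area = 3
spans_per_line = 5

# Build the sampling frame: one code per length of cable.
frame = [
    f"{area}-L{line}-S{span}"
    for area, line, span in product(
        areas, range(1, lines_per_area + 1), range(1, spans_per_line + 1)
    )
]

sample_size = 8            # determined beforehand, as the text describes
rng = random.Random(7)     # fixed seed (an assumption) so the example is repeatable

# Draw distinct random numbers in 1..30 and look up the matching lengths.
chosen_numbers = sorted(rng.sample(range(1, len(frame) + 1), sample_size))
to_inspect = [frame[number - 1] for number in chosen_numbers]

for number, code in zip(chosen_numbers, to_inspect):
    print(number, code)
```

A computer-generated draw replaces the printed random number table here; the lookup step is the same either way.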
Systematic sampling (or constant skip method) is not random. Nevertheless, it can be used where the situation is random.
For example, suppose the objective of a large organization is to obtain a random selection from its 800 employees to sit as representatives on a management productivity group. Each employee has a staff identification number issued randomly by the Personnel Department. To collect a sample of 20 names, management could take, for example, every 40th name from the staff register (i.e. 800 divided by 20 equals 40, hence every 40th name).
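The constant skip calculation can be sketched as below. The staff identifiers are invented for illustration. The optional random starting point is a common refinement that the text above does not mention; passing `start=39` reproduces the "every 40th name" rule exactly.

```python
import random


def systematic_sample(register, sample_size, start=None, rng=random):
    """Take every k-th entry from the register, where k = len(register) // sample_size."""
    interval = len(register) // sample_size   # 800 // 20 = 40
    if start is None:
        # Assumption, not from the text: begin at a random point in the first interval.
        start = rng.randrange(interval)
    return register[start::interval][:sample_size]


staff_register = [f"ID{n:03d}" for n in range(1, 801)]  # hypothetical register of 800 staff

# start=39 is the 40th entry (indexing starts at 0), so this picks the 40th, 80th, ... 800th.
representatives = systematic_sample(staff_register, 20, start=39)
print(representatives)
```

Because the identification numbers were issued randomly, the fixed interval does not line up with any pattern among the employees, which is why the method is acceptable here.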
Stratified sampling is useful where the sampling frame has natural strata or divisions. For example, to ensure that all occupations in a company are proportionally represented, the occupations could be the strata and, within each stratum, random or systematic samples could be taken. So, using the example quoted for systematic sampling, if the employees consisted of 64 managers, 200 supervisors and 536 engineers (= 800 employees), then to obtain a representative proportion from each employee grade (or stratum) the proportions would be: managers, 64 out of 800 = 8%; supervisors, 200 out of 800 = 25%; and engineers, 536 out of 800 = 67%.
Therefore, 8% of the sample would be drawn from the managers' names, 25% from the supervisors' names and the rest, 67%, from the engineers' names. This ensures a representative proportion from each group.
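Applied to a sample of 20, these percentages give 1.6 managers, 5 supervisors and 13.4 engineers, so some rounding rule is needed. The sketch below uses a largest-remainder rule, which is one possible choice and is not taken from the text; the function name and the staff labels are likewise hypothetical.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Share the sample across strata in proportion to their sizes (largest-remainder rounding)."""
    total = sum(strata_sizes.values())
    exact = {name: sample_size * size / total for name, size in strata_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}  # round down first
    shortfall = sample_size - sum(allocation.values())
    # Give the leftover places to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name], reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1
    return allocation


strata_sizes = {"managers": 64, "supervisors": 200, "engineers": 536}
allocation = proportional_allocation(strata_sizes, 20)
print(allocation)

# Draw a simple random sample of the allocated size within each stratum.
rng = random.Random(1)  # fixed seed (an assumption) so the example is repeatable
staff = {
    name: [f"{name}-{i}" for i in range(1, size + 1)]
    for name, size in strata_sizes.items()
}
stratified_sample = {name: rng.sample(staff[name], allocation[name]) for name in staff}
print(stratified_sample)
```

With this rounding rule the 20 places fall as 2 managers, 5 supervisors and 13 engineers.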
The "mystery shoppers" method of sampling is used in market research to determine the quality of goods and services. With this method employees or specially engaged agencies acting as "customers" make notes on the service they receive in the environment being inspected.
This method can be used for testing the "ambience" of areas (e.g. "how pleasant" is the area). For example, some rail services use the method for inspecting their rolling stock and stations for litter, vandalism, malicious damage, graffiti and the general appearance of the environment and "feel" of their assets.