
How to Know How Many Views You Got on a Deviation


The median is known as a measure of location; that is, it tells us where the data are. As stated earlier, we do not need to know all the exact values to calculate the median; if we made the smallest value even smaller or the largest value even larger, it would not alter the value of the median. Thus the median does not use all the information in the data and so it can be shown to be less efficient than the mean or average, which does use all values of the data. To calculate the mean we add up the observed values and divide by the number of them. The total of the values obtained in Table 1.1 was 22.5, which was divided by their number, 15, to give a mean of 1.5. This familiar process is conveniently expressed by the following symbols:

$$\bar{x} = \frac{\sum x}{n}$$

$\bar{x}$ (pronounced "x bar") signifies the mean; $x$ is each of the values of urinary lead; $n$ is the number of these values; and $\sum$, the Greek capital sigma (our "S"), denotes "sum of". A major disadvantage of the mean is that it is sensitive to outlying points. For example, replacing 2.2 by 22 in Table 1.1 increases the mean to 2.82, whereas the median will be unchanged.
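The sensitivity of the mean to an outlying point can be checked with a short Python sketch. Table 1.1 is not reproduced in this text, so the 15 readings below are an assumption: they are chosen to agree with the quoted total (22.5) and mean (1.5) and should not be read as the published table. The helper `with_outlier` is a hypothetical name introduced only for this illustration.

```python
from statistics import mean, median

# ASSUMPTION: Table 1.1 is not shown in this text. These 15 readings are
# illustrative values chosen to match the quoted total (22.5) and mean (1.5).
readings = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3,
            1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]


def with_outlier(values, old, new):
    """Return a copy of values with the first occurrence of old replaced by new."""
    copy = list(values)
    copy[copy.index(old)] = new
    return copy


# Replace 2.2 by 22, as in the example in the text.
altered = with_outlier(readings, 2.2, 22)

print("original: mean", round(mean(readings), 2), "median", median(readings))
print("altered:  mean", round(mean(altered), 2), "median", median(altered))
```

Under these assumed readings the mean moves from 1.5 to 2.82 while the median stays where it was, because the median depends only on the ordering of the middle values.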

As well as measures of location we need measures of how variable the data are. We met two of these measures, the range and interquartile range, in Chapter 1.

The range is an important measurement, for figures at the top and bottom of it denote the findings furthest removed from the generality. However, they do not give much indication of the spread of observations about the mean. This is where the standard deviation (SD) comes in.

The theoretical basis of the standard deviation is complex and need not trouble the ordinary user. We will discuss sampling and populations in Chapter 3. A practical point to note here is that, when the population from which the data arise has a distribution that is approximately "Normal" (or Gaussian), then the standard deviation provides a useful basis for interpreting the data in terms of probability.

The Normal distribution is represented by a family of curves defined uniquely by two parameters, which are the mean and the standard deviation of the population. The curves are always symmetrically bell shaped, but the extent to which the bell is compressed or flattened out depends on the standard deviation of the population. However, the mere fact that a curve is bell shaped does not mean that it represents a Normal distribution, because other distributions may have a similar sort of shape.

Many biological characteristics conform to a Normal distribution closely enough for it to be commonly used, for example, heights of adult men and women, blood pressures in a healthy population, random errors in many types of laboratory measurements and biochemical data. Figure 2.1 shows a Normal curve calculated from the diastolic blood pressures of 500 men, mean 82 mmHg, standard deviation 10 mmHg. The ranges representing ±1 SD, ±2 SD, and ±3 SD about the mean are marked. A more extensive set of values is given in Table A of the print edition.

Figure 2.1


The reason why the standard deviation is such a useful measure of the scatter of the observations is this: if the observations follow a Normal distribution, a range covered by one standard deviation above the mean and one standard deviation below it ($\bar{x} \pm 1\,\mathrm{SD}$) includes about 68% of the observations; a range of two standard deviations above and two below ($\bar{x} \pm 2\,\mathrm{SD}$) about 95% of the observations; and of three standard deviations above and three below ($\bar{x} \pm 3\,\mathrm{SD}$) about 99.7% of the observations. Consequently, if we know the mean and standard deviation of a set of observations, we can obtain some useful information by simple arithmetic. By putting one, two, or three standard deviations above and below the mean we can estimate the ranges that would be expected to include about 68%, 95%, and 99.7% of the observations.
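As a rough illustration of this arithmetic, the following Python sketch computes the three ranges for the blood pressure example (mean 82 mmHg, SD 10 mmHg) and checks the quoted percentages against the standard Normal distribution. The function names `sd_ranges` and `normal_coverage` are hypothetical, introduced only for this example; `NormalDist` is part of the Python standard library (version 3.8 and later).

```python
from statistics import NormalDist


def sd_ranges(mean, sd, multiples=(1, 2, 3)):
    """Return {k: (mean - k*sd, mean + k*sd)} for each multiple k."""
    return {k: (mean - k * sd, mean + k * sd) for k in multiples}


def normal_coverage(k):
    """Proportion of a Normal distribution lying within k SDs of its mean."""
    standard = NormalDist()  # mean 0, SD 1
    return standard.cdf(k) - standard.cdf(-k)


# Diastolic blood pressure example from the text: mean 82 mmHg, SD 10 mmHg.
for k, (low, high) in sd_ranges(82, 10).items():
    share = 100 * normal_coverage(k)
    print(f"mean ± {k} SD: {low} to {high} mmHg, about {share:.1f}% of observations")
```

The percentages apply only to the extent that the data really are approximately Normal; for skewed data the same ranges can cover quite different proportions.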

Standard deviation from ungrouped data

The standard deviation is a summary measure of the differences of each observation from the mean. If the differences themselves were added up, the positive would exactly balance the negative and so their sum would be zero. Consequently the squares of the differences are added. The sum of the squares is then divided by the number of observations minus one to give the mean of the squares, and the square root is taken to bring the measurements back to the units we started with. (The division by the number of observations minus one instead of the number of observations itself to obtain the mean square is because "degrees of freedom" must be used. In these circumstances they are one less than the total. The theoretical justification for this need not trouble the user in practice.)

To gain an intuitive feel for degrees of freedom, consider choosing a chocolate from a box of n chocolates. Every time we come to choose a chocolate we have a choice, until we come to the last one (usually one with a nut in it!), and then we have no choice. Thus we have n-1 choices, or "degrees of freedom".

The calculation of the variance is illustrated in Table 2.1 with the 15 readings in the preliminary study of urinary lead concentrations (Table 1.2). The readings are set out in column (1). In column (2) the difference between each reading and the mean is recorded. The sum of the differences is 0. In column (3) the differences are squared, and the sum of those squares is given at the bottom of the column.

Table 2.1

The sum of the squares of the differences (or deviations) from the mean, 9.96, is now divided by the total number of observations minus one, to give the variance. Thus,

$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$

In this case we find:

$$\text{variance} = \frac{9.96}{14} = 0.7114$$

Finally, the square root of the variance provides the standard deviation:

$$\mathrm{SD} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

from which we get

$$\mathrm{SD} = \sqrt{0.7114} = 0.843$$

This procedure illustrates the structure of the standard deviation, in particular that the two extreme values 0.1 and 3.2 contribute most to the sum of the differences squared.
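The column-by-column calculation can be sketched in Python as follows. Table 2.1 is not reproduced in this text, so the 15 readings are an assumption chosen to agree with the quoted figures (mean 1.5, sum of squared deviations 9.96); the function name `deviation_table` is hypothetical.

```python
from math import sqrt

# ASSUMPTION: Table 2.1 is not shown in this text. These readings are
# illustrative values consistent with the quoted mean (1.5) and the quoted
# sum of squared deviations (9.96).
readings = [0.6, 2.6, 0.1, 1.1, 0.4, 2.0, 0.8, 1.3,
            1.2, 1.5, 3.2, 1.7, 1.9, 1.9, 2.2]


def deviation_table(values):
    """Return (sum of squared deviations, sample variance, sample SD)."""
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]      # column (2)
    squared = [d * d for d in deviations]        # column (3)
    sum_sq = sum(squared)
    variance = sum_sq / (n - 1)                  # divisor n - 1 (degrees of freedom)
    return sum_sq, variance, sqrt(variance)


sum_sq, variance, sd = deviation_table(readings)
print(round(sum_sq, 2), round(variance, 4), round(sd, 3))

# Share of the sum of squares contributed by the two extreme readings.
mean = sum(readings) / len(readings)
extreme_share = ((0.1 - mean) ** 2 + (3.2 - mean) ** 2) / sum_sq
print(f"0.1 and 3.2 contribute {extreme_share:.0%} of the sum of squares")
```

With these assumed readings the two extreme values account for roughly half of the total, which is the point made in the text.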

Calculator procedure

Most inexpensive calculators have procedures that enable one to calculate the mean and standard deviations directly, using the "SD" mode. For example, on modern Casio calculators one presses SHIFT and '.' and a little "SD" symbol should appear on the display. On earlier Casios one presses INV and MODE, whereas on a Sharp 2nd F and Stat should be used. The data are stored via the M+ button. Thus, having set the calculator into the "SD" or "Stat" mode, from Table 2.1 we enter 0.1 M+, 0.4 M+, etc. When all the data are entered, we can check that the correct number of observations have been included by Shift and n, and "15" should be displayed. The mean is displayed by Shift and $\bar{x}$ and the standard deviation by Shift and $\sigma_{n-1}$. Avoid pressing Shift and AC between these operations as this clears the statistical memory.

There is another button, $\sigma_n$, on many calculators. This uses the divisor $n$ rather than $n - 1$ in the calculation of the standard deviation. On a Sharp calculator $\sigma_n$ is denoted $\sigma$, whereas $\sigma_{n-1}$ is denoted $s$. The $\sigma_n$ values are the "population" values, and are derived assuming that an entire population is available or that interest focuses solely on the data in hand, and the results are not going to be generalised (see Chapter 3 for details of samples and populations). As this situation very rarely arises, $\sigma_{n-1}$ should be used and $\sigma_n$ ignored, although even for moderate sample sizes the difference is going to be small.

Remember to return to normal mode before resuming calculations because many of the usual functions are not available in "Stat" mode. On a modern Casio this is Shift 0. On earlier Casios and on Sharps one repeats the sequence that calls up the "Stat" mode. Some calculators stay in "Stat" mode even when switched off.

Mullee (1) provides advice on choosing and using a calculator. The calculator formulas use the relationship

$$\sigma_n^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{\sum x^2}{n} - \bar{x}^2$$

The right hand expression can be easily memorised by the expression "mean of the squares minus the mean square". The sample variance $s^2$ is obtained from

$$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$$

The above equation can be seen to be true in Table 2.1, where the sum of the squares of the observations, $\sum x^2$, is given as 43.71.

We thus obtain

$$\sum x^2 - n\bar{x}^2 = 43.71 - 15 \times 1.5^2 = 9.96,$$

the same value given for the total in column (3). Care should be taken because this formula involves subtracting two large numbers to get a small one, and can lead to incorrect results if the numbers are very large. For example, try finding the standard deviation of 100001, 100002, 100003 on a calculator. The correct answer is 1, but many calculators will give 0 because of rounding error. The solution is to subtract a large number from each of the observations (say 100000) and calculate the standard deviation on the remainders, namely 1, 2 and 3.
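The rounding problem can be reproduced in Python with a minimal sketch. Ordinary Python floats carry enough digits to get this particular example right, so the sketch imitates a calculator by rounding every intermediate result to 8 significant digits with the standard `decimal` module. The 8-digit limit and the function name `shortcut_sd` are assumptions made for illustration; real calculators differ in how many digits they keep.

```python
from decimal import Context, Decimal, localcontext
from statistics import stdev


def shortcut_sd(values, digits=8):
    """Sample SD from (sum of x^2 - n * mean^2) / (n - 1).

    Every intermediate result is rounded to `digits` significant digits,
    a rough stand-in (an assumption) for a calculator's limited precision.
    """
    with localcontext(Context(prec=digits)):
        xs = [Decimal(v) for v in values]
        n = Decimal(len(xs))
        sum_x = sum(xs, Decimal(0))
        sum_x2 = sum((x * x for x in xs), Decimal(0))
        mean = sum_x / n
        # Two large, nearly equal numbers are subtracted here, so the
        # digits that carry the answer may already have been rounded away.
        sum_sq = max(sum_x2 - n * mean * mean, Decimal(0))
        return float((sum_sq / (n - 1)).sqrt())


large = [100001, 100002, 100003]
shifted = [x - 100000 for x in large]       # subtract a large number first

print(shortcut_sd(large))     # limited precision loses the answer
print(shortcut_sd(shifted))   # same formula on the remainders 1, 2, 3
print(stdev(large))           # library routine for comparison

# Check of the identity quoted for Table 2.1, using only figures from the text.
print(round(43.71 - 15 * 1.5 ** 2, 2))
```

Subtracting a constant from every observation leaves the standard deviation unchanged, which is why working on the remainders is a safe remedy.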

Standard deviation from grouped data

We can also calculate a standard deviation for discrete quantitative variables. For example, in addition to studying the lead concentration in the urine of 140 children, the paediatrician asked how often each of them had been examined by a doctor during the year. After collecting the information he tabulated the data shown in Table 2.2 columns (1) and (2). The mean is calculated by multiplying column (1) by column (2), adding the products, and dividing by the total number of observations.

Table 2.2

As we did for continuous data, to calculate the standard deviation we square each of the observations in turn. In this case the observation is the number of visits, but because we have several children in each class, shown in column (2), each squared number (column (4)) must be multiplied by the number of children. The sum of squares is given at the foot of column (5), namely 1697. We then use the calculator formula to find the variance:

$$s^2 = \frac{\sum f x^2 - n\bar{x}^2}{n - 1}, \qquad \sum f x^2 = 1697, \quad n = 140,$$

and

$$\mathrm{SD} = \sqrt{s^2},$$

where $x$ is the number of visits and $f$ is the number of children in each class. Note that although the number of visits is not Normally distributed, the distribution is reasonably symmetrical about the mean. The approximate 95% range is given by

$$\bar{x} \pm 2\,\mathrm{SD}$$

This excludes two children with no visits and six children with six or more visits. Thus there are 8 of 140 = 5.7% outside the theoretical 95% range.

Note that it is common for discrete quantitative variables to have what are known as skewed distributions, that is, they are not symmetrical. One clue to lack of symmetry from derived statistics is when the mean and the median differ considerably. Another is when the standard deviation is of the same order of magnitude as the mean, but the observations must be non-negative. Sometimes a transformation will convert a skewed distribution into a symmetrical one. When the data are counts, such as number of visits to a doctor, often the square root transformation will help, and if there are no zero or negative values a logarithmic transformation will render the distribution more symmetrical.
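The grouped calculation can be sketched in Python as below. Table 2.2 is not reproduced in this text, so the frequencies are an assumption: they are chosen only to agree with the two totals that the text does quote (140 children and a sum of squares of 1697), and the counts in individual classes may not match the published table. The function name `grouped_mean_sd` is hypothetical.

```python
from math import sqrt

# ASSUMPTION: Table 2.2 is not shown in this text. These frequencies are
# illustrative, chosen to match the quoted totals (n = 140, sum f*x^2 = 1697).
visits = [0, 1, 2, 3, 4, 5, 6, 7]            # column (1): visits in the year
children = [2, 8, 27, 45, 38, 15, 4, 1]      # column (2): number of children


def grouped_mean_sd(values, freqs):
    """Return (n, sum f*x^2, mean, sample SD) for a frequency table."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    mean = sum_fx / n
    variance = (sum_fx2 - n * mean ** 2) / (n - 1)
    return n, sum_fx2, mean, sqrt(variance)


n, sum_fx2, mean, sd = grouped_mean_sd(visits, children)
print(n, sum_fx2, round(mean, 2), round(sd, 2))
print("approximate 95% range:", round(mean - 2 * sd, 2), "to", round(mean + 2 * sd, 2))
```

Multiplying each squared value by its frequency gives the same result as listing every child separately, but with far less arithmetic.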

Data transformation

An anaesthetist measures the pain of a procedure using a 100 mm visual analogue scale on seven patients. The results are given in Table 2.3, together with the $\log_e$ transformation (the ln button on a calculator).

Table 2.3

The data are plotted in Figure 2.2, which shows that the outlier does not appear so extreme in the logged data. The mean and median are 10.29 and 3, respectively, for the original data, with a standard deviation of 20.22. Where the mean is bigger than the median, the distribution is positively skewed. For the logged data the mean and median are 1.24 and 1.10 respectively, indicating that the logged data have a more symmetrical distribution. Thus it would be better to analyse the log transformed data in statistical tests than to use the original scale.

Figure 2.2

In reporting these results, the median of the raw data would be given, but it should be explained that the statistical test was carried out on the transformed data. Note that the median of the logged data is the same as the log of the median of the raw data; however, this is not true for the mean. The mean of the logged data is not necessarily equal to the log of the mean of the raw data. The antilog (exp or $e^x$ on a calculator) of the mean of the logged data is known as the geometric mean, and is often a better summary statistic than the mean for data from positively skewed distributions. For these data the geometric mean is 3.45 mm.
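A short Python sketch makes the difference between the two back-transformations concrete. Table 2.3 is not reproduced in this text, so the seven pain scores are an assumption chosen to agree with the quoted summary figures (raw mean 10.29, logged mean 1.24, logged median 1.10); `geometric_mean` is written out by hand here to show the definition.

```python
from math import exp, log
from statistics import mean, median, stdev

# ASSUMPTION: Table 2.3 is not shown in this text. These seven scores (mm)
# are illustrative values consistent with the quoted summary statistics.
scores = [1, 1, 2, 3, 3, 6, 56]
logged = [log(x) for x in scores]            # natural log, the "ln" button


def geometric_mean(values):
    """Antilog of the mean of the natural logs; all values must be > 0."""
    return exp(mean([log(x) for x in values]))


print("raw:    mean", round(mean(scores), 2), "SD", round(stdev(scores), 2))
print("logged: mean", round(mean(logged), 2), "median", round(median(logged), 2))

# The median commutes with the log transformation (for an odd number of values) ...
print(median(logged) == log(median(scores)))
# ... but the mean does not.
print(round(log(mean(scores)), 2), "vs", round(mean(logged), 2))

print("geometric mean", round(geometric_mean(scores), 2))
```

With these assumed scores the unrounded geometric mean comes out at about 3.47 mm; the 3.45 mm quoted in the text is the antilog of the rounded value 1.24.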

Between subjects and within subjects standard deviation

If repeated measurements are made of, say, blood pressure on an individual, these measurements are likely to vary. This is within subject, or intrasubject, variability and we can calculate a standard deviation of these observations. If the observations are close together in time, this standard deviation is often described as the measurement error.

Measurements made on different subjects vary according to between subject, or intersubject, variability. If many observations were made on each individual, and the average taken, then we can assume that the intrasubject variability has been averaged out and the variation in the average values is due solely to the intersubject variability. Single observations on individuals clearly contain a mixture of intersubject and intrasubject variation.

The coefficient of variation (CV%) is the intrasubject standard deviation divided by the mean, expressed as a percentage. It is often quoted as a measure of repeatability for biochemical assays, when an assay is carried out on several occasions on the same sample. It has the advantage of being independent of the units of measurement, but also numerous theoretical disadvantages. It is usually nonsensical to use the coefficient of variation as a measure of between subject variability.
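The coefficient of variation can be sketched in Python as follows. The assay results are hypothetical values invented for illustration, as is the function name `coefficient_of_variation`; the second list simply re-expresses the same results in different units to show that the CV% does not change.

```python
from statistics import mean, stdev


def coefficient_of_variation(repeats):
    """CV% = within subject SD divided by the mean, as a percentage."""
    return stdev(repeats) / mean(repeats) * 100


# HYPOTHETICAL repeated assay results on one sample (illustrative only).
assay_mmol = [5.0, 5.2, 4.9, 5.1, 4.8]
assay_umol = [x * 1000 for x in assay_mmol]   # same results, different units

cv_mmol = coefficient_of_variation(assay_mmol)
cv_umol = coefficient_of_variation(assay_umol)
print(round(cv_mmol, 2), round(cv_umol, 2))
```

Because both the standard deviation and the mean scale by the same factor when the units change, their ratio is unaffected.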

Common questions

When should I use the mean and when should I use the median to describe my data?

It is a commonly held misapprehension that for Normally distributed data one uses the mean, and for non-Normally distributed data one uses the median. Alas this is not so: if the data are Normally distributed the mean and the median will be close; if the data are not Normally distributed then both the mean and the median may give useful information. Consider a variable that takes the value 1 for males and 0 for females. This is clearly not Normally distributed. However, the mean gives the proportion of males in the group, whereas the median merely tells us which group contained more than 50% of the people. Similarly, the mean from ordered categorical variables can be more useful than the median, if the ordered categories can be given meaningful scores. For example, a lecture might be rated as 1 (poor) to 5 (excellent). The usual statistic for summarising the result would be the mean. In the situation where there is a small group at one extreme of a distribution (for example, annual income) then the median will be more "representative" of the distribution.

My data must have values greater than zero and yet the mean and standard deviation are about the same size. How does this happen?

If data have a very skewed distribution, then the standard deviation will be grossly inflated, and is not a good measure of variability to use. As we have shown, occasionally a transformation of the data, such as a log transform, will render the distribution more symmetrical. Alternatively, quote the interquartile range.
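The 0/1 example from the first question can be illustrated with a few lines of Python. The group below is hypothetical, invented only to show what each statistic reports for a binary variable.

```python
from statistics import mean, median

# HYPOTHETICAL group of ten people: 1 = male, 0 = female (illustrative only).
sex = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

proportion_male = mean(sex)    # the mean of a 0/1 variable is a proportion
majority_code = median(sex)    # the median only identifies the larger group

print(proportion_male, majority_code)
```

Here the mean reports that three in ten are male, while the median says only that females are the majority.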

References

1. Mullee M A. How to choose and use a calculator. In: How to do it 2. BMJ Publishing Group, 1995:58-62.

Exercises

Exercise 2.1

In the campaign against smallpox a doctor inquired into the number of times 150 people aged 16 and over in an Ethiopian village had been vaccinated. He obtained the following figures: never, 12 people; once, 24; twice, 42; three times, 38; four times, 30; five times, 4. What is the mean number of times those people had been vaccinated and what is the standard deviation?

Exercise 2.2

Obtain the mean and standard deviation of the data in and an approximate 95% range.

Exercise 2.3

Which points are excluded from the range mean - 2SD to mean + 2SD? What proportion of the data is excluded?


Source: https://www.bmj.com/about-bmj/resources-readers/publications/statistics-square-one/2-mean-and-standard-deviation
