99% confidence: how many standard deviations?
The earlier sections covered estimation of statistics. This section considers how precise these estimates may be. A series of samples drawn from one population will not be identical. They will show chance variations from one to another, and the variation may be slight or considerable. For example, a series of samples of the body temperature of healthy people would show very little variation from one to another, but the variation between samples of the systolic blood pressure would be considerable.
Thus the variation between samples depends partly on the amount of variation in the population from which they are drawn. Furthermore, it is a matter of common observation that a small sample is a much less certain guide to the population from which it was drawn than a large sample.
In other words, the more people who are included in a sample, the greater the chance that the sample will accurately represent the population, provided that a random process is used to construct the sample.
A consequence of this is that if two or more samples are drawn from a population, then the larger they are, the more likely they are to resemble each other - again, provided that the random sampling technique is followed.
Thus the variation between samples depends partly also on the size of the sample. If we draw a series of samples and calculate the mean of the observations in each, we have a series of means.
These means generally follow a normal distribution, and they often do so even if the observations from which they were obtained do not. This can be proven mathematically and is known as the "Central Limit Theorem". The series of means, like the series of observations in each sample, has a standard deviation. The standard error of the mean of one sample is an estimate of the standard deviation that would be obtained from the means of a large number of samples drawn from that population.
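The behaviour of a series of sample means can be illustrated with a small simulation. The following is a minimal sketch, not part of the original text: the skewed population, the sample size of 50, the number of samples and the function name `simulate_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return their means."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]


# Hypothetical, strongly skewed population (exponential), i.e. not normal.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(10_000)]

means = simulate_sample_means(population, sample_size=50, n_samples=2_000)

# Standard deviation of the simulated means ...
sd_of_means = statistics.stdev(means)
# ... compared with the population SD divided by the square root of n.
predicted = statistics.pstdev(population) / 50 ** 0.5

print(f"SD of sample means:      {sd_of_means:.4f}")
print(f"Population SD / sqrt(n): {predicted:.4f}")
```

Under these assumptions the two printed values should be close to each other, and a histogram of `means` should look roughly bell-shaped even though the population is skewed.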
As noted above, if random samples are drawn from a population, their means will vary from one to another. The variation depends on the variation of the population and the size of the sample. We do not know the variation in the population so we use the variation in the sample as an estimate of it. This is expressed in the standard deviation. If we now divide the standard deviation by the square root of the number of observations in the sample we have an estimate of the standard error of the mean.
It is important to realise that we do not have to take repeated samples in order to estimate the standard error; there is sufficient information within a single sample.
However, the concept is that if we were to take repeated random samples from the population, this is how we would expect the mean to vary, purely by chance.
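The calculation described above (sample standard deviation divided by the square root of the number of observations) can be sketched as follows. The function name `standard_error` and the readings are hypothetical and serve only to show the arithmetic.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the mean from a single sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two observations are needed")
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical diastolic blood pressure readings (mmHg), for illustration only.
readings = [72, 80, 76, 84, 88, 78, 90, 82, 74]
se = standard_error(readings)
print(f"mean = {statistics.fmean(readings):.1f}, standard error = {se:.2f}")
```

Note that only one sample is used: no repeated sampling is required to obtain the estimate.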
Example 1
A general practitioner has been investigating whether the diastolic blood pressure of men aged differs between printers and farm workers. For this purpose, she has obtained a random sample of 72 printers and 48 farm workers and calculated the means and standard deviations, as shown in Table 1.
Table 1: Mean diastolic blood pressures of printers and farmers.
When you compute a confidence interval on the mean, you compute the mean of a sample in order to estimate the mean of the population.
Clearly, if you already knew the population mean, there would be no need for a confidence interval. However, to explain how confidence intervals are constructed, we are going to work backwards and begin by assuming characteristics of the population.
Then we will show how sample data can be used to construct a confidence interval. Assume that the weights of year-old children are normally distributed with a mean of 90 and a standard deviation of What is the sampling distribution of the mean for a sample size of 9?
Note that the standard deviation of a sampling distribution is its standard error. Figure 1 shows this distribution. These limits were computed by adding and subtracting 1. The value of 1.
The sample size is denoted by n, and we let x denote the number of "successes" in the sample. For example, if we wish to estimate the proportion of people with diabetes in a population, we consider a diagnosis of diabetes as a "success".
If there are more than 5 successes and more than 5 failures, then the confidence interval for the population proportion can be computed with this formula: $\hat{p} \pm z \sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p} = x/n$ is the sample proportion. The point estimate for the population proportion is the sample proportion, and the margin of error is the product of the Z value for the desired confidence level and the standard error of the point estimate.
In other words, the standard error of the point estimate is: $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$. This formula is appropriate for large samples, defined as at least 5 successes and at least 5 failures in the sample. This was a condition for the Central Limit Theorem for binomial outcomes. If there are fewer than 5 successes or failures then alternative procedures, called exact methods, must be used to estimate the population proportion.
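A minimal sketch of the large-sample interval for a proportion is given below. The function names `z_value` and `proportion_ci` and the counts in the usage lines are hypothetical. The Z multiplier is obtained from the standard normal distribution, which gives roughly 1.96 for 95% confidence and roughly 2.58 for 99% confidence.

```python
import math
from statistics import NormalDist


def z_value(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    failures = n - successes
    if successes < 5 or failures < 5:
        # The text requires at least 5 successes and 5 failures.
        raise ValueError("too few successes or failures; use an exact method")
    p_hat = successes / n                      # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error
    margin = z_value(confidence) * se          # margin of error
    return p_hat - margin, p_hat + margin


# Hypothetical counts, for illustration only.
low95, high95 = proportion_ci(50, 100, confidence=0.95)
low99, high99 = proportion_ci(50, 100, confidence=0.99)
print(f"95% CI: ({low95:.3f}, {high95:.3f})")
print(f"99% CI: ({low99:.3f}, {high99:.3f})")
```

The 99% interval is wider than the 95% interval because the multiplier is larger while the standard error is unchanged.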
Example: During the 7th examination of the Offspring cohort in the Framingham Heart Study there were participants being treated for hypertension and 2, who were not on treatment. The sample proportion is:. This is the point estimate, i. The sample is large, so the confidence interval can be computed using the formula:.
Specific applications of estimation for a single population with a dichotomous outcome involve estimating prevalence, cumulative incidence, and incidence rates. The table below, from the 5th examination of the Framingham Offspring cohort, shows the number of men and women found with or without cardiovascular disease (CVD).
There are many situations where it is of interest to compare two groups with respect to their mean scores on a continuous outcome.
For example, we might be interested in comparing mean systolic blood pressure in men and women, or perhaps comparing body mass index (BMI) in smokers and non-smokers. Both of these situations involve comparisons between two independent groups, meaning that there are different people in the groups being compared.
We could begin by computing the sample sizes ($n_1$ and $n_2$), means ($\bar{X}_1$ and $\bar{X}_2$), and standard deviations ($s_1$ and $s_2$) in each sample. The point estimate for the difference in population means is the difference in sample means: $\bar{X}_1 - \bar{X}_2$. The confidence interval will be computed using either the Z or t distribution for the selected confidence level and the standard error of the point estimate. The standard error of the point estimate will incorporate the variability in the outcome of interest in each of the comparison groups.
If we assume equal variances between groups, we can pool the information on variability (the sample variances) to generate an estimate of the population variability. The standard error (SE) of the difference in sample means is then built from the pooled estimate of the common standard deviation, $S_p$, which assumes that the variances in the populations are similar and is computed as a weighted average of the sample variances: $S_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}}$, giving $SE = S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$.
If the sample sizes are large, that is, both $n_1$ and $n_2$ are greater than 30, then one uses the z-table, and the confidence interval is $(\bar{X}_1 - \bar{X}_2) \pm z\, S_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$. For smaller samples, the z value is replaced by a t value with $n_1 + n_2 - 2$ degrees of freedom. For both large and small samples, $S_p$ is the pooled estimate of the common standard deviation.
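The pooled calculation can be sketched in a few lines. This is an illustration under stated assumptions rather than a definitive implementation: the function names `pooled_sd` and `diff_means_ci` and the summary statistics in the usage lines are hypothetical. Because the Python standard library has no t distribution, the sketch expects the caller to look up the t critical value (with n1 + n2 - 2 degrees of freedom) for small samples and pass it in.

```python
import math
from statistics import NormalDist


def pooled_sd(n1, s1, n2, s2):
    """Pooled estimate of the common standard deviation (equal variances assumed)."""
    return math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def diff_means_ci(n1, mean1, s1, n2, mean2, s2, confidence=0.95, critical_value=None):
    """Confidence interval for the difference in means of two independent groups."""
    sp = pooled_sd(n1, s1, n2, s2)
    se = sp * math.sqrt(1 / n1 + 1 / n2)   # standard error of the difference
    if critical_value is None:
        if min(n1, n2) <= 30:
            # Small samples: a t value with n1 + n2 - 2 df must be supplied.
            raise ValueError("small samples: pass a t critical value")
        critical_value = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = mean1 - mean2                   # point estimate
    return diff - critical_value * se, diff + critical_value * se


# Hypothetical summary statistics, for illustration only.
low, high = diff_means_ci(n1=50, mean1=10.0, s1=2.0, n2=50, mean2=9.0, s2=2.0)
print(f"95% CI for the difference in means: ({low:.3f}, {high:.3f})")
```

Swapping the two groups flips the sign of the point estimate and of both limits, but the width of the interval stays the same.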
These formulas assume equal variability in the two populations (i.e., that the population variances are equal). For analysis, we have samples from each of the comparison populations, and if the sample variances are similar, then the assumption about variability in the populations is reasonable. If not, then alternative formulas must be used to account for the heterogeneity in variances. Next, we will check the assumption of equality of population variances.
The ratio of the sample variances is Notice that for this example Sp, the pooled estimate of the common standard deviation, is 19, and this falls in between the standard deviations in the comparison groups i.
Therefore, the confidence interval is 0. Our best estimate of the difference, the point estimate, is 1. The standard error of the difference is 0. Note that when we generate estimates for a population parameter in a single sample e. In contrast, when comparing two independent samples in this fashion the confidence interval provides a range of values for the difference. In this example, we estimate that the difference in mean systolic blood pressures is between 0.
In this example, we arbitrarily designated the men as group 1 and women as group 2. Had we designated the groups the other way i. The table below summarizes differences between men and women with respect to the characteristics listed in the first column. The second and third columns show the means and standard deviations for men and women respectively.
Men have lower mean total cholesterol levels than women; anywhere from The men have higher mean values on each of the other characteristics considered indicated by the positive confidence intervals. The confidence interval for the difference in means provides an estimate of the absolute difference in means of the outcome variable of interest between the comparison groups. It is often of interest to make a judgment as to whether there is a statistically meaningful difference between comparison groups.
This judgment is based on whether the observed difference is beyond what one would expect by chance. If there is no difference between the population means, then the difference will be zero. Zero is the null value of the parameter (in this case, the difference in means). If the confidence interval does not include the null value, then we conclude that there is a statistically significant difference between the groups. For each of the characteristics in the table above there is a statistically significant difference in means between men and women, because none of the confidence intervals include the null value, zero.
Note, however, that some of the means are not very different between men and women. This means that there is a small, but statistically meaningful difference in the means. When there are small differences between groups, it may be possible to demonstrate that the differences are statistically significant if the sample size is sufficiently large, as it is in this example.
The following table contains descriptive statistics on the same continuous characteristics in the subsample stratified by sex.
We will again arbitrarily designate men as group 1 and women as group 2. Since the sample sizes are small, the confidence interval is based on the t distribution. However, we will first check whether the assumption of equality of population variances is reasonable. The ratio of the sample variances is 9. The solution is shown below. Note that again the pooled estimate of the common standard deviation, Sp, falls in between the standard deviations in the comparison groups.
Interpretation: Our best estimate of the difference, the point estimate, is The standard error of the difference is 6. In this sample, the men have lower mean systolic blood pressures than women by 9.
Again, the confidence interval is a range of likely values for the difference in means. Since the interval contains zero (no difference), we do not have sufficient evidence to conclude that there is a difference.
The previous section dealt with confidence intervals for the difference in means between two independent groups.
There is an alternative study design in which two comparison groups are dependent, matched or paired. Consider the following scenarios:. A goal of these studies might be to compare the mean scores measured before and after the intervention, or to compare the mean scores obtained with the two conditions in a crossover study. Yet another scenario is one in which matched samples are used. For example, we might be interested in the difference in an outcome between twins or between siblings.
Once again we have two samples, and the goal is to compare the two means. However, the samples are related or dependent.
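One common way to handle dependent samples is to reduce each pair to a single difference and then build a one-sample interval on those differences. The sketch below assumes that approach, which is a standard technique rather than something spelled out in the text above. The function name `paired_mean_diff_ci`, the measurements and the t critical value are hypothetical, and the critical value (with n - 1 degrees of freedom) has to be looked up by the caller because the standard library provides no t distribution.

```python
import math
import statistics


def paired_mean_diff_ci(first, second, critical_value):
    """Confidence interval for the mean of the paired differences (second - first)."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)            # point estimate
    se = statistics.stdev(diffs) / math.sqrt(n)    # standard error of the mean difference
    return mean_diff - critical_value * se, mean_diff + critical_value * se


# Hypothetical before/after measurements on the same four participants.
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
# 3.182 is the tabulated two-sided 95% t value for 3 degrees of freedom.
low, high = paired_mean_diff_ci(before, after, critical_value=3.182)
print(f"95% CI for the mean difference: ({low:.3f}, {high:.3f})")
```

Working on the differences is what accounts for the dependence between the two sets of measurements; treating them as independent groups would ignore the pairing.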
The confidence intervals should have been based on t distributions with 24 and 21 degrees of freedom respectively. The divisor for the experimental intervention group is 4. Calculations for the control group are performed in a similar way.
It is important to check that the confidence interval is symmetrical about the mean (that is, the distance between the lower limit and the mean is the same as the distance between the mean and the upper limit). If this is not the case, the confidence interval may have been calculated on transformed values; see Section 7.