Tests for Equality of Population Proportions
Motivation: we are interested in whether the prevalence of some characteristic is the same or different between two populations.

Principles: the sampling distribution of ps1 − ps2 has these properties:
- ps1 − ps2 is normally distributed (approximately, for large samples)
- the usual hypothesis is that p1 = p2, so E(ps1 − ps2) = 0
- the best estimate we can give of the unknown common p is the weighted average of the two sample proportions,

$$p = \frac{n_1 p_{s1} + n_2 p_{s2}}{n_1 + n_2}$$

  so the standard error is given by

$$s_{p_{s1}-p_{s2}} = \sqrt{p\,(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

So we will test the hypothesis H0: p1 = p2 with a z test in which a critical z value is compared to the value calculated by

$$z = \frac{p_{s1} - p_{s2}}{s_{p_{s1}-p_{s2}}}$$

NOTE: we have omitted the implicit term p1 − p2 from the numerator, since it equals zero under H0.
Example: In a marketing survey of 400 adults age 40 and over, 230 said they sometimes view Baywatch; in the same survey, of 300 adults under age 40, 210 sometimes view this show. Can we conclude that younger people are more likely to watch the show than older people?
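Proportion problems are always z problems. Let group 1 be the adults under 40 and group 2 the adults 40 and over; the hypotheses are H0: p1 = p2 vs. H1: p1 > p2, a one-tailed test.
Let α = 0.05; then zC = 1.645.
ps1 = 210/300 = 0.7; ps2 = 230/400 = 0.575

$$p = \frac{n_1 p_{s1} + n_2 p_{s2}}{n_1 + n_2} = \frac{300(0.7) + 400(0.575)}{300 + 400} = \frac{440}{700} = 0.6286$$

(This is of course also (x1 + x2) ÷ (n1 + n2) = (210 + 230) ÷ (300 + 400).)

$$s_{p_{s1}-p_{s2}} = \sqrt{0.6286 \times 0.3714 \times \left(\frac{1}{300} + \frac{1}{400}\right)} = 0.0369, \qquad z = \frac{0.7 - 0.575}{0.0369} = 3.39 > 1.645$$

Thus we reject the null hypothesis and conclude that younger people are more likely to watch Baywatch.

Alternatively: ps1 − ps2 = 0.7 − 0.575 = 0.125, and the presumed mean = 0, so 1 − NORMDIST(0.125, 0, 0.0369, TRUE) = 0.000353 = p-value of the test.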
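The Baywatch calculation can be checked outside Excel. The following is a minimal Python sketch using only the standard library; the function name `two_proportion_z` is a hypothetical helper introduced here for illustration, and `NormalDist` stands in for NORMSINV and NORMSDIST.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, standard_error) for H0: p1 = p2, using the pooled proportion."""
    ps1, ps2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # weighted average of the two sample proportions
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (ps1 - ps2) / se, se


# Group 1: adults under 40 (210 of 300); group 2: adults 40 and over (230 of 400).
z, se = two_proportion_z(210, 300, 230, 400)

z_critical = NormalDist().inv_cdf(0.95)  # one-tailed test at alpha = 0.05
p_value = 1 - NormalDist().cdf(z)        # upper-tail area beyond the calculated z

print(f"se = {se:.4f}, z = {z:.2f}, critical z = {z_critical:.3f}, p-value = {p_value:.6f}")
```

The standard error, z, and p-value should agree with the hand calculation (0.0369, about 3.39, and about 0.00035) up to rounding.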
Example: The new Drug X is proposed as a remedy for depression. We wish to determine whether the proportion of patients who receive Drug X and recover from depression is the same as the proportion who receive Prozac and recover. In a sample of 200 patients given Prozac for depression, 150 recover within three months; in a sample of 220 patients given Drug X, 140 recover within three months. Is there evidence that Drug X’s effectiveness is equal to Prozac’s?
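- “Evidence” would mean that we cannot reject the null hypothesis H0: p1 = p2, where groups 1 and 2 are respectively the Prozac and Drug X patients, so the appropriate statistical procedure is a hypothesis test. (Strictly, failing to reject H0 shows only that the data are consistent with equal effectiveness, not that equality is proven.) The alternate hypothesis is H1: p1 ≠ p2. Let the test be conducted at α = 0.05. The appropriate test statistic is a z. From above, p = (150 + 140) ÷ (200 + 220) = 0.69; ps1 = 150/200 = 0.75, while ps2 = 140/220 = 0.636. The standard error is therefore √[0.69 × 0.31 × (1/200 + 1/220)] = 0.045.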
- Three equivalent ways to proceed:
  - Find the critical z value for a 5% test: NORMSINV(0.025) = −1.96, so the critical values are ±1.96. Here z = (ps1 − ps2)/sps1-ps2 = (0.75 − 0.636)/0.045 = 2.52 (carrying unrounded values through the calculation), and 2.52 > 1.96, so we reject the null hypothesis.
  - With the calculated z value 2.52, use 1 − NORMSDIST(2.52) = 0.006; since this is a two-tailed test, the p-value of the test = 2 × 0.006 = 0.012 < 0.05, so we reject H0.
  - Use NORMDIST: the difference between sample proportions = 0.75 − 0.636 = 0.114 on a distribution whose mean is assumed to be zero and whose standard deviation is 0.045: 1 − NORMDIST(0.114, 0, 0.045, TRUE) = 0.006. For a two-tailed test this must be multiplied by 2, giving 0.012 < 0.05.
- Whichever way we proceed, we reject H0 and conclude that Drug X’s effectiveness is NOT equal to that of Prozac.
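The three routes described for the Drug X test can be placed side by side in a short Python sketch. This is only an illustration under the assumption that `statistics.NormalDist` plays the role of Excel's NORMSINV, NORMSDIST and NORMDIST; the variable names are chosen here and are not part of the original notes.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 150, 200  # Prozac: number recovered, sample size
x2, n2 = 140, 220  # Drug X: number recovered, sample size
alpha = 0.05

ps1, ps2 = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
diff = ps1 - ps2
z = diff / se
std_normal = NormalDist()  # mean 0, standard deviation 1

# Way 1: compare |z| with the two-tailed critical value.
z_critical = std_normal.inv_cdf(1 - alpha / 2)
reject_by_critical_value = abs(z) > z_critical

# Way 2: two-tailed p-value from the standard normal distribution.
p_from_z = 2 * (1 - std_normal.cdf(abs(z)))

# Way 3: the same p-value from a normal with mean 0 and standard deviation se.
p_from_diff = 2 * (1 - NormalDist(0, se).cdf(abs(diff)))

print(f"z = {z:.2f}, critical z = {z_critical:.2f}, reject H0: {reject_by_critical_value}")
print(f"p-value (way 2) = {p_from_z:.4f}, p-value (way 3) = {p_from_diff:.4f}")
```

Ways 2 and 3 are the same calculation on different scales, which is why they return the same p-value.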
Contingency Tables, Tests for Equality of Proportions
Contingency Tables or Tests of Independence: is A like B like C?
We frequently are interested in whether two or more populations are the same or different with respect to two or more characteristics. If there are more than two populations and/or more than two characteristics of interest, simple pair-wise hypothesis tests run into the problem noted several times before, and a different method is required. The simplest case occurs when there are two samples, classified with regard to one characteristic, as in the Drug X/Prozac test just discussed. Presented a bit differently, that test would look like this:

Medication/Outcome   Recovered   Did Not Recover   Total
Prozac                     150                50     200
Drug X                     140                80     220
Total                      290               130     420
Again, H0: p1 = p2 vs. H1: p1 ≠ p2. This may also be phrased as a test of independence: the null hypothesis is that recovery is independent of the drug administered. The χ² distribution is based on the difference between observed frequencies and expected frequencies on the assumption that the population proportions are equal. Its formula may be written as

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where fo indicates the observed frequency and fe the expected frequency; the meaning of “cell” will become clear momentarily. The χ² distribution also depends on the degrees of freedom; in this case df = (c − 1) × (r − 1), where c is the number of columns in the table and r the number of rows.

To find the expected frequencies, consider this: if there is no difference in the two medications, we have in effect one sample of size 420; the best estimate we could give of the proportion who will recover with medication is the proportion in the total sample who recovered, and the best estimate we can give of the proportion who won’t recover is the proportion in the total sample who didn’t. Hence, ps = (total recovered)/(total sample) = 290/420 = 0.69, and the corresponding proportion who do not recover is 1 − 0.69 = 0.31. If 0.69 of all patients recover, then in the sample of 200 Prozac takers we would expect that 0.69 × 200 = 138 will recover and 0.31 × 200 = 62 will not recover. Similarly, from the 220 Drug X patients, we expect 0.69 × 220 = 151.8 to recover and 0.31 × 220 = 68.2 not to recover. Introducing those numbers into the table above yields:

Medication/Outcome   Recovered (fo / fe)   Did Not Recover (fo / fe)   Total
Prozac               150 / 138             50 / 62                       200
Drug X               140 / 151.8           80 / 68.2                     220
Total                290                   130                           420
Each of the blocks with an observed and an expected frequency constitutes a cell, so

χ² = (150 − 138)²/138 + (140 − 151.8)²/151.8 + (50 − 62)²/62 + (80 − 68.2)²/68.2 = 6.32.

From this point, we can proceed in any of three ways: we can look up a critical value from a table and compare it to the calculated value, or we can accomplish exactly the same thing with Excel’s CHIINV formula. Enter CHIINV(0.05,1): the result is 3.84; this is a value such that only 0.05 of the χ² distribution with 1 degree of freedom lies beyond that value. Since 6.32 > 3.84, we will reject the null hypothesis. Alternatively, we can use the CHIDIST formula: CHIDIST(6.32,1) = 0.012. This is the p-value of the test, and since 0.012 < 0.05 we again reject H0. Notice that the p-value is the same as we found above using a z test; this will always be the case for two-tailed tests with only two samples, because with 1 degree of freedom χ² is the square of z.

Excel Formulas:
- CHIINV(upper tail, df) gives the χ² which demarcates the area entered as upper tail for the given degrees of freedom.
- CHIDIST(calculated value, df) gives the probability of a χ² as large or larger than the one calculated for the given degrees of freedom. This is the p-value of the test.
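The 2 × 2 calculation for the Prozac/Drug X table can be reproduced with a few lines of Python. This is a sketch, not Excel's own method: it assumes the standard-library identity that, for 1 degree of freedom only, the upper-tail χ² probability equals `erfc(sqrt(chi2 / 2))`, which is used here in place of CHIDIST. Because the code carries unrounded proportions, it gives about 6.33 rather than the 6.32 obtained with the rounded 0.69 and 0.31.

```python
from math import erfc, sqrt

# Observed frequencies: rows are Prozac and Drug X; columns are recovered, did not recover.
observed = [[150, 50],
            [140, 80]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency in each cell: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum (fo - fe)^2 / fe over all four cells.
chi2 = sum((fo - fe) ** 2 / fe
           for obs_row, exp_row in zip(observed, expected)
           for fo, fe in zip(obs_row, exp_row))

# Valid for 1 degree of freedom only: P(chi-square > chi2) = erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))

print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
```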
The χ² procedure becomes important when there are three or more populations or three or more treatments or classifications. In that case, pairwise hypothesis testing is not appropriate. Except for more complicated arithmetic, the procedure is the same.
Example: Is there any difference of effect in three different methods of teaching reading: sight reading, phonics, and computer-assisted “write-to-read”? An experiment is arranged in which 150 pupils in a public school are taught sight reading, another 129 by phonics and 132 by write-to-read. After eight weeks, in the first group 86 are reading proficiently, in the second group 78 are reading proficiently, and in the third group 99. To summarize in a contingency table:

Teaching Method/Level   Not Proficient   Proficient   Total
Sight reading                       64           86     150
Phonics                             51           78     129
Write-to-read                       33           99     132
Total                              148          263     411
The basic question is whether the proportion of pupils who learned to read by each of the three methods is the same or different so that the implicit null hypothesis is H0: p1 = p2 = p3 vs. H1: at least one proportion is different. Suppose that the proportion is the same, that H0 is true:
- of a total sample of 411 pupils, 263 became proficient readers: 263 ÷ 411 = 0.64, while 148 ÷ 411 = 0.36 were not proficient
- if there is no difference in the effectiveness of these three methods, the proportion of pupils who learn to read, whatever the method, is about 0.64
  - nonetheless, because of sampling error, we would not expect to have exactly this proportion in each group
  - but if the proportions are much different, then we cannot believe the initial hypothesis that there is no difference in effect
- on the assumption of no difference, we can calculate the expected number in each group who should be proficient and the number who are not, as below:

Teaching Method/Level   Not Proficient (observed / expected)   Proficient (observed / expected)   Total
Sight reading           64 / 54.0146                           86 / 95.9854                         150
Phonics                 51 / 46.4526                           78 / 82.5474                         129
Write-to-read           33 / 47.5328                           99 / 84.4672                         132
Total                   148                                    263                                  411

In the table above, note that the expected numbers are found by assuming that the common proportion 0.64 learned to read, while the common proportion 1 − 0.64 = 0.36 did NOT learn to read; thus 54.0146 = 0.36 × 150; 95.9854 = 0.64 × 150, and so on (using the unrounded proportions 148/411 and 263/411).
- 150 tried to learn by sight reading; if the same proportion learned by that method as in the sample as a whole, 0.64 should be proficient and 0.36 should not
- let Eij be the expected number in the i-th row and j-th column; then we can write the formula Eij = [(column total for column j)/(grand total)] × (row total for row i); for example, E22 = (263/411) × 129 = 82.5474
We have now in each cell an expected and an actual number; the differences between expected and actual conform to the χ² distribution, or, more precisely, the following quantity follows a χ² probability distribution:

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where Oij is the observed (actual) number in the i-th row and j-th column. The double sums in the first version merely direct us to take each difference between an observed and expected value, square the result, divide by the expected value and then sum over both rows and columns.
- from the table above: χ² = (64 − 54.0146)²/54.0146 + (51 − 46.453)²/46.453 + (33 − 47.533)²/47.533 + (86 − 95.99)²/95.99 + (78 − 82.5474)²/82.5474 + (99 − 84.47)²/84.47 = 10.524
- the χ² statistic has (c − 1) × (r − 1) degrees of freedom, where c is the number of columns in the contingency table and r is the number of rows in the original contingency table, or the number of rows of cells. Here we have (2 − 1) × (3 − 1) = 2 df. Using CHIINV we find that with 2 df the critical chi-square for a 0.01 tail = 9.210. Since 10.524 > 9.210, we can say that if there is no difference in effect between these methods of teaching reading, the result we got would occur by chance less than 1% of the time. More precisely, we find CHIDIST(10.524, 2) = 0.0052, the p-value of the test. At α = 0.01, we can reject the hypothesis that the three teaching methods produce equal results.
Again, these contingency table tests are hypothesis tests:
Step 1: State the hypotheses:
    H0: whether a student learns to read is independent of the method used to teach him
    H1: learning is not independent of teaching method
  Alternately, we could phrase the hypotheses as
    H0: the proportion who learn to read is the same for all teaching methods
    H1: the proportion who learn to read differs with teaching method
  Or
    H0: p1 = p2 = p3
    H1: at least one proportion is not equal
Step 2: Find the appropriate sampling distribution/test statistic: the appropriate test statistic is a χ² with (c − 1) × (r − 1) = 2 df.
Step 3: Choose a significance level and find the critical value of the statistic: let us choose α = 0.01; then χ²C = 9.210.
Step 4: Choose the sample and conduct calculations as above; 10.524 > 9.210. Alternatively, find the p-value = 0.0052 < 0.01. In either case, reject H0.
Step 5: Make a decision: put your money into “write-to-read”?
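The reading-methods test can be traced step by step in Python. The sketch below builds the expected frequencies from the row and column totals and sums the cell contributions; it assumes the closed form `exp(-chi2 / 2)` for the upper-tail probability, which holds only when there are exactly 2 degrees of freedom, as there are here. Variable names are illustrative.

```python
from math import exp

methods = ["Sight reading", "Phonics", "Write-to-read"]
# Observed counts for each method: [not proficient, proficient].
observed = [[64, 86], [51, 78], [33, 99]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (column total for column j / grand total) * (row total for row i)
expected = [[c / grand_total * r for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(col_totals) - 1) * (len(row_totals) - 1)

# Closed form valid only for df = 2.
p_value = exp(-chi2 / 2)

for name, exp_row in zip(methods, expected):
    print(f"{name:14s} expected not proficient {exp_row[0]:.4f}, proficient {exp_row[1]:.4f}")
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
```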
Spreadsheet Computation for Contingency Tables
=CHITEST(range with actual values, range with expected values) delivers the p-value of the test, but it is necessary to construct the expected values.

Example:

[Spreadsheet figure not reproduced. H0: murder methods do not vary by region. Observed frequencies occupy B3:D6, with row totals in column E and column totals in row 7; expected frequencies occupy B15:D18; the p-value is in B12.]

- cell B15 contains the expression =(B$7/$E$7)*$E3
  - this actually comes to (column total ÷ grand total) × row total, as in the expression above
  - cell B15 is then copied across and down to include everything through cell D18. The cell addresses will adjust automatically.
- cell B12 contains the expression =CHITEST(B3:D6,B15:D18)
- Note that the ranges with the actual and expected frequencies must be of the same size, and for different size contingency tables these formulas must be adjusted accordingly.
- Do NOT attempt mindlessly to reproduce the spreadsheet shown above; cell addresses must match the ranges in which you’ve placed data, and the dollar signs, which freeze the cell address under Copy operations, must be properly placed. The top-left cell in your expected frequencies table should represent [(total for first column of data)/(total for all data)] × [total for first row of data]. As you copy, the column totals should be adjusted to represent the second column, then third column of data: hence in the address representing the total for first column, the column address does not have a $ since it must change; the row address has a $ since it is not to change. Conversely, in the address representing the first row total, the column address has a $ but not the row address.
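For readers working outside a spreadsheet, the two spreadsheet steps (build the expected table, then call CHITEST) can be imitated in Python. This is a rough analogue, not Excel's implementation: `expected_frequencies`, `chitest` and `chi2_upper_tail` are hypothetical helper names, the tail probability uses a closed-form series that holds for whole-number degrees of freedom, and the table is assumed to have at least two rows and two columns. Since the murder-by-region data are not reproduced in these notes, the reading-methods table is used as input.

```python
from math import erfc, exp, gamma, sqrt


def chi2_upper_tail(x, df):
    """Upper-tail probability of a chi-square variable with whole-number df."""
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1
        return exp(-half) * sum(half ** i / gamma(i + 1) for i in range(df // 2))
    # Odd df: start from the 1-df tail and add one term per extra pair of df.
    return erfc(sqrt(half)) + exp(-half) * sum(
        half ** (i - 0.5) / gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))


def expected_frequencies(actual):
    """(column total / grand total) * row total, the arithmetic of =(B$7/$E$7)*$E3."""
    row_totals = [sum(row) for row in actual]
    col_totals = [sum(col) for col in zip(*actual)]
    grand_total = sum(row_totals)
    return [[c / grand_total * r for c in col_totals] for r in row_totals]


def chitest(actual, expected):
    """Return the p-value for two same-sized tables of actual and expected counts."""
    if len(actual) != len(expected) or any(len(a) != len(e) for a, e in zip(actual, expected)):
        raise ValueError("actual and expected ranges must be the same size")
    chi2 = sum((a - e) ** 2 / e
               for a_row, e_row in zip(actual, expected)
               for a, e in zip(a_row, e_row))
    df = (len(actual) - 1) * (len(actual[0]) - 1)
    return chi2_upper_tail(chi2, df)


# Reading-methods table: rows are methods, columns are not proficient / proficient.
reading = [[64, 86], [51, 78], [33, 99]]
p_value = chitest(reading, expected_frequencies(reading))
print(f"p-value = {p_value:.4f}")
```

Keeping the expected table as a separate step mirrors the spreadsheet layout, where the expected range must exist before CHITEST can be called.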
Comparing Proportions among Three or More Populations
Purpose: to avoid a set of pair-wise comparisons, similar to the concerns we had with ANOVA, χ² with contingency tables and so on.

We will have a set of samples drawn independently from three or more populations, and we wish to know whether the proportion having some characteristic is the same or different among these populations.

Example: A national survey of gun ownership revealed the following pattern by self-described political orientation:

Politics       Number in Sample   Number Owning a Gun   %-age
Liberal                     412                    97   23.54
Moderate                    523                   140   26.77
Conservative                489                   168   34.36

Can we conclude that there is a significant difference in gun ownership by political leanings?
These are hypothesis tests of the hypotheses
  H0: p1 = p2 = … = pk
  H1: at least one equality does not hold
- assume that the samples are drawn independently from the respective populations
- note that in the example above, the numbers in each group are presented; this is necessary for the χ² calculation
- calculate the test statistic

$$\chi^2 = \sum\sum \frac{(f_o - f_e)^2}{f_e}$$

- the double sum ΣΣ simply means that we’re adding down columns and then across rows, that is, over all cells; in practice you calculate a squared difference for each cell, divide it by the expected frequency, then add them up
- the expected frequencies are found by first calculating the proportion of the whole sample with the characteristic of interest, then applying this proportion to the number in each sub-sample to find the expected frequency
- note that the actual calculations are done in the same way as if we had an n × 2 or 2 × n contingency table; indeed, this is really a special case of a contingency table
- these are always two-tailed tests
- degrees of freedom = k − 1, where k is the number of categories into which the sample is divided; this follows because in a contingency table we have (c − 1) × (r − 1) degrees of freedom, and depending on how we’ve laid the data out, either c or r = 2
Solution to Example:
1. H0: p1 = p2 = p3; H1: at least one equality is untrue
2. test statistic is χ² with k − 1 = 2 df
3. let’s pick α = 0.01; then χ²C = 9.210
4. from above, we have a total of 405 gun owners in a sample of 1424 people, so that the overall sample proportion = 405/1424 = 0.2844; applying this proportion to the number in each political group gives the following

Politics       No. in Sample   No. Owning a Gun   Expected Number
Liberal                  412                 97            117.18
Moderate                 523                140            148.75
Conservative             489                168            139.08

   The numbers not owning a gun (315, 383 and 321) and their expected values (294.8, 374.3 and 349.9) follow by subtraction from the number in each sample. Then
   χ² = (97 − 117.18)²/117.18 + (140 − 148.75)²/148.75 + (168 − 139.08)²/139.08 + (315 − 294.8)²/294.8 + (383 − 374.3)²/374.3 + (321 − 349.9)²/349.9 = 13.98
   Since 13.98 > 9.210, we can reject the null.
5. the data fairly strongly imply that self-identified conservatives are more likely to own guns
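The gun-ownership solution can be verified with a short Python sketch. It assumes two closed forms that are specific to 2 degrees of freedom: the upper-tail probability is `exp(-chi2 / 2)` and the critical value for a tail area α is `-2 * log(alpha)`. The dictionary layout and variable names are illustrative choices.

```python
from math import exp, log

sample_sizes = {"Liberal": 412, "Moderate": 523, "Conservative": 489}
gun_owners = {"Liberal": 97, "Moderate": 140, "Conservative": 168}
alpha = 0.01

# Overall sample proportion owning a gun: 405 / 1424.
overall = sum(gun_owners.values()) / sum(sample_sizes.values())

chi2 = 0.0
for group, n in sample_sizes.items():
    owners = gun_owners[group]
    non_owners = n - owners
    expected_owners = overall * n
    expected_non_owners = (1 - overall) * n
    # Each group contributes two cells: owners and non-owners.
    chi2 += (owners - expected_owners) ** 2 / expected_owners
    chi2 += (non_owners - expected_non_owners) ** 2 / expected_non_owners

df = len(sample_sizes) - 1       # k - 1
critical = -2 * log(alpha)       # critical chi-square, valid only for df = 2
p_value = exp(-chi2 / 2)         # upper-tail probability, valid only for df = 2

print(f"chi-square = {chi2:.2f}, critical value = {critical:.3f}, p-value = {p_value:.5f}")
```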
- Spreadsheet solutions to proportion problems: apply the techniques discussed above for contingency tables
  - note that it is irrelevant whether the data are in two columns or two rows

Example: We have a sample of 3600 people who’ve been asked whether they watch the new hit sitcom “Yosef and the Seven Woolly Worms”; the sample may be subdivided by age, and the results are in the following table. We would like to know whether there is a systematic difference in the proportions of various age groups who watch Y&SWW.

DATA:
Age:               15-22   23-35   36-45   46-55   56 and up   Totals
No. Watching         400     600     500     200         100     1800
No. Not Watching     200     400     500     400         300     1800

The solution of this example is left to the student as an exercise.
- note that here the data are arrayed in rows instead of columns
- for this test, χ² = 273.33, and the p-value is effectively zero
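One way to check an answer to the sitcom exercise is the Python sketch below. It assumes the closed form `exp(-chi2 / 2) * (1 + chi2 / 2)` for the upper-tail probability, which applies only to 4 degrees of freedom (five age groups minus one); the list names are illustrative.

```python
from math import exp

# One entry per age group, in the order given in the data table.
watching = [400, 600, 500, 200, 100]
not_watching = [200, 400, 500, 400, 300]

total = sum(watching) + sum(not_watching)
overall = sum(watching) / total  # proportion of the whole sample who watch

chi2 = 0.0
for w, nw in zip(watching, not_watching):
    n = w + nw                   # size of this age group
    expected_w = overall * n
    expected_nw = (1 - overall) * n
    chi2 += (w - expected_w) ** 2 / expected_w
    chi2 += (nw - expected_nw) ** 2 / expected_nw

df = len(watching) - 1
# Closed form valid only for df = 4.
p_value = exp(-chi2 / 2) * (1 + chi2 / 2)

print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3g}")
```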
Do sample data appear to come from a population of given distribution?
- Once the expected frequencies have been determined, a goodness-of-fit test is formally identical to a test of proportions; that is, it’s a 2-column contingency table
EXAMPLE: Dr. Dinwiddie believes that her tests are perfectly random with respect to correct answers. On a recent test there were sixty questions; the correct answer was “A” to 20 questions, “B” to 5 questions, “C” to 17 questions, and “D” to 18 questions. Is she random?

Solution: This exam may be viewed as a sample of the population of Dinwiddie’s questions. If completely random, the correct answers conform to a discrete uniform distribution, so one-quarter of the correct answers (15 of 60) should fall on each of A, B, C, and D, although any one test might depart to some degree. Accordingly, we have:

Correct Answer   Observed   Expected   (fo − fe)²/fe
A                      20         15           1.67
B                       5         15           6.67
C                      17         15           0.267
D                      18         15           0.6

So χ² = 1.67 + 6.67 + 0.267 + 0.6 = 9.20 with 4 − 1 = 3 degrees of freedom; p-value = 0.027.

Normal distribution: In the normal, 0.34 of the distribution is between the mean and +1 st. dev.; 0.14 of the distribution is between +1 and +2 st. dev.; and 0.023 is more than 2 st. dev. above the mean. Since the distribution is symmetric, the same proportions hold for between the mean and −1 st. dev. and so on. Suppose we have a sample of 100 with mean = 50 and st. dev. = 5. If the sample was drawn from a normal population, the following table would give the expected frequencies (the rounded proportions sum to slightly more than 1):

Range        Proportion   Expected Frequency
Below 40          0.023                  2.3
40 to 45          0.14                  14
45 to 50          0.34                  34
50 to 55          0.34                  34
55 to 60          0.14                  14
Above 60          0.023                  2.3

These expected frequencies will be used with the frequencies in those ranges actually observed in the sample to calculate a χ² statistic.
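Both goodness-of-fit calculations can be sketched in Python. The first part recomputes the Dinwiddie statistic, assuming the closed-form upper-tail probability for 3 degrees of freedom, `erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)`. The second part derives expected frequencies for the normal example from `statistics.NormalDist` instead of the rounded proportions 0.34, 0.14 and 0.023, so its values differ slightly from a table built on those roundings. The cut points at 40, 45, 50, 55 and 60 are the mean plus or minus one and two standard deviations.

```python
from math import erfc, exp, pi, sqrt
from statistics import NormalDist

# Part 1: are the correct answers uniformly distributed over A, B, C, D?
observed = {"A": 20, "B": 5, "C": 17, "D": 18}
n_questions = sum(observed.values())
expected_each = n_questions / len(observed)  # 15 per letter under the uniform hypothesis

chi2 = sum((fo - expected_each) ** 2 / expected_each for fo in observed.values())
df = len(observed) - 1
# Closed form valid only for df = 3.
p_value = erfc(sqrt(chi2 / 2)) + sqrt(2 * chi2 / pi) * exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, df = {df}, p-value = {p_value:.3f}")

# Part 2: expected frequencies for a sample of 100 from a normal with mean 50, st. dev. 5.
sample_size = 100
dist = NormalDist(mu=50, sigma=5)
cut_points = [40, 45, 50, 55, 60]
cdf_values = [0.0] + [dist.cdf(c) for c in cut_points] + [1.0]
expected_normal = [sample_size * (hi - lo) for lo, hi in zip(cdf_values, cdf_values[1:])]

labels = ["below 40", "40 to 45", "45 to 50", "50 to 55", "55 to 60", "above 60"]
for label, freq in zip(labels, expected_normal):
    print(f"{label:9s} expected frequency {freq:.2f}")
```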