# 5.3: Goodness-of-Fit Test

In this type of hypothesis test, you determine whether the data "fit" a particular distribution or not. You use a chi-square test (meaning the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null and the alternative hypotheses for this test may be written in sentences or may be stated as equations or inequalities.

The test statistic for a goodness-of-fit test is:

\[\sum_{k} \frac{(O - E)^{2}}{E}\]

where:

• \(O =\) observed values (data)
• \(E =\) expected values (from theory)
• \(k =\) the number of different data cells or categories

The observed values are the data values and the expected values are the values you would expect to get if the null hypothesis were true. There are \(k\) terms of the form \(\frac{(O - E)^{2}}{E}\), one for each cell.

The number of degrees of freedom is \(df = (\text{number of categories} - 1)\).

The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding expected values are not close to each other, then the test statistic can get very large and will be way out in the right tail of the chi-square curve.

The expected value for each cell needs to be at least five in order for you to use this test.
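As a minimal sketch of the definitions above, the Python function below computes the statistic and the degrees of freedom, and refuses to run when an expected count is below five. The function name `gof_statistic` and the counts in the usage lines are illustrative assumptions, not data from this section.

```python
def gof_statistic(observed, expected):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of cells")
    if any(e < 5 for e in expected):
        raise ValueError("every expected count should be at least five")
    # One term (O - E)^2 / E per cell, summed over all k cells.
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return statistic, df


# Hypothetical counts for three categories, used only to exercise the function.
stat, df = gof_statistic([18, 22, 20], [20, 20, 20])
print(f"chi-square = {stat:.2f}, df = {df}")  # chi-square = 0.40, df = 2
```

A large statistic means the observed and expected counts disagree, which is why the test looks at the right tail.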

Example 11.3.1

Absenteeism of college students from math classes is a major concern to math instructors because missing class appears to increase the drop rate. Suppose that a study was done to determine if the actual student absenteeism rate follows faculty perception. The faculty expected that a group of 100 students would miss class according to the following table.

| Number of absences per term | Expected number of students |
|---|---|
| 0–2 | 50 |
| 3–5 | 30 |
| 6–8 | 12 |
| 9–11 | 6 |
| 12+ | 2 |

A random survey across all mathematics courses was then done to determine the actual number (observed) of absences in a course. The following table displays the results of that survey.

| Number of absences per term | Actual number of students |
|---|---|
| 0–2 | 35 |
| 3–5 | 40 |
| 6–8 | 20 |
| 9–11 | 1 |
| 12+ | 4 |

Determine the null and alternative hypotheses needed to conduct a goodness-of-fit test.

• \(H_{0}\): Student absenteeism fits faculty perception.

The alternative hypothesis is the opposite of the null hypothesis.

• \(H_{a}\): Student absenteeism does not fit faculty perception.

Exercise 11.3.1.1

a. Can you use the information as it appears in the charts to conduct the goodness-of-fit test?

Answer: No. Notice that the expected number of students for the "12+" entry is less than five (it is two). Combine that group with the "9–11" group to create new tables where the number of students for each entry is at least five. The new results are in the following tables.

| Number of absences per term | Expected number of students |
|---|---|
| 0–2 | 50 |
| 3–5 | 30 |
| 6–8 | 12 |
| 9+ | 8 |

| Number of absences per term | Actual number of students |
|---|---|
| 0–2 | 35 |
| 3–5 | 40 |
| 6–8 | 20 |
| 9+ | 5 |
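The merging step can be sketched in Python as follows. The helper `merge_small_tail` is a hypothetical name, and it only folds categories together from the right-hand end, which is all this example needs; data with a small cell elsewhere would call for a different rule.

```python
def merge_small_tail(labels, observed, expected, minimum=5):
    """Fold the last category into its neighbor until every expected count is >= minimum."""
    labels, observed, expected = list(labels), list(observed), list(expected)
    while len(expected) > 1 and min(expected) < minimum:
        # Drop the last category and add its counts to the one before it.
        labels.pop()
        last_observed, last_expected = observed.pop(), expected.pop()
        labels[-1] = labels[-1].split("-")[0] + "+"
        observed[-1] += last_observed
        expected[-1] += last_expected
    return labels, observed, expected


labels = ["0-2", "3-5", "6-8", "9-11", "12+"]
expected = [50, 30, 12, 6, 2]   # faculty expectation
observed = [35, 40, 20, 1, 4]   # survey results
new_labels, new_observed, new_expected = merge_small_tail(labels, observed, expected)
print(new_labels)     # ['0-2', '3-5', '6-8', '9+']
print(new_expected)   # [50, 30, 12, 8]
print(new_observed)   # [35, 40, 20, 5]
```

Observed and expected counts are merged together so that both tables keep the same categories.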

Exercise 11.3.1.2

b. What is the number of degrees of freedom (\(df\))?

Answer: There are four "cells" or categories in each of the new tables.

\(df = \text{number of cells} - 1 = 4 - 1 = 3\)

Exercise 11.3.1

A factory manager needs to understand how many products are defective versus how many are produced. The number of expected defects is listed in the following table.

| Number produced | Number defective |
|---|---|
| 0–100 | 5 |
| 101–200 | 6 |
| 201–300 | 7 |
| 301–400 | 8 |
| 401–500 | 10 |

A random sample was taken to determine the actual number of defects. The following table shows the results of the survey.

| Number produced | Number defective |
|---|---|
| 0–100 | 5 |
| 101–200 | 7 |
| 201–300 | 8 |
| 301–400 | 9 |
| 401–500 | 11 |

State the null and alternative hypotheses needed to conduct a goodness-of-fit test, and state the degrees of freedom.

\(H_{0}\): The number of defects fits expectations.

\(H_{a}\): The number of defects does not fit expectations.

\(df = 4\)

Example 11.3.2

Employers want to know which days of the week employees are absent in a five-day work week. Most employers would like to believe that employees are absent equally during the week. Suppose a random sample of 60 managers were asked on which day of the week they had the highest number of employee absences. The results were distributed as in the following table. For the population of employees, do the days for the highest number of absences occur with equal frequencies during a five-day work week? Test at a 5% significance level.

Day of the Week Employees Were Most Absent

| | Monday | Tuesday | Wednesday | Thursday | Friday |
|---|---|---|---|---|---|
| Number of Absences | 15 | 12 | 9 | 9 | 15 |

The null and alternative hypotheses are:

• \(H_{0}\): The absent days occur with equal frequencies, that is, they fit a uniform distribution.
• \(H_{a}\): The absent days occur with unequal frequencies, that is, they do not fit a uniform distribution.

If the absent days occur with equal frequencies, then, out of 60 absent days (the total in the sample: \(15 + 12 + 9 + 9 + 15 = 60\)), there would be 12 absences on Monday, 12 on Tuesday, 12 on Wednesday, 12 on Thursday, and 12 on Friday. These numbers are the expected (\(E\)) values. The values in the table are the observed (\(O\)) values or data.

This time, calculate the \(\chi^{2}\) test statistic by hand. Make a chart with the following headings and fill in the columns:

• Expected (\(E\)) values \((12, 12, 12, 12, 12)\)
• Observed (\(O\)) values \((15, 12, 9, 9, 15)\)
• \((O - E)\)
• \((O - E)^{2}\)
• \(\frac{(O - E)^{2}}{E}\)

Now add (sum) the last column. The sum is three. This is the \(\chi^{2}\) test statistic.

The degrees of freedom are \(df = \text{number of cells} - 1 = 5 - 1 = 4\).

To find the p-value, calculate \(P(\chi^{2} > 3)\). This test is right-tailed. (Use a computer or calculator to find the p-value. You should get \(p\text{-value} = 0.5578\).)

Press 2nd DISTR. Arrow down to χ²cdf. Press ENTER. Enter (3,10^99,4). Rounded to four decimal places, you should see 0.5578, which is the p-value.

Next, complete a graph like the one in Figure 11.3.1 with the proper labeling and shading. (You should shade the right tail.)

The decision is not to reject the null hypothesis.

Conclusion: At a 5% level of significance, from the sample data, there is not sufficient evidence to conclude that the absent days do not occur with equal frequencies.
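The hand calculation in this example can be checked with a short Python sketch. The helper `chi2_sf` is a hypothetical name; it plays the role of the calculator's χ²cdf(x, 1E99, df) call and uses the closed-form right-tail formulas that hold when the degrees of freedom are a positive integer.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(chi-square with df degrees of freedom > x), integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j + 1/2) / Gamma(j + 3/2)
    term = math.sqrt(half) / math.gamma(1.5)
    series = 0.0
    for j in range((df - 1) // 2):
        series += term
        term *= half / (j + 1.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * series


observed = [15, 12, 9, 9, 15]   # Monday through Friday
expected = [12] * 5             # equal frequencies under the null hypothesis

# Columns of the hand chart: O, E, O - E, (O - E)^2, (O - E)^2 / E
rows = [(o, e, o - e, (o - e) ** 2, (o - e) ** 2 / e) for o, e in zip(observed, expected)]
statistic = sum(row[-1] for row in rows)
df = len(observed) - 1
p_value = chi2_sf(statistic, df)

print(f"chi-square = {statistic:.2f}, df = {df}, p-value = {p_value:.4f}")
# chi-square = 3.00, df = 4, p-value = 0.5578
print("reject H0" if p_value < 0.05 else "do not reject H0")  # do not reject H0
```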

TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the goodness-of-fit test. The next example has the calculator instructions. The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the observed values (the data) into a first list and the expected values (the values you expect if the null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or draw. Make sure you clear any lists before you start. To clear lists in the calculators: Go into STAT EDIT and arrow up to the list name area of the particular list. Press CLEAR and then arrow down. The list will be cleared. Alternatively, you can press STAT and press 4 (for ClrList). Enter the list name and press ENTER.

Exercise 11.3.2

Teachers want to know which night each week their students are doing most of their homework. Most teachers think that students do homework equally throughout the week. Suppose a random sample of 56 students were asked on which night of the week they did the most homework. The results were distributed as in the following table.

| | Sunday | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday |
|---|---|---|---|---|---|---|---|
| Number of Students | 11 | 8 | 10 | 7 | 10 | 5 | 5 |

From the population of students, do the nights for the highest number of students doing the majority of their homework occur with equal frequencies during a week? What type of hypothesis test should you use?

\(df = 6\)

\(p\text{-value} = 0.6093\)

We do not reject the null hypothesis. There is not enough evidence to support that students do not do the majority of their homework equally throughout the week.

Example 11.3.3

One study indicates that the number of televisions that American families have is distributed (this is the given distribution for the American population) as in the following table.

| Number of Televisions | Percent |
|---|---|
| 0 | 10 |
| 1 | 16 |
| 2 | 55 |
| 3 | 11 |
| 4+ | 8 |

The table contains expected (\(E\)) percents.

A random sample of 600 families in the far western United States resulted in the data in the following table.

| Number of Televisions | Frequency |
|---|---|
| 0 | 66 |
| 1 | 119 |
| 2 | 340 |
| 3 | 60 |
| 4+ | 15 |
| Total | 600 |

The table contains observed (\(O\)) frequency values.

Exercise 11.3.3.1

At the 1% significance level, does it appear that the distribution "number of televisions" of far western United States families is different from the distribution for the American population as a whole?

This problem asks you to test whether the far western United States families distribution fits the distribution of the American families. This test is always right-tailed.

The first table contains expected percentages. To get expected (\(E\)) frequencies, multiply the percentage by 600. The expected frequencies are shown in the following table.

| Number of Televisions | Percent | Expected Frequency |
|---|---|---|
| 0 | 10 | (0.10)(600) = 60 |
| 1 | 16 | (0.16)(600) = 96 |
| 2 | 55 | (0.55)(600) = 330 |
| 3 | 11 | (0.11)(600) = 66 |
| 4+ | 8 | (0.08)(600) = 48 |

Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the calculator do the math. For example, instead of 60, enter 0.10*600.

\(H_{0}\): The "number of televisions" distribution of far western United States families is the same as the "number of televisions" distribution of the American population.

\(H_{a}\): The "number of televisions" distribution of far western United States families is different from the "number of televisions" distribution of the American population.

Distribution for the test: \(\chi^{2}_{4}\) where \(df = (\text{the number of cells}) - 1 = 5 - 1 = 4\).

Note 11.3.3.1

\(df \neq 600 - 1\)

Calculate the test statistic: \(\chi^{2} = 29.65\)

Graph: Figure 11.3.2.

Probability statement: \(p\text{-value} = P(\chi^{2} > 29.65) = 0.000006\)

Compare \(\alpha\) and the p-value:

\(\alpha = 0.01\)

\(p\text{-value} = 0.000006\)

So, \(\alpha > p\text{-value}\).

Make a decision: Since \(\alpha > p\text{-value}\), reject \(H_{0}\).

This means you reject the belief that the distribution for the far western states is the same as that of the American population as a whole.

Conclusion: At the 1% significance level, from the data, there is sufficient evidence to conclude that the "number of televisions" distribution for the far western United States is different from the "number of televisions" distribution for the American population as a whole.

Press STAT and ENTER. Make sure to clear lists L1, L2, and L3 if they have data in them (see the note at the end of the previous example). Into L1, put the observed frequencies 66, 119, 340, 60, 15. Into L2, put the expected frequencies .10*600, .16*600, .55*600, .11*600, .08*600. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum(". Enter L3 and press ENTER. Rounded to two decimal places, you should see 29.65. Press 2nd DISTR. Press 7 or arrow down to 7:χ²cdf and press ENTER. Enter (29.65,1E99,4). You should see 5.77E-6 = .000006 (rounded to six decimal places), which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. Make sure you clear any lists before you start.
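The same list arithmetic can be sketched in Python. This is only an illustration of the steps in this example (percentages to expected frequencies, then the statistic); the variable names are assumptions.

```python
percents = [10, 16, 55, 11, 8]       # expected percents for 0, 1, 2, 3, 4+ televisions
observed = [66, 119, 340, 60, 15]    # sample of far western families
n = sum(observed)                    # 600 families in the sample

# Expected frequency = percent of the sample size.
expected = [p * n / 100 for p in percents]
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1               # number of cells minus one, not n - 1

print(expected)                      # [60.0, 96.0, 330.0, 66.0, 48.0]
print(round(statistic, 2), df)       # 29.65 4
```

Most of the statistic comes from the "4+" cell, where 15 families were observed against 48 expected.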

Exercise 11.3.3

The expected percentage of the number of pets students have in their homes is distributed (this is the given distribution for the student population of the United States) as in the following table.

| Number of Pets | Percent |
|---|---|
| 0 | 18 |
| 1 | 25 |
| 2 | 30 |
| 3 | 18 |
| 4+ | 9 |

A random sample of 1,000 students from the Eastern United States resulted in the data in the following table.

| Number of Pets | Frequency |
|---|---|
| 0 | 210 |
| 1 | 240 |
| 2 | 320 |
| 3 | 140 |
| 4+ | 90 |

At the 1% significance level, does it appear that the distribution “number of pets” of students in the Eastern United States is different from the distribution for the United States student population as a whole? What is the \(p\text{-value}\)?

\(p\text{-value} = 0.0036\)

We reject the null hypothesis that the distributions are the same. There is sufficient evidence to conclude that the distribution “number of pets” of students in the Eastern United States is different from the distribution for the United States student population as a whole.

Example 11.3.4

Suppose you flip two coins 100 times. The results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the coins fair? Test at a 5% significance level.

This problem can be set up as a goodness-of-fit problem. The sample space for flipping two fair coins is \(\{HH, HT, TH, TT\}\). Out of 100 flips, you would expect 25 HH, 25 HT, 25 TH, and 25 TT. This is the expected distribution. The question, "Are the coins fair?" is the same as saying, "Does the distribution of the coins (20 HH, 27 HT, 30 TH, 23 TT) fit the expected distribution?"

Random Variable: Let \(X =\) the number of heads in one flip of the two coins. \(X\) takes on the values 0, 1, 2. (There are 0, 1, or 2 heads in the flip of two coins.) Therefore, the number of cells is three. Since \(X =\) the number of heads, the observed frequencies are 20 (for two heads), 57 (for one head), and 23 (for zero heads or both tails). The expected frequencies are 25 (for two heads), 50 (for one head), and 25 (for zero heads or both tails). This test is right-tailed.

\(H_{0}\): The coins are fair.

\(H_{a}\): The coins are not fair.

Distribution for the test: \(\chi^{2}_{2}\) where \(df = 3 - 1 = 2\).

Calculate the test statistic: \(\chi^{2} = 2.14\)

Graph: Figure 11.3.3.

Probability statement: \(p\text{-value} = P(\chi^{2} > 2.14) = 0.3430\)

Compare \(\alpha\) and the p-value:

\(\alpha = 0.05\)

\(p\text{-value} = 0.3430\)

\(\alpha < p\text{-value}\).

Make a decision: Since \(\alpha < p\text{-value}\), do not reject \(H_{0}\).

Conclusion: There is insufficient evidence to conclude that the coins are not fair.

Press STAT and ENTER. Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 20, 57, 23. Into L2, put the expected frequencies 25, 50, 25. Arrow over to list L3 and up to the name area "L3". Enter (L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You should see "sum(". Enter L3 and press ENTER. Rounded to two decimal places, you should see 2.14. Press 2nd DISTR. Arrow down to 7:χ²cdf (or press 7). Enter (2.14,1E99,2). Rounded to four places, you should see .3430, which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. Make sure you clear any lists before you start.
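The step of collapsing the four outcomes into "number of heads" can be sketched in Python. The variable names are assumptions; the p-value line relies on the fact that, for exactly two degrees of freedom, the right-tail area of the chi-square distribution is \(e^{-x/2}\).

```python
import math
from collections import Counter

outcomes = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}
n = sum(outcomes.values())           # 100 flips of the pair of coins

# Observed counts by number of heads: HT and TH both count as one head.
heads = Counter()
for outcome, count in outcomes.items():
    heads[outcome.count("H")] += count

# Fair coins: the number of heads in two flips is Binomial(2, 0.5).
expected = {k: n * math.comb(2, k) * 0.5 ** 2 for k in range(3)}

statistic = sum((heads[k] - expected[k]) ** 2 / expected[k] for k in expected)
df = len(expected) - 1
p_value = math.exp(-statistic / 2)   # valid only because df = 2

print(dict(heads))                               # {2: 20, 1: 57, 0: 23}
print(f"{statistic:.2f} {df} {p_value:.4f}")     # 2.14 2 0.3430
```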

Exercise 11.3.4

Students in a social studies class hypothesize that the literacy rates across the world for every region are 82%. The following table shows the actual literacy rates across the world broken down by region. What are the test statistic and the degrees of freedom?

| Region | Literacy rate (%) |
|---|---|
| Developed Regions | 99.0 |
| Commonwealth of Independent States | 99.5 |
| Northern Africa | 67.3 |
| Sub-Saharan Africa | 62.5 |
| Latin America and the Caribbean | 91.0 |
| Eastern Asia | 93.8 |
| Southern Asia | 61.9 |
| South-Eastern Asia | 91.9 |
| Western Asia | 84.5 |
| Oceania | 66.4 |

\(df = 9\)

\(\chi^{2} \text{ test statistic} = 26.38\) (Figure 11.3.4)

Press STAT and ENTER. Make sure you clear lists L1, L2, and L3 if they have data in them. Into L1, put the observed frequencies 99, 99.5, 67.3, 62.5, 91, 93.8, 61.9, 91.9, 84.5, 66.4. Into L2, put the expected frequencies 82, 82, 82, 82, 82, 82, 82, 82, 82, 82. Compute (L1-L2)^2/L2 in L3 and sum L3 as in the previous examples. Rounded to two decimal places, you should see 26.38. Press 2nd DISTR, choose χ²cdf, and enter (26.38,1E99,9). Rounded to four places, you should see .0018, which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. Make sure you clear any lists before you start.


## Review

To assess whether a data set fits a specific distribution, you can apply the goodness-of-fit hypothesis test that uses the chi-square distribution. The null hypothesis for this test states that the data come from the assumed distribution. The test compares observed values against the values you would expect to have if your data followed the assumed distribution. The test is almost always right-tailed. Each observation or cell category must have an expected value of at least five.

## Formula Review

\(\sum_{k} \frac{(O - E)^{2}}{E}\) goodness-of-fit test statistic where:

\(O\): observed values

\(E\): expected values

\(k\): number of different data cells or categories

\(df = k - 1\) degrees of freedom

Determine the appropriate test to be used in the next three exercises.

Exercise 11.3.5

An archeologist is calculating the distribution of the frequency of the number of artifacts she finds in a dig site. Based on previous digs, the archeologist creates an expected distribution broken down by grid sections in the dig site. Once the site has been fully excavated, she compares the actual number of artifacts found in each grid section to see if her expectation was accurate.

Exercise 11.3.6

An economist is deriving a model to predict outcomes on the stock market. He creates a list of expected points on the stock market index for the next two weeks. At the close of each day’s trading, he records the actual points on the index. He wants to see how well his model matched what actually happened.

a goodness-of-fit test

Exercise 11.3.7

A personal trainer is putting together a weight-lifting program for her clients. For a 90-day program, she expects each client to lift a specific maximum weight each week. As she goes along, she records the actual maximum weights her clients lifted. She wants to know how well her expectations met with what was observed.

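Use the following information to answer the next five exercises: A teacher predicts the distribution of grades on the final exam; the predicted proportions are recorded in the following table.

| Grade | Proportion |
|---|---|
| A | 0.25 |
| B | 0.30 |
| C | 0.35 |
| D | 0.10 |

The actual distribution for a class of 20 is in the following table.

| Grade | Frequency |
|---|---|
| A | 7 |
| B | 7 |
| C | 5 |
| D | 1 |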

Exercise 11.3.8

\(df =\) ______

3

Exercise 11.3.9

State the null and alternative hypotheses.

Exercise 11.3.10

\(\chi^{2} \text{ test statistic} =\) ______

2.04

Exercise 11.3.11

\(p\text{-value} =\) ______

Exercise 11.3.12

At the 5% significance level, what can you conclude?

We do not reject the null hypothesis. There is not enough evidence to suggest that the observed test scores are significantly different from the expected test scores.

Use the following information to answer the next nine exercises: The following data are real. The cumulative number of AIDS cases reported for Santa Clara County is broken down by ethnicity as in the following table.

| Ethnicity | Number of Cases |
|---|---|
| White | 2,229 |
| Hispanic | 1,157 |
| Black/African-American | 457 |
| Asian, Pacific Islander | 232 |
| Total | 4,075 |

The percentage of each ethnic group in Santa Clara County is as in the following table.

| Ethnicity | Percentage of total county population | Number expected (round to two decimal places) |
|---|---|---|
| White | 42.9% | 1748.18 |
| Hispanic | 26.7% | |
| Black/African-American | 2.6% | |
| Asian, Pacific Islander | 27.8% | |
| Total | 100% | |

Exercise 11.3.13

If the ethnicities of AIDS victims followed the ethnicities of the total county population, fill in the expected number of cases per ethnic group.

Perform a goodness-of-fit test to determine whether the occurrence of AIDS cases follows the ethnicities of the general population of Santa Clara County.

Exercise 11.3.14

\(H_{0}\): _______

\(H_{0}\): The distribution of AIDS cases follows the ethnicities of the general population of Santa Clara County.

Exercise 11.3.15

\(H_{a}\): _______

Exercise 11.3.16

Is this a right-tailed, left-tailed, or two-tailed test?

right-tailed

Exercise 11.3.17

degrees of freedom = _______

Exercise 11.3.18

\(\chi^{2} \text{ test statistic} =\) _______

2,016.14

Exercise 11.3.19

\(p\text{-value} =\) _______

Exercise 11.3.20

Graph the situation. Label and scale the horizontal axis. Mark the mean and test statistic. Shade in the region corresponding to the \(p\text{-value}\) (Figure 11.3.5).

Let \(\alpha = 0.05\)

Decision: ________________

Reason for the Decision: ________________

Conclusion (write out in complete sentences): ________________

Graph: Check student’s solution.

Decision: Reject the null hypothesis.

Reason for the Decision: \(p\text{-value} < \alpha\)

Conclusion (write out in complete sentences): The make-up of AIDS cases does not fit the ethnicities of the general population of Santa Clara County.

Exercise 11.3.21

Does it appear that the pattern of AIDS cases in Santa Clara County corresponds to the distribution of ethnic groups in this county? Why or why not?

## Content Preview

Marginal odds ratios are odds ratios between two variables in the marginal table, and can be used to test for marginal independence between two variables while ignoring the third. For example, for the AC margin, with \(\mu_{i+k}\) denoting the expected counts in the marginal table, the marginal odds ratio is:

\[\theta_{AC} = \dfrac{\mu_{1+1}\,\mu_{2+2}}{\mu_{1+2}\,\mu_{2+1}}\]

For our running death penalty example, the sample (observed) marginal odds ratio is computed the same way from the observed marginal counts.

The odds of death penalty for a white defendant are 1.18 times as high as they are for a black defendant. But is this value statistically significant?

## Testing the Joint Significance of All Predictors

This test checks the null hypothesis that the set of coefficients is simultaneously zero. For example, in the model

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 X_1+\beta_2 X_2+\ldots+\beta_k X_k\)

test \(H_0 : \beta_1 = \beta_2 = \ldots = \beta_k = 0\) versus the alternative that at least one of the coefficients \(\beta_1, \ldots , \beta_k\) is not zero.

This is like the overall F-test in linear regression. In other words, this is testing the null hypothesis that an intercept-only model is correct,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0\)

versus the alternative that the current model is correct

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 X_1+\beta_2 X_2+\ldots+\beta_k X_k\)

In our example, we are testing the null hypothesis that an intercept-only model is correct,

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0\)

versus the alternative that the current model (in this case the saturated model) is correct

\(\log\left(\dfrac{\pi}{1-\pi}\right)=\beta_0+\beta_1 X_1\)

In the SAS output, three different chi-square statistics for this test are displayed in the section "Testing Global Null Hypothesis: Beta=0," corresponding to the likelihood ratio, score and Wald tests. Recall their definitions from the very first lessons.

This test has \(k\) degrees of freedom (i.e., the number of dummy indicators (design variables), which is the number of \(\beta\)-parameters excluding the intercept).

Large chi-square statistics lead to small p-values and provide evidence against the intercept-only model in favor of the current model.

The Wald test is based on asymptotic normality of ML estimates of the \(\beta\)'s. Rather than using the Wald test, most statisticians would prefer the LR test.

If these three tests agree, that is evidence that the large-sample approximations are working well and the results are trustworthy. If the results from the three tests disagree, most statisticians would tend to trust the likelihood-ratio test more than the other two.

In our example, the "intercept only" model or the null model says that student's smoking is unrelated to parents' smoking habits. Thus the test of the global null hypothesis \(\beta_1 = 0\) is equivalent to the usual test for independence in the 2 × 2 table. We will see that the estimated coefficients and SE's are as we predicted before, as well as the estimated odds and odds ratios.

Residual deviance is the difference in \(G^2 = -2\log L\) between a saturated model and the built model. A high residual deviance shows that the model cannot be accepted.

The null deviance is the difference in \(G^2 = -2\log L\) between a saturated model and the intercept-only model. A high null deviance shows that the intercept-only model does not fit. In our 2 × 2 table smoking example, the residual deviance is almost 0 because the model we built is the saturated model. Notice that the degrees of freedom are 0, too. The null deviance is equivalent to the likelihood ratio statistic in the section "Testing Global Null Hypothesis: Beta=0" of the SAS output.

For our example, Null deviance = 29.1207 with df = 1. Notice that this matches the deviance we got in the earlier text above.

## Pearson and deviance test statistics

The Pearson goodness-of-fit statistic is

\(X^2=\sum\limits_j \dfrac{(X_j-n\hat{\pi}_j)^2}{n\hat{\pi}_j}\)

An easy way to remember it is

\(X^2=\sum\limits_j \dfrac{(O_j-E_j)^2}{E_j}\)

where \(O_j = X_j\) is the observed count in cell \(j\), and \(E_j=E(X_j)=n\hat{\pi}_j\) is the expected count in cell \(j\) under the assumption that the null hypothesis is true, i.e. the assumed model is a good one. Notice that \(\hat{\pi}_j\) is the estimated (fitted) cell proportion \(\pi_j\) under \(H_0\).

The deviance statistic is

\(G^2=2\sum\limits_j X_j \log\left(\dfrac{X_j}{n\hat{\pi}_j}\right)\)

where "log" means natural logarithm. An easy way to remember it is

\(G^2=2\sum\limits_j O_j \log\left(\dfrac{O_j}{E_j}\right)\)

In some texts, \(G^2\) is also called the likelihood-ratio test statistic, for comparing the likelihoods (\(l_0\) and \(l_1\)) of two models, that is, comparing the loglikelihood under \(H_0\) (i.e., the loglikelihood of the fitted model, \(L_0\)) with the loglikelihood under \(H_A\) (i.e., the loglikelihood of the larger, less restricted, or saturated model, \(L_1\)): \(G^2 = -2\log\left(\dfrac{l_0}{l_1}\right) = -2\left(L_0 - L_1\right)\). A common mistake in calculating \(G^2\) is to leave out the factor of 2 at the front.

Note that \(X^2\) and \(G^2\) are both functions of the observed data \(X\) and a vector of probabilities \(\pi\). For this reason, we will sometimes write them as \(X^2\left(x, \pi\right)\) and \(G^2\left(x, \pi\right)\), respectively; when there is no ambiguity, however, we will simply use \(X^2\) and \(G^2\). We will be dealing with these statistics throughout the course in the analysis of 2-way and k-way tables, and when assessing the fit of log-linear and logistic regression models.
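As a rough illustration of the two "easy to remember" forms, the Python sketch below evaluates both statistics on the same observed and expected counts. The function names are assumptions, and the counts are the two-coin data from Example 11.3.4 earlier in this document, reused here only as a convenient test case.

```python
import math


def pearson_x2(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def deviance_g2(observed, expected):
    """Deviance statistic: 2 * sum of O * log(O / E), natural log."""
    # Cells with O = 0 contribute nothing (the limit of O * log(O / E) as O -> 0).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


observed = [20, 57, 23]   # two heads, one head, zero heads
expected = [25, 50, 25]

x2 = pearson_x2(observed, expected)
g2 = deviance_g2(observed, expected)
print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}")  # X^2 = 2.140, G^2 = 2.176
```

The two values are close here, as expected when the observed counts are not far from the expected counts.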

## Steps to perform Chi Square goodness of fit

Step 1: Define the null hypothesis and alternative hypothesis

• Null hypothesis (H0): There is no difference between the observed values and the expected values
• Alternative hypothesis (H1): There is a significant difference between the observed values and the expected values

Step 2: Specify the level of significance

Step 3: Compute the χ² statistic

Step 4: Calculate the degrees of freedom:

The degrees of freedom in a chi-square test depend on the sample distribution; for a goodness-of-fit test with k categories, df = k − 1

Step 5: Find the critical value, based on the degrees of freedom

Step 6: Finally, draw the statistical conclusion:

If the test statistic value is greater than the critical value, reject the null hypothesis, and hence we can conclude that there is a significant difference between the observed values and the expected values.
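The six steps can be sketched in Python as below. The function name `gof_decision` and the small lookup table are assumptions made for illustration; the table holds standard upper critical values of the chi-square distribution for a few degrees of freedom, and the usage lines reuse the day-of-week absence counts from Example 11.3.2 earlier in this document.

```python
# Upper critical values of the chi-square distribution, keyed by alpha and df
# (standard table values, rounded to three decimals).
CRITICAL = {
    0.05: {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592},
    0.01: {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086, 6: 16.812},
}


def gof_decision(observed, expected, alpha=0.05):
    """Steps 3 to 6: statistic, degrees of freedom, critical value, decision."""
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    critical = CRITICAL[alpha][df]
    reject = statistic > critical
    return statistic, df, critical, reject


# Steps 1 and 2: H0 says the counts fit the expected values; alpha = 0.05.
statistic, df, critical, reject = gof_decision([15, 12, 9, 9, 15], [12] * 5, alpha=0.05)
print(statistic, df, critical, reject)  # 3.0 4 9.488 False
```

Because 3.0 is below 9.488, the null hypothesis is not rejected, which agrees with the p-value approach used in that example.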

## 5.3: Goodness-of-Fit Test

A weakness of the chi-square goodness-of-fit test is its dependence on the choice of histogram midpoints. An advantage of the EDF tests is that they give the same results regardless of the midpoints, as illustrated in this example.

In Example 4.2, the option MIDPOINTS=0.2 TO 1.8 BY 0.2 was used to specify the histogram midpoints for GAP. The following statements refit the lognormal distribution using default midpoints (0.3 to 1.8 by 0.3).

The histogram is shown in Output 4.3.1.

Output 4.3.1: Lognormal Curve Fit with Default Midpoints

A summary of the lognormal fit is shown in Output 4.3.2. The p-value for the chi-square goodness-of-fit test is 0.0822. Since this value is less than 0.10 (a typical cutoff level), the conclusion is that the lognormal distribution is not an appropriate model for the data. This is the opposite of the conclusion drawn from the chi-square test in Example 4.2, which is based on a different set of midpoints and has a p-value of 0.2756 (see Output 4.2.2). Moreover, the results of the EDF goodness-of-fit tests are the same since these tests do not depend on the midpoints. When available, the EDF tests provide more powerful alternatives to the chi-square test. For a thorough discussion of EDF tests, refer to D'Agostino and Stephens (1986).


## Chi-Square Goodness of Fit Test

If a die is fair, we would expect the probability of rolling a 6 on any given toss to be 1/6. Assuming the 3 dice are independent (the roll of one die should not affect the roll of the others), we might assume that the number of sixes in three rolls is distributed Binomial(3,1/6). To determine whether the gambler's dice are fair, we may compare his results with the results expected under this distribution. The expected values for 0, 1, 2, and 3 sixes under the Binomial(3,1/6) distribution are the following:

Null Hypothesis:
\(p_1\) = P(roll 0 sixes) = P(X=0) = 0.58
\(p_2\) = P(roll 1 six) = P(X=1) = 0.345
\(p_3\) = P(roll 2 sixes) = P(X=2) = 0.07
\(p_4\) = P(roll 3 sixes) = P(X=3) = 0.005.

Since the gambler plays 100 times, the expected counts are the following:

| Number of sixes | Expected count | Observed count |
|---|---|---|
| 0 | 58 | 48 |
| 1 | 34.5 | 35 |
| 2 | 7 | 15 |
| 3 | 0.5 | 3 |

The two plots shown below provide a visual comparison of the expected and observed values:

From these graphs, it is difficult to distinguish differences between the observed and expected counts. A visual representation of the differences is the chi-gram, which plots the observed minus expected counts divided by the square root of the expected counts, as shown below:

The chi-square statistic is the sum of the squares of the plotted values,
(48-58)²/58 + (35-34.5)²/34.5 + (15-7)²/7 + (3-0.5)²/0.5
= 1.72 + 0.007 + 9.14 + 12.5 = 23.367.
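The arithmetic above can be reproduced with a short Python sketch. It first computes the exact Binomial(3, 1/6) probabilities, then uses the rounded probabilities quoted in the text so that the result is comparable; the variable names are assumptions.

```python
import math

n_plays = 100
observed = [48, 35, 15, 3]                  # counts of 0, 1, 2, 3 sixes

# Exact Binomial(3, 1/6) probabilities for 0, 1, 2, 3 sixes.
exact = [math.comb(3, k) * (1 / 6) ** k * (5 / 6) ** (3 - k) for k in range(4)]
print([round(p, 4) for p in exact])         # [0.5787, 0.3472, 0.0694, 0.0046]

# Rounded probabilities as quoted in the text.
rounded = [0.58, 0.345, 0.07, 0.005]
expected = [n_plays * p for p in rounded]   # about 58, 34.5, 7, 0.5

terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(terms)
print([round(t, 3) for t in terms], round(statistic, 2))
```

Summing the unrounded terms gives about 23.37; the text's 23.367 comes from adding the terms after rounding each one. Note also that the expected count for three sixes (0.5) is well below five, so the chi-square approximation is rough for that cell.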

Given this statistic, are the observed values likely under the assumed model? A random variable is said to have a chi-square distribution with \(m\) degrees of freedom if it is the sum of the squares of \(m\) independent standard normal random variables (the square of a single standard normal random variable has a chi-square distribution with one degree of freedom). This distribution is denoted \(\chi^2(m)\), with associated probability values available in Table G in Moore and McCabe and in MINITAB.

The standardized counts (observed - expected)/sqrt(expected) for \(k\) possibilities are approximately normal, but they are not independent because one of the counts is entirely determined by the sum of the others (since the total of the observed and expected counts must sum to \(n\)). This results in a loss of one degree of freedom, so it turns out that the distribution of the chi-square test statistic based on \(k\) counts is approximately the chi-square distribution with \(m = k-1\) degrees of freedom, denoted \(\chi^2(k-1)\).

### Hypothesis Testing

Let \(p_1, p_2, \ldots, p_k\) denote the probabilities hypothesized for \(k\) possible outcomes. In \(n\) independent trials, we let \(Y_1, Y_2, \ldots, Y_k\) denote the observed counts of each outcome, which are to be compared to the expected counts \(np_1, np_2, \ldots, np_k\). The chi-square test statistic is

\[q_{k-1} = \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i}\]

Reject \(H_0\) if this value exceeds the upper \(\alpha\) critical value of the \(\chi^2(k-1)\) distribution, where \(\alpha\) is the desired level of significance.

### Example

Given this information, the casino asked the gambler to take his dice (and his business) elsewhere.

### Example

Suppose the random variable Y 1 has a Bin( n,p 1 ) distribution, and let Y 2 = n - Y 1 and p 2 = 1 - p 1 . Since ( Y 1 - np 1 )² = (n - Y 2 - n + np 2 )² = (Y 2 - np 2 )², where Z ² has a chi-square distribution with 1 degree of freedom. If the observed values Y 1 and Y 2 are close to their expected values np 1 and np 2 , then the calculated value Z ² will be close to zero. If not, Z ² will be large.

In general, for \(k\) random variables \(Y_i\), \(i = 1, 2, \ldots, k\), with corresponding expected values \(np_i\), a statistic measuring the "closeness" of the observations to their expectations is the sum

\[ \sum_{i=1}^{k} \frac{(Y_i - np_i)^2}{np_i}, \]

which has approximately a chi-square distribution with \(k-1\) degrees of freedom.

### Example

The chi-square goodness of fit test may also be applied to continuous distributions. In this case, the observed data are grouped into discrete bins so that the chi-square statistic may be calculated. The expected values under the assumed distribution are the probabilities associated with each bin multiplied by the number of observations. In the following example, the chi-square test is used to determine whether or not a normal distribution provides a good fit to observed data.

### Example

The plot indicates that the assumption of normality is not unreasonable for the verbal scores data.

To compute a chi-square test statistic, I first standardized the verbal scores data by subtracting the sample mean and dividing by the sample standard deviation. Since these are estimated parameters, my value for d (the number of estimated parameters) in the test statistic will be equal to two. I chose to divide the 200 standardized observations into 10 bins, and computed the corresponding standard normal probabilities and the expected number of observations in each bin (with n = 200). The chi-square statistic, the sum over the bins of (observed - expected)²/expected, is equal to 2.69.

Since the data are divided into 10 bins and we have estimated two parameters, the calculated value may be tested against the chi-square distribution with 10 - 1 - 2 = 7 degrees of freedom. For this distribution, the critical value for the 0.05 significance level is 14.07. Since 2.69 < 14.07, we do not reject the null hypothesis that the data are normally distributed.
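The binning procedure can be sketched as follows. The verbal scores data and the bin boundaries used in the example are not reproduced here, so this sketch makes two labeled assumptions: the data are synthetic (200 simulated scores), and the 10 bins are chosen to have equal probability under the standard normal, which makes every expected count 20. Only the overall recipe (standardize, bin, compare with expected counts, use bins - 1 - 2 degrees of freedom) follows the text.

```python
import random
from statistics import NormalDist, mean, stdev


def normal_gof(data, n_bins=10):
    """Chi-square goodness-of-fit statistic for normality with estimated mean and sd."""
    n = len(data)
    m, s = mean(data), stdev(data)
    z = [(x - m) / s for x in data]  # standardized observations

    # Assumption: interior bin edges at equal-probability standard normal quantiles
    std_normal = NormalDist()
    edges = [std_normal.inv_cdf(j / n_bins) for j in range(1, n_bins)]

    observed = [0] * n_bins
    for value in z:
        bin_index = sum(value > edge for edge in edges)
        observed[bin_index] += 1

    expected = [n / n_bins] * n_bins
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = n_bins - 1 - 2  # two estimated parameters: mean and standard deviation
    return statistic, df, observed, expected


# Synthetic stand-in for the verbal scores (hypothetical values)
rng = random.Random(1)
scores = [rng.gauss(500, 100) for _ in range(200)]

statistic, df, observed, expected = normal_gof(scores)
critical_value = 14.07  # 0.05 critical value for 7 df, as quoted in the text
print(f"statistic = {statistic:.2f}, df = {df}, reject normality: {statistic > critical_value}")
```

Equal-probability bins are only one reasonable choice; equal-width bins work the same way as long as each expected count stays large enough.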

## Content Preview

Before we proceed, we would like to determine if the model adequately fits the data. The goodness-of-fit test in this case compares the variance-covariance matrix under a parsimonious model to the variance-covariance matrix without any restriction, i.e., under the assumption that the variances and covariances can take any values. The variance-covariance matrix under the assumed model can be expressed as:

\[ \boldsymbol{\Sigma} = \mathbf{LL'} + \boldsymbol{\Psi} \]

\(\mathbf{L}\) is the matrix of factor loadings, and the diagonal elements of \(\boldsymbol{\Psi}\) are equal to the specific variances. This is a very specific structure for the variance-covariance matrix. A more general structure would allow those elements to take any value. To assess goodness-of-fit, we use the Bartlett-Corrected Likelihood Ratio Test Statistic:

\[ X^2 = \left(n - 1 - \frac{2p + 4m + 5}{6}\right) \log \frac{|\mathbf{\hat{L}\hat{L}'} + \hat{\boldsymbol{\Psi}}|}{|\hat{\boldsymbol{\Sigma}}|} \]

Here \(n\) is the number of observations, \(p\) the number of variables, and \(m\) the number of factors.

The test is a likelihood ratio test, where two likelihoods are compared, one under the parsimonious model and the other without any restrictions. The constant in the statistic is called the Bartlett correction. The log is the natural log. In the numerator we have the determinant of the fitted factor model for the variance-covariance matrix, and below, we have a sample estimate of the variance-covariance matrix assuming no structure where:

\[ \hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S} \]

and \(\mathbf{S}\) is the sample variance-covariance matrix. This is just another estimate of the variance-covariance matrix which includes a small bias. If the factor model fits well then these two determinants should be about the same and you will get a small value for \(X^2\). However, if the model does not fit well, then the determinants will be different and \(X^2\) will be large.

Under the null hypothesis that the factor model adequately describes the relationships among the variables,

\[ X^2 \sim \chi^2_{\frac{(p-m)^2 - p - m}{2}} \]

Under the null hypothesis, that the factor model adequately describes the data, this test statistic has a chi-square distribution with an unusual set of degrees of freedom as shown above. The degrees of freedom are the difference in the number of unique parameters in the two models. We reject the null hypothesis that the factor model adequately describes the data if (X_<2>) exceeds the critical value from the chi-square table.
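The degrees of freedom can be computed with a one-line helper. This sketch uses the standard result for the maximum likelihood factor model, ((p - m)² - p - m)/2, where p is the number of variables and m the number of factors. The value p = 9 is an inference: it is the value consistent with the 12 and 6 degrees of freedom reported for the Places Rated data with m = 3 and m = 4.

```python
def factor_fit_df(p, m):
    """Degrees of freedom ((p - m)^2 - p - m) / 2 for the factor-model fit test.

    The numerator is always even, so integer division is exact.
    """
    return ((p - m) ** 2 - p - m) // 2


p = 9  # assumption: inferred from the reported degrees of freedom
for m in (1, 2, 3, 4):
    print(f"m = {m}: df = {factor_fit_df(p, m)}")
```

Each additional factor uses up parameters, so the degrees of freedom shrink quickly as m grows.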

Back to the Output. Looking just past the iteration results, we have.

Significance Tests based on 329 Observations

For our Places Rated dataset, we find a significant lack of fit (\(X^2 = 92.67\); d.f. = 12; \(p < 0.0001\)). We conclude that the relationships among the variables are not adequately described by the factor model. This suggests that we do not have the correct model.

The only remedy that we can apply in this case is to increase the number \(m\) of factors until an adequate fit is achieved. Note, however, that \(m\) must satisfy

\[ p(m+1) \le \frac{p(p+1)}{2} \]

In the present example, this means that m ≤ 4.

Let's return to the SAS program and change the "nfactors" value from 3 to 4:

Significance Tests based on 329 Observations

We find that the factor model with m = 4 does not fit the data adequately either (\(X^2 = 41.69\); d.f. = 6; \(p < 0.0001\)). We cannot properly fit a factor model to describe these data and conclude that a factor model does not work with this particular dataset. There is something else going on here, perhaps some non-linearity. Whatever the case, it does not look like this yields a good-fitting factor model. A next step could be to drop variables from the data set to obtain a better fitting model.
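The reported p-values can be sanity-checked without a chi-square table. This is a minimal sketch, independent of the SAS output: for an even number of degrees of freedom the chi-square upper tail has a finite closed form, exp(-x/2) times a partial sum of the series for exp(x/2), which covers both cases here (12 and 6 degrees of freedom). The function name is illustrative.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j  # term is now (x/2)^j / j!
        total += term
    return math.exp(-half) * total


# Test statistics and degrees of freedom reported in the text
reported = {3: (92.67, 12), 4: (41.69, 6)}
for m, (x2, df) in reported.items():
    p_value = chi2_upper_tail_even_df(x2, df)
    print(f"m = {m}: X^2 = {x2}, df = {df}, p = {p_value:.2e}")
```

Both tail probabilities come out well below 0.0001, in line with the reported lack of fit.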

A Weibull Distribution describes the type of failure mode experienced by the population (infant mortality, early wear-out, random failures, rapid wear-out). Estimates are given for Beta (shape factor) and Eta (scale). MTBF (Mean Time Between Failures) is based on the characteristic life curve, not a straight arithmetic average.

A Weibull Distribution uses the following parameters:

• Beta : Beta, also called the shape factor, controls the type of failure of the element (infant mortality, wear-out, or random).
• Eta : Eta is the scale factor, representing the time by which 63.2% of the total population has failed.
• Gamma : Gamma is the location parameter that allows offsetting the Weibull distribution on time. The Gamma parameter should be used if the datapoints on the Weibull plot do not fall on a straight line.

If the value of Beta is greater than one (1), you can perform Preventative Maintenance (PM) Optimizations. A Gamma value other than zero (0) means that the distribution is shifted to fit the datapoints more closely.

Note: This is an advanced feature and should be used in the proper context and with a good understanding of how to apply a three-parameter Weibull distribution.
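The roles of the three parameters can be illustrated with the textbook three-parameter Weibull formulas. This is a sketch of the standard distribution, not of how any particular analysis tool computes its results; the function names and the sample parameter values are hypothetical. It shows that the fraction failed at time Gamma + Eta is always about 63.2%, and that the failure rate falls, stays flat, or rises with time depending on whether Beta is below, equal to, or above one.

```python
import math


def weibull_cdf(t, beta, eta, gamma=0.0):
    """Fraction of the population failed by time t (three-parameter Weibull)."""
    if t <= gamma:
        return 0.0
    return 1.0 - math.exp(-(((t - gamma) / eta) ** beta))


def weibull_hazard(t, beta, eta, gamma=0.0):
    """Instantaneous failure rate at time t > gamma."""
    if t <= gamma:
        raise ValueError("hazard is only defined for t > gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1)


eta = 1000.0  # hypothetical characteristic life, in hours
for beta in (0.5, 1.0, 3.0):
    failed_at_eta = weibull_cdf(eta, beta, eta)
    early, late = weibull_hazard(100.0, beta, eta), weibull_hazard(900.0, beta, eta)
    trend = "falling" if late < early else "rising" if late > early else "constant"
    print(f"beta = {beta}: failed by eta = {failed_at_eta:.3f}, failure rate is {trend}")
```

A rising failure rate (Beta greater than one) is the case in which replacing a part before it fails can pay off, which is why preventative maintenance optimization is tied to that condition.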

You can use the following information to compare the results of individual Weibull analyses. The following results are for good populations of equipment.