Chi-square Description

The chi-square (χ²) method allows researchers to assess what is called the "goodness of fit" between expected and observed values. In our example, this means comparing the number of tomato plants expected to be resistant to bacterial spot with the number actually observed to be resistant in our experiment, and likewise the number expected to be susceptible with the number observed to be susceptible in the field or greenhouse. After calculating a chi-square value, we can obtain a probability (the p-value) of seeing differences at least as large as those between the expected and observed numbers if they were simply due to chance. If that probability is small enough for the differences to be considered significant, the plant breeder may need to revisit the genetic hypothesis. Chi-square is used when individuals (here, plants) can be classified into distinct categories. To use chi-square, each individual plant (in this example) must fit in one and only one category (i.e., either resistant or susceptible, but not both) (Fig. 2).
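For reference, the standard Pearson chi-square statistic behind this comparison sums, over every category, the squared difference between the observed and expected counts divided by the expected count. Here O_i and E_i are the observed and expected numbers of plants in category i, and k is the number of categories (two in the tomato example: resistant and susceptible). When the expected ratio is fixed in advance, as it is here, a goodness-of-fit test has k − 1 degrees of freedom.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad \mathrm{df} = k - 1
```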

Figure 2. Example of a tomato line showing resistance to bacterial spot. Photo credit: David Francis, The Ohio State University.

There are two types of chi-square analysis: the goodness-of-fit test (as in our tomato example) and the test for independence (which determines whether there is an association between variables). As we will see with this tomato bacterial spot resistance example, the chi-square goodness-of-fit test compares the expected and observed numbers of resistant and susceptible plants to determine how well the researcher's predictions fit the data.

Statisticians refer to what is called the "null hypothesis", which is the hypothesis you are testing in your experiment and states that "there is no difference" between the values being compared. For a calculation with only one variable (as in this example, which deals with disease resistance or susceptibility), the null hypothesis states that "there is no difference between the observed values and the expected values." In strict statistical terms, the null hypothesis is what we either 1) reject or 2) fail to reject based on our calculations. You'll notice that we are not "proving" a hypothesis correct; only repeated experimentation over time can do that. In our tomato example, the null hypothesis might be "Bacterial spot resistance due to the Rx-4 gene is inherited in a completely dominant manner, such that we expect a 3:1 resistant to susceptible ratio in our F2 generation."
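A test of this 3:1 null hypothesis can be sketched in Python. The F2 counts below are hypothetical, invented only to illustrate the arithmetic, and the 0.05 significance level is a common convention assumed here rather than something stated in this section. The p-value uses the fact that, with one degree of freedom, a chi-square variable is the square of a standard normal variable, so no statistics library is needed.

```python
import math


def chi_square_gof(observed, expected):
    """Pearson chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def p_value_df1(chi2):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With df = 1 the chi-square variable is the square of a standard normal,
    so P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical F2 counts (invented for illustration): [resistant, susceptible]
observed = [70, 30]
total = sum(observed)

# Expected counts under the null hypothesis of a 3:1 resistant:susceptible ratio
ratio = [3, 1]
expected = [total * r / sum(ratio) for r in ratio]

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 1  # two categories -> 1 degree of freedom
p = p_value_df1(chi2)

alpha = 0.05  # assumed significance level (a common convention)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p:.3f}")
if p < alpha:
    print("Reject the null hypothesis of a 3:1 ratio")
else:
    print("Fail to reject the null hypothesis of a 3:1 ratio")
```

With these made-up counts the expected numbers are 75 resistant and 25 susceptible, and the statistic is (70 − 75)²/75 + (30 − 25)²/25 ≈ 1.33. Because that is below the df = 1 critical value of 3.84, p is above 0.05 and we fail to reject the 3:1 hypothesis. The `p_value_df1` helper only covers two-category tests; with more categories a general chi-square distribution function (for example `scipy.stats.chi2.sf`) would be needed.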

There is also a chi-square test for independence, which classifies the same individuals by two sets of categories to determine whether the individuals in one set of categories are distributed differently among the categories of the other. In this case, the null hypothesis typically states that "there is no association between the variables." An example would be "there is no association between bacterial spot resistance and DNA marker Cos57." Again, the null hypothesis is what we reject or fail to reject based on our calculations.
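A test for independence such as the Cos57 example can be sketched the same way. The 2 × 2 table below uses invented counts, and scoring the marker as two genotype classes is an assumption made for illustration. Each expected count comes from the row and column totals, and the p-value again uses the one-degree-of-freedom shortcut, which is valid here only because a 2 × 2 table has (2 − 1)(2 − 1) = 1 degree of freedom.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Expected count for each cell = (row total * column total) / grand total.
    Returns the statistic and its degrees of freedom, (r - 1) * (c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical 2 x 2 table (invented counts):
# rows = phenotype (resistant, susceptible)
# columns = Cos57 marker scored as two hypothetical genotype classes (class 1, class 2)
table = [
    [60, 15],  # resistant plants
    [5, 20],   # susceptible plants
]

chi2, df = chi_square_independence(table)
p = math.erfc(math.sqrt(chi2 / 2))  # upper-tail p-value; valid only for df = 1
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p:.2g}")
```

With these invented counts the statistic is about 29.7, far beyond the df = 1 critical value of 3.84, so the null hypothesis of no association would be rejected. This sketch omits Yates' continuity correction, which some software, such as SciPy's `scipy.stats.chi2_contingency`, applies to 2 × 2 tables by default.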