Introduction - Why Chi-Square Is Needed

We will follow a plant breeding example to demonstrate how to carry out a chi-square analysis, a statistical test researchers use to determine whether their experimental data support (or do not support) the hypothesis being tested. Chi-square gives a measure of “goodness of fit,” which describes how well the results expected under the hypothesis match what was actually observed in the experiment. Later sections of this lesson will go through the details of how it is calculated.

In this lesson's case study, tomato breeders are trying to incorporate genes that code for resistance to bacterial spot disease into cultivated tomato lines. (See Facts About Bacterial Spot for more information on the disease.) One gene that has been characterized is Rx-4. A tomato line, 6.8068, was chosen as one parent in a crossing scheme because it is resistant to bacterial spot and contains this gene. This line also closely resembles cultivated tomato, so less backcrossing would be needed to remove unwanted traits. OH88119 was chosen as the other parent because, although it is susceptible to bacterial spot, it is a suitable parent for commercial hybrids thanks to other desirable traits such as fruit size, fruit color, and time to maturity (Fig 1).

Fig 1: On the left is a tomato line that is susceptible to bacterial spot disease but has other desirable qualities. On the right is a line derived from wild germplasm that is resistant to bacterial spot but lacks other cultivated tomato traits. Photo credit: David Francis, The Ohio State University.

The resulting F1 offspring from this OH88119 x 6.8068 cross all showed resistance to bacterial spot. These plants were then allowed to self-pollinate, and a hypothesis was devised to describe the mode of inheritance for the Rx-4 gene. The breeders’ hypothesis was that Rx-4 resistance is due to a single gene with complete dominance, which predicts a 3:1 ratio of resistant to susceptible plants in the F2 generation. The observed F2 results, though, deviated from what the researchers expected: there were more susceptible F2 plants than they anticipated. The question is how far observed results can deviate from expected results before the difference should be considered significant. In other words, is the difference between the number of susceptible plants observed and the number expected simply due to chance, or is something else going on, such as a wrong genetic hypothesis or a strong influence of the environment? Examples of differences due to chance in this tomato study might be a plant that was not inoculated properly, a plant that a worker accidentally hoed out, or a phenotype that was categorized incorrectly. These are random events that affect the results but are unrelated to the Rx-4 gene. This is where statistics, and in this case study chi-square, comes into play: it helps plant breeders interpret their results accurately.
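To make the idea of "expected" versus "observed" concrete, here is a minimal Python sketch. It assumes the single-gene, complete-dominance hypothesis, under which three quarters of the F2 plants should be resistant and one quarter susceptible. The plant counts are made up purely for illustration and are not the breeders' data; the function names are hypothetical as well. The last step uses the standard chi-square statistic, the sum of (observed - expected)² / expected over the categories.

```python
from fractions import Fraction


def expected_f2_counts(total_plants):
    """Expected (resistant, susceptible) F2 counts under a 3:1 ratio.

    Assumes a single gene with complete dominance, with resistance dominant.
    """
    resistant = total_plants * Fraction(3, 4)
    susceptible = total_plants * Fraction(1, 4)
    return float(resistant), float(susceptible)


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# HYPOTHETICAL counts for illustration only (not from the tomato study):
# 140 resistant and 60 susceptible F2 plants.
observed = (140, 60)

expected = expected_f2_counts(sum(observed))
statistic = chi_square_statistic(observed, expected)

print("Expected (resistant, susceptible):", expected)
print("Chi-square statistic:", round(statistic, 3))
```

With these invented numbers there are 10 more susceptible plants than the 3:1 ratio predicts. Whether a deviation of that size can reasonably be attributed to chance is the question the chi-square test answers.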