The Data Analyst View
Now we want to look at this augmented design example from the data analyst's perspective. Before we work with Jayfred's data from the field, we will take a short detour and look at a simplified example from Jennifer Kling's "Introduction to the Augmented Experimental Design Webinar". The analysis of Jennifer's data is similar to what we will do with Jayfred's data, but her example dataset is smaller, so the analytical concepts are easier to follow. You may view and download the example data used in her webinar from this direct link: http://pbgworks.org/sites/pbgworks.org/files/ExampleData.txt .
Once you have downloaded the file ExampleData.txt, open it with a text editor. You may observe that there are five columns in the file. These columns contain the values of Plot, Entry, Name, Block, and TSW.
- Plot is the ID of the plot in the field.
- Entry is the ID of the genotype planted in that plot.
- Name is the name of the plant variety planted in that plot.
- Block is the block in which the plot is located.
- TSW is the 1000-seed weight measured from the yield of that plot.
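To check the column layout described above, a minimal Python sketch using only the standard library might look like the following. It assumes the file is tab-delimited with a header row. The delimiter is an assumption, so adjust it if the file turns out to use spaces or commas. The two sample rows embedded here are hypothetical placeholders, not values from Jennifer's data, and read_plots is a helper name introduced only for this illustration.

```python
import csv
import io

# Hypothetical sample in the assumed layout; the values are placeholders, not real data.
SAMPLE = (
    "Plot\tEntry\tName\tBlock\tTSW\n"
    "1\t11\t31\tI\t9.1\n"
    "2\t51\tMF183\tI\t10.2\n"
)

EXPECTED_COLUMNS = ["Plot", "Entry", "Name", "Block", "TSW"]


def read_plots(handle, delimiter="\t"):
    """Read plot records from an open text handle and convert TSW to float."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        row["TSW"] = float(row["TSW"])  # the response variable is numeric
        rows.append(row)
    return rows


rows = read_plots(io.StringIO(SAMPLE))
print(len(rows), "plots read; first record:", rows[0])
```

With the downloaded file, the same function could be called as read_plots(open("ExampleData.txt", newline="")), again assuming the tab delimiter holds.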
In the downloaded file you may also observe that we have 50 new experimental treatments (entries/plant varieties) named with numeric identifiers (e.g. 31, 136) and 3 checks (controls) named MF183, Ross, and Starlight, as listed in the Name column. The new entries are not replicated in the experiment, whereas each check appears once in every block. Our goal here is to compare the new entries/genotypes with the checks.
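The replication pattern just described (checks once in every block, new entries only once overall) can be verified directly from the Name and Block columns. The sketch below is one possible way to do that in Python. The function name split_checks_and_new and the miniature rows are hypothetical, and the logic assumes the experiment has more than one block, since with a single block the two groups could not be told apart.

```python
from collections import Counter


def split_checks_and_new(rows):
    """Return (checks, new_entries) based on how often each Name occurs.

    A check is a name that appears exactly once in every block;
    a new entry is a name that appears exactly once in the whole experiment.
    """
    blocks = {r["Block"] for r in rows}
    counts = Counter(r["Name"] for r in rows)
    blocks_per_name = {}
    for r in rows:
        blocks_per_name.setdefault(r["Name"], set()).add(r["Block"])
    checks = sorted(
        name
        for name, seen in blocks_per_name.items()
        if seen == blocks and counts[name] == len(blocks)
    )
    new_entries = sorted(name for name, c in counts.items() if c == 1)
    return checks, new_entries


# Hypothetical miniature layout: two blocks, two checks, three new entries.
rows = [
    {"Name": "MF183", "Block": "1"},
    {"Name": "Ross", "Block": "1"},
    {"Name": "31", "Block": "1"},
    {"Name": "136", "Block": "1"},
    {"Name": "MF183", "Block": "2"},
    {"Name": "Ross", "Block": "2"},
    {"Name": "7", "Block": "2"},
]

checks, new_entries = split_checks_and_new(rows)
print("checks:", checks)
print("new entries:", new_entries)
```

Applied to the full example file, the description above suggests this should report 3 checks and 50 new entries; any other result would point to a parsing problem or a layout different from the one described.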
We will look at the analysis from two perspectives: 1) treating the new entries as fixed effects and 2) treating the new entries as random effects. A categorical variable (factor) is treated as a fixed effect when all of its possible levels are included in the study, and as a random effect when only a subset of its possible levels is included. In our data the blocks will always be treated as random effects, because the blocks selected for the study represent only a few of all the blocks that could exist across all fields.
The plant varieties (entries), however, will be treated as either fixed or random depending on the perspective we take. When we treat entries as fixed, the underlying assumption is that all the new entries we wanted to test have been included in the experiment and that these are the only entries the scientist will ever have. This is a narrow view, but it is a legitimate one. In the second case, we regard the new entries in our experiment as only a portion of all possible new entries (a sample of all the genotypes that the scientists have developed or will develop), and we must then treat the new entries as random effects.
To analyze such scenarios we will fit what are called linear mixed-effects models to the data. Like other models, mixed-effects models describe the relationship between the response variable and the other variables observed in an experiment. They are called mixed-effects models because they incorporate both fixed-effect parameters and random-effect parameters.
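As a rough sketch of the two perspectives, the models can be written in conventional mixed-model notation. The symbols below are assumptions introduced for illustration and do not come from the webinar: y_{ij} is the TSW of entry i in block j, \mu is the overall mean, \tau_i is a fixed entry effect, g_i is a random entry effect, b_j is the random block effect, and \varepsilon_{ij} is the residual error. How the checks are handled in the second model is a separate modeling choice that this sketch leaves open.

```latex
% Perspective 1: entries fixed, blocks random
y_{ij} = \mu + \tau_i + b_j + \varepsilon_{ij}, \qquad
b_j \sim N(0, \sigma_b^2), \quad
\varepsilon_{ij} \sim N(0, \sigma^2)

% Perspective 2: entries random, blocks random
y_{ij} = \mu + g_i + b_j + \varepsilon_{ij}, \qquad
g_i \sim N(0, \sigma_g^2), \quad
b_j \sim N(0, \sigma_b^2), \quad
\varepsilon_{ij} \sim N(0, \sigma^2)
```

In the first model each \tau_i is a parameter to be estimated, while in the second only the variance \sigma_g^2 is estimated and the individual g_i are predicted from it.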