The R Functions

In both Jennifer's and Jayfred's analyses that we will work on, the data are already in a format very close to what the R function for the linear mixed model requires. Thus the data preparation step here is less involved than the one in the eLesson: Correlation Using the R Statistical Package - Part 2: Data Preparation. However, before we write the linear mixed model program to fit our data, we would like to introduce four new general-purpose functions: 1) str, 2) factor, 3) xtabs, and 4) sink.

1. The "str" function

The str function is among the most useful functions in R. It asks R to compactly summarize the structure of any R object (here, the object is the data table) and print that summary on the console (screen). For a data frame, str reports the number of observations and variables and, for each column, its name, its type (e.g. int, num, chr, or Factor), and its first few values. For now, you may think of "object" as a programming term for the things R works with, such as variables and data tables. In the next video we demonstrate how to use str with Jennifer's example dataset, letting R summarize the structure of what was read from her data file so that we get a better picture of the dataset.
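As a minimal sketch of what str reports, the snippet below builds a small hypothetical stand-in for Jennifer's data. The values and the Yield response column are invented for illustration; the real file's columns and values will differ. Using read.csv(text = ...) parses the inline text exactly as it would parse a file on disk.

```r
# Hypothetical stand-in for Jennifer's data file (values and Yield column invented).
csvText <- "Plot,Entry,Name,Block,Yield
101,1,Starlight,1,52.1
102,2,Ross,1,48.7
201,1,Starlight,2,50.3
202,2,Ross,2,47.9"

# read.csv(text = ...) parses the string as if it were a file.
inputData <- read.csv(text = csvText)

# Compact summary: one line per column with its type and first few values.
str(inputData)
```

In the str output, Plot, Entry, and Block appear as int and Yield as num, while Name appears as chr in R 4.0.0 and later. The numeric coding of Plot, Entry, and Block is exactly why the factor step is needed.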

2. The "factor" function

In Jennifer's data the Plot, Entry, Name, and Block columns are categorical variables (in R, factors with levels). It is clear from the values in the Name column that Name is categorical, because its values are recorded as character strings, e.g. Starlight, Ross, etc. However, field experiments often use numeric codes (numbers rather than descriptive strings) to represent the levels of a categorical variable. A good example is the Block variable, which is essentially a factor but uses numbers to represent its levels.

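Values recorded as strings are enough of a clue, both for us and for software such as R, to treat a column as categorical. (Strictly, since R 4.0.0 read.csv keeps such a column as a character vector by default rather than converting it to a factor, but modeling functions such as lm still treat character columns as categorical.) However, when R reads a column such as Plot, Entry, or Block that contains only numbers, it has no way to tell that the column should be treated as a categorical variable rather than as a quantity measured in the experiment.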

In Jennifer's data, Plot, Entry, and Block are all categorical, but their levels are recorded as numbers instead of strings. In such cases we must use the factor function to explicitly tell R to treat these columns as categorical variables rather than as numeric values to be used in calculations. Skipping this step has unintended consequences: R will treat the columns as numeric, so a model would, for example, fit Block as a single slope (a continuous covariate) instead of estimating a separate effect for each block. In the next video we demonstrate how to use the factor function with Jennifer's example dataset.
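A minimal sketch of the conversion, again on a hypothetical stand-in for Jennifer's data (all values invented for illustration). The last line uses model.matrix to show the practical difference: once Block is a factor, a model gets one column per block contrast instead of one numeric slope.

```r
# Hypothetical stand-in for Jennifer's data (values invented for illustration).
inputData <- data.frame(
  Plot  = c(101, 102, 201, 202, 301, 302),
  Entry = c(1, 2, 1, 2, 1, 2),
  Name  = c("Starlight", "Ross", "Starlight", "Ross", "Starlight", "Ross"),
  Block = c(1, 1, 2, 2, 3, 3),
  Yield = c(52.1, 48.7, 50.3, 47.9, 51.0, 49.2)
)

# Before conversion, Block is stored as a plain number.
is.numeric(inputData$Block)   # TRUE

# Tell R explicitly that these numeric codes are category labels.
inputData$Plot  <- factor(inputData$Plot)
inputData$Entry <- factor(inputData$Entry)
inputData$Block <- factor(inputData$Block)

# Block now has three levels.
levels(inputData$Block)       # "1" "2" "3"

# Intercept + 2 block contrasts = 3 columns; a numeric Block would give only 2.
ncol(model.matrix(~ Block, data = inputData))   # 3
```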

3. The "xtabs" function

Assume that we want to find out from Jennifer's dataset how many observations we have in each Block for a given Entry (i.e. plant variety). To answer a question like this, we need a frequency table that counts the number of observations for every combination of Block and Entry. This is exactly what the xtabs function does: it constructs contingency tables (cross-tabulations) of counts. When a dataset has several categorical columns, a one-sided formula such as ~ Block + Entry asks xtabs to count the observations falling into each combination of levels of the listed variables. In the next video we demonstrate how to use the xtabs function with Jennifer's dataset.
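A minimal sketch of the Block-by-Entry count, using a hypothetical dataset (values invented for illustration) in which one Entry 2 plot in Block 3 is deliberately missing:

```r
# Hypothetical data: two entries, three blocks, one Entry 2 plot missing in Block 3.
inputData <- data.frame(
  Entry = factor(c(1, 2, 1, 2, 1)),
  Block = factor(c(1, 1, 2, 2, 3)),
  Yield = c(52.1, 48.7, 50.3, 47.9, 51.0)
)

# One-sided formula: count observations for every Block x Entry combination.
counts <- xtabs(~ Block + Entry, data = inputData)
print(counts)

# Individual cells can be looked up by level name (Block first, then Entry).
counts["3", "2"]   # 0
```

A zero cell (Block 3, Entry 2 here) immediately flags an unbalanced design or a missing plot, which is worth knowing before fitting a mixed model.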

4. The "sink" function

Often we want to write the output of a program to an external text file, for example to share the results with someone or to use the output in a later analysis. This is easy with the R sink function, which diverts R's printed output from the console into a text file (error and warning messages still appear on the console). In most cases you will call sink in pairs. The first call, with a file name as its argument, e.g. sink("MYOUTPUTFILE.txt"), diverts the output to the file MYOUTPUTFILE.txt. The second call, sink() with no argument, diverts the output back to the R console. In the next video we demonstrate a simple example of using the sink function.
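A minimal sketch of the paired sink calls. It writes to a temporary file created with tempfile() (an assumption made so the example leaves no files behind; in practice you would pass a name such as "MYOUTPUTFILE.txt"), and the numbers being summarized are arbitrary.

```r
# Temporary output file for this sketch only.
outFile <- tempfile(fileext = ".txt")

sink(outFile)                  # start diverting printed output to the file
print(summary(c(2, 4, 6, 8)))  # goes to the file, not the console
cat("Done.\n")                 # cat() output is diverted as well
sink()                         # stop diverting; output returns to the console

# Read the file back to confirm what was written.
writtenLines <- readLines(outFile)
print(writtenLines)
```

If an error occurs between the two sink calls, the diversion stays active and later output keeps going to the file. sink.number() reports how many diversions are open, and calling sink() again restores the console.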

Quiz

Question

Select the correct order of the R code statements below to accomplish the following: read the data from the file ExampleData.txt, create a cross table that counts the number of observations per Block per Entry, and write it to the file myOutFile.txt.

a) sink()
b) sink("myOutFile.txt")
c) xtabs(~ Block + Entry, inputData)
d) inputData <- read.csv("ExampleData.txt")

The correct order is:

Explanation: The first sink call must divert the output before you execute the function that produces the output (here xtabs); the final sink() call then redirects the output back to the console.
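One ordering consistent with this explanation is d), b), c), a), sketched below as a runnable script. The setup lines that create a small hypothetical ExampleData.txt (invented values) in a temporary folder exist only so the sketch runs on its own. Note the explicit print() around xtabs: typed at the console, statement c) prints automatically, but inside a function or a script run with source() its value is not printed, and nothing would reach the file.

```r
# Setup for this sketch only: a hypothetical ExampleData.txt in a temporary folder.
workDir <- tempfile("quiz")
dir.create(workDir)
oldDir <- setwd(workDir)
writeLines(c("Plot,Entry,Name,Block",
             "101,1,Starlight,1", "102,2,Ross,1",
             "201,1,Starlight,2", "202,2,Ross,2"),
           "ExampleData.txt")

inputData <- read.csv("ExampleData.txt")   # d) read the data
sink("myOutFile.txt")                      # b) divert output to the file
print(xtabs(~ Block + Entry, inputData))   # c) build and print the cross table
sink()                                     # a) send output back to the console

# Check what landed in the file, then restore the original working directory.
outLines <- readLines("myOutFile.txt")
setwd(oldDir)
```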
Question

The R function factor accomplishes the following:

Explanation: When all the values of a categorical variable are numeric, one must use the factor function to force R to treat the variable as categorical.
Question

If you wanted to summarize the data of an R data object, you would use the function:

Explanation: The str function summarizes the structure of an object, not its data. To summarize the data themselves, you may want to explore the R function summary.
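A minimal sketch contrasting the two functions on a hypothetical two-column data frame (values invented for illustration): str describes the columns, while summary describes the values in them, giving level counts for factors and the minimum, quartiles, mean, and maximum for numeric columns.

```r
# Hypothetical data (values invented for illustration).
inputData <- data.frame(
  Block = factor(c(1, 1, 2, 2)),
  Yield = c(52.1, 48.7, 50.3, 47.9)
)

str(inputData)      # structure: column names, types, first values
summary(inputData)  # data: level counts for Block, min/quartiles/mean/max for Yield
```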