The R Functions
The term "function" may be unfamiliar, so let us take a few minutes to describe what this computer programming term means. A "function" is a piece of a computer program that takes the variables you pass to it, performs a particular calculation or action with them, and returns an output based on that calculation. For example, you have already worked with functions if you have calculated AVERAGE or SUM in Excel. You can think of AVERAGE, SUM, etc. as the Excel equivalents of R functions.
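To make the comparison with Excel concrete, here is a minimal sketch using two built-in R functions, mean and sum. The variable names and the numbers are made up for illustration and are not Sarah's data.

```r
# Example values, invented for this illustration only
exampleValues <- c(42, 38, 51, 47)

# mean() and sum() play the same role as AVERAGE and SUM in Excel:
# each takes the values passed to it and returns a single result
averageValue <- mean(exampleValues)
totalValue <- sum(exampleValues)

print(averageValue)
print(totalValue)
```

In both calls, exampleValues is the variable passed to the function, and the result is stored in a new variable with the assignment arrow.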
Let us take a look at the relevant R functions for Sarah's data analysis: 1) subset, 2) aggregate, 3) rbind, 4) transpose, 5) transform and 6) merge.
These six functions were selected based on the steps identified in the computational approach. We will use them first to prepare the data, before we can calculate the correlation coefficient. You may think of these functions as part of the Cycle we discussed in the presentations found in the chapter "Methodology and the Computational Approach". Remember that we are trying to determine whether there is a correlation between Sarah's wheat yields and CSR indices.
Let's define one more term that we will use very often: "script". In this context, you can think of a script as a set of R instructions/commands (i.e. an R program) saved in a text file with the file extension ".R". An example is a text file named "myProgram.R" that contains multiple lines of R commands.
** For each of the following videos, we recommend that you download the R script along with the data files (if any) and try running them in RStudio. This exercise will give you hands-on practice and a better understanding of the R functions.
Additionally, it has been my personal experience that when you are learning a new programming language and your code produces errors, the first step in resolving the issue is to check for typos, such as an extra or missing space or an unclosed bracket.
1. The "Subset" Function
The first function we will discuss is the subset function. In general, the subset function is used to extract selected entries from the entire data set. In our plant breeding case, we want to work with the data that falls within a certain range of light wavelengths. In this next video, we demonstrate how to use the subset function with a simple dataset.
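As a rough illustration of selecting rows within a range, here is a minimal sketch of subset. The table tabData, its values, and the range limits are assumptions made for this example; they are not the data file used in the video.

```r
# Hypothetical table; the column names follow the exercises in this
# chapter, but the values are invented for illustration
tabData <- data.frame(
  FilterOn   = c(100.10, 100.20, 100.30, 100.40),
  Attribute1 = c(10, 20, 30, 40)
)

# Keep only the rows whose FilterOn value lies between 100.15 and 100.35
tabSubset <- subset(tabData, FilterOn >= 100.15 & FilterOn <= 100.35)

print(tabSubset)
```

The first argument is the table to filter and the second is the condition that each row must satisfy to be kept.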
Let's make sure you fully understand the concepts we've been discussing up to this point by asking you a couple of general questions.
2. The "Aggregate" Function
The next R function we will look into is called "aggregate". In general, the aggregate function is used to perform a certain calculation on each subgroup of the data. In our plant breeding case, we want to work with the data that falls within a certain range of light wavelengths. Additionally, for these subgroups we would like to calculate measures such as the group mean before using them further in the analysis. In this next video discussion, we demonstrate how to use the aggregate function with a simple dataset.
Before moving on to the third R function we want to discuss, here are a couple of reflection questions to check that you understand the aggregate function accurately.
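The idea of calculating a group mean can be sketched as follows. The table and its values are assumptions for illustration only; the name myGroupedIndexValue is used here simply as a label for the grouping column in the result.

```r
# Hypothetical table with a grouping column and one attribute
tabData <- data.frame(
  GroupOn    = c(100, 100, 101, 101),
  Attribute1 = c(20, 40, 10, 30)
)

# Calculate the mean of Attribute1 separately for each GroupOn value.
# The "by" list names the grouping column that appears in the result.
tabMeans <- aggregate(tabData["Attribute1"],
                      by = list(myGroupedIndexValue = tabData$GroupOn),
                      FUN = mean)

print(tabMeans)
```

The first argument is the data to summarize, by says how to form the subgroups, and FUN is the calculation applied to each subgroup.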
Modify the aggregate script such that instead of aggregating just the "Attribute1" column it aggregates the complete rows. Thus your output will look something like:
  myGroupedIndexValue GroupOn Attribute1 Attribute2
1                 100     100         30         20
2                 101     101         20         15
[HINT] Try to pass (send to the function) the table variable tabData rather than the column tabData["Attribute1"] as the first parameter to the aggregate function.
3. The "rbind" Function
The next R function we will look into is called "rbind" (think of the name as short for "row bind"). In general, the rbind function is used to join two or more data tables that contain the same attributes or data categories. In our plant breeding scenario, we will use this function to join multiple variables by rows into a single variable. Here we demonstrate how to use the rbind function with the ongoing simple dataset.
Here are a couple of reflection questions to check that you understand the rbind function in R accurately.
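Joining tables by rows might look like the sketch below. The two small tables stand in for two data files and are invented for illustration; in practice each would be read from a file first.

```r
# Two hypothetical tables with the same columns
tabData1 <- data.frame(FilterOn = c(100.10, 100.20), Attribute1 = c(10, 20))
tabData2 <- data.frame(FilterOn = c(100.30, 100.40), Attribute1 = c(30, 40))

# Stack the rows of tabData2 underneath the rows of tabData1
tabAll <- rbind(tabData1, tabData2)

print(tabAll)
```

For data tables, rbind expects both tables to have the same column names, which is why it suits tables that hold the same attributes.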
Modify the rbind script such that, instead of binding all the rows, it binds only the rows from data file 1 where the column "FilterOn" has a value less than 100.25.
[HINT] Use the subset function on the variable into which you read data file 1 before binding the two files.
4. The "Transpose" Function
The next R function that we will be using is called "transpose"; in base R, it is written t. In general, the transpose function is used to reorganize data tables by moving the rows into columns and the columns into rows. In many cases, the data must be transposed before it can be passed (sent) to a function for further calculation. In this next video discussion, we demonstrate how to use the transpose function with the ongoing simple dataset. We also show how to name columns and how to get the final result as a table (data type) rather than a matrix (data type).
Once again, before moving on, here are a couple of reflection questions to check that you understand the transpose function in R accurately.
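Here is a minimal sketch of transposing a table, assuming that the transpose function meant here is base R's t. The table values and the new column names (Sample1 to Sample3) are invented for illustration.

```r
# Hypothetical table with three rows and two columns
tabData <- data.frame(
  Attribute1 = c(10, 20, 30),
  Attribute2 = c(5, 15, 25)
)

# t() swaps rows and columns; its result is a matrix, not a table
matTransposed <- t(tabData)

# Convert the matrix back into a table and give the columns names
tabTransposed <- as.data.frame(matTransposed)
colnames(tabTransposed) <- c("Sample1", "Sample2", "Sample3")

print(tabTransposed)
```

The conversion with as.data.frame is the step that turns the matrix returned by t back into a table.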
5. The "Transform" Function
The next R function that we will be using is called "transform". In general, the transform function is used to add a new column to a data table by performing a specified calculation on the existing columns. In this next video discussion, we demonstrate how to use the transform function with the ongoing simple dataset.
Before moving on to the final R function we want to discuss, here are a couple of reflection questions to check that you understand the transform function accurately.
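Adding a calculated column could be sketched like this. The table values and the new column name AttributeSum are assumptions made for this example.

```r
# Hypothetical table with two existing columns
tabData <- data.frame(Attribute1 = c(30, 20), Attribute2 = c(20, 15))

# Add a new column calculated from the existing ones.
# AttributeSum is a made-up name for the new column.
tabData <- transform(tabData, AttributeSum = Attribute1 + Attribute2)

print(tabData)
```

Inside transform, the existing columns can be referred to directly by name, without repeating the table name.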
6. The "Merge" Function
And finally, we look into an R function called "merge". In general, the merge function is used to join two data tables side by side, based on the values of a column that the two tables have in common. In this next video discussion, we demonstrate how to use the merge function with the ongoing simple dataset.
Finally, here is a quick question to check that you understand the merge function accurately.
That concludes our quick overview of six commonly used R functions, which we will need in order to format Sarah's data so that it can be analyzed. In our next topic, we will discuss common R scripts.
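A side-by-side join might look like the following sketch. Both tables and their values are invented for illustration; the shared column is assumed to be GroupOn.

```r
# Two hypothetical tables that share the column GroupOn
tabLeft  <- data.frame(GroupOn = c(100, 101, 102), Attribute1 = c(30, 20, 25))
tabRight <- data.frame(GroupOn = c(100, 101, 103), Attribute2 = c(20, 15, 40))

# Join the tables side by side wherever the GroupOn values match.
# By default, only rows with a match in both tables are kept.
tabMerged <- merge(tabLeft, tabRight, by = "GroupOn")

print(tabMerged)
```

Here the rows with GroupOn 102 and 103 are dropped because each appears in only one of the two tables. The merge function also has arguments such as all = TRUE for keeping unmatched rows.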