The R Functions

The term "function" may be unfamiliar, so we want to take a few minutes to describe what this programming term means. A "function" is a piece of a computer program that takes the variables you pass to it (its inputs), performs a particular calculation with them, and then returns an output based on that calculation. For example, if you have calculated an AVERAGE or a SUM in Excel, you have already worked with functions. You can think of AVERAGE, SUM, etc. as the Excel equivalents of R functions.
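To make the Excel comparison concrete, here is a small sketch: R's built-in mean() and sum() play the roles of AVERAGE and SUM. The numbers are made up for illustration. The last lines show how you could define your own function; the name myAverage is hypothetical and used only in this example.

```r
# Made-up example values (not Sarah's data)
yields <- c(3.2, 4.1, 3.8, 4.5)

# Built-in functions: pass in a vector, get back a single number
mean(yields)   # like =AVERAGE(...) in Excel, returns 3.9
sum(yields)    # like =SUM(...) in Excel, returns 15.6

# A hypothetical user-defined function: it takes an input x,
# performs a calculation with it, and returns the result
myAverage <- function(x) {
  sum(x) / length(x)
}
myAverage(yields)  # same result as mean(yields)
```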

Let us take a look at the relevant R functions for Sarah's data analysis: 1) subset, 2) aggregate, 3) rbind, 4) transpose, 5) transform and 6) merge.

These six functions were selected based on the steps identified in the computational approach. We will use them to prepare the data before we calculate the correlation coefficient. You may think of them as part of the Cycle we discussed in the presentations in the chapter "Methodology and the Computational Approach". Remember that we are trying to determine whether there is a correlation between Sarah's wheat yields and CSR indices.

Let's define one more term that we will use very often: a "script". In this context, you can think of a script as a set of R instructions/commands (i.e. an R program) saved in a text file with the file extension ".R". For example, a text file named "myProgram.R" that contains multiple lines of R commands.
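As an illustration, a small script such as myProgram.R might contain the three commands below. To keep the sketch self-contained, it writes those lines to a temporary .R file (standing in for myProgram.R) and then runs the whole file with source(), which executes every command in the file from top to bottom, much like running a saved script in RStudio.

```r
# Lines that could make up a small script such as "myProgram.R"
scriptLines <- c(
  "x <- c(1, 2, 3, 4)",
  "total <- sum(x)",
  "print(total)"
)

# Save them to a temporary .R file (a stand-in for myProgram.R)
scriptFile <- tempfile(fileext = ".R")
writeLines(scriptLines, scriptFile)

# Run every command in the file, top to bottom; prints [1] 10
source(scriptFile)
```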

** We recommend that for each of the following videos you download the R script along with the data files (if any) and try running them in RStudio. This exercise will give you hands-on practice and a better understanding of the R functions.

Additionally, in my personal experience, when you are learning any new programming language and your code produces errors, the first thing to check is that there are no typos, such as an extra or missing space, an unclosed bracket, etc.
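One way to spot a typo such as an unclosed bracket is to ask R to parse a line of code without running it. In the sketch below, the helper checkSyntax is a hypothetical name, and badCode deliberately leaves the bracket of mean( unclosed; parse() reports this as an error, which tryCatch() captures so we can read the message.

```r
# A line with a typo: the closing bracket of mean( is missing
badCode  <- "mean(c(1, 2, 3)"
goodCode <- "mean(c(1, 2, 3))"

# Hypothetical helper: parse (but do not run) the code and
# return the error message if parsing fails
checkSyntax <- function(code) {
  tryCatch({
    parse(text = code)
    "no syntax errors"
  }, error = function(e) conditionMessage(e))
}

checkSyntax(badCode)   # message mentions "unexpected end of input"
checkSyntax(goodCode)  # "no syntax errors"
```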

1.  The "Subset" Function

The first function we will discuss is the subset function. In general, the subset function is used to extract selected entries (rows) from the entire data table. In our plant breeding case, we want to work only with the data that falls within a certain range of light wavelengths. In this next video, we demonstrate how to use the subset function with a simple dataset.
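Here is a minimal sketch of subset() on made-up data. The table name tabData and the column FilterOn are the ones mentioned in the quiz below; the values, the Attribute1 column, and the range 100.25 to 101.00 are assumptions chosen only for illustration.

```r
# Hypothetical small table; FilterOn plays the role of the wavelength value
tabData <- data.frame(
  FilterOn   = c(100.00, 100.25, 100.50, 101.00, 101.25),
  Attribute1 = c(30, 35, 25, 20, 18)
)

# Keep only the rows whose FilterOn value lies between 100.25 and 101.00
inRange <- subset(tabData, FilterOn >= 100.25 & FilterOn <= 101.00)
inRange

# subset() can also keep only some columns via the 'select' argument
smallCols <- subset(tabData, FilterOn < 100.5, select = Attribute1)
smallCols
```

The condition is evaluated inside the table, so you can write FilterOn directly instead of tabData$FilterOn.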

Let's make sure you fully understand the concepts we've been discussing up to this point by asking you a couple of general questions.

Quiz

Question

The representation "tabData$FilterOn" in the above example refers to:  

Explanation: Using '$' on a table (data frame) variable accesses one of its columns, which is returned as a vector variable.
Question

The subset function allows you to:

Explanation: Using subset with a filter condition allows you to select the rows of a table that meet a criterion.

2.  The "Aggregate" Function

The next R function we will look into is called "aggregate". In general, the aggregate function is used to perform a calculation on each subgroup of the data. In our plant breeding case, we want to work with the data that falls within a certain range of light wavelengths. Additionally, for these subgroups we would like to calculate measures such as the group mean before using them further in the analysis. In this next video discussion, we demonstrate how to use the aggregate function with a simple dataset.
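A minimal sketch of aggregate(), assuming a made-up table in which the column GroupOn marks the subgroup of each row. The column names mirror those used in Exercise 3; the numbers are invented. The by argument must be a list, and the name given to its element (here myGroupedIndexValue) becomes the name of the grouping column in the result.

```r
# Hypothetical table: GroupOn identifies the subgroup each row belongs to
tabData <- data.frame(
  GroupOn    = c(100, 100, 101, 101),
  Attribute1 = c(28, 32, 18, 22),
  Attribute2 = c(19, 21, 14, 16)
)

# Mean of Attribute1 within each GroupOn value
grouped <- aggregate(tabData["Attribute1"],
                     by  = list(myGroupedIndexValue = tabData$GroupOn),
                     FUN = mean)
grouped
#   myGroupedIndexValue Attribute1
# 1                 100         30
# 2                 101         20
```

Swapping mean for another function, such as sum or max, computes a different summary per group.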

Before moving on to the third R function we want to discuss, here are a couple of reflection questions to check that you accurately understand the aggregate function.

Quiz

Question

The aggregate function in R can be used to compute summary statistics of data subsets:

Explanation: aggregate is a generic function that can be used to compute summary statistics, such as averages, for groups defined by a particular variable.

Exercise 3

Modify the aggregate script so that, instead of aggregating just the "Attribute1" column, it aggregates all the columns of each row. Your output should then look something like:

  myGroupedIndexValue GroupOn Attribute1 Attribute2
1                 100     100         30         20
2                 101     101         20         15

[HINT] Try passing (sending to the function) the table variable tabData, rather than the column tabData["Attribute1"], as the first parameter of the aggregate function.

3.  The "rbind" Function

The next R function we will look into is called "rbind" (short for "row bind"). In general, the rbind function is used to join together 2 or more data tables that contain the same attributes (columns). In our plant breeding case scenario, we will use this function to combine multiple variables by rows into a single variable. Here we demonstrate how to use the rbind function with the ongoing simple dataset.
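A minimal sketch of rbind() on two small tables. The names data1 and data2 and their values are hypothetical stand-ins for tables read from two data files with the same columns.

```r
# Two hypothetical tables with the same columns
data1 <- data.frame(FilterOn = c(100.00, 100.25), Attribute1 = c(30, 35))
data2 <- data.frame(FilterOn = c(100.50, 101.00), Attribute1 = c(25, 20))

# Stack the rows of data2 underneath the rows of data1
combined <- rbind(data1, data2)
combined
```

For data frames, rbind matches columns by name rather than by position, so both tables must have the same column names; otherwise R stops with an error.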

Here are a couple of reflection questions to check that you accurately understand the rbind function in R.

Quiz

Question

"rbind" is an R function that may be used to:

Explanation: rbind can take a sequence of vector, matrix or data frame arguments and combine them by rows.

Exercise 4

Modify the rbind script so that, instead of binding all the rows, it binds only the rows from data file 1 where the column "FilterOn" has a value less than 100.25.

[HINT] Use the subset function on the variable into which you read data file 1, before binding the two files.

4.  The "Transpose" Function

The next R function that we will be using is "transpose"; in base R, transposing is done with the function t(). In general, transposing reorganizes a data table by turning its rows into columns and its columns into rows. In many cases, the data must be transposed before it can be passed (sent) to a function for further calculation. In this next video discussion, we demonstrate how to transpose the ongoing simple dataset. We also show how to name the columns and how to make the final result a table (data frame type) rather than a matrix (data type).
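A minimal sketch of transposing with t() on invented numbers. It shows that t() applied to a data frame returns a matrix, and that as.data.frame() turns the result back into a table. The new column names Group100 and Group101 are hypothetical.

```r
# Hypothetical grouped results: one row per group, one column per attribute
tabData <- data.frame(
  Attribute1 = c(30, 20),
  Attribute2 = c(20, 15),
  row.names  = c("100", "101")
)

# t() swaps rows and columns; on a data frame it returns a matrix
tabT <- t(tabData)
is.matrix(tabT)   # TRUE

# Convert back to a data frame (table) and give the columns names
tabT <- as.data.frame(tabT)
colnames(tabT) <- c("Group100", "Group101")
tabT
```

After the transpose, the former column names (Attribute1, Attribute2) become the row names.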

Once again, before moving on, here are a couple of reflection questions to check that you accurately understand the transpose function in R.

Quiz

Question

Why did the author run 'as.data.frame' instruction after transposing the table in the transpose video?

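Explanation: An R table (data frame) provides some very useful features, such as accessing columns by name with '$', so keeping the variable as a table type is beneficial. Transposing a data frame returns a matrix, which is why the result is converted back with 'as.data.frame'.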

5.  The "Transform" Function

The next R function that we will be using is called "transform". In general, the transform function is used to add a new column to a data table, computed by a specified calculation on the existing columns. In this next video discussion, we demonstrate how to use the transform function with the ongoing simple dataset.
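A minimal sketch of transform() on a made-up table; the new column names Total and Ratio are hypothetical. transform() returns a modified copy, so the result is stored in a new variable and the original table is left unchanged.

```r
# Hypothetical table of attribute values
tabData <- data.frame(Attribute1 = c(30, 20), Attribute2 = c(20, 15))

# Add new columns computed from existing columns
tabData2 <- transform(tabData,
                      Total = Attribute1 + Attribute2,
                      Ratio = Attribute1 / Attribute2)
tabData2
```

All expressions in one transform() call are evaluated against the original columns, so a new column cannot refer to another column created in the same call; use a second transform() call for that.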

Before moving on to the final R function we want to discuss, here are a couple of reflection questions to check that you accurately understand the transform function.

Quiz

Question

The R 'transform' function may be used to:

Explanation: The R transform function can be used with data tables (data frames) to add or modify columns using mathematical operations on existing columns.

6.  The "Merge" Function

And finally, we look into an R function called "merge". In general, the merge function is used to join 2 data tables side by side, matching their rows on the values of a column that the two tables have in common. In this next video discussion, we demonstrate how to use the merge function with the ongoing simple dataset.
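A minimal sketch of merge() on two invented tables that share the key column GroupOn. The table names tabYield and tabIndex, their columns, and their values are hypothetical, loosely echoing the yield and index data in Sarah's case.

```r
# Two hypothetical tables that share the key column GroupOn
tabYield <- data.frame(GroupOn = c(100, 101, 102), Yield = c(4.1, 3.8, 4.5))
tabIndex <- data.frame(GroupOn = c(100, 101, 103), Index = c(0.62, 0.55, 0.70))

# Default: keep only GroupOn values present in both tables (100 and 101)
inner <- merge(tabYield, tabIndex, by = "GroupOn")
inner

# all = TRUE keeps every row from both tables, filling gaps with NA
full <- merge(tabYield, tabIndex, by = "GroupOn", all = TRUE)
full
```

If by is omitted, merge joins on all columns whose names appear in both tables.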

Finally here is a quick question for you to double check that you are accurately understanding the merge function.

Quiz

Question

True or False?  By default the R 'merge' function may be used to join two tables side by side and it returns only the matching rows.

Explanation: True. R merge works like a table join in databases. By default (all = FALSE) it returns only the rows with matching values, but you can set options to output all rows from both tables (all = TRUE) or all rows from one of the two tables (all.x = TRUE or all.y = TRUE).

That concludes our quick overview of 6 commonly used R functions, which we will need in order to format Sarah's data so that we can analyze it. In our next topic, we will discuss common R scripts.