This tutorial illustrates how to calculate correlations using the cor function in the R programming language.
The tutorial will consist of five examples for the application of the cor function. To be more specific, the content is structured as follows:
1) Example Data
2) Example 1: Using cor() Function to Calculate Pearson Correlation
3) Example 2: Using cor() Function to Calculate Kendall Correlation
4) Example 3: Using cor() Function to Calculate Spearman Correlation
5) Example 4: Calculate Correlation of Data with NA Values
6) Example 5: Calculate Correlation Matrix for Entire Data Frame
It’s time to dive into the examples!
Example Data
The data below is used as the basis for this R programming tutorial.
First, we have to set a random seed for reproducibility:
set.seed(35843367) # Set random seed
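As a small side note, the effect of set.seed can be checked directly. The following is only a minimal sketch: the seed value 123 and the object names a and b are arbitrary choices made for this illustration and are not used anywhere else in the tutorial.

```r
set.seed(123)        # Hypothetical seed, chosen only for this illustration
a <- rnorm(5)        # Draw five random values

set.seed(123)        # Reset the generator to the same seed
b <- rnorm(5)        # Draw five random values again

identical(a, b)      # TRUE: same seed, same random values
```

Resetting the seed before each draw gives exactly the same numbers, which is why setting the seed at the top makes the example data reproducible.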
Then, we can create a first variable as shown below:
x <- rnorm(100)                    # Create x variable
head(x)                            # Print head of x variable
# [1]  0.5613421  0.3596981 -0.6503523  0.4343684  0.6023800  0.0320683
The previous output of the RStudio console shows that our example data is a numeric vector called x that contains normally distributed random values.
Next, we have to create a second variable:
y <- rnorm(100) + x                # Create y variable
head(y)                            # Print head of y variable
# [1]  0.5934054  2.0107541 -1.4445170 -1.2551753  1.5718713 -1.1317284
The previous output shows the first values of our second numerical variable.
Let’s use these data to calculate some correlations!
Example 1: Using cor() Function to Calculate Pearson Correlation
In this example, I’ll illustrate how to apply the cor function to compute the Pearson correlation coefficient.
Have a look at the following R code and its output:
cor(x, y)                          # Pearson correlation
# [1] 0.63733
As you can see, the Pearson correlation coefficient of our two example variables is 0.63733.
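To make clearer what this number means, the Pearson coefficient can also be computed by hand as the covariance of the two variables divided by the product of their standard deviations. The sketch below recreates the example data so that it runs on its own; the object name r_manual is my own choice for this illustration.

```r
set.seed(35843367)                        # Recreate example data
x <- rnorm(100)
y <- rnorm(100) + x

r_manual <- cov(x, y) / (sd(x) * sd(y))   # Pearson correlation by hand
all.equal(r_manual, cor(x, y))            # TRUE: matches cor()
```

Both cov and sd use the same n - 1 denominator, so the ratio agrees with the result of cor up to floating point precision.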
Example 2: Using cor() Function to Calculate Kendall Correlation
We can also use the cor function to calculate other types of correlation coefficients. This example explains how to compute a Kendall Correlation:
cor(x, y, method = "kendall")      # Kendall correlation
# [1] 0.4719192
As shown in the previous R code, we had to set the method argument to be equal to "kendall".
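For intuition, Kendall's coefficient compares every pair of observations and checks whether x and y move in the same direction (concordant) or in opposite directions (discordant). The following is a minimal sketch of that idea, assuming the data contain no ties, which is the case for continuous random values such as ours. The object names n, sx, sy and tau_manual are chosen only for this illustration.

```r
set.seed(35843367)                        # Recreate example data
x <- rnorm(100)
y <- rnorm(100) + x

n  <- length(x)
sx <- sign(outer(x, x, "-"))              # Sign of x[i] - x[j] for all pairs
sy <- sign(outer(y, y, "-"))              # Sign of y[i] - y[j] for all pairs

# Each pair appears twice in the matrices, hence n * (n - 1) in the denominator
tau_manual <- sum(sx * sy) / (n * (n - 1))
all.equal(tau_manual, cor(x, y, method = "kendall"))
```

With tied values the denominator would need an adjustment, so this shortcut should not be used as a general replacement for cor.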
Example 3: Using cor() Function to Calculate Spearman Correlation
Similar to Example 2, we can use the method argument of the cor function to return the Spearman correlation coefficient for our two variables:
cor(x, y, method = "spearman")     # Spearman correlation
# [1] 0.6522172
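The Spearman coefficient is simply the Pearson correlation of the ranks of the two variables. The sketch below illustrates this with the rank function; the object name rho_manual is my own choice for this illustration.

```r
set.seed(35843367)                        # Recreate example data
x <- rnorm(100)
y <- rnorm(100) + x

rho_manual <- cor(rank(x), rank(y))       # Pearson correlation of the ranks
all.equal(rho_manual, cor(x, y, method = "spearman"))   # TRUE
```

Because only the ranks enter the calculation, the Spearman coefficient is less sensitive to outliers than the Pearson coefficient.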
Example 4: Calculate Correlation of Data with NA Values
In this example, I’ll explain how to calculate a correlation when the given data contains missing values (i.e. NA).
First, we have to modify our example data:
x_NA <- x                          # Create variable with missing values
x_NA[c(1, 3, 5)] <- NA
head(x_NA)
# [1]        NA 0.3596981        NA 0.4343684        NA 0.0320683
As you can see in the RStudio console, we have inserted some NA values in our x variable.
If we now use the new x variable with NA values to calculate a correlation, NA is returned as the result:
cor(x_NA, y)                       # Try to calculate correlation
# [1] NA
If we want to remove those NA observations from our data to calculate a valid correlation coefficient, we have to set the use argument to be equal to "complete.obs":
cor(x_NA, y, use = "complete.obs") # Remove NA from calculation
# [1] 0.6317544
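To see what the use argument does here, we can remove the incomplete observations ourselves with the complete.cases function and then compute the correlation on the remaining rows. This is only a sketch of the idea; the object names keep and r_manual are chosen for this illustration.

```r
set.seed(35843367)                        # Recreate example data
x <- rnorm(100)
y <- rnorm(100) + x

x_NA <- x                                 # Insert missing values as before
x_NA[c(1, 3, 5)] <- NA

keep <- complete.cases(x_NA, y)           # TRUE where both values are observed
sum(!keep)                                # 3 observations are dropped

r_manual <- cor(x_NA[keep], y[keep])      # Correlation of complete pairs only
all.equal(r_manual, cor(x_NA, y, use = "complete.obs"))   # TRUE
```

Note that the whole observation is dropped, i.e. the y values at positions 1, 3, and 5 are ignored as well, even though they are not missing.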
Example 5: Calculate Correlation Matrix for Entire Data Frame
In Example 5, I’ll demonstrate how to create a correlation matrix for an entire data frame.
For this, we first have to create an example data set:
data <- data.frame(x, y, z = rnorm(100))   # Create example data frame
head(data)                                 # Print head of example data frame
Table 1 illustrates the first lines of our example data.
Next, we can use the cor function to calculate a correlation matrix of these data:
cor(data) # Create correlation matrix
Table 2 shows the correlation matrix that we have created for our example data frame using the previous syntax.
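The returned object is an ordinary numeric matrix, so it can be rounded and indexed by column name. The following sketch recreates the example data frame so that it runs on its own; the object name cor_mat is chosen only for this illustration.

```r
set.seed(35843367)                        # Recreate example data
x <- rnorm(100)
y <- rnorm(100) + x
data <- data.frame(x, y, z = rnorm(100))

cor_mat <- cor(data)                      # Store correlation matrix
round(cor_mat, 2)                         # Print rounded correlation matrix

cor_mat["x", "y"]                         # Extract a single correlation
all.equal(cor_mat["x", "y"], cor(x, y))   # TRUE: same as pairwise cor()
isSymmetric(cor_mat)                      # TRUE: cor(x, y) equals cor(y, x)
```

The diagonal of the matrix contains only ones, because each variable is perfectly correlated with itself. The method and use arguments shown in the previous examples can be applied to a data frame in the same way, e.g. cor(data, method = "spearman").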
Video & Further Resources
In case you need more information on the topics of this tutorial, I recommend having a look at the corresponding video on the Statistics Globe YouTube channel. In the video, I illustrate the examples of this article.
In addition, you might want to read the other tutorials on this website:
- Correlation Matrix in R
- Correlation of One Variable to All Others
- Calculate Correlation Matrix Only for Numeric Columns
- Remove Highly Correlated Variables from Data Frame
- Variance in R
- Standard Deviation in R
- Useful Commands in R
- R Programming Language
To summarize: You have learned in this article how to compute correlations using the cor function in the R programming language. If you have any additional comments or questions, let me know in the comments.