How to Calculate Correlation Coefficients in R (5 Examples) | cor Function (2024)

This tutorial illustrates how to calculate correlations using the cor function in the R programming language.

The tutorial will consist of five examples for the application of the cor function. To be more specific, the content is structured as follows:

1) Example Data

2) Example 1: Using cor() Function to Calculate Pearson Correlation

3) Example 2: Using cor() Function to Calculate Kendall Correlation

4) Example 3: Using cor() Function to Calculate Spearman Correlation

5) Example 4: Calculate Correlation of Data with NA Values

6) Example 5: Calculate Correlation Matrix for Entire Data Frame

It’s time to dive into the examples!

Example Data

The data below is used as the basis for this R programming tutorial.

First, we have to set a random seed for reproducibility:

set.seed(35843367) # Set random seed

Then, we can create a first variable as shown below:

x <- rnorm(100) # Create x variable
head(x) # Print head of x variable
# [1] 0.5613421 0.3596981 -0.6503523 0.4343684 0.6023800 0.0320683

The previously shown output of the RStudio console shows that our example data is a numeric vector called x, which contains 100 random values drawn from a standard normal distribution.

Next, we have to create a second variable:

y <- rnorm(100) + x # Create y variable
head(y) # Print head of y variable
# [1] 0.5934054 2.0107541 -1.4445170 -1.2551753 1.5718713 -1.1317284

The previous output shows the first values of our second numerical variable. Because y is created as x plus random noise, the two variables are positively correlated.

Let’s use these data to calculate some correlations!

Example 1: Using cor() Function to Calculate Pearson Correlation

In this example, I’ll illustrate how to apply the cor function to compute the Pearson correlation coefficient.

Have a look at the following R code and its output:

cor(x, y) # Pearson correlation
# [1] 0.63733

As you can see, the Pearson correlation coefficient of our two example variables is 0.63733.

Example 2: Using cor() Function to Calculate Kendall Correlation

We can also use the cor function to calculate other types of correlation coefficients. This example explains how to compute a Kendall Correlation:

cor(x, y, method = "kendall") # Kendall correlation
# [1] 0.4719192

As shown in the previous R code, we had to set the method argument to be equal to “kendall”.

Example 3: Using cor() Function to Calculate Spearman Correlation

Similar to Example 2, we can use the method argument of the cor function to return the Spearman correlation coefficient for our two variables:

cor(x, y, method = "spearman") # Spearman correlation
# [1] 0.6522172
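As a small additional sketch (not part of the original examples), the following code illustrates what the Spearman method does: it is the Pearson correlation applied to the ranks of the two variables. The object names spearman_builtin and spearman_by_rank are only chosen for illustration.

```r
set.seed(35843367)                          # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x

spearman_builtin <- cor(x, y, method = "spearman")
spearman_by_rank <- cor(rank(x), rank(y))   # Pearson correlation of the ranks

all.equal(spearman_builtin, spearman_by_rank)  # Compare both results
```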

Example 4: Calculate Correlation of Data with NA Values

In this example, I’ll explain how to calculate a correlation when the given data contains missing values (i.e. NA).

First, we have to modify our example data:

x_NA <- x # Create variable with missing values
x_NA[c(1, 3, 5)] <- NA
head(x_NA)
# [1] NA 0.3596981 NA 0.4343684 NA 0.0320683

As you can see in the RStudio console, we have inserted some NA values in our x variable.

If we now use the new x variable with NA values to calculate a correlation, NA is returned as the result:

cor(x_NA, y) # Try to calculate correlation
# [1] NA

If we want to remove those NA observations from our data to calculate a valid correlation coefficient, we have to set the use argument to be equal to “complete.obs”:

cor(x_NA, y, use = "complete.obs") # Remove NA from calculation
# [1] 0.6317544
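To make clearer what the use argument does here, the following sketch removes the incomplete pairs by hand with complete.cases() and compares the result with use = "complete.obs". The helper objects keep, cor_manual, and cor_use are assumptions made for this illustration only.

```r
set.seed(35843367)                       # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x
x_NA <- x
x_NA[c(1, 3, 5)] <- NA                   # Insert missing values

keep <- complete.cases(x_NA, y)          # TRUE for pairs without any NA
cor_manual <- cor(x_NA[keep], y[keep])   # Correlation of the complete pairs
cor_use <- cor(x_NA, y, use = "complete.obs")

all.equal(cor_manual, cor_use)           # Compare both results
```

Both approaches should use the same 97 complete pairs, so the two coefficients are expected to agree.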

Example 5: Calculate Correlation Matrix for Entire Data Frame

In Example 5, I’ll demonstrate how to create a correlation matrix for an entire data frame.

For this, we first have to create an example data set:

data <- data.frame(x, y, z = rnorm(100)) # Create example data frame
head(data) # Print head of example data frame

Table 1 illustrates the first lines of our example data.

Next, we can use the cor function to calculate a correlation matrix of these data:

cor(data) # Create correlation matrix

Table 2 shows the correlation matrix that the previous syntax has created for our example data frame.
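The method argument from Examples 2 and 3 can also be combined with a data frame input. The following sketch, which is an addition to the original examples, computes a Spearman correlation matrix and rounds it for easier reading. The name cor_spearman is an assumption for this illustration.

```r
set.seed(35843367)                              # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x
data <- data.frame(x, y, z = rnorm(100))        # Example data frame

cor_spearman <- cor(data, method = "spearman")  # Spearman correlation matrix
round(cor_spearman, 2)                          # Round to two decimal places
```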

Video & Further Resources

In case you need more information on the topics of this tutorial, I recommend having a look at the corresponding video on the Statistics Globe YouTube channel, in which I illustrate the examples of this article.

In addition, you might want to read the other tutorials on this website:

  • Correlation Matrix in R
  • Correlation of One Variable to All Others
  • Calculate Correlation Matrix Only for Numeric Columns
  • Remove Highly Correlated Variables from Data Frame
  • Variance in R
  • Standard Deviation in R
  • Useful Commands in R
  • R Programming Language

To summarize: You have learned in this article how to compute correlations using the cor function in the R programming language. If you have any additional comments or questions, let me know in the comments.

FAQs

How to calculate the correlation coefficient in R?

R calculates the correlation coefficient with the function cor(). In its basic form, cor() needs two inputs: the x-coordinates and the y-coordinates. If at least one of the two input vectors contains missing values, the result is NA; a call such as cor(bm$height, bm$upper_arm_length) returns NA for this reason.

How do you calculate correlation coefficient easily?

Formula for the Correlation Coefficient

To calculate the Pearson correlation, start by determining each variable's standard deviation as well as the covariance between them. The correlation coefficient is covariance divided by the product of the two variables' standard deviations.
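The relationship described above can be checked with a few lines of R. This is only a minimal sketch using the example data of this tutorial; r_manual is a name chosen for illustration.

```r
set.seed(35843367)                         # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x

r_manual <- cov(x, y) / (sd(x) * sd(y))    # Covariance divided by product of SDs

all.equal(r_manual, cor(x, y))             # Compare with the cor function
```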

How to use the cor() function in R?

The cor() function calculates the correlation between two vectors, or creates a correlation matrix when given a matrix. For example, for two vectors of stock data named apple and micr, cor(apple, micr) simply returns the correlation between the two stocks.

How to find the correlation coefficient R with the regression line?

To calculate the correlation coefficient, the following steps are typically used:
  1. Calculate the mean of each variable.
  2. Subtract the mean of each variable from each data point to obtain the deviations.
  3. Multiply the deviations of each variable for each data point to obtain the products.
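The steps above can be sketched in R as follows. The list stops after the products, so the final step shown here (summing the products and dividing by the square root of the product of the summed squared deviations) is my own completion based on the standard Pearson formula. The link to the regression line in the last lines relies on the fact that the slope of a simple linear regression equals r multiplied by sd(y) / sd(x). All object names are assumptions for this illustration.

```r
set.seed(35843367)                       # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x

dx <- x - mean(x)                        # Steps 1 and 2: deviations from the mean
dy <- y - mean(y)
products <- dx * dy                      # Step 3: products of the deviations

# Completion (assumption): sum the products and standardize
r_dev <- sum(products) / sqrt(sum(dx^2) * sum(dy^2))

# Recover r from the slope of the regression line of y on x
slope <- unname(coef(lm(y ~ x))["x"])
r_slope <- slope * sd(x) / sd(y)

all.equal(r_dev, cor(x, y))              # Compare with the cor function
all.equal(r_slope, cor(x, y))
```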

How to compare two correlation coefficients in R?

The way to do this is by transforming the correlation coefficient values, or r values, into z scores. This transformation, also known as Fisher's r to z transformation, is done so that the z scores can be compared and analyzed for statistical significance by determining the observed z test statistic.
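A minimal sketch of this comparison is shown below. It assumes two correlations from independent samples and uses the standard error sqrt(1 / (n1 - 3) + 1 / (n2 - 3)) for the difference of the transformed values. The function compare_correlations and the input values are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: compare two correlations from independent samples
compare_correlations <- function(r1, n1, r2, n2) {
  z1 <- atanh(r1)                              # Fisher's r to z transformation
  z2 <- atanh(r2)
  se <- sqrt(1 / (n1 - 3) + 1 / (n2 - 3))      # Standard error of z1 - z2
  z_obs <- (z1 - z2) / se                      # Observed z test statistic
  p_value <- 2 * pnorm(-abs(z_obs))            # Two-sided p-value
  list(z = z_obs, p_value = p_value)
}

# Made-up example values
res <- compare_correlations(r1 = 0.60, n1 = 100, r2 = 0.40, n2 = 120)
res
```

For correlations measured on the same sample, this independence assumption does not hold and a different test would be needed.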

What is the quickest method to find correlation?

The quickest method to find the correlation between two variables is the method of concurrent deviations. This method only considers the direction of change: for each variable, every value is marked as an increase or a decrease relative to the preceding value, and the coefficient is based on how often the two variables move in the same direction.

What is the formula for the correlation function?

To get some insight on the relation between $X(t_1)$ and $X(t_2)$, we define correlation and covariance functions. For a random process $\{X(t), t \in J\}$, the autocorrelation function or, simply, the correlation function, $R_X(t_1, t_2)$, is defined by $R_X(t_1, t_2) = E[X(t_1) X(t_2)]$, for $t_1, t_2 \in J$.

Why do we calculate correlation coefficient?

In summary, correlation coefficients are used to assess the strength and direction of the linear relationships between pairs of variables.

How to calculate correlation between two variables?

Here are the steps to take in calculating the correlation coefficient:
  1. Determine your data sets. ...
  2. Calculate the standardized value for your x variables. ...
  3. Calculate the standardized value for your y variables. ...
  4. Multiply and find the sum. ...
  5. Divide the sum and determine the correlation coefficient.
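The steps listed above can be sketched in R as shown below. I assume that the divisor in the last step is n - 1, which matches the sample standard deviation returned by sd(). The names zx, zy, and r_std are chosen for illustration.

```r
set.seed(35843367)                       # Step 1: same example data as above
x <- rnorm(100)
y <- rnorm(100) + x
n <- length(x)

zx <- (x - mean(x)) / sd(x)              # Step 2: standardized x values
zy <- (y - mean(y)) / sd(y)              # Step 3: standardized y values

r_std <- sum(zx * zy) / (n - 1)          # Steps 4 and 5: multiply, sum, divide

all.equal(r_std, cor(x, y))              # Compare with the cor function
```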

How to create a correlation plot in R?

One of the most common is the corrplot function. We first need to install the corrplot package and load the library. Next, we'll run the corrplot function providing our original correlation matrix as the data input to the function. A default correlation matrix plot (called a Correlogram) is generated.
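The following sketch illustrates this workflow with the example data frame of this tutorial. It assumes that the corrplot package is available; the plot is skipped if the package is not installed. The name cor_mat is chosen for illustration.

```r
set.seed(35843367)                        # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x
data <- data.frame(x, y, z = rnorm(100))  # Example data frame

cor_mat <- cor(data)                      # Correlation matrix as plot input

# install.packages("corrplot")            # Run once if the package is missing
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat, method = "circle")  # Draw correlogram
}
```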

What is a good correlation coefficient?

If we wish to label the strength of the association, for absolute values of r, 0-0.19 is regarded as very weak, 0.2-0.39 as weak, 0.40-0.59 as moderate, 0.6-0.79 as strong and 0.8-1 as very strong correlation, but these are rather arbitrary limits, and the context of the results should be considered.
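If such labels are needed in R, the cut() function can assign them. The sketch below uses the limits quoted above, which are rather arbitrary as noted. The function label_strength is hypothetical and not part of any package.

```r
# Hypothetical helper: label the strength of a correlation coefficient
label_strength <- function(r) {
  cut(abs(r),
      breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
      labels = c("very weak", "weak", "moderate", "strong", "very strong"),
      right = FALSE,             # Intervals are closed on the left, e.g. [0.2, 0.4)
      include.lowest = TRUE)     # With right = FALSE, this includes the value 1
}

labels_out <- label_strength(c(0.63733, -0.1, 1, 0.2))
as.character(labels_out)
# [1] "strong"      "very weak"   "very strong" "weak"
```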

What is the formula for the correlation coefficient in R?

$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$

How to calculate the correlation coefficient?

The correlation coefficient formula is: $r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left(n \sum X^2 - (\sum X)^2\right)\left(n \sum Y^2 - (\sum Y)^2\right)}}$. The terms in that formula are: n = the number of data points, i.e., (x, y) pairs, in the data set. ∑XY = the sum of the product of the x-value and y-value for each point in the data set.

How to calculate correlation in RStudio?

You can use the cor() function to calculate the Pearson correlation coefficient in R. To test the significance of the correlation, you can use the cor.test() function.
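As a short sketch using the example data of this tutorial, the cor.test function can be applied as follows. The object name test_result is chosen for illustration.

```r
set.seed(35843367)                 # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x

test_result <- cor.test(x, y)      # Test of the Pearson correlation

test_result$estimate               # Correlation coefficient
test_result$p.value                # p-value of the test
test_result$conf.int               # Confidence interval (95% by default)
```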

How to find R2 in R Studio?

You can use the summary() function to view the R² of a linear model in R. You will see the “R-squared” near the bottom of the output.
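The R² can also be extracted from the summary output directly. The following sketch does this for a simple linear regression on the example data and compares it with the squared Pearson correlation. The names model and r_squared are assumptions for this illustration.

```r
set.seed(35843367)                       # Same seed as in the example data
x <- rnorm(100)
y <- rnorm(100) + x

model <- lm(y ~ x)                       # Simple linear regression
r_squared <- summary(model)$r.squared    # Extract R-squared

all.equal(r_squared, cor(x, y)^2)        # Equals squared correlation here
```

Note that this equality only holds for a regression with a single predictor and an intercept.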

How do you find R in rank correlation coefficient?

$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 42}{7(7^2 - 1)} = 1 - \frac{252}{336} = 1 - 0.75 = 0.25$. Here, D is the difference between the two ranks of an observation, and N is the number of observations (in this calculation, ∑D² = 42 and N = 7).
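The arithmetic above can be reproduced in R, and the same formula can be compared with the cor function. The second part of this sketch assumes that the data contain no ties, which is the case for the continuous example data of this tutorial. All object names are chosen for illustration.

```r
# Part 1: reproduce the worked calculation
sum_d2 <- 42                                    # Sum of squared rank differences
n_obs <- 7                                      # Number of observations
r_rank <- 1 - 6 * sum_d2 / (n_obs * (n_obs^2 - 1))
r_rank
# [1] 0.25

# Part 2: apply the formula to the example data (no ties assumed)
set.seed(35843367)
x <- rnorm(100)
y <- rnorm(100) + x
d <- rank(x) - rank(y)                          # Rank differences
n <- length(x)
rho_formula <- 1 - 6 * sum(d^2) / (n * (n^2 - 1))

all.equal(rho_formula, cor(x, y, method = "spearman"))
```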

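Is correlation coefficient R or r2?

The Pearson correlation coefficient (r) measures the strength and direction of the linear relationship between two variables, whereas the coefficient of determination (R²) measures the proportion of the variance in the outcome that is explained by a model. In a simple linear regression with one predictor, R² equals the square of r.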

What is the computed value of correlation coefficient R?

Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward). A correlation coefficient close to 0 suggests little, if any, correlation.
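The two extreme cases can be illustrated with a tiny sketch. The vector v and the linear transformations are made up for this illustration; the results are expected to be 1 and -1 up to floating point error.

```r
v <- c(1, 2, 3, 4, 5)              # Made-up example values

r_up <- cor(v, 2 * v + 1)          # Perfectly linear, sloping upward
r_down <- cor(v, -3 * v)           # Perfectly linear, sloping downward

all.equal(r_up, 1)                 # Check against +1
all.equal(r_down, -1)              # Check against -1
```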

Author: Virgilio Hermann JD
