Linear Regression in R
Published on September 20th, 2021, Revised on April 4, 2023
In statistics, regression analysis is used to study the relationship between a dependent variable and one or more independent variables. In this method, one tries to ‘regress’ the value of ‘y,’ the dependent variable, on ‘x,’ the independent variable. In other words, one tries to see how ‘y’ changes as ‘x’ is changed.
If the relationship between x and y is linear, the points on a graph fall along a straight line. This implies that y changes by a constant amount for every one-unit change in x: y rises as x increases when the slope is positive, and y falls as x increases when the slope is negative. Both variables are connected through the following equation:
y = ax + b
- y is the dependent or ‘response’ variable
- x is the independent or ‘predictor’ variable
- a and b = coefficients (constants): a is the slope of the line and b is the intercept
‘R’ in Linear Regression
In regression analysis, R represents the correlation between the predicted and observed values of y, and R-squared is the square of this coefficient. R-squared indicates the proportion of the total variation in y that is explained by the regression line.
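The relationship between R and R-squared can be checked directly in R. The following is a minimal sketch, assuming a small set of made-up x and y values (they are illustrative only and do not come from this article). It computes R as the correlation between observed and predicted values and compares its square with the R-squared reported by summary().

```r
# Hypothetical illustrative data (not from the article)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9)

# Fit a simple linear regression of y on x
model <- lm(y ~ x)

# R: correlation between the observed and the predicted values of y
r <- cor(y, fitted(model))

# R-squared as reported by the model summary
r_squared <- summary(model)$r.squared

# The last two values should agree up to rounding error
print(c(R = r, R_squared = r_squared, R_times_R = r^2))
```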
Steps to Formulate Regression in R
Once the data has been gathered and categorized into dependent and independent variables, carry out the following steps to fit a linear regression in R:
Step # 1 – Develop a relationship model with the help of the lm() function in R.
Syntax of this function: The basic syntax for the lm() function in linear regression is lm(formula, data), where:
- formula = a symbolic description of the relationship between x and y, such as y ~ x
- data = the data frame containing the variables named in the formula
Step # 2 – Find the coefficients from the regression model created and formulate an equation using them. Printing the model shows the call that produced it, followed by the estimated intercept and slope. The call will look something like this:
lm(formula = y ~ x)
…where the coefficient values will vary, of course, depending on the data supplied to the model.
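Steps 1 and 2 can be sketched as follows. The data frame `dat` and its values are hypothetical and serve only to make the example runnable; lm() and coef() are standard R functions.

```r
# Hypothetical illustrative data (not from the article)
dat <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8)
)

# Step 1: develop the relationship model
model <- lm(y ~ x, data = dat)

# Printing the model shows the call and the estimated coefficients
print(model)

# Step 2: extract the coefficients; in y = ax + b,
# a is the slope (named after the predictor, "x") and b is the intercept
coefs <- coef(model)
b <- coefs[["(Intercept)"]]
a <- coefs[["x"]]

# Formulate the equation from the estimated coefficients
cat(sprintf("y = %.3fx + %.3f\n", a, b))
```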
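Step # 3 – Examine the summary of the relationship model to find the errors in prediction, known as residuals. A residual is the difference between an observed value of y and the value predicted by the model, so residuals represent the variation that the model leaves unexplained. They are not the same as the model error, which cannot be observed directly; residuals are estimates of it calculated from the data. A systematic pattern (bias) in the residuals suggests that there is a bias in the error, too.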
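A minimal sketch of this step, again assuming a small hypothetical data frame, might look like the following. It prints the model summary and then confirms that each residual is simply the observed value minus the fitted value.

```r
# Hypothetical illustrative data (not from the article)
dat <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8)
)
model <- lm(y ~ x, data = dat)

# The summary reports residual quantiles, coefficients, and R-squared
model_summary <- summary(model)
print(model_summary)

# Residuals are the observed values minus the fitted (predicted) values
res <- residuals(model)
manual_res <- dat$y - unname(fitted(model))
print(all.equal(unname(res), manual_res))

# With an intercept in the model, the residuals sum to (numerically) zero
print(sum(res))

# Residual standard error: the typical size of a prediction error
print(model_summary$sigma)
```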
The basic syntax for the predict() function in linear regression is predict(object, newdata), where:
- object = the fitted model, which was created using the lm() function.
- newdata = the data frame containing the independent variable’s new values.
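The predict() call can be sketched as below, assuming the same kind of hypothetical data frame. One detail worth noting is that the column name in newdata has to match the name of the independent variable used in the formula.

```r
# Hypothetical illustrative data (not from the article)
dat <- data.frame(
  x = c(1, 2, 3, 4, 5, 6, 7, 8),
  y = c(3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8)
)
model <- lm(y ~ x, data = dat)

# New values of the independent variable; the column must be called "x"
# because that is the predictor name used in the formula
new_values <- data.frame(x = c(9, 10))

# Predicted values of y for the new x values
predicted <- predict(model, newdata = new_values)
print(predicted)
```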
Linear Regression in R – Sample
In this sample, the aforementioned functions are executed to show what the model and the data in it look like. A simple example is used: estimating a person’s weight (the dependent variable) from their height (the independent variable), which is already known.
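One way to try this out is with the `women` data set that ships with R, which records average heights (in inches) and weights (in pounds) for 15 American women. Using this data set is an assumption made for illustration; it is not necessarily the data used in the original demo, and the new height of 66 inches is a hypothetical input.

```r
# `women` is a built-in R data set with columns `height` and `weight`
data(women)

# Weight is the dependent variable, height the independent variable
model <- lm(weight ~ height, data = women)
print(coef(model))

# Hypothetical new person with a known height of 66 inches
new_person <- data.frame(height = 66)

# Estimate that person's weight from the fitted model
predicted_weight <- predict(model, newdata = new_person)
print(predicted_weight)
```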
Linear Regression in R Software
Step # 1 – Download R and RStudio. After opening RStudio, click File > New File > R Script. Code then needs to be copied into the script, first to install the required analysis packages and second to load them into the R session.
The following packages are required:
- tidyverse: used for data manipulation and visualization
- ggpubr: used to create publication-ready plots
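The packages listed above can be installed and loaded roughly as follows. This is a cautious sketch: the install line is commented out because it needs internet access and only has to be run once, and the loop only attaches packages that are already installed.

```r
# Run once if the packages are not yet installed (requires internet access):
# install.packages(c("tidyverse", "ggpubr"))

pkgs <- c("tidyverse", "ggpubr")

# Check which of the packages are installed, without attaching them yet
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Attach every package that is available
for (pkg in pkgs[installed]) {
  library(pkg, character.only = TRUE)
}
```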
Step # 2 – Load the data into R by importing the file that contains the data set. R reads the file into a data frame in which each variable, dependent or independent, occupies its own column.
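As a sketch of the import step, the example below first writes a tiny CSV file to a temporary location so that it can run on its own; the file contents and the column names `height` and `weight` are hypothetical. In practice, `csv_path` would point to your own data file.

```r
# Create a small hypothetical CSV file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("height,weight", "150,52", "160,58", "170,66", "180,75"),
  csv_path
)

# Import the file; each column of the file becomes a column of the data frame
dat <- read.csv(csv_path)

# Inspect the structure and the first rows of the imported data
str(dat)
print(head(dat))
```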
Step # 3 – Ensure the data meets all the assumptions, whether it’s a simple or multiple linear regression. They are homoscedasticity, linearity, normality, and independence of observations.
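Some of these checks can be run before the model is fitted. The sketch below uses the built-in `mtcars` data set as a stand-in (an assumption for illustration), with `mpg` as the dependent variable and `wt` and `hp` as candidate independent variables. These are rough first looks rather than formal proof that the assumptions hold; homoscedasticity is usually judged from the residuals once a model exists.

```r
# Built-in data set used purely for illustration
data(mtcars)

# Independence between predictors (relevant for multiple regression):
# a correlation close to -1 or 1 suggests the predictors overlap heavily
predictor_cor <- cor(mtcars$wt, mtcars$hp)
print(predictor_cor)

# Linearity: the correlation between predictor and response gives a rough
# indication; a scatter plot is the more informative check
linear_cor <- cor(mtcars$wt, mtcars$mpg)
print(linear_cor)

# Normality: Shapiro-Wilk test on the dependent variable; a small p-value
# suggests a departure from a normal distribution
normality <- shapiro.test(mtcars$mpg)
print(normality)
```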
Step # 4 – Conduct the regression analysis by running the code for either a simple or a multiple linear regression.
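The difference between the two cases lies mainly in the formula. As a minimal sketch, assuming the built-in `mtcars` data set as example data, a simple regression has one independent variable on the right-hand side of the formula, while a multiple regression joins several with `+`.

```r
# Built-in data set used purely for illustration
data(mtcars)

# Simple linear regression: one independent variable
simple_model <- lm(mpg ~ wt, data = mtcars)
print(summary(simple_model))

# Multiple linear regression: two independent variables joined with +
multiple_model <- lm(mpg ~ wt + hp, data = mtcars)
print(summary(multiple_model))
```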
Step # 5 – Check that the fitted model meets the assumption of homoscedasticity before representing it in a graph.
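A common way to make this check is to draw the standard diagnostic plots for the fitted model. The sketch below assumes the built-in `mtcars` data set as example data. In the residuals-versus-fitted panel, the points should be scattered evenly around zero without a funnel shape if the variance is roughly constant.

```r
# Built-in data set used purely for illustration
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Arrange the four standard diagnostic plots in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
plot(model)

# Restore the previous plotting layout
par(old_par)
```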
Step # 6 – Represent the data in a graph. First plot the data points, then add the line representing the linear regression, and finally add the equation of the regression line as a label on the graph.
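These three plotting actions can be sketched with base R graphics. Base graphics are used here instead of the tidyverse and ggpubr packages so that the example runs without extra installations; that substitution, and the use of the built-in `mtcars` data set, are assumptions made for illustration.

```r
# Built-in data set used purely for illustration
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# 1. Plot the data points
plot(
  mtcars$wt, mtcars$mpg,
  xlab = "Weight (1000 lbs)",
  ylab = "Miles per gallon",
  main = "Simple linear regression"
)

# 2. Add the fitted regression line
abline(model, col = "blue", lwd = 2)

# 3. Add the equation of the regression line as a label
coefs <- coef(model)
equation <- sprintf(
  "mpg = %.2f * wt + %.2f",
  coefs[["wt"]], coefs[["(Intercept)"]]
)
legend("topright", legend = equation, bty = "n")
```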
Step # 7 – Interpret and report, in words, results represented graphically. For instance, they can be reported as: “It was observed that for every 1% increase in rainfall, there was a 2% increase in crop growth.”
To better understand how linear regression in R works, view some examples showing the code needed in every step, its output, and the resulting graphs as they appear in R.
Example case 1
An effective way to test whether a regression model will be a good fit is to look at the residuals. They are the differences between observed and predicted values.
R-squared (R2) is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by one or more independent variables in a regression model (more than one in the case of multiple linear regression).
R describes the direction and strength of the linear relationship between the independent and dependent variables, as seen on a scatter plot. It is also called Pearson’s correlation coefficient. Its values can be anywhere from -1 to 1. They are interpreted as follows:
- –1 = perfect downwards (negative) linear relationship
- –0.70 = strong downwards (negative) linear relationship
- –0.50 = moderate downwards (negative) linear relationship
- –0.30 = weak downwards (negative) linear relationship
- 0 = no linear relationship
- +0.30 = weak upwards (positive) linear relationship
- +0.50 = moderate upwards (positive) linear relationship
- +0.70 = strong upwards (positive) linear relationship
- +1 = perfect upwards (positive) linear relationship
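The scale above can be turned into a small helper for labelling a correlation value. The function name `describe_r` is hypothetical, and treating the listed values as lower cut-offs for each band (for example, anything from 0.50 up to 0.70 counts as moderate) is an assumption made for this sketch, since the list gives reference points rather than ranges.

```r
# Hypothetical helper: label a correlation coefficient using the scale above.
# Assumption: the listed values act as lower cut-offs for each band.
describe_r <- function(r) {
  stopifnot(is.numeric(r), length(r) == 1, abs(r) <= 1 + 1e-8)
  size <- abs(r)
  strength <- if (isTRUE(all.equal(size, 1))) {
    "perfect"
  } else if (size >= 0.70) {
    "strong"
  } else if (size >= 0.50) {
    "moderate"
  } else if (size >= 0.30) {
    "weak"
  } else {
    "little or no"
  }
  if (strength == "little or no") {
    return("little or no linear relationship")
  }
  direction <- if (r > 0) "upwards (positive)" else "downwards (negative)"
  paste(strength, direction, "linear relationship")
}

# Usage with Pearson's correlation from the built-in `women` data set
r <- cor(women$height, women$weight)
print(describe_r(r))
```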