# Linear Regression in R

Published on September 20th, 2021. Revised on April 4, 2023.

**Regression**

In statistics, regression analysis is used to study the relationship between a dependent variable and one or more independent variables. In this method, one tries to ‘regress’ the value of ‘y,’ the dependent variable, on ‘x,’ the independent variable. In other words, one tries to see how ‘y’ changes as ‘x’ is changed.

**Linear regression**

If the relationship between x and y is linear, the points plotted on a graph follow a straight line. This implies that y changes by a constant amount for every unit change in x: when the slope is positive, y increases as x increases; when it is negative, y decreases as x increases. Both variables are connected through the following equation:

y = ax + b

Where,

- y is the dependent or ‘response’ variable
- x is the independent or ‘predictor’ variable
- a and b are the coefficients (constants): a is the slope and b is the intercept

**‘R’ in Linear Regression**

In regression analysis, R represents the correlation between the predicted and observed values of y, and R squared (R²) is the square of this coefficient. R squared indicates the proportion of the total variation in y that is explained by the regression line.
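As a minimal sketch of this relationship, the snippet below fits a model and compares the squared correlation with the R squared value that `summary()` reports. The height and weight numbers are made up for illustration and are not taken from any real data set.

```r
# Hypothetical height (cm) and weight (kg) values, invented for this example
height <- c(150, 160, 170, 180, 190)
weight <- c(52, 58, 67, 70, 78)

model <- lm(weight ~ height)

# R: correlation between the observed and the predicted values of y
r <- cor(weight, fitted(model))

# R squared as reported by the model summary
r_squared <- summary(model)$r.squared

# The two quantities agree up to floating-point rounding
print(r^2)
print(r_squared)
```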

**Steps to Formulate Regression in R**

Once the data has been gathered and categorized into dependent and independent variables, carry out the following steps to fit a linear regression in R:

**Step # 1 –** Develop a relationship model with the help of the lm() function in R.

The basic syntax of the lm() function in linear regression is:

lm(formula,data)

Where:

- formula = a symbolic description of the relation between x and y, written as y ~ x
- data = the data frame containing the variables that the formula refers to
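A short sketch of an `lm()` call follows. The data frame `people` and its column names are hypothetical, chosen only to illustrate how the formula and data arguments fit together.

```r
# Hypothetical data frame; names and values are invented for this example
people <- data.frame(
  height = c(150, 160, 170, 180, 190),  # independent variable (cm)
  weight = c(52, 58, 67, 70, 78)        # dependent variable (kg)
)

# "weight ~ height" reads as "weight modelled as a function of height"
model <- lm(weight ~ height, data = people)

# Printing the model shows the call and the estimated coefficients
print(model)
```

The formula refers to columns by name, so the same call works unchanged on any data frame that has `height` and `weight` columns.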

**Step # 2 –** Find the coefficients from the regression model created and formulate an equation using them. Printing the model gives output that looks something like this:

Call:

lm(formula = y ~ x)

Coefficients:

(Intercept) x

-38.4551 0.6746

…where the values will vary, of course, depending on the data input into the equation.
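One way to pull the coefficients out of a fitted model and turn them into an equation is sketched below, using `coef()`. The data is again made up, so the numbers will differ from the output shown above.

```r
# Hypothetical height (cm) and weight (kg) values, invented for this example
height <- c(150, 160, 170, 180, 190)
weight <- c(52, 58, 67, 70, 78)

model <- lm(weight ~ height)

# coef() returns a named vector: "(Intercept)" and one entry per predictor
coefs <- coef(model)
intercept <- coefs[["(Intercept)"]]
slope <- coefs[["height"]]

# Build the regression equation in the form y = ax + b
equation <- sprintf("weight = %.2f * height + (%.2f)", slope, intercept)
print(equation)
```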

**Step # 3 –** View the relationship model’s summary to find out how far the predictions are from the observed values. These differences are called **residuals**. Residuals are the part of y that the model leaves unexplained. They are not the same as the model’s error term, although they serve as estimates of it. A systematic pattern (bias) in the residuals suggests that the model’s errors are biased, too.
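The sketch below shows how the residuals and the summary can be inspected. It uses invented data, and the variable names are assumptions made for the example.

```r
# Hypothetical height (cm) and weight (kg) values, invented for this example
height <- c(150, 160, 170, 180, 190)
weight <- c(52, 58, 67, 70, 78)

model <- lm(weight ~ height)

# Residuals are the observed values minus the predicted (fitted) values
res <- residuals(model)
manual_res <- weight - fitted(model)
print(res)

# summary() reports the residual quartiles, coefficients and R squared
model_summary <- summary(model)
print(model_summary)

# Residual standard error: the typical size of a residual
residual_se <- model_summary$sigma
print(residual_se)
```

For a model that includes an intercept, the residuals sum to zero (up to rounding), so it is their spread and pattern that matter, not their average.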

**predict() Function**

The basic syntax for predict() function in linear regression is:

predict(object, newdata)

Where:

- object = the fitted model, which was created using the lm() function.
- newdata = a data frame containing the independent variable’s new values.
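A minimal sketch of `predict()` in use is given below. The data frames `people` and `new_people` are hypothetical; the one requirement is that `newdata` has a column with the same name as the predictor used in the formula.

```r
# Hypothetical training data, invented for this example
people <- data.frame(
  height = c(150, 160, 170, 180, 190),  # cm
  weight = c(52, 58, 67, 70, 78)        # kg
)

model <- lm(weight ~ height, data = people)

# New heights to predict weights for; the column name must match the formula
new_people <- data.frame(height = c(165, 175))

predicted <- predict(model, newdata = new_people)
print(predicted)
```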

**Linear Regression in R – Sample**

In this sample, the functions described above are run on a simple example to show what the model and its data look like. The example calculates a person’s weight (the dependent variable) from their height (the independent variable), which is already known.

**Linear Regression in R Software**

**Step # 1 –** Download R and RStudio. After opening RStudio, click File > New File > R Script. Two pieces of code then need to be pasted into the script: the first installs the analysis packages, and the second loads them into the current R session.

The required packages are:

- tidyverse: used for data manipulation and visualization
- ggpubr: used to create publication-ready plots

The code to load them is as follows:

library(tidyverse)

library(ggpubr)

theme_set(theme_pubr())

**Step # 2 –** Load the data into R by importing the file that contains the data set. Each column of the imported file becomes a variable in a data frame; which columns act as the dependent and independent variables is decided later, in the model formula.
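As a sketch of the import step, the snippet below writes a tiny CSV file to a temporary location and reads it back with `read.csv()`. The file contents are invented; in practice `csv_path` would point to your own data file.

```r
# Create a small hypothetical CSV file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("height,weight", "150,52", "160,58", "170,67"), csv_path)

# read.csv() returns a data frame with one column per variable in the file
people <- read.csv(csv_path)

# Inspect the structure and the first rows before modelling
str(people)
print(head(people))
```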

**Step # 3 –** Ensure the data meets all the assumptions, whether it’s a simple or multiple linear regression. They are homoscedasticity, linearity, normality, and independence of observations.
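Two of these checks can be sketched with base R alone. The example below uses `women`, a small height and weight data set that ships with R; the choice of a scatter plot for linearity and a Shapiro-Wilk test for normality is an assumption about how one might check these, not the only approach.

```r
# Built-in data set: heights (in) and weights (lb) of 15 women
data(women)

model <- lm(weight ~ height, data = women)

# Linearity: the points should roughly follow a straight line
plot(women$height, women$weight,
     xlab = "Height (in)", ylab = "Weight (lb)",
     main = "Linearity check")

# Normality: a Q-Q plot of the residuals should lie close to the line
qqnorm(residuals(model))
qqline(residuals(model))

# A formal normality test on the residuals (a small p-value signals a problem)
normality <- shapiro.test(residuals(model))
print(normality$p.value)
```

Independence of observations usually cannot be read off a plot; it follows from how the data was collected.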

**Step # 4 –** Conduct the regression analysis by running the code for either a simple or a multiple linear regression, as appropriate.
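The difference between the two cases comes down to the formula. The sketch below uses the built-in `mtcars` data set, picked here only because it has several numeric columns; the choice of `wt` and `hp` as predictors is illustrative.

```r
# Simple linear regression: one independent variable
simple_model <- lm(mpg ~ wt, data = mtcars)

# Multiple linear regression: predictors are joined with "+"
multiple_model <- lm(mpg ~ wt + hp, data = mtcars)

# summary() reports coefficients, residuals and R squared for each model
print(summary(simple_model))
print(summary(multiple_model))
```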

**Step # 5 –** Check that the data meets the assumption of homoscedasticity before representing it in a graph.
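A common way to check homoscedasticity is to plot the residuals against the fitted values and look for an even spread around zero. The sketch below does this with base graphics on the built-in `women` data set, which is used purely as an example.

```r
# Fit a model on a built-in data set
model <- lm(weight ~ height, data = women)

# Residuals vs fitted values: the vertical spread should be roughly
# constant across the x-axis, with no funnel shape
plot(fitted(model), residuals(model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

# R's built-in diagnostic plots for lm objects, shown in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))  # restore the default single-plot layout
```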

**Step # 6 –** Represent the data in a graph. First plot the data points, then add the line representing the linear regression, and finally add the equation of the regression line to the plot. The fitted coefficients determine how the line looks on the graph.
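The three plotting actions can be sketched as follows. This version uses base R graphics rather than ggplot2 or ggpubr so that it runs without extra packages; that substitution, the built-in `women` data set and the label format are all choices made for this example.

```r
# Fit a model on a built-in data set
model <- lm(weight ~ height, data = women)
coefs <- coef(model)

# 1. Plot the data points
plot(women$height, women$weight,
     xlab = "Height (in)", ylab = "Weight (lb)",
     main = "Weight vs height")

# 2. Add the regression line taken from the fitted model
abline(model, col = "blue", lwd = 2)

# 3. Add the equation of the line to the plot
label <- sprintf("weight = %.2f + %.2f * height", coefs[[1]], coefs[[2]])
legend("topleft", legend = label, bty = "n")
```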

**Step # 7 –** Interpret and report, in words, the results represented graphically. For instance, they can be reported as: “It was observed that for every 1% increase in rainfall, there was a 2% increase in crop growth.”

**Example**

To better understand how linear regression in R works, view some examples that show the code needed at every step, its output, and the resulting graphs as they appear in R.

Example case 1

**FAQs**

An effective way to test whether a regression model will be a good fit is to look at the residuals. They are the differences between observed and predicted values.

R-squared (R²) is a statistical measure that represents the proportion of the variance in the dependent variable that is explained by one or more independent variables in a regression model (more than one in the case of multiple linear regression).

R measures the direction and strength of the linear relationship between the independent and dependent variables, as seen on a scatter plot. It is also called Pearson’s correlation coefficient. Its values can be anywhere from -1 to 1. They are interpreted as follows:

- –1 = perfect downwards (negative) linear relationship
- –0.70 = strong downwards (negative) linear relationship
- –0.50 = moderate downwards (negative) linear relationship
- –0.30 = weak downwards (negative) linear relationship
- 0 = no linear relationship
- +0.30 = weak upwards (positive) linear relationship
- +0.50 = moderate upwards (positive) linear relationship
- +0.70 = strong upwards (positive) linear relationship
- +1 = perfect upwards (positive) linear relationship
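The ends of this scale are easy to reproduce with `cor()`, which computes Pearson’s correlation coefficient by default. The vectors below are invented for illustration.

```r
x <- c(1, 2, 3, 4, 5)

# y rises by a fixed amount for each step in x: perfect positive relationship
r_pos <- cor(x, 2 * x + 1)

# y falls by a fixed amount for each step in x: perfect negative relationship
r_neg <- cor(x, -3 * x)

# Noisy but generally rising values: a strong positive relationship
y <- c(2, 4, 5, 4, 5)
r_mid <- cor(x, y)

print(c(r_pos, r_neg, r_mid))
```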