Linear regression is an easy way of evaluating the relationship between two variables.
Previously, performing linear regression in Excel was nothing less than a complex task. But with advanced Excel data analysis tools, it is now only a matter of a few clicks.
The guide below will not only teach you how to perform linear regression in Excel but also how you may analyze a linear regression graph in Excel.
So, without further ado, let’s dive right in 👇
Simple linear regression models the relationship between one dependent variable and one independent variable.
👉 The dependent variable is the variable that needs to be predicted (or whose value is to be found).
👉 The independent variable explains (or causes) the change in the dependent variable.
Simply put, the dependent variable depends upon the independent variable. And as the independent variable changes, the dependent variable changes too.
Mathematically, the linear relationship between these two variables is explained as follows:
Y = a + bx
Y = dependent variable
a = regression intercept term
b = regression slope coefficient
x = independent variable
“a” and “b” are also called regression coefficients. And Excel returns the estimated values of these regression coefficients too.
Kasper Langmann , Microsoft Office Specialist
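To see where the two coefficients come from, here is a minimal Python sketch of the ordinary least-squares calculation for a and b. The function name fit_line and the sample numbers are made up for illustration (they are not the data used in this guide), and the sketch assumes the usual least-squares definition of the best-fitting line.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope b: how x and y vary together, divided by how much x varies.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    # Intercept a: makes the line pass through the point of means.
    a = y_bar - b * x_bar
    return a, b

# Hypothetical values for illustration only, not the article's dataset.
temperatures = [2, 4, 6, 8, 10]
sales = [600, 560, 520, 480, 440]

a, b = fit_line(temperatures, sales)
print(f"Y = {a} + ({b})x")
```

With these made-up numbers every point lies exactly on one line, so the fit recovers it exactly: an intercept of 640 and a slope of -20.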
Imagine a company that sells sweaters in a cold region. And the sale of sweaters is directly linked to the temperatures in that region.
The colder it is (low temperatures 🥶), the higher the sales of sweaters 🧣 go. This means sales (the dependent variable) depend upon the temperature (the independent variable).
Now, to predict the company’s sales for the future, you must analyze the sales trend in the past. This can be done by drawing a trendline.
Drawing this trendline between a dependent variable Y (the sales) and an independent variable X (the temperature) is called running linear regression.
The image above contains the historical data for both variables (temperatures and sales) for a few months.
To explain the relationship between these variables, we need to make a scatter plot.
To plot the above data in a scatter plot in Excel:
Excel plots the data in a scatter plot.
Note that each dot in the scatter plot above is formed at the intersection of Variable X and Y.
For example, the first dot is plotted at the point where Y = 625 and X = 2.
Next, we must draw a trend line out of this scatter plot. To do so:
A drop-down menu appears.
And there you go! Excel draws a linear trendline on the scatterplot.
The above image shows a downward regression line which represents a negative trend. But why is that?
To understand that, you must know how to analyze the results of a linear regression graph. And don’t worry – it’s only a section ahead.
We also want Excel to show the equation and R-squared for this graph. For that:
And Excel will display the following regression statistics on the graph:
Equation: y = -19.622x + 612.77
R-squared = 0.7456
What are these? And what do they tell? We will discuss this shortly.
Pro Tip!
How do you quickly interpret the relationship between two variables? By checking the sign of the number in front of x (the slope) 💡
A positive sign means a positive relationship. And a negative sign means a negative relationship between the two variables.
Since our equation shows a “-19.622x”, the relation between our variables is negative.
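The sign check above can be sketched in a few lines of Python. The slope and intercept are the values from the trendline equation on the chart; the function names relationship and predict_sales are hypothetical helpers, and the temperature of 10 is just an example input.

```python
SLOPE = -19.622      # number in front of x in the chart equation
INTERCEPT = 612.77   # constant term in the chart equation

def relationship(slope):
    """Classify the direction of the relationship from the slope's sign."""
    if slope > 0:
        return "positive"
    if slope < 0:
        return "negative"
    return "none"

def predict_sales(temperature):
    """Plug a temperature into y = -19.622x + 612.77."""
    return INTERCEPT + SLOPE * temperature

print(relationship(SLOPE))            # negative
print(round(predict_sales(10), 2))    # 416.55
```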
Do you also find the trendline a little overshadowed? Not to worry – You can always format it in Excel.
For example, to change the color of the trendline:
Guess we will go with red for now 🚩 What do you think about it?
You can change not only the color but also the style of the trendline.
Say, we want to change our dotted trendline to a solid one. To do so:
This will change the style of the trendline from a dotted line to a perfectly solid line.
To enhance the readability of the graph, you may add a chart title and axis titles to it as follows:
How about adding the Axis titles too?
To add a vertical title (for the Y-axis) to your chart:
We have set the title for the Y-axis to “Sale of Sweaters”.
To add a horizontal Axis Title (for the X-axis):
We have set the title for the X-axis to “Avg. Temperature”
And that’s it. We’ve successfully run linear regression in Excel 🥳
Good job with running linear regression in Excel.
Now it is time to analyze the linear regression trendline formed above.
A linear trendline in Excel can take the following three shapes:
If your trendline is upward facing (it elevates as it goes from left to right), it denotes a positive trend.
This means that there exists a positive relationship between both variables. An increase in the independent variable causes the dependent variable to increase.
This is how your graph will look with a positive trendline to it.
If your trendline is downward sloping (it slopes down as it goes from left to right), it denotes a negative trend.
A negative trendline means a negative relationship between both variables.
When there is a negative relationship between two variables, an increase in the independent variable causes the dependent variable to decrease.
This is how your graph will look with a negative trendline to it.
Think back to the trendline in our example above. It was also a downward-sloping (negative) trendline.
That’s because there exists a negative relationship between sales and temperature. As the temperature falls, sales increase.
The two variables can also be independent of each other. In this case, movement in both variables is random with no relation to each other.
As there exists no relationship between them (neither positive nor negative), there is no particular slope for the trendline between them (neither upward facing nor downward sloping).
Such a trendline might look like this.
The trendline above is not exactly horizontal but very close to that. This is because there is no relation between the variables.
What if we want to know how much Y changes for a given change in X?
For example, for every one-unit decrease in temperature, by how much do sales increase?
The slope of the trendline answers this. Remember the linear regression equation?
Y = a + bx
In the above equation, the slope is represented by “b”. And the linear regression equation for our example turned out as follows:
Y = 612.77 - 19.622x
Here, the value of b is -19.622, and that is our slope. This means that a one-unit increase in the X variable (the temperature) goes with a 19.622-unit decrease in the Y variable (the sales).
Put the other way around, because b carries a minus sign, a one-unit decrease in Variable X (temperature) goes with a 19.622-unit increase in Variable Y (sales). Note that the slope is measured in the units of the data, not in percentages.
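As a quick check of how to read the slope, the sketch below plugs two temperatures that are one unit apart into the example equation and compares the predictions. The function name predict_sales and the temperatures 5 and 6 are assumptions chosen for illustration; only the two coefficients come from the chart.

```python
def predict_sales(temperature, intercept=612.77, slope=-19.622):
    """Predicted sales from the example equation Y = 612.77 - 19.622x."""
    return intercept + slope * temperature

# Moving one unit up in temperature changes the prediction by the slope.
change_per_unit = predict_sales(6) - predict_sales(5)
print(round(change_per_unit, 3))   # -19.622

# Moving five units down changes it by -5 times the slope.
change_five_down = predict_sales(0) - predict_sales(5)
print(round(change_five_down, 2))  # 98.11
```

Whichever two temperatures you pick, the difference in predicted sales is always the slope multiplied by the difference in temperature. That is what makes the trendline linear.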
Pro Tip!
An easy way to remember the slope is Rise over Run. Rise is the change along the vertical axis (Y). Run is the change along the horizontal axis (X). So the slope is the change in variable Y for each one-unit change in variable X.
Another important output of our scatterplot is the R-squared value 👀
It tells us what share of the variation in the dependent variable is explained by the independent variable.
The R-squared for our example is 0.7456.
This tells us that about 74.56% of the variation in Variable Y (sales) can be explained by Variable X (temperature). The remaining 25.44% is due to other factors.
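For the curious, R-squared can be reproduced by hand as 1 minus the ratio of the unexplained variation to the total variation. The Python sketch below assumes that standard definition; the function name r_squared and the five data points are hypothetical, not the dataset behind the 0.7456 above.

```python
from statistics import mean

def r_squared(xs, ys):
    """Share of the variation in ys explained by a straight-line fit on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Least-squares slope and intercept of the trendline.
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    # Unexplained variation: squared gaps between actual and predicted values.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared gaps between actual values and their mean.
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical values for illustration only.
temperatures = [1, 2, 3, 4, 5]
sales = [50, 40, 45, 40, 30]

r2 = r_squared(temperatures, sales)
print(round(r2, 4))  # 0.7273
```

An R-squared of 1 would mean every dot sits exactly on the trendline, while a value near 0 would mean the trendline explains almost none of the variation.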
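Another statistical measure relevant to the linear regression model is the p-value. It answers a different question than R-squared: rather than how much of the variation is explained, it indicates how likely a slope this strong would be if there were no real relationship. So a smaller p-value is stronger evidence of a relationship.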
Kasper Langmann , Microsoft Office Specialist
The above guide explains how to perform a linear regression analysis in Excel, and how to analyze the linear regression trendline and other relevant statistics.
👉 In addition to that, it also explains how you may format a trendline in Excel in different ways.
Performing linear regression in Excel through a scatter plot is super smart. But this is only one feature of Excel.
And there are many more smart functions in Excel. Like the VLOOKUP, SUMIF, and IF functions.
Want to learn them already? Enroll in my 30-minute free email course that teaches you these and many more functions of Excel.