Analytics For Business

Sunday 18 October 2015

Linear Regression Using Microsoft Excel


In this post, we will talk about the multiple linear regression using Excel. We will learn about the basic assumptions of classical linear regression model, performing regression using Excel, and interpreting the result.
Let’s start with the basic definition of regression analysis.
The term regression was introduced by Francis Galton in their research paper. Modern interpretation of Regression analysis is concerned with the study of dependence of one variable (dependent variable), on one or more other variables (the explanatory or independent variables), with a view of estimating or predicting the dependent variable’s value in terms of independent variables.
Method of estimation:
There are two mostly used method of estimation
      1-      Ordinary Least Squares (OLS)
      2-      Maximum Likelihood (ML)
OLS is highly used method for estimation because it is intuitive appealing and simpler than ML method. Besides, in linear regression context the both methods generally give similar results.
Assumptions of OLS method:
These are the following assumptions of OLS. Under these assumptions, OLS has some useful statistical properties which makes it one of the most powerful method of regression analysis. Regression model will be valid under the following assumptions.
1-      Regression model is linear in parameters, though it may or may not be linear in the variables i.e. regressand Y and regressor X may be nonlinear.
                                                         Yi = β0 + β1 Xi + ui
2-      Fixed regressor X or X values are independent of the error term
                                                  i.e. covariance(Xi ,ui) = 0
3-      Expected value or mean of error term (random disturbance) is zero.
                                                                E(ui) = 0
4-      Variance of error term (random disturbance) is same regardless of the value of regressor X. This property is called Homoscedasticity (Home – equal , Scedasticity – spread i.e. equal variance)
                                                             Var(ui) = σ2
               Homoscedasticity refers to the assumption that that the dependent variable exhibits similar   amounts of variance across the range of values for an independent variable.
                                             

               Heteroscedasticity is the absence of homoscedasticity.
5-      No Autocorrelation between the any 2 error terms (random disturbances).
                                                  covariance(ui ,uj) = 0             i ≠ j
6-      No Multicollinearity between the any 2 regressors or X values.
                                                  covariance(Xi ,Xj) = 0             i ≠ j
7-      Number of observations must be greater than number of parameters to be estimated.
                                              
For example, we have the data set for a store who sells cold drinks. We need to find out the relation between the demand of cold drink (dependent variable) and the price of cold drink & advertisement expenses done (independent variable).
Week
Demand (Q)
Price (P)
Ad Expenses (A)
1
51345
2.78
4280
2
50337
2.35
3875
3
86732
3.22
12360
4
118117
1.85
19250
5
48024
2.65
6450
6
97375
2.95
8750
7
75751
2.86
9600
 8
78797
3.35
9600
9
59856
3.45
9600
10
23696
3.25
6250
11
61385
3.21
4780
12
63750
3.02
6770
13
60996
3.16
6325
14
84276
2.95
9655
15
54222
2.65
10450
16
58131
3.24
9750
17
55398
3.55
11500
18
69943
3.75
8975
19
79785
3.85
8975
20
38892
3.76
6755
21
43240
3.65
5500
22
52078
3.58
4365
23
11321
3.78
9525
24
73113
3.75
18600
25
79988
3.22
14450
26
98311
3.42
15500
27
78953
2.27
21225
28
52875
3.78
7580
29
81263
3.95
4175
30
67260
3.52
4365
31
83323
3.45
12250
32
68322
3.92
11850
33
71925
4.05
14360
34
29372
4.01
9540
35
21710
3.68
7250
36
37833
3.62
4280
37
41154
3.57
13800
38
50925
3.65
15300
39
57657
3.89
5250
40
52036
3.86
7650
41
58677
3.95
6650
42
73902
3.91
9850
43
55327
3.88
8350
44
16262
4.12
10250
45
38348
3.94
16450
46
29810
4.15
13200
47
69613
4.12
14600
48
45822
4.16
13250
49
43207
4
18450
50
81998
3.93
16500
51
46756
3.89
6500
52
34593
3.83
5650


Performing multiple regression in Excel
      1-      Click on the data tab. Then click on the menu Data Analysis at the corner. Then a pop up window will open. Now select Regression from this pop up window. And click in OK.

       2-      Now a pop up window will open asking for inputs. Now click on arrow sign to select the X range (Regressor) and Y range.



 
      3-      You can select any of the parameters like confidence interval (by default it is 95%) or any other as per your requirement.



      4-      Now select the output range. You have options to get output in same work sheet, new worksheet and new workbook. Here we want to get output in same worksheet and selected a range. As you can see in above pic. Now press OK to get output.

 
Interpretation of regression output:
Regression Statistics table:
Multiple R – is the coefficient of multiple correlation. You may be knowing the bivariate correlation. Multiple correlation can be interpreted in the same way. R is a measure of the strength of the association between the independent variables and the one dependent variable.
                                              

R Square – is the square of Multiple R. It is called as multiple coefficient of determination. It is useful to explain that how much of total variation can be explained by the predicted model. It is measure of goodness of fit of predicted model.
In our case R square = 0.263782. It means that 26.37% of the variation (variation of Y around its mean) can be explained by the model (by independent variables).
Unfortunately, in case of multiple regression R square is not the unbiased estimation of coefficient of determination. Adjusted R square is a better estimate for coefficient of determination in case of multiple regression.
Adjusted R Square - It indicates that how much of total variation can be explained by the model, adjusted for the degree of freedom. It should be used in place of R square for multiple regression
                                                     

Where k = the number of variables in the model (Independent as well as dependent)  and n = the number of data elements in the sample.
In our case, k=3 (2 IV, and 1DV) , and n =52
Standard Error - refers to the estimated standard deviation of the error term u. Sometimes it is called as standard error of the regression.
                                                  Standard Error = sqrt(Square sum of error ∕(n-k))

ANOVA Table:
Sum Squares (SS) corresponding to regression is called Explained Sum Square (ESS) or Sum Square Regression (SSR).
Sum Squares (SS) corresponding to residual is called Residual Sum Square (RSS) or Sum Square Error (SSE).
Total Sum Square (TSS) = ESS + RSS
Mean Sum Square = SS/df
Significance of F – indicates weather value of R square is significant or not i.e. significance of measure of goodness of fit.


Coefficients - gives the least squares estimates of βj.
Standard error- gives the standard errors of the least squares estimates bj of βj.
t Stat-  gives the computed t-statistic for hypotheses H0: βj = 0 against Ha: βj ≠ 0. It checks for the significance of the parameter βj values. Also called as significance test.
T stat = Coefficients / Standard error. It is compared to a t value from t-table with df degrees of freedom and a particular confidence interval (1-α).
P-value- gives the p-value for test of H0: βj = 0 against Ha: βj ≠ 0. It is the testing of hypotheses.
As we have taken a confidence interval of 95%. So, if p-value corresponding to a parameter is less than 0.05. Then, we will reject the Null Hypothesis i.e. βj ≠ 0 or is significant.
Note that this p-value is for a two-sided test. For a one-sided test divide this p-value by 2 (also checking the sign of the t-Stat).
Columns Lower 95% and Upper 95% values define a 95% confidence interval for βj.

Estimated Regression Model:

Estimated Industry Demand (Q cap) = 100626.034 -16392.63 * Price (P) + 1.5763295 * Ad Expense (A)

No comments:

Post a Comment