Simple Linear Regression: Not So Simple

Sukant Ranjan
5 min read · Jun 30, 2021
Simple Linear Regression

Simple linear regression predicts a quantitative response Y on the basis of a single predictor X, assuming a linear relationship between X and Y:

Y ≈ β0 + β1X, which we describe as regressing Y on X.

Here, β0 and β1 are two unknown constants representing the intercept and slope. Together they are known as the model coefficients or parameters. Once we have coefficient estimates β̂0 and β̂1, we can predict the response for a given x:

ŷ = β̂0 + β̂1x

Our goal is to obtain coefficient estimates β̂0 and β̂1 such that the linear model fits the available data well, i.e. yi ≈ β̂0 + β̂1xi for i = 1, . . . , n.

In other words, we need to find an intercept β̂0 and a slope β̂1 such that the resulting line is as close as possible to the n data points. There are a number of ways to measure this closeness. However, the most common approach involves minimizing the least squares criterion.

Let ŷi = β̂0 + β̂1xi be the prediction for the ith observation. Then we define the ith residual as ei = yi − ŷi.

Then the residual sum of squares (RSS) is: RSS = e1² + e2² + … + en²

or equivalently: RSS = (y1 − β̂0 − β̂1x1)² + (y2 − β̂0 − β̂1x2)² + … + (yn − β̂0 − β̂1xn)²

Minimizing the RSS with respect to β̂0 and β̂1, we get the least squares coefficient estimates:

β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
β̂0 = ȳ − β̂1x̄

where x̄ and ȳ are the sample means of x and y.
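These closed-form estimates are easy to compute directly. A minimal sketch in NumPy, using a small hypothetical dataset:

```python
import numpy as np

# Hypothetical data with a roughly linear trend
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

x_bar, y_bar = x.mean(), y.mean()

# Closed-form least squares estimates
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
beta0_hat = y_bar - beta1_hat * x_bar

# Residuals and the residual sum of squares
residuals = y - (beta0_hat + beta1_hat * x)
rss = np.sum(residuals ** 2)

print(beta0_hat, beta1_hat, rss)
```

The same fit can be obtained from `np.polyfit(x, y, 1)` or a library such as statsmodels; the explicit formulas are shown here to mirror the derivation above.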

The estimates we get for β0 and β1 are based on a sample dataset. However, if we could average the estimates obtained from a huge number of samples, we would probably recover the population β0 and β1, and eventually the population regression line, in the same way that we estimate a population mean by averaging sample means.

Similarly, how far a single estimate μ̂ of the sample mean will be from the population mean μ can be quantified by computing its standard error (SE):

Var(μ̂) = SE(μ̂)² = σ²/n

where σ is the population standard deviation and n is the number of observations.

In the same way, we can compute how far single estimates β̂0 and β̂1 will be from the population β0 and β1:

SE(β̂0)² = σ² [1/n + x̄² / Σ(xi − x̄)²],  SE(β̂1)² = σ² / Σ(xi − x̄)²

where σ² = Var(e).

The standard errors above are useful for computing confidence intervals. For example, a 95% confidence interval for β1 is approximately β̂1 ± 2 · SE(β̂1).
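A sketch of the standard errors and the approximate 95% interval in NumPy. The data are hypothetical; since σ² is unknown in practice, it is replaced by its usual estimate RSS/(n − 2):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar

# Estimate sigma^2 = Var(e) from the residuals
rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
sigma2_hat = rss / (n - 2)

se_beta1 = np.sqrt(sigma2_hat / sxx)
se_beta0 = np.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / sxx))

# Approximate 95% confidence interval for beta1
ci_low, ci_high = beta1_hat - 2 * se_beta1, beta1_hat + 2 * se_beta1
print(se_beta0, se_beta1, (ci_low, ci_high))
```

If the interval excludes 0, the data already suggest a relationship between X and Y, which leads naturally to the formal hypothesis test below.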

Standard errors can also be used to perform hypothesis tests. In the case of simple linear regression, the null and alternative hypotheses are:

H0: There is no relationship between X and Y

H1: There is some relationship between X and Y

Mathematically, this corresponds to testing:

H0: β1 = 0 versus H1: β1 ≠ 0, since if β1 = 0 then Y = β0 + e and Y has no relationship with X.

So β̂1 should be far from 0. When SE(β̂1) is very small, even a small estimated β̂1 provides evidence that Y and X are related; in contrast, when SE(β̂1) is large, the estimated β̂1 must be correspondingly large in magnitude in order to reject the null hypothesis.

In practice, we compute the t-statistic t = (β̂1 − 0)/SE(β̂1), which measures how many standard errors β̂1 is away from 0 and follows a t-distribution with n − 2 degrees of freedom under the null hypothesis.

Consequently, we can compute the probability of observing any value equal to |t| or larger, assuming β1 = 0. This probability is called the p-value. In other words, a small p-value indicates that it is unlikely we would observe such a substantial association between X and Y purely by chance, in the absence of any real association between them. Hence, if we observe a small p-value, we can infer that there is an association between X and Y, and we reject the null hypothesis.
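The t-statistic and its two-sided p-value can be sketched as follows, using SciPy's t-distribution for the tail probability (the data are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

x_bar = x.mean()
sxx = np.sum((x - x_bar) ** 2)
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / sxx
beta0_hat = y.mean() - beta1_hat * x_bar

rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
se_beta1 = np.sqrt(rss / (n - 2) / sxx)

# t-statistic for H0: beta1 = 0
t_stat = (beta1_hat - 0) / se_beta1
# Two-sided p-value: probability of observing |t| or larger under H0,
# from a t-distribution with n - 2 degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
print(t_stat, p_value)
```

A library routine such as `scipy.stats.linregress` reports the same p-value directly; the explicit computation is shown to match the formula above.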

Interpreting the coefficient table from a fitted linear regression model, if the Std. Error is small and the t-statistic is correspondingly high, with a p-value of almost 0, we can say that Y is associated with X and reject the null hypothesis.

Once we have rejected the null hypothesis, we proceed with an assessment of the underlying model. A linear regression model is assessed using two related quantities:

  1. Residual Standard Error or RSE
  2. R² Statistic

As we know, there is an error term e associated with each observation. Due to the presence of these error terms, even if we knew the true coefficients β0 and β1, we could not perfectly predict Y from X. The RSE is the estimate of the standard deviation of the error term e:

RSE = √( RSS / (n − 2) )

Roughly speaking, it is the average amount by which the response deviates from the regression line.

Since the RSE is measured in the units of Y, it is not always clear what constitutes a good RSE.

The R² statistic, by contrast, measures the proportion of variance explained and always takes a value between 0 and 1.

R² is calculated as:

R² = (TSS − RSS) / TSS = 1 − RSS/TSS

TSS = Σ(yi − ȳ)² is the total sum of squares and represents the total variance in the response Y; it can be thought of as the amount of variability inherent in the response before the regression is performed.
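Both assessment quantities are straightforward to compute once the line is fitted. A sketch with the same hypothetical data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

x_bar = x.mean()
beta1_hat = np.sum((x - x_bar) * (y - y.mean())) / np.sum((x - x_bar) ** 2)
beta0_hat = y.mean() - beta1_hat * x_bar

rss = np.sum((y - (beta0_hat + beta1_hat * x)) ** 2)
tss = np.sum((y - y.mean()) ** 2)

rse = np.sqrt(rss / (n - 2))  # residual standard error, in the units of y
r2 = 1 - rss / tss            # proportion of variance explained
print(rse, r2)
```

An R² close to 1 indicates that the regression explains most of the variability in the response; an R² close to 0 indicates it explains very little.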
