class: center, middle, inverse, title-slide

# Multiple linear regression

### Prof. Maria Tackett

---

class: middle center

## [Click for PDF of slides](19-multiple-regression.pdf)

---

class: center, middle

## Review

---

## Vocabulary

--

- .vocab[Response variable]: Variable whose behavior or variation you are trying to understand.

--

- .vocab[Explanatory variables]: Other variables that you want to use to explain the variation in the response.

--

- .vocab[Predicted value]: Output of the **model function**
    - The model function gives the typical value of the response variable *conditional on* the explanatory variables.

--

- .vocab[Residuals]: How far each case is from its predicted value
    - **Residual = Observed value - Predicted value**

---

## The linear model with a single predictor

- We're interested in `\(\beta_0\)` (the population parameter for the intercept) and `\(\beta_1\)` (the population parameter for the slope) in the following model:

$$ \hat{y} = \beta_0 + \beta_1~x $$

--

- Unfortunately, we can't obtain these population values

- So we use sample statistics to estimate them:

$$ \hat{y} = b_0 + b_1~x $$

---

## Least squares regression

The regression line minimizes the sum of squared residuals.

- .vocab[Residuals]: `\(e_i = y_i - \hat{y}_i\)`

- The regression line minimizes `\(\sum_{i = 1}^n e_i^2\)`.

- Equivalently, it minimizes `\(\sum_{i = 1}^n [y_i - (b_0 + b_1~x_i)]^2\)`

---

## Data and Packages

```r
library(tidyverse)
library(broom)
```

```r
paris_paintings <- read_csv("data/paris_paintings.csv",
                            na = c("n/a", "", "NA"))
```

- [Paris Paintings Codebook](https://sta199-fa20-003.netlify.app/slides/lec-slides/paris_codebook.html)

- Source: Printed catalogues from 28 auction sales held in Paris, 1764 - 1780

- 3,393 paintings, their prices, descriptive details, and characteristics of the auction and buyer (over 60 variables)

---

## Single numerical predictor

```r
m_ht_wd <- lm(Height_in ~ Width_in, data = paris_paintings)
tidy(m_ht_wd)
```

```
## # A tibble: 2 x 5
##   term        estimate std.error statistic  p.value
##   <chr>          <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)    3.62    0.254        14.3 8.82e-45
## 2 Width_in       0.781   0.00950      82.1 0.
```

`$$\widehat{Height_{in}} = 3.62 + 0.78~Width_{in}$$`
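---

## Least squares estimates by hand

The estimates reported by `lm()` can be recovered from summary statistics: `\(b_1 = \text{cov}(x, y) / \text{var}(x)\)` and `\(b_0 = \bar{y} - b_1 \bar{x}\)`. Below is a minimal sketch, assuming `paris_paintings` is loaded as above; `paintings_hw` is just a temporary name for the rows with non-missing heights and widths (the same rows `lm()` uses).

```r
# Least squares estimates computed directly from summary statistics
paintings_hw <- paris_paintings %>%
  filter(!is.na(Height_in), !is.na(Width_in))

b1 <- cov(paintings_hw$Height_in, paintings_hw$Width_in) /
  var(paintings_hw$Width_in)                                   # slope
b0 <- mean(paintings_hw$Height_in) - b1 * mean(paintings_hw$Width_in)  # intercept

c(b0 = b0, b1 = b1)  # should match the estimates in tidy(m_ht_wd)
```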
---

## Single categorical predictor (2 levels)

```r
m_ht_lands <- lm(Height_in ~ factor(landsALL), data = paris_paintings)
tidy(m_ht_lands)
```

```
## # A tibble: 2 x 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)          22.7      0.328      69.1 0.
## 2 factor(landsALL)1    -5.65     0.532     -10.6 7.97e-26
```

`$$\widehat{Height_{in}} = 22.68 - 5.65~landsALL$$`

---

## Single categorical predictor (> 2 levels)

```r
m_ht_sch <- lm(Height_in ~ school_pntg, data = paris_paintings)
tidy(m_ht_sch)
```

```
## # A tibble: 7 x 5
##   term            estimate std.error statistic p.value
##   <chr>              <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)        14.       10.0      1.40  0.162
## 2 school_pntgD/FL     2.33     10.0      0.232 0.816
## 3 school_pntgF       10.2      10.0      1.02  0.309
## 4 school_pntgG        1.65     11.9      0.139 0.889
## 5 school_pntgI       10.3      10.0      1.02  0.306
## 6 school_pntgS       30.4      11.4      2.68  0.00744
## 7 school_pntgX        2.87     10.3      0.279 0.780
```

.tiny[
`$$\widehat{Height_{in}} = 14 + 2.33~sch_{D/FL} + 10.2~sch_F + \\ 1.65~sch_G + 10.3~sch_I + 30.4~sch_S + 2.87~sch_X$$`
]

---

class: center, middle

## The linear model with multiple predictors

---

## The linear model with multiple predictors

- Population model:

$$ \hat{y} = \beta_0 + \beta_1~x_1 + \beta_2~x_2 + \cdots + \beta_k~x_k $$

--

- Sample model that we use to estimate the population model:

$$ \hat{y} = b_0 + b_1~x_1 + b_2~x_2 + \cdots + b_k~x_k $$

---

## Data

The data set contains prices for Porsche and Jaguar cars for sale on cars.com.

.vocab[`car`]: car make (Jaguar or Porsche)

.vocab[`price`]: price in USD

.vocab[`age`]: age of the car in years

.vocab[`mileage`]: previous miles driven

---

## Price, age, and make

<img src="19-multiple-regression_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" />

---

## Price vs. age and make

.question[
Does the relationship between age and price depend on the make of the car?
]

<img src="19-multiple-regression_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />

---

## Modeling with main effects

```r
m_main <- lm(price ~ age + car, data = sports_car_prices)
m_main %>%
  tidy() %>%
  select(term, estimate)
```

```
## # A tibble: 3 x 2
##   term        estimate
##   <chr>          <dbl>
## 1 (Intercept)   44310.
## 2 age           -2487.
## 3 carPorsche    21648.
```

--

.midi[
$$ \widehat{price} = 44310 - 2487~age + 21648~carPorsche $$
]

---

.alert[
$$ \widehat{price} = 44310 - 2487~age + 21648~carPorsche $$
]

- Plug in 0 for `carPorsche` to get the linear model for Jaguars.

--

`$$\begin{align}\widehat{price} &= 44310 - 2487~age + 21648 \times 0\\ &= 44310 - 2487~age\\\end{align}$$`

--

- Plug in 1 for `carPorsche` to get the linear model for Porsches.

--

`$$\begin{align}\widehat{price} &= 44310 - 2487~age + 21648 \times 1\\ &= 65958 - 2487~age\\\end{align}$$`

---

.alert[

**Jaguar**

`$$\begin{align}\widehat{price} &= 44310 - 2487~age + 21648 \times 0\\ &= 44310 - 2487~age\\\end{align}$$`

**Porsche**

`$$\begin{align}\widehat{price} &= 44310 - 2487~age + 21648 \times 1\\ &= 65958 - 2487~age\\\end{align}$$`

]

- The rate of change in price as the age of the car increases does not depend on the make of the car (.vocab[same slopes])

- Porsches are consistently more expensive than Jaguars (.vocab[different intercepts])

---

## Interpretation of main effects

<img src="19-multiple-regression_files/figure-html/unnamed-chunk-10-1.png" style="display: block; margin: auto;" />

---

## Main effects

```
## # A tibble: 3 x 2
##   term        estimate
##   <chr>          <dbl>
## 1 (Intercept)   44310.
## 2 age           -2487.
## 3 carPorsche    21648.
```

--

- **All else held constant**, for each additional year of a car's age, the price of the car is predicted to decrease, on average, by $2,487.

--

- **All else held constant**, Porsches are predicted, on average, to have a price that is $21,648 greater than Jaguars.

--

- Jaguars that are new (age = 0) are predicted, on average, to have a price of $44,310.
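---

## Predicted prices from the main-effects model

The "plug in 0 or 1" calculations above can also be done with `predict()`. A minimal sketch, assuming `m_main` has been fit as above; the age of 5 years is just an illustrative value.

```r
# Predicted price of a 5-year-old car of each make, from the main-effects model
new_cars <- tibble(age = c(5, 5),
                   car = c("Jaguar", "Porsche"))
predict(m_main, newdata = new_cars)
```

The two predictions should differ by the `carPorsche` coefficient, since this model assumes the same slope for both makes.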
---

.question[
Why is our linear regression model different from what we got from `geom_smooth(method = "lm")`?
]

<img src="19-multiple-regression_files/figure-html/unnamed-chunk-15-1.png" style="display: block; margin: auto;" />

---

## What went wrong?

--

- .vocab[`car`] is the only variable in our model that affects the intercept.

--

- The model we specified assumes Jaguars and Porsches have the .vocab[same slope] and .vocab[different intercepts].

--

- What is the most appropriate model for these data?
    - same slope and intercept for Jaguars and Porsches?
    - same slope and different intercept for Jaguars and Porsches?
    - different slope and different intercept for Jaguars and Porsches?

---

## Interacting explanatory variables

- Including an interaction effect in the model allows for different slopes, i.e. nonparallel lines.

- This means that the relationship between an explanatory variable and the response depends on another explanatory variable.

- We can accomplish this by adding an .vocab[interaction variable], which is the product of two explanatory variables.

---

## Price vs. age and car interacting

.midi[

```r
ggplot(data = sports_car_prices,
       mapping = aes(y = price, x = age, color = car)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Age (years)", y = "Price (USD)", color = "Car Make")
```

<img src="19-multiple-regression_files/figure-html/unnamed-chunk-16-1.png" style="display: block; margin: auto;" />
]

---

## Modeling with interaction effects

```r
m_int <- lm(price ~ age + car + age * car,
            data = sports_car_prices)
m_int %>%
  tidy() %>%
  select(term, estimate)
```

```
## # A tibble: 4 x 2
##   term           estimate
##   <chr>             <dbl>
## 1 (Intercept)      56988.
## 2 age              -5040.
## 3 carPorsche        6387.
## 4 age:carPorsche    2969.
```

`$$\widehat{price} = 56988 - 5040~age + 6387~carPorsche + 2969~age \times carPorsche$$`

---

## Interpretation of interaction effects

.alert[
`$$\widehat{price} = 56988 - 5040~age + 6387~carPorsche + 2969~age \times carPorsche$$`
]

--

- Plug in 0 for `carPorsche` to get the linear model for Jaguars.

`$$\begin{align}\widehat{price} &= 56988 - 5040~age + 6387~carPorsche + 2969~age \times carPorsche \\ &= 56988 - 5040~age + 6387 \times 0 + 2969~age \times 0\\ &\color{#00b3b3}{= 56988 - 5040~age}\end{align}$$`

--

- Plug in 1 for `carPorsche` to get the linear model for Porsches.

`$$\begin{align}\widehat{price} &= 56988 - 5040~age + 6387~carPorsche + 2969~age \times carPorsche \\ &= 56988 - 5040~age + 6387 \times 1 + 2969~age \times 1\\ &\color{#00b3b3}{= 63375 - 2071~age}\end{align}$$`

---

## Interpretation of interaction effects

.alert[

**Jaguar**

`$$\widehat{price} = 56988 - 5040~age$$`

**Porsche**

`$$\widehat{price} = 63375 - 2071~age$$`

]

- The rate of change in price as the age of the car increases depends on the make of the car (.vocab[different slopes]).

- Porsches are consistently more expensive than Jaguars (.vocab[different intercepts]).

---

<img src="19-multiple-regression_files/figure-html/unnamed-chunk-17-1.png" style="display: block; margin: auto;" />

`$$\widehat{price} = 56988 - 5040~age + 6387~carPorsche + 2969~age \times carPorsche$$`
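---

## Reading the slopes off the interaction model

The Jaguar slope is the `age` coefficient, and the Porsche slope adds the interaction coefficient to it. A minimal sketch of pulling these values out of the `tidy()` output, assuming `m_int` has been fit as above:

```r
# Slope of age for each make, from the interaction model
coefs <- tidy(m_int)
slope_jaguar  <- coefs$estimate[coefs$term == "age"]
slope_porsche <- slope_jaguar +
  coefs$estimate[coefs$term == "age:carPorsche"]

c(Jaguar = slope_jaguar, Porsche = slope_porsche)  # approx. -5040 and -2071
```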
---

## Continuous by continuous interactions

- Interpretation becomes trickier

- Slopes are conditional on the values of the other explanatory variables

--

## Third order interactions

- Can you? Yes

- Should you? Probably not if you want to interpret these interactions in the context of the data.

---

class: center, middle

## Assessing quality of model fit

---

## Assessing the quality of the fit

- The strength of the fit of a linear model is commonly evaluated using `\(R^2\)`.

- It tells us what percentage of the variability in the response variable is explained by the model. The remainder of the variability is unexplained.

- `\(R^2\)` is sometimes called the **coefficient of determination**.

.question[
What does "explained variability in the response variable" mean?
]

---

## Obtaining `\(R^2\)` in R

`price` vs. `age` and `make`

```r
glance(m_main)
```

```
## # A tibble: 1 x 12
##   r.squared adj.r.squared  sigma statistic  p.value    df logLik   AIC   BIC
##       <dbl>         <dbl>  <dbl>     <dbl>    <dbl> <dbl>  <dbl> <dbl> <dbl>
## 1     0.607         0.593 11848.      44.0 2.73e-12     2  -646. 1301. 1309.
## # … with 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
```

```r
glance(m_main)$r.squared
```

```
## [1] 0.6071375
```

--

About 60.7% of the variability in the price of used cars can be explained by age and make.

---

## `\(R^2\)`

```r
glance(m_main)$r.squared # model with main effects
```

```
## [1] 0.6071375
```

```r
glance(m_int)$r.squared # model with main effects + interactions
```

```
## [1] 0.6677881
```

--

- The model with interactions has a higher `\(R^2\)`.

--

- Using `\(R^2\)` for model selection in models with multiple explanatory variables is **<u>not</u>** a good idea, as `\(R^2\)` increases when **any** variable is added to the model.

---

## `\(R^2\)` - first principles

- We can write the explained variation using the following ratio of sums of squares:

$$ R^2 = 1 - \left( \frac{ \text{variability in residuals}}{ \text{variability in response} } \right) $$

.question[
Why does this expression make sense?
]

- But remember, adding **any** explanatory variable will always increase `\(R^2\)`

---

## Adjusted `\(R^2\)`

.alert[
$$ R^2_{adj} = 1 - \left( \frac{ \text{variability in residuals}}{ \text{variability in response} } \times \frac{n - 1}{n - k - 1} \right)$$

where `\(n\)` is the number of observations and `\(k\)` is the number of predictors in the model.
]

--

- Adjusted `\(R^2\)` doesn't increase if the new variable does not provide any new information or is completely unrelated.

--

- This makes adjusted `\(R^2\)` a preferable metric for model selection in multiple regression models.

---

## Comparing models

.pull-left[

```r
glance(m_main)$r.squared
```

```
## [1] 0.6071375
```

```r
glance(m_int)$r.squared
```

```
## [1] 0.6677881
```
]

.pull-right[

```r
glance(m_main)$adj.r.squared
```

```
## [1] 0.5933529
```

```r
glance(m_int)$adj.r.squared
```

```
## [1] 0.649991
```
]

---

## In pursuit of Occam's Razor

- Occam's Razor states that among competing hypotheses that predict equally well, the one with the fewest assumptions should be selected.

- Model selection follows this principle.

- We only want to add another variable to the model if the addition of that variable brings something valuable in terms of predictive power to the model.

- In other words, we prefer the simplest best model, i.e. the most .vocab[parsimonious] model.
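---

## Computing adjusted `\(R^2\)` directly

To connect the adjusted `\(R^2\)` formula to the `glance()` output, we can compute it from the residuals of the main-effects model. A minimal sketch, assuming `m_main` has been fit as above; `k = 2` because the model has two predictors, `age` and `carPorsche`.

```r
# Adjusted R^2 for m_main, computed from sums of squares
y <- model.response(model.frame(m_main))  # observed prices used in the fit
n <- length(y)
k <- 2                                    # number of predictors

ss_resid <- sum(resid(m_main)^2)          # variability in residuals
ss_total <- sum((y - mean(y))^2)          # variability in response

1 - (ss_resid / ss_total) * (n - 1) / (n - k - 1)
# should match glance(m_main)$adj.r.squared
```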