Announcements

Questions from video?

Resource for full examples in infer

https://infer.netlify.app/articles/observed_stat_examples.html

From last class

library(tidyverse)
library(infer)
asheville <- read_csv("data/asheville.csv")
## Parsed with column specification:
## cols(
##   ppg = col_double()
## )

Suppose you are interested in whether the mean price per guest per night is actually less than $80. Conduct a hypothesis test to assess this claim.

Hypotheses

\(H_0\): The mean price per guest per night is $80

\(H_a\): The mean price per guest per night is less than $80

\(H_0: \mu = 80\)

\(H_a: \mu < 80\)

Simulate null distribution

set.seed(092320)
null_dist <- asheville %>%
  specify(response = ppg) %>%
  hypothesize(null = "point", mu = 80) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean")
mean_ppg <- asheville %>%
  summarise(mean_ppg = mean(ppg)) %>%
  pull()

Visualize Null distribution using ggplot

ggplot(data = null_dist, aes(x = stat)) +
  geom_histogram(alpha = 0.8) + 
  geom_vline(xintercept = mean_ppg, color = "red")

Visualize null distribution using infer

visualize(null_dist) +
  shade_p_value(obs_stat = mean_ppg, direction = "less")

Calculate p-value

null_dist %>%
  filter(stat <= mean_ppg) %>%
  summarise(p_value = n() / nrow(null_dist))
## # A tibble: 1 x 1
##   p_value
##     <dbl>
## 1   0.316

Conclusion

The p-value 0.312 is greater than \(\alpha = 0.05\), so we fail to reject the null hypothesis. The data do not provide sufficient evidence that the mean price per night is less than $80.

Clone a repo + start a new project

Clone the ae-12 repo on GitHub and start a new project in RStudio. Be sure to configure git in the RStudio console, so you can so you can push your results back up to GitHub.

Exercise 1

Suppose you are interested in whether at least half of the Airbnb listings in Asheville are more than $50 per guest per night. What would be your null and alternative hypotheses?

Option 1

  • Null: The median price per guest per night is 50
  • Alternative: The median price per guest per night is > 50

Option 2

  • Null: The proportion of Airbnbs less than $50 ppg is 0.5
  • Alternative: The proportion of Airbnbs less than $50 ppg is less than 0.5

Exercise 2

Simulate the null distribution to test your hypotheses. You can use 100 reps for the in-class exercise.

Option 1

set.seed(1234)
null_dist_opt1 <- asheville %>%
  specify(response = ppg) %>%
  hypothesize(null = "point", med = 50) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "median")

Option 2

#create variable to track price
asheville <- asheville %>%
  mutate(less_50_ppg = if_else(ppg < 50, "Yes", "No"))
set.seed(1234)
null_dist_opt2 <- asheville %>%
  specify(response = less_50_ppg, success = "Yes") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000, type = "simulate") %>%
  calculate(stat = "prop")

Exercise 3

What was your p-value? What decision do you make with respect to your hypotheses, and what conclusion do you make in the context of the research problem?

Option 1

#calculate observed statistic
obs_med <- asheville %>%
  summarise(med_ppg = median(ppg)) %>%
  pull()

obs_med
## [1] 62.5
null_dist_opt1 %>%
  filter(stat >= obs_med) %>%
  summarise(p_val = n() / nrow(null_dist_opt1))
## # A tibble: 1 x 1
##   p_val
##   <dbl>
## 1  0.06

The p-value is 0.06 which is greater than \(\alpha = 0.05\), so we fail to reject the null. There is not sufficient evidence to conclude that at least half of the hotels have a price per guest per night greater than 50.

Option 2

#calculate observed statistic
obs_prop <- asheville %>%
  count(less_50_ppg) %>%
  mutate(prop = n/sum(n)) %>%
  filter(less_50_ppg == "Yes") %>%
  pull()

obs_prop
## [1] 0.38
null_dist_opt2 %>%
  filter(stat <=  obs_prop) %>%
  summarise(p_val = n() / nrow(null_dist_opt2))
## # A tibble: 1 x 1
##   p_val
##   <dbl>
## 1  0.06

The p-value of 0.064 is greater than \(\alpha = 0.05\), so we fail to reject the null. There is not sufficient evidence to conclude that at least half of the hotels have a price per guest per night greater than 50.

Exercise 4

Suppose you are interested in whether the proportion of listings with a price per guest per night greater than $50 is 0.5. How would your null and alternative hypotheses change in this case? Carry out the appropriate hypothesis test, and report your p-value, decision, and conclusion in context of the research problem.

Null: The proportion of Airbnbs greater than $50 ppg is 0.5 Alternative: The proportion of Airbnbs greater than $50 ppg is not equal to 0.5

set.seed(1234)
null_dist_ex4 <- asheville %>%
  specify(response = less_50_ppg, success = "No") %>%
  hypothesize(null = "point", p = 0.5) %>%
  generate(reps = 1000, type = "simulate") %>%
  calculate(stat = "prop")
#calculate observed statistic
obs_prop_ex4 <- asheville %>%
  count(less_50_ppg) %>%
  mutate(prop = n/sum(n)) %>%
  filter(less_50_ppg == "No") %>%
  pull()

obs_prop_ex4
## [1] 0.62
null_dist_ex4 %>%
  filter(stat >= obs_prop_ex4 | stat <= 0.5 - (obs_prop_ex4 - 0.5)) %>%
  summarise(p_val = n() / nrow(null_dist_ex4))
## # A tibble: 1 x 1
##   p_val
##   <dbl>
## 1 0.115

The p-value of 0.114 is greater than 0.05, so we fail to reject the null hypothesis. The data do not provide sufficient evidence to refute the claim that the proportion of Airbnbs with price per guest per night greater than $50 is different from 0.5.