Lab 06: Hypothesis Testing

due Wed, Oct 7 at 11:59p

Introduction and Data

The goal of today’s lab is to practice statistical inference using both simulation procedures and the Central Limit Theorem. Keep the team workflow in mind when completing the assignment and keep in mind that all lab members must meaningfully contribute to all lab components.

The data for today’s lab may be found by cloning your repository available at the class GitHub repository.

The dataset is adapted from Little et al. (2007), and contains voice measurements from individuals both with and without Parkinson’s Disease (PD), a progressive neurological disorder that affects the motor system. The aim of Little et al.’s study was to examine whether they could diagnose PD by examining the spectral (sound-wave) properties of patients’ voices.

147 measurements were taken from patients with PD, and 48 measurements were taken from healthy controls. For the purposes of this lab, you may assume that measurements are representative of the underlying populations (PD vs. healthy).

The variables in the dataset are as follows:

You may load in the data with the following code, where ____ should be replaced by a meaningful name of your choosing:

library(tidyverse)
____ <- read_csv("data/parkinsons.csv")

Exercises


NOTES

Is there enough evidence to suggest that HNR in the voice recordings of the healthy volunteers is significantly different from 25 at the \(\alpha\) = 0.05 significance level? Conduct this hypothesis test using a simulation method.

  1. Write out the null and alternative hypotheses for this question in both words and symbols.

  2. Display a visualization of your simulated null distribution, and describe the values that would cause you to reject your null hypothesis (called the rejection region). Does our observed sample mean lie in this rejection region?

  3. What is your p-value, decision, and conclusion in context of the research question?

  4. Given your conclusion in Exercise 3, which type of error could you possibly have made? What would making such an error mean in the context of the research question?

Researchers suspect that patients with PD are less able to control their vocal muscles, and thus may have a different HNR (tonal component to noise ratio) compared to healthy volunteers. Thus, they are interested in whether the mean HNR in voice recordings among patients with PD is statistically significantly different from 24.7 at the 0.05 significance level. Conduct this hypothesis test using the Central Limit Theorem.

  1. Write out the null and alternative hypotheses for this question in both words and symbols.

Hint: Be careful about which distribution you use to anwer this question.

  1. What is the distribution of the test statistic under the null hypothesis, the test statistic itself, the p-value, decision, and conclusion in context of the research question?

  2. Given your conclusion in Exercise 6, which type of error could you possibly have made? What would making such an error mean in the context of the research question?

  3. Would you expect a 95% confidence interval computed using the same data to contain 24.7 or not? Explain.

Suppose we are now interested in testing whether a correlation exists between voice jitter and voice shimmer among healthy volunteers. Test whether the correlation between these two values is non-zero at the \(\alpha\) = 0.01 level. Conduct this hypothesis test using a simulation method.

As an aside, correlation is given in symbols by \(\rho\).

  1. Write out the null and alternative hypotheses in words.

  2. Display a visualization of your simulated null distribution, and describe the values that would cause you to reject your null hypothesis. Does the observed sample correlation lie in this rejection region?

  3. What is your p-value, decision, and conclusion in context of the research question?

  4. What is the probability you’ve made a Type 1 error? If you cannot tell for sure, explain why. Similarly, what is the probability you’ve made a Type 2 error? If you cannot tell for sure, explain why.

Submission

Knit to PDF to create a PDF document. Knit and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.

Please only upload your PDF document to Gradescope. Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.