Introduction and Data

The goal of today’s lab is to practice statistical inference using both simulation procedures and the Central Limit Theorem. Keep the team workflow in mind when completing the assignment and keep in mind that all lab members must meaningfully contribute to all lab components.

The data for today’s lab may be found by cloning your repository available at the class GitHub repository.

The dataset is adapted from Little et al. (2007), and contains voice measurements from individuals both with and without Parkinson’s Disease (PD), a progressive neurological disorder that affects the motor system. The aim of Little et al.’s study was to examine whether they could diagnose PD by examining the spectral (sound-wave) properties of patients’ voices.

147 measurements were taken from patients with PD, and 48 measurements were taken from healthy controls. For the purposes of this lab, you may assume that measurements are representative of the underlying populations (PD vs. healthy).

The variables in the dataset are as follows:

clip: ID of the recording number
jitter: a measure of variation in fundamental frequency
shimmer: a measure of variation in amplitude
hnr: a ratio of total components vs. noise in the voice recording
status: PD vs. Healthy
avg.f.q: 1, 2, or 3, corresponding to average vocal fundamental frequency (1 = low, 2 = mid, 3 = high)

You may load in the data with the following code, where ____ should be replaced by a meaningful name of your choosing:

library(tidyverse)
____ <- read_csv("data/parkinsons.csv")

Exercises

NOTES

To write , type in $\mu$ in the narrative. To write , type in $\alpha$ in your narrative. To write , type in $\neq$ in your narrative.
At that start of each exercise that requires simulation, set a random seed equal to the Exercise number in the R chunk.

Is there enough evidence to suggest that HNR in the voice recordings of the healthy volunteers is significantly different from 25 at the = 0.05 significance level? Conduct this hypothesis test using a simulation method.

Write out the null and alternative hypotheses for this question in both words and symbols.
Display a visualization of your simulated null distribution, and describe the values that would cause you to reject your null hypothesis (called the rejection region). Does our observed sample mean lie in this rejection region?
What is your p-value, decision, and conclusion in context of the research question?
Given your conclusion in Exercise 3, which type of error could you possibly have made? What would making such an error mean in the context of the research question?

Researchers suspect that patients with PD are less able to control their vocal muscles, and thus may have a different HNR (tonal component to noise ratio) compared to healthy volunteers. Thus, they are interested in whether the mean HNR in voice recordings among patients with PD is statistically significantly different from 24.7 at the 0.05 significance level. Conduct this hypothesis test using the Central Limit Theorem.

Write out the null and alternative hypotheses for this question in both words and symbols.

⊕Hint: Be careful about which distribution you use to anwer this question.

What is the distribution of the test statistic under the null hypothesis, the test statistic itself, the p-value, decision, and conclusion in context of the research question?
Given your conclusion in Exercise 6, which type of error could you possibly have made? What would making such an error mean in the context of the research question?
Would you expect a 95% confidence interval computed using the same data to contain 24.7 or not? Explain.

Suppose we are now interested in testing whether a correlation exists between voice jitter and voice shimmer among healthy volunteers. Test whether the correlation between these two values is non-zero at the = 0.01 level. Conduct this hypothesis test using a simulation method.

⊕As an aside, correlation is given in symbols by .

Write out the null and alternative hypotheses in words.
Display a visualization of your simulated null distribution, and describe the values that would cause you to reject your null hypothesis. Does the observed sample correlation lie in this rejection region?
What is your p-value, decision, and conclusion in context of the research question?
What is the probability you’ve made a Type 1 error? If you cannot tell for sure, explain why. Similarly, what is the probability you’ve made a Type 2 error? If you cannot tell for sure, explain why.

Lab 06: Hypothesis Testing

due Wed, Oct 7 at 11:59p

Introduction and Data

Exercises

Submission