A private repository has already been created for you.
Go to the course organization on GitHub.
In addition to your private individual repositories, you should now see a repo with the prefix hw-03-. Go to that repository.
Clone the repo and start a new project in RStudio. See Lab 01 for more details about how to clone a repo and start a new project.
In 1975, the Biomedical Department of the Highway Safety Research Institute (HSRI), under the sponsorship of the Consumer Product Safety Commission, completed an anthropometric survey of infants and children.
The goal of the study was to provide child measurement data for design, hazard assessment, and standards for products including car seats, restraints, children’s furniture, playground equipment, toys, bicycles, and more. See here and here for more information (“Anthropometry of Infants, Children, and Youths to Age 18 for Product Safety Design”, 1977).
The file kids.csv contains a subset of these data for 1000 children between the ages of 8 and 16 and includes the following variables:
height: height of the child (centimeters)
sex: sex of the child (1 = male, 2 = female)
age: age of the child (months)
Mutate the sex variable to be a factor with levels “male” and “female” and set “male” to be the reference level.
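As a minimal sketch of the recoding step, assuming the dplyr package is available: the tibble below is a made-up stand-in for kids.csv (the values are illustrative, not survey data), and the object name kids is an assumption.

```r
library(dplyr)

# Toy stand-in for kids.csv: made-up values, NOT the real survey data.
# In the assignment you would read the actual file instead.
kids <- tibble(
  height = c(128.4, 141.0, 155.2, 160.7),
  sex    = c(1, 2, 1, 2),
  age    = c(100, 125, 160, 185)
)

# Map code 1 to "male" and code 2 to "female". The first level listed
# ("male") is the reference level that lm() will use.
kids <- kids %>%
  mutate(sex = factor(sex, levels = c(1, 2), labels = c("male", "female")))

levels(kids$sex)
```

Listing “male” first in labels is what makes it the reference level, so no separate releveling step is needed in this sketch.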
Fit a linear model with height as the response and sex and age as predictors. Display the model output in tidy format and write out the linear model.
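One possible way to fit the model and display it in tidy format is sketched below, assuming the broom package is what “tidy format” refers to. The data are simulated with made-up coefficients so the block runs on its own; the names sim and main_fit are hypothetical.

```r
library(broom)

# Simulated stand-in data (made-up coefficients, NOT the survey data).
set.seed(210)
n <- 200
sim <- data.frame(
  age = runif(n, min = 96, max = 192),  # ages in months (8 to 16 years)
  sex = factor(sample(c("male", "female"), n, replace = TRUE),
               levels = c("male", "female"))
)
sim$height <- 85 + 0.45 * sim$age + rnorm(n, sd = 6)

# Main-effects model: height explained by sex and age.
main_fit <- lm(height ~ sex + age, data = sim)

# One row per term: estimate, std.error, statistic, p.value.
tidy(main_fit)
```

Because “male” is the reference level, the sex term appears in the output as sexfemale.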
Now let’s interpret the terms in the model.
Interpret the coefficients of age and sex in the context of the problem.
Construct a 90% confidence interval for \(\beta_{\text{age}}\) and interpret the interval in the context of the problem.
Report the value of \(R^2\) and interpret it in the context of the problem.
What is the predicted height for a fourteen-and-a-half-year-old male?
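Since age is recorded in months, the age must be converted before predicting: 14.5 years is 14.5 × 12 = 174 months. The sketch below shows the mechanics with base R's predict() on simulated, made-up data, so the printed value is not the answer for the real data; sim, main_fit, and new_child are hypothetical names.

```r
# Simulated stand-in data (made-up coefficients, NOT the survey data).
set.seed(210)
n <- 200
sim <- data.frame(
  age = runif(n, min = 96, max = 192),
  sex = factor(sample(c("male", "female"), n, replace = TRUE),
               levels = c("male", "female"))
)
sim$height <- 85 + 0.45 * sim$age + rnorm(n, sd = 6)

main_fit <- lm(height ~ sex + age, data = sim)

# New observation: convert years to months and keep the same factor levels.
new_child <- data.frame(
  sex = factor("male", levels = c("male", "female")),
  age = 14.5 * 12
)

predict(main_fit, newdata = new_child)
```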
Create an effective, well-labeled scatterplot of height versus age with the points colored by sex. Include a visualization of your linear model from Exercise #2 on this scatterplot.
Fit a linear model with height as the response and sex, age, and their interaction as predictors. Display the model output in tidy format and write out the linear model.
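In R's formula syntax, sex * age expands to both main effects plus their interaction. A minimal sketch, assuming the broom package and simulated data with a made-up interaction effect (sim and int_fit are hypothetical names):

```r
library(broom)

# Simulated stand-in data (made-up coefficients, NOT the survey data).
set.seed(210)
n <- 200
sim <- data.frame(
  age = runif(n, min = 96, max = 192),
  sex = factor(sample(c("male", "female"), n, replace = TRUE),
               levels = c("male", "female"))
)
# Females get a different intercept and slope in this toy example.
sim$height <- 85 + 0.45 * sim$age +
  ifelse(sim$sex == "female", 10 - 0.08 * sim$age, 0) +
  rnorm(n, sd = 6)

# sex * age is shorthand for sex + age + sex:age.
int_fit <- lm(height ~ sex * age, data = sim)

tidy(int_fit)
```

The row labeled sexfemale:age holds the interaction estimate along with its test statistic and p-value.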
Write out the model equations for males and females.
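As a sketch of how the sex-specific equations follow from the interaction model, write the fitted model with symbolic coefficients, where \(\text{female}\) is an indicator equal to 1 for females and 0 for males. The coefficient names below are notation assumed here, chosen to match \(\beta_{\text{age}}\) above.

\[
\widehat{\text{height}} = \hat{\beta}_0 + \hat{\beta}_{\text{female}}\,\text{female} + \hat{\beta}_{\text{age}}\,\text{age} + \hat{\beta}_{\text{female:age}}\,(\text{female} \times \text{age})
\]

Setting \(\text{female} = 0\) gives the equation for males:

\[
\widehat{\text{height}} = \hat{\beta}_0 + \hat{\beta}_{\text{age}}\,\text{age}
\]

Setting \(\text{female} = 1\) and collecting terms gives the equation for females:

\[
\widehat{\text{height}} = (\hat{\beta}_0 + \hat{\beta}_{\text{female}}) + (\hat{\beta}_{\text{age}} + \hat{\beta}_{\text{female:age}})\,\text{age}
\]

The two lines differ in slope only through \(\hat{\beta}_{\text{female:age}}\).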
Is there evidence that the relationship between age and height depends on sex? Answer comprehensively using a formal hypothesis test.
Create an effective, well-labeled scatterplot of height versus age with the points colored by sex. Visualize your linear model from Exercise #10 on this scatterplot.
Create appropriate and effective diagnostic plots to check the conditions for multiple linear regression (the same conditions as for simple linear regression). For each plot, provide a one- or two-sentence comment describing which condition(s) you are checking, what you observe, and whether or not the condition is met.
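Two commonly used diagnostic plots are residuals versus fitted values and a normal quantile-quantile plot of the residuals. The sketch below is one possible approach, assuming the broom and ggplot2 packages; the data are simulated with made-up coefficients, and the object names are hypothetical.

```r
library(broom)
library(ggplot2)

# Simulated stand-in data (made-up coefficients, NOT the survey data).
set.seed(210)
n <- 200
sim <- data.frame(
  age = runif(n, min = 96, max = 192),
  sex = factor(sample(c("male", "female"), n, replace = TRUE),
               levels = c("male", "female"))
)
sim$height <- 85 + 0.45 * sim$age + rnorm(n, sd = 6)

int_fit <- lm(height ~ sex * age, data = sim)

# augment() adds .fitted and .resid columns to the model data.
fit_aug <- augment(int_fit)

# Residuals vs. fitted: look for curvature or a change in spread.
resid_plot <- ggplot(fit_aug, aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.5) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted height (cm)", y = "Residual (cm)",
       title = "Residuals vs. fitted values")

# Normal QQ plot: points close to the line are consistent with normality.
qq_plot <- ggplot(fit_aug, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line() +
  labs(x = "Theoretical quantiles", y = "Sample quantiles",
       title = "Normal QQ plot of residuals")

resid_plot
qq_plot
```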
Briefly describe any potential limitations of this model beyond your comments from Exercise #12.
Knit to PDF to create a PDF document. Stage and commit all remaining changes, and push your work to GitHub. Make sure all files are updated on your GitHub repo.
Please upload only your PDF document to Gradescope. Before you submit the uploaded document, mark where the answer to each exercise is located. If any answer spans multiple pages, mark all of those pages. Make sure to associate the “Overall” section with the first page.