See STA 199 Teams to see your team assignment. This will be your team for labs and the final project.
Before you get started on the lab assignment, we will take a few minutes to help you develop a plan for working as a team.
✅ Come up with a team name. I encourage you to be creative! Your TA will get your team name by the end of lab.
✅ Identify something everyone on the team has in common that’s not necessarily in common with everyone else in the class.
✅ Fill out the team agreement. This will help you figure out a plan for working together during labs and outside of lab times. You can find the team agreement in the GitHub repo team-agreement-[github_team_name].
In January 2017, Buzzfeed published an article titled “These Nobel Prize Winners Show Why Immigration Is So Important For American Science”. In the article they explore where many Nobel laureates in the sciences were born and where they lived when they won their prize.
In this lab we will work with the data from this article to recreate some of their visualizations as well as explore new questions.
The learning goals of this lab are:
A repository has already been created for you and your teammates. Everyone in your team has access to the same repo.
Go to the course organization on GitHub.
In addition to your private individual repositories, you should now see a repo named lab-03-nobel-[github_team_name]. Go to that repository.
Each person on the team should clone the repository and open a new project in RStudio. Do not make any changes to the .Rmd file until the instructions tell you to do so.
Assign each person on your team a number 1 through 4. For teams of three, Member 1 can take on the role of Member 4.
The following exercises must be done in order. Only one person should type in the .Rmd file and push updates at a time. When it is not your turn to type, you should still share ideas and contribute to the team’s discussion.
Team Member 1: Change the author to your team name and include each team member’s name in the author field of the YAML in the following format: Team Name: Member 1, Member 2, Member 3, Member 4.
We’ll use the tidyverse package for this analysis. Run the following code in the Console to load this package.
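A minimal version of the load step, assuming the tidyverse is already installed in your R environment:

```r
# Attach the core tidyverse packages (dplyr, ggplot2, readr, forcats, ...).
library(tidyverse)
```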
The dataset for this assignment can be found as a CSV file in the data folder of your repository. You can read it in using the following.
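As a sketch of the read-in step: the file name nobel.csv and the object name nobel below are assumptions, so substitute the actual CSV file you find in the data folder. The existence check is only there so the sketch runs safely outside the lab repository.

```r
library(tidyverse)

# Hypothetical path: replace "nobel.csv" with the real file name in data/.
csv_path <- file.path("data", "nobel.csv")

if (file.exists(csv_path)) {
  # read_csv() returns a tibble; glimpse() lists each variable and its type.
  nobel <- read_csv(csv_path)
  glimpse(nobel)
}
```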
The variable descriptions are as follows:
- id: ID number
- firstname: First name of laureate
- surname: Surname
- year: Year prize won
- category: Category of prize
- affiliation: Affiliation of laureate
- city: City of laureate in prize year
- country: Country of laureate in prize year
- born_date: Birth date of laureate
- died_date: Death date of laureate
- gender: Gender of laureate
- born_city: City where laureate was born
- born_country: Country where laureate was born
- born_country_code: Code of country where laureate was born
- died_city: City where laureate died
- died_country: Country where laureate died
- died_country_code: Code of country where laureate died
- overall_motivation: Overall motivation for recognition
- share: Number of other winners award is shared with
- motivation: Motivation for recognition
In a few cases the name of the city/country changed after the prize was given (e.g. in 1975 Bosnia and Herzegovina was part of the Socialist Federal Republic of Yugoslavia). In these cases the variables below reflect a different name than their counterparts without the suffix _original.
- born_country_original: Original country where laureate was born
- born_city_original: Original city where laureate was born
- died_country_original: Original country where laureate died
- died_city_original: Original city where laureate died
- city_original: Original city where laureate lived at the time of winning the award
- country_original: Original country where laureate lived at the time of winning the award
Team Member 1: Type the team’s responses to exercises 1 and 2.
There are some observations in this dataset that we will exclude from our analysis to match the Buzzfeed results.
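Create a new data frame called nobel_living that filters for
- laureates for whom country is available
- laureates who are people as opposed to organizations (organizations are denoted with "org" as their gender)
- laureates who are still alive (their died_date is NA)
Confirm that once you have filtered for these characteristics you are left with a data frame with 228 observations.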
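The three conditions can be combined in a single filter() call. The sketch below uses a small made-up data frame (nobel_toy, with invented values) rather than the lab data, so treat it as an illustration of the pattern and not as the expected answer.

```r
library(dplyr)

# Toy stand-in for the lab data; all values are invented for illustration.
nobel_toy <- tibble(
  id        = 1:4,
  country   = c("USA", NA, "France", "USA"),
  gender    = c("female", "male", "org", "male"),
  died_date = as.Date(c(NA, NA, NA, "2001-05-01"))
)

nobel_toy_living <- nobel_toy %>%
  filter(
    !is.na(country),   # country is available
    gender != "org",   # laureate is a person, not an organization
    is.na(died_date)   # laureate is still alive
  )

# Check how many rows remain after filtering.
nrow(nobel_toy_living)
```

Separating conditions with commas inside filter() keeps only the rows where all of them hold.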
✅ ⬆️ Team Member 1: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
You can pull by clicking the blue down arrow in the Git pane in RStudio. Once you click to pull, you will see the updates your team member pushed to GitHub in your RStudio project.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 1 and 2.
Team Member 2: It’s your turn! Type the team’s response to exercise 3.
… says the Buzzfeed article. Let’s see if that’s true.
First, we’ll create a new variable to identify whether the laureate was in the US when they won their prize. We’ll use the mutate() function for this. The following pipeline mutates the nobel_living data frame by adding a new variable called country_us. We use an if/else statement to create this variable. The first argument in the if_else() function is the condition we’re testing for. If country is equal to "USA", we set country_us to "USA". If not, we set country_us to "Other".
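The mutate() and if_else() step described above can be sketched as follows. The data frame nobel_toy and its values are made up so the example runs on its own; in the lab you would apply the same mutate() call to nobel_living.

```r
library(dplyr)

# Toy stand-in for nobel_living; values are invented for illustration.
nobel_toy <- tibble(
  surname = c("A", "B", "C"),
  country = c("USA", "Germany", "USA")
)

nobel_toy <- nobel_toy %>%
  mutate(
    # condition, value if TRUE, value if FALSE
    country_us = if_else(country == "USA", "USA", "Other")
  )

nobel_toy$country_us
```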
Note that we can achieve the same result using the fct_other() function (i.e. with country_us = fct_other(country, "USA")); the new variable is then a factor rather than a character vector.
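A small sketch of the fct_other() alternative, using a made-up vector of countries. The explicit keep = argument name and the conversion with factor() are choices made here for clarity, not something the lab requires.

```r
library(forcats)

# Invented example values.
country <- factor(c("USA", "Germany", "USA", "Japan"))

# Keep the "USA" level and lump every other level into "Other".
country_us <- fct_other(country, keep = "USA")

as.character(country_us)
```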
Next, we will limit our analysis to only the following categories: Physics, Medicine, Chemistry, and Economics.
nobel_living_science <- nobel_living %>%
  filter(category %in% c("Physics", "Medicine", "Chemistry", "Economics"))
You will work with the nobel_living_science data frame you created above for the remainder of the lab. This means you’ll need to define this data frame in your R Markdown document.
Hint: You can change the orientation of the bars using the coord_flip() function in ggplot2. See the ggplot2 documentation for coord_flip() to read more about the function.
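To illustrate the hint, here is a generic bar plot with flipped coordinates. The data frame toy and its group variable are invented, and the plot is deliberately unrelated to the lab's variables.

```r
library(ggplot2)

# Invented data: one row per observation.
toy <- data.frame(
  group = c("a", "a", "b", "c", "c", "c")
)

p <- ggplot(toy, aes(x = group)) +
  geom_bar() +     # bar height is the number of rows in each group
  coord_flip()     # swap the axes so the bars run horizontally

p
```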
✅ ⬆️ Team Member 2: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercise 3.
Team Member 3: It’s your turn! Type the team’s response to exercises 4 and 5.
Hint: You should be able to borrow from code you used earlier to create the country_us variable.
Create a new variable called born_country_us that has the value "USA" if the laureate was born in the US, and "Other" otherwise. Be sure to save the variable to the nobel_living_science data frame.
Add a second variable to your visualization from Exercise 3 based on whether the laureate was born in the US or not. Your final visualization should contain a facet for each category, within each facet a bar for whether they won the award in the US or not, and within each bar whether they were born in the US or not. Based on your visualization, do the data appear to support Buzzfeed’s claim? Explain your reasoning in 1-2 sentences.
✅ ⬆️ Team Member 3: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the responses to exercises 4 and 5.
Team Member 4: It’s your turn! Type the team’s response to exercise 6.
Note that your bar plot won’t exactly match the one from the Buzzfeed article. This is likely because the data has been updated since the article was published.
… create a frequency table (with the count function) for their birth country (born_country), and arrange the resulting data frame in descending order of number of observations for each country.
✅ ⬆️ Team Member 4: Knit, commit and push your changes to GitHub with an appropriate commit message again. Make sure to commit and push all changed files so that your Git pane is cleared up afterwards.
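The count-and-arrange step in exercise 6 follows a common dplyr pattern, sketched here on invented data (laureates_toy is hypothetical). Any filtering the exercise calls for would come before count() in the pipeline.

```r
library(dplyr)

# Invented birth countries for illustration.
laureates_toy <- tibble(
  born_country = c("Germany", "United Kingdom", "Germany",
                   "China", "Germany", "China")
)

birth_counts <- laureates_toy %>%
  count(born_country) %>%   # one row per country, with the count in column n
  arrange(desc(n))          # most frequent country first

birth_counts
```

Writing count(born_country, sort = TRUE) is an equivalent shortcut for the last two steps.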
All other team members: Pull to get the updated documents from GitHub. Click on the .Rmd file, and you should see the team’s completed lab!
Go back through your write up to make sure you followed the coding style guidelines we discussed in class (e.g. no long lines of code).
Also, make sure all of your R chunks are properly labeled, and your figures are reasonably sized.
Team Member 2: Make any edits as needed. Then knit, commit, and push the updated documents to GitHub if you made any changes.
All other team members can click to pull the finalized document.
Team Member 3: Upload the team’s PDF to Gradescope. Be sure to include every team member’s name in the Gradescope submission. Associate the “Overall” graded section with the first page of your PDF, and mark where each answer is to the exercises. If any answer spans multiple pages, then mark all pages.
There should only be one submission per team on Gradescope.
The plots in the Buzzfeed article are called waffle plots. You can find the code used for making these plots in Buzzfeed’s GitHub repo (yes, they have one!). You’re not expected to recreate them as part of your assignment, but you’re welcome to do so for fun!
This lab was adapted from Data Science in a Box.