shape:
center: mean (mean), median (median), mode (not always useful)
spread: range (range), standard deviation (sd), inter-quartile range (IQR)
outliers: observations outside of the usual pattern
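The measures listed above can be computed directly in base R. The sketch below uses the ten heights visible in the starwars excerpt rather than the full dataset. The 1.5 × IQR cutoff for flagging outliers is an assumed convention (the one boxplots commonly use), not a definition given on the slide. If a column contains NA values, as birth_year does in the excerpt, these functions need na.rm = TRUE.

```r
# Heights (cm) of the first ten rows shown in the starwars excerpt.
height <- c(172, 167, 96, 202, 150, 178, 165, 97, 183, 182)

# Center
center <- c(mean = mean(height), median = median(height))

# Spread
spread <- c(
  range = diff(range(height)),  # max minus min
  sd    = sd(height),
  iqr   = IQR(height)
)

# Assumed convention: flag values more than 1.5 * IQR beyond the quartiles.
q <- quantile(height, c(0.25, 0.75))
lower <- q[[1]] - 1.5 * IQR(height)
upper <- q[[2]] + 1.5 * IQR(height)
outliers <- height[height < lower | height > upper]

print(center)
print(spread)
print(outliers)
```

In this small sample the two flagged heights belong to the droids R2-D2 and R5-D4, which also pull the mean below the median.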
starwars
## # A tibble: 87 x 14
##    name  height  mass hair_color skin_color eye_color birth_year sex   gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr>
##  1 Luke…    172    77 other      fair       blue            19   male  mascu…
##  2 C-3PO    167    75 none       gold       yellow         112   none  mascu…
##  3 R2-D2     96    32 none       white, bl… red             33   none  mascu…
##  4 Dart…    202   136 none       white      yellow          41.9 male  mascu…
##  5 Leia…    150    49 brown      light      brown           19   fema… femin…
##  6 Owen…    178   120 brown      light      blue            52   male  mascu…
##  7 Beru…    165    75 brown      light      blue            47   fema… femin…
##  8 R5-D4     97    32 none       white, red red             NA   none  mascu…
##  9 Bigg…    183    84 black      light      brown           24   male  mascu…
## 10 Obi-…    182    77 other      fair       blue-gray       57   male  mascu…
## # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
## #   films <list>, vehicles <list>, starships <list>
ggplot(data = starwars, mapping = aes(x = height)) + geom_histogram(binwidth = 10)
ggplot(data = starwars, mapping = aes(x = height)) + geom_density()
ggplot(data = starwars, mapping = aes(y = height, x = hair_color)) + geom_boxplot()
ggplot(data = starwars, mapping = aes(x = hair_color, fill = eye_color2)) + geom_bar()
ggplot(data = starwars, mapping = aes(x = hair_color, fill = eye_color2)) + geom_bar(position = "fill") + labs(y = "proportion")
Which plot is more useful for visualizing the relationship between hair color and eye color? Why?
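One way to approach the question is to look at the numbers each bar chart draws. The sketch below uses a tiny made-up sample (the hair and eye vectors are hypothetical, not taken from starwars) to contrast raw counts, which the default stacked bars show, with within-group proportions, which position = "fill" shows.

```r
# Hypothetical toy data, for illustration only.
hair <- c("brown", "brown", "brown", "brown", "none", "none")
eye  <- c("blue",  "blue",  "brown", "brown", "yellow", "red")

# Counts: what the default stacked geom_bar() displays.
counts <- table(hair, eye)

# Row-wise proportions: what geom_bar(position = "fill") displays.
# margin = 1 divides each row (hair color) by its own total.
props <- prop.table(counts, margin = 1)

print(counts)
print(props)
```

Because every row of props sums to 1, groups of very different sizes become directly comparable, at the cost of hiding how many observations each group contains.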
Anything that converts data sources into a visual representation
datasaurus_dozen
Below is an excerpt from the datasaurus_dozen dataset:
## # A tibble: 142 x 8
##    away_x away_y bullseye_x bullseye_y circle_x circle_y dino_x dino_y
##     <dbl>  <dbl>      <dbl>      <dbl>    <dbl>    <dbl>  <dbl>  <dbl>
##  1   32.3   61.4       51.2       83.3     56.0     79.3   55.4   97.2
##  2   53.4   26.2       59.0       85.5     50.0     79.0   51.5   96.0
##  3   63.9   30.8       51.9       85.8     51.3     82.4   46.2   94.5
##  4   70.3   82.5       48.2       85.0     51.2     79.2   42.8   91.4
##  5   34.1   45.7       41.7       84.0     44.4     78.2   40.8   88.3
##  6   67.7   37.1       37.9       82.6     45.0     77.9   38.7   84.9
##  7   53.3   97.5       39.5       80.8     48.6     78.8   35.6   79.9
##  8   63.5   25.1       39.6       82.7     42.1     76.9   33.1   77.6
##  9   68.0   81.0       34.8       80.0     41.0     76.4   29.0   74.5
## 10   67.4   29.7       27.6       72.8     34.6     72.7   26.2   71.4
## # … with 132 more rows
datasaurus_dozen %>% group_by(dataset) %>% summarise(r = cor(x, y))
## # A tibble: 13 x 2
##    dataset          r
##    <chr>        <dbl>
##  1 away       -0.0641
##  2 bullseye   -0.0686
##  3 circle     -0.0683
##  4 dino       -0.0645
##  5 dots       -0.0603
##  6 h_lines    -0.0617
##  7 high_lines -0.0685
##  8 slant_down -0.0690
##  9 slant_up   -0.0686
## 10 star       -0.0630
## 11 v_lines    -0.0694
## 12 wide_lines -0.0666
## 13 x_shape    -0.0656
How similar do the relationships between x and y look based on the plots? Based on the summary statistics?
library(Tmisc)
quartet
##    set  x     y
## 1    I 10  8.04
## 2    I  8  6.95
## 3    I 13  7.58
## 4    I  9  8.81
## 5    I 11  8.33
## 6    I 14  9.96
## 7    I  6  7.24
## 8    I  4  4.26
## 9    I 12 10.84
## 10   I  7  4.82
## 11   I  5  5.68
## 12  II 10  9.14
## 13  II  8  8.14
## 14  II 13  8.74
## 15  II  9  8.77
## 16  II 11  9.26
## 17  II 14  8.10
## 18  II  6  6.13
## 19  II  4  3.10
## 20  II 12  9.13
## 21  II  7  7.26
## 22  II  5  4.74
## 23 III 10  7.46
## 24 III  8  6.77
## 25 III 13 12.74
## 26 III  9  7.11
## 27 III 11  7.81
## 28 III 14  8.84
## 29 III  6  6.08
## 30 III  4  5.39
## 31 III 12  8.15
## 32 III  7  6.42
## 33 III  5  5.73
## 34  IV  8  6.58
## 35  IV  8  5.76
## 36  IV  8  7.71
## 37  IV  8  8.84
## 38  IV  8  8.47
## 39  IV  8  7.04
## 40  IV  8  5.25
## 41  IV 19 12.50
## 42  IV  8  5.56
## 43  IV  8  7.91
## 44  IV  8  6.89
quartet %>%
  group_by(set) %>%
  summarise(
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x = sd(x),
    sd_y = sd(y),
    r = cor(x, y)
  )
## # A tibble: 4 x 6
##   set   mean_x mean_y  sd_x  sd_y     r
##   <fct>  <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1 I          9   7.50  3.32  2.03 0.816
## 2 II         9   7.50  3.32  2.03 0.816
## 3 III        9   7.5   3.32  2.03 0.816
## 4 IV         9   7.50  3.32  2.03 0.817
ggplot(quartet, aes(x = x, y = y)) + geom_point() + facet_wrap(~ set, ncol = 4)
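The per-set summary can also be reproduced without any packages. The sketch below assumes that quartet holds the same values as R's built-in anscombe data frame (the rows printed above match it), which stores the four sets in wide form as columns x1..x4 and y1..y4. The names sets and summary_by_set are introduced here for illustration.

```r
# Base R only: `anscombe` ships with the datasets package.
sets <- c("I", "II", "III", "IV")

# Compute the same five statistics for each of the four x/y column pairs.
summary_by_set <- do.call(rbind, lapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  data.frame(
    set    = sets[i],
    mean_x = mean(x),
    mean_y = mean(y),
    sd_x   = sd(x),
    sd_y   = sd(y),
    r      = cor(x, y)
  )
}))

print(summary_by_set, digits = 3)
```

The four rows agree to about two decimal places even though the scatterplots look nothing alike, so summary statistics alone cannot stand in for a plot.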
ggplot(student_survey, aes(x = first_kiss)) + geom_histogram(binwidth = 1) + labs(title = "How old were you when you had your first kiss?")
ggplot(student_survey, aes(x = fb_visits_per_day)) + geom_dotplot(binwidth = 5, dotsize = 0.4) + labs(title = "How many times do you go on Facebook per day?")