# Lab 3B: Descriptive Statistics (Hard)

1. Set the working directory

2. Read the mortality rates CSV file into a data frame called “rates”
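
The two setup steps might look like the minimal sketch below. The folder and the file name `Rates.csv` are hypothetical placeholders, so substitute the location and name of your own mortality rates file. To let the sketch run on its own, it first writes a tiny made-up CSV into R's temporary directory.

```r
# Hypothetical setup: write a tiny made-up CSV so the sketch is runnable.
# With the real lab data, skip this part and point setwd() at your own folder.
data_dir <- tempdir()
write.csv(
  data.frame(Gender = c("Female", "Male"), Rate = c(0.0033, 0.0124)),
  file = file.path(data_dir, "Rates.csv"),
  row.names = FALSE
)

# 1. Set the working directory (here, the temporary folder used above)
setwd(data_dir)

# 2. Read the CSV file into a data frame called "rates"
rates <- read.csv("Rates.csv")
str(rates)
```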

### Problem 1: Analyze One Categorical Variable

1. Create a frequency table for Gender

   ```
   ## Female   Male
   ##   3005   3158
   ```
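
In base R, `table()` builds a frequency table. The following is a minimal sketch that uses a made-up stand-in for the `rates` data frame so it runs without the CSV. With the real data, drop the stand-in and use the `rates` read in the setup step.

```r
# Made-up stand-in for the lab's rates data frame (not the real data)
rates <- data.frame(Gender = c("Female", "Male", "Male", "Female", "Male"))

# Frequency table: number of records for each Gender value
gender_counts <- table(rates$Gender)
print(gender_counts)

# Name of the level with the most records
print(names(which.max(gender_counts)))
```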

Question: Which gender has more mortality rate records?

### Problem 2: Analyze One Numeric Variable

1. Get the mean Rate

   ```
   ## [1] 0.01109453
   ```

1. Get the median Rate

   ```
   ## [1] 0.0033
   ```

1. Get the minimum Rate

   ```
   ## [1] 0
   ```

1. Get the maximum Rate

   ```
   ## [1] 0.1369
   ```

1. Get the quantiles of Rate

   ```
   ##     0%    25%    50%    75%   100%
   ## 0.0000 0.0008 0.0033 0.0124 0.1369
   ```

1. Get the interquartile range of Rate

   ```
   ## [1] 0.0116
   ```

1. Get the variance of Rate

   ```
   ## [1] 0.0003195681
   ```

1. Get the standard deviation of Rate

   ```
   ## [1] 0.01787647
   ```

1. Get the skewness of Rate (`skewness()` in the `moments` package)

   ```
   ## [1] 2.529177
   ```

1. Get the kurtosis of Rate (`kurtosis()` in the `moments` package)

   ```
   ## [1] 9.852523
   ```

1. Summarize Rate

   ```
   ##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   ## 0.00000 0.00080 0.00330 0.01109 0.01240 0.13690
   ```
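
Most of the steps above map directly onto base R functions. The exceptions are skewness and kurtosis, which come from the `moments` package. The sketch below makes a few assumptions:

- A made-up vector stands in for `rates$Rate`.
- Skewness and kurtosis are written out from their moment definitions, so no extra package is needed.
- These are the same definitions that `moments::skewness()` and `moments::kurtosis()` use. That kurtosis is not reduced by 3, so a normal distribution scores about 3.

The sketch cross-checks against `moments` only if that package happens to be installed.

```r
# Made-up stand-in for rates$Rate (not the real data);
# with the lab data use: rate <- rates$Rate
rate <- c(0, 0.0008, 0.0033, 0.0124, 0.1369, 0.0050, 0.0010)

rate_mean   <- mean(rate)
rate_median <- median(rate)
rate_min    <- min(rate)
rate_max    <- max(rate)
rate_quant  <- quantile(rate)   # 0%, 25%, 50%, 75%, 100% by default
rate_iqr    <- IQR(rate)        # 75% quantile minus 25% quantile
rate_var    <- var(rate)        # sample variance (denominator n - 1)
rate_sd     <- sd(rate)         # square root of the sample variance

# Moment-based skewness and kurtosis (population central moments)
central_moment <- function(x, k) mean((x - mean(x))^k)
rate_skew <- central_moment(rate, 3) / central_moment(rate, 2)^1.5
rate_kurt <- central_moment(rate, 4) / central_moment(rate, 2)^2

# Optional cross-check against the moments package, if it is installed
if (requireNamespace("moments", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(rate_skew, moments::skewness(rate))),
            isTRUE(all.equal(rate_kurt, moments::kurtosis(rate))))
}

print(c(skewness = rate_skew, kurtosis = rate_kurt))
print(summary(rate))
```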

Question: What do you imagine this distribution looks like?

### Problem 3: Analyze Two Categorical Variables

1. Create a contingency table for Age Group and Gender

   ```
   ##
   ##                Female Male
   ##   < 1 year        173  177
   ##   1-4 years       134  136
   ##   10-14 years     119  120
   ##   15-19 years     142  161
   ##   20-24 years     149  174
   ##   25-29 years     151  174
   ##   30-34 years     156  173
   ##   35-39 years     163  182
   ##   40-44 years     174  185
   ##   45-49 years     182  190
   ##   5-9 years       117  120
   ##   50-54 years     188  192
   ##   55-59 years     187  195
   ##   60-64 years     192  197
   ##   65-69 years     191  196
   ##   70-74 years     192  197
   ##   75-79 years     198  195
   ##   80-84 years     197  194
   ```
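
Passing two columns to `table()` gives a contingency table. `table()` orders the row labels as text, which is why "5-9 years" comes after "45-49 years" in the output above. The minimal sketch below uses a made-up stand-in for `rates` and also shows one way to put the rows in numeric order by using `Age.Group.Low`. That reordering is an optional extra, not part of the lab step.

```r
# Made-up stand-in for rates (not the real data)
rates <- data.frame(
  Age.Group     = c("5-9 years", "10-14 years", "< 1 year",
                    "5-9 years", "10-14 years", "< 1 year"),
  Age.Group.Low = c(5, 10, 0, 5, 10, 0),
  Gender        = c("Female", "Male", "Male", "Male", "Female", "Male")
)

# Contingency table: rows are age groups, columns are genders
print(table(rates$Age.Group, rates$Gender))

# Order the age-group levels by their numeric lower bound instead of as text
age_levels <- unique(as.character(rates$Age.Group[order(rates$Age.Group.Low)]))
age_tab <- table(factor(rates$Age.Group, levels = age_levels), rates$Gender)
print(age_tab)

# Row and column position of the largest count
print(which(age_tab == max(age_tab), arr.ind = TRUE))
```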

Question: Which combination of age group and gender has the most mortality rate records?

### Problem 4: Analyze Two Numeric Variables

1. Get the correlation coefficient between Age Group (Low) and Rate

   ```
   ## [1] 0.6582571
   ```

1. Get the correlation coefficient between Population and Rate

   ```
   ## [1] -0.1339946
   ```
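
`cor()` with two numeric vectors returns their Pearson correlation coefficient (the default `method`). The following is a minimal sketch on a made-up stand-in for the three columns involved.

```r
# Made-up stand-in for three numeric columns of rates (not the real data)
rates <- data.frame(
  Age.Group.Low = c(0, 20, 40, 60, 80),
  Population    = c(500, 350, 420, 150, 300),
  Rate          = c(0.001, 0.002, 0.004, 0.020, 0.100)
)

# Pearson correlation coefficients
cor_age_rate <- cor(rates$Age.Group.Low, rates$Rate)
cor_pop_rate <- cor(rates$Population, rates$Rate)

# Strength is judged by the absolute value; the sign only gives the direction
print(abs(c(age = cor_age_rate, population = cor_pop_rate)))
```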

Question: Which of the two correlations is stronger? Why?

### Problem 5: Analyze a Numeric Variable Grouped by a Categorical Variable

1. Get the average Rate by Gender

   ```
   ##      Female        Male
   ## 0.009318636 0.012784389
   ```
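
A named vector with one mean per gender, like the output above, is what `tapply()` prints, so `tapply()` is one likely way to do this step. `aggregate()` is an alternative that returns a data frame. The sketch below runs both on a made-up stand-in.

```r
# Made-up stand-in for rates (not the real data)
rates <- data.frame(
  Gender = c("Female", "Female", "Male", "Male"),
  Rate   = c(0.002, 0.004, 0.010, 0.014)
)

# Apply mean() to Rate separately within each Gender group
avg_rate <- tapply(rates$Rate, rates$Gender, mean)
print(avg_rate)

# The same group means as a data frame
print(aggregate(Rate ~ Gender, data = rates, FUN = mean))
```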

Question: Which gender has the higher average mortality rate?

### Problem 6: Analyze Many Variables

1. Create a correlation matrix for the last five columns (Deaths, Population, Rate, Age.Group.Low, Age.Group.High)

   ```
   ##                   Deaths Population       Rate Age.Group.Low
   ## Deaths         1.0000000  0.3576057  0.3198476     0.2722092
   ## Population     0.3576057  1.0000000 -0.1339946    -0.1265807
   ## Rate           0.3198476 -0.1339946  1.0000000     0.6582571
   ## Age.Group.Low  0.2722092 -0.1265807  0.6582571     1.0000000
   ## Age.Group.High 0.2704145 -0.1226291  0.6516676     0.9997010
   ##                Age.Group.High
   ## Deaths              0.2704145
   ## Population         -0.1226291
   ## Rate                0.6516676
   ## Age.Group.Low       0.9997010
   ## Age.Group.High      1.0000000
   ```

1. Summarize the entire `rates` data frame

   ```
   ##             State        State.Code       Gender     Gender.Code
   ##  California    : 144   Min.   : 1.00   Female:3005   F:3005
   ##  Minnesota     : 144   1st Qu.:16.00   Male  :3158   M:3158
   ##  Washington    : 144   Median :28.00
   ##  Arizona       : 143   Mean   :28.49
   ##  North Carolina: 143   3rd Qu.:41.00
   ##  Florida       : 139   Max.   :56.00
   ##  (Other)       :5306
   ##         Age.Group    Five.Year.Age.Groups.Code
   ##  75-79 years : 393   75-79  : 393
   ##  80-84 years : 391   80-84  : 391
   ##  60-64 years : 389   60-64  : 389
   ##  70-74 years : 389   70-74  : 389
   ##  65-69 years : 387   65-69  : 387
   ##  55-59 years : 382   55-59  : 382
   ##  (Other)     :3832   (Other):3832
   ##                                Race          Deaths
   ##  American Indian or Alaska Native:1274   Min.   :    10
   ##  Asian or Pacific Islander       :1423   1st Qu.:    55
   ##  Black or African American       :1634   Median :   299
   ##  White                           :1832   Mean   :  4516
   ##                                          3rd Qu.:  2206
   ##                                          Max.   :247582
   ##
   ##    Population            Rate         Age.Group.Low   Age.Group.High
   ##  Min.   :     186   Min.   :0.00000   Min.   : 0.00   Min.   : 1.00
   ##  1st Qu.:   23203   1st Qu.:0.00080   1st Qu.:20.00   1st Qu.:24.00
   ##  Median :  148791   Median :0.00330   Median :45.00   Median :49.00
   ##  Mean   :  761320   Mean   :0.01109   Mean   :40.97   Mean   :44.76
   ##  3rd Qu.:  788086   3rd Qu.:0.01240   3rd Qu.:65.00   3rd Qu.:69.00
   ##  Max.   :17038412   Max.   :0.13690   Max.   :80.00   Max.   :84.00
   ##
   ```
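
`cor()` also accepts a data frame of numeric columns and returns the matrix of pairwise coefficients. `summary()` on a data frame summarizes each column in turn. The minimal sketch below selects the last five columns by name with `tail(names(rates), 5)` on a made-up stand-in. This is one choice among several ways to pick columns.

The level counts in the summary above assume text columns were read as factors. That was `read.csv()`'s default before R 4.0.0. In R 4.0.0 and later, passing `stringsAsFactors = TRUE` to `read.csv()` gives the same kind of summary.

```r
# Made-up stand-in for rates (not the real data); as in the lab file,
# the last five columns are all numeric
rates <- data.frame(
  Race           = factor(c("White", "Asian or Pacific Islander", "White",
                            "Black or African American", "White")),
  Deaths         = c(12, 40, 300, 900, 2500),
  Population     = c(1000, 5000, 20000, 8000, 15000),
  Rate           = c(0.0120, 0.0080, 0.0150, 0.1125, 0.1667),
  Age.Group.Low  = c(0, 20, 45, 65, 80),
  Age.Group.High = c(1, 24, 49, 69, 84)
)

# Correlation matrix of the last five columns
last_five <- rates[, tail(names(rates), 5)]
cor_matrix <- cor(last_five)
print(round(cor_matrix, 3))

# Summary of every column: quartiles for numbers, counts for factors
print(summary(rates))
```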