Lab 3A: Descriptive Statistics (Easy)
- Load the CSV data files.
movies <- read.csv("Movies.csv")
genres <- read.csv("Genres.csv")
- Peek at the data.
head(movies)
##                   Title Year Rating Runtime Critic.Score Box.Office
## 1  The Whole Nine Yards 2000      R      98           45       57.3
## 2             Gladiator 2000      R     155           76      187.3
## 3      Cirque du Soleil 2000      G      39           45       13.4
## 4              Dinosaur 2000     PG      82           65      135.6
## 5     Big Momma's House 2000  PG-13      99           30        0.5
## 6 Gone in Sixty Seconds 2000  PG-13     118           24      101.0
head(genres)
##                  Title  Genre Year Rating Runtime Critic.Score Box.Office
## 1 The Whole Nine Yards  Crime 2000      R      98           45       57.3
## 2 The Whole Nine Yards Comedy 2000      R      98           45       57.3
## 3     Cirque du Soleil  Drama 2000      G      39           45       13.4
## 4     Cirque du Soleil Family 2000      G      39           45       13.4
## 5            Gladiator Action 2000      R     155           76      187.3
## 6            Gladiator  Drama 2000      R     155           76      187.3
Analyzing One Categorical Variable
- Create a frequency table of observations of movies by rating category.
table(movies$Rating)
##
##     G    PG PG-13     R
##    93   497  1225  1423
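Counts are often easier to compare as proportions. The following is a minimal sketch that rebuilds the counts printed above as a table and converts them with prop.table(). The names rating_counts and rating_props are illustrative and not part of the lab.

```r
# Counts copied from the frequency table above.
rating_counts <- as.table(c(G = 93, PG = 497, "PG-13" = 1225, R = 1423))

# prop.table() divides each count by the grand total.
rating_props <- prop.table(rating_counts)

# Express the proportions as percentages rounded to one decimal place.
round(100 * rating_props, 1)
```

With the full data loaded, prop.table(table(movies$Rating)) should give the same proportions without retyping the counts.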
Analyzing One Numeric Variable
- Analyze measures of central tendency (i.e. location) for movie runtime.
mean(movies$Runtime)
## [1] 104.4052
median(movies$Runtime)
## [1] 101
- Analyze measures of dispersion (i.e. spread) for movie runtime.
min(movies$Runtime)
## [1] 38
max(movies$Runtime)
## [1] 219
range(movies$Runtime)
## [1] 38 219
diff(range(movies$Runtime))
## [1] 181
quantile(movies$Runtime)
##   0%  25%  50%  75% 100%
##   38   93  101  113  219
quantile(movies$Runtime, 0.95)
## 95%
## 135
IQR(movies$Runtime)
## [1] 20
var(movies$Runtime)
## [1] 284.4487
sd(movies$Runtime)
## [1] 16.86561
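Several of the dispersion measures above are related to one another. The sketch below checks those relationships on a toy vector holding only the six runtimes shown by head(movies), so its numbers will not match the full-data output. The variable names are illustrative.

```r
# Runtimes (in minutes) of the six movies shown by head(movies).
runtime <- c(98, 155, 39, 82, 99, 118)

# The span of the data is the gap between the two values range() returns.
span <- diff(range(runtime))

# The IQR is the distance between the 75th and 25th percentiles.
q <- quantile(runtime, c(0.25, 0.75))
iqr_by_hand <- unname(q[2] - q[1])

# The standard deviation is the square root of the sample variance.
sd_by_hand <- sqrt(var(runtime))

# Each hand-computed value should agree with the built-in function.
c(span = span, max_minus_min = max(runtime) - min(runtime))
c(by_hand = iqr_by_hand, builtin = IQR(runtime))
c(by_hand = sd_by_hand, builtin = sd(runtime))
```

The full-data output above is consistent with this: 113 - 93 gives the IQR of 20, and the square root of 284.4487 is about 16.866.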
- Analyze measures of the shape of movie runtime.
library(moments)
skewness(movies$Runtime)
## [1] 1.007788
kurtosis(movies$Runtime)
## [1] 5.956355
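To make the shape measures less opaque, here is a base R sketch of moment-based skewness and kurtosis. It assumes the moments package uses the divide-by-n central moments, which appears to match its documentation but is not confirmed by this lab. The helper names skew_by_hand and kurt_by_hand are hypothetical.

```r
# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5.
skew_by_hand <- function(x) {
  m2 <- mean((x - mean(x))^2)
  m3 <- mean((x - mean(x))^3)
  m3 / m2^1.5
}

# Pearson kurtosis: fourth central moment divided by the squared
# second central moment. A normal distribution scores 3 on this
# scale, not 0.
kurt_by_hand <- function(x) {
  m2 <- mean((x - mean(x))^2)
  m4 <- mean((x - mean(x))^4)
  m4 / m2^2
}

x <- c(1, 2, 3, 4, 5)  # symmetric toy data
skew_by_hand(x)        # 0, because the data are symmetric
kurt_by_hand(x)        # 1.7, flatter than a normal distribution
```

Under that assumption, the runtime skewness of about 1.0 indicates a longer right tail, and the kurtosis of about 5.96 (compared with 3 for a normal distribution) indicates heavier tails.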
- Summarize a numeric variable (e.g., movie runtime).
summary(movies$Runtime)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    38.0    93.0   101.0   104.4   113.0   219.0
Analyzing Two Categorical Variables
- Create a contingency table containing the frequency of observations of movies by genre and rating.
table(genres$Genre, genres$Rating)
##
##                G  PG PG-13   R
##   Action       2  70   311 229
##   Adventure   44 179   209  64
##   Animation   43 111     8   6
##   Biography    0  27    73  93
##   Comedy      45 258   472 506
##   Crime        0   9   141 328
##   Documentary 27  73    78  65
##   Drama       12 136   586 836
##   Family      38 181    10   1
##   Fantasy      6  51   115  43
##   History      3  12    36  35
##   Horror       0   3    71 195
##   Music        5  31    81  59
##   Musical      0  11    20   6
##   Mystery      0   6   102 136
##   Sci-Fi       0   7   119  72
##   Sport        4  36    62  19
##   Thriller     0   2   167 324
##   War          1   0    19  31
##   Western      0   4     6  10
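Raw cell counts are hard to compare across genres of very different sizes. The sketch below adds totals and row proportions to a contingency table. It uses only the six genre rows shown by head(genres), so it is a toy illustration rather than the full table. The names counts and row_props are illustrative.

```r
# Genre and rating of the six rows shown by head(genres).
genre  <- c("Crime", "Comedy", "Drama", "Family", "Action", "Drama")
rating <- c("R", "R", "G", "G", "R", "R")

counts <- table(genre, rating)

# Add row and column totals to the contingency table.
addmargins(counts)

# Convert each row to proportions (margin = 1 means "within rows"),
# so that every genre's ratings sum to 1.
row_props <- prop.table(counts, margin = 1)
row_props
```

On the full data, prop.table(table(genres$Genre, genres$Rating), margin = 1) would show, for each genre, the share of its movies in each rating category.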
Analyzing Two Numeric Variables
- Analyze the correlation coefficient for runtime and box office.
cor(movies$Runtime, movies$Box.Office)
## [1] 0.347748
- Analyze the correlation coefficient for critic score and box office.
cor(movies$Critic.Score, movies$Box.Office)
## [1] 0.1608324
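By default, cor() computes the Pearson correlation, which is the covariance divided by the product of the two standard deviations. The sketch below verifies this on the six movies shown by head(movies), so the value differs from the full-data results above. It also shows the rank-based Spearman alternative. The variable names are illustrative.

```r
# Runtime and box office of the six movies shown by head(movies).
runtime    <- c(98, 155, 39, 82, 99, 118)
box_office <- c(57.3, 187.3, 13.4, 135.6, 0.5, 101.0)

# Pearson correlation from its definition.
r_by_hand <- cov(runtime, box_office) / (sd(runtime) * sd(box_office))

# The built-in function should return the same value.
r_builtin <- cor(runtime, box_office)
c(by_hand = r_by_hand, builtin = r_builtin)

# Spearman correlation uses ranks, so it is less sensitive to skew
# and outliers than Pearson correlation.
r_spearman <- cor(runtime, box_office, method = "spearman")
r_spearman
```

Because runtime is right-skewed, comparing the Pearson and Spearman values on the full data may be a useful sanity check.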
Analyzing a Numeric Variable Grouped by a Categorical Variable
- Create a table of aggregate numeric values (e.g., average box office revenue) grouped by a categorical variable (e.g., rating category).
tapply(movies$Box.Office, movies$Rating, mean)
##        G       PG    PG-13        R
## 55.47561 56.40439 54.56134 22.26118
- Create a table of average box office revenue grouped by genre.
tapply(genres$Box.Office, genres$Genre, mean)
##      Action   Adventure   Animation   Biography      Comedy       Crime
##   76.530806  101.745110   96.603311   26.500308   40.860973   34.320142
## Documentary       Drama      Family     Fantasy     History      Horror
##    6.268575   24.740296   68.339200   93.251211   24.181583   27.932895
##       Music     Musical     Mystery      Sci-Fi       Sport    Thriller
##   21.978918   37.172776   40.328661   86.874763   27.739240   38.523364
##         War     Western
##   26.474298   36.146105
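tapply() returns a named vector. When a data frame is more convenient, aggregate() with a formula is one alternative. The sketch below compares the two on the six movies shown by head(movies), so the means differ from the full-data output. The data frame name movies_head is hypothetical.

```r
# Rating and box office of the six movies shown by head(movies).
movies_head <- data.frame(
  Rating     = c("R", "R", "G", "PG", "PG-13", "PG-13"),
  Box.Office = c(57.3, 187.3, 13.4, 135.6, 0.5, 101.0)
)

# tapply() returns one named element per group.
by_rating <- tapply(movies_head$Box.Office, movies_head$Rating, mean)
by_rating

# aggregate() returns the same group means as a data frame.
by_rating_df <- aggregate(Box.Office ~ Rating, data = movies_head, FUN = mean)
by_rating_df
```

Note that genres holds one row per title and genre pair, so a movie listed under several genres contributes its box office to each of those genre averages.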
Analyzing Many Variables
- Create a correlation matrix for the numeric columns (runtime, critic score, and box office).
cor(movies[, 4:6])
##                Runtime Critic.Score Box.Office
## Runtime      1.0000000    0.1881713  0.3477480
## Critic.Score 0.1881713    1.0000000  0.1608324
## Box.Office   0.3477480    0.1608324  1.0000000
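Selecting columns by position (4:6) breaks silently if the column order changes. The sketch below selects the same columns by name, using the six movies shown by head(movies) as toy data. The names movies_head, num_cols, and cor_mat are illustrative.

```r
# The six movies shown by head(movies), keeping a non-numeric column
# to show that selection by name skips it.
movies_head <- data.frame(
  Rating       = c("R", "R", "G", "PG", "PG-13", "PG-13"),
  Runtime      = c(98, 155, 39, 82, 99, 118),
  Critic.Score = c(45, 76, 45, 65, 30, 24),
  Box.Office   = c(57.3, 187.3, 13.4, 135.6, 0.5, 101.0)
)

# Name the numeric columns explicitly instead of relying on positions.
num_cols <- c("Runtime", "Critic.Score", "Box.Office")
cor_mat <- cor(movies_head[, num_cols])

# The matrix is symmetric with ones on the diagonal.
round(cor_mat, 2)
```

If any of these columns contained missing values, adding use = "complete.obs" to the cor() call would drop incomplete rows before computing the correlations.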
- Summarize an entire table.
summary(movies)
##            Title                 Year         Rating         Runtime
##  Camp                :   2   Min.   :2000   G    :  93   Min.   : 38.0
##  Frozen              :   2   1st Qu.:2004   PG   : 497   1st Qu.: 93.0
##  The Other Woman     :   2   Median :2008   PG-13:1225   Median :101.0
##  (500) Days of Summer:   1   Mean   :2008   R    :1423   Mean   :104.4
##  (Untitled)          :   1   3rd Qu.:2011                3rd Qu.:113.0
##  10 Items or Less    :   1   Max.   :2015                Max.   :219.0
##  (Other)             :3229
##   Critic.Score       Box.Office
##  Min.   :  0.00   Min.   :  0.0002
##  1st Qu.: 26.00   1st Qu.:  1.0000
##  Median : 49.00   Median : 16.1000
##  Mean   : 49.68   Mean   : 40.6756
##  3rd Qu.: 74.00   3rd Qu.: 51.4750
##  Max.   :100.00   Max.   :760.5000
##
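The Title and Rating columns above are summarized as level counts, which suggests they were read as factors. Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so on a newer installation those columns would be character vectors and summary() would report only their length, class, and mode. The sketch below shows the difference on toy data taken from head(movies). The name movies_head is hypothetical.

```r
# stringsAsFactors = FALSE mimics the read.csv() default in R 4.0.0 and later.
movies_head <- data.frame(
  Rating  = c("R", "R", "G", "PG", "PG-13", "PG-13"),
  Runtime = c(98, 155, 39, 82, 99, 118),
  stringsAsFactors = FALSE
)

# A character column is summarized only by length, class, and mode.
summary(movies_head$Rating)

# Converting the column to a factor restores the per-level counts.
movies_head$Rating <- factor(movies_head$Rating)
summary(movies_head$Rating)
```

Passing stringsAsFactors = TRUE to read.csv() is another way to get factor columns at load time.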