Set the working directory
Load policies from the CSV file
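The code for these two steps is not shown in this section, so here is a minimal sketch of how they might look. The file name `policies_demo.csv` and the three data rows are hypothetical stand-ins written to a temporary folder so the example runs on its own. Only the column names `Centimeters` and `Kilograms` come from the output further down.

```r
# Hypothetical stand-in file; the real policies CSV is not shown in this section
demoDir <- tempdir()
write.csv(
  data.frame(
    Centimeters = c(160.2, 171.5, 182.9),
    Kilograms = c(70.1, 82.4, 88.0)),
  file = file.path(demoDir, "policies_demo.csv"),
  row.names = FALSE)

# Set the working directory (setwd returns the previous one)
oldDir <- setwd(demoDir)

# Load policies from the CSV file
policies <- read.csv("policies_demo.csv")

# Inspect the first few rows
print(head(policies))

# Restore the previous working directory
setwd(oldDir)
```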
Create a plot of height (Centimeters)
Get the mean
Print the mean
## [1] 169.632
Get the standard deviation
Print the standard deviation
## [1] 9.567358
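The mean and standard deviation steps above could be written roughly as follows. This is a sketch on simulated heights, since the real data file is not available here. The names `heightMean` and `heightSd` are assumptions, and the simulated values will not reproduce the exact numbers printed above.

```r
set.seed(42)

# Simulated stand-in for policies$Centimeters (2000 rows, as in the model output)
policies <- data.frame(Centimeters = rnorm(n = 2000, mean = 170, sd = 9.5))

# Get the mean
heightMean <- mean(policies$Centimeters)

# Print the mean
print(heightMean)

# Get the standard deviation
heightSd <- sd(policies$Centimeters)

# Print the standard deviation
print(heightSd)
```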
Create points along x-axis of the distribution
Compute the y-axis height of each point
Add the distribution to the plot
plot(density(policies$Centimeters))
lines(
  x = distributionX,
  y = distributionY,
  col = "red")
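The snippet above uses `distributionX` and `distributionY` without showing how they are built. One plausible way, assuming the model is a normal distribution with the sample mean and standard deviation, is sketched below on simulated heights. The choice of 100 points across the observed range is an assumption.

```r
set.seed(42)

# Simulated stand-in for policies$Centimeters
policies <- data.frame(Centimeters = rnorm(n = 2000, mean = 170, sd = 9.5))
heightMean <- mean(policies$Centimeters)
heightSd <- sd(policies$Centimeters)

# Create points along the x-axis of the distribution
distributionX <- seq(
  from = min(policies$Centimeters),
  to = max(policies$Centimeters),
  length.out = 100)

# Compute the y-axis height of each point (normal density)
distributionY <- dnorm(
  x = distributionX,
  mean = heightMean,
  sd = heightSd)

# Add the distribution to the plot
plot(density(policies$Centimeters))
lines(
  x = distributionX,
  y = distributionY,
  col = "red")
```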
Generate new values from the model
Add distribution of generated values to plot
Get the mean of the generated values
## [1] 169.2888
Get the standard deviation of the generated values
## [1] 9.590691
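Generating values from the model can be sketched with `rnorm`, using the mean and standard deviation printed earlier. The sample size `n = 1000` is an assumption, because the original value of n is not shown. The seed is also arbitrary, so the results will differ slightly from the numbers above.

```r
set.seed(42)

# Parameters of the model, taken from the output printed earlier
heightMean <- 169.632
heightSd <- 9.567358

# Generate new values from the model (n is an assumed sample size)
n <- 1000
newHeights <- rnorm(n = n, mean = heightMean, sd = heightSd)

# Get the mean and standard deviation of the generated values
newMean <- mean(newHeights)
newSd <- sd(newHeights)
print(newMean)
print(newSd)
```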
Question: What would happen to the mean and standard deviation if we increase n to 1,000,000?
Create a scatterplot of height (Centimeters) vs weight (Kilograms)
Create a linear regression model
Draw the linear regression model on the plot
Get the correlation coefficient
## [1] 0.2467215
Summarize the linear regression model
##
## Call:
## lm(formula = Kilograms ~ Centimeters, data = policies)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -34.914 -12.678  -0.038  12.247  35.962
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11.12801    6.15165   1.809   0.0706 .
## Centimeters  0.41204    0.03621  11.380   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.49 on 1998 degrees of freedom
## Multiple R-squared: 0.06087, Adjusted R-squared: 0.0604
## F-statistic: 129.5 on 1 and 1998 DF, p-value: < 2.2e-16
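The regression steps above might look like the sketch below. Only the `lm` call is confirmed by the printed output. The simulated data, the name `model`, and the plotting arguments are assumptions, so the coefficients will only roughly resemble those shown.

```r
set.seed(42)

# Simulated stand-in for the policies data (made-up slope and noise)
centimeters <- rnorm(n = 2000, mean = 170, sd = 9.5)
kilograms <- 11 + 0.41 * centimeters + rnorm(n = 2000, mean = 0, sd = 15.5)
policies <- data.frame(Centimeters = centimeters, Kilograms = kilograms)

# Create a scatterplot of height vs weight
plot(x = policies$Centimeters, y = policies$Kilograms)

# Create a linear regression model
model <- lm(formula = Kilograms ~ Centimeters, data = policies)

# Draw the linear regression model on the plot
abline(model, col = "red")

# Get the correlation coefficient
heightWeightCor <- cor(x = policies$Centimeters, y = policies$Kilograms)
print(heightWeightCor)

# Summarize the model
modelSummary <- summary(model)
print(modelSummary)
```

With a single predictor, the squared correlation coefficient equals the Multiple R-squared. In the output above, 0.2467215 squared is about 0.06087.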
Create a table of unseen heights: 150, 175, 200
Predict the unknown weights from the new unseen heights
##        1        2        3
## 72.93363 83.23457 93.53550
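A prediction like this is usually made with `predict` and a data frame whose column name matches the predictor in the model formula. The sketch below refits a model on simulated data, so its numbers will differ from those above. The names `newHeights`, `predictedWeights`, and `manualWeights` are assumptions.

```r
set.seed(42)

# Simulated stand-in for the policies data (made-up slope and noise)
centimeters <- rnorm(n = 2000, mean = 170, sd = 9.5)
policies <- data.frame(
  Centimeters = centimeters,
  Kilograms = 11 + 0.41 * centimeters + rnorm(n = 2000, mean = 0, sd = 15.5))
model <- lm(formula = Kilograms ~ Centimeters, data = policies)

# Create a table of unseen heights; the column name must match the predictor
newHeights <- data.frame(Centimeters = c(150, 175, 200))

# Predict the unknown weights from the new unseen heights
predictedWeights <- predict(object = model, newdata = newHeights)
print(predictedWeights)

# The same prediction by hand: intercept + slope * height
manualWeights <- coef(model)[["(Intercept)"]] +
  coef(model)[["Centimeters"]] * newHeights$Centimeters
print(manualWeights)
```

Applying the same arithmetic to the coefficients printed in the summary gives 11.12801 + 0.41204 × 150 ≈ 72.934. That agrees with the first predicted value above, up to rounding of the printed coefficients.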
Question: Are there any problems with this linear regression model?
Create a scatterplot of Age vs Rate
Get the correlation coefficient
## [1] 0.7387237
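The scatterplot and correlation steps for Age and Rate could be sketched as below. The real Age and Rate columns are not shown in this section, so the data are simulated. The shape of the simulated relationship is made up for illustration and is not a claim about the actual data.

```r
set.seed(42)

# Made-up stand-in data; the relationship used here is an assumption
age <- runif(n = 2000, min = 18, max = 80)
rate <- 100 * exp(0.04 * age) + rnorm(n = 2000, mean = 0, sd = 150)
policies <- data.frame(Age = age, Rate = rate)

# Create a scatterplot of Age vs Rate
plot(x = policies$Age, y = policies$Rate)

# Get the correlation coefficient
ageRateCor <- cor(x = policies$Age, y = policies$Rate)
print(ageRateCor)
```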
Question: Why is a linear model not a good model for these data?