CHAPTER 20
Beginner
Correlation and Regression Analysis
Updated: May 18, 2026
# Chapter 20: Correlation and Regression Analysis in R
1. Chapter Introduction
Regression is one of R's core strengths: it lets you predict outcomes, model data, and understand relationships between variables. This chapter covers correlation analysis, simple and multiple linear regression, and model evaluation, and closes by building a house price prediction model.
2. Correlation Analysis
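The original code for this section did not survive, so what follows is a minimal sketch rather than the chapter's own example. It assumes the built-in mtcars data set and the columns mpg and wt, chosen here only for illustration, and shows cor() with its three methods plus cor.test() for significance.

```r
# Illustrative data: the built-in mtcars data set (an assumption, not from the chapter).
data(mtcars)

# Pearson correlation (the default): strength of the linear relationship.
r_pearson <- cor(mtcars$mpg, mtcars$wt)

# Spearman rank correlation: monotonic relationships, less sensitive to outliers.
r_spearman <- cor(mtcars$mpg, mtcars$wt, method = "spearman")

# Kendall's tau: another rank-based measure.
r_kendall <- cor(mtcars$mpg, mtcars$wt, method = "kendall")

print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))

# cor.test() tests the null hypothesis that the true correlation is zero.
ct <- cor.test(mtcars$mpg, mtcars$wt)
print(ct$estimate)   # sample correlation
print(ct$p.value)    # p-value of the test
print(ct$conf.int)   # 95% confidence interval for the correlation

# Correlation matrix for several numeric columns at once.
print(round(cor(mtcars[, c("mpg", "wt", "hp", "disp")]), 2))
```

A negative coefficient here means that heavier cars tend to have lower fuel economy; the p-value says whether that association is distinguishable from zero, not whether weight causes the difference.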
3. Simple Linear Regression
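As a sketch of simple linear regression (again assuming mtcars, with mpg as the response and wt as the single predictor, neither of which comes from the original chapter), the steps are: fit with lm(), inspect with summary(), then compute R-squared and RMSE and make predictions.

```r
# Illustrative data: the built-in mtcars data set (an assumption).
data(mtcars)

# Fit mpg as a linear function of weight.
model <- lm(mpg ~ wt, data = mtcars)

# Coefficients, standard errors, t-values, and R-squared in one report.
print(summary(model))

# Intercept and slope.
print(coef(model))

# R-squared: share of the variance in mpg explained by the model.
r2 <- summary(model)$r.squared

# RMSE: typical size of a residual, in the units of the response.
rmse <- sqrt(mean(residuals(model)^2))
print(c(r_squared = r2, rmse = rmse))

# Predictions for new weights (hypothetical values), with prediction intervals.
new_cars <- data.frame(wt = c(2.5, 3.5))
pred <- predict(model, newdata = new_cars, interval = "prediction")
print(pred)  # columns: fit, lwr, upr
```

With one predictor, R-squared equals the squared Pearson correlation between the two variables, which ties this section back to the previous one.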
4. Multiple Linear Regression + Mini Project
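The chapter's house price data is not available here, so the sketch below simulates a data set instead. Every column name (area_sqft, bedrooms, age_years, price) and every number in the price formula is hypothetical, and the results say nothing about real housing markets. It shows one way the mini project might be laid out: split into train and test sets, fit a multiple regression, and evaluate on the held-out rows.

```r
set.seed(42)  # make the simulation reproducible

# Simulated houses: all columns and ranges are hypothetical.
n <- 200
houses <- data.frame(
  area_sqft = round(runif(n, 600, 3500)),
  bedrooms  = sample(1:5, n, replace = TRUE),
  age_years = round(runif(n, 0, 50))
)

# Hypothetical price formula plus random noise.
houses$price <- 50000 + 120 * houses$area_sqft + 8000 * houses$bedrooms -
  900 * houses$age_years + rnorm(n, sd = 20000)

# 80/20 train/test split.
train_idx <- sample(seq_len(n), size = 0.8 * n)
train <- houses[train_idx, ]
test  <- houses[-train_idx, ]

# Multiple linear regression on the training rows only.
house_model <- lm(price ~ area_sqft + bedrooms + age_years, data = train)
print(summary(house_model))  # t-values give a rough sense of each predictor's weight

# Evaluate on rows the model has not seen.
test_pred <- predict(house_model, newdata = test)
test_rmse <- sqrt(mean((test$price - test_pred)^2))
test_r2   <- 1 - sum((test$price - test_pred)^2) /
                 sum((test$price - mean(test$price))^2)
print(c(test_rmse = test_rmse, test_r2 = test_r2))

# Predict the price of one new (hypothetical) house.
new_house <- data.frame(area_sqft = 1800, bedrooms = 3, age_years = 10)
print(predict(house_model, newdata = new_house, interval = "prediction"))
```

Evaluating on the test set rather than the training set gives a less optimistic estimate of how the model would perform on new houses.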
5. Common Mistakes
- Correlation ≠ Causation: Strong correlation between ice cream sales and drownings (both increase in summer) doesn't mean ice cream causes drowning. Always consider confounders.
- Not checking regression assumptions: Linear regression assumes linearity, independent errors, normally distributed residuals, homoscedasticity (constant residual variance), and, with several predictors, no severe multicollinearity. Always run plot(model) for diagnostics.
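The assumption checks above can be sketched as follows. The model and data (mtcars with wt, hp, and disp as predictors) are assumptions made for illustration. Because plot(model) does not address multicollinearity, the sketch also computes variance inflation factors by hand in base R, avoiding any add-on package.

```r
# Illustrative model on the built-in mtcars data (an assumption).
data(mtcars)
model <- lm(mpg ~ wt + hp + disp, data = mtcars)

# The four default diagnostic plots in a 2 x 2 grid:
# Residuals vs Fitted (linearity), Normal Q-Q (normality),
# Scale-Location (homoscedasticity), Residuals vs Leverage (influential points).
op <- par(mfrow = c(2, 2))
plot(model)
par(op)  # restore the previous plotting layout

# Formal normality test on the residuals (complements the Q-Q plot).
print(shapiro.test(residuals(model)))

# Variance inflation factor for each predictor, computed by hand:
# regress the predictor on the others and take 1 / (1 - R^2).
predictors <- c("wt", "hp", "disp")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))
```

A VIF close to 1 means a predictor is nearly uncorrelated with the others; common rules of thumb treat values above roughly 5 or 10 as a sign of problematic multicollinearity.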
6. MCQs
Question 1
Pearson correlation r = -0.9 means?
Question 2
lm(y ~ x, data) fits?
Question 3
R-squared measures?
Question 4
predict(model, newdata, interval="prediction") provides?
Question 5
cor.test() tests?
Question 6
Multiple regression lm(y ~ x1 + x2 + x3)?
Question 7
RMSE measures?
Question 8
residuals(model) extracts?
Question 9
Spearman correlation is preferred when?
Question 10
plot(model) produces?
7. Interview Questions
- Q: What is the difference between R-squared and adjusted R-squared?
- Q: How do you check linear regression assumptions in R?
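For the first question, a small experiment may help. R-squared never decreases when a predictor is added, even a useless one, whereas adjusted R-squared penalizes the extra term and can fall. The sketch below assumes mtcars and adds a hypothetical column of pure random noise to show the difference.

```r
# Illustrative data: mtcars plus a hypothetical noise column unrelated to mpg.
data(mtcars)
set.seed(1)
cars <- mtcars
cars$noise <- rnorm(nrow(cars))

# Same model with and without the noise predictor.
base_fit  <- lm(mpg ~ wt + hp, data = cars)
noise_fit <- lm(mpg ~ wt + hp + noise, data = cars)

s_base  <- summary(base_fit)
s_noise <- summary(noise_fit)

# Side-by-side comparison of the two measures.
comparison <- rbind(
  base       = c(r_squared = s_base$r.squared,  adj_r_squared = s_base$adj.r.squared),
  with_noise = c(r_squared = s_noise$r.squared, adj_r_squared = s_noise$adj.r.squared)
)
print(round(comparison, 4))
```

Plain R-squared in the with_noise row is at least as large as in the base row by construction, which is why adjusted R-squared is usually the better guide when comparing models with different numbers of predictors.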
8. Summary
Correlation: cor() (Pearson, Spearman, Kendall), with cor.test() for significance. Simple regression: lm(y ~ x), evaluated with R² and RMSE. Multiple regression: lm(y ~ x1 + x2 + ...). Predictions: predict(model, newdata). Model diagnostics: plot(model) produces four diagnostic plots for checking assumptions. The t-values in summary(model) give a rough indication of each predictor's importance. Always split the data into training and test sets for an unbiased evaluation. Correlation ≠ causation.