CHAPTER 23
Beginner
Machine Learning Basics in R
Updated: May 18, 2026
1. Chapter Introduction
R's caret (Classification and Regression Training) package provides a unified interface to more than 200 ML algorithms: one consistent API for training, tuning, and evaluating models. This chapter builds the complete ML workflow.
2. ML Workflow in R
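As a rough outline, assuming the stages run in the order used in the rest of this chapter, the workflow can be sketched like this:

```text
1. Split       createDataPartition()         stratified train/test split
2. Preprocess  preProcess()                  fit on train only, apply to both sets
3. Resample    trainControl(method = "cv")   cross-validation setup
4. Train       train()                       fit and tune the model
5. Evaluate    confusionMatrix()             accuracy, precision, recall, F1
6. Compare     resamples() + dotplot()       compare candidate models
```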
3. Data Preprocessing and Splitting
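A minimal sketch of a stratified split followed by scaling that is fit on the training set only. The built-in iris data, the 80/20 ratio, and the seed are assumptions chosen for illustration, not part of the original chapter.

```r
library(caret)

set.seed(42)  # arbitrary seed so the split is reproducible

# Stratified 80/20 split: class proportions of Species are preserved
idx   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Learn centering and scaling parameters from the training set only;
# non-numeric columns (here Species) are left untouched
pp <- preProcess(train, method = c("center", "scale"))

# Apply the training-derived transformation to both sets
train_scaled <- predict(pp, train)
test_scaled  <- predict(pp, test)
```

After this step the training predictors have mean 0 and standard deviation 1. The test predictors will be close to that but not exact, because they are transformed with the training statistics.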
4. Model Training with caret
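One way to train a model with 10-fold cross-validation is sketched below. The choice of k-nearest neighbours, the iris data, and `tuneLength = 5` are illustrative assumptions. Any other caret method name could be passed to `method`.

```r
library(caret)

set.seed(42)  # arbitrary seed for a reproducible split and folds
idx   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[idx, ]

# 10-fold cross-validation on the training set
ctrl <- trainControl(method = "cv", number = 10)

# k-nearest neighbours; passing preProcess here makes caret
# re-estimate the scaling inside each fold, which avoids leakage
fit <- train(
  Species ~ .,
  data       = train,
  method     = "knn",
  preProcess = c("center", "scale"),
  trControl  = ctrl,
  tuneLength = 5   # evaluate 5 candidate values of k
)

print(fit)     # resampled accuracy and kappa for each k
fit$bestTune   # the k selected by cross-validation
```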
5. Model Evaluation
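The sketch below evaluates a fitted model on the held-out test set. It reuses the assumed iris split and k-nearest-neighbours model, and `mode = "everything"` is chosen so that precision, recall, and F1 are reported alongside sensitivity and specificity.

```r
library(caret)

set.seed(42)  # arbitrary seed for reproducibility
idx   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[idx, ]
test  <- iris[-idx, ]

fit <- train(Species ~ ., data = train, method = "knn",
             preProcess = c("center", "scale"),
             trControl  = trainControl(method = "cv", number = 10))

# predict() reapplies the training-set scaling to the test rows
pred <- predict(fit, newdata = test)

# Both arguments must be factors with the same levels
cm <- confusionMatrix(data = pred, reference = test$Species,
                      mode = "everything")

cm$table                                       # predicted vs. actual counts
cm$overall["Accuracy"]                         # overall accuracy
cm$byClass[, c("Precision", "Recall", "F1")]   # one row per class
```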
6. Common Mistakes
- Data leakage: Scaling with test set statistics (fitting preProcess() on train+test combined) leaks test information into training. Always fit preprocessing on the training set only, then apply it to the test set.
- Accuracy for imbalanced data: If 95% of customers don't churn, a model predicting "No" for everyone achieves 95% accuracy — but is useless. Use F1, AUC, or precision-recall.
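The accuracy trap can be checked with a few lines of base R. The counts below (950 non-churners, 50 churners) are hypothetical numbers chosen to match the 95% figure.

```r
# Hypothetical labels: 950 customers stay, 50 churn
actual    <- factor(c(rep("No", 950), rep("Yes", 50)), levels = c("No", "Yes"))

# A "model" that predicts "No" for everyone
predicted <- factor(rep("No", 1000), levels = c("No", "Yes"))

accuracy <- mean(predicted == actual)             # share of correct predictions

tp <- sum(predicted == "Yes" & actual == "Yes")   # churners correctly flagged
fn <- sum(predicted == "No"  & actual == "Yes")   # churners missed
recall <- tp / (tp + fn)                          # share of churners found

accuracy   # 0.95
recall     # 0
```

Accuracy looks strong, yet recall shows that not a single churner is identified. Precision and F1 are undefined here because the model never predicts "Yes".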
7. MCQs
Question 1
What does createDataPartition() ensure?
Question 2
What does 10-fold cross-validation do?
Question 3
What does confusionMatrix() require?
Question 4
What does an AUC of 0.5 mean?
Question 5
What is feature scaling critical for?
Question 6
What does preProcess(train, method=c("center","scale")) apply?
Question 7
What does varImp(model) show?
Question 8
What does recall (sensitivity) measure?
Question 9
When does data leakage occur in ML?
Question 10
When is the F1 score useful?
8. Interview Questions
- Q: What is cross-validation and why is it important?
- Q: What is data leakage and how do you prevent it?
9. Summary
ML in R with caret: createDataPartition() for a stratified split, preProcess() for scaling (fit on train only!), trainControl(method="cv") for cross-validation, train() for model fitting. Evaluate with confusionMatrix(): accuracy, precision, recall, F1. For imbalanced data, prefer AUC-ROC and F1 over accuracy. Use resamples() + dotplot() for model comparison.
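As a sketch of the model comparison step, the example below trains two models on identical folds and compares their resampled accuracy. The two methods (k-nearest neighbours and an rpart decision tree), the iris data, and the shared-fold setup are assumptions for illustration. It also assumes the rpart package is installed, which is normally the case with a standard R installation.

```r
library(caret)

set.seed(42)  # arbitrary seed for reproducibility
idx   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train <- iris[idx, ]

# Reuse the same 10 folds for every model so the comparison is paired
folds <- createFolds(train$Species, k = 10, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds)

fit_knn  <- train(Species ~ ., data = train, method = "knn",
                  preProcess = c("center", "scale"), trControl = ctrl)
fit_tree <- train(Species ~ ., data = train, method = "rpart",
                  trControl = ctrl)

# Collect the per-fold metrics of both models
res <- resamples(list(knn = fit_knn, rpart = fit_tree))

summary(res)                              # distribution of Accuracy and Kappa
print(dotplot(res, metric = "Accuracy"))  # point estimates with intervals
```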