CHAPTER 29
Beginner
Performance Optimization in R
Updated: May 18, 2026
1. Chapter Introduction
Production R code must handle millions of rows efficiently. This chapter covers profiling to identify bottlenecks, vectorization, data.table for fast aggregation, Rcpp for critical paths, and parallelism for CPU-intensive workloads.
2. Profiling: Finding Bottlenecks
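A minimal sketch of a profiling workflow, using only base R: time a suspect function with system.time(), then sample it with Rprof() to see where the time goes. The function slow_growth() and its input size are illustrative assumptions, not taken from this chapter. profvis(), referenced later in the chapter, builds on the same sampling profiler and shows the result as an interactive flame graph.

```r
# Hypothetical workload: grows a vector one element at a time (deliberately slow).
slow_growth <- function(n) {
  result <- c()
  for (i in seq_len(n)) {
    result <- c(result, sqrt(i))  # copies the whole vector on every iteration
  }
  result
}

# Quick timer: user, system, and elapsed seconds for one call.
timing <- system.time(slow_growth(30000))
print(timing)

# Sampling profiler: record the call stack every 10 ms while the code runs.
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.01)
invisible(slow_growth(30000))
Rprof(NULL)  # stop profiling

# Summarise the samples; by.self ranks functions by time spent in their own code.
prof <- summaryRprof(prof_file)
print(head(prof$by.self))
```

In a run like this, most samples would be expected to land in c(), which points at the vector growth rather than sqrt() as the bottleneck.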
3. Vectorization vs Loops
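The comparison can be sketched with three versions of the same computation: a loop that grows its result, a loop that writes into a pre-allocated vector, and a single vectorized expression. The function names, the transformation (x * 2 + 1), and the input size are assumptions chosen for illustration; actual timings depend on the machine.

```r
set.seed(42)
n <- 20000
x <- runif(n)

# 1. Growing a vector: each c() call copies everything accumulated so far.
grow <- function(x) {
  out <- c()
  for (v in x) out <- c(out, v * 2 + 1)
  out
}

# 2. Pre-allocated loop: the result vector is created once, then filled in place.
prealloc <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i] * 2 + 1
  out
}

# 3. Vectorized: the loop runs in compiled code inside the arithmetic operators.
vectorized <- function(x) x * 2 + 1

t_grow <- system.time(r_grow <- grow(x))[["elapsed"]]
t_pre  <- system.time(r_pre  <- prealloc(x))[["elapsed"]]
t_vec  <- system.time(r_vec  <- vectorized(x))[["elapsed"]]

# All three approaches must agree before their timings are worth comparing.
stopifnot(isTRUE(all.equal(r_grow, r_pre)), isTRUE(all.equal(r_pre, r_vec)))

print(c(grow = t_grow, prealloc = t_pre, vectorized = t_vec))
```

Checking that the results match before comparing timings guards against "optimizing" a function into one that computes something different.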
4. Parallel Computation
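A minimal sketch of parallel execution using the parallel package that ships with R. It uses mclapply(), which forks worker processes; forking is not available on Windows, so the sketch falls back to one worker there. slow_task() and its 0.2 second sleep are stand-ins for real work and are assumptions for illustration. The foreach / %dopar% style mentioned in this chapter is an alternative front end that needs the foreach package and a registered backend.

```r
library(parallel)

# Hypothetical task: the sleep stands in for more than 100 ms of real computation.
slow_task <- function(i) {
  Sys.sleep(0.2)
  i^2
}

# detectCores() can return NA on some platforms, so guard against that.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Forking is unavailable on Windows; mclapply() requires mc.cores = 1 there.
workers <- if (.Platform$OS.type == "windows") 1L else min(2L, n_cores)

serial_time   <- system.time(serial_res   <- lapply(1:4, slow_task))
parallel_time <- system.time(parallel_res <- mclapply(1:4, slow_task, mc.cores = workers))

# The iterations are independent, so both runs must produce the same results.
stopifnot(identical(serial_res, parallel_res))

print(c(serial = serial_time[["elapsed"]], parallel = parallel_time[["elapsed"]]))
```

With two workers and four 0.2 second tasks, the parallel run would be expected to take roughly half the serial time, plus the cost of starting the workers.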
5. Memory Optimization
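A minimal sketch of the memory techniques this chapter names: measuring objects with object.size(), storing counts as integers, storing low-cardinality text as a factor, and releasing a large object with rm() and gc(). The variable names and the three city labels are invented for illustration.

```r
n <- 1e6

# Counts stored as integers use 4 bytes per element; doubles use 8.
counts_int <- integer(n)
counts_dbl <- numeric(n)
size_int <- as.numeric(object.size(counts_int))
size_dbl <- as.numeric(object.size(counts_dbl))
print(object.size(counts_int), units = "MB")
print(object.size(counts_dbl), units = "MB")

# A categorical column with few distinct values, as character and as factor.
set.seed(1)
city_chr <- sample(c("Delhi", "Mumbai", "Pune"), n, replace = TRUE)
city_fct <- factor(city_chr)
size_chr <- as.numeric(object.size(city_chr))
size_fct <- as.numeric(object.size(city_fct))
print(object.size(city_chr), units = "MB")
print(object.size(city_fct), units = "MB")

# Drop the name of a large object, then ask the garbage collector to run.
rm(counts_dbl)
invisible(gc())
```

R reclaims unreferenced memory on its own schedule, so the explicit gc() call mainly matters right after removing a very large object or when you want to inspect current memory use.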
6. Common Mistakes
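- Growing vectors inside loops is O(n²): Every c(result, x) call creates a new copy of the entire vector. For n = 100,000 this adds up to about 5 billion element copies (1 + 2 + ... + n ≈ n²/2), which is roughly 40 GB of memory traffic for doubles. Always pre-allocate.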
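- Parallelizing fast operations: Parallel overhead (forking, IPC) costs on the order of 10 ms per job. If the task itself takes 1 ms, each job spends about 10x longer on overhead than on useful work, so the parallel version can end up slower than a plain loop. As a rule of thumb, parallelize only when a single iteration takes more than about 100 ms.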
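The arithmetic behind both mistakes can be checked directly. The copy count is exact; the parallel cost model is a deliberate simplification (fixed overhead per job, perfect division of work across workers) with function names and a worker count of 4 chosen for illustration, so treat its output as an estimate only.

```r
# Growing a vector to length n copies 1 + 2 + ... + n elements in total.
n <- 1e5
element_copies <- n * (n + 1) / 2
bytes_copied <- element_copies * 8  # doubles take 8 bytes each
print(element_copies)               # about 5 billion
print(bytes_copied / 1e9)           # about 40 (GB)

# Simplified cost model (an assumption): every job pays a fixed overhead,
# and the jobs divide evenly across the workers.
serial_ms <- function(n_tasks, task_ms) n_tasks * task_ms
parallel_ms <- function(n_tasks, task_ms, overhead_ms, workers) {
  n_tasks * (task_ms + overhead_ms) / workers
}

# 1000 tasks of 1 ms each with 10 ms overhead on 4 workers: parallel loses.
fast_serial   <- serial_ms(1000, 1)
fast_parallel <- parallel_ms(1000, 1, 10, 4)

# 1000 tasks of 200 ms each with the same overhead: parallel wins clearly.
slow_serial   <- serial_ms(1000, 200)
slow_parallel <- parallel_ms(1000, 200, 10, 4)

print(c(fast_serial = fast_serial, fast_parallel = fast_parallel,
        slow_serial = slow_serial, slow_parallel = slow_parallel))
```

Under this model the 1 ms tasks take 1000 ms serially against 2750 ms in parallel, while the 200 ms tasks drop from 200000 ms to 52500 ms.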
7. MCQs
Question 1
profvis({code}) visualizes?
Question 2
Pre-allocating result <- numeric(n) before loop is faster because?
Question 3
detectCores() returns?
Question 4
fread() in data.table is?
Question 5
gc() in R performs?
Question 6
Parallel is NOT beneficial when?
Question 7
foreach(i=1:n) %dopar% {code} runs iterations?
Question 8
Factor vs character: factor is better when?
Question 9
object.size(x) measures?
Question 10
integer(n) vs numeric(n) pre-allocation?
8. Interview Questions
- Q: How do you identify and fix performance bottlenecks in R?
- Q: When does parallelization NOT help in R?
9. Summary
Profiling: profvis() gives a flame graph; system.time() is a quick timer. Vectorization, from slowest to fastest: growing a vector in a loop, a pre-allocated loop, vectorized operations. data.table::fread() reads CSV files roughly 5-10x faster than read.csv(). Parallelism: foreach with %dopar% suits independent, long-running tasks (more than about 100 ms per task). Memory: use factors for categoricals and integers for counts, and call rm() followed by gc() to free large objects. Never grow vectors in loops.