Introduction to R Programming

Updated: May 18, 2026

# CHAPTER 1

1. Chapter Introduction

R is the language statisticians built for themselves, and data scientists later adopted it for everything from medical research to quantitative finance. This chapter explains what R is, why it matters, and how it became one of the most widely used languages in data science.

2. Learning Objectives

  • Define R and understand its statistical computing heritage.
  • Identify R's core applications in data science, research, and BI.
  • Understand the R ecosystem (packages, CRAN, RStudio).
  • Know career pathways where R is essential.

3. What is R?

R Programming:

R is a free, open-source language and environment for:
  ✅ Statistical computing
  ✅ Data analysis
  ✅ Machine learning
  ✅ Data visualization
  ✅ Bioinformatics
  ✅ Financial modeling

Created by: Ross Ihaka and Robert Gentleman (1993)
Based on:   S programming language (Bell Labs, 1976)
Governed by: R Core Team + R Foundation

Key fact: R has 19,000+ packages on CRAN (Comprehensive R Archive Network)
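
To give a feel for what "statistical computing" looks like in practice, here is a minimal sketch that uses only base R and the built-in mtcars dataset. The choice of dataset and of the variables mpg and wt is an illustrative assumption, not something this chapter prescribes.

```r
# mtcars ships with every R installation: 32 cars, 11 variables
data(mtcars)

# Descriptive statistics with base functions
mean_mpg <- mean(mtcars$mpg)
sd_mpg   <- sd(mtcars$mpg)
cat("Mean mpg:", round(mean_mpg, 2), " SD:", round(sd_mpg, 2), "\n")

# Simple linear regression: fuel economy as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
```

No packages need to be installed for this: mean(), sd(), and lm() are all available as soon as R starts.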

4. Why R for Data Science?

R vs Other Languages:

+─────────────────────+────────────────────────+──────────────────────+
│ Feature             │ R                      │ Python               │
+─────────────────────+────────────────────────+──────────────────────+
│ Primary strength    │ Statistics, research   │ General ML/AI        │
│ Learning curve      │ Moderate (stats-heavy) │ Moderate (code-heavy)│
│ Visualization       │ ggplot2 (best-in-class)│ Matplotlib/Seaborn   │
│ Statistical tests   │ Built-in (400+ tests)  │ SciPy/Statsmodels    │
│ Academic research   │ Dominant               │ Growing              │
│ Bioinformatics      │ Dominant (Bioconductor)│ Limited              │
│ Finance/quant       │ Strong                 │ Strong               │
│ Report generation   │ R Markdown + knitr     │ Jupyter              │
+─────────────────────+────────────────────────+──────────────────────+

R is preferred when: statistical rigor matters, research reproducibility
is required, or you're working in academic/healthcare/pharma settings.
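
The "Statistical tests" row above can be illustrated with a short sketch. It runs a Welch two-sample t-test using only the stats package that ships with R. The research question (does fuel economy differ between automatic and manual cars in mtcars?) is an assumption chosen for illustration.

```r
# In mtcars, am = 0 means automatic and am = 1 means manual transmission.
# The formula interface compares mpg between the two groups.
result <- t.test(mpg ~ am, data = mtcars)

# The result is a standard "htest" object with named components
print(result)
cat("p-value:", signif(result$p.value, 3), "\n")
```

Because the test returns a structured object rather than only printed text, individual values such as result$p.value or result$conf.int can be reused in later analysis or reports.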

5. R Ecosystem Overview

R ECOSYSTEM:

Base R ──────── Core language + built-in functions
   │
   ├── CRAN ──── 19,000+ community packages
   │     ├── tidyverse   (dplyr, ggplot2, tidyr, purrr)
   │     ├── data.table  (blazing fast data manipulation)
   │     ├── caret       (machine learning framework)
   │     ├── forecast    (time series)
   │     └── shiny       (interactive web apps)
   │
   ├── Bioconductor ── Separate repository for genomics/bioinformatics packages
   │
   ├── RStudio ── IDE for R (most popular)
   │
   └── R Markdown ── Reproducible research reports

The tidyverse (by Hadley Wickham) is R's most important package collection:
  readr   → data import
  dplyr   → data manipulation
  tidyr   → data reshaping
  ggplot2 → data visualization
  purrr   → functional programming
  stringr → string manipulation
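
As a rough sketch of how the ecosystem fits together in day-to-day use, the snippet below checks which packages are available locally and shows how a package is attached. The helper is_installed() is a hypothetical name introduced here for illustration; it is a thin wrapper around the real function requireNamespace().

```r
# Hypothetical helper (not part of R): TRUE if a package is installed locally
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

# "stats" ships with every R installation; dplyr and ggplot2 come from CRAN
# and may or may not be present on your machine.
pkgs   <- c("stats", "dplyr", "ggplot2")
status <- vapply(pkgs, is_installed, logical(1))
print(status)

# Installing a missing CRAN package needs an internet connection,
# so the call is left commented out here:
# install.packages("dplyr")

# Once a package is installed, library() attaches it to the session
library(stats)
```

The pattern is the same for any CRAN package: install once with install.packages(), then attach with library() in each session that needs it.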

6. Applications of R

Industry          Application                     R Tools Used
──────────────────────────────────────────────────────────────────
Healthcare        Clinical trial analysis         survival, ggplot2
Pharma            Drug efficacy modeling          nlme, lme4
Finance           Quantitative modeling           quantmod, TTR
Academia          Statistical research            base R, lme4
Genomics          Gene expression analysis        Bioconductor
Social Science    Survey analysis                 lavaan, psych
Marketing         Customer segmentation           cluster, mclust
Insurance         Actuarial modeling              actuar
Government        Economic data analysis          ggplot2, dplyr
Manufacturing     Quality control (SPC)           qcc

7. Mini Project: Analyze Student Marks Dataset

# Mini Project: Student Performance Analysis in R

# Sample dataset
students <- data.frame(
  name    = c("Alice", "Bob", "Carol", "David", "Eve", "Frank"),
  math    = c(92, 78, 85, 60, 95, 72),
  science = c(88, 82, 90, 55, 97, 65),
  english = c(76, 88, 84, 70, 89, 80),
  stringsAsFactors = FALSE
)

# Calculate averages
students$average <- rowMeans(students[, c("math", "science", "english")])

# Grade assignment
students$grade <- ifelse(students$average >= 90, "A",
                   ifelse(students$average >= 80, "B",
                   ifelse(students$average >= 70, "C",
                   ifelse(students$average >= 60, "D", "F"))))

# Summary statistics
cat("=== STUDENT PERFORMANCE REPORT ===\n\n")
cat("Class Statistics:\n")
cat("  Mean Math:    ", mean(students$math),    "\n")
cat("  Mean Science: ", mean(students$science), "\n")
cat("  Mean English: ", mean(students$english), "\n")
cat("  Class Average:", round(mean(students$average), 2), "\n\n")

cat("Individual Results:\n")
print(students[, c("name", "average", "grade")])

cat("\nTop Performer:", students$name[which.max(students$average)], "\n")
cat("Needs Support: ", students$name[which.min(students$average)], "\n")

# Grade distribution
cat("\nGrade Distribution:\n")
print(table(students$grade))

Output:

=== STUDENT PERFORMANCE REPORT ===

Class Statistics:
  Mean Math:     80.33333
  Mean Science:  79.5
  Mean English:  81.16667
  Class Average: 80.33

Individual Results:
   name  average grade
1 Alice 85.33333     B
2   Bob 82.66667     B
3 Carol 86.33333     B
4 David 61.66667     D
5   Eve 93.66667     A
6 Frank 72.33333     C

Top Performer: Eve
Needs Support:  David

Grade Distribution:

A B C D
1 3 1 1
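
The nested ifelse() calls in the mini project work, but they get harder to read as the number of grade bands grows. One possible alternative, sketched here under the same thresholds, is base R's cut(). The data frame name scores is chosen for this example only.

```r
# Same marks as the mini project, rebuilt so this block runs on its own
scores <- data.frame(
  name    = c("Alice", "Bob", "Carol", "David", "Eve", "Frank"),
  math    = c(92, 78, 85, 60, 95, 72),
  science = c(88, 82, 90, 55, 97, 65),
  english = c(76, 88, 84, 70, 89, 80)
)
scores$average <- rowMeans(scores[, c("math", "science", "english")])

# right = FALSE makes each interval closed on the left, e.g. [80, 90),
# which matches the ">=" thresholds used in the nested ifelse() version.
scores$grade <- cut(
  scores$average,
  breaks = c(-Inf, 60, 70, 80, 90, Inf),
  labels = c("F", "D", "C", "B", "A"),
  right  = FALSE
)

print(scores[, c("name", "grade")])
```

Note that cut() returns a factor rather than a character vector, so table(scores$grade) would also list grade levels with zero students, such as F.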

8. Career Opportunities

R-related Job Roles (indicative salary ranges in USD; actual pay varies by location and experience):
  Data Analyst          → $65,000 - $95,000/year
  Statistician          → $75,000 - $115,000/year
  Data Scientist        → $95,000 - $140,000/year
  Quantitative Analyst  → $100,000 - $160,000/year
  Biostatistician       → $80,000 - $130,000/year
  Research Scientist    → $85,000 - $130,000/year

Organizations using R:
  Google, Facebook, Airbnb, Pfizer, Johnson & Johnson,
  Merck, Goldman Sachs, McKinsey, WHO, FDA

9. Common Mistakes

  • Confusing R with RStudio: R is the language and runtime; RStudio is an IDE (editor) for it. You can run R without RStudio, but RStudio needs an R installation to work.
  • Treating R like Python: R is vectorized by design. In R, x + 1 adds 1 to every element of vector x. Beginners often write unnecessary loops.
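
The vectorization point above can be seen in a short sketch. Both versions below add 1 to every element of a vector; the variable names are chosen for this example only.

```r
x <- c(10, 20, 30, 40)

# Loop style, as one might write it coming from another language
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
  y_loop[i] <- x[i] + 1
}

# Vectorized style: the operation applies to every element at once
y_vec <- x + 1

print(y_vec)                     # [1] 11 21 31 41
print(identical(y_loop, y_vec))  # [1] TRUE
```

The vectorized form is shorter and is usually faster as well, because the element-by-element work happens inside R's compiled internals rather than in interpreted loop code.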

10. MCQs

Question 1

Who created R?

Question 2

What does CRAN stand for?

Question 3

What is ggplot2 used for?

Question 4

What is the tidyverse?

Question 5

What is R primarily designed for?

Question 6

What does which.max(x) return?

Question 7

What does rowMeans() compute?

Question 8

Which earlier language is R based on?

Question 9

What is RStudio?

Question 10

What does data.frame() create?

11. Interview Questions

  • Q: What is the difference between R and Python for data science?
  • Q: What is the tidyverse and why is it important?

12. Summary

R is a free, open-source statistical computing language with 19,000+ CRAN packages. The tidyverse ecosystem (dplyr, ggplot2, tidyr) dominates modern data analysis. R excels in statistical rigor, research reproducibility, and visualization quality. Career paths: data analyst, statistician, data scientist, biostatistician.

13. Next Chapter Recommendation

In Chapter 2: Installing R and RStudio, we install R, configure RStudio, and learn the IDE layout to start coding immediately.
