Skip to main content
R Programming
CHAPTER 13 Beginner

Data Import and Export

Updated: May 18, 2026
5 min read

# CHAPTER 13

Data Import and Export in R

1. Chapter Introduction

Real-world data arrives in dozens of formats — Excel spreadsheets, JSON APIs, SQL databases, Stata files. This chapter builds a complete data import/export toolkit that handles every major data format.

2. CSV and Text Import (readr)

r
123456789101112131415161718192021222324
library(readr)
library(dplyr)

# Basic CSV import with column type specification
sales <- read_csv("sales_data.csv",
  col_types = cols(
    date     = col_date(format = "%Y-%m-%d"),
    revenue  = col_double(),
    region   = col_character(),
    units    = col_integer(),
    category = col_factor(levels = c("Electronics","Clothing","Food"))
  ))

# Check for parsing problems
problems(sales)  # Shows rows that failed to parse
spec(sales)      # Show the column specification used

# Large file: read in chunks
read_csv_chunked("large_file.csv",
  callback = DataFrameCallback$new(function(chunk, pos) {
    # Process each chunk (1000 rows at a time)
    chunk %>% filter(revenue > 1000)
  }),
  chunk_size = 1000)

3. Excel Import (readxl)

r
12345678910111213141516171819202122232425262728
library(readxl)

# Read Excel file
df <- read_excel("financial_report.xlsx")
df <- read_excel("report.xlsx",
                  sheet = "Sales 2024",    # Sheet by name
                  range = "B2:G50",        # Cell range
                  col_names = TRUE,
                  na = c("", "N/A", "-"))  # NA values

# Multiple sheets
sheets <- excel_sheets("report.xlsx")  # List all sheet names
all_data <- lapply(sheets, function(s) read_excel("report.xlsx", sheet=s))
names(all_data) <- sheets

# Write Excel
library(writexl)
write_xlsx(list(Sales=sales_df, Inventory=inv_df), "output.xlsx")

# openxlsx — formatted Excel output
library(openxlsx)
wb <- createWorkbook()
addWorksheet(wb, "Summary")
writeData(wb, "Summary", data=summary_df, startRow=2)
addStyle(wb, "Summary", style=createStyle(fontColour="#FFFFFF",
                                           fgFill="#1565C0", bold=TRUE),
         rows=2, cols=1:ncol(summary_df), gridExpand=TRUE)
saveWorkbook(wb, "formatted_report.xlsx", overwrite=TRUE)

4. JSON Import (jsonlite)

r
123456789101112131415161718
library(jsonlite)

# Simple JSON
json_str <- &#039;{"name": "Alice", "age": 30, "scores": [85, 92, 78]}'
data <- fromJSON(json_str)
data$name    # "Alice"
data$scores  # 85 92 78

# JSON file or URL
df <- fromJSON("api_response.json", flatten=TRUE)  # Flatten nested JSON
df <- fromJSON("https://api.example.com/data")     # Directly from URL

# Convert R object to JSON
r_list <- list(employees=employees, summary=dept_summary)
json_output <- toJSON(r_list, pretty=TRUE, auto_unbox=TRUE)
write(json_output, "export.json")

cat(json_output)

5. Database Connection (DBI)

r
123456789101112131415161718192021222324252627282930313233343536
library(DBI)
library(RMySQL)  # or RSQLite, RPostgres, odbc

# MySQL connection
con <- dbConnect(RMySQL::MySQL(),
                  host     = "localhost",
                  dbname   = "company_db",
                  user     = "root",
                  password = "password",
                  port     = 3306)

# Query to data frame
employees <- dbGetQuery(con, "
  SELECT e.name, e.salary, d.dept_name
  FROM employees e
  JOIN departments d ON e.dept_id = d.id
  WHERE e.salary > 50000
  ORDER BY e.salary DESC
")

# Parameterized query (prevent SQL injection)
dept_query <- "SELECT * FROM employees WHERE dept = ?"
it_employees <- dbGetQuery(con, dept_query, params=list("IT"))

# Write R data frame to database table
dbWriteTable(con, "r_analysis_results", results_df, overwrite=TRUE)

# List tables
dbListTables(con)

# Close connection
dbDisconnect(con)

# SQLite (no server needed — great for local projects)
library(RSQLite)
con <- dbConnect(SQLite(), "local_data.sqlite")

6. Common Mistakes

  • Excel date import as numbers: Excel stores dates as numbers since 1900. read_excel() handles this automatically, but read.csv() of an Excel-exported CSV may give numbers like 45000. Use as.Date(x, origin="1899-12-30").
  • JSON nested structure not flattening: fromJSON("nested.json") may return a complex nested list. Use flatten=TRUE or tidyr::unnest() to expand into a flat data frame.

7. MCQs

Question 1

read_csv() returns?

Question 2

excel_sheets("file.xlsx") returns?

Question 3

fromJSON(url) can read data from?

Question 4

dbGetQuery(con, sql) returns?

Question 5

problems(df) after read_csv shows?

Question 6

dbWriteTable() with overwrite=TRUE?

Question 7

toJSON(x, pretty=TRUE) formats output?

Question 8

col_factor(levels=c("A","B")) in read_csv?

Question 9

read_excel(range="B2:G50") reads?

Question 10

SQLite advantage over MySQL for analysis?

8. Interview Questions

  • Q: How do you connect R to a MySQL database and run queries?
  • Q: How do you import a multi-sheet Excel file into R?

9. Summary

Data import ecosystem: readr (CSV/TSV, fastest), readxl (Excel), jsonlite (JSON/APIs), DBI + RMySQL/RSQLite (databases). Always check problems() after read_csv(). Use col_types for type safety. JSON: fromJSON(flatten=TRUE) for nested data. Database: dbGetQuery() for queries, dbWriteTable() to export results. Use parameterized queries to prevent SQL injection.

10. Next Chapter Recommendation

In Chapter 14: Data Cleaning in R, we tackle missing data, duplicates, outliers, and build a complete data cleaning pipeline.

Finish this Chapter

Save your progress on your learning path and prepare for coding interview challenges.

Discussion

Join the discussion

Log in or create a free account to participate.

Sort: ·