Book description

This introduction to visualization techniques and statistical models for second language research focuses on three types of data (continuous, binary, and scalar), helping readers to understand regression models fully and to apply them in their work. Garcia offers advanced coverage of Bayesian analysis, simulated data, exercises, implementable script code, and practical guidance on the latest R software packages.

The book also demonstrates the benefits of this type of statistical work to the L2 field. It is a resource for graduate students and researchers in second language acquisition, applied linguistics, and corpus linguistics who are interested in quantitative data analysis.

Access the book files at

Available at Routledge and Amazon. Click here to get 20% off.



Highly recommended as an accessible introduction to the use of R for analysis of second language data. Readers will come away with an understanding of why and how to use statistical models and data visualization techniques in their research.

Lydia White, James McGill Professor Emeritus, McGill University.

Curious where the field’s quantitative methods are headed? The answer is in your hands right now! Whether we knew it or not, this is the book that many of us have been waiting for. From scatter plots to standard errors and from beta values to Bayes theorem, Garcia provides us with all the tools we need—both conceptual and practical—to statistically and visually model the complexities of L2 development.

Luke Plonsky, Associate Professor, Northern Arizona University

This volume is a timely and must-have addition to any quantitative SLA researcher’s data analysis arsenal, whether you are downloading R for the first time or a seasoned user ready to dive into Bayesian analysis. Guilherme Garcia’s accessible, conversational writing style and uncanny ability to provide answers to questions right as you’re about to ask them will give new users the confidence to make the move to R and will serve as an invaluable resource for students and instructors alike for years to come.

Jennifer Cabrelli, Associate Professor, University of Illinois at Chicago.

Content and objectives

As the title implies, data visualization is a priority in the book. Different chapters discuss how to plot continuous and categorical data using R, more specifically the ggplot2 package. Statistical models (linear, logistic, and ordinal regressions) are another priority. As a bonus, the book will have a chapter on Bayesian models.
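To make the three model types concrete, here is a minimal sketch, not taken from the book, that fits a linear, a logistic, and an ordinal regression in R. The data are simulated on the spot and the variable names (hours, score, correct, rating) are hypothetical. For the ordinal model it uses polr() from the MASS package, which ships with R; this is one common choice and not necessarily the function used in the book.

```r
# Minimal sketch with simulated, hypothetical data (not the book's data sets)
set.seed(42)
n <- 200
hours <- runif(n, min = 0, max = 10)   # hypothetical predictor: hours of study

d <- data.frame(
  hours   = hours,
  # Continuous outcome, e.g., a proficiency score
  score   = 60 + 2 * hours + rnorm(n, sd = 5),
  # Binary outcome, e.g., 1 = correct, 0 = incorrect
  correct = rbinom(n, size = 1, prob = plogis(-2 + 0.4 * hours)),
  # Scalar outcome, e.g., a 4-point rating, made by cutting a latent variable
  rating  = cut(0.5 * hours + rlogis(n),
                breaks = c(-Inf, 1, 3, 5, Inf),
                labels = c("1", "2", "3", "4"),
                ordered_result = TRUE)
)

# Continuous data: linear regression
m_linear <- lm(score ~ hours, data = d)

# Binary data: logistic regression
m_logistic <- glm(correct ~ hours, data = d, family = binomial)

# Scalar (ordinal) data: proportional-odds ordinal regression via MASS::polr()
m_ordinal <- MASS::polr(rating ~ hours, data = d, Hess = TRUE)

# Slope estimates (on the log-odds scale for the logistic and ordinal models)
coef(m_linear)
coef(m_logistic)
coef(m_ordinal)
```

Because the same predictor appears in all three models, the sketch also shows that switching between data types mostly means switching the model function and how its coefficients are interpreted.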

Aligned with the priorities above, the goals of the book are:

All three goals rely on the R language (especially on packages such as tidyverse), but no previous experience with R is required. The idea is to provide a user-friendly and comprehensive approach to R with code blocks throughout the book. Crucially, figures and analyses throughout the book will be fully reproducible.

The chapters on statistical analysis will simulate the typical steps involved in data analysis:

  1. Importing your data
  2. Preparing your data for analysis
  3. Exploratory data analysis (figures)
  4. Statistical analysis
  5. Reporting results
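As a rough illustration, the five steps might look like this in a single R script. This is a sketch under assumptions, not code from the book: the data set, its file, and its column names (proficiency, hours, score) are hypothetical and simulated on the spot, and the readr, dplyr, and ggplot2 packages (all part of tidyverse) are assumed to be installed.

```r
library(readr)    # read_csv(), write_csv()
library(dplyr)    # mutate(), group_by(), summarize(), %>%
library(ggplot2)  # ggplot()

# Hypothetical simulated data, written to a temporary CSV file
# so that step 1 has something to import
set.seed(1)
n_learners <- 60
sim <- data.frame(
  proficiency = rep(c("Beginner", "Intermediate", "Advanced"),
                    each = n_learners / 3),
  hours       = round(runif(n_learners, min = 0, max = 10), 1)
)
sim$score <- 50 + 3 * sim$hours + rnorm(n_learners, sd = 5)
path <- tempfile(fileext = ".csv")
write_csv(sim, path)

# 1. Importing your data
d <- read_csv(path)

# 2. Preparing your data: turn proficiency into a factor with a fixed level order
d <- d %>%
  mutate(proficiency = factor(proficiency,
                              levels = c("Beginner", "Intermediate", "Advanced")))

# 3. Exploratory data analysis: a summary table and a scatter plot
d %>%
  group_by(proficiency) %>%
  summarize(mean_score = mean(score), n = n())

p <- ggplot(d, aes(x = hours, y = score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# 4. Statistical analysis: a simple linear regression
fit <- lm(score ~ hours, data = d)

# 5. Reporting results: estimates, standard errors, t-values, etc.
summary(fit)
```

Step 3 only builds the plot object; calling print(p) draws it.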

All data sets are simulated and will accompany the book. Finally, review exercises will be found at the end of several chapters. Readers will be able to use the book for self-study or in research methods courses (linguistics, second language acquisition, education).

News & updates

Here are some updates and additional information related to the code used in the book, some of them based on questions I get from readers. This page will change from time to time to reflect updates in relevant packages and functions used in the book (e.g., mutate_...()).

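  1. The function mutate_if() has been superseded by across(). For example, the equivalent of ... mutate_if(is.character, as.factor) is ... mutate(across(where(is_character), as_factor)). Note that as_factor() (from forcats) orders factor levels by first appearance, whereas as.factor() orders them alphabetically; use as.factor() inside across() if you need exactly the same levels as before.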
  2. Besides using scale_x_discrete(labels = abbreviate) to abbreviate axis labels, you can also use scale_x_discrete(labels = c("Beginner" = "Beg", "Intermediate" = "Int", "Advanced" = "Adv")), which lets you choose exactly how each label is abbreviated.
  3. For guidelines regarding Bayesian analyses, see Kruschke’s recent paper, “Bayesian Analysis Reporting Guidelines.”
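The first two tips can be checked on a small hypothetical data set. The sketch below assumes dplyr, ggplot2, and forcats are installed; the data frame and its columns are made up for illustration. It confirms that mutate_if() and the across() version give the same result when both use as.factor(), shows how forcats::as_factor() orders levels differently, and builds a plot with each labelling option.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data set
d <- data.frame(
  learner     = c("A", "B", "C"),
  proficiency = c("Intermediate", "Beginner", "Advanced"),
  score       = c(72, 55, 90)
)

# Superseded style (still works, but no longer recommended)
old <- d %>% mutate_if(is.character, as.factor)

# Current style: across() combined with where()
new <- d %>% mutate(across(where(is.character), as.factor))

all.equal(old, new)

# Level order: as.factor() sorts alphabetically,
# forcats::as_factor() keeps the order of first appearance
levels(new$proficiency)
levels(forcats::as_factor(d$proficiency))

# Option 1: let abbreviate() shorten the x-axis labels automatically
p1 <- ggplot(new, aes(x = proficiency, y = score)) +
  geom_col() +
  scale_x_discrete(labels = abbreviate)

# Option 2: choose each abbreviation yourself with a named vector
p2 <- ggplot(new, aes(x = proficiency, y = score)) +
  geom_col() +
  scale_x_discrete(labels = c("Beginner" = "Beg",
                              "Intermediate" = "Int",
                              "Advanced" = "Adv"))
```

Printing p1 or p2 draws the corresponding figure.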

New in R 4.1.0: The native pipe

R 4.1.0 (May 2021; Camp Pontanezen) introduces a number of changes, most of them minor, as is typical. Don’t worry: none of these changes affects the code in the book. However, R now has a native pipe operator, |>. Throughout the book, I use %>% from the magrittr package (loaded with tidyverse). Now, besides %>%, you also have |>, a native option that doesn’t require loading any packages. Note that |> and %>% are not identical. In practice, %>% does everything |> does and a bit more (for example, it supports the . placeholder), so you can keep using %>%, especially since you will likely be loading tidyverse anyway.
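Here is a minimal comparison of the two pipes, assuming R 4.1.0 or later and the magrittr package. It shows a case where they behave identically and two cases where %>% offers something |> does not: bare function names and the . placeholder. The \(d) shorthand for anonymous functions used below was also introduced in R 4.1.0.

```r
library(magrittr)  # provides %>% (also loaded by tidyverse)

x <- c(1, 4, 9)

# Simplest case: both pipes pass the left-hand side as the first argument
a <- x %>% sqrt()
b <- x |> sqrt()   # native pipe, no package needed
identical(a, b)    # TRUE

# Difference 1: %>% accepts a bare function name,
# while |> requires a function call with parentheses
x %>% sqrt
# x |> sqrt        # syntax error, so it is commented out here

# Difference 2: %>% supports the dot placeholder for non-first arguments
fit1 <- mtcars %>% lm(mpg ~ wt, data = .)

# With |> in R 4.1.0, one option is to call an anonymous function
fit2 <- mtcars |> (\(d) lm(mpg ~ wt, data = d))()

# (R 4.2.0 and later also accept the _ placeholder for a named argument:
#  mtcars |> lm(mpg ~ wt, data = _))

all.equal(coef(fit1), coef(fit2))  # TRUE
```

For the kind of code used in the book, the two pipes are interchangeable as long as the right-hand side is written as a function call.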

Useful packages and functions not mentioned in the book

Errata and clarifications

How to cite


Garcia, G. D. (2021). Data visualization and analysis in second language research. New York, NY: Routledge.


    title = {Data visualization and analysis in second language research},
    author = {Garcia, Guilherme Duarte},
    year = {2021},
    address = {New York, NY},
    publisher = {Routledge},


Part of this project benefited from an ASPiRE Junior Faculty Award from Ball State University (2020–2021).

Copyright © 2022 Guilherme Duarte Garcia