Guilherme D. Garcia

As with almost anything in R, there are several ways to plot vowels. There are, for example, dedicated packages such as phonR and vowels. But you can just as easily plot vowels without them, using only ggplot2, which may be convenient if you’re already familiar with that package.

Step 1: Basics

For this example, I’ll simulate some vowels (random F1 and F2 values drawn from normal distributions, 50 tokens per vowel), but you can of course load existing data instead (from phonR, for example). First, let’s see what a default ggplot looks like.

library(tidyverse)

set.seed(10)

vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    
                    F1 = c(rnorm(50, mean = 800, sd = 100), 
                           rnorm(50, mean = 600, sd = 100), 
                           rnorm(50, mean = 350, sd = 100), 
                           rnorm(50, mean = 600, sd = 100), 
                           rnorm(50, mean = 350, sd = 100)),
                    
                    F2 = c(rnorm(50, mean = 1500, sd = 150), 
                           rnorm(50, mean = 2000, sd = 150), 
                           rnorm(50, mean = 2500, sd = 150), 
                           rnorm(50, mean = 1000, sd = 150), 
                           rnorm(50, mean = 800, sd = 150)))


ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) + 
    geom_text()
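Before moving on, it can be worth checking that the simulated data look as intended. The sketch below uses base R only and compares the per-vowel means with the target values passed to rnorm() above. It re-creates vowels compactly (same seed and target means) so it runs on its own; the names targets, observed and check are just illustrative.

```r
set.seed(10)

# Target means used in the simulation (same values as in the data.frame() call)
targets = data.frame(vowel = c("a", "e", "i", "o", "u"),
                     F1 = c(800, 600, 350, 600, 350),
                     F2 = c(1500, 2000, 2500, 1000, 800))

# Compact re-creation of the simulated data: 50 tokens per vowel
vowels = data.frame(vowel = rep(targets$vowel, each = 50),
                    F1 = rnorm(250, mean = rep(targets$F1, each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(targets$F2, each = 50), sd = 150))

# Observed mean F1 and F2 per vowel
observed = aggregate(cbind(F1, F2) ~ vowel, data = vowels, FUN = mean)

# Differences between observed and target means (in Hz)
check = merge(observed, targets, by = "vowel", suffixes = c(".obs", ".target"))
check$diffF1 = check$F1.obs - check$F1.target
check$diffF2 = check$F2.obs - check$F2.target
check[, c("vowel", "diffF1", "diffF2")]
```

With 50 tokens per vowel, the differences should be small relative to the standard deviations used (100 Hz for F1, 150 Hz for F2).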

Step 2: Axes

Reversed values

The first problem with the plot above is that both axes need to be reversed: in a traditional vowel plot, F1 increases from top to bottom and F2 increases from right to left. Not only that: ideally, both axes should originate in the top-right corner of the plot, just like the typical vowel plots you see in papers. Another thing you’ll probably want to fix is the legend (key), which is completely redundant given that the vowels themselves are plotted with geom_text().

ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) + 
    geom_text() + 
    scale_y_reverse() + 
    scale_x_reverse() +
    theme(legend.position = "none")
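Why does this work? scale_x_reverse() and scale_y_reverse() are continuous scales that use the reverse transformation from the scales package, which simply negates the values before they are mapped to positions. A quick way to see this (reverse_trans() is the name I’m using here; newer versions of scales also offer it as transform_reverse()):

```r
library(scales)

# The transformation used by scale_x_reverse() / scale_y_reverse()
rev_tr = reverse_trans()

# Forward transformation: values are negated, so larger F1/F2 values
# end up further down (y) or further left (x) on the plot
rev_tr$transform(c(350, 600, 800))

# The inverse maps them back to Hz (this is what the axis labels show)
rev_tr$inverse(rev_tr$transform(c(350, 600, 800)))
```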

Axis position

Moving the axes is easy: simply add a position argument to scale_x_reverse() ("top") and scale_y_reverse() ("right").

ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) + 
    geom_text() + 
    scale_y_reverse(position = "right") + 
    scale_x_reverse(position = "top") +
    theme(legend.position = "none")
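Since these three settings get repeated in almost every plot from here on, you could bundle them in a small function. ggplot2 lets you add a list of components with +, so a sketch could look like the one below. Note that vowel_axes() is a hypothetical helper name, not part of any package, and the data are re-created compactly so the block runs on its own.

```r
library(ggplot2)

# Hypothetical helper: reversed F1/F2 axes placed top/right, no legend
vowel_axes = function() {
    list(scale_y_reverse(position = "right"),
         scale_x_reverse(position = "top"),
         theme(legend.position = "none"))
}

set.seed(10)
vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    F1 = rnorm(250, mean = rep(c(800, 600, 350, 600, 350), each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(c(1500, 2000, 2500, 1000, 800), each = 50), sd = 150))

# Same plot as above, with the axis settings added in one step
p = ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) +
    geom_text() +
    vowel_axes()
p
```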

Everything else is straightforward: you can now adjust the formatting, add error bars, etc. If you’re not sure how to do that, keep reading.

Step 3: Extras

Density plot

You could use geom_density_2d() to add density contours for each vowel.

ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) + 
    geom_text() + 
    scale_y_reverse(position = "right") + 
    scale_x_reverse(position = "top") + 
    geom_density_2d() +
    theme(legend.position = "none")
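The contours drawn by geom_density_2d() come from a two-dimensional kernel density estimate, which, as far as I know, ggplot2 computes with MASS::kde2d(). The sketch below estimates the density for /i/ alone and finds the grid cell where it peaks, which should fall close to that vowel’s mean. The choice of /i/ and the 25 × 25 grid (n = 25) are arbitrary, for illustration only.

```r
set.seed(10)
vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    F1 = rnorm(250, mean = rep(c(800, 600, 350, 600, 350), each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(c(1500, 2000, 2500, 1000, 800), each = 50), sd = 150))

# Keep only /i/ tokens
i_tokens = subset(vowels, vowel == "i")

# 2D kernel density estimate on a 25 x 25 grid spanning the data range
# (MASS:: prefix avoids masking dplyr::select() if tidyverse is loaded)
dens = MASS::kde2d(i_tokens$F2, i_tokens$F1, n = 25)

# Grid cell with the highest estimated density (row = F2 index, col = F1 index)
peak = which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
c(F2 = dens$x[peak[1]], F1 = dens$y[peak[2]])
```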

Double error bars

Another option is to plot the mean F1 and F2 values together with their standard errors. That gives you two error bars per vowel, one for each dimension (F1 vertically, F2 horizontally). There are different ways to do this, but since the idea here is to stick with ggplot2, let’s use geom_errorbar() and geom_errorbarh().

# First, create summary table (tibble) with means and standard errors
# I'm using dplyr here (since I loaded tidyverse above)

means = vowels %>% group_by(vowel) %>% summarize(meanF1 = mean(F1),
                                                 meanF2 = mean(F2),
                                                 seF1 = sd(F1)/sqrt(n()),
                                                 seF2 = sd(F2)/sqrt(n()))

means
## # A tibble: 5 x 5
##   vowel meanF1 meanF2  seF1  seF2
##   <fct>  <dbl>  <dbl> <dbl> <dbl>
## 1 a       766.  1507.  12.3  21.8
## 2 e       607.  2043.  13.8  20.8
## 3 i       352.  2517.  13.7  23.3
## 4 o       579.  1009.  13.6  23.8
## 5 u       351.   776.  13.0  24.2
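The standard error columns are simply the standard deviation divided by the square root of the number of tokens (n() is 50 for every vowel here). If you’d rather not use dplyr, a base R sketch with tapply() computes the same quantities; se and means_base are names I’m introducing for illustration, and the data are re-created compactly so the block runs on its own.

```r
set.seed(10)
vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    F1 = rnorm(250, mean = rep(c(800, 600, 350, 600, 350), each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(c(1500, 2000, 2500, 1000, 800), each = 50), sd = 150))

# Standard error of the mean: sd / sqrt(n)
se = function(x) sd(x) / sqrt(length(x))

# tapply() returns one value per vowel; indexing by name keeps rows aligned
vowel_levels = sort(unique(vowels$vowel))
means_base = data.frame(
    vowel  = vowel_levels,
    meanF1 = as.numeric(tapply(vowels$F1, vowels$vowel, mean)[vowel_levels]),
    meanF2 = as.numeric(tapply(vowels$F2, vowels$vowel, mean)[vowel_levels]),
    seF1   = as.numeric(tapply(vowels$F1, vowels$vowel, se)[vowel_levels]),
    seF2   = as.numeric(tapply(vowels$F2, vowels$vowel, se)[vowel_levels])
)
means_base
```

Since the simulated standard deviations are 100 Hz (F1) and 150 Hz (F2), the standard errors should hover around 100/√50 ≈ 14 Hz and 150/√50 ≈ 21 Hz, which is in line with the table above.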

Now that we have all the information we need, we can just go ahead and plot the vowel means and associated standard errors.

ggplot(data = means, aes(x = meanF2, y = meanF1, color = vowel)) + 
    geom_errorbar(aes(ymin = meanF1 - seF1, ymax = meanF1 + seF1), width = 20) + 
    geom_errorbarh(aes(xmin = meanF2 - seF2, xmax = meanF2 + seF2), height = 20) +
    scale_y_reverse(position = "right") + 
    scale_x_reverse(position = "top") +
    theme(legend.position = "none")

OK, this looks good, but there’s one crucial thing left to fix: how do we identify the vowels? Right now, we’re only using colors, so we could bring the legend back (which would be easy). Another option is to add the vowels themselves to the plot. We probably don’t want them right in the middle of the error bars, since they would need to be fairly large and could therefore hide the bars. You can add the vowels with geom_text() or geom_label() and then adjust their position so that they don’t cover the bars (note that this requires an additional aesthetic in aes(), namely label).

ggplot(data = means, aes(x = meanF2, y = meanF1, label = vowel)) + 
    geom_errorbar(aes(ymin = meanF1 - seF1, ymax = meanF1 + seF1), width = 20) + 
    geom_errorbarh(aes(xmin = meanF2 - seF2, xmax = meanF2 + seF2), height = 10) +
    geom_text(position = position_nudge(x = 50, y = 50), size = 5) + 
    scale_y_reverse(position = "right") + 
    scale_x_reverse(position = "top")
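A detail worth knowing about position_nudge() here: as I understand it, ggplot2 applies position adjustments after the scale transformation, so with reversed scales the nudge is added to the negated values. A positive x nudge of 50 therefore moves a label towards lower F2 (to the right), and a positive y nudge towards lower F1 (up). You can check this by building the plot and inspecting the text layer’s data. In this sketch the error bars are replaced by points to keep it short, and the means are computed with base R.

```r
library(ggplot2)

set.seed(10)
vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    F1 = rnorm(250, mean = rep(c(800, 600, 350, 600, 350), each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(c(1500, 2000, 2500, 1000, 800), each = 50), sd = 150))

# Per-vowel means (base R, to keep the sketch self-contained)
means = aggregate(cbind(F1, F2) ~ vowel, data = vowels, FUN = mean)
names(means) = c("vowel", "meanF1", "meanF2")

p = ggplot(data = means, aes(x = meanF2, y = meanF1, label = vowel)) +
    geom_point() +
    geom_text(position = position_nudge(x = 50, y = 50), size = 5) +
    scale_y_reverse(position = "right") +
    scale_x_reverse(position = "top")

# Layer 2 (the labels), in the scales' transformed (negated) space
labels_built = ggplot_build(p)$data[[2]]
labels_built[, c("label", "x", "y")]
```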

This looks better. You can of course also adjust the fontface, color, etc. Next, let’s adjust the axis titles (note the \n, which breaks the line and thus adds some space between each title and its axis) and add "Hz" to the axis labels. Let’s also use a different theme.

library(scales)

ggplot(data = means, aes(x = meanF2, y = meanF1, label = vowel)) + 
    geom_errorbar(aes(ymin = meanF1 - seF1, ymax = meanF1 + seF1), width = 20) + 
    geom_errorbarh(aes(xmin = meanF2 - seF2, xmax = meanF2 + seF2), height = 10) +
    geom_text(position = position_nudge(x = 50, y = 50), size = 5) + 
    scale_y_reverse(position = "right", labels = unit_format(unit = "Hz", sep = "")) + 
    scale_x_reverse(position = "top", labels = unit_format(unit = "Hz", sep = "")) + 
    labs(x = "F2\n",
         y = "F1\n") + 
    theme_light()
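Note that unit_format() doesn’t label anything by itself: it returns a labelling function, which ggplot2 then calls on the break values. Calling that function directly is a quick way to preview the axis labels. One thing to watch for: in the versions of scales I’ve used, the default thousands separator is a space, so F2 values of 1000 Hz and above may be labelled with a space in them. unit_format() accepts a big.mark argument if you want to change that. The names hz_labels and hz_labels_plain are just for illustration.

```r
library(scales)

# Labelling function used for the axes above
hz_labels = unit_format(unit = "Hz", sep = "")
hz_labels(c(350, 600, 850))

# Same, but without a thousands separator (relevant for F2 values >= 1000)
hz_labels_plain = unit_format(unit = "Hz", sep = "", big.mark = "")
hz_labels_plain(c(1000, 2500))
```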

Final details

Finally, let’s revisit the density plot and adjust its formatting as well. Note that I’m changing the font size, making the density contours semi-transparent (so the plot doesn’t get too cluttered), and controlling the axes more precisely (limits and breaks).

ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) + 
    geom_text(size = 6) + # Font size for vowels
    scale_y_reverse(position = "right", 
                    labels = unit_format(unit = "Hz", sep = ""),
                    breaks = seq(100, 1000, 250)) + 
    scale_x_reverse(position = "top", 
                    labels = unit_format(unit = "Hz", sep = ""),
                    breaks = seq(200, 3000, 500)) + 
    labs(x = "F2\n",
         y = "F1\n",
         title = "Final plot (A)") + 
    geom_density_2d(alpha = 0.3) +
    coord_cartesian(xlim = c(200, 3000), 
                    ylim = c(100, 1000)) +
    theme_light() +
    theme(legend.position = "none",
          plot.title = element_text(hjust = 0.5), # Center plot title
          text = element_text(size = 13))         # Font size for plot
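For reference, the two seq() calls produce the breaks below. seq() stops at the last value that doesn’t exceed the upper bound, so 1000 Hz and 3000 Hz are not breaks themselves. Also note that the limits are set with coord_cartesian() rather than through the scales’ limits argument: coord_cartesian() zooms into the plot without discarding data, whereas scale limits remove out-of-range observations before the density contours are estimated.

```r
# Breaks used on the F1 (y) and F2 (x) axes in the final plots
f1_breaks = seq(100, 1000, 250)
f2_breaks = seq(200, 3000, 500)

f1_breaks  # 100 350 600 850
f2_breaks  # 200 700 1200 1700 2200 2700
```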

Now with semi-transparent ellipses instead of density contours.

ggplot(data = vowels, aes(x = F2, y = F1, color = vowel, label = vowel)) + 
    geom_text(size = 6) + # Font size for vowels
    scale_y_reverse(position = "right", 
                    labels = unit_format(unit = "Hz", sep = ""),
                    breaks = seq(100, 1000, 250)) + 
    scale_x_reverse(position = "top", 
                    labels = unit_format(unit = "Hz", sep = ""),
                    breaks = seq(200, 3000, 500)) + 
    labs(x = "F2\n",
         y = "F1\n",
         title = "Final plot (B)") + 
    stat_ellipse(type = "norm", alpha = 0.3) +
    coord_cartesian(xlim = c(200, 3000), 
                    ylim = c(100, 1000)) +
    theme_classic() +
    theme(legend.position = "none",
          plot.title = element_text(hjust = 0.5), # Center plot title
          text = element_text(size = 13))         # Font size for plot
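With type = "norm", stat_ellipse() draws an ellipse that assumes a bivariate normal distribution (the default, type = "t", assumes a multivariate t distribution instead), with a default level of 0.95. My understanding of the computation is sketched below for a single vowel: take the mean vector and covariance matrix, scale a unit circle by the Cholesky factor of the covariance, and use a radius based on the F distribution. normal_ellipse() is a hypothetical helper, and the 51 segments mirror stat_ellipse()’s default; treat this as a sketch of the idea rather than a copy of the ggplot2 source.

```r
set.seed(10)
vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    F1 = rnorm(250, mean = rep(c(800, 600, 350, 600, 350), each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(c(1500, 2000, 2500, 1000, 800), each = 50), sd = 150))

# Hypothetical helper: points on a normal-theory confidence ellipse
normal_ellipse = function(x, y, level = 0.95, segments = 51) {
    xy = cbind(x, y)
    center = colMeans(xy)
    shape = cov(xy)                                # sample covariance matrix
    chol_decomp = chol(shape)                      # upper-triangular R, t(R) %*% R = shape
    radius = sqrt(2 * qf(level, 2, nrow(xy) - 1))  # F-based radius, 2 dimensions
    angles = (0:segments) * 2 * pi / segments
    unit_circle = cbind(cos(angles), sin(angles))
    # Stretch/rotate the unit circle, scale by the radius, shift to the center
    ellipse = t(center + radius * t(unit_circle %*% chol_decomp))
    colnames(ellipse) = c("F2", "F1")
    list(points = ellipse, center = center, shape = shape, radius = radius)
}

i_tokens = subset(vowels, vowel == "i")
ell = normal_ellipse(i_tokens$F2, i_tokens$F1)
head(ell$points)
```

Every point on the resulting ellipse has the same Mahalanobis distance from the vowel’s mean (the squared radius), which is what makes it a contour of the fitted normal distribution.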

Finally, let’s keep the ellipses but show only the mean F1-F2 values for each vowel, which gives us a more minimalist plot. To do this, geom_label() needs the means tibble created above, while stat_ellipse() still needs the token-level vowels data, so the plot combines two data sets: means is passed to ggplot(), and vowels is passed to stat_ellipse() through its data argument, as shown below.

ggplot(data = means, aes(x = meanF2, y = meanF1, color = vowel, label = vowel)) + 
    geom_label(size = 6) + # Font size for vowels
    scale_y_reverse(position = "right", 
                    labels = unit_format(unit = "Hz", sep = ""),
                    breaks = seq(100, 1000, 250)) + 
    scale_x_reverse(position = "top", 
                    labels = unit_format(unit = "Hz", sep = ""),
                    breaks = seq(200, 3000, 500)) + 
    labs(x = "F2\n",
         y = "F1\n",
         title = "Final plot (C)") + 
    stat_ellipse(data = vowels, aes(x = F2, y = F1), type = "norm") +
    coord_cartesian(xlim = c(200, 3000), 
                    ylim = c(100, 1000)) +
    theme_classic() +
    theme(legend.position = "none",
          plot.title = element_text(hjust = 0.5), 
          text = element_text(size = 13))
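Plot C works because stat_ellipse() inherits the global aesthetics (including color = vowel and label = vowel), and the vowels data set happens to contain a vowel column too. If you prefer to make that dependency explicit, one option is to give the ellipse layer its own complete mapping and set inherit.aes = FALSE. A trimmed-down sketch (axis labels, breaks and limits omitted; data and means re-created compactly with base R):

```r
library(ggplot2)

set.seed(10)
vowels = data.frame(vowel = rep(c("a", "e", "i", "o", "u"), each = 50),
                    F1 = rnorm(250, mean = rep(c(800, 600, 350, 600, 350), each = 50), sd = 100),
                    F2 = rnorm(250, mean = rep(c(1500, 2000, 2500, 1000, 800), each = 50), sd = 150))
means = aggregate(cbind(F1, F2) ~ vowel, data = vowels, FUN = mean)
names(means) = c("vowel", "meanF1", "meanF2")

p = ggplot(data = means, aes(x = meanF2, y = meanF1, color = vowel, label = vowel)) +
    geom_label(size = 6) +
    # Ellipses from the token-level data, with their own mapping
    stat_ellipse(data = vowels, aes(x = F2, y = F1, color = vowel),
                 type = "norm", inherit.aes = FALSE) +
    scale_y_reverse(position = "right") +
    scale_x_reverse(position = "top") +
    theme_classic() +
    theme(legend.position = "none")
p
```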



Copyright © 2018 Guilherme Duarte Garcia