In this practice, we’ll be exploring the danish data set again (and you may have to use dplyr as well). While thinking about the questions below, it will be helpful to open R and reproduce all the steps on your own computer.
In the first part (1-5), you will see plots, and will have to think about the code that generated them. In the second part (6-10), you will be given a description of what needs to be plotted, and will again have to come up with the code that creates an appropriate plot. Suggested answers are provided, just as in practices 1 and 2.
danish data set (languageR package). In addition, a trend line has been added which simulates a linear regression model. Hint: when you add the line, you will have to set method to "lm". Also, note that the scatter plot has some transparency.
Sex, which doesn’t seem to make any difference.
PrevError, i.e., whether the participant got the previous word right. No apparent effect of PrevError on reaction time.
Hint: install the ggthemes package. Then, add theme_fivethirtyeight() to your plot to get the background in the plot below. Then, to use the colours correctly, define the fill and the line colours as follows:
fill = "#4271AE"
line = "#1F3552"
HEX colour codes are easy to look up online (this can be quite useful if you plan to create a template and use specific colours).
Finally, when you create your bars, simply tell R to use fill as the fill colour and line as the line colour. Remember that in ggplot2 the argument for changing the colour of your lines is colour. If you find this confusing, don’t worry: it is. As mentioned above, focus first on getting the bars and the error bars, then on getting the ascending order right. Only worry about the looks at the very end.
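As a rough sketch of how the fill and colour arguments fit together with bars, error bars and ascending order, consider the example below. The choice of variables (mean LogRT per Affix, with standard errors) is an assumption made purely for illustration and is not necessarily what the plot in the question shows; the names affix_means, meanRT and seRT are likewise made up for this example.

```r
library(languageR)  # provides the danish data set
library(dplyr)
library(ggplot2)

fill = "#4271AE"
line = "#1F3552"

# Assumption: mean LogRT per Affix is used only as an illustration
affix_means = danish %>%
  group_by(Affix) %>%
  summarise(meanRT = mean(LogRT),
            seRT = sd(LogRT) / sqrt(n()))

# reorder() sorts the bars in ascending order of the mean
p = ggplot(affix_means, aes(x = reorder(Affix, meanRT), y = meanRT)) +
  geom_col(fill = fill, colour = line) +
  geom_errorbar(aes(ymin = meanRT - seRT, ymax = meanRT + seRT),
                width = 0.3, colour = line) +
  labs(x = "Affix", y = "Mean LogRT")

# Only add the ggthemes background if the package is installed
if (requireNamespace("ggthemes", quietly = TRUE)) {
  p = p + ggthemes::theme_fivethirtyeight()
}
p
```

Note that fill is set outside aes() here, because it is a fixed colour rather than a mapping to a variable in the data.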
It’s often the case that you need to transform your data before plotting it. You might want to see a pattern that is not directly shown by your data. This is extremely common, and that’s why using dplyr and ggplot2 together is a great option. Some summarisation can be achieved with ggplot2 alone (e.g., using stat_summary()), but depending on the complexity of what you want to plot, more work will be necessary. In this part we will practice different types of transformation.
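To make the contrast concrete, here is a minimal sketch of the same summary done in two ways: once inside ggplot2 with stat_summary(), and once with dplyr before plotting. The variables (LogRT by Sex) and the names by_sex, meanRT and seRT are assumptions chosen for illustration only.

```r
library(languageR)  # provides the danish data set
library(dplyr)
library(ggplot2)

# Option 1: let ggplot2 compute mean and standard error on the fly
p1 = ggplot(danish, aes(x = Sex, y = LogRT)) +
  stat_summary(fun.data = mean_se, geom = "pointrange")

# Option 2: summarise with dplyr first, then plot the summary table
by_sex = danish %>%
  group_by(Sex) %>%
  summarise(meanRT = mean(LogRT),
            seRT = sd(LogRT) / sqrt(n()))

p2 = ggplot(by_sex, aes(x = Sex, y = meanRT)) +
  geom_pointrange(aes(ymin = meanRT - seRT, ymax = meanRT + seRT))
```

The second option takes more typing, but it leaves you with a table you can inspect, filter or reorder before plotting, which is what the more complex questions below rely on.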
PrevError) by speaker. One way to do it is to use a histogram (or a bar plot) to plot the counts, and then add a layer to your plot that transforms counts into percentages (or include the calculation in your plot). As always, there are several different ways to accomplish this. In this case, let’s practice doing that with dplyr and then plotting it.
filter by type of answer (remember: you want to plot the % of correct responses)
geom_point(). You may want to angle your x-axis.
finally, order your subjects by % of correct answers
As usual, do not worry about the details (like how to use % in the y-axis). Only worry about that if everything else is clear.
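One possible way to chain these steps is sketched below; it is only one of several options and not necessarily the suggested answer. The names pct_correct and Perc are made up for this example.

```r
library(languageR)  # provides the danish data set
library(dplyr)
library(ggplot2)

pct_correct = danish %>%
  count(Subject, PrevError) %>%        # counts per subject and answer type
  group_by(Subject) %>%
  mutate(Perc = n / sum(n)) %>%        # proportion within each subject
  ungroup() %>%
  filter(PrevError == "CORRECT")       # keep only the correct responses

p = ggplot(pct_correct, aes(x = reorder(Subject, Perc), y = Perc)) +
  geom_point() +
  scale_y_continuous(labels = scales::percent) +  # show % on the y-axis
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Subject", y = "% correct (previous trial)")
p
```

Here reorder() takes care of sorting the subjects by their percentage, and scales::percent (from the scales package, which is installed alongside ggplot2) formats the y-axis.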
LogRT? Try generating a histogram after back-transforming LogRT and see how that changes the distribution.
Sex. Are the data (roughly) normally distributed within each level of PrevError and Sex?
Intro (not exactly your task, but necessary for plotting later)
You may have noticed that danish does not have a column that tells us whether the participant got the current word right (we only have PrevError). This is unfortunate, so let’s simulate such a column to use it in ggplot. How can we do that?
Easy: R is great for simulation. In this particular case, we want to randomly assign CORRECT and ERROR to our data. Let’s do that first. In the code below, we are telling R to randomly sample CORRECT and ERROR x number of times (where x is the number of rows in danish). Because we only have two possible values but need many more draws than that, we have to sample with replacement, so we set replace to TRUE. Finally, we can even tell R the probability of each value (note that sample() returns a character vector here, so we wrap it in as.factor() in the code below to get a factor). In this case, we want to simulate these values such that CORRECT is more likely than ERROR. More specifically, there’s a 70% probability of randomly generating a CORRECT and a 30% probability of generating an ERROR.
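library(languageR)  # danish data set
library(dplyr)      # %>%, group_by(), summarise(), mutate(), select()

# Randomly assign CORRECT (p = 0.7) or ERROR (p = 0.3) to every row
danish$CurrError = as.factor(sample(c("CORRECT", "ERROR"),
                                    size = nrow(danish),
                                    replace = TRUE,
                                    prob = c(0.7, 0.3)))
# Let's now see if the probabilities are mirrored in the simulated data
# (the sampling is random, so your exact values will differ slightly):
danish %>% group_by(CurrError) %>%
  summarise(n = n()) %>%
  mutate(Freq = n / sum(n)) %>%
  select(-n)
## # A tibble: 2 x 2
##   CurrError      Freq
##      <fctr>     <dbl>
## 1   CORRECT 0.6975346
## 2     ERROR 0.3024654
# Great. Now let's move on!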
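If you want the simulated column to come out the same every time you run the script, you can fix the random seed first. The sketch below uses base R only; the seed value and n_rows (a stand-in for nrow(danish)) are arbitrary assumptions for illustration.

```r
set.seed(42)    # assumption: any fixed seed makes the draw reproducible
n_rows = 3000   # hypothetical stand-in for nrow(danish)

sim = sample(c("CORRECT", "ERROR"),
             size = n_rows,
             replace = TRUE,
             prob = c(0.7, 0.3))

# Proportion of each value in the simulated vector
props = prop.table(table(sim))
props
```

The proportions should land close to 0.7 and 0.3, but not exactly on them, since each value is drawn at random.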
Now, pretend that these data are real, so we know how participants performed in the current trial.
This is your task
The plot in question 6 is quite informative, but let’s say you want to take into account the variation of correct answers by affix. In the plot below, we see the mean % of correct responses in our simulated current trial column, but we also see the standard error when we take into account the different %s across affixes (in this case, we won’t see much variation in standard errors, given that we have not added any noise by affix to our random data imputation above). This can tell us how much certainty we have regarding each subject’s performance considering the different affixes.
Hint: This type of transformation is complex at first, but you will quite often need to do something similar. If you have any questions, refer back to the tutorial or email me. The code below shows you how flexible dplyr can be: we’re doing several things through a pipeline of operations, all at once (including ggplot, which until now was a separate step). More importantly, study this example and use it as a “template”.
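A pipeline of the kind described above might look like the following sketch. It is not the tutorial's suggested answer, only one way to go from raw rows to a plot in a single chain. The seed value and the names Perc, meanPerc and sePerc are assumptions made for this example, and the CurrError column is simulated again so that the block runs on its own.

```r
library(languageR)  # provides the danish data set
library(dplyr)
library(ggplot2)

set.seed(123)  # assumption: fixed seed so the simulation is reproducible
danish$CurrError = as.factor(sample(c("CORRECT", "ERROR"),
                                    size = nrow(danish),
                                    replace = TRUE,
                                    prob = c(0.7, 0.3)))

p = danish %>%
  count(Subject, Affix, CurrError) %>%   # counts per subject, affix, answer
  group_by(Subject, Affix) %>%
  mutate(Perc = n / sum(n)) %>%          # proportion within subject and affix
  ungroup() %>%
  filter(CurrError == "CORRECT") %>%
  group_by(Subject) %>%                  # now summarise across affixes
  summarise(meanPerc = mean(Perc),
            sePerc = sd(Perc) / sqrt(n())) %>%
  ggplot(aes(x = reorder(Subject, meanPerc), y = meanPerc)) +
  geom_point() +
  geom_errorbar(aes(ymin = meanPerc - sePerc, ymax = meanPerc + sePerc),
                width = 0.3) +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Subject", y = "Mean % correct (simulated)")
p
```

Note the switch from %>% to + once ggplot() is called: the data flow through the pipe, but plot layers are still added with +.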