These are three very useful objects in R that were discussed in our first meeting.
A ………. is the equivalent of a spreadsheet in Excel. It almost always has two dimensions, namely ………. and ………. This is what you will be working with most of the time.
Each ………. in a ………. is equivalent to a vector. As a result, a typical column contains only one type of data, i.e., a column (unlike a row) cannot hold both numbers and strings: if it does, the numbers will be coerced to character.
………. are one of the most flexible objects in R. They can have multiple dimensions as well as different types of data. As a result, you can hold both numbers and strings in a single variable.
To create a vector, you use the ………. command. To create a data frame from scratch, you use ………. If you want to turn an object into a data frame (e.g., a matrix), you use ………. To create lists, you type ……….
(a) data frame; rows; columns
(b) column; data frame
(c) lists
(d) c(); data.frame(); as.data.frame(); list()
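The four commands in (d) can be tried out with a minimal sketch like the one below. All object names and values (`v`, `df`, `m`, `mdf`, `l`) are made up for illustration and are not part of the exercise data.

```r
# A vector holds one type of data; mixing types triggers coercion
v = c(1, 2, "three")
class(v)   # "character": the numbers were coerced to strings

# A data frame built from scratch: each column is a vector
df = data.frame(id = c(1, 2), word = c("cat", "dog"))

# Turning an existing object (here, a matrix) into a data frame
m = matrix(1:6, nrow = 2)
mdf = as.data.frame(m)

# A list can hold different types (and lengths) side by side
l = list(number = 42, text = "hello", values = c(1, 2, 3))
class(l$number)   # "numeric"
class(l$text)     # "character"
```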
How would you create the following data frame from scratch and assign it to a new variable? (You won’t normally do that in R, but it’s good practice).
item  sentence                   type    condition   version  RT
1     The nurse was nervous      filler  pauseN2     a        4.628606
2     I walk every day           filler  fallingInt  b        3.510744
3     She never speaks Japanese  target  risingInt   a        2.851694
myData = data.frame(
  item = c(1, 2, 3),
  sentence = c("The nurse was nervous", "I walk every day", "She never speaks Japanese"),
  type = c("filler", "filler", "target"),
  condition = c("pauseN2", "fallingInt", "risingInt"),
  version = c("a", "b", "a"),
  RT = c(4.628606, 3.510744, 2.851694)
)
myData
##   item                  sentence   type  condition version       RT
## 1    1     The nurse was nervous filler    pauseN2       a 4.628606
## 2    2          I walk every day filler fallingInt       b 3.510744
## 3    3 She never speaks Japanese target  risingInt       a 2.851694
In myData, we anticipate that R will treat item as a number, not as a factor. As a result, if you use summary(myData) (or mean(myData$item)), R will return the mean for item, which makes no sense. Using item as an example, check its class and change it into a factor.
class(myData$item)
## [1] "numeric"
# So R will return the mean of the column:
mean(myData$item) # !
## [1] 2
myData$item = as.factor(myData$item)
# Let's check that item is a factor now:
class(myData$item)
## [1] "factor"
mean(myData$item) # Now we won't be able to calculate the mean, of course
## Warning in mean.default(myData$item): argument is not numeric or logical:
## returning NA
## [1] NA
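The effect of the conversion on summary() can be seen in a small sketch. The data frame `toy` and its values are hypothetical and chosen only so that one level occurs twice.

```r
# Toy data frame (hypothetical values, not the data from the exercise)
toy = data.frame(item = c(1, 2, 2, 3))

summary(toy$item)   # numeric summary: min, quartiles, mean, max

toy$item = as.factor(toy$item)
summary(toy$item)   # counts per level: one 1, two 2s, one 3
levels(toy$item)    # "1" "2" "3"
```

Once the column is a factor, summary() counts how often each level occurs instead of averaging the labels.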
Packages
How do you install and load a package in R?
# In this example, I'm explicitly telling R which repository to use:
install.packages("nameOfPackage", repos = "http://cran.r-project.org")
# To load a package:
library("nameOfPackage")
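# Alternatively (not recommended): library() is preferred because it
# stops with an error if the package is missing, whereas require()
# only returns FALSE with a warning
require("nameOfPackage")
# PS: You don't actually need quotes when loading packages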
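A common pattern is to install a package only when it is missing. The sketch below is one possible way to do that; it uses `stats` as a placeholder because it ships with R, so nothing is actually installed when you run it. The variable name `pkg` is an assumption made for the example.

```r
# Placeholder package name: "stats" ships with R, so the install
# branch is skipped here; swap in the package you actually need
pkg = "stats"

# requireNamespace() returns FALSE (without an error) if pkg is missing
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, repos = "http://cran.r-project.org")
}

# character.only = TRUE tells library() that pkg is a variable
# holding the name, not the name itself
library(pkg, character.only = TRUE)
```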
Loading your own (hypothetical) data
You have just opened R. How do you check which objects are loaded in your workspace?
ls()
Now you want to load your data file, myFile.csv. Assume it’s located in a particular folder in your laptop: /Users/yourName/Documents/files/myFile.csv. Which command(s) could you use to load the file and assign it to the variable myData?
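One possible answer is read.csv() with the full path to the file. The sketch below writes a small made-up CSV to a temporary file so that it runs on any machine; in practice you would pass "/Users/yourName/Documents/files/myFile.csv" instead of `path`. The file contents and the name `myData2` are assumptions made for illustration.

```r
# Stand-in for the file in the question, so the sketch runs anywhere;
# the contents are made up for illustration
path = tempfile(fileext = ".csv")
writeLines(c("item,RT", "1,4.63", "2,3.51"), path)

# Option 1: read.csv() with the path to the file
myData = read.csv(path)

# Option 2: read.table() with the CSV settings spelled out
myData2 = read.table(path, header = TRUE, sep = ",")

head(myData)
```

Another route is to move to the folder first with setwd() and then pass only the file name to read.csv().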
We will use the danish data set, which comes with the languageR package. Load the data and assign danish to a new (shorter) variable, dan.
library(languageR)
data(danish)
dan = danish
This question has three parts. Normally, the first thing you want to do when you load your data file is to get a general sense of its structure and dimensions (so you know the file is correct, for example). How do you: (a) visualize the first 10 rows in your data? (b) check the class of each variable? (c) print basic stats for all variables?
head(dan, n = 10)
str(dan)
summary(dan)
A couple of things: (a) How do you print the number of columns dan has? (b) How do you print the names of all the columns? Finally, (c) create a subset that only contains the following columns: Subject, Word, LogRT, Sex, LogWordFreq, LogUP. Assign this subset to new and visualize the first rows of new.
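A minimal sketch of one way to answer the three parts. So that it runs without languageR, it uses a hypothetical stand-in called `toy` with the six columns named in the question plus one made-up column (`Extra`); with the real data, the same calls would be applied to dan and the subset assigned to new.

```r
# Hypothetical stand-in for dan: the six columns from the question
# plus one made-up column (Extra) so there is something to drop
toy = data.frame(
  Subject = c("2s14", "2s17"),
  Word = c("appetitlig", "appetitlig"),
  LogRT = c(6.454239, 6.842854),
  Sex = c("M", "M"),
  LogWordFreq = c(2.944439, 2.944439),
  LogUP = c(5.32301, 5.32301),
  Extra = c(0, 0)
)

ncol(toy)    # (a) number of columns
names(toy)   # (b) names of all the columns

# (c) keep only the columns of interest, selected by name
keep = c("Subject", "Word", "LogRT", "Sex", "LogWordFreq", "LogUP")
newToy = toy[, keep]
head(newToy)
```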
Note that we have a column for word frequency (LogWordFreq), which has been log-transformed. To backtransform it, we can take the exponential of LogWordFreq using the exp() function. Create a new column called WordFreq that backtransforms LogWordFreq.
new$WordFreq = exp(new$LogWordFreq)
head(new)
##   Subject       Word    LogRT Sex LogWordFreq   LogUP WordFreq
## 1    2s14 appetitlig 6.454239   M    2.944439 5.32301       19
## 2    2s17 appetitlig 6.842854   M    2.944439 5.32301       19
## 3    2s15 appetitlig 6.839958   M    2.944439 5.32301       19
## 4    2s04 appetitlig 6.834507   M    2.944439 5.32301       19
## 5    2s06 appetitlig 6.795191   F    2.944439 5.32301       19
## 6    2s11 appetitlig 7.062680   M    2.944439 5.32301       19
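The backtransformation works because exp() is the inverse of the natural logarithm, log(). A quick sanity check, using made-up frequencies (`freq` and `logFreq` are hypothetical names):

```r
freq = c(19, 150, 2000)         # made-up raw frequencies
logFreq = log(freq)             # natural log, the inverse of exp()

# exp() recovers the original values, up to floating-point error
all.equal(exp(logFreq), freq)
```

This is why exp(2.944439) in the output above comes out as 19.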
One very useful function in R is ifelse(), which has three arguments. The first argument is the condition; the second, the result in case the condition is met; the third refers to what needs to be done if the condition is not met (i.e., the else bit).
For example:
x = 10
ifelse(x > 5, "x is greater than 5", "x is not greater than 5")
## [1] "x is greater than 5"
Because x is 10, the first argument evaluates to TRUE. As a result, the second argument is returned (the third argument is not evaluated in this case).
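ifelse() is vectorized: when the condition is a vector, it is checked element by element and the result has the same length. A small sketch with a made-up vector (`x` is redefined here and `res` is a hypothetical name):

```r
x = c(2, 5, 10)
res = ifelse(x > 5, "greater", "not greater")
res
## [1] "not greater" "not greater" "greater"
```

This is what makes it possible to apply ifelse() to a whole column of a data frame at once.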
Now, create a new column in new called isFreq. This column will have two levels: yes if the log-transformed frequency of a word is greater than 5, and no otherwise. Note that R will likely treat your new column as a character vector, not a factor, so you also need to convert it (you can do it all at once by nesting functions).
new$isFreq = as.factor(
ifelse(new$LogWordFreq > 5, "yes", "no")
)
head(new)
##   Subject       Word    LogRT Sex LogWordFreq   LogUP WordFreq isFreq
## 1    2s14 appetitlig 6.454239   M    2.944439 5.32301       19     no
## 2    2s17 appetitlig 6.842854   M    2.944439 5.32301       19     no
## 3    2s15 appetitlig 6.839958   M    2.944439 5.32301       19     no
## 4    2s04 appetitlig 6.834507   M    2.944439 5.32301       19     no
## 5    2s06 appetitlig 6.795191   F    2.944439 5.32301       19     no
## 6    2s11 appetitlig 7.062680   M    2.944439 5.32301       19     no
# Now let's see how many `yes` and `no` we have
summary(new$isFreq)
## no yes
## 1781 1545
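The conversion to a factor matters for summary(): on a character vector it reports only length, class, and mode. A sketch with made-up values (`answers` is a hypothetical name) that also shows table() as an alternative way to count:

```r
answers = c("no", "yes", "no", "no")   # made-up values

summary(answers)              # length, class, and mode only: no counts
table(answers)                # counts per value, even without a factor
summary(as.factor(answers))   # counts per level, as with isFreq
```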