PS 1 (Oct 20, 2023)

Using R as a calculator

# Addition
2 + 2
## [1] 4
# Subtraction
3 - 2
## [1] 1
# Multiplication
3 * 2
## [1] 6
# Division
3 / 2
## [1] 1.5
# Exponentiation
2 ^ 5
## [1] 32
# Order of operations
2 + 3 * 4
## [1] 14
(2 + 3) * 4
## [1] 20

Functions in R

In R, besides mathematical operators like these, we will mostly use “functions” to perform various tasks. Each function takes specific arguments. Arguments are the inputs to the function, i.e., the objects on which the function operates. Some arguments must be specified explicitly, while others have default values. If a function takes multiple arguments, they are separated by commas.

Functions are a way to package up and reuse code.

The function below is called “add_two” and it adds two to any number you give it.

add_two <- function(x) {
  x + 2
}

Now we can use the function we just created.

add_two(3)
## [1] 5
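
Functions can also take several arguments, separated by commas, and an argument can have a default value so that it does not need to be specified every time. A minimal sketch (the function name add_n and its argument names are made up for illustration, they are not part of R):

```r
# add_n is a hypothetical helper, not a built-in R function
# x is required; n has a default value of 2
add_n <- function(x, n = 2) {
  x + n
}

add_n(3)             # n falls back to its default: 3 + 2
## [1] 5
add_n(3, 10)         # arguments matched by position
## [1] 13
add_n(n = 1, x = 3)  # arguments matched by name, so the order does not matter
## [1] 4
```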

Other functions are built into R. For example, the “log” function computes the natural logarithm.

log(10)
## [1] 2.302585
sqrt(4)
## [1] 2
abs(-2)
## [1] 2

You can also use functions inside other functions.

log(sqrt(4))
## [1] 0.6931472

Variables in R

A variable is a name that refers to an object stored in the computer’s memory. We can give it any name and value we want. The computer keeps the value we assign to a variable in memory, and we can access that value later through the variable’s name.

In R, we assign variables using the <- operator.

# this code will not produce any output but will assign the value 100 to the variable 'chomsky'
chomsky <- (2*5)^2

# if we want to see the value of the variable, we can just type the name of the variable or print it to the console
chomsky
## [1] 100
print(chomsky)
## [1] 100

Operations with variables

# we can use variables in operations
chomsky + 1
## [1] 101
burhan <- sqrt(16)

burhan + chomsky
## [1] 104
burhan * chomsky
## [1] 400

Logical operators

Using the <, >, <=, >=, ==, and != operators, we can compare two values, and with | (or) and & (and) we can combine comparisons. The result is either TRUE, meaning the comparison holds, or FALSE, meaning it does not.

chomsky < 105 # smaller than
## [1] TRUE
chomsky > 1 # bigger than
## [1] TRUE
chomsky <= 8 # smaller than or equal to
## [1] FALSE
chomsky >= 8 # bigger than or equal to
## [1] TRUE
chomsky == 8 # equal to
## [1] FALSE
chomsky != 6 # not equal to
## [1] TRUE
chomsky == 4 | chomsky == 8 # equal to 4 or equal to 8
## [1] FALSE
chomsky > 4 & chomsky > 8 # bigger than both 4 and 8
## [1] TRUE
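
One pitfall with | and &: each side has to be a complete comparison. The sketch below (x is just an example variable) shows that writing x == 4 | 8 does not test whether x is 4 or 8, because R evaluates it as (x == 4) | 8 and treats any non-zero number as TRUE.

```r
x <- 100  # example value chosen for illustration

x == 4 | 8           # evaluated as (x == 4) | 8, and 8 counts as TRUE
## [1] TRUE
x == 4 | x == 8      # is x equal to 4 or equal to 8?
## [1] FALSE
x %in% c(4, 8)       # shorthand for the same question
## [1] FALSE
x > 50 & x < 200     # is x between 50 and 200?
## [1] TRUE
```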

Note: You can always get help about a specific function or operator by using the help() command.

help(log)

help("+")

Data types in R

In R, values can have different types. The main data types include integer, double (for real numbers), character, and logical. You can use the typeof() function to determine the data type of a variable.

Here’s an example:

var <- as.integer(2)
var2 <- 2.2
var3 <- "hey learning R is cool"
var4 <- TRUE

typeof(var)
## [1] "integer"
typeof(var2)
## [1] "double"
typeof(var3)
## [1] "character"
typeof(var4)
## [1] "logical"
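
A short sketch of how types can be checked and converted. It shows why as.integer() was needed above: a plain number is stored as a double unless it carries the L suffix. The as.*() functions convert between types.

```r
typeof(2)               # a plain number is stored as a double
## [1] "double"
typeof(2L)              # the L suffix creates an integer
## [1] "integer"
as.numeric("3.5") + 1   # convert text to a number before doing arithmetic
## [1] 4.5
as.character(10)        # convert a number to text
## [1] "10"
as.logical("TRUE")      # convert text to a logical value
## [1] TRUE
```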

Vectors

Numeric vectors

A vector is a collection of values of the same type. We can create a vector using the c() function. The c() function takes any number of arguments and combines them into a vector.

# create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)

print(numbers)
## [1] 1 2 3 4 5
# use length() to get the length of a vector
length(numbers)
## [1] 5
# consecutive numbers can be created using the : operator
5:90
##  [1]  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [26] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [51] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
## [76] 80 81 82 83 84 85 86 87 88 89 90
# or use seq() to create a sequence of numbers
seq(5, 90, by = 2)
##  [1]  5  7  9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53
## [26] 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89
# use rep() and seq() to create a vector of repeated numbers
rep(seq(1,10,3),5)
##  [1]  1  4  7 10  1  4  7 10  1  4  7 10  1  4  7 10  1  4  7 10
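
Because a vector holds values of a single type, c() silently converts mixed inputs to a common type. A small sketch (the variable name mixed is only for illustration):

```r
mixed <- c(1, "two", TRUE)   # a number, a string, and a logical value
mixed                        # everything has been converted to character
## [1] "1"    "two"  "TRUE"
typeof(mixed)
## [1] "character"
c(1, TRUE)                   # without strings, TRUE is converted to the number 1
## [1] 1 1
typeof(c(1, TRUE))
## [1] "double"
```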

Some functions that you can use with numeric vectors:

# sum() adds up all the numbers in a vector
sum(numbers)
## [1] 15
# mean() computes the mean of all the numbers in a vector
mean(numbers)
## [1] 3
# max() and min() return the maximum and minimum values in a vector
max(numbers)
## [1] 5
min(numbers)
## [1] 1
# sort() sorts the numbers in a vector in ascending order
sort(numbers)
## [1] 1 2 3 4 5
# you can also sort in descending order
sort(numbers, decreasing = TRUE)
## [1] 5 4 3 2 1
# sd() computes the standard deviation of the numbers in a vector
sd(numbers)
## [1] 1.581139
# median() computes the median of the numbers in a vector
median(numbers)
## [1] 3

Operations with vectors:

# you can add two vectors together
numbers + c(1, 2, 3, 4, 5)
## [1]  2  4  6  8 10
# you can multiply two vectors together
numbers * c(1, 2, 3, 4, 5)
## [1]  1  4  9 16 25
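
These operations work element by element. When the two vectors differ in length, R reuses ("recycles") the shorter one, as this sketch shows:

```r
numbers <- c(1, 2, 3, 4, 5)

numbers * 2                 # the single value 2 is reused for every element
## [1]  2  4  6  8 10
numbers^2                   # each element is squared
## [1]  1  4  9 16 25
c(1, 2, 3, 4) + c(10, 20)   # the shorter vector is recycled as 10 20 10 20
## [1] 11 22 13 24
```

If the longer length is not a multiple of the shorter one, R still recycles but prints a warning.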

Indexing vectors:

# you can access the elements of a vector using the [] operator
new_vector <- 7:21

new_vector[1]
## [1] 7
new_vector[2:7]
## [1]  8  9 10 11 12 13
new_vector[c(1, 3, 5, 7)]
## [1]  7  9 11 13
new_vector[-1]
##  [1]  8  9 10 11 12 13 14 15 16 17 18 19 20 21
new_vector[-(1:3)]
##  [1] 10 11 12 13 14 15 16 17 18 19 20 21

Logical vectors

Logical vectors are vectors that contain TRUE and FALSE values. You can create logical vectors using the c() function.

# create a logical vector
logical_vector <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

# operators like <, >, <=, >=, ==, !=, |, and & can be used to create logical vectors
new_vector <- 1:8

new_vector < 3
## [1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
new_vector == 7
## [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
new_vector != 0
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

Filtering vectors:

# you can use logical vectors to filter other vectors

new_vector[new_vector < 3] # returns all values in new_vector that are smaller than 3
## [1] 1 2
new_vector[new_vector == 7] # returns all values in new_vector that are equal to 7
## [1] 7
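
Conditions can be combined with & and | inside the brackets, and a logical vector can also be counted or turned into positions. A sketch with made-up values (the vector scores is hypothetical):

```r
scores <- c(12, 7, 30, 18, 25)   # made-up values for illustration

scores[scores > 10 & scores < 20]   # values between 10 and 20
## [1] 12 18
scores[scores < 10 | scores > 28]   # values at either extreme
## [1]  7 30
which(scores > 15)                  # positions (not values) where the condition holds
## [1] 3 4 5
sum(scores > 15)                    # TRUE counts as 1, so this counts the matches
## [1] 3
```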

Character vectors

Character vectors are vectors that contain strings. You can create character vectors using the c() function.

# create a character vector
character_vector <- c("hello", "learning", "R", "is", "cool")
print(character_vector)
## [1] "hello"    "learning" "R"        "is"       "cool"
# you can use the nchar() function to get the number of characters in each string
nchar(character_vector)
## [1] 5 8 1 2 4
# you can use the paste() function to concatenate strings
paste("hello", "learning", "R", "is", "cool")
## [1] "hello learning R is cool"
# you can use the strsplit() function to split a string into a vector of substrings
strsplit("hello learning R is cool", " ")
## [[1]]
## [1] "hello"    "learning" "R"        "is"       "cool"
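
A few variations on the functions above, as a sketch: paste() accepts sep and collapse arguments, paste0() uses no separator, and strsplit() returns a list, so its result needs [[1]] before individual pieces can be indexed.

```r
paste("learning", "R", sep = "-")   # sep sets the separator between arguments
## [1] "learning-R"
paste0("item", 1:3)                 # paste0() joins without a separator
## [1] "item1" "item2" "item3"
words <- c("hello", "learning", "R")
paste(words, collapse = " ")        # collapse joins the elements of one vector
## [1] "hello learning R"
pieces <- strsplit("hello learning R", " ")
pieces[[1]][2]                      # [[1]] extracts the vector, [2] its second element
## [1] "learning"
```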

Data frames

Data frames are used to store tabular data. You can create a data frame using the data.frame() function.

# create a data frame
df <- data.frame(
  name = c("Burhan", "Chomsky", "Kant", "Hume", "İrem"),
  age = c(55, 95, 67, 89, 24),
  height = c(1.78, 1.65, 1.90, 1.45, 1.67)
)

print(df)
##      name age height
## 1  Burhan  55   1.78
## 2 Chomsky  95   1.65
## 3    Kant  67   1.90
## 4    Hume  89   1.45
## 5    İrem  24   1.67
# you can use the str() function to get information about the structure of a data frame
str(df)
## 'data.frame':    5 obs. of  3 variables:
##  $ name  : chr  "Burhan" "Chomsky" "Kant" "Hume" ...
##  $ age   : num  55 95 67 89 24
##  $ height: num  1.78 1.65 1.9 1.45 1.67
# you can use the summary() function to get summary statistics about a data frame
summary(df)
##      name                age         height    
##  Length:5           Min.   :24   Min.   :1.45  
##  Class :character   1st Qu.:55   1st Qu.:1.65  
##  Mode  :character   Median :67   Median :1.67  
##                     Mean   :66   Mean   :1.69  
##                     3rd Qu.:89   3rd Qu.:1.78  
##                     Max.   :95   Max.   :1.90
# you can use the $ operator to access a column in a data frame
df$name
## [1] "Burhan"  "Chomsky" "Kant"    "Hume"    "İrem"
# you can also use the [] operator to select a column; the result is a one-column data frame, not a vector
df["name"]
##      name
## 1  Burhan
## 2 Chomsky
## 3    Kant
## 4    Hume
## 5    İrem
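
Some further ways to index a data frame, sketched on the same example data. The general pattern is df[rows, columns]. The new column name height_cm is made up for illustration.

```r
df <- data.frame(
  name = c("Burhan", "Chomsky", "Kant", "Hume", "İrem"),
  age = c(55, 95, 67, 89, 24),
  height = c(1.78, 1.65, 1.90, 1.45, 1.67)
)

df[["age"]]                 # [[ ]] returns the column as a vector, like $
## [1] 55 95 67 89 24
df[2, ]                     # second row, all columns
##      name age height
## 2 Chomsky  95   1.65
df[df$age > 60, "name"]     # names of everyone older than 60
## [1] "Chomsky" "Kant"    "Hume"
nrow(df)                    # number of rows
## [1] 5
df$height_cm <- df$height * 100   # add a new column computed from an existing one
```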

Quick visualization:

# you can use the plot() function to create a scatter plot
plot(df$age, df$height)

# you can use the hist() function to create a histogram
hist(df$age)

# you can use the boxplot() function to create a boxplot
boxplot(df$age)

# you can use the barplot() function to create a barplot
barplot(df$age)

We will learn later how to create more advanced visualizations using the ggplot2 package.

Exercises

Exercise 1

Today is Monday. What day of the week will it be 9, 54, 306, and 8999 days from now?

Note: Create a character vector containing the days of the week and repeat this vector 9000 times. Then, use indexing to find the desired day. Hint: Write the days of the week in the character vector starting from Tuesday.

days <- c("Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday", "Monday")

# you complete...

Exercise 2

Create a vector containing the numbers 1 to 100. Then, find the sum of the numbers that are divisible by 3 or 5.

Tip: Use the %% operator to find the remainder of a division.
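
A quick sketch of how %% behaves, using small numbers unrelated to the exercise:

```r
7 %% 3            # 7 = 2 * 3 + 1, so the remainder is 1
## [1] 1
10 %% 5           # a remainder of 0 means 10 is divisible by 5
## [1] 0
(1:6) %% 2 == 0   # %% works element by element, so this flags the even numbers
## [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE
```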

# answer:
numbers <- 1:100

# you complete...

Exercise 3

You are taking measurements every 5 days throughout the year. Create a number sequence that shows on which days you take measurements and assign it to a variable named “measurement_days”. The result should look like this: 5, 10, 15, 20, …, 365.

# answer:


# you complete...

PS 2 (Nov 10, 2023)

Here are some exercise questions we covered in this PS. Answers to the bonus questions will be shared in a few days:

  1. Save the built-in “starwars” data in a data frame named “df”. Remove the rows with NA values. View the data frame using the View() function. Explore the data using different functions.
library(tidyverse)
df <- starwars
df <- na.omit(df)
#str(df)
#summary(df)
#View(df)
  2. Create a new column named “BMI” (body mass index) and fill it with the BMI values calculated from the existing mass and height columns. Tip: You can look up the formula for BMI online.
df <- df %>% mutate(height = height / 100)
df <- df %>% mutate(BMI = mass / height^2)
head(df)
## # A tibble: 6 × 15
##   name      height  mass hair_color skin_color eye_color birth_year sex   gender
##   <chr>      <dbl> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> <chr> 
## 1 Luke Sky…   1.72    77 blond      fair       blue            19   male  mascu…
## 2 Darth Va…   2.02   136 none       white      yellow          41.9 male  mascu…
## 3 Leia Org…   1.5     49 brown      light      brown           19   fema… femin…
## 4 Owen Lars   1.78   120 brown, gr… light      blue            52   male  mascu…
## 5 Beru Whi…   1.65    75 brown      light      blue            47   fema… femin…
## 6 Biggs Da…   1.83    84 black      light      brown           24   male  mascu…
## # … with 6 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>, BMI <dbl>
  3. Get the average height of the humans using tidyverse. Write the value you found in a comment below.
humans <- df %>% filter(species == "Human")
mean(humans$height)
## [1] 1.78
  4. Using the group_by() and summarize() functions, look at the relationship between species and BMI (body mass index). Save the result in a data frame named summary1. Which species has the highest mean BMI?
summary1 <- df %>% 
  group_by(species) %>% 
  summarize(bmi = mean(BMI))

head(summary1)
## # A tibble: 6 × 2
##   species    bmi
##   <chr>    <dbl>
## 1 Cerean    20.9
## 2 Ewok      25.8
## 3 Gungan    17.2
## 4 Human     25.3
## 5 Kel Dor   22.6
## 6 Mirialan  18.8
  5. Create a bar plot with ggplot2. Use summary1 as the data frame. Plot species on the x-axis and BMI on the y-axis. Add axis names and a main title. Save the chart as a .png file.
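p1 <- summary1 %>% 
  ggplot(aes(x = species, y = bmi, fill = species)) +
  geom_bar(stat = "identity") +
  coord_flip() +  # flips the axes, so species is drawn on the vertical axis
  labs(x = "Species", y = "Mean BMI") +
  ggtitle("Relationship between Species and BMI")
# ggsave("myfirstplot.png", p1)
p1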

  6. Create a geom_text() plot with ggplot2. Do not use dots or lines. Put height on the x-axis and mass on the y-axis. Color-code by gender and show the character names as text (label). Add axis names and a main title.
#install.packages("ggrepel")
library(ggrepel)
df %>% ggplot(aes(height,mass,label=name, color=sex)) +
geom_text_repel() 

Bonus 1. How many Star Wars movies are there? How many movies has Luke Skywalker appeared in? Are there more Droids or more humans in the movies?

Bonus 2. Is there a correlation between blond hair and the number of movies a character appears in? Find out and visualize it.

PS 3 (Nov 17, 2023)

In this PS, we will work with some real data, yay! We’ll analyze the data from a demo experiment I carried out with Mehtap Güven Çoban and Nazik Dinçtopal Deniz last summer. Here are the details you need to know about the experiment:

This experiment tests the predictions of good-enough parsing (Ferreira et al., 2007) for individuals with ASD. There will also be typically developing controls. This approach to language processing highlights the tendency to rely on real-world knowledge and semantic cues (e.g., thematic roles such as agent and patient) rather than syntactic detail, and to form superficial (potentially inaccurate) representations.

Materials: For this experiment, we translated the experimental task materials of Ferreira’s (2003) study into Turkish and adapted them. The experimental sentences manipulate syntax (i.e., sentence structure), with active voice in (a, c) and passive voice in (b, d), and semantic plausibility, with plausible sentences in (a, b) and implausible ones in (c, d).

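a. Active, plausible: Köpekler adamı ısırdı. (“The dogs bit the man.”)

b. Passive, plausible: Adamlar köpek tarafından ısırıldı. (“The men were bitten by the dog.”)

c. Active, implausible: Adamlar köpeği ısırdı. (“The men bit the dog.”)

d. Passive, implausible: Köpekler adam tarafından ısırıldı. (“The dogs were bitten by the man.”)

e. Question: Kim ısırdı? (“Who did the biting?”)

e’. Question: Kim ısırıldı? (“Who was bitten?”)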

Each condition is followed by a question asking for either the agent of the predicate, as in (e), or the patient, as in (e’). There are 24 sentences in each of the four experimental and two question conditions above, which brings the total number of conditions to eight. The experimental sentences are distributed across eight lists, controlling for syntax (active, passive), semantic plausibility (plausible, implausible), and question type (agent, patient). This way, each participant sees only one version of each item. In addition to the 24 experimental item sets, there are two other types of sentence sets. The experimental set (Set 1) includes sentences as in (a-d), which are highly biased towards one interpretation (e.g., the dog will do the biting). The other sets serve as controls for attention to semantic plausibility. Set 2 involves sentences in which the patient is inanimate, so reversing the arguments results in a semantic anomaly (cf. the chef wore the apron versus the apron wore the chef). Set 3 includes symmetrical sentences in which the two arguments are equally likely to be the agent (cf. the boy kissed the girl versus the girl kissed the boy).

Here are more items. You can see the whole list of items here.

group  item  sentence                               word_order   set     structure  plausible  X
1      1     Köpekler adamı ısırdı.                 nonreversed  biased  active     yes        NA
2      2     Adamlar köpeği ısırdı.                 reversed     biased  active     no         NA
3      3     Adamlar köpek tarafından ısırıldı.     nonreversed  biased  passive    yes        NA
4      4     Adamlar köpek tarafından ısırıldı.     reversed     biased  passive    no         NA
1      1     Aşçılar yemeği mahvetti.               nonreversed  biased  active     yes        NA
2      2     Yemekler aşçıyı mahvetti.              reversed     biased  active     no         NA
3      3     Yemekler aşçı tarafından mahvedildi.   nonreversed  biased  passive    yes        NA
4      4     Aşçılar yemek tarafından mahvedildi.   reversed     biased  passive    no         NA
1      1     Kuşlar solucanı yedi.                  nonreversed  biased  active     yes        NA
2      2     Solucanlar kuşu yedi.                  reversed     biased  active     no         NA
3      3     Solucanlar kuş tarafından yendi.       nonreversed  biased  passive    yes        NA
4      4     Kuşlar solucan tarafından yendi.       reversed     biased  passive    no         NA
1      1     Kediler fareyi kovaladı.               nonreversed  biased  active     yes        NA
2      2     Fareler kediyi kovaladı.               reversed     biased  active     no         NA
3      3     Fareler kedi tarafından kovalandı.     nonreversed  biased  passive    yes        NA
4      4     Kediler fare tarafından kovaladı.      reversed     biased  passive    no         NA
1      1     Askerler tutsağı yakaladı.             nonreversed  biased  active     yes        NA
2      2     Tutsaklar askeri yakaladı.             reversed     biased  active     no         NA
3      3     Tutsaklar asker tarafından yakalandı.  nonreversed  biased  passive    yes        NA
4      4     Askerler tutsak tarafından yakalandı.  reversed     biased  passive    no         NA

We ran a pilot study with Turkish speakers only. In total, 16 native Turkish speakers completed the pilot experiment. We expected accuracy to be higher when the arguments are not reversed and to drop when they are reversed: when syntax and semantics clash, as in (c) and (d), participants will do good-enough processing and make errors driven by their real-world knowledge. Beyond implausibility alone, noncanonical sentence structure combined with implausibility may decrease accuracy further.

We look at accuracy and reaction time.

Here is what our results look like after some preprocessing:

Item  Group  EventTime     WordOrder    Set           Structure  Correct    Plausible
69    1      1.655236e+12  nonreversed  symmetrical   active     incorrect  yes
33    1      1.655236e+12  nonreversed  irreversible  active     correct    yes
30    1      1.655236e+12  reversed     irreversible  active     correct    no
57    1      1.655236e+12  nonreversed  symmetrical   active     incorrect  yes
39    1      1.655236e+12  nonreversed  irreversible  passive    incorrect  yes
19    1      1.655236e+12  nonreversed  biased        passive    correct    yes

Our task now is to analyze the data. Create a summary table with accuracies and interpret the results. Also visualize the data. You are free to choose how to do all of this.

You can access the csv results by clicking here.
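
One possible starting point for the summary table, sketched with dplyr. It is an illustration only: the data frame results_preview contains just the six preview rows shown above (the real analysis should use the full csv), and grouping by Set and Structure is one choice among several.

```r
library(dplyr)

# Only the six preview rows from the table above, typed in by hand
results_preview <- data.frame(
  Item      = c(69, 33, 30, 57, 39, 19),
  WordOrder = c("nonreversed", "nonreversed", "reversed",
                "nonreversed", "nonreversed", "nonreversed"),
  Set       = c("symmetrical", "irreversible", "irreversible",
                "symmetrical", "irreversible", "biased"),
  Structure = c("active", "active", "active", "active", "passive", "passive"),
  Correct   = c("incorrect", "correct", "correct",
                "incorrect", "incorrect", "correct"),
  Plausible = c("yes", "yes", "no", "yes", "yes", "yes")
)

# Accuracy = proportion of "correct" responses in each cell
accuracy_table <- results_preview %>%
  group_by(Set, Structure) %>%
  summarize(
    accuracy = mean(Correct == "correct"),  # TRUE counts as 1, FALSE as 0
    n = n(),                                # number of trials per cell
    .groups = "drop"
  )

accuracy_table
```

With the full data, the same pattern can be grouped by WordOrder or Plausible instead, and the resulting table can be passed to ggplot2 for a bar plot.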

PS 4 (Nov 27, 2023)

In this PS, we will fit a linear model to the lexdec dataset described in Assignment 1.

Complete the following steps for today’s task:

  • Make a plot with RT plotted against Length and use facet_wrap for participants’ native languages.

  • Fit a model with RT as the response variable and Length as the predictor. What is the interpretation of the effect of Length on RT?

  • Fit another linear regression model with an additional predictor (Native Language).

  • Report the R² values of both models. Which model fits the data better?

  • Visualize the second model. You can use the plot_summs() function from the jtools package to do this. Also try out the tab_model() function from sjPlot to get a table.
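
The fitting and comparison steps follow a general pattern that can be sketched on the built-in mtcars data. This is a stand-in chosen only because it ships with R; the variables mpg, wt, and am have nothing to do with lexdec.

```r
# mtcars ships with R; it is only a stand-in here, not the lexdec data
m1 <- lm(mpg ~ wt, data = mtcars)                # one continuous predictor
m2 <- lm(mpg ~ wt + factor(am), data = mtcars)   # add a categorical predictor

coef(m1)["wt"]              # slope: expected change in mpg per one-unit increase in wt
summary(m1)$r.squared       # proportion of variance explained by model 1
summary(m2)$r.squared       # never lower than model 1, because m2 contains m1
summary(m2)$adj.r.squared   # penalizes extra predictors, so it is fairer for comparison
```

Because plain R² can only go up when a predictor is added, the adjusted R² is usually the more informative number when comparing the two models.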

PS 5 (Dec 08, 2023)

In this PS, we will work on a new dataset from our earlier project:

Keleş, O., Atmaca, F., & Gökgöz, K. (September, 2022). Economy principle in signed narratives: Is it age-sensitive? (Poster presentation). The 14th Theoretical Issues in Sign Language Research (TISLR): Osaka, Japan.

To learn about our study, you can visit this page: https://github.com/kelesonur/tislr14/blob/main/TISLR_14_poster.pdf

Complete the following steps for today’s task:

  • Import the data from the following link: http://kelesonur.github.io/ling411/TISLR_Data.csv (Tip: you can use the read.csv() function to do this and set the sep parameter to ; to read the data properly).

  • Use group_by() and summarize() to get a summary table for mean Accessibility Score (AS) for each age group (Native and Late) and Discourse Status (Introduction, Maintenance and Reintroduction).

  • Visualize this summary table using ggplot2.

  • Encode the variable types (e.g., convert the categorical predictors to factors).

  • Use sum contrasts for the predictors Nativeness and Discourse Status.

  • Fit a linear model with AS as the response variable and Nativeness and Discourse Status as predictors. What is the interpretation of the effect of Nativeness and Discourse Status on AS?
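
Setting sum contrasts follows a fixed pattern, sketched here on toy factors. The level names follow the task description, but the vectors themselves and the variable names nativeness and status are made up; the real data should be read from the csv with read.csv() and sep = ";".

```r
# Toy factors for illustration; not the TISLR data
nativeness <- factor(c("Native", "Late", "Native", "Late"))
status <- factor(c("Introduction", "Maintenance", "Reintroduction", "Maintenance"))

levels(nativeness)                      # levels are ordered alphabetically: "Late" "Native"

contrasts(nativeness) <- contr.sum(2)   # two levels give one contrast column
contrasts(status) <- contr.sum(3)       # three levels give two contrast columns

contrasts(nativeness)                   # Late is coded 1, Native is coded -1
contrasts(status)                       # the last level (Reintroduction) is coded -1 in both columns
```

With sum contrasts, each coefficient in lm() compares a level to the average across levels instead of to a reference level, which changes how the intercept and the effects are interpreted.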

R Assignment 1

Please consider this assignment a type of translation task. Translate each of the statements below from English to R. Each instruction should correspond to 1-3 lines of code (usually one line). Please note that there may be multiple, equally valid solutions to each instruction.

Create a new .r file and write your answers there. After you finish the assignment, please click on “Session”->“Restart R” in RStudio and run the code again, to make sure it executes properly in exactly the order in which you have written it. (The most common errors are loading packages later than they are needed and leaving install.packages() calls in the R code.)

Q1: Using the c() function, the : operator, and the seq() function, create a vector of numbers [11, 12, 13, 14, 15, 16, 17, 18, 19, 20] in three different ways and assign it to a variable (1 point).

Q2: Create a vector containing 50 random numbers with a normal (Gaussian) distribution, mean 20 and standard deviation 2. You can do this with the rnorm() function. Then assign the numbers to a variable and use that variable as an argument to the sample() function to randomly select 10 samples from that vector. Run ?rnorm() and ?sample() to see how the functions work and what arguments they take. (2 points).

Q3: Download and load the “LearnBayes” package and take a look at the first few rows of the data set called “studentdata”. (3 points).

Answer the following questions:

3.1. Remove rows that include NA observations.

3.2. Get the number of female students.

3.3. Get the number of students who are taller than 180 cm. (Tip: the height is given in inches. First convert it to cm by multiplying the observations by 2.54.)

3.4. Plot the relationship between height and sex in a line graph.

Q4: Download and load the “languageR” package and take a look at the first few rows of the data set called “lexdec”. (4 points).

Here is the definition of the columns:

Answer the following questions:

4.1. Use the function help() to look up the documentation.

4.2. How many unique participants are there?

4.3. What is the mean, min, and max reaction time?

4.4. Load the package dplyr, and compute the average value of the column RT (reaction time) and percentage of correct answers (from “Correct”), by participants’ native language (look up the actual name of the columns in the dataset documentation).

4.5. Load the package ggplot2 and create a line graph with geom_smooth(), with RT on the y-axis and frequency on the x-axis, and color-code participants’ native languages to visualize the relationship between reaction time, frequency, and native language.

GOOD LUCK! c:

R Assignment 2

Please consider this assignment a type of translation task. Translate each of the statements below from English to R. Each instruction should correspond to 1-3 lines of code (usually one line). Please note that there may be multiple, equally valid solutions to each instruction.

Create a new .r file and write your answers there. After you finish the assignment, please click on “Session”->“Restart R” in your R Studio and run the code again, to make sure it executes properly in exactly the order in which you have written it.

Part 1 (5 points)

In this part, you will work with the following dataset:

country      continent  year  lifeExp  pop       gdpPercap
Afghanistan  Asia       1952  28.801   8425333   779.4453
Afghanistan  Asia       1957  30.332   9240934   820.8530
Afghanistan  Asia       1962  31.997   10267083  853.1007
Afghanistan  Asia       1967  34.020   11537966  836.1971
Afghanistan  Asia       1972  36.088   13079460  739.9811
Afghanistan  Asia       1977  38.438   14880372  786.1134
  • Import the data frame with the following code:
#install.packages("gapminder")
library(gapminder)
df <- gapminder
  • Scale the continuous variables.

  • Create a geom_text() plot with ggplot2 using only the last 40 years of data. Remove older observations for this plot. Do not use dots or lines. Put life expectancy on the y-axis and GDP on the x-axis. Color-code by continent, and also insert country names as text (label). Add axis names and a main title. Optionally, you can use facet_wrap for years. Use a different color palette than the default one.

  • Show the population increase in Turkey between 1967 and 2007 by creating a line plot with ggplot2. What about the population increase in Europe overall between the same years? Plot it for Europe. Optionally, you can merge both plots. Hint: to merge two plots, install and load the gridExtra package and use the function grid.arrange(). You can look up the documentation of the function with ?grid.arrange().

  • Create a linear regression model with life expectancy as the dependent variable and gdp as the predictor. What is the interpretation of the effect of gdp on life expectancy? Write your interpretation as a comment.

  • Visualize the regression model with geom_smooth().
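
Scaling (z-scoring) a variable means subtracting its mean and dividing by its standard deviation. A minimal sketch on a toy vector (x and z are made-up names, not gapminder columns):

```r
x <- c(1, 2, 3, 4, 5)        # toy values for illustration

z <- as.numeric(scale(x))    # scale() returns a one-column matrix; as.numeric() flattens it
round(z, 2)
## [1] -1.26 -0.63  0.00  0.63  1.26

z_manual <- (x - mean(x)) / sd(x)   # the same computation written out by hand
all.equal(z, z_manual)
## [1] TRUE
```

After scaling, the variable has mean 0 and standard deviation 1, so regression coefficients are expressed in standard deviation units.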

Part 2 (5 points)

In this part, you will analyze the results of the demo Negative Polarity Item (NPI) illusion experiment that I conducted with Kübra Yıldırım.

In linguistics, a Negative Polarity Item (hereafter ‘NPI’) refers to a lexical item that requires another item to be licensed within a construction. Cross-linguistically, what licenses NPIs is usually either sentential negation or a negative quantifier (Görgülü, 2017:51). Apart from that, an NPI can also be licensed by questions, comparatives, or conditionals. In this regard, it is evident that there are restrictions as to where an NPI can occur; in other words, NPIs need a licensing environment in which another element licenses them (Kayabaşı & Özgen, 2018:83). The examples in (a-d) below show the NPIs (any, anyone, anything, and ever) in different licensing environments in English:

a. John did not have any friends.

b. If you see anyone there…

c. Did you see anything?

d. She is taller than he ever imagined.

In an experiment, Kübra and I used a self-paced acceptability judgment test for Turkish NPIs (hiçkimse). Native Turkish speakers were given sentences and asked to rate their naturalness from 1 (sounds very bad) to 5 (sounds perfectly okay). The experimental sentences contained an NPI in the embedded clause, and the NPI was either licensed by negation on the embedded verb (licensed), licensed by negation on the matrix verb (long-distance licensing), or left without any negation (unlicensed). The type of the nominalizer also differed (‘-DIK’ and ‘-mA’). Ratings were used as the dependent variable, whereas license type and nominalizer type were the categorical predictors.

The NPI could be (i) grammatically licensed, (ii) long-distance licensed, or (iii) unlicensed. In addition, the nominalizer was either a ‘-DIK’ type or a ‘-mA’ type. You can actually view and do the demo experiment here: https://farm.pcibex.net/r/aepqUx/.

Conditions:

Your Task:

  • Load the packages dplyr, magrittr, ggplot2, and janitor (install packages that are missing, but don’t include the installation code into your R script).

  • Load the NPI illusion dataset with the following code (read_csv() comes from the readr package, so load readr as well):

    df <- read_csv("http://kelesonur.github.io/ling411/results_npi.csv")
  • Only keep the relevant columns (License Type, Nominalizer, Value, Item, Group, Participant ID).

  • Optionally, you can use the function make_clean_names() from the package ‘janitor’ to bring the column names into a more R-friendly format (e.g., underscores instead of spaces in column names). Use colnames() and the assignment pipe in doing so. (To find out how make_clean_names() works, you can use the help() function.)

  • Group by License Type and Nominalizer, and compute the mean of the Value column.

  • Check the data type of the Value column. If it is not numeric, convert it to numeric.

  • Below you see a plot. Write the code to create a plot similar to the one you see below. Optionally, you can change axis names or colors of the bars. (Hint: You can use the scale_fill_manual() function to change the colors of the bars.)

  • Scale (z-score) the values of the dependent variable (Value), and use sum contrasts for the two categorical predictors.

  • Assume that the acceptability judgment values are continuous. Run a linear regression model with Nominalizer as the only predictor. Then run another linear regression model, this time with two predictors: Nominalizer and Licensing Type. Which model is a better fit to the data? Why?

  • Use glance() or summary() to view the model results. Optionally, you can use sjPlot’s tab_model() function to get a table for the model results.