# Addition
2 + 2
## [1] 4
# Subtraction
3 - 2
## [1] 1
# Multiplication
3 * 2
## [1] 6
# Division
3 / 2
## [1] 1.5
# Exponentiation
2 ^ 5
## [1] 32
# Order of operations
2 + 3 * 4
## [1] 14
(2 + 3) * 4
## [1] 20
In R, instead of using mathematical operators like this, we will primarily use “functions” that allow us to perform various tasks. Each function takes specific arguments. Arguments are the inputs to the function, i.e., the objects on which the function operates. Some of these arguments may be required to be explicitly specified. If a function requires multiple arguments, the arguments are separated by commas.
Functions are a way to package up and reuse code.
The function below is called “add_two” and it adds two to any number you give it.
add_two <- function(x) {
x + 2
}
Now we can use the function we just created.
add_two(3)
## [1] 5
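The add_two example takes a single argument. As a minimal sketch of how several arguments, separated by commas, behave, the hypothetical function add_amount below (not part of the original material) takes a second argument with a default value. Arguments can be matched by position or by name.

```r
# Hypothetical helper for illustration:
# adds `amount` to `x`; `amount` defaults to 2 when it is not supplied.
add_amount <- function(x, amount = 2) {
  x + amount
}

add_amount(3)                   # uses the default: 3 + 2
## [1] 5
add_amount(3, 10)               # arguments matched by position
## [1] 13
add_amount(amount = 10, x = 3)  # arguments matched by name, so order does not matter
## [1] 13
```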
Other functions are built into R. For example, the “log” function computes the natural logarithm.
log(10)
## [1] 2.302585
sqrt(4)
## [1] 2
abs(-2)
## [1] 2
You can also use functions inside other functions.
log(sqrt(4))
## [1] 0.6931472
A variable is a name that refers to an object stored in the computer’s memory. We can give it any name and value we want. Once a value is assigned to a variable, we can access that value later through the variable’s name.
In R, we assign variables using the <- operator.
# this code will not produce any output but will assign the value 100 to the variable 'chomsky'
chomsky <- (2*5)^2
# if we want to see the value of the variable, we can just type the name of the variable or print it to the console
chomsky
## [1] 100
print(chomsky)
## [1] 100
# we can use variables in operations
chomsky + 1
## [1] 101
burhan <- sqrt(16)
burhan + chomsky
## [1] 104
burhan * chomsky
## [1] 400
Using the <, >, <=, >=, ==, and != operators, we can compare two values, and with the | (or) and & (and) operators we can combine several comparisons. Each of these operations returns either TRUE, meaning the statement holds, or FALSE, meaning it does not.
chomsky < 105 # smaller than
## [1] TRUE
chomsky > 1 # bigger than
## [1] TRUE
chomsky <= 8 # smaller than or equal to
## [1] FALSE
chomsky >= 8 # bigger than or equal to
## [1] TRUE
chomsky == 8 # equal to
## [1] FALSE
chomsky != 6 # not equal to
## [1] TRUE
chomsky == 4 | chomsky == 8 # equal to either 4 or 8
## [1] FALSE
chomsky > 4 & chomsky > 8 # bigger than both 4 and 8
## [1] TRUE
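When a value is checked against several candidates, each side of | or & needs to be a complete comparison. The sketch below uses a hypothetical variable x to show this, the %in% operator as a shorter alternative, and what happens when a bare number is left on one side.

```r
x <- 8   # hypothetical example value

# Each side of | or & is a full comparison
x == 4 | x == 8
## [1] TRUE
x > 4 & x < 10
## [1] TRUE

# %in% checks membership in a set of candidate values
x %in% c(4, 8)
## [1] TRUE

# Pitfall: a bare non-zero number counts as TRUE,
# so this is TRUE no matter what x is
x == 100 | 4
## [1] TRUE
```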
Note: You can always get help about a specific function or operator by using the help() command.
help(log)
help("+")
In R, values can have different types. The main data types include integer, double (for real numbers), character, and logical. You can use the typeof() function to determine the data type of a variable.
Here’s an example:
var <- as.integer(2)
var2 <- 2.2
var3 <- "hey learning R is cool"
var4 <- TRUE
typeof(var)
## [1] "integer"
typeof(var2)
## [1] "double"
typeof(var3)
## [1] "character"
typeof(var4)
## [1] "logical"
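As a small sketch of how types can be inspected and converted, the lines below use typeof() together with the base R as.*() and is.*() functions. The values are made up for illustration.

```r
typeof(2)      # plain numbers are stored as double by default
## [1] "double"
typeof(2L)     # the L suffix creates an integer
## [1] "integer"

# as.*() functions convert a value to another type
as.numeric("3.5") + 1
## [1] 4.5
as.character(42)
## [1] "42"
as.logical("TRUE")
## [1] TRUE

# is.*() functions test for a type
is.character("hey")
## [1] TRUE
is.numeric(TRUE)
## [1] FALSE
```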
A vector is a collection of values of the same type. We can create a vector using the c() function. The c() function takes any number of arguments and combines them into a vector.
# create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)
print(numbers)
## [1] 1 2 3 4 5
# use length() to get the length of a vector
length(numbers)
## [1] 5
# consecutive numbers can be created using the : operator
5:90
## [1] 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
## [26] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## [51] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
## [76] 80 81 82 83 84 85 86 87 88 89 90
# or use seq() to create a sequence of numbers
seq(5, 90, by = 2)
## [1] 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53
## [26] 55 57 59 61 63 65 67 69 71 73 75 77 79 81 83 85 87 89
# use rep() and seq() to create a vector of repeated numbers
rep(seq(1,10,3),5)
## [1] 1 4 7 10 1 4 7 10 1 4 7 10 1 4 7 10 1 4 7 10
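seq() and rep() accept a few further arguments that are often useful. A short sketch, with made-up values:

```r
seq(0, 1, length.out = 5)    # 5 evenly spaced values between 0 and 1
## [1] 0.00 0.25 0.50 0.75 1.00
seq(10, 1, by = -3)          # a decreasing sequence
## [1] 10  7  4  1
rep(c("a", "b"), times = 3)  # repeat the whole vector 3 times
## [1] "a" "b" "a" "b" "a" "b"
rep(c("a", "b"), each = 3)   # repeat each element 3 times
## [1] "a" "a" "a" "b" "b" "b"
```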
Some functions that you can use with numeric vectors:
# sum() adds up all the numbers in a vector
sum(numbers)
## [1] 15
# mean() computes the mean of all the numbers in a vector
mean(numbers)
## [1] 3
# max() and min() return the maximum and minimum values in a vector
max(numbers)
## [1] 5
min(numbers)
## [1] 1
# sort() sorts the numbers in a vector in ascending order
sort(numbers)
## [1] 1 2 3 4 5
# you can also sort in descending order
sort(numbers, decreasing = TRUE)
## [1] 5 4 3 2 1
# sd() computes the standard deviation of the numbers in a vector
sd(numbers)
## [1] 1.581139
# median() computes the median of the numbers in a vector
median(numbers)
## [1] 3
# you can add two vectors together
numbers + c(1, 2, 3, 4, 5)
## [1] 2 4 6 8 10
# you can multiply two vectors together
numbers * c(1, 2, 3, 4, 5)
## [1] 1 4 9 16 25
# you can access the elements of a vector using the [] operator
new_vector <- 7:21
new_vector[1]
## [1] 7
new_vector[2:7]
## [1] 8 9 10 11 12 13
new_vector[c(1, 3, 5, 7)]
## [1] 7 9 11 13
new_vector[-1]
## [1] 8 9 10 11 12 13 14 15 16 17 18 19 20 21
new_vector[-(1:3)]
## [1] 10 11 12 13 14 15 16 17 18 19 20 21
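Arithmetic on vectors works element by element, and the [] operator can also be used to change values. The sketch below illustrates both with a hypothetical vector v.

```r
v <- c(10, 20, 30, 40)   # hypothetical example vector

v * 2            # a single number is applied to every element
## [1] 20 40 60 80
v + c(1, 2)      # a shorter vector is recycled: 1, 2, 1, 2
## [1] 11 22 31 42

v[2] <- 99       # indexing on the left side of <- replaces an element
v
## [1] 10 99 30 40
v[length(v)]     # the last element
## [1] 40
```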
Logical vectors are vectors that contain TRUE and FALSE values. You can create logical vectors using the c() function.
# create a logical vector
logical_vector <- c(TRUE, FALSE, TRUE, FALSE, TRUE)
# operators like <, >, <=, >=, ==, !=, |, and & can be used to create logical vectors
new_vector <- 1:8
new_vector < 3
## [1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
new_vector == 7
## [1] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
new_vector != 0
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# you can use logical vectors to filter other vectors
new_vector[new_vector < 3] # returns all values in new_vector that are smaller than 3
## [1] 1 2
new_vector[new_vector == 7] # returns all values in new_vector that are equal to 7
## [1] 7
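Conditions can be combined before filtering, and because TRUE counts as 1 and FALSE as 0, logical vectors can also be counted. A minimal sketch with a made-up vector called ages:

```r
ages <- c(12, 25, 37, 41, 58, 63)   # made-up values for illustration

ages[ages > 20 & ages < 60]   # both conditions must hold
## [1] 25 37 41 58
ages[ages < 20 | ages > 60]   # at least one condition must hold
## [1] 12 63

sum(ages > 40)     # number of values above 40
## [1] 3
mean(ages > 40)    # proportion of values above 40
## [1] 0.5
which(ages > 40)   # positions of the values above 40
## [1] 4 5 6
```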
Character vectors are vectors that contain strings. You can create character vectors using the c() function.
# create a character vector
character_vector <- c("hello", "learning", "R", "is", "cool")
print(character_vector)
## [1] "hello" "learning" "R" "is" "cool"
# you can use the nchar() function to get the number of characters in each string
nchar(character_vector)
## [1] 5 8 1 2 4
# you can use the paste() function to concatenate strings
paste("hello", "learning", "R", "is", "cool")
## [1] "hello learning R is cool"
# you can use the strsplit() function to split a string into a vector of substrings
strsplit("hello learning R is cool", " ")
## [[1]]
## [1] "hello" "learning" "R" "is" "cool"
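paste() has sep and collapse arguments, and strsplit() returns a list rather than a plain vector. The sketch below, using a made-up vector called words, shows how these fit together.

```r
words <- c("hello", "learning", "R")   # made-up example vector

paste(words, collapse = " ")    # collapse a vector into one string
## [1] "hello learning R"
paste("file", 1:3, sep = "_")   # sep sets the separator; "file" is recycled
## [1] "file_1" "file_2" "file_3"
paste0("x", 1:3)                # paste0() uses no separator
## [1] "x1" "x2" "x3"

# strsplit() returns a list; [[1]] extracts the character vector inside it
parts <- strsplit("hello learning R", " ")[[1]]
parts[2]
## [1] "learning"
toupper(parts)
## [1] "HELLO"    "LEARNING" "R"
```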
Data frames are used to store tabular data. You can create a data frame using the data.frame() function.
# create a data frame
df <- data.frame(
name = c("Burhan", "Chomsky", "Kant", "Hume", "İrem"),
age = c(55, 95, 67, 89, 24),
height = c(1.78, 1.65, 1.90, 1.45, 1.67)
)
print(df)
## name age height
## 1 Burhan 55 1.78
## 2 Chomsky 95 1.65
## 3 Kant 67 1.90
## 4 Hume 89 1.45
## 5 İrem 24 1.67
# you can use the str() function to get information about the structure of a data frame
str(df)
## 'data.frame': 5 obs. of 3 variables:
## $ name : chr "Burhan" "Chomsky" "Kant" "Hume" ...
## $ age : num 55 95 67 89 24
## $ height: num 1.78 1.65 1.9 1.45 1.67
# you can use the summary() function to get summary statistics about a data frame
summary(df)
## name age height
## Length:5 Min. :24 Min. :1.45
## Class :character 1st Qu.:55 1st Qu.:1.65
## Mode :character Median :67 Median :1.67
## Mean :66 Mean :1.69
## 3rd Qu.:89 3rd Qu.:1.78
## Max. :95 Max. :1.90
# you can use the $ operator to access a column in a data frame
df$name
## [1] "Burhan" "Chomsky" "Kant" "Hume" "İrem"
# you can also use the [] operator to select a column; the result is still a data frame
df["name"]
## name
## 1 Burhan
## 2 Chomsky
## 3 Kant
## 4 Hume
## 5 İrem
# you can use the plot() function to create a scatter plot
plot(df$age, df$height)
# you can use the hist() function to create a histogram
hist(df$age)
# you can use the boxplot() function to create a boxplot
boxplot(df$age)
# you can use the barplot() function to create a barplot
barplot(df$age)
We will learn later how to create more advanced visualizations using the ggplot2 package.
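Besides selecting columns, rows of a data frame can be filtered with a logical condition, and new columns can be added with $. The sketch below uses a small made-up data frame called people (an assumption for illustration, separate from df above) and assumes R 4.0 or later, where strings are not converted to factors.

```r
# Made-up data for illustration
people <- data.frame(
  name = c("Ada", "Ben", "Cem"),
  age = c(34, 71, 52),
  height = c(1.62, 1.80, 1.75)
)

people[people$age > 50, ]          # rows where age is above 50, all columns
##   name age height
## 2  Ben  71   1.80
## 3  Cem  52   1.75
people[people$age > 50, "name"]    # same rows, only the name column
## [1] "Ben" "Cem"

people$height_cm <- people$height * 100   # add a new column
nrow(people)
## [1] 3
ncol(people)
## [1] 4
```

The general pattern is data[rows, columns]; leaving the part after the comma empty keeps all columns.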
Today is Monday. What day of the week will it be 9, 54, 306, and 8999 days from now?
Note: Create a character vector containing the days of the week and repeat this vector 9000 times. Then, use indexing to find the desired day. Hint: Write the days of the week in the character vector starting from Tuesday.
days <- c("Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday", "Monday")
# you complete...
Create a vector containing the numbers 1 to 100. Then, find the sum of the numbers that are divisible by 3 or 5.
Tip: Use the %% operator to find the remainder of a division.
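As a quick sketch of how %% behaves, here it is on a different, smaller problem (picking the even numbers from 1 to 10). The vector name small is made up for illustration.

```r
10 %% 3          # 10 = 3 * 3 + 1, so the remainder is 1
## [1] 1
12 %% 4          # a remainder of 0 means 12 is divisible by 4
## [1] 0

small <- 1:10    # hypothetical example vector
small %% 2 == 0  # TRUE where the number is even
## [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
small[small %% 2 == 0]
## [1]  2  4  6  8 10
```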
# answer:
numbers <- 1:100
# you complete...
You are taking measurements every 5 days throughout the year. Create a number sequence that shows on which days you take measurements and assign it to a variable named “measurement_days”. The result should look like this: 5, 10, 15, 20, …, 365.
# answer:
# you complete...
Here are some exercise questions we covered in this PS. Answers to the bonus questions will be shared in a few days:
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.7
## ✔ tidyr 1.1.3 ✔ stringr 1.4.0
## ✔ readr 1.4.0 ✔ forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.2
## Warning: package 'tibble' was built under R version 4.1.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
df <- starwars
df <- na.omit(df)
#str(df)
#summary(df)
#View(df)
df <- df %>% mutate(height = height / 100)
df <- df %>% mutate(BMI = mass / height^2)
head(df)
## # A tibble: 6 × 15
## name height mass hair_color skin_color eye_color birth_year sex gender
## <chr> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 Luke Sky… 1.72 77 blond fair blue 19 male mascu…
## 2 Darth Va… 2.02 136 none white yellow 41.9 male mascu…
## 3 Leia Org… 1.5 49 brown light brown 19 fema… femin…
## 4 Owen Lars 1.78 120 brown, gr… light blue 52 male mascu…
## 5 Beru Whi… 1.65 75 brown light blue 47 fema… femin…
## 6 Biggs Da… 1.83 84 black light brown 24 male mascu…
## # … with 6 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>, BMI <dbl>
humans <- df %>% filter(species == "Human")
mean(humans$height)
## [1] 1.78
summary1 <- df %>%
group_by(species) %>%
summarize(bmi = mean(BMI))
head(summary1)
## # A tibble: 6 × 2
## species bmi
## <chr> <dbl>
## 1 Cerean 20.9
## 2 Ewok 25.8
## 3 Gungan 17.2
## 4 Human 25.3
## 5 Kel Dor 22.6
## 6 Mirialan 18.8
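The group_by() and summarize() steps above compute one mean per group. For comparison, here is a rough base R analogue using tapply() and aggregate(). This is a swapped-in technique, not what the script above uses, and the toy data frame is made up so that the example runs without any packages.

```r
# Made-up data for illustration (not the starwars data)
toy <- data.frame(
  species = c("A", "A", "B", "B", "B"),
  BMI = c(20, 24, 30, 33, 27)
)

# tapply() applies mean() to BMI within each species
tapply(toy$BMI, toy$species, mean)
##  A  B 
## 22 30 

# aggregate() returns the same result as a data frame
aggregate(BMI ~ species, data = toy, FUN = mean)
##   species BMI
## 1       A  22
## 2       B  30
```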
p1 <- summary1 %>%
ggplot(aes(x=species, y=bmi, fill=species)) +
geom_bar(stat="identity") +
coord_flip() +
ggtitle("Relationship bw Species and BMI")
#ggsave("myfirstplot.png",p1)
p1
#install.packages("ggrepel")
library(ggrepel)
df %>% ggplot(aes(height,mass,label=name, color=sex)) +
geom_text_repel()
## Warning: ggrepel: 18 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps
Bonus 1. How many Star Wars movies are there? How many movies has Luke Skywalker appeared in? Are there more Droids or more Humans in the movies?
Bonus 2. Is there a correlation between having blond hair and the number of movies a character appears in? Find out and visualize it.
Please consider this assignment a type of translation task. Translate each of the statements below from English to R. Each instruction should correspond to 1-3 lines of code (usually one line). Please note that there may be multiple, equally valid solutions to each instruction.
Create a new .R file and copy and paste the questions into it. Write your answers under the questions. After you finish the assignment, please click on “Session” -> “Restart R” in RStudio and run the code again to make sure it executes properly in exactly the order in which you have written it. (The most common errors are loading packages later than they are needed and leaving install.packages() calls in the code.)
Q1: Using the c() function, the “:” operator, and the seq() function, create a vector of numbers [11, 12, 13, 14, 15, 16, 17, 18, 19, 20] in three different ways and assign it to a variable (1 point).
Q2: Create a vector containing 50 random numbers with a normal (Gaussian) distribution, mean 20 and standard deviation 2. You can do this with the rnorm() function. Then assign the numbers to a variable and use that variable as an argument to the sample() function to randomly select 10 samples from that vector. Run ?rnorm and ?sample to see how the functions work and what arguments they take (2 points).
Q3: Download and load the “LearnBayes” package and take a look at the first few columns of the data set called “studentdata”. Answer the following questions (3 points):
3.1. Remove rows that include NA observations.
3.2. Get the number of female students.
3.3. Get the number of students who are taller than 180 cm (tip: the height is given in inches; first convert it to cm by multiplying the observations by 2.54).
3.4. Plot the relationship between height and sex in a line graph.
Q4: Download and load the “languageR” package and take a look at the first few columns of the data set called “lexdec”. Here is the definition of the columns:
Answer the following questions (4 points):
4.1. Use the function help() to look up the documentation
4.2. How many unique participants are there?
4.3. What is the mean, min, and max reaction time?
4.4. Load the package dplyr, and compute the average value of the column RT (reaction time) and percentage of correct answers (from “Correct”), by participants’ native language (look up the actual name of the columns in the dataset documentation).
4.5. Load the package ggplot2 and create a line graph with “geom_smooth” with RT in the y-axis, frequency in the x-axis and color-code by participants’ native language to visualize the relationship between reaction time and frequency.
GOOD LUCK! c: