This is a long tutorial, but the material is fairly straightforward. If you run into any trouble, feel free to post on Piazza.

The most crucial piece of advice for learning a programming language is to recognize that it requires the same approach as learning a foreign language: you'll benefit most from being actively engaged. That means not just reading along with these tutorials, but actively thinking through what they say and running the code yourself.

Part 1: Installing R

Carry out the following two steps in order:

1. Go to http://cran.r-project.org/ and install the version of R for your operating system.
2. Install RStudio Desktop, the free program we'll use to write and run R code.

To make sure this worked, open RStudio and go to File > New > R Script. This will open a blank text document. In the document, type the two lines of code below. Then click and drag to highlight both lines, and click the button marked “Run.” If everything is working correctly, the console should display TRUE.

x = 5
x == 5

Congratulations: you’ve just written your first R script! To save it, go to File > Save As, and choose a name. NOTE: Always save your scripts as .R files so they’ll open in RStudio by default.

You can also run one line of your script at a time. Move your cursor to that line and press COMMAND-RETURN on a Mac, or CONTROL-ENTER on Windows and Linux. Another helpful shortcut is CONTROL-A (COMMAND-A on Mac), which highlights all of the lines of code in the text editor.

Part 2: The Absolute Basics

Here are some of the most fundamental things you can do with R.

Arithmetic

#add numbers
1 + 1
## [1] 2
#subtract them
8 - 4
## [1] 4
#divide
13/2
## [1] 6.5
#multiply
4*pi
## [1] 12.56637
#exponentiate
2^10
## [1] 1024
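R follows the usual order of operations: exponents first, then multiplication and division, then addition and subtraction. The short sketch below shows where parentheses change the answer. The numbers are arbitrary.

```r
#multiplication happens before addition
2 + 3 * 4
## [1] 14
#parentheses override the default order
(2 + 3) * 4
## [1] 20
#^ binds more tightly than a leading minus sign,
#  so this is -(2^2), not (-2)^2
-2^2
## [1] -4
(-2)^2
## [1] 4
```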

Logical Comparison

3 < 4
## [1] TRUE
3 > 4
## [1] FALSE
#contrast with 3 = 4; see section about variables below
3 == 4
## [1] FALSE
#!= means "not equal to"
3 != 4
## [1] TRUE
4 >= 5
## [1] FALSE
4 <= 5
## [1] TRUE
2 + 2 == 5
## [1] FALSE
10 - 6 == 4
## [1] TRUE
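One caution about ==. Numbers with decimals are stored with limited precision, so two calculations that agree on paper can differ in the last few digits. The function all.equal compares numbers up to a small tolerance, and wrapping it in isTRUE gives a plain TRUE or FALSE. Here is a minimal illustration:

```r
#on paper these are equal, but not in the computer's arithmetic
0.1 + 0.2 == 0.3
## [1] FALSE
#compare up to a tiny tolerance instead
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE
```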

Strings (text)

Numbers are bread and butter for computers, but text is what will facilitate understanding for us mere mortals.

'Econometrics is awesome'
## [1] "Econometrics is awesome"
#R delimits strings with EITHER double or single quotes.
#  There is only a very minimal difference
"Econometrics is still awesome"
## [1] "Econometrics is still awesome"
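Both kinds of quotes produce exactly the same string. The practical difference is which quote character you can type inside the text without extra effort. A string in double quotes can contain an apostrophe, and a string in single quotes can contain double quotes. The example sentences below are arbitrary.

```r
#the same string, whichever delimiter we use
'awesome' == "awesome"
## [1] TRUE
#double quotes make apostrophes easy
"Don't panic"
## [1] "Don't panic"
#single quotes make double quotes easy; R escapes them with \ when printing
'She said "hi"'
## [1] "She said \"hi\""
```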

Variables

Just like in algebra, variables are a great form of shorthand. Instead of writing 3.1415926… all the time, we can just write pi.

Assignment to a variable happens from right to left – the value on the right side gets assigned to the name on the left side. You can use nearly anything as a variable name in R. The only rules are:

1. . and _ are OK, but no other symbols.
2. Your variable name must not start with a number or _ (2squared and _one are illegal).

[A note for those of you who have programming experience: while R supports object-oriented programming, periods . do not have a special meaning in the language. For historical reasons, R programmers often use periods in place of underscores in variable names, but either works. Just be consistent to keep your code readable.]
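If you're unsure whether a name is legal, base R's make.names function converts any string into a syntactically valid name. A name is therefore legal exactly when make.names leaves it unchanged. make.names also applies a few rules not listed above; for example, reserved words such as if are not legal names. The candidate names below are just for illustration.

```r
candidates = c("foo.bar", "foo_bar", "x2", "2squared", "_one")
#make.names returns a valid version of each string
make.names(candidates)
## [1] "foo.bar"   "foo_bar"   "x2"        "X2squared" "X_one"
#TRUE where the original string was already a legal name
make.names(candidates) == candidates
## [1]  TRUE  TRUE  TRUE FALSE FALSE
```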

x = 42
x / 2
## [1] 21
#if we assign something else to x,
#  the old value is deleted
x = "Melody to Funkytown!"
x
## [1] "Melody to Funkytown!"
x = 5
x == 5
## [1] TRUE
foo = 3
bar = 5
foo.bar = foo + bar
foo.bar
## [1] 8
foo.bar2 = 2 * foo.bar
foo.bar2
## [1] 16
foo_bar = foo - bar
foo_bar
## [1] -2

Note: In programmer speak, = here is an “assignment operator” – it’s the thing used to assign values to a variable name. R also has a second assignment operator that you’re bound to see sooner or later, <-. So x <- 42 and x = 42 are identical, and both accomplish the task of assigning the value of 42 to the name x. We’ll try to stick with using = since it’s easier to type and in some ways more intuitive. See this wonderful post for some more history and a very subtle difference between the two operators that you needn’t concern yourself with for now.
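Here is a quick way to convince yourself that the two operators do the same job in an ordinary line of a script. The names a and b are arbitrary.

```r
a = 42
b <- 42
#identical checks whether two objects are exactly the same
identical(a, b)
## [1] TRUE
#assignment flows right to left: the right side is computed
#  with the old value of a, then stored back under the name a
a = a + 1
a
## [1] 43
```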

Vectors & Types

In R, a vector is just an ordered collection of related things. You should basically think of it like a column in Excel.

x = c(4, 7, 9)
x
## [1] 4 7 9
y = c('a', 'b', 'c')
y
## [1] "a" "b" "c"

4, 7, and 9 are “related” because they’re all numbers; a, b and c are all letters. Variables are already proving convenient: instead of having to write c(4, 7, 9) all the time, we can just write x.

What happens when we try to combine things that aren’t so obviously related?

x = c(1, TRUE, "three")
x
## [1] "1"     "TRUE"  "three"

Note the quotation marks. R has converted 1 and TRUE into text. That’s because 1 and TRUE are of a different type from "three". There are four basic types of variables you’re likely to encounter in this class, listed here in hierarchical order:

1. logical: TRUE or FALSE
2. integer: 0L, -1L, 1L, etc. A whole number, with no decimal part; the L suffix tells R to store the number as an integer. Technical note: integers take up less space in the computer than numbers with decimals.
3. numeric: pi, 0.34, 1.4043, etc. A real number.
4. character: "some words", "more words", etc.

When a vector mixes types, R converts every element to the type with the highest number on this list. Since x above contains "three", the whole vector becomes character.
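The class function reports which type R has settled on, so we can watch this conversion one step at a time. Each line below combines two types from the list.

```r
class(c(TRUE, FALSE))
## [1] "logical"
#logical + integer becomes integer (TRUE is stored as 1)
class(c(TRUE, 2L))
## [1] "integer"
c(TRUE, 2L)
## [1] 1 2
#integer + numeric becomes numeric
class(c(2L, 0.5))
## [1] "numeric"
#numeric + character becomes character
class(c(0.5, "a"))
## [1] "character"
```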

Vector Arithmetic and Functions

Vectors make it easy to do many computations all at once, such as adding one to every number in a vector or dividing all of them by 3. And as long as two vectors are the same length, we can combine them element by element in natural ways:

x = c(1, 2, 3)
x + 4
## [1] 5 6 7
x/3
## [1] 0.3333333 0.6666667 1.0000000
-x
## [1] -1 -2 -3
x^3
## [1]  1  8 27
y = c(3, 2, 1)
x - y
## [1] -2  0  2
x * y
## [1] 3 4 3
x/y
## [1] 0.3333333 1.0000000 3.0000000
x > 2
## [1] FALSE FALSE  TRUE
x >= 2
## [1] FALSE  TRUE  TRUE
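As a small worked example, converting a whole vector of temperatures from Fahrenheit to Celsius takes one line. The subtraction, multiplication and division are each applied element by element. The temperatures are made up.

```r
fahrenheit = c(32, 50, 212)
#each step applies to every element: subtract 32, then scale by 5/9
celsius = (fahrenheit - 32) * 5 / 9
celsius
## [1]   0  10 100
```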

Just like in math, a function is a way of mapping input to output. And just like in most math classes, you can spot functions because they use parentheses: (). We’ve already seen one example: the concatenate function c, which we used to create vectors.

R also comes with many built-in functions that work on a whole vector at once. Just a small taste:

x = c(1, 2, 3)
#sum: add up the elements of a vector
sum(x)
## [1] 6
#Just like you can use the command sum to add up the
#  elements of a numeric vector, you can use
#  prod to take their product:
prod(x)
## [1] 6
sqrt(x)
## [1] 1.000000 1.414214 1.732051
y = c(-1, 2, 4)
#abs: absolute value
abs(y)
## [1] 1 2 4
#exp: exponential. exp(x) is e^x
exp(y)
## [1]  0.3678794  7.3890561 54.5981500
#log: _natural_ logarithm (base e)
log(x)
## [1] 0.0000000 0.6931472 1.0986123
#Note that these functions interpret their input
#  as *radians* rather than degrees.
sin(x) + cos(y)
## [1]  1.3817733  0.4931506 -0.5125236
max(y)
## [1] 4
min(y)
## [1] -1
range(y)
## [1] -1  4
mean(x)
## [1] 2
median(x)
## [1] 2
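Because sin, cos and the other trigonometric functions expect radians, you need to multiply an angle in degrees by pi / 180 first. The sketch below uses a few familiar angles; the names degrees and radians are just for illustration. The round step is there because floating-point arithmetic leaves tiny errors, such as a value near 1e-16 where you would expect 0.

```r
degrees = c(0, 90, 180)
#convert degrees to radians before calling sin or cos
radians = degrees * pi / 180
#round to 10 decimal places to hide tiny floating-point errors
round(sin(radians), 10)
## [1] 0 1 0
round(cos(radians), 10)
## [1]  1  0 -1
```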

Another thing we will do all the time is generate regularly spaced sequences of numbers. These are created in R with : or seq:

x = 1:10
x
##  [1]  1  2  3  4  5  6  7  8  9 10
y = 10:1
y
##  [1] 10  9  8  7  6  5  4  3  2  1
#sometimes the gap is not 1
z = seq(0, 1, by = .02)
z
##  [1] 0.00 0.02 0.04 0.06 0.08 0.10 0.12 0.14 0.16 0.18 0.20 0.22 0.24 0.26
## [15] 0.28 0.30 0.32 0.34 0.36 0.38 0.40 0.42 0.44 0.46 0.48 0.50 0.52 0.54
## [29] 0.56 0.58 0.60 0.62 0.64 0.66 0.68 0.70 0.72 0.74 0.76 0.78 0.80 0.82
## [43] 0.84 0.86 0.88 0.90 0.92 0.94 0.96 0.98 1.00
#other times we care less about the gap and
#  more about how many points we get out
w = seq(0, 1, length.out = 20)
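With length.out, seq works out the gap for you: n evenly spaced points from a to b are (b - a) / (n - 1) apart. Here is a smaller version of the example above, with 5 points instead of 20 so the output is easy to check:

```r
seq(0, 1, length.out = 5)
## [1] 0.00 0.25 0.50 0.75 1.00
#the gap is (1 - 0) / (5 - 1) = 0.25, so asking for it directly gives the same result
seq(0, 1, by = 0.25)
## [1] 0.00 0.25 0.50 0.75 1.00
```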

In addition to math/arithmetic functions, there is a litany of basic programming functions that you’re likely to use all of the time:

x = 99:32
#length: how many elements (items) are there in x?
length(x)
## [1] 68
y = c("hey you!", "out there in the cold")
#what TYPE of variable does R think this is?
class(y)
## [1] "character"
#rep: repeat/reproduce
rep(y, 4)
## [1] "hey you!"              "out there in the cold" "hey you!"
## [4] "out there in the cold" "hey you!"              "out there in the cold"
## [7] "hey you!"              "out there in the cold"
#head/tail: display only the beginning/end
#  of an object -- very useful for very
#  large objects
x = 1:100000
head(x)
## [1] 1 2 3 4 5 6
tail(x)
## [1]  99995  99996  99997  99998  99999 100000
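rep, head and tail each take an extra argument worth knowing:

- rep repeats the whole vector with times, or repeats each element in turn with each.
- head and tail take n, the number of elements to show. The default is 6.

The letters below are arbitrary.

```r
#times repeats the whole vector; each repeats every element in turn
rep(c("a", "b"), times = 2)
## [1] "a" "b" "a" "b"
rep(c("a", "b"), each = 2)
## [1] "a" "a" "b" "b"
#n sets how many elements head and tail show
x = 1:100000
head(x, n = 3)
## [1] 1 2 3
tail(x, n = 2)
## [1]  99999 100000
```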

Subsetting Vectors: [

Often we want to examine only part of a vector. Most commonly that means the elements that satisfy some condition, but sometimes it's just the first or last few elements. To do this we extract (or subset) those elements using [:

x = c(5, 4, 1)
x[1]
## [1] 5
x[3]
## [1] 1
x[1:2]
## [1] 5 4
x[2:3]
## [1] 4 1

In the syntax x[something], note that something is itself a vector! So x[1:2] above is shorthand for x[c(1, 2)], and we can ask for any collection of positions we like:

x = 20:30
x
##  [1] 20 21 22 23 24 25 26 27 28 29 30
x[c(1, 3, 5)]
## [1] 20 22 24
x[c(5, 9)]
## [1] 24 28
x[seq(1, 10, by = 2)]
## [1] 20 22 24 26 28

Besides being a vector of positions, something can be a logical vector of the same length as x. R then keeps exactly the elements where it is TRUE:

x = c(5, 6, 7)
x[c(TRUE, TRUE, FALSE)]
## [1] 5 6
x[c(FALSE, TRUE, FALSE)]
## [1] 6
x[c(FALSE, FALSE, TRUE)]
## [1] 7

Most commonly we’ll do something that’s identical to the above but reads more naturally:

x = c(-1, 0, 1)
x > 0
## [1] FALSE FALSE  TRUE
x[x > 0]
## [1] 1
x[x <= 0]
## [1] -1  0
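To see that x[x > 0] really is the logical-vector subsetting from before, compare it with the hand-written version. Conditions can also be combined: & means "and" (both must be TRUE) and | means "or" (at least one must be TRUE). Here is a minimal sketch; the numbers in y are arbitrary.

```r
x = c(-1, 0, 1)
#x > 0 evaluates to c(FALSE, FALSE, TRUE), so these pick out the same elements
identical(x[x > 0], x[c(FALSE, FALSE, TRUE)])
## [1] TRUE
y = c(-5, 3, 8, 12, -2)
#elements strictly between 0 and 10
y[y > 0 & y < 10]
## [1] 3 8
#elements below 0 or above 10
y[y < 0 | y > 10]
## [1] -5 12 -2
```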

We can also replace parts of a vector by subsetting:

x = c(-1, 5, 10)
x[3] = 4
x
## [1] -1  5  4
x[x < 0] = 0
x
## [1] 0 5 4

Named Vectors

It’s also often useful to name our vectors to help organize the information. Suppose we were keeping track of the ages of the Trumps:

trump_ages = c(70, 46, 38, 34, 32, 22, 9)

This works, but it’s much more useful if we keep track of who each element represents:

trump_ages = c(Donald = 70, Melania = 46, Donald_Jr = 38, Ivanka = 34,
               Eric = 32, Tiffany = 22, Barron = 9)
trump_ages
##    Donald   Melania Donald_Jr    Ivanka      Eric   Tiffany    Barron
##        70        46        38        34        32        22         9

We can also use the names function to assign names; this is sometimes easier, e.g., if the names have spaces:

names(trump_ages) = c("Donald", "Melania", "Donald, Jr.", "Ivanka", "Eric", "Tiffany", "Barron")
trump_ages
##      Donald     Melania Donald, Jr.      Ivanka        Eric     Tiffany
##          70          46          38          34          32          22
##      Barron
##           9

This also makes code for subsetting much easier to read, since we can subset by the names:

trump_ages["Donald"]
## Donald
##     70
trump_ages[c("Donald", "Barron")]
## Donald Barron
##     70      9
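Names work for replacement too, and names() on its own returns the labels. The sketch below uses a hypothetical vector called quiz; the names and scores are invented for illustration.

```r
quiz = c(Ann = 88, Ben = 92, Cal = 75)
#names() returns the labels as a character vector
names(quiz)
## [1] "Ann" "Ben" "Cal"
#replace a value by name instead of by position
quiz["Cal"] = 80
quiz
## Ann Ben Cal
##  88  92  80
#names of the elements that satisfy a condition
names(quiz)[quiz > 85]
## [1] "Ann" "Ben"
```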

Getting Help: Documentation

If you’re unsure of how something works in R – what the arguments are to a function, how it works, etc. – your first step is to check the documentation:

?sum
?cos
?"="

Lists

We saw above that R doesn’t like vectors to have different types: c(TRUE, 1, "Frank") becomes c("TRUE", "1", "Frank"). But storing objects with different types is absolutely fundamental to data analysis.

For storing data of different types side by side, R has a different kind of object: a list.

x = list(TRUE, 1, "Frank")
x
## [[1]]
## [1] TRUE
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] "Frank"

Note how different the output looks compared to using c! The quotation marks are gone except around the last component. You can ignore the mess of [[ and [ for now, but as a preview, consider some more complicated lists:

x = list(c(1, 2), c("a", "b"), c(TRUE, FALSE), c(5L, 6L))
x
## [[1]]
## [1] 1 2
##
## [[2]]
## [1] "a" "b"
##
## [[3]]
## [1]  TRUE FALSE
##
## [[4]]
## [1] 5 6
y = list(list(1, 2, 3), list(4:5), 6)
y
## [[1]]
## [[1]][[1]]
## [1] 1
##
## [[1]][[2]]
## [1] 2
##
## [[1]][[3]]
## [1] 3
##
##
## [[2]]
## [[2]][[1]]
## [1] 4 5
##
##
## [[3]]
## [1] 6

x is a list which has 4 components, each of which is a vector with 2 components. This gives the first hint at how R treats a dataset with many variables of different types – at core, R stores a data set in a list!

y is a nested list – it’s a list that has lists for some of its components. This is very useful for more advanced operations, but probably won’t come up for quite some time, so don’t worry if you haven’t wrapped your head around this yet.
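Going back to x: unlike c, a list keeps each component's type intact. We can confirm this by applying class to every component of x. The function sapply used here applies a function to each component of a list and collects the results. It's a bit beyond what you need right now, so just focus on the output.

```r
x = list(c(1, 2), c("a", "b"), c(TRUE, FALSE), c(5L, 6L))
#apply class() to each component in turn
sapply(x, class)
## [1] "numeric"   "character" "logical"   "integer"
#compare: c() would force everything to character
class(c(c(1, 2), c("a", "b"), c(TRUE, FALSE), c(5L, 6L)))
## [1] "character"
```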

Packages

One of the things that makes R truly exceptional is its vast library of user-contributed packages.

R comes pre-loaded with a boatload of the most common functions and methods of analysis. But this built-in library is by no means complete.

Complementing this core of the most common operations are external packages, which are basically sets of functions designed to accomplish specific tasks.

Best of all, unlike some super-expensive programming languages, all of the thousands of packages available to R users (most importantly through CRAN, the Comprehensive R Archive Network) are completely free of charge.

The three most important things to know about packages for now are where to find them, how to install them, and how to load them.

We’ll work extensively with the data.table package, which was built for working with huge data sets.

Where to find packages

Long story short: Google. Got a particular statistical technique in mind? The best R package for it is almost always the top Google result, as long as you phrase your search well.

How to install packages

Just use install.packages!

install.packages("data.table")

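This downloads the package’s code and installs it on your computer in a place where R knows to look for it. Installation is a one-time step.

However, installing does not yet give us access to the functions in the package. We have to load it first, and we need to load it again in every new R session: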

library(data.table)

Et voila! You’ll now have access to all of the awesome functions in the data.table package. You can also Google “tutorial data.table” (or in general “tutorial [package name]”) and you’re very likely to find a trove of sites trying to help you learn the package.

data.tables

Data sets are the lifeblood of a data lover!

As mentioned above, data sets in R are basically lists in which every element has the same length. In base R, this is done with a data.frame. However, the syntax of a data.table will be easier for a beginner to understand, so you can forget about data.frames for now.

We can build a data.table from scratch with the data.table command. This command lets you build up a data.table from several vectors of the same length:

foo = 1:5
bar = 2 * foo
foo.bar = data.table(foo, bar)
foo.bar
##    foo bar
## 1:   1   2
## 2:   2   4
## 3:   3   6
## 4:   4   8
## 5:   5  10

In the preceding example I built a data.table with only two columns, but you can add as many as you like. Just separate them by commas:

y = -4:0
data.table(foo, bar, y)
##    foo bar  y
## 1:   1   2 -4
## 2:   2   4 -3
## 3:   3   6 -2
## 4:   4   8 -1
## 5:   5  10  0

Subsetting data

When you’re working with data, you’ll often want to look at subsets that satisfy a particular condition. First we’ll set up a simple data.table:

location = c("New York", "Chicago", "Boston", "Boston", "New York")
salary = c(70000, 80000, 60000, 50000, 45000)
title = c("Office Manager", "Research Assistant", "Analyst", "Office Manager", "Analyst")
hours = c(50, 56, 65, 40, 50)
jobsearch = data.table(location, salary, title, hours)
jobsearch
##    location salary              title hours
## 1: New York  70000     Office Manager    50
## 2:  Chicago  80000 Research Assistant    56
## 3:   Boston  60000            Analyst    65
## 4:   Boston  50000     Office Manager    40
## 5: New York  45000            Analyst    50

Now, suppose you wanted to see only the jobs in New York. You could select them as follows:

jobsearch[location == 'New York']
##    location salary          title hours
## 1: New York  70000 Office Manager    50
## 2: New York  45000        Analyst    50

Notice the use of the double equals sign: this command is testing a logical condition. A single equals sign won’t work here, since = is what R uses to name the arguments to a function. The preceding command looks at the location column of the data.table jobsearch and checks which entries equal "New York". It then returns only those rows.
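Conditions inside the brackets can be combined with & ("and") and | ("or"), just as with vectors. The sketch below rebuilds the jobsearch table from above so it runs on its own. It assumes the data.table package is installed as described earlier, and the salary cutoff of 50,000 is arbitrary.

```r
library(data.table)

location = c("New York", "Chicago", "Boston", "Boston", "New York")
salary = c(70000, 80000, 60000, 50000, 45000)
title = c("Office Manager", "Research Assistant", "Analyst", "Office Manager", "Analyst")
hours = c(50, 56, 65, 40, 50)
jobsearch = data.table(location, salary, title, hours)

#rows where BOTH conditions hold: New York jobs paying more than 50,000
jobsearch[location == "New York" & salary > 50000]
#rows where EITHER condition holds: jobs in Boston or Chicago
jobsearch[location == "Boston" | location == "Chicago"]
```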

my.data.table$weight
## [1] 40 25 50  1

Both of the preceding methods are limited in that they only allow us to reference a single column. We can reference multiple columns as follows:

my.data.table[ , c("person", "weight")]
##       person weight
## 1:     Linus     40
## 2:    Snoopy     25
## 3:      Lucy     50
## 4: Woodstock      1

Since we left the part before the comma blank, this gave us all the rows. We can also access columns by position, though this is generally not recommended:

my.data.table[ , 2]
##    age
## 1:   5
## 2:   8
## 3:   6
## 4:   2
my.data.table[ , c(1,2)]
##       person age
## 1:     Linus   5
## 2:    Snoopy   8
## 3:      Lucy   6
## 4: Woodstock   2
my.data.table[ , 1:2]
##       person age
## 1:     Linus   5
## 2:    Snoopy   8
## 3:      Lucy   6
## 4: Woodstock   2

In some cases it’s easier to access columns of a data.table by name, and in others it’s easier to access them by position.

Part 3: Exercises

If you can’t get R and RStudio to work on your computer, you can do the exercises on the R Fiddle website http://www.r-fiddle.org/#/

1. Calculate how many minutes there are in January.
60 * 24 * 31
## [1] 44640
1. Add up the numbers 3 1 4 1 5 9 2 6 without using any plus signs.
sum(c(3,1,4,1,5,9,2,6))
## [1] 31
1. Load the help file for the function summary, and use summary on an object.
?summary
summary(iris)
##   Sepal.Length    Sepal.Width     Petal.Length    Petal.Width
##  Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100
##  1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300
##  Median :5.800   Median :3.000   Median :4.350   Median :1.300
##  Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199
##  3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800
##  Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500
##        Species
##  setosa    :50
##  versicolor:50
##  virginica :50
##
##
##
1. Suppose I ran the following R commands in order. What result would I get after the fourth command? Do not use R to answer this: think it through and then check your answer.
• x = 5
• y = 7
• z = x + y
• z + 3 == 15
x = 5
y = 7
z = x + y
z + 3 == 15
## [1] TRUE
1. How can I get R to print out "Go Penn!" thirty times without typing it repeatedly by hand?
rep("Go Penn!", times = 30)
##  [1] "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!"
##  [7] "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!"
## [13] "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!"
## [19] "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!"
## [25] "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!" "Go Penn!"
1. Create a vector called x containing the sequence -1, -0.9, … 0, 0.1, …, 0.9, 1 and then display the result.
x = seq(-1, 1, 0.1)
x
##  [1] -1.0 -0.9 -0.8 -0.7 -0.6 -0.5 -0.4 -0.3 -0.2 -0.1  0.0  0.1  0.2  0.3
## [15]  0.4  0.5  0.6  0.7  0.8  0.9  1.0
1. Create two vectors: wizards and ranking. The vector wizards should contain the following names: Harry, Ron, Fred, George, Sirius. The vector ranking should contain the numbers 4, 2, 5, 1, 3, in that order.
#Remember that the elements of character vectors need to be enclosed in quotation marks.
#  Either single or double quotes will work.
wizards = c("Harry", "Ron", "Fred", "George", "Sirius")
ranking = c(4, 2, 5, 1, 3)
1. Extract the second element of the vector wizards.
wizards[2]
## [1] "Ron"
1. Replace the names Fred, George and Sirius in the vector wizards with Hermione, Ginny and Malfoy, respectively.
#There are several different ways to do this. Here are two possibilities.
wizards[c(3, 4, 5)] = c("Hermione", "Ginny", "Malfoy")
wizards[3:5] = c("Hermione", "Ginny", "Malfoy")
1. Someone who hasn’t read Harry Potter needs labels to determine who these characters are. Assign names to the elements of the vector wizards: Lead, Friend, Friend, Wife, Rival. Display the result.
names(wizards) = c("Lead", "Friend", "Friend", "Wife", "Rival")
wizards
##       Lead     Friend     Friend       Wife      Rival
##    "Harry"      "Ron" "Hermione"    "Ginny"   "Malfoy"
1. An avid reader of Harry Potter argues that Malfoy is not Harry’s rival by the end of the series. Change Rival to Ex-Rival.
names(wizards)[5] = "Ex-Rival"
names(wizards)
## [1] "Lead"     "Friend"   "Friend"   "Wife"     "Ex-Rival"
1. In 2009 Steve’s income was $50,000 and his total expenses were $35,000. In 2010 his income was $52,000 and his expenses were $34,000. In 2011, his income was $52,500 and his expenses were $38,000. Finally, in 2012 Steve’s earnings were $48,000 and his expenses were $40,000. Create three vectors to store this information in parallel: years, income and expenses.
years = c(2009, 2010, 2011, 2012)
income = c(50000, 52000, 52500, 48000)
expenses = c(35000, 34000, 38000, 40000)
1. Following on from the previous question, calculate Steve’s annual savings and store this in a vector called savings.
savings = income - expenses
1. Assuming zero interest on bank deposits (roughly accurate at the moment), calculate the total amount that Steve has saved over all the years for which we have data.
sum(savings)
## [1] 55500
1. Create a vector called z that lists the numbers from 12 to 23 in descending order.
z = 23:12
z
## [1] 23 22 21 20 19 18 17 16 15 14 13 12
1. Replace the number 13 with the number 7 in z.
z[z == 13] = 7
z
## [1] 23 22 21 20 19 18 17 16 15 14  7 12
1. Twenty-six students took the midterm. Here are their scores: 18, 95, 76, 90, 84, 83, 80, 79, 63, 76, 55, 78, 90, 81, 88, 89, 92, 73, 83, 72, 85, 66, 77, 82, 99, 87. Assign these values to a vector called scores.
scores = c(18, 95, 76, 90, 84, 83, 80, 79, 63, 76, 55, 78, 90, 81, 88, 89, 92, 73, 83, 72, 85, 66, 77, 82, 99, 87)
1. Calculate the mean, median, and range of the scores.
mean(scores)
## [1] 78.5
median(scores)
## [1] 81.5
range(scores)
## [1] 18 99
1. Create three vectors. First store the numeric values 21, 26, 51, 22, 160, 160, 160 in a vector called age. Next, store the names Achilles, Hector, Priam, Paris, Apollo, Athena, Aphrodite in a character vector called person. Finally, store the words Aggressive, Loyal, Regal, Cowardly, Proud, Wise, Conniving in a vector called description.
age = c(21, 26, 51, 22, 160, 160, 160)
person = c("Achilles", "Hector", "Priam", "Paris", "Apollo", "Athena", "Aphrodite")
description = c("Aggressive", "Loyal", "Regal", "Cowardly", "Proud", "Wise", "Conniving")
1. Create a data.table called trojan.war whose columns contain the vectors from the previous question.
trojan.war = data.table(person, age, description)
1. Suppose you wanted to display only the column of trojan.war that contains each person’s description. What command would you use?
#There are many different ways to do this:
trojan.war[, 3]
##    description
## 1:  Aggressive
## 2:       Loyal
## 3:       Regal
## 4:    Cowardly
## 5:       Proud
## 6:        Wise
## 7:   Conniving
trojan.war$description
## [1] "Aggressive" "Loyal"      "Regal"      "Cowardly"   "Proud"
## [6] "Wise"       "Conniving"
trojan.war[ , "description"]
##    description
## 1:  Aggressive
## 2:       Loyal
## 3:       Regal
## 4:    Cowardly
## 5:       Proud
## 6:        Wise
## 7:   Conniving
trojan.war[["description"]]
## [1] "Aggressive" "Loyal"      "Regal"      "Cowardly"   "Proud"
## [6] "Wise"       "Conniving"
1. What command would you use to show information for Achilles and Hector only?
#There are several ways to do this. Here are a few:
trojan.war[c(1,2)]
##      person age description
## 1: Achilles  21  Aggressive
## 2:   Hector  26       Loyal
trojan.war[1:2]
##      person age description
## 1: Achilles  21  Aggressive
## 2:   Hector  26       Loyal
#A more advanced way that doesn't require knowing the order of the rows:
trojan.war[person %in% c("Achilles", "Hector")]
##      person age description
## 1: Achilles  21  Aggressive
## 2:   Hector  26       Loyal
1. What command would you use to display the person and description columns for Apollo, Athena and Aphrodite only?
#There are many ways to do this. Here are a few:
trojan.war[c(5, 6, 7), c(1, 3)]
##       person description
## 1:    Apollo       Proud
## 2:    Athena        Wise
## 3: Aphrodite   Conniving
trojan.war[5:7, c("person", "description")]
##       person description
## 1:    Apollo       Proud
## 2:    Athena        Wise
## 3: Aphrodite   Conniving
#advanced method
trojan.war[person %in% c("Apollo", "Athena", "Aphrodite"),
           c("person", "description")]
##       person description
## 1:    Apollo       Proud
## 2:    Athena        Wise
## 3: Aphrodite   Conniving
1. By now you’re probably tired of this data set. A passenger manifest for the Titanic is stored at http://www.ditraglia.com/econ103/titanic3.csv. Read this file and store it in a data.table called titanic.
titanic = fread("http://www.ditraglia.com/econ103/titanic3.csv")
1. Calculate the product of all the even numbers between 2 and 18, inclusive.
x = seq(2, 18, 2)
x
## [1]  2  4  6  8 10 12 14 16 18
prod(x)
## [1] 185794560
1. The column survived in the titanic data has a value of “1” to indicate that the passenger in that row survived the disaster. Display only the rows of titanic corresponding to passengers that survived.
titanic[survived == 1]
##      pclass survived                                            name
##   1:      1        1                   Allen, Miss. Elisabeth Walton
##   2:      1        1                  Allison, Master. Hudson Trevor
##   3:      1        1                             Anderson, Mr. Harry
##   4:      1        1               Andrews, Miss. Kornelia Theodosia
##   5:      1        1   Appleton, Mrs. Edward Dale (Charlotte Lamson)
##  ---
## 496:      3        1                          Turkula, Mrs. (Hedwig)
## 497:      3        1                            Vartanian, Mr. David
## 498:      3        1 Whabee, Mrs. George Joseph (Shawneene Abi-Saab)
## 499:      3        1                Wilkes, Mrs. James (Ellen Needs)
## 500:      3        1         Yasbeck, Mrs. Antoni (Selini Alexander)
##         sex   age sibsp parch ticket     fare   cabin embarked  boat body
##   1: female 29.00     0     0  24160 211.3375      B5        S     2
##   2:   male  0.92     1     2 113781 151.5500 C22 C26        S    11
##   3:   male 48.00     0     0  19952  26.5500     E12        S     3
##   4: female 63.00     1     0  13502  77.9583      D7        S    10
##   5: female 53.00     2     0  11769  51.4792    C101        S     D
##  ---
## 496: female 63.00     0     0   4134   9.5875                S    15
## 497:   male 22.00     0     0   2658   7.2250                C 13 15
## 498: female 38.00     0     0   2688   7.2292                C     C
## 499: female 47.00     1     0 363272   7.0000                S
## 500: female 15.00     1     0   2659  14.4542                C
##                            home.dest
##   1:                    St Louis, MO
##   2: Montreal, PQ / Chesterville, ON
##   3:                    New York, NY
##   4:                      Hudson, NY
##   5:             Bayside, Queens, NY
##  ---
## 496:
## 497:
## 498:
## 499:
## 500:
1. The column sex in the titanic data indicates each passenger’s sex. Display only the rows of titanic corresponding to men.
titanic[sex == 'male']
##      pclass survived                                 name  sex   age sibsp
##   1:      1        1       Allison, Master. Hudson Trevor male  0.92     1
##   2:      1        0 Allison, Mr. Hudson Joshua Creighton male 30.00     1
##   3:      1        1                  Anderson, Mr. Harry male 48.00     0
##   4:      1        0               Andrews, Mr. Thomas Jr male 39.00     0
##   5:      1        0              Artagaveytia, Mr. Ramon male 71.00     0
##  ---
## 839:      3        0                    Yousif, Mr. Wazli male    NA     0
## 840:      3        0                Yousseff, Mr. Gerious male    NA     0
## 841:      3        0            Zakarian, Mr. Mapriededer male 26.50     0
## 842:      3        0                  Zakarian, Mr. Ortin male 27.00     0
## 843:      3        0                   Zimmerman, Mr. Leo male 29.00     0
##      parch   ticket     fare   cabin embarked boat body
##   1:     2   113781 151.5500 C22 C26        S   11
##   2:     2   113781 151.5500 C22 C26        S       135
##   3:     0    19952  26.5500     E12        S    3
##   4:     0   112050   0.0000     A36        S
##   5:     0 PC 17609  49.5042                C        22
##  ---
## 839:     0     2647   7.2250                C
## 840:     0     2627  14.4583                C
## 841:     0     2656   7.2250                C       304
## 842:     0     2670   7.2250                C
## 843:     0   315082   7.8750                S
##                            home.dest
##   1: Montreal, PQ / Chesterville, ON
##   2: Montreal, PQ / Chesterville, ON
##   3:                    New York, NY
##   4:                     Belfast, NI
##   5:             Montevideo, Uruguay
##  ---
## 839:
## 840:
## 841:
## 842:
## 843: