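read.csv(). Use the help function to learn what arguments this function takes. Once you have the necessary input, load the data set into R and make it a data frame called iowa.df.
b. How many rows and columns does iowa.df have?
c. What are the names of the columns of iowa.df?
d. What is the value of row 5, column 7 of iowa.df?
e. Display the second row of iowa.df in its entirety.
# a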
iowa.df <- read.csv("data/iowa.csv", header = TRUE, sep = ";")
# b
nrow(iowa.df)
## [1] 33
ncol(iowa.df)
## [1] 10
# c
colnames(iowa.df)
## [1] "Year" "Rain0" "Temp1" "Rain1" "Temp2" "Rain2" "Temp3" "Rain3" "Temp4"
## [10] "Yield"
# d
iowa.df[5,7]
## [1] 79.7
# e
iowa.df[2,]
## Year Rain0 Temp1 Rain1 Temp2 Rain2 Temp3 Rain3 Temp4 Yield
## 2 1931 14.76 57.5 3.83 75 2.72 77.2 3.3 72.6 32.9
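To make the roles of header and sep concrete, here is a minimal, self-contained sketch. It writes a tiny semicolon-separated file to a temporary location and reads it back. The file, the name toy.df, and the 1930 row are made up for illustration and are not the real iowa.csv; only the 1931 values are copied from the output above.

```r
# Hypothetical miniature file in the same layout as iowa.csv
# (the 1930 row is invented; the 1931 values are copied from the output above).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Year;Rain0;Yield",
             "1930;17.75;34.0",
             "1931;14.76;32.9"), tmp)

# sep = ";" is required: with the default sep = "," each line is read as one field.
toy.df <- read.csv(tmp, header = TRUE, sep = ";")

dim(toy.df)          # number of rows and columns in a single call
names(toy.df)        # column names, same result as colnames()
toy.df[2, "Yield"]   # one cell, selected by row number and column name
toy.df[2, ]          # the entire second row
```

Selecting a column by name rather than by position, as in toy.df[2, "Yield"], tends to be more robust if the column order ever changes.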
vector1 <- c("5", "12", "7", "32")
max(vector1)
sort(vector1)
sum(vector1)
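vector1 <- c("5", "12", "7", "32") creates a character vector; its elements are strings, not numbers.
max(vector1) returns "7". Strings are compared lexicographically, character by character, using the collating sequence of the current locale, so "7" ranks above "5", "32", and "12".
sort(vector1) sorts in the same lexicographical order and returns "12" "32" "5" "7".
sum(vector1) produces an error, because the arguments of sum() must be numeric, complex, or logical vectors.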
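If numeric behaviour is what was intended, one possible fix is to convert the strings first. The sketch below assumes every element is a valid number written as text; the name nums is introduced here only for illustration.

```r
vector1 <- c("5", "12", "7", "32")

# Lexicographic results on the character vector
max(vector1)    # "7"
sort(vector1)   # "12" "32" "5" "7"

# Convert to numeric, then the same calls behave arithmetically
nums <- as.numeric(vector1)
max(nums)       # 32
sort(nums)      # 5 7 12 32
sum(nums)       # 56
```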
vector2 <- c("5",7,12)
vector2[2] + vector2[3]
dataframe3 <- data.frame(z1="5",z2=7,z3=12)
dataframe3[1,2] + dataframe3[1,3]
list4 <- list(z1="6", z2=42, z3="49", z4=126)
list4[[2]]+list4[[4]]
list4[2]+list4[4]
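It is important to remember that a vector can contain elements of only one type. When types are mixed, R coerces the entire vector to the most general type present. In this example, vector2 mixes a character string with numbers, so the entire vector is converted to character.
vector2[2] + vector2[3] therefore fails: both elements are character strings ("7" and "12"), and + is not defined for non-numeric arguments.
dataframe3 is a data frame whose columns keep their own types: z1 is character, while z2 and z3 are numeric.
dataframe3[1,2] + dataframe3[1,3] adds two numeric values and returns 19.
list4 is a list, which also keeps the type of each element.
list4[[2]]+list4[[4]] adds two numeric values and returns 168.
list4[2]+list4[4] fails: single brackets return a sub-list rather than the element itself, so the elements must be extracted with double brackets before they can be added.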
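The behaviour described above can be checked directly with class(). This is a small sketch rather than part of the assignment; msg is a name chosen here to hold the captured error message.

```r
vector2 <- c("5", 7, 12)
class(vector2)   # "character": the numbers were coerced to strings

# Converting back to numeric makes the addition possible.
as.numeric(vector2[2]) + as.numeric(vector2[3])   # 19

list4 <- list(z1 = "6", z2 = 42, z3 = "49", z4 = 126)
class(list4[2])     # "list": single brackets return a sub-list
class(list4[[2]])   # "numeric": double brackets return the element itself
list4[[2]] + list4[[4]]   # 168

# Capture the error from the single-bracket version instead of stopping.
msg <- tryCatch(list4[2] + list4[4], error = function(e) conditionMessage(e))
msg
```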
seq(), which you saw earlier in this assignment. Using the help command ?seq to learn about the function, design an expression that will give you the sequence of numbers from 1 to 10000 in increments of 372. Design another that will give you a sequence between 1 and 10000 that is exactly 50 numbers in length.
rep() repeats a vector some number of times. Explain the difference between rep(1:3, times=3) and rep(1:3, each=3).
seq(1,10000,by = 372)
## [1] 1 373 745 1117 1489 1861 2233 2605 2977 3349 3721 4093 4465 4837 5209
## [16] 5581 5953 6325 6697 7069 7441 7813 8185 8557 8929 9301 9673
seq(1,10000,length.out = 50)
## [1] 1.0000 205.0612 409.1224 613.1837 817.2449 1021.3061
## [7] 1225.3673 1429.4286 1633.4898 1837.5510 2041.6122 2245.6735
## [13] 2449.7347 2653.7959 2857.8571 3061.9184 3265.9796 3470.0408
## [19] 3674.1020 3878.1633 4082.2245 4286.2857 4490.3469 4694.4082
## [25] 4898.4694 5102.5306 5306.5918 5510.6531 5714.7143 5918.7755
## [31] 6122.8367 6326.8980 6530.9592 6735.0204 6939.0816 7143.1429
## [37] 7347.2041 7551.2653 7755.3265 7959.3878 8163.4490 8367.5102
## [43] 8571.5714 8775.6327 8979.6939 9183.7551 9387.8163 9591.8776
## [49] 9795.9388 10000.0000
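For the rep() question, the difference is easiest to see by running both calls side by side. The following short sketch also checks the lengths of the two seq() results shown above.

```r
rep(1:3, times = 3)   # 1 2 3 1 2 3 1 2 3  (whole vector repeated)
rep(1:3, each = 3)    # 1 1 1 2 2 2 3 3 3  (each element repeated in place)

# Both results hold the same elements; only the order differs.
identical(sort(rep(1:3, times = 3)), rep(1:3, each = 3))   # TRUE

# Checking the two seq() answers by their lengths.
length(seq(1, 10000, by = 372))          # 27
length(seq(1, 10000, length.out = 50))   # 50
```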
MB.Ch1.2. The orings data frame gives data on the damage that had occurred in US space shuttle launches prior to the disastrous Challenger launch of 28 January 1986. The observations in rows 1, 2, 4, 11, 13, and 18 were included in the pre-launch charts used in deciding whether to proceed with the launch, while remaining rows were omitted.
Create a new data frame by extracting these rows from orings, and plot total incidents against temperature for this new data frame. Obtain a similar plot for the full data set.
data(orings)
prelaunch <- orings[c(1,2,4,11,13,18),]
plot(prelaunch$Temperature, prelaunch$Total)
plot(orings$Temperature, orings$Total)
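The two plots are easier to compare when they share the same axis ranges and sit side by side. The sketch below is one way to do that. It assumes the DAAG package is installed, and the names rows, xr, yr, and op are introduced here for illustration.

```r
rows <- c(1, 2, 4, 11, 13, 18)   # rows shown in the pre-launch charts

if (requireNamespace("DAAG", quietly = TRUE)) {
  data(orings, package = "DAAG")

  # Use the full data set to fix common axis limits for both panels.
  xr <- range(orings$Temperature)
  yr <- range(orings$Total)

  op <- par(mfrow = c(1, 2))     # two panels in one row
  plot(Total ~ Temperature, data = orings[rows, ],
       xlim = xr, ylim = yr, main = "Pre-launch subset")
  plot(Total ~ Temperature, data = orings,
       xlim = xr, ylim = yr, main = "All launches")
  par(op)                        # restore the previous layout
}
```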
MB.Ch1.4. For the data frame ais (DAAG package)
data(ais)
str(ais)
## 'data.frame': 202 obs. of 13 variables:
## $ rcc : num 3.96 4.41 4.14 4.11 4.45 4.1 4.31 4.42 4.3 4.51 ...
## $ wcc : num 7.5 8.3 5 5.3 6.8 4.4 5.3 5.7 8.9 4.4 ...
## $ hc : num 37.5 38.2 36.4 37.3 41.5 37.4 39.6 39.9 41.1 41.6 ...
## $ hg : num 12.3 12.7 11.6 12.6 14 12.5 12.8 13.2 13.5 12.7 ...
## $ ferr : num 60 68 21 69 29 42 73 44 41 44 ...
## $ bmi : num 20.6 20.7 21.9 21.9 19 ...
## $ ssf : num 109.1 102.8 104.6 126.4 80.3 ...
## $ pcBfat: num 19.8 21.3 19.9 23.7 17.6 ...
## $ lbm : num 63.3 58.5 55.4 57.2 53.2 ...
## $ ht : num 196 190 178 185 185 ...
## $ wt : num 78.9 74.4 69.1 74.9 64.6 63.7 75.2 62.3 66.5 62.9 ...
## $ sex : Factor w/ 2 levels "f","m": 1 1 1 1 1 1 1 1 1 1 ...
## $ sport : Factor w/ 10 levels "B_Ball","Field",..: 1 1 1 1 1 1 1 1 1 1 ...
complete.cases(t(ais))
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
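complete.cases(t(ais)) transposes the data frame so that each column becomes a row, then reports TRUE for every column without missing values. A more direct check counts the NA values per column. The sketch below uses a small made-up data frame (toy is hypothetical, not part of ais) so that it runs without DAAG.

```r
# Hypothetical data frame with one missing value in column a
toy <- data.frame(a = c(1, NA, 3), b = c(4, 5, 6))

complete.cases(t(toy))   # FALSE TRUE: column a is incomplete
colSums(is.na(toy))      # a = 1, b = 0: number of NAs per column
anyNA(toy)               # TRUE: at least one NA somewhere
```

Applied to ais, colSums(is.na(ais)) would be expected to return zero for every column, in agreement with the all-TRUE output above.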
## # A tibble: 10 x 5
## sport cnt_f cnt_m ratio isbalance
## <fct> <int> <int> <dbl> <lgl>
## 1 B_Ball 13 12 1.08 TRUE
## 2 Field 7 12 0.583 TRUE
## 3 Gym 4 0 Inf FALSE
## 4 Netball 23 0 Inf FALSE
## 5 Row 22 15 1.47 TRUE
## 6 Swim 9 13 0.692 TRUE
## 7 T_400m 11 18 0.611 TRUE
## 8 T_Sprnt 4 11 0.364 FALSE
## 9 Tennis 7 4 1.75 TRUE
## 10 W_Polo 0 17 0 FALSE
# %>% and mutate() are provided by dplyr, spread() by tidyr
data.frame(t(table(ais[,12:13]))) %>% spread(sex,Freq) %>%
  mutate(ratio = f/m) %>%
  mutate(isbalance = (ratio < 2) & (ratio > 0.5))
## sport f m ratio isbalance
## 1 B_Ball 13 12 1.0833333 TRUE
## 2 Field 7 12 0.5833333 TRUE
## 3 Gym 4 0 Inf FALSE
## 4 Netball 23 0 Inf FALSE
## 5 Row 22 15 1.4666667 TRUE
## 6 Swim 9 13 0.6923077 TRUE
## 7 T_400m 11 18 0.6111111 TRUE
## 8 T_Sprnt 4 11 0.3636364 FALSE
## 9 Tennis 7 4 1.7500000 TRUE
## 10 W_Polo 0 17 0.0000000 FALSE
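The same ratio test can be sketched in base R, without dplyr or tidyr. The example below uses a small invented data frame (toy, with made-up sports counts) so that it is self-contained; replacing toy with ais should give the table above.

```r
# Hypothetical data: Row has 2 f and 2 m, Gym has 1 f only, Swim has 3 m only
toy <- data.frame(
  sex   = factor(c("f", "f", "m", "m", "f", "m", "m", "m")),
  sport = factor(c("Row", "Row", "Row", "Row", "Gym", "Swim", "Swim", "Swim"))
)

counts <- table(toy$sport, toy$sex)        # rows: sport, columns: sex
ratio  <- counts[, "f"] / counts[, "m"]    # Inf where there are no males
isbalance <- ratio < 2 & ratio > 0.5       # within a factor of 2 either way

data.frame(f = counts[, "f"], m = counts[, "m"], ratio, isbalance)
```

Sports with athletes of only one sex produce a ratio of Inf or 0, which the test correctly flags as unbalanced.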
MB.Ch1.6. Create a data frame called Manitoba.lakes that contains the lake’s elevation (in meters above sea level) and area (in square kilometers) as listed below. Assign the names of the lakes using the row.names() function.
                elevation   area
Winnipeg              217  24387
Winnipegosis          254   5374
Manitoba              248   4624
SouthernIndian        254   2247
Cedar                 253   1353
Island                227   1223
Gods                  178   1151
Cross                 207    755
Playgreen             217    657
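The exercise asks for the names to be assigned with the row.names() function. A minimal sketch of that literal approach follows; it should be equivalent to passing row.names = inside data.frame(). The name lakes.df is hypothetical and is used only to keep this sketch separate from Manitoba.lakes.

```r
# Build the data frame first, without row names
lakes.df <- data.frame(
  elevation = c(217, 254, 248, 254, 253, 227, 178, 207, 217),
  area      = c(24387, 5374, 4624, 2247, 1353, 1223, 1151, 755, 657)
)

# Then assign the lake names with the replacement form of row.names()
row.names(lakes.df) <- c("Winnipeg", "Winnipegosis", "Manitoba",
                         "SouthernIndian", "Cedar", "Island",
                         "Gods", "Cross", "Playgreen")

lakes.df["Gods", ]   # rows can now be selected by lake name
```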
Manitoba.lakes <- data.frame(elevation = c(217, 254, 248, 254, 253, 227, 178, 207, 217),
                             area = c(24387, 5374, 4624, 2247, 1353, 1223, 1151, 755, 657),
                             row.names = c("Winnipeg","Winnipegosis","Manitoba",
                                           "SouthernIndian","Cedar","Island",
                                           "Gods","Cross","Playgreen"))
Devise captions that explain the labeling on the points and on the y-axis. It will be necessary to explain how distances on the scale relate to changes in area.
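# area and elevation are used by name below, so the data frame must be attached
# (this line assumes it has not been attached already)
attach(Manitoba.lakes)
plot(area ~ elevation, pch=16, xlim=c(170,280), log="y")
text(area ~ elevation, labels=row.names(Manitoba.lakes), pos=4)
text(area ~ elevation, labels=area, pos=2)
title("Manitoba’s Largest Lakes")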
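For the caption, the key point is that on a logarithmic axis equal distances correspond to equal ratios of area, not equal differences. The sketch below illustrates this with three of the areas from the table; lake.area and step are names chosen here for illustration.

```r
lake.area <- c(Winnipeg = 24387, Cedar = 1353, Playgreen = 657)

# Distance on a log10 axis between each area and twice that area.
step <- log10(2 * lake.area) - log10(lake.area)
step   # the same value, log10(2), for every lake

# The distance between two lakes depends only on the ratio of their areas.
log10(lake.area["Winnipeg"]) - log10(lake.area["Playgreen"])
log10(lake.area["Winnipeg"] / lake.area["Playgreen"])
```

So a fixed vertical step on the plot always means the same multiplicative change in area, whichever lake it starts from.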
MB.Ch1.7. Look up the help page for the R function dotchart(). Use this function to display the areas of the Manitoba lakes (a) on a linear scale, and (b) on a logarithmic scale. Add, in each case, suitable labeling information.
dotchart(area, labels = rownames(Manitoba.lakes),
         main = "Manitoba’s Largest Lakes",
         xlab = "area")
dotchart(log2(area), labels = rownames(Manitoba.lakes),
         main = "Manitoba’s Largest Lakes",
         xlab = "log2(area)")
MB.Ch1.8. Using the sum() function, obtain a lower bound for the area of Manitoba covered by water.
sum(area)
## [1] 41771