Background: In the previous lectures and lab, we began to look at user-written functions. For this assignment we will continue with a look at fitting models by optimizing error functions, and at making user-written functions components of larger pieces of code.

In lecture, we saw how to estimate the parameter \(a\) in a nonlinear model,

\[ Y = y_0 N^a + \mathrm{noise} \] by minimizing the mean squared error \[ \frac{1}{n}\sum_{i=1}^{n}{(Y_i - y_0 N_i^a)^2}. \]

We did this by approximating the derivative of the MSE, adjusting \(a\) by an amount proportional to that derivative, and stopping when the derivative became small. Our procedure assumed we knew \(y_0\). In this assignment, we will use a built-in R function to estimate both parameters at once; it uses a fancier version of the same idea.
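To make that procedure concrete: differentiating the MSE with respect to \(a\) gives
\[ \frac{\partial\,\mathrm{MSE}}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n} \left(Y_i - y_0 N_i^a\right) y_0 N_i^a \log N_i, \]
and the lecture procedure repeats the update \(a \leftarrow a - \eta\, \partial\,\mathrm{MSE}/\partial a\) until \(\lvert \partial\,\mathrm{MSE}/\partial a \rvert\) drops below a tolerance. (The step size \(\eta\) and the tolerance are tuning choices, not part of the model.)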

Because the model is nonlinear, there is no simple formula for the parameter estimates in terms of the data. Unlike with linear models, there is also no simple formula for the standard errors of the parameter estimates. We will therefore use a technique called the jackknife to get approximate standard errors.

Here is how the jackknife works: to get the standard error of an estimate \(\hat\theta\) computed from \(n\) observations, we recompute the estimate \(n\) times, leaving out one observation each time. Writing \(\hat\theta_{(-i)}\) for the estimate computed without observation \(i\), the jackknife approximation to the variance of \(\hat\theta\) is \(\frac{(n-1)^2}{n}\) times the sample variance of the \(\hat\theta_{(-i)}\), and the standard error is its square root. (This is the form used in the code below.)

You will estimate the power-law scaling model, and its uncertainty, using the data alluded to in lecture, available in the file gmp.dat, which contains data for 2006.

gmp <- read.table("./data/gmp.dat")
gmp$pop <- round(gmp$gmp/gmp$pcgmp)
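Before plotting, a quick sanity check (not part of the assignment) that the columns came in as expected:

str(gmp) # should list the gmp and pcgmp columns alongside the new pop column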
  1. First, plot the data as in lecture, with per capita GMP on the y-axis and population on the x-axis. Add the model curve with the default parameter values provided in lecture (\(y_0 = 6611\), \(a = 1/8\)). Add two more curves corresponding to \(a=0.1\) and \(a=0.15\); use the col option to give each curve a different color (of your choice).
library(tidyverse) # provides %>%, mutate(), gather(), and ggplot()

gmp <- gmp %>% mutate(a_0.125 = 6611*(gmp/pcgmp)^(1/8), # gmp/pcgmp is the population
                      a_0.1 = 6611*(gmp/pcgmp)^0.1, a_0.15 = 6611*(gmp/pcgmp)^0.15)
# reshape to long form so each candidate value of a becomes one curve
gmp_tidy <- gmp[,3:7] %>% gather(para_a_value, nlmfit, -pcgmp, -pop)

gmp_tidy %>% ggplot() + geom_point(aes(x = pop, y = pcgmp))+
  labs(x = "Population", y = "Per-Capita Economic Output ($/person-year)",
  title = "US Metropolitan Areas, 2006")+
  geom_line(aes(x = pop, y = nlmfit, color = para_a_value), size = 1.5) +
  scale_x_continuous(trans = 'log10') + # take the logarithm of the population
  scale_color_manual(values=c("red", "blue", "green")) # set color manually
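Since the question mentions the curve function, here is a sketch of the same picture in base graphics, as in lecture (the colors are again arbitrary choices):

plot(pcgmp ~ pop, data = gmp, log = "x",
     xlab = "Population", ylab = "Per-Capita Economic Output ($/person-year)",
     main = "US Metropolitan Areas, 2006")
curve(6611*x^(1/8), add = TRUE, col = "red")  # default a = 1/8
curve(6611*x^0.1, add = TRUE, col = "blue")
curve(6611*x^0.15, add = TRUE, col = "green")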

  2. Write a function, called mse(), which calculates the mean squared error of the model on a given data set. mse() should take three arguments: a numeric vector of length two, the first component standing for \(y_0\) and the second for \(a\); a numerical vector containing the values of \(N\); and a numerical vector containing the values of \(Y\). The function should return a single numerical value. The latter two arguments should have as their default values the columns pop and pcgmp (respectively) from the gmp data frame from lecture. Your function may not use for() or any other loop. Check that, with the default data, you get the following values.
> mse(c(6611,0.15))
[1] 207057513
> mse(c(5000,0.10))
[1] 298459915
mse <- function(para, N = gmp$pop, Y = gmp$pcgmp){
  # para[1] is y0 and para[2] is a; vectorized arithmetic, so no loop is needed
  return(mean((Y - para[1]*N^para[2])^2))
}
mse(c(6611,0.15))
## [1] 207057513
mse(c(5000,0.10))
## [1] 298459914

(The last digit of the second value differs from the target quoted above, presumably just floating-point rounding in the printed result.)
  3. R has several built-in functions for optimization, which we will meet as we go through the course. One of the simplest is nlm(), or non-linear minimization. nlm() takes two required arguments: a function to minimize, and a vector of starting values for that function's arguments. Run nlm() three times with your function mse() and three starting value pairs for \(y_0\) and \(a\) as in
nlm(mse, c(y0=6611,a=1/8))
## $minimum
## [1] 61857060
## 
## $estimate
## [1] 6611.0000000    0.1263177
## 
## $gradient
## [1] 50.048639 -9.983778
## 
## $code
## [1] 2
## 
## $iterations
## [1] 3
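
nlm() passes any additional arguments along to the function being minimized (here mse()), so instead of relying on mse()'s defaults we could point it at the data explicitly; the following should reproduce the fit above:

nlm(mse, c(y0 = 6611, a = 1/8), N = gmp$pop, Y = gmp$pcgmp)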

What do the quantities minimum and estimate represent? What values does it return for these? minimum is the smallest value of the MSE that nlm() found, and estimate is the vector of parameter values \((y_0, a)\) at which that minimum is attained; in the run above they are 61857060 and (6611.000, 0.1263), respectively.

t1 <- nlm(mse, c(y0=6611,a=1/8))  # starting values from lecture
t2 <- nlm(mse, c(y0=6600,a=0.1))
t3 <- nlm(mse, c(y0=6620,a=0.15))
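
To compare the three runs side by side, we can stack each run's estimate and minimum into one row per starting point (a small convenience, not required by the assignment):

rbind(t1 = c(t1$estimate, mse = t1$minimum),
      t2 = c(t2$estimate, mse = t2$minimum),
      t3 = c(t3$estimate, mse = t3$minimum))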
  4. Using nlm(), and the mse() function you wrote, write a function, plm(), which estimates the parameters \(y_0\) and \(a\) of the model by minimizing the mean squared error. It should take the following arguments: an initial guess for \(y_0\); an initial guess for \(a\); a vector containing the \(N\) values; a vector containing the \(Y\) values. All arguments except the initial guesses should have suitable default values. It should return a list with the following components: the final guess for \(y_0\); the final guess for \(a\); the final value of the MSE. Your function must call those you wrote in earlier questions (it should not repeat their code), and the appropriate arguments to plm() should be passed on to them.
    What parameter estimate do you get when starting from \(y_0 = 6611\) and \(a = 0.15\)? From \(y_0 = 5000\) and \(a = 0.10\)? If these are not the same, why do they differ? Which estimate has the lower MSE?
plm <- function(para, N = gmp$pop, Y = gmp$pcgmp){
  # para = c(initial guess for y0, initial guess for a); nlm() forwards the
  # named arguments N and Y on to mse()
  t <- nlm(mse, para, N = N, Y = Y)
  return(list(parameters = t$estimate, MSE = t$minimum))
}
plm(c(6611,0.15))
## $parameters
## [1] 6610.9999997    0.1263182
## 
## $MSE
## [1] 61857060
plm(c(5000,0.10))
## $parameters
## [1] 5000.0000008    0.1475913
## 
## $MSE
## [1] 62521484

The two starting points give different answers because the MSE surface has multiple local optima, and nlm() stops at whichever one it reaches; notice that in both runs \(y_0\) barely moves from its starting value, since the objective is very flat in that direction. The first fit (from \(y_0 = 6611\), \(a = 0.15\)) has the lower MSE.

  5. Convince yourself the jackknife can work.

    1. Calculate the mean per-capita GMP across cities, and the standard error of this mean, using the built-in functions mean() and sd(), and the formula for the standard error of the mean you learned in your intro. stats. class (or looked up on Wikipedia…).
    mean(gmp$pcgmp)
    ## [1] 32922.53
    sd(gmp$pcgmp)/sqrt(nrow(gmp)) # standard error = sd/sqrt(n)
    ## [1] 481.9195

    The standard error of the mean is \(\sigma_{\bar x} = \sqrt{\frac{\sum_{i=1}^n (x_i - \bar x)^2}{n(n-1)}}\).

    2. Write a function which takes in an integer i and calculates the mean per-capita GMP for every city except city number i.
    mean.jackknife <- function(i, pc = gmp$pcgmp){
      return(mean(pc[-i])) # mean per-capita GMP with city i held out
    }
    # example: mean.jackknife(1) drops the first city
    3. Using this function, create a vector, jackknifed.means, which has the mean per-capita GMP where every city is held out in turn. (You may use a for loop or sapply().)
    n <- nrow(gmp) # number of cities
    jackknifed.means <- sapply(1:n, mean.jackknife)
    4. Using the vector jackknifed.means, calculate the jackknife approximation to the standard error of the mean. How well does it match your answer from part (a)?
    sqrt((n-1)^2/n*var(jackknifed.means)) # var() divides by n-1, leaving the (n-1)/n jackknife factor
    ## [1] 481.9195

    They are identical. In fact, it is easy to prove the equivalence \[\frac{n-1}{n} \sum_{i=1}^{n}\left(\hat{\theta}_{(-i)}-\hat{\theta}_{(\cdot)}\right)^{2}=\frac{1}{n(n-1)} \sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}\] when \(\hat\theta_{(-i)} = \displaystyle\frac{\sum_{j=1}^n x_j - x_i}{n-1}\) and \(\hat{\theta}_{(\cdot)}=\frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(-i)}\).
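    The proof is short: since \(\hat\theta_{(-i)} = \frac{n\bar{x} - x_i}{n-1} = \bar{x} - \frac{x_i - \bar{x}}{n-1}\), averaging over \(i\) gives \(\hat{\theta}_{(\cdot)} = \bar{x}\), and therefore \[\frac{n-1}{n}\sum_{i=1}^{n}\left(\hat{\theta}_{(-i)}-\hat{\theta}_{(\cdot)}\right)^{2}=\frac{n-1}{n(n-1)^{2}}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}=\frac{1}{n(n-1)}\sum_{i=1}^{n}\left(x_{i}-\bar{x}\right)^{2}.\]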

  6. Write a function, plm.jackknife(), to calculate jackknife standard errors for the parameters \(y_0\) and \(a\). It should take the same arguments as plm(), and return standard errors for both parameters. This function should call your plm() function repeatedly. What standard errors do you get for the two parameters?

plm.jackknife <- function(para, N = gmp$pop, Y = gmp$pcgmp){
  # refit the model with observation i held out, returning both estimates
  jackknife <- function(i){
    return(plm(para, N[-i], Y[-i])$parameters)
  }
  n <- length(N)
  jackknifed.para <- sapply(1:n, jackknife) # 2 x n matrix: row 1 is y0, row 2 is a
  y.sd <- sqrt((n-1)^2/n*var(jackknifed.para[1,]))
  a.sd <- sqrt((n-1)^2/n*var(jackknifed.para[2,]))
  sd <- c(y.sd, a.sd)
  names(sd) <- c('y.sd','a.sd')
  return(sd)
}

plm.jackknife(c(6611, 0.125))
##         y.sd         a.sd 
## 1.136653e-08 9.901003e-04
  7. The file gmp-2013.dat contains measurements for 2013. Load it, and use plm() and plm.jackknife() to estimate the parameters of the model for 2013, and their standard errors. Have the parameters of the model changed significantly?
gmp_2013 <- read.csv("data/gmp-2013.dat", header = TRUE, sep = " ") %>%
  mutate(pop = round(gmp/pcgmp)) # reconstruct population, as for 2006
plm(c(6611,1/8),N = gmp_2013$pop, Y = gmp_2013$pcgmp)
## $parameters
## [1] 6611.0000002    0.1433688
## 
## $MSE
## [1] 135210524
plm.jackknife(c(6611, 0.125),N = gmp_2013$pop, Y = gmp_2013$pcgmp)
##         y.sd         a.sd 
## 2.692652e-08 1.098548e-03
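
As a rough check on the significance question, we can compare the shift in \(\hat a\) between the two years to the combined jackknife standard errors, plugging in the numbers printed above:

(0.1434 - 0.1263) / sqrt(9.90e-04^2 + 1.10e-03^2) # roughly 11-12 standard errors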

The estimate of \(y_0\) sits at its starting value in both years (hence the essentially zero y.sd), so it tells us little. The exponent \(a\), however, moves from about 0.126 in 2006 to about 0.143 in 2013, a shift of many jackknife standard errors, so by this measure the scaling exponent does appear to have changed significantly.