---
title: "Kalman smoothing MaddisonData"
author: "Spencer Graves"
date: "`r Sys.Date()`"
output: html_document
vignette: >
  %\VignetteIndexEntry{Kalman smoothing MaddisonData}
  %\VignetteEngine{knitr::rmarkdown}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(RefManageR)
library(MaddisonData)
bibFile <- path_package2("MaddisonData.bib", 
                         subdirs='vignettes')
bib <- ReadBib(bibFile, check = FALSE)
BibOptions(check.entries = FALSE, style = "markdown", 
           cite.style = "authoryear", bib.style = "numeric")
```

## Abstract

We use the [KFAS (Kalman Filtering and Smoothing)](https://github.com/helske/KFAS) R Package for Exponential Family State Space Models to obtain separate estimates of GDP per capita and growth rate for each country in `MaddisonData`. This approach supports alternative models of economic growth including the impact of changes in heads of state, wars, and human rights including media policies. The methodology can be easily adapted to model population growth and changes on other variables. 

## Basic setup 

It is common to work with log(dollars) rather than dollars directly.  This is obvious to many who work with finance but foreign to many others.  For the latter group, it's worth noting that a $500 drop in annual income would have been catastrophic in 1790, representing almost half the average income at the time.  However a similar $500 drop around 2015 would be hardly noticeable, being just under 1 percent of the 2015 average income.  Moreover, a small change in logarithms can be interpreted as percentage, since log(1+x) is approximately x when x is small (using [natural logarithms](https://en.wikipedia.org/wiki/Natural_logarithm)).  

The models used here are all state space models, which we discuss in general here, giving specific examples later.  State space modeling is also known as Kalman (Kalman filtering, Kalman smoothing), and dynamic (linear) modeling.  For present purposes, we use the [KFAS](https://rweb.crmda.ku.edu/cran/web/packages/KFAS/KFAS.pdf) package described by `r AutoCite(bib, "Helske17")`, though other R packages provide some support for this class of models.  

The theory behind these models can be described in terms of a 3-step Bayesian updating cycle to add new observations to a probability distribution summarizing the available knowledge about the state of the system under study `r AutoCite(bib, "Graves07")`:

  1.  Receive and store new data.  
  
  2.  Create a prior distribution for the new data by modifying the posterior computed after the last observation to allow for possible changes in state in the time between the last and current observations. The "migration variance" is assumed to be proportional to the time since the last observation. That's important with irregularly sampled series like GDP per capita and population for some countries. Also, the rate of growth in GDP per capita may change substantially due to events like a change in head of state, as between [Charles I](https://en.wikipedia.org/wiki/Charles_I_of_England) and [English Council of State](https://en.wikipedia.org/wiki/English_Council_of_State) that succeeded Charles I in 1649 or between [Calvin Coolidge](https://en.wikipedia.org/wiki/Calvin_Coolidge), [Herbert Hoover](https://en.wikipedia.org/wiki/Herbert_Hoover), [Franklin Roosevelt](https://en.wikipedia.org/wiki/Franklin_D._Roosevelt) and [Harry Truman](https://en.wikipedia.org/wiki/Harry_S._Truman) in the US between 1929 and 1945. 
  
  3.  Combine the new data with the prior to produce a posterior distribution of the state.  
  
In the present discussion, the system under study can be GDP per capita (at purchasing power parity adjusted for inflation) and / or population at one or multiple countries. We summarize this system in a vector of one or more numbers that represent the state or condition of the system at a particular point in time.  (The term "state" in "state space" refers to this type of representation.)

The first model of this type discussed here considers univariate observations $y_t$ = log(real GDP per capita) on a hidden univariate state $\alpha_t$, which represents the portion of the annual number that persists.  The differences between the observations and predictions based on the hidden state are assumed to be uncorrelated across time $t$;  any serial correlation in $y_t$ is assumed to be due to the hidden state, $\alpha_t$.  

With univariate observations on a univariate state, the result is almost a traditional [exponentially weighted moving average (EWMA)](https://en.wikipedia.org/wiki/Exponential_smoothing), except that the weight on the last observation is adjusted with each observation consistent with the Bayesian sequential updating cycle outlined above. With annual observations with no recent missing values, this weight converges to an asymptote determined by the ratio of the migration and observation error variances `r AutoCite(bib, "Graves02")`. 

In the more general [Kalman filtering](https://en.wikipedia.org/wiki/Kalman_filter) terminology, the weight on the last observation (or the weight on the deviation of that observation from prediction) is called the Kalman gain.  Bayesian sequential updating produces a varying Kalman gain, though in many applications the Kalman gain converges to a constant.  Before fitting any model, we first describe the general [KFAS](https://rweb.crmda.ku.edu/cran/web/packages/KFAS/KFAS.pdf) model and show how the Bayesian EWMA described above is a special case.  This formulation allows the system to vary over time supporting asynchronous observations on different variables at irregular points in time.  

The next generalization beyond an EWMA is has a two-dimensional state vector with the first dimension being the level and the second being the rate of growth.  

## 1.  The simplest Kalman Filter:  an Exponentially Weighted Moving Average (EWMA)

Consider a univariate noisy measurement, $y_t$, of a univariate state, $\alpha_t$:

$$y_t = \alpha_t + \epsilon_t$$

where the state, $\alpha_t$, follows a random walk:  

$$\alpha_{t+1} = \alpha_t + \eta_t.$$

In this model, we assume that observation error, $\epsilon_t$, and the random migration, $\eta_t$, are normally distributed with mean 0 and variance, $H$ and $Q_t$, respectively, and the initial state $\alpha_1$ is normal with mean $a_1$ and variance $P_1$.  We sometimes write $a_1$ and $P_1$ as $a_{1|0}$ and $P_{1|0}$ to make it clear that these are the prior mean and variance of $\alpha_1$ before $y_1$ is available (using the conditional subscript notation, following, e.g., `r Citet(bib, "Harvey")`).  We may also use $a_{t|t}$ and $P_{t|t}$ to denote the posterior mean and variance given $D_t = \{y_t, y_{t-1}, ...\}$.  The prior distribution for $\alpha_t$ given $D_{t-1}$ and the posterior for $\alpha_t$ given $D_t$ will be computed recursively. (See `r Citet(bib, "DurbinKoopman2")` for more detail on this model.)

These equations are called the measurement and state transition equations, respectively.  

We will estimate $H$ and $Q$ to maximize the likelihood.  We will also estimate the series $\alpha_t$ conditional on these estimates $H$, $Q$, $a_1$ and $P_1$.  

With this notation, we can provide more details in discussing the 3-step Bayesian sequential updating process outlined above: 

  1.  A new observation, $y_t$, arrives.  
  
  2.  The prior distribution $f(\alpha_t | D_{t-1})$ is computed from the previous posterior $f(\alpha_{t-1} | D_{t-1})$, where $D_{t-1}$ is all the information available prior to the arrival of $y_t$;  for $t$ = 0, this is already assumed as $\alpha_1 \tilde\ N(a_{1|0}, P_{1|0})$.  Otherwise, we have by recursion that the posterior for $(\alpha_{t-1} | D_{t-1})$ is $N(a_{t-1|t-1}, P_{t-1|t-1})$.  From this using a linear state transition equation, we get that the next prior $(\alpha_t | D_{t-1})$ is $N(a_{t|t-1}, P_{t|t-1})$.  In particular, the state transition equation above gives us $a_{t|t-1} = a_{t-1|t-1}$ and $P_{t|t-1} = P_{t-1|t-1} + Q.$  (The double subscripts help clarify the math, which can be confusing otherwise.)  

  3.  Bayes' theorem is then used to update this prior to incorporate the information in the new observation.  This converts $f(\alpha_t | D_{t-1})$ into $f(\alpha_t | D_t)$.    
  
The distribution of the state vector $\alpha_t$ is governed by certain hyperparameters, suppressed in the notation $f(\alpha_t | D_t)$.  These hyperparameters are estimated by maximum likelihood. 

## 1a. KFAS package for Kalman Filtering and Smoothing

State space modeling with the [KFAS](https://rweb.crmda.ku.edu/cran/web/packages/KFAS/KFAS.pdf) package assumes the following generalization of the above measurement and state transition equations:  

$$y_t = Z_t \alpha_t + \epsilon_t$$

$$\alpha_{t+1} = T_t \alpha_t + R_t \eta_t.$$

where $\epsilon_t \tilde\ N(0, H_t)$, $\eta_t \tilde\ N(0, Q_t)$ and $\alpha_1 \tilde\ N(a_1, P_1)$, independent of each other.  In this, $y_t$ is a $p$-vector (where $p$ may actually vary with $t$), $\alpha_t$ is $m$-dimensional, and $\eta_t$ has dimension $k$.  (See `r Citet(bib, "DurbinKoopman")` for more detail on this model.)   

We initially assume that $y_t$ is `log(realGDPperCapita)` for a single country. Generalizations could assume that $y_t$ could include `log(pop)` for that country or even for multiple countries. 

The dimensionality of $\alpha_t$ for different models, but the first component will always be the level.  Thus, with a univariate $y_t$, $Z_t$ will be a row vector with 1 in the first position and 0 elsewhere. This will force $Q_t$, $T_t$, and $P_1$ to change between models.  In addition, some models have diffuse initialization for the state vector.  These are specified by a model component $P_{1,\infty}$;  see `r Citet(bib, "KFASarticle")` or [SSmodel](http://rpackages.ianhowson.com/cran/KFAS/man/logLik.SSModel.html).  And the different models require estimating different parameters.  

## 1b.  Estimate the Bayesian EWMA 

[KFAS](https://rweb.crmda.ku.edu/cran/web/packages/KFAS/KFAS.pdf) modeling begins with a [formula](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html) that is converted to an "SSModel" object using the [SSModel](http://rpackages.ianhowson.com/cran/KFAS/man/SSModel.html) function.  This is passed to [fitSSM](http://rpackages.ianhowson.com/cran/KFAS/man/fitSSM.html) to estimate unknown parameters to maximize [logLik.SSModel](http://rpackages.ianhowson.com/cran/KFAS/man/logLik.SSModel.html).  Confidence and prediction intervals can be obtained using [predict.SSModel](http://rpackages.ianhowson.com/cran/KFAS/man/predict.SSModel.html).  Function [KFS](http://rpackages.ianhowson.com/cran/KFAS/man/KFS.html) will produce filtered and smoothed estimates of the state vector in components "a" and "alphahat", respectively, of an object of class "KFS":

<center> 
[formula](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/formula.html) -> [SSModel](http://rpackages.ianhowson.com/cran/KFAS/man/SSModel.html) -> [fitSSM](http://rpackages.ianhowson.com/cran/KFAS/man/fitSSM.html) -> [KFS](http://rpackages.ianhowson.com/cran/KFAS/man/KFS.html)  
</center>

Let's write a suitable formula for our Bayesian EWMA:

```{r KFAS0}
suppressMessages(library(KFAS)) # cite Helske (2017)
EWMA1fmla <- (log(gdppc)~
  SSMtrend(1,Q=array(NA, c(1, 1, length(gdppc))),
           a1=log(gdppc[1]),ynames='lnGDPperCap'))
```

We use this to create an SSModel as follows:  

```{r GBR0}
library(MaddisonData)
selMadd <- subset(MaddisonData, (ISO=='GBR') & !is.na(gdppc))
EWMA1mdl <-SSModel(EWMA1fmla, selMadd, H=matrix(NA) )
is.SSModel(EWMA1mdl) # TRUE
```

This model has 2 unknowns, the NA's in EWMA1fmla and SSModel: 

  * Q = the state migration variance  
  
  * H = observation error variance.  
  
The default behavior of [`fitSSM`](http://rpackages.ianhowson.com/cran/KFAS/man/fitSSM.html) is to estimate the logarithms of these two variance. `fitSSM` requires us to provide starting values for log(Q) and log(H), in that order. 

However, these two parameters may not be independently estimable. If so, that could create convergence problems with nonlinear estimation. Therefore, let's first start by forcing the two to be equal. 

```{r GBR1}
EWMA1updateFn1 <- function(pars, model){
# 1 parameter = variances of both 1-year migration and observation.
  dYr. <- diff(selMadd$year)
  dYr <- c(5*max(dYr.), dYr.)
  v1 <- exp(pars[1])
  Q <- array(dYr*v1, dim=c(1, 1, length(dYr)))
  model$Q <- Q
  model$H[1,1,1] <- v1 # observation variance 
  model
}
# check 
EWMA1mdl1 <- EWMA1updateFn1(rep(0, 2), EWMA1mdl)
class(EWMA1mdl1)
is.SSModel(EWMA1mdl1)
```

Now fit. 

```{r GBR1f}
EWMA1fit1 <- fitSSM(EWMA1mdl1, inits=c(lnQ_H=0), 
                   updatefn = EWMA1updateFn1)
```

We get, 'Warning in `optim(par = inits, fn = likfn, model = model, ...)` : one-dimensional optimization by Nelder-Mead is unreliable: use "Brent" or optimize() directly'. 

`KFS(EWMA1fit1$model)` gave a small "Std. Error" relative to the "Estimate", so it's probably good. However, we'll try "Brent" just in case.

```{r GBR1f2}
EWMA1fit1b <- fitSSM(EWMA1mdl, inits=EWMA1fit1$optim.out$par, 
                   updatefn = EWMA1updateFn1, method="Brent", 
                   lower=-70, upper=70)
logLik(EWMA1fit1$model)
(EWMA1logLik1 <- logLik(EWMA1fit1b$model))
EWMA1fit1$optim.out$par
EWMA1fit1b$optim.out$par
```

They are nearly identical: `logLik`s are the same, and `par` estimates are quite close. 

What about the plot of the state? 

```{r GBR1kfs}
EWMA1kfs <- KFS(EWMA1fit1b$model)
plot(selMadd$year, EWMA1kfs$a[-1], type='l')
```

We need to remove the first observation for the year 1000 as it does not provide enough information to justify the use of the space in the plot. Similarly, we want to identify and label apparent changes including isolated spikes and changes in slope. 

```{r GBR1kfs1}
plot(selMadd$year[22:70], EWMA1kfs$a[23:71], type='l')
abline(v=c(1291, 1296), col='grey', lty='dotted')
```

These data suggest changes in 1291 and 1296. [The Welsh rebel leader Rhys ap Maredudd](https://en.wikipedia.org/wiki/Rhys_ap_Maredudd) was captured in 1291, and Scots defeated England in the [Battle of Stirling Bridge](https://en.wikipedia.org/wiki/Battle_of_Stirling_Bridge) in 1297. 

The [Black Death](https://en.wikipedia.org/wiki/Black_Death) devastated Europe from 1346 to 1353 and [England](https://en.wikipedia.org/wiki/Black_Death_in_England) in 1348 and 1349, returning again in 1361 and 1362. It ended the [Hundred Years' War](https://en.wikipedia.org/wiki/Hundred_Years%27_War) between England and France in 1353, though another major campaign in France was launched in 1355.  

```{r GBR1kfs2}
sel1330_60 <- which(with(selMadd, (1329<year) & (year<1361)))
plot(selMadd$year[sel1330_60], EWMA1kfs$a[sel1330_60+1], type='l')
abline(v=c(1348, 1353), col='grey', lty='dotted')
```

What about the later fourteenth century? 

```{r GBR1kfs3}
sel1360_1400 <- which(with(selMadd, (1359<year) & (year<1401)))
plot(selMadd$year[sel1360_1400], EWMA1kfs$a[sel1360_1400+1], type='l')
abline(v=c(1375, 1392), col='grey', lty='dotted')
```

This suggests turning points in the slope in 1375 and 1392. The Wikipedia ["Timeline of English history"](https://en.wikipedia.org/wiki/Timeline_of_English_history#13th_century) does not suggest any obvious events associated during that period that might have driven the apparent increase in `GDPpc` during that period. We will not mark this on the plot. 

The next major turning point was in 1649 when the English decapitated [King Charles I](https://en.wikipedia.org/wiki/Charles_I_of_England) during the [English Civil War](https://en.wikipedia.org/wiki/English_Civil_War) (1642 to 1651). There was economic instability during the reign of [Queen Ann](https://en.wikipedia.org/wiki/Anne,_Queen_of_Great_Britain) (1702 to 1714). And the slope increased after the [Napoleonic Wars](https://en.wikipedia.org/wiki/Napoleonic_Wars)

```{r GBR1kfs4}
sel1790_1830 <- which(with(selMadd, (1789<year) & (year<1831)))
plot(selMadd$year[sel1790_1830], EWMA1kfs$a[sel1790_1830+1], type='l')
abline(v=1819, col='grey', lty='dotted')
```

The plot indicates that the rate of economic growth of the United Kingeom began to increase after 1819. That's the year of the [Peterloo Massacre](https://en.wikipedia.org/wiki/Peterloo_Massacre#Reaction_and_aftermath) during which a crowd of roughly 60,000 humans assembled nonviolently to  demand a reform of parliamentary representation and were attacked by a regiment of the English cavalry. Eighteen were killed and between 500 and 700 were injured. While the authorities initially cracked down on dissent and freedom of the press, Peterloo marked "the point of final conversion of provincial England to the struggle for enfranchisement of the working class", as described by `r Citet(bib, "White57")`. 

```{r idPts}
turningPts <- c(1291, 1296, 1348, 1353, 1642, 1651, 1702, 1714, 1819, 
                1914, 1918, 1939, 1945, 2007, 2009, 2016, 2020) 
GBRlbls <- data.frame(x=c(mean(turningPts[1:2]),
                          mean(turningPts[3:4]), 
                          mean(turningPts[5:6]), 
                          mean(turningPts[7:8]), 
                               turningPts[9], 
                          mean(turningPts[10:11]), 
                          mean(turningPts[12:13]), 
                          mean(turningPts[14:15]), 
                          mean(turningPts[16:17]) ), 
                       y=c(6.5, 4, 7, 6, 7, 3, 5, 5, 23), 
          label=c('Conflicts w Welsh, Scots', 'Black Death', 
                  'English Civil War', 'Queen Ann', 'Peterloo', 'WW1', 'WW2', 
                  'Great Recession', 'Brexit'),
          srt=90)
(GBR1 <- ggplotPath('year', 'gdppc', data=selMadd[-1, ], scaley=1000,
      ylab='GBR GDPpc (2011 K$ppp)', vlines=turningPts,
      labels=GBRlbls))
```

What happens when we try to get separate estimates of Q and H? 

```{r GBR2}
EWMA1updateFn2 <- function(pars, model){
# 2-parameters = variances of [1] 1-year migration and [2]observation.
  dYr. <- diff(selMadd$year)
  dYr <- c(5*max(dYr.), dYr.)
  Q <- array(dYr*exp(pars[1]), dim=c(1, 1, length(dYr)))
  model$Q <- Q
  model$H[1,1,1] <- exp(pars[2]) # observation variance 
  model
}
# check 
class(EWMA1mdl)

EWMA1init2 <- rep(EWMA1fit1b$optim.out$par, 2)
names(EWMA1init2) <- c('lnQ', 'lnH')
EWMA1mdl2 <- EWMA1updateFn2(EWMA1init2, EWMA1mdl)
is.SSModel(EWMA1mdl2)
```

Now fit. 

```{r GBR2f}
#fit_drivers <- fitSSM(model_drivers, inits = rep(-1, 6),
#                      updatefn = ownupdatefn, method = "BFGS")
EWMA1fit2 <- fitSSM(EWMA1mdl, inits=EWMA1init2, method = "BFGS", 
                   updatefn = EWMA1updateFn2)
logLik(EWMA1fit2$model)
logLik(EWMA1fit1b$model)
(EWMA1chisq2 <- (logLik(EWMA1fit2$model)-logLik(EWMA1fit1b$model)))
```

Conclusions: 

1. The one-year migration variance is not significantly different from the observation variance. 

2. We can select the ratio between the two over some reasonable range to give the best image in smoothing the series. 

3. However, let's first consider modeling the series with a two-dimensional state of `(level, growthRate)` or `(level, slope)`. 

## 2.  Estimate mean and slope 

```{r EWMA2f}
GBRgrowthMdl1 <- growthModel(exp(EWMA1fit1$optim.out$par/2), 
                                       selMadd$gdppc)
#EWMA2f. <- (log(gdppc)~ -1 + SSMbespoke(GBRgrowthMdl1, selMadd$gdppc) ) #doesn't wk
growthFormula <- (log(gdppc)~ -1 + SSMbespoke(
             growthModel(exp(EWMA1fit1$optim.out$par/2), selMadd$gdppc) )) 
```

```{r EWMA2m}
# EWMA2m. <-SSModel(EWMA2f., selMadd, H=matrix(NA) )#doesn't work
selMadd2m <-SSModel(growthFormula, selMadd, H=matrix(NA) )
class(selMadd2m)
```

Next: 

```{r GBRgrowthFit}
library(MaddisonData)
is.SSModel(selMadd2m)
GBRgrowthMdl2 <- growthUpdateFn(EWMA1fit1$optim.out$par, selMadd2m)
class(GBRgrowthMdl2)
is.SSModel(GBRgrowthMdl2)

GBRgrowthFit1 <- fitSSM(selMadd2m, inits=EWMA1fit1$optim.out$par, 
                        method = "BFGS", updatefn = growthUpdateFn)
(GBRgrowthLogLik1 <- logLik(GBRgrowthFit1$model))
EWMA1logLik1
```

Problem: `GBRgrowthLogLik1` = `r GBRgrowthLogLik1` is much than `EWMA1logLik1` = `r EWMA1logLik1`: It should be greater, unless I misunderstand the likelihood computation. 

If we only allow `Q[1, 1]` to be nonzero, then `fitSSM(selMadd2m, ...)` should be equivalent to `EWMA1fit1` ... but ignore that for the moment. 

```{r GBRgrowthFit0}
is.SSModel(selMadd2m)

GBRgrowthLogLik1
EWMA1logLik1
```

Let's plot the level and slope from `GBRgrowthLogLik1`. 

```{r GBRgrowthLogLik1plot}
GBRgrowthPlt1 <- ggplotPath2(GBRgrowthFit1)
```

The default value for `a1[2]` is too high, expanding the scale for `growthRate`. Make it smaller and refit. 

```{r GBRa1}
selMadd2m2 <- selMadd2m
selMadd2m2$a1[2] <- .05
#GBRgrowthMdl2a <- growthUpdateFn(GBRgrowthFit1$optim.out$par, selMadd2m2)
GBRgrowthFit1a <- fitSSM(selMadd2m2, inits=GBRgrowthFit1$optim.out$par, 
                        method = "BFGS", updatefn = growthUpdateFn)

GBRlbls2 <- GBRlbls
GBRlbls2$component <- 1
GBRgrowthPlt1a <- ggplotPath2(GBRgrowthFit1a, Time=selMadd$year, 
        object2=log(selMadd$gdppc), vlines=turningPts, 
        hlines =list(c(1, 3, 10, 30), 0), labels=GBRlbls2)
```

Increase the fontsize for the axis labels. 

```{r fontsize}
GBRgrowthPlt1b <- ggplotPath2(GBRgrowthFit1a, Time=selMadd$year, 
        object2=log(selMadd$gdppc), vlines=turningPts, 
        hlines =list(c(1, 3, 10, 30), 0), labels=GBRlbls2, 
        fontsize=20)
```

Write to `svg` file. 

```{r svg}
svg('GBRgrowthRate.svg')
GBRgrowthPlt1b
dev.off()
```

## References
```{r results = "asis", echo = FALSE}
PrintBibliography(bib)
```