Kalman smoothing MaddisonData

Abstract

We use the KFAS (Kalman Filtering and Smoothing) R Package for Exponential Family State Space Models to obtain separate estimates of GDP per capita and growth rate for each country in MaddisonData. This approach supports alternative models of economic growth including the impact of changes in heads of state, wars, and human rights including media policies. The methodology can be easily adapted to model population growth and changes on other variables.

Basic setup

It is common to work with log(dollars) rather than dollars directly. This is obvious to many who work with finance but foreign to many others. For the latter group, it’s worth noting that a $500 drop in annual income would have been catastrophic in 1790, representing almost half the average income at the time. However a similar $500 drop around 2015 would be hardly noticeable, being just under 1 percent of the 2015 average income. Moreover, a small change in logarithms can be interpreted as percentage, since log(1+x) is approximately x when x is small (using natural logarithms).

The models used here are all state space models, which we discuss in general here, giving specific examples later. State space modeling is also known as Kalman (Kalman filtering, Kalman smoothing), and dynamic (linear) modeling. For present purposes, we use the KFAS package described by (Helske, 2017), though other R packages provide some support for this class of models.

The theory behind these models can be described in terms of a 3-step Bayesian updating cycle to add new observations to a probability distribution summarizing the available knowledge about the state of the system under study (Graves, 2007):

  1. Receive and store new data.

  2. Create a prior distribution for the new data by modifying the posterior computed after the last observation to allow for possible changes in state in the time between the last and current observations. The “migration variance” is assumed to be proportional to the time since the last observation. That’s important with irregularly sampled series like GDP per capita and population for some countries. Also, the rate of growth in GDP per capita may change substantially due to events like a change in head of state, as between Charles I and English Council of State that succeeded Charles I in 1649 or between Calvin Coolidge, Herbert Hoover, Franklin Roosevelt and Harry Truman in the US between 1929 and 1945.

  3. Combine the new data with the prior to produce a posterior distribution of the state.

In the present discussion, the system under study can be GDP per capita (at purchasing power parity adjusted for inflation) and / or population at one or multiple countries. We summarize this system in a vector of one or more numbers that represent the state or condition of the system at a particular point in time. (The term “state” in “state space” refers to this type of representation.)

The first model of this type discussed here considers univariate observations \(y_t\) = log(real GDP per capita) on a hidden univariate state \(\alpha_t\), which represents the portion of the annual number that persists. The differences between the observations and predictions based on the hidden state are assumed to be uncorrelated across time \(t\); any serial correlation in \(y_t\) is assumed to be due to the hidden state, \(\alpha_t\).

With univariate observations on a univariate state, the result is almost a traditional exponentially weighted moving average (EWMA), except that the weight on the last observation is adjusted with each observation consistent with the Bayesian sequential updating cycle outlined above. With annual observations with no recent missing values, this weight converges to an asymptote determined by the ratio of the migration and observation error variances (Graves, Bisgaard, and Kulahci, 2002).

In the more general Kalman filtering terminology, the weight on the last observation (or the weight on the deviation of that observation from prediction) is called the Kalman gain. Bayesian sequential updating produces a varying Kalman gain, though in many applications the Kalman gain converges to a constant. Before fitting any model, we first describe the general KFAS model and show how the Bayesian EWMA described above is a special case. This formulation allows the system to vary over time supporting asynchronous observations on different variables at irregular points in time.

The next generalization beyond an EWMA is has a two-dimensional state vector with the first dimension being the level and the second being the rate of growth.

1. The simplest Kalman Filter: an Exponentially Weighted Moving Average (EWMA)

Consider a univariate noisy measurement, \(y_t\), of a univariate state, \(\alpha_t\):

\[y_t = \alpha_t + \epsilon_t\]

where the state, \(\alpha_t\), follows a random walk:

\[\alpha_{t+1} = \alpha_t + \eta_t.\]

In this model, we assume that observation error, \(\epsilon_t\), and the random migration, \(\eta_t\), are normally distributed with mean 0 and variance, \(H\) and \(Q_t\), respectively, and the initial state \(\alpha_1\) is normal with mean \(a_1\) and variance \(P_1\). We sometimes write \(a_1\) and \(P_1\) as \(a_{1|0}\) and \(P_{1|0}\) to make it clear that these are the prior mean and variance of \(\alpha_1\) before \(y_1\) is available (using the conditional subscript notation, following, e.g., Harvey (1989)). We may also use \(a_{t|t}\) and \(P_{t|t}\) to denote the posterior mean and variance given \(D_t = \{y_t, y_{t-1}, ...\}\). The prior distribution for \(\alpha_t\) given \(D_{t-1}\) and the posterior for \(\alpha_t\) given \(D_t\) will be computed recursively. (See Durbin and Koopman (2012b) for more detail on this model.)

These equations are called the measurement and state transition equations, respectively.

We will estimate \(H\) and \(Q\) to maximize the likelihood. We will also estimate the series \(\alpha_t\) conditional on these estimates \(H\), \(Q\), \(a_1\) and \(P_1\).

With this notation, we can provide more details in discussing the 3-step Bayesian sequential updating process outlined above:

  1. A new observation, \(y_t\), arrives.

  2. The prior distribution \(f(\alpha_t | D_{t-1})\) is computed from the previous posterior \(f(\alpha_{t-1} | D_{t-1})\), where \(D_{t-1}\) is all the information available prior to the arrival of \(y_t\); for \(t\) = 0, this is already assumed as \(\alpha_1 \tilde\ N(a_{1|0}, P_{1|0})\). Otherwise, we have by recursion that the posterior for \((\alpha_{t-1} | D_{t-1})\) is \(N(a_{t-1|t-1}, P_{t-1|t-1})\). From this using a linear state transition equation, we get that the next prior \((\alpha_t | D_{t-1})\) is \(N(a_{t|t-1}, P_{t|t-1})\). In particular, the state transition equation above gives us \(a_{t|t-1} = a_{t-1|t-1}\) and \(P_{t|t-1} = P_{t-1|t-1} + Q.\) (The double subscripts help clarify the math, which can be confusing otherwise.)

  3. Bayes’ theorem is then used to update this prior to incorporate the information in the new observation. This converts \(f(\alpha_t | D_{t-1})\) into \(f(\alpha_t | D_t)\).

The distribution of the state vector \(\alpha_t\) is governed by certain hyperparameters, suppressed in the notation \(f(\alpha_t | D_t)\). These hyperparameters are estimated by maximum likelihood.

1a. KFAS package for Kalman Filtering and Smoothing

State space modeling with the KFAS package assumes the following generalization of the above measurement and state transition equations:

\[y_t = Z_t \alpha_t + \epsilon_t\]

\[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t.\]

where \(\epsilon_t \tilde\ N(0, H_t)\), \(\eta_t \tilde\ N(0, Q_t)\) and \(\alpha_1 \tilde\ N(a_1, P_1)\), independent of each other. In this, \(y_t\) is a \(p\)-vector (where \(p\) may actually vary with \(t\)), \(\alpha_t\) is \(m\)-dimensional, and \(\eta_t\) has dimension \(k\). (See Durbin and Koopman (2012a) for more detail on this model.)

We initially assume that \(y_t\) is log(realGDPperCapita) for a single country. Generalizations could assume that \(y_t\) could include log(pop) for that country or even for multiple countries.

The dimensionality of \(\alpha_t\) for different models, but the first component will always be the level. Thus, with a univariate \(y_t\), \(Z_t\) will be a row vector with 1 in the first position and 0 elsewhere. This will force \(Q_t\), \(T_t\), and \(P_1\) to change between models. In addition, some models have diffuse initialization for the state vector. These are specified by a model component \(P_{1,\infty}\); see or SSmodel. And the different models require estimating different parameters.

1b. Estimate the Bayesian EWMA

KFAS modeling begins with a formula that is converted to an “SSModel” object using the SSModel function. This is passed to fitSSM to estimate unknown parameters to maximize logLik.SSModel. Confidence and prediction intervals can be obtained using predict.SSModel. Function KFS will produce filtered and smoothed estimates of the state vector in components “a” and “alphahat”, respectively, of an object of class “KFS”:

formula -> SSModel -> fitSSM -> KFS

Let’s write a suitable formula for our Bayesian EWMA:

suppressMessages(library(KFAS)) # cite Helske (2017)
EWMA1fmla <- (log(gdppc)~
  SSMtrend(1,Q=array(NA, c(1, 1, length(gdppc))),
           a1=log(gdppc[1]),ynames='lnGDPperCap'))

We use this to create an SSModel as follows:

library(MaddisonData)
selMadd <- subset(MaddisonData, (ISO=='GBR') & !is.na(gdppc))
EWMA1mdl <-SSModel(EWMA1fmla, selMadd, H=matrix(NA) )
is.SSModel(EWMA1mdl) # TRUE
## [1] TRUE

This model has 2 unknowns, the NA’s in EWMA1fmla and SSModel:

  • Q = the state migration variance

  • H = observation error variance.

The default behavior of fitSSM is to estimate the logarithms of these two variance. fitSSM requires us to provide starting values for log(Q) and log(H), in that order.

However, these two parameters may not be independently estimable. If so, that could create convergence problems with nonlinear estimation. Therefore, let’s first start by forcing the two to be equal.

EWMA1updateFn1 <- function(pars, model){
# 1 parameter = variances of both 1-year migration and observation.
  dYr. <- diff(selMadd$year)
  dYr <- c(5*max(dYr.), dYr.)
  v1 <- exp(pars[1])
  Q <- array(dYr*v1, dim=c(1, 1, length(dYr)))
  model$Q <- Q
  model$H[1,1,1] <- v1 # observation variance 
  model
}
# check 
EWMA1mdl1 <- EWMA1updateFn1(rep(0, 2), EWMA1mdl)
class(EWMA1mdl1)
## [1] "SSModel"
is.SSModel(EWMA1mdl1)
## [1] TRUE

Now fit.

EWMA1fit1 <- fitSSM(EWMA1mdl1, inits=c(lnQ_H=0), 
                   updatefn = EWMA1updateFn1)
## Warning in optim(par = inits, fn = likfn, model = model, ...): one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly

We get, ‘Warning in optim(par = inits, fn = likfn, model = model, ...) : one-dimensional optimization by Nelder-Mead is unreliable: use “Brent” or optimize() directly’.

KFS(EWMA1fit1$model) gave a small “Std. Error” relative to the “Estimate”, so it’s probably good. However, we’ll try “Brent” just in case.

EWMA1fit1b <- fitSSM(EWMA1mdl, inits=EWMA1fit1$optim.out$par, 
                   updatefn = EWMA1updateFn1, method="Brent", 
                   lower=-70, upper=70)
logLik(EWMA1fit1$model)
## [1] 1109.352
(EWMA1logLik1 <- logLik(EWMA1fit1b$model))
## [1] 1109.352
EWMA1fit1$optim.out$par
##     lnQ_H 
## -6.692188
EWMA1fit1b$optim.out$par
## [1] -6.692147

They are nearly identical: logLiks are the same, and par estimates are quite close.

What about the plot of the state?

EWMA1kfs <- KFS(EWMA1fit1b$model)
plot(selMadd$year, EWMA1kfs$a[-1], type='l')

We need to remove the first observation for the year 1000 as it does not provide enough information to justify the use of the space in the plot. Similarly, we want to identify and label apparent changes including isolated spikes and changes in slope.

plot(selMadd$year[22:70], EWMA1kfs$a[23:71], type='l')
abline(v=c(1291, 1296), col='grey', lty='dotted')

These data suggest changes in 1291 and 1296. The Welsh rebel leader Rhys ap Maredudd was captured in 1291, and Scots defeated England in the Battle of Stirling Bridge in 1297.

The Black Death devastated Europe from 1346 to 1353 and England in 1348 and 1349, returning again in 1361 and 1362. It ended the Hundred Years’ War between England and France in 1353, though another major campaign in France was launched in 1355.

sel1330_60 <- which(with(selMadd, (1329<year) & (year<1361)))
plot(selMadd$year[sel1330_60], EWMA1kfs$a[sel1330_60+1], type='l')
abline(v=c(1348, 1353), col='grey', lty='dotted')

What about the later fourteenth century?

sel1360_1400 <- which(with(selMadd, (1359<year) & (year<1401)))
plot(selMadd$year[sel1360_1400], EWMA1kfs$a[sel1360_1400+1], type='l')
abline(v=c(1375, 1392), col='grey', lty='dotted')

This suggests turning points in the slope in 1375 and 1392. The Wikipedia “Timeline of English history” does not suggest any obvious events associated during that period that might have driven the apparent increase in GDPpc during that period. We will not mark this on the plot.

The next major turning point was in 1649 when the English decapitated King Charles I during the English Civil War (1642 to 1651). There was economic instability during the reign of Queen Ann (1702 to 1714). And the slope increased after the Napoleonic Wars

sel1790_1830 <- which(with(selMadd, (1789<year) & (year<1831)))
plot(selMadd$year[sel1790_1830], EWMA1kfs$a[sel1790_1830+1], type='l')
abline(v=1819, col='grey', lty='dotted')

The plot indicates that the rate of economic growth of the United Kingeom began to increase after 1819. That’s the year of the Peterloo Massacre during which a crowd of roughly 60,000 humans assembled nonviolently to demand a reform of parliamentary representation and were attacked by a regiment of the English cavalry. Eighteen were killed and between 500 and 700 were injured. While the authorities initially cracked down on dissent and freedom of the press, Peterloo marked “the point of final conversion of provincial England to the struggle for enfranchisement of the working class”, as described by White (1957).

turningPts <- c(1291, 1296, 1348, 1353, 1642, 1651, 1702, 1714, 1819, 
                1914, 1918, 1939, 1945, 2007, 2009, 2016, 2020) 
GBRlbls <- data.frame(x=c(mean(turningPts[1:2]),
                          mean(turningPts[3:4]), 
                          mean(turningPts[5:6]), 
                          mean(turningPts[7:8]), 
                               turningPts[9], 
                          mean(turningPts[10:11]), 
                          mean(turningPts[12:13]), 
                          mean(turningPts[14:15]), 
                          mean(turningPts[16:17]) ), 
                       y=c(6.5, 4, 7, 6, 7, 3, 5, 5, 23), 
          label=c('Conflicts w Welsh, Scots', 'Black Death', 
                  'English Civil War', 'Queen Ann', 'Peterloo', 'WW1', 'WW2', 
                  'Great Recession', 'Brexit'),
          srt=90)
(GBR1 <- ggplotPath('year', 'gdppc', data=selMadd[-1, ], scaley=1000,
      ylab='GBR GDPpc (2011 K$ppp)', vlines=turningPts,
      labels=GBRlbls))

What happens when we try to get separate estimates of Q and H?

EWMA1updateFn2 <- function(pars, model){
# 2-parameters = variances of [1] 1-year migration and [2]observation.
  dYr. <- diff(selMadd$year)
  dYr <- c(5*max(dYr.), dYr.)
  Q <- array(dYr*exp(pars[1]), dim=c(1, 1, length(dYr)))
  model$Q <- Q
  model$H[1,1,1] <- exp(pars[2]) # observation variance 
  model
}
# check 
class(EWMA1mdl)
## [1] "SSModel"
EWMA1init2 <- rep(EWMA1fit1b$optim.out$par, 2)
names(EWMA1init2) <- c('lnQ', 'lnH')
EWMA1mdl2 <- EWMA1updateFn2(EWMA1init2, EWMA1mdl)
is.SSModel(EWMA1mdl2)
## [1] TRUE

Now fit.

#fit_drivers <- fitSSM(model_drivers, inits = rep(-1, 6),
#                      updatefn = ownupdatefn, method = "BFGS")
EWMA1fit2 <- fitSSM(EWMA1mdl, inits=EWMA1init2, method = "BFGS", 
                   updatefn = EWMA1updateFn2)
logLik(EWMA1fit2$model)
## [1] 1109.561
logLik(EWMA1fit1b$model)
## [1] 1109.352
(EWMA1chisq2 <- (logLik(EWMA1fit2$model)-logLik(EWMA1fit1b$model)))
## [1] 0.2086988

Conclusions:

  1. The one-year migration variance is not significantly different from the observation variance.

  2. We can select the ratio between the two over some reasonable range to give the best image in smoothing the series.

  3. However, let’s first consider modeling the series with a two-dimensional state of (level, growthRate) or (level, slope).

2. Estimate mean and slope

GBRgrowthMdl1 <- growthModel(exp(EWMA1fit1$optim.out$par/2), 
                                       selMadd$gdppc)
#EWMA2f. <- (log(gdppc)~ -1 + SSMbespoke(GBRgrowthMdl1, selMadd$gdppc) ) #doesn't wk
growthFormula <- (log(gdppc)~ -1 + SSMbespoke(
             growthModel(exp(EWMA1fit1$optim.out$par/2), selMadd$gdppc) )) 
# EWMA2m. <-SSModel(EWMA2f., selMadd, H=matrix(NA) )#doesn't work
selMadd2m <-SSModel(growthFormula, selMadd, H=matrix(NA) )
class(selMadd2m)
## [1] "SSModel"

Next:

library(MaddisonData)
is.SSModel(selMadd2m)
## [1] TRUE
GBRgrowthMdl2 <- growthUpdateFn(EWMA1fit1$optim.out$par, selMadd2m)
class(GBRgrowthMdl2)
## [1] "SSModel"
is.SSModel(GBRgrowthMdl2)
## [1] TRUE
GBRgrowthFit1 <- fitSSM(selMadd2m, inits=EWMA1fit1$optim.out$par, 
                        method = "BFGS", updatefn = growthUpdateFn)
(GBRgrowthLogLik1 <- logLik(GBRgrowthFit1$model))
## [1] 967.3482
EWMA1logLik1
## [1] 1109.352

Problem: GBRgrowthLogLik1 = 967.3481526 is much than EWMA1logLik1 = 1109.3522611: It should be greater, unless I misunderstand the likelihood computation.

If we only allow Q[1, 1] to be nonzero, then fitSSM(selMadd2m, ...) should be equivalent to EWMA1fit1 … but ignore that for the moment.

is.SSModel(selMadd2m)
## [1] TRUE
GBRgrowthLogLik1
## [1] 967.3482
EWMA1logLik1
## [1] 1109.352

Let’s plot the level and slope from GBRgrowthLogLik1.

GBRgrowthPlt1 <- ggplotPath2(GBRgrowthFit1)

The default value for a1[2] is too high, expanding the scale for growthRate. Make it smaller and refit.

selMadd2m2 <- selMadd2m
selMadd2m2$a1[2] <- .05
#GBRgrowthMdl2a <- growthUpdateFn(GBRgrowthFit1$optim.out$par, selMadd2m2)
GBRgrowthFit1a <- fitSSM(selMadd2m2, inits=GBRgrowthFit1$optim.out$par, 
                        method = "BFGS", updatefn = growthUpdateFn)

GBRlbls2 <- GBRlbls
GBRlbls2$component <- 1
GBRgrowthPlt1a <- ggplotPath2(GBRgrowthFit1a, Time=selMadd$year, 
        object2=log(selMadd$gdppc), vlines=turningPts, 
        hlines =list(c(1, 3, 10, 30), 0), labels=GBRlbls2)

Increase the fontsize for the axis labels.

GBRgrowthPlt1b <- ggplotPath2(GBRgrowthFit1a, Time=selMadd$year, 
        object2=log(selMadd$gdppc), vlines=turningPts, 
        hlines =list(c(1, 3, 10, 30), 0), labels=GBRlbls2, 
        fontsize=20)

Write to svg file.

svg('GBRgrowthRate.svg')
GBRgrowthPlt1b
dev.off()
## png 
##   2

References

[1] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods, 2nd ed. New York: Oxford U. Pr., 2012.

[2] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods, 2nd ed. New York: Oxford U. Pr., 2012. Chap. 2.

[3] S. Graves. “Bayes’ Rule of Information and Monitoring in Manufacturing Integrated Circuits”. In: Bayesian Process Monitoring, Control and Optimization. Ed. by B. M. Colosimo and E. del Castillo. Boca Raton, FL: Chapman and Hall, 2007, pp. 187-244.

[4] S. Graves, S. Bisgaard, and M. Kulahci. Designing Bayesian EWMA Monitors Using Gage R & R and Reliability Data. Tech. rep. Productive Systems Engineering, 2002.

[5] A. C. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. New York: Cambridge U. Pr., 1989.

[6] J. Helske. “KFAS: Exponential Family State Space Models in R”. In: Journal of Statistical Software 78 (2017), pp. 1-39. DOI: 10.18637/jss.v078.i10.

[7] R. J. White. Waterloo to Peterloo. cited from the Wikipedia article on ’Peterloo Massacre. https://en.wikipedia.org/wiki/Peterloo_Massacre, accessed 2026-02-05. William Heinemann Ltd, 1957.