We use the KFAS (Kalman
Filtering and Smoothing) R Package for Exponential Family State
Space Models to obtain separate estimates of GDP per capita and growth
rate for each country in MaddisonData. This approach
supports alternative models of economic growth including the impact of
changes in heads of state, wars, and human rights including media
policies. The methodology can be easily adapted to model population
growth and changes on other variables.
It is common to work with log(dollars) rather than dollars directly. This is obvious to many who work with finance but foreign to many others. For the latter group, it’s worth noting that a $500 drop in annual income would have been catastrophic in 1790, representing almost half the average income at the time. However a similar $500 drop around 2015 would be hardly noticeable, being just under 1 percent of the 2015 average income. Moreover, a small change in logarithms can be interpreted as percentage, since log(1+x) is approximately x when x is small (using natural logarithms).
The models used here are all state space models, which we discuss in general here, giving specific examples later. State space modeling is also known as Kalman (Kalman filtering, Kalman smoothing), and dynamic (linear) modeling. For present purposes, we use the KFAS package described by (Helske, 2017), though other R packages provide some support for this class of models.
The theory behind these models can be described in terms of a 3-step Bayesian updating cycle to add new observations to a probability distribution summarizing the available knowledge about the state of the system under study (Graves, 2007):
Receive and store new data.
Create a prior distribution for the new data by modifying the posterior computed after the last observation to allow for possible changes in state in the time between the last and current observations. The “migration variance” is assumed to be proportional to the time since the last observation. That’s important with irregularly sampled series like GDP per capita and population for some countries. Also, the rate of growth in GDP per capita may change substantially due to events like a change in head of state, as between Charles I and English Council of State that succeeded Charles I in 1649 or between Calvin Coolidge, Herbert Hoover, Franklin Roosevelt and Harry Truman in the US between 1929 and 1945.
Combine the new data with the prior to produce a posterior distribution of the state.
In the present discussion, the system under study can be GDP per capita (at purchasing power parity adjusted for inflation) and / or population at one or multiple countries. We summarize this system in a vector of one or more numbers that represent the state or condition of the system at a particular point in time. (The term “state” in “state space” refers to this type of representation.)
The first model of this type discussed here considers univariate observations \(y_t\) = log(real GDP per capita) on a hidden univariate state \(\alpha_t\), which represents the portion of the annual number that persists. The differences between the observations and predictions based on the hidden state are assumed to be uncorrelated across time \(t\); any serial correlation in \(y_t\) is assumed to be due to the hidden state, \(\alpha_t\).
With univariate observations on a univariate state, the result is almost a traditional exponentially weighted moving average (EWMA), except that the weight on the last observation is adjusted with each observation consistent with the Bayesian sequential updating cycle outlined above. With annual observations with no recent missing values, this weight converges to an asymptote determined by the ratio of the migration and observation error variances (Graves, Bisgaard, and Kulahci, 2002).
In the more general Kalman filtering terminology, the weight on the last observation (or the weight on the deviation of that observation from prediction) is called the Kalman gain. Bayesian sequential updating produces a varying Kalman gain, though in many applications the Kalman gain converges to a constant. Before fitting any model, we first describe the general KFAS model and show how the Bayesian EWMA described above is a special case. This formulation allows the system to vary over time supporting asynchronous observations on different variables at irregular points in time.
The next generalization beyond an EWMA is has a two-dimensional state vector with the first dimension being the level and the second being the rate of growth.
Consider a univariate noisy measurement, \(y_t\), of a univariate state, \(\alpha_t\):
\[y_t = \alpha_t + \epsilon_t\]
where the state, \(\alpha_t\), follows a random walk:
\[\alpha_{t+1} = \alpha_t + \eta_t.\]
In this model, we assume that observation error, \(\epsilon_t\), and the random migration, \(\eta_t\), are normally distributed with mean 0 and variance, \(H\) and \(Q_t\), respectively, and the initial state \(\alpha_1\) is normal with mean \(a_1\) and variance \(P_1\). We sometimes write \(a_1\) and \(P_1\) as \(a_{1|0}\) and \(P_{1|0}\) to make it clear that these are the prior mean and variance of \(\alpha_1\) before \(y_1\) is available (using the conditional subscript notation, following, e.g., Harvey (1989)). We may also use \(a_{t|t}\) and \(P_{t|t}\) to denote the posterior mean and variance given \(D_t = \{y_t, y_{t-1}, ...\}\). The prior distribution for \(\alpha_t\) given \(D_{t-1}\) and the posterior for \(\alpha_t\) given \(D_t\) will be computed recursively. (See Durbin and Koopman (2012b) for more detail on this model.)
These equations are called the measurement and state transition equations, respectively.
We will estimate \(H\) and \(Q\) to maximize the likelihood. We will also estimate the series \(\alpha_t\) conditional on these estimates \(H\), \(Q\), \(a_1\) and \(P_1\).
With this notation, we can provide more details in discussing the 3-step Bayesian sequential updating process outlined above:
A new observation, \(y_t\), arrives.
The prior distribution \(f(\alpha_t | D_{t-1})\) is computed from the previous posterior \(f(\alpha_{t-1} | D_{t-1})\), where \(D_{t-1}\) is all the information available prior to the arrival of \(y_t\); for \(t\) = 0, this is already assumed as \(\alpha_1 \tilde\ N(a_{1|0}, P_{1|0})\). Otherwise, we have by recursion that the posterior for \((\alpha_{t-1} | D_{t-1})\) is \(N(a_{t-1|t-1}, P_{t-1|t-1})\). From this using a linear state transition equation, we get that the next prior \((\alpha_t | D_{t-1})\) is \(N(a_{t|t-1}, P_{t|t-1})\). In particular, the state transition equation above gives us \(a_{t|t-1} = a_{t-1|t-1}\) and \(P_{t|t-1} = P_{t-1|t-1} + Q.\) (The double subscripts help clarify the math, which can be confusing otherwise.)
Bayes’ theorem is then used to update this prior to incorporate the information in the new observation. This converts \(f(\alpha_t | D_{t-1})\) into \(f(\alpha_t | D_t)\).
The distribution of the state vector \(\alpha_t\) is governed by certain hyperparameters, suppressed in the notation \(f(\alpha_t | D_t)\). These hyperparameters are estimated by maximum likelihood.
State space modeling with the KFAS package assumes the following generalization of the above measurement and state transition equations:
\[y_t = Z_t \alpha_t + \epsilon_t\]
\[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t.\]
where \(\epsilon_t \tilde\ N(0, H_t)\), \(\eta_t \tilde\ N(0, Q_t)\) and \(\alpha_1 \tilde\ N(a_1, P_1)\), independent of each other. In this, \(y_t\) is a \(p\)-vector (where \(p\) may actually vary with \(t\)), \(\alpha_t\) is \(m\)-dimensional, and \(\eta_t\) has dimension \(k\). (See Durbin and Koopman (2012a) for more detail on this model.)
We initially assume that \(y_t\) is
log(realGDPperCapita) for a single country. Generalizations
could assume that \(y_t\) could include
log(pop) for that country or even for multiple
countries.
The dimensionality of \(\alpha_t\) for different models, but the first component will always be the level. Thus, with a univariate \(y_t\), \(Z_t\) will be a row vector with 1 in the first position and 0 elsewhere. This will force \(Q_t\), \(T_t\), and \(P_1\) to change between models. In addition, some models have diffuse initialization for the state vector. These are specified by a model component \(P_{1,\infty}\); see or SSmodel. And the different models require estimating different parameters.
KFAS modeling begins with a formula that is converted to an “SSModel” object using the SSModel function. This is passed to fitSSM to estimate unknown parameters to maximize logLik.SSModel. Confidence and prediction intervals can be obtained using predict.SSModel. Function KFS will produce filtered and smoothed estimates of the state vector in components “a” and “alphahat”, respectively, of an object of class “KFS”:
Let’s write a suitable formula for our Bayesian EWMA:
suppressMessages(library(KFAS)) # cite Helske (2017)
EWMA1fmla <- (log(gdppc)~
SSMtrend(1,Q=array(NA, c(1, 1, length(gdppc))),
a1=log(gdppc[1]),ynames='lnGDPperCap'))We use this to create an SSModel as follows:
library(MaddisonData)
selMadd <- subset(MaddisonData, (ISO=='GBR') & !is.na(gdppc))
EWMA1mdl <-SSModel(EWMA1fmla, selMadd, H=matrix(NA) )
is.SSModel(EWMA1mdl) # TRUE## [1] TRUE
This model has 2 unknowns, the NA’s in EWMA1fmla and SSModel:
Q = the state migration variance
H = observation error variance.
The default behavior of fitSSM
is to estimate the logarithms of these two variance. fitSSM
requires us to provide starting values for log(Q) and log(H), in that
order.
However, these two parameters may not be independently estimable. If so, that could create convergence problems with nonlinear estimation. Therefore, let’s first start by forcing the two to be equal.
EWMA1updateFn1 <- function(pars, model){
# 1 parameter = variances of both 1-year migration and observation.
dYr. <- diff(selMadd$year)
dYr <- c(5*max(dYr.), dYr.)
v1 <- exp(pars[1])
Q <- array(dYr*v1, dim=c(1, 1, length(dYr)))
model$Q <- Q
model$H[1,1,1] <- v1 # observation variance
model
}
# check
EWMA1mdl1 <- EWMA1updateFn1(rep(0, 2), EWMA1mdl)
class(EWMA1mdl1)## [1] "SSModel"
## [1] TRUE
Now fit.
## Warning in optim(par = inits, fn = likfn, model = model, ...): one-dimensional optimization by Nelder-Mead is unreliable:
## use "Brent" or optimize() directly
We get, ‘Warning in
optim(par = inits, fn = likfn, model = model, ...) :
one-dimensional optimization by Nelder-Mead is unreliable: use “Brent”
or optimize() directly’.
KFS(EWMA1fit1$model) gave a small “Std. Error” relative
to the “Estimate”, so it’s probably good. However, we’ll try “Brent”
just in case.
EWMA1fit1b <- fitSSM(EWMA1mdl, inits=EWMA1fit1$optim.out$par,
updatefn = EWMA1updateFn1, method="Brent",
lower=-70, upper=70)
logLik(EWMA1fit1$model)## [1] 1109.352
## [1] 1109.352
## lnQ_H
## -6.692188
## [1] -6.692147
They are nearly identical: logLiks are the same, and
par estimates are quite close.
What about the plot of the state?
We need to remove the first observation for the year 1000 as it does not provide enough information to justify the use of the space in the plot. Similarly, we want to identify and label apparent changes including isolated spikes and changes in slope.
plot(selMadd$year[22:70], EWMA1kfs$a[23:71], type='l')
abline(v=c(1291, 1296), col='grey', lty='dotted')These data suggest changes in 1291 and 1296. The Welsh rebel leader Rhys ap Maredudd was captured in 1291, and Scots defeated England in the Battle of Stirling Bridge in 1297.
The Black Death devastated Europe from 1346 to 1353 and England in 1348 and 1349, returning again in 1361 and 1362. It ended the Hundred Years’ War between England and France in 1353, though another major campaign in France was launched in 1355.
sel1330_60 <- which(with(selMadd, (1329<year) & (year<1361)))
plot(selMadd$year[sel1330_60], EWMA1kfs$a[sel1330_60+1], type='l')
abline(v=c(1348, 1353), col='grey', lty='dotted')What about the later fourteenth century?
sel1360_1400 <- which(with(selMadd, (1359<year) & (year<1401)))
plot(selMadd$year[sel1360_1400], EWMA1kfs$a[sel1360_1400+1], type='l')
abline(v=c(1375, 1392), col='grey', lty='dotted')This suggests turning points in the slope in 1375 and 1392. The
Wikipedia “Timeline
of English history” does not suggest any obvious events associated
during that period that might have driven the apparent increase in
GDPpc during that period. We will not mark this on the
plot.
The next major turning point was in 1649 when the English decapitated King Charles I during the English Civil War (1642 to 1651). There was economic instability during the reign of Queen Ann (1702 to 1714). And the slope increased after the Napoleonic Wars
sel1790_1830 <- which(with(selMadd, (1789<year) & (year<1831)))
plot(selMadd$year[sel1790_1830], EWMA1kfs$a[sel1790_1830+1], type='l')
abline(v=1819, col='grey', lty='dotted')The plot indicates that the rate of economic growth of the United Kingeom began to increase after 1819. That’s the year of the Peterloo Massacre during which a crowd of roughly 60,000 humans assembled nonviolently to demand a reform of parliamentary representation and were attacked by a regiment of the English cavalry. Eighteen were killed and between 500 and 700 were injured. While the authorities initially cracked down on dissent and freedom of the press, Peterloo marked “the point of final conversion of provincial England to the struggle for enfranchisement of the working class”, as described by White (1957).
turningPts <- c(1291, 1296, 1348, 1353, 1642, 1651, 1702, 1714, 1819,
1914, 1918, 1939, 1945, 2007, 2009, 2016, 2020)
GBRlbls <- data.frame(x=c(mean(turningPts[1:2]),
mean(turningPts[3:4]),
mean(turningPts[5:6]),
mean(turningPts[7:8]),
turningPts[9],
mean(turningPts[10:11]),
mean(turningPts[12:13]),
mean(turningPts[14:15]),
mean(turningPts[16:17]) ),
y=c(6.5, 4, 7, 6, 7, 3, 5, 5, 23),
label=c('Conflicts w Welsh, Scots', 'Black Death',
'English Civil War', 'Queen Ann', 'Peterloo', 'WW1', 'WW2',
'Great Recession', 'Brexit'),
srt=90)
(GBR1 <- ggplotPath('year', 'gdppc', data=selMadd[-1, ], scaley=1000,
ylab='GBR GDPpc (2011 K$ppp)', vlines=turningPts,
labels=GBRlbls))What happens when we try to get separate estimates of Q and H?
EWMA1updateFn2 <- function(pars, model){
# 2-parameters = variances of [1] 1-year migration and [2]observation.
dYr. <- diff(selMadd$year)
dYr <- c(5*max(dYr.), dYr.)
Q <- array(dYr*exp(pars[1]), dim=c(1, 1, length(dYr)))
model$Q <- Q
model$H[1,1,1] <- exp(pars[2]) # observation variance
model
}
# check
class(EWMA1mdl)## [1] "SSModel"
EWMA1init2 <- rep(EWMA1fit1b$optim.out$par, 2)
names(EWMA1init2) <- c('lnQ', 'lnH')
EWMA1mdl2 <- EWMA1updateFn2(EWMA1init2, EWMA1mdl)
is.SSModel(EWMA1mdl2)## [1] TRUE
Now fit.
#fit_drivers <- fitSSM(model_drivers, inits = rep(-1, 6),
# updatefn = ownupdatefn, method = "BFGS")
EWMA1fit2 <- fitSSM(EWMA1mdl, inits=EWMA1init2, method = "BFGS",
updatefn = EWMA1updateFn2)
logLik(EWMA1fit2$model)## [1] 1109.561
## [1] 1109.352
## [1] 0.2086988
Conclusions:
The one-year migration variance is not significantly different from the observation variance.
We can select the ratio between the two over some reasonable range to give the best image in smoothing the series.
However, let’s first consider modeling the series with a
two-dimensional state of (level, growthRate) or
(level, slope).
GBRgrowthMdl1 <- growthModel(exp(EWMA1fit1$optim.out$par/2),
selMadd$gdppc)
#EWMA2f. <- (log(gdppc)~ -1 + SSMbespoke(GBRgrowthMdl1, selMadd$gdppc) ) #doesn't wk
growthFormula <- (log(gdppc)~ -1 + SSMbespoke(
growthModel(exp(EWMA1fit1$optim.out$par/2), selMadd$gdppc) )) # EWMA2m. <-SSModel(EWMA2f., selMadd, H=matrix(NA) )#doesn't work
selMadd2m <-SSModel(growthFormula, selMadd, H=matrix(NA) )
class(selMadd2m)## [1] "SSModel"
Next:
## [1] TRUE
## [1] "SSModel"
## [1] TRUE
GBRgrowthFit1 <- fitSSM(selMadd2m, inits=EWMA1fit1$optim.out$par,
method = "BFGS", updatefn = growthUpdateFn)
(GBRgrowthLogLik1 <- logLik(GBRgrowthFit1$model))## [1] 967.3482
## [1] 1109.352
Problem: GBRgrowthLogLik1 = 967.3481526 is much than
EWMA1logLik1 = 1109.3522611: It should be greater, unless I
misunderstand the likelihood computation.
If we only allow Q[1, 1] to be nonzero, then
fitSSM(selMadd2m, ...) should be equivalent to
EWMA1fit1 … but ignore that for the moment.
## [1] TRUE
## [1] 967.3482
## [1] 1109.352
Let’s plot the level and slope from
GBRgrowthLogLik1.
The default value for a1[2] is too high, expanding the
scale for growthRate. Make it smaller and refit.
selMadd2m2 <- selMadd2m
selMadd2m2$a1[2] <- .05
#GBRgrowthMdl2a <- growthUpdateFn(GBRgrowthFit1$optim.out$par, selMadd2m2)
GBRgrowthFit1a <- fitSSM(selMadd2m2, inits=GBRgrowthFit1$optim.out$par,
method = "BFGS", updatefn = growthUpdateFn)
GBRlbls2 <- GBRlbls
GBRlbls2$component <- 1
GBRgrowthPlt1a <- ggplotPath2(GBRgrowthFit1a, Time=selMadd$year,
object2=log(selMadd$gdppc), vlines=turningPts,
hlines =list(c(1, 3, 10, 30), 0), labels=GBRlbls2)Increase the fontsize for the axis labels.
GBRgrowthPlt1b <- ggplotPath2(GBRgrowthFit1a, Time=selMadd$year,
object2=log(selMadd$gdppc), vlines=turningPts,
hlines =list(c(1, 3, 10, 30), 0), labels=GBRlbls2,
fontsize=20)Write to svg file.
## png
## 2
[1] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods, 2nd ed. New York: Oxford U. Pr., 2012.
[2] J. Durbin and S. J. Koopman. Time Series Analysis by State Space Methods, 2nd ed. New York: Oxford U. Pr., 2012. Chap. 2.
[3] S. Graves. “Bayes’ Rule of Information and Monitoring in Manufacturing Integrated Circuits”. In: Bayesian Process Monitoring, Control and Optimization. Ed. by B. M. Colosimo and E. del Castillo. Boca Raton, FL: Chapman and Hall, 2007, pp. 187-244.
[4] S. Graves, S. Bisgaard, and M. Kulahci. Designing Bayesian EWMA Monitors Using Gage R & R and Reliability Data. Tech. rep. Productive Systems Engineering, 2002.
[5] A. C. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. New York: Cambridge U. Pr., 1989.
[6] J. Helske. “KFAS: Exponential Family State Space Models in R”. In: Journal of Statistical Software 78 (2017), pp. 1-39. DOI: 10.18637/jss.v078.i10.
[7] R. J. White. Waterloo to Peterloo. cited from the Wikipedia article on ’Peterloo Massacre. https://en.wikipedia.org/wiki/Peterloo_Massacre, accessed 2026-02-05. William Heinemann Ltd, 1957.