Title: | Data Sets for Econometrics |
---|---|
Description: | Data sets for econometrics, including political science. |
Authors: | Yves Croissant [aut, cre], Spencer Graves [aut, ctb] |
Maintainer: | Spencer Graves <[email protected]> |
License: | GPL (>=2) |
Version: | 0.4-3 |
Built: | 2024-11-16 15:20:40 UTC |
Source: | https://github.com/sbgraves237/ecdat |
a cross-section
number of observations : 40
data(Accident)
A dataframe containing :
ship type, a factor with levels (A,B,C,D,E)
year constructed, a factor with levels (C6064,C6569,C7074,C7579)
year operated, a factor with levels (O6074,O7579)
measure of service amount
accidents
McCullagh, P. and J. Nelder (1983) Generalized Linear Models, New York: Chapman and Hall.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F21.3.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Accountants and auditors as a percent of the US labor force 1850 to 2016 updating the analysis in Wyatt and Hecker (2006).
data(AccountantsAuditorsPct)
a numeric vector of length 30 giving the percent of the US labor force in "Accounting and Auditing" each decade from 1850 to 2010 except for 1940 plus each year between 2011 and 2016.
This is based primarily on data extracted from the Integrated Public Use Microdata Series on 2018-09-01, with the computations documented in a vignette by this title in the Ecfun package.
This updates the data on Accountants and Auditors in Wyatt and Hecker (2006). They relied primarily on data extracted from the Integrated Public Use Microdata Series. This follows the same methodology with two modifications:
1. IPUMS provided no data for 1940. Wyatt and Hecker (2006) used Historical Statistics of the United States, Colonial Times to 1970, Bicentennial Edition, part 1 (U.S. Department of Commerce, Bureau of the Census, 1975) for 1910-1940. The current data set uses that source only for 1940.
2. The IPUMS numbers showed an extreme jump from 1850 to 1860 followed by an even more extreme drop to 1870. The numbers in Sobek (2006) showed essentially the same behavior. Specifically, Sobek (2006) estimated the number of accountants and auditors in the US in those three years as 700, 1700, and 1200, and the labor force as 5277000, 8160800, and 12004200. These numbers give accountants and auditors as 0.013, 0.021, and 0.010 percent of the labor force, respectively for those three years. These numbers portray an incredible increase in the employment of accountants and auditors between 1850 and 1860 followed by a shocking decline the following decade. If, however, we swap the 1700 and 1200 between 1860 and 1870, the percentages become quite stable: 0.013, 0.015, and 0.014 percent, respectively.
We use these latter numbers, even though the uncorrected numbers seem more consistent with the numbers obtained from IPUMS.
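The arithmetic behind the swap can be checked directly. The sketch below uses only the Sobek (2006) counts quoted above; the variable names are illustrative and are not objects in Ecdat.

```r
# Sobek (2006) estimates quoted in the text above
year        <- c(1850, 1860, 1870)
accountants <- c(700, 1700, 1200)
laborForce  <- c(5277000, 8160800, 12004200)

# Percent of the labor force using the counts as published
pctPublished <- 100 * accountants / laborForce

# Percent after swapping the 1860 and 1870 counts
accountantsSwapped <- accountants[c(1, 3, 2)]
pctSwapped <- 100 * accountantsSwapped / laborForce

# One row per variant, one column per year
result <- round(rbind(pctPublished, pctSwapped), 3)
colnames(result) <- year
result
```

The first row reproduces the jump and drop described above; the second row shows the nearly flat series obtained after the swap.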
Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek (2018) IPUMS USA: Version 8.0 [dataset]. Minneapolis, MN: IPUMS. doi:10.18128/D010.V8.0
Matthew Sobek (2006) Chapter Ba. "Labor Occupations" in Susan B. Carter, ed., Historical Statistics of the United States, Cambridge U. Pr.
Ian D. Wyatt and Daniel E. Hecker (2006) "Occupational changes during the 20th century", Monthly Labor Review, March 2006, pp. 35-57
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
data(AccountantsAuditorsPct)
plot(names(AccountantsAuditorsPct), AccountantsAuditorsPct,
     type='l', log='y', cex.axis=1.8)
# for the version of this contributed to Wikimedia Commons
Countries and codes used by the Armed Conflict Location and Event Data project with population and Gross Domestic Product (GDP) numbers for recent years. Population and GDP data are from the World Bank when available and from other sources otherwise. When no World Bank data are available, numbers may be reported from the closest year conveniently available, as noted in Comments; in those cases, the data may not be as accurate as the numbers from the World Bank.
NOTE: This code will be offered to the maintainer of the acled.api package. If they like it, it may not stay in Ecdat.
data(ACLEDpopGDP)
A dataframe with rownames = ACLEDcountry containing :
A character vector of the country names used by ACLED in the monthly totals of events and deaths between 2021-01 and 2024-09 extracted 2024-10-24.
3-character ISO 3166-1 code for Country.
A character vector of the country names used by the World Bank (WB) in data extracted 2024-11-06.
World Bank population and nominal Gross Domestic Product per capita (GDPpcn) in constant 2015 US$ plus GDP per capita, PPP (constant 2021 international $) extracted 2024-11-13 for the indicated years unless otherwise specified in "Comments". For country subdivisions like Jersey, the World Bank extract used did not include such numbers. For those "countries", numbers were taken from Wikipedia and assigned to the nearest year in the 2020:2023 range and noted in "Comments".
Blank ('') if the data are from the World Bank. Otherwise, this lists the source of the population and GDP data, the applicable year, and other anomalies.
ACLED Explorer was used 2024-10-24 to download monthly totals between 2021-01 and 2024-09 of events and deaths in two files: one for events and another for deaths. Both had data on 234 "countries", though some were actually subdivisions. For example, ACLED "countries" include the "Bailiwick of Jersey", which is a British Crown dependency, and the World Bank does not provide data on such dependencies as it does on sovereign countries.
However, the country names used by ACLED Explorer do not match the country names used by the World Bank.
This ACLEDpopGDP data.frame was created to facilitate merging ACLED data with data on population and GDP ... from the World Bank when available and from other sources when not.
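A minimal sketch of the kind of merge this is meant to support, assuming both tables carry an ISO 3166-1 code in a column named ISO3. The two toy data frames, their values, and the column names `events` and `pop` are hypothetical stand-ins, not the real ACLED or World Bank extracts.

```r
# Toy stand-ins; the real ACLED and World Bank extracts are not used here.
acled <- data.frame(
  ACLEDcountry = c("eSwatini", "North Macedonia", "Christmas Island"),
  ISO3 = c("SWZ", "MKD", "CXR"),
  events = c(3, 5, 0),   # made-up counts for illustration only
  stringsAsFactors = FALSE
)
wb <- data.frame(
  WBcountry = c("Eswatini", "North Macedonia"),
  ISO3 = c("SWZ", "MKD"),
  pop = c(1, 2),         # placeholder values, NOT real populations
  stringsAsFactors = FALSE
)

# A left join on the ISO code keeps every ACLED "country";
# rows without a World Bank match get NA in the World Bank columns.
merged <- merge(acled, wb, by = "ISO3", all.x = TRUE)

# "Countries" that would need numbers from another source
unmatched <- merged[is.na(merged$pop), "ACLEDcountry"]
unmatched
```

Joining on the code rather than on the name sidesteps the spelling differences between the two sources.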
I got most of the ISO 3166-1 3-character country codes using findCountry. That function was NOT able to find country codes for the Caribbean Netherlands, Christmas Island, eSwatini, and North Macedonia, which have 3-letter ISO 3166-1 codes of BES, CXR, SWZ, and MKD, respectively.
From the World Bank website, I got something by clicking DataBank. From there, I clicked on "Population, total". This displayed numbers by country and year from 2008 to 2015. I clicked, "Add Time". From there I clicked "Unselect all" then selected 2020, 2021, 2022, and 2023. Then I clicked "x" in the upper right and "Apply Changes".
Then I clicked "Add Series". From there I found that many series I did not want were selected, so I clicked "Unselect all", then selected "GDP (constant 2015 US$)" and "Population, total". Then I clicked "x" in the upper right and "Apply Changes" as before.
Then I clicked "Download options" and selected "Excel".
That downloaded a file named 'P_Popular Indicators.xlsx', which I moved to the working directory, read into R and merged in the obvious way to create most of ACLEDpopGDP.
For "Countries" not in the World Bank data I extracted, I got numbers from relevant Wikipedia articles and documented the source in ACLEDpopGDP[, "Comments"].
Armed Conflict Location and Event Data
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
# Country in World Bank data
ACLEDpopGDP['China', ]
# Country NOT in World Bank data
ACLEDpopGDP['Taiwan', ]
# Partial matching works if unique
ACLEDpopGDP['Czech',]
# Partial matching does NOT work if not unique
ACLEDpopGDP['Saint', ]
# Instead use, e.g., grep
ACLEDpopGDP[grep('Saint', ACLEDpopGDP[, 'ACLEDcountry']), ]
# If you know the ISO 3166-1 3-letter code:
ACLEDpopGDP['CPV'==ACLEDpopGDP[, 'ISO3'], ]
# NOTE: In this example, ACLEDcountry != WBcountry.
# No NAs in pop
all.equal(length(which(is.na(ACLEDpopGDP$pop))), 0)
# Only one NA in GDPpcn and GDPpcp:
(GDPpNA <- which(is.na(ACLEDpopGDP$GDPpcp)))
(GDPnNA <- which(is.na(ACLEDpopGDP$GDPpcn)))
# Antarctica:
all.equal(ACLEDpopGDP$ACLEDcountry[GDPpNA], 'Antarctica')
# ASSUMPTION: pops, Pops, GDPp and GDPn are not defined in this
# listing; here they are taken to be the columns whose names
# start with 'pop', 'GDPpcp' and 'GDPpcn'.
pops <- grep('^pop', names(ACLEDpopGDP), value=TRUE)
Pops <- ACLEDpopGDP[pops]
GDPp <- grep('^GDPpcp', names(ACLEDpopGDP), value=TRUE)
GDPn <- grep('^GDPpcn', names(ACLEDpopGDP), value=TRUE)
# Normal probability plots of population
qqnorm(unlist(Pops), datax=TRUE)
qqnorm(unlist(Pops), datax=TRUE, log='x')
(billion <- which(unlist(Pops)>1e9))
# 2*5 = 10:
# Probably India and China
ACLEDpopGDP[c('China', 'India'), ]
ACLEDpopGDP[c('China', 'India'), pops] / 1e9
# Normal probability plot of GDPpc
GDPpc <- ACLEDpopGDP[c(GDPp, GDPn)]
qqnorm(unlist(GDPpc), datax=TRUE)
qqnorm(unlist(GDPpc), datax=TRUE, log='x')
a panel of 6 observations from 1970 to 1984
number of observations : 90
observation : production units
country : United States
data(Airline)
A dataframe containing :
airline
year
total cost, in $1,000
output, in revenue passenger miles, index number
fuel price
load factor, the average capacity utilization of the fleet
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F7.1.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
a cross-section from 1972
number of observations : 30
observation : regional
country : United States
data(Airq)
A dataframe containing :
indicator of air quality (the lower the better)
value added of companies (in thousands of dollars)
amount of rain (in inches)
is it a coastal area ?
population density (per square mile)
average income per head (in US dollars)
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 4.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
A data.frame identifying which of 70 countries had a banking crisis each year 1800:2010. The first column is year. The remaining columns carry the names of the countries; those columns are 1 for years with banking crises and 0 otherwise.
data(bankingCrises)
A data.frame
This file was created using the following command:
bankingCrises <- readFinancialCrisisFiles(FinancialCrisisFiles)
HOWEVER: This function was in Ecfun 0.2-3 but was removed in 0.2-4. It used gdata::read.xls, and gdata users were informed that gdata might be removed from CRAN, and any package that used it would also be removed. It seemed that the database that this function was designed to read may not have been updated, which suggested that it made sense to remove this function, because there may not be any further need for it.
This dataset is an update of a subset of the data used to create Figure 10.1. Capital Mobility and the Incidence of Banking Crises, All Countries, 1800-2008, Reinhart and Rogoff (2009, p. 156).
The general upward trend visible in a plot of these data may be attributed to at least two different factors:
(1) The gradual increase in the proportion of human labor that is monetized.
(2) An increase in the general ability of cronies of those in power to gamble with other people's money in forming and bankrupting financial institutions.
The marked feature of this plot is the virtual absence of banking crises during the period of the Bretton Woods agreement, 1944 to 1971. This period ended when US President Nixon in effect canceled the Bretton Woods agreement by taking the US off the gold standard.
Spencer Graves
http://www.reinhartandrogoff.com
Carmen M. Reinhart and Kenneth S. Rogoff (2009) This Time Is Different: Eight Centuries of Financial Folly, Princeton U. Pr.
data(bankingCrises)
numberOfCrises <- rowSums(bankingCrises[-1], na.rm=TRUE)
plot(bankingCrises$year, numberOfCrises, type='b')
# Write to a file for Wikimedia Commons
## Not run:
if(FALSE){
  svg('bankingCrises.svg')
  plot(bankingCrises$year, numberOfCrises, type='b', cex.axis=2,
       las=1, xlab='', ylab='', bty='n', cex=0.5)
  abline(v=c(1945, 1971), lty='dashed', col='blue')
  text(1958, 14, 'Bretton Woods', srt=90, cex=2, col='blue')
  dev.off()
}
## End(Not run)
a cross-section from 1972
number of observations : 4877
observation : individuals
country : United States
data(Benefits)
A dataframe containing :
state unemployment rate (in %)
state maximum benefit level
state of residence code
age in years
years of tenure in job lost
a factor with levels (slack_work,position_abolished,seasonal_job_ended,other)
non-white ?
more than 12 years of school ?
a factor with levels (male,female)
blue collar worker ?
lives in SMSA ?
married ?
has kids ?
has young kids (0-5 yrs) ?
year of job displacement (1982=1,..., 1991=10)
replacement rate
is head of household ?
applied for (and received) UI benefits ?
McCall, B.P. (1995) “The impact of unemployment insurance benefit levels on recipiency”, Journal of Business and Economic Statistics, 13, 189–198.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 7.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 126
observation : production units
country : United States
data(Bids)
A dataframe containing :
doc no.
weeks
count
delta (1 if taken over)
bid Premium
institutional holdings
size measured in billions
legal restructuring
real restructuring
financial restructuring
regulation
white knight
Jaggia, Sanjiv and Satish Thosar (1993) “Multiple Bids as a Consequence of Target Management Resistance”, Review of Quantitative Finance and Accounting, 447–457.
Cameron, A.C. and Per Johansson (1997) “Count Data Regression Models using Series Expansions: with Applications”, Journal of Applied Econometrics, 12, May, 203–223.
Cameron, A.C. and P.K. Trivedi (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html, chapter 5.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
data.frame of cyber security breaches involving health care records of 500 or more humans reported to the U.S. Department of Health and Human Services (HHS) as of June 27, 2014.
data(breaches)
A data.frame with 1055 observations on the following 24 variables:
integer record number in the HHS data base
factor giving the name of the entity experiencing the breach
Factor giving the 2-letter code of the state where the breach occurred. This has 52 levels for the 50 states plus the District of Columbia (DC) and Puerto Rico (PR).
Factor giving the name of a subcontractor (or blank) associated with the breach.
integer number of humans whose records were compromised in the breach. This is 500 or greater; U.S. law requires reports of breaches involving 500 or more records but not of breaches involving fewer.
character vector giving the date or date range of the breach. Recoded as Dates in breach_start and breach_end.
factor with 29 levels giving the type of breach (e.g., "Theft" vs. "Unauthorized Access/Disclosure", etc.)
factor with 41 levels coding the location from which the breach occurred (e.g., "Paper", "Laptop", etc.)
Date the information was posted to the HHS data base or last updated.
character vector of a summary of the incident.
Date of the start of the incident = first date given in Date_of_Breach above.
Date of the end of the incident or NA if only one date is given in Date_of_Breach above.
integer giving the year of the breach
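The recoding of Date_of_Breach into breach_start, breach_end and a year might look like the sketch below. It assumes dates are written as month/day/year and that a range uses "-" as the separator; both are assumptions about the raw HHS text, and parse_breach_dates is a hypothetical helper, not a function in Ecdat or Ecfun.

```r
# Hypothetical helper: split "start - end" strings into two Date columns.
# ASSUMPTIONS: dates look like "10/21/2009"; ranges use "-" as separator.
parse_breach_dates <- function(x, sep = "-", fmt = "%m/%d/%Y") {
  parts <- strsplit(x, sep, fixed = TRUE)
  # First piece is always the start date
  start <- vapply(parts, function(p) trimws(p[1]), character(1))
  # Second piece, if present, is the end date; otherwise NA
  end <- vapply(parts, function(p) {
    if (length(p) > 1) trimws(p[2]) else NA_character_
  }, character(1))
  out <- data.frame(breach_start = as.Date(start, format = fmt),
                    breach_end   = as.Date(end, format = fmt))
  out$year <- as.integer(format(out$breach_start, "%Y"))
  out
}

# Made-up inputs for illustration only
x <- c("10/21/2009", "5/3/2009 - 12/1/2009")
res <- parse_breach_dates(x)
res
```

A single date leaves breach_end as NA, matching the convention described for that variable.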
The data primarily consists of breaches that occurred from 2010 through early 2014 when the extract was taken. However, a few breaches are recorded including 1 from 1997, 8 from 2002-2007, 13 from 2008 and 56 from 2009. The numbers of breaches from 2010 - 2014 are 211, 229, 227, 254 and 56, respectively. (A chi-square test for equality of the counts from 2010 through 2013 is 4.11, which with 3 degrees of freedom has a significance probability of 0.25. Thus, even though the lowest number is the first and the largest count is the last, the apparent trend is not statistically significant under the usual assumption of independent Poisson trials.)
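The counts and the chi-square test quoted above can be reproduced in base R. The sketch below types in the yearly counts from the paragraph rather than loading the data; the object names are illustrative.

```r
# Yearly counts quoted in the paragraph above
counts <- c(`1997` = 1, `2002-2007` = 8, `2008` = 13, `2009` = 56,
            `2010` = 211, `2011` = 229, `2012` = 227, `2013` = 254,
            `2014` = 56)
# Total number of breaches implied by the counts
total <- sum(counts)
total

# Chi-square test of equal expected counts for the four
# complete years, 2010 through 2013 (3 degrees of freedom)
fullYears <- counts[c("2010", "2011", "2012", "2013")]
test <- chisq.test(fullYears)
chi2 <- unname(test$statistic)
pval <- test$p.value
c(chi2 = chi2, df = unname(test$parameter), p = pval)
```

With a vector argument, chisq.test performs a goodness-of-fit test against equal probabilities, which is the comparison described in the text.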
The following corrections were made to the file:
Number | Name of Covered Entity | Corrections |
---|---|---|
45 | Wyoming Department of Health | Cause of breach was missing. Added "Unauthorized Access / Disclosure" per smartbrief.com/03/29/10 |
55 | Reliant Rehabilitation Hospital North Houston | Cause of breach was missing. Added "Unauthorized Access / Disclosure" per Dissent. "Two Breaches Involving Unauthorized Access Lead to Notification." www.phiprivacy.net/two-breaches-involving-unauthorized-access-lead-to-notification ; approximately 2010-04-20. [This web page has since been removed, apparently without having been captured by archive.net.] |
123 | Aetna | Cause of breach was missing. Added "Improper disposal" per Aetna.com/news/newsReleases/2010/0630 |
157 | Mayo Clinic | Cause of breach was missing. Added "Unauthorized Access/Disclosure" per Anderson, Howard. "Mayo Fires Employees in 2 Incidents: Both Involved Unauthorized Access to Records." Data Breach Today. N.p., 4 Oct. 2010 |
341 | Saint Barnabas MedicL Center | Misspelled "Saint Barnabas Medical Center" |
347 | Americar Health Medicare | Misspelled "American Health Medicare" |
484 | Lake Granbury Medicl Ceter | Misspelled "Lake Granbury Medical Center" |
782 | See list of Practices under Item 9 | Replaced name with "Cogent Healthcare, Inc.", checked from XML and web documents |
805 | Dermatology Associates of Tallahassee | Had 00/00/0000 as breach date. This was cross-checked to determine that it was Sept 4, 2013 with 916 records |
815 | Santa Clara Valley Medical Center | Breach year mistyped as 09/14/2913; corrected to 09/14/2013 |
961 | Valley View Hosptial Association | Misspelled "Valley View Hospital Association" |
1034 | Bio-Reference Laboratories, Inc. | Date changed from 00/00/000 to 2/02/2014 as subsequently determined. |
Spencer Graves
U.S. Department of Health and Human Services: Health Information Privacy: Breaches Affecting 500 or More Individuals
HHSCyberSecurityBreaches for a version of these data downloaded more recently. This newer version includes changes in reporting and in the variables included in the data.frame.
data(breaches)
quantile(breaches$Individuals_Affected)
# confirm that the smallest number is 500
# -- and the largest is 4.9e6
# ... and there are no NAs
dDays <- with(breaches, breach_end - breach_start)
quantile(dDays, na.rm=TRUE)
# confirm that breach_end is NA or is later than
# breach_start
a cross-section from 1980
number of observations : 23972
observation : households
country : Spain
data(BudgetFood)
A dataframe containing :
percentage of total expenditure which the household has spent on food
total expenditure of the household
age of reference person in the household
size of the household
size of the town where the household is placed categorized into 5 groups: 1 for small towns, 5 for big ones
sex of reference person (man,woman)
Delgado, A. and Juan Mora (1998) “Testing non–nested semiparametric models : an application to Engel curves specification”, Journal of Applied Econometrics, 13(2), 145–162.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1973 to 1992
number of observations : 1729
observation : households
country : Italy
data(BudgetItaly)
A dataframe containing :
food share
housing and fuels share
miscellaneous share
food price
housing and fuels price
miscellaneous price
total expenditure
year
income
household size
cellule weight
Bollino, Carlo Andrea, Federico Perali and Nicola Rossi (2000) “Linear household technologies”, Journal of Applied Econometrics, 15(3), 253–274.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1980 to 1982
number of observations : 1519
observation : households
country : United Kingdom
data(BudgetUK)
A dataframe containing :
budget share for food expenditure
budget share for fuel expenditure
budget share for clothing expenditure
budget share for alcohol expenditure
budget share for transport expenditure
budget share for other good expenditure
total household expenditure (rounded to the nearest 10 UK pounds sterling)
total net household income (rounded to the nearest 10 UK pounds sterling)
age of household head
number of children
Blundell, Richard, Alan Duncan and Krishna Pendakur (1998) “Semiparametric estimation and consumer demand”, Journal of Applied Econometrics, 13(5), 435–462.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1994
number of observations : 1472
observation : individuals
country : Belgium
data(Bwages)
A dataframe containing :
gross hourly wage rate in euro
education level from 1 [low] to 5 [high]
years of experience
a factor with levels (males,female)
European Community Household Panel.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 3.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1960–01 to 2002–12
number of observations : 516
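The stated span and count are consistent: 43 years of monthly data give 516 observations. A minimal sketch of the corresponding monthly time index, using placeholder values rather than the Capm data itself:

```r
# Placeholder values; only the time index matters here
x <- ts(seq_len(516), start = c(1960, 1), frequency = 12)

# First and last periods as (year, month)
firstPeriod <- start(x)
lastPeriod <- end(x)
firstPeriod
lastPeriod

# Subset by date, here the final calendar year
lastYear <- window(x, start = c(2002, 1))
length(lastYear)
```

The same start and frequency arguments could be applied to the columns of Capm if a dated index is wanted; whether the shipped object already carries one is not stated here.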
data(Capm)
A time series containing :
excess returns food industry
excess returns durables industry
excess returns construction industry
excess returns market portfolio
risk-free return
most of the above data are from Kenneth French's data library at http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 4654
observation : individuals
country : United States
data(Car)
A dataframe containing :
choice of a vehicle among 6 propositions
college education ?
size of household greater than 2 ?
commute lower than 5 miles a day ?
body type, one of regcar (regular car), sportuv (sport utility vehicle), sportcar, stwagon (station wagon), truck, van, for each proposition z from 1 to 6
fuel for proposition z, one of gasoline, methanol, cng (compressed natural gas), electric.
price of vehicle divided by the logarithm of income
hundreds of miles vehicle can travel between refuelings/rechargings
acceleration, tens of seconds required to reach 30 mph from stop
highest attainable speed in hundreds of mph
tailpipe emissions as fraction of those for new gas vehicle
0 for a mini, 1 for a subcompact, 2 for a compact and 3 for a mid–size or large vehicle
fraction of luggage space in comparable new gas vehicle
cost per mile of travel (tens of cents) : home recharging for electric vehicle, station refueling otherwise
fraction of stations that can refuel/recharge vehicle
McFadden, Daniel and Kenneth Train (2000) “Mixed MNL models for discrete response”, Journal of Applied Econometrics, 15(5), 447–470.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1998-1999
number of observations : 420
observation : schools
country : United States
data(Caschool)
A dataframe containing :
district code
county
district
grade span of district
total enrollment
number of teachers
percent qualifying for CalWORKS
percent qualifying for reduced-price lunch
number of computers
average test score, (read.scr+math.scr)/2
computer per student
expenditure per student
student teacher ratio
district average income
percent of English learners
average reading score
average math score
California Department of Education https://www.cde.ca.gov.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 4–7.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 2798
observation : individuals
country : United States
data(Catsup)
A dataframe containing :
individuals identifiers
one of heinz41, heinz32, heinz28, hunts32
is there a display for brand z ?
is there a newspaper feature advertisement for brand z ?
price of brand z
Jain, Dipak C., Naufel J. Vilcassim and Pradeep K. Chintagunta (1994) “A random–coefficients logit brand–choice model applied to panel data”, Journal of Business and Economic Statistics, 12(3), 317.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Ketchup, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 46 observations from 1963 to 1992
number of observations : 1380
observation : regional
country : United States
data(Cigar)
A dataframe containing :
state abbreviation
the year
price per pack of cigarettes
population
population above the age of 16
consumer price index (1983=100)
per capita disposable income
cigarette sales in packs per capita
minimum price in adjoining states per pack of cigarettes
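Because the price index is expressed with 1983 = 100, nominal prices and incomes are usually converted to 1983 units before modeling. A minimal sketch of that deflation follows; the data frame, its values, and the column names are hypothetical, since the variable names are not shown in this listing.

```r
# Toy values for illustration only; NOT taken from the Cigar data
cigarToy <- data.frame(
  year  = c(1983, 1984),
  price = c(80, 90),    # nominal price per pack, hypothetical units
  cpi   = c(100, 104)   # consumer price index, 1983 = 100
)

# Real price in 1983 units: divide by the index and
# rescale by the base-year value of 100
cigarToy$realPrice <- 100 * cigarToy$price / cigarToy$cpi
cigarToy
```

In the base year the real and nominal prices coincide, which is a quick check that the scaling is right.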
Baltagi, B.H. and D. Levin (1992) “Cigarette taxation: raising revenues and reducing consumption”, Structural Change and Economic Dynamics, 3, 321–335.
Baltagi, B.H., J.M. Griffin and W. Xiong (2000) “To pool or not to pool: homogeneous versus heterogeneous estimators applied to cigarette demand”, Review of Economics and Statistics, 82, 117–126.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and Sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 48 observations from 1985 to 1995
number of observations : 528
observation : regional
country : United States
data(Cigarette)
A dataframe containing :
state
year
consumer price index
state population
number of packs per capita
state personal income (total, nominal)
average state, federal, and average local excise taxes for fiscal year
average price during fiscal year, including sales taxes
average excise taxes for fiscal year, including sales taxes
Professor Jonathan Gruber, MIT.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 10.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1990
number of observations : 400
observation : production units
country : Netherlands
data(Clothing)
A dataframe containing :
annual sales in Dutch guilders
sales per square meter
gross-profit-margin
number of owners (managers)
number of full-timers
number of part-timers
number of helpers (temporary workers)
total number of hours worked
number of hours worked per worker
investment in shop-premises
investment in automation.
sales floor space of the store (in m^2).
year start of business
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 3.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1993 to 1995
number of observations : 6259
observation : goods
country : United States
data(Computers)
A dataframe containing :
price in US dollars of 486 PCs
clock speed in MHz
size of hard drive in MB
size of RAM in MB
size of screen in inches
is a CD-ROM present ?
is a multimedia kit (speakers, sound card) included ?
is the manufacturer a "premium" firm (IBM, COMPAQ) ?
number of 486 price listings for each month
time trend indicating month starting from January of 1993 to November of 1995.
Stengos, T. and E. Zacharias (2005) “Intertemporal pricing and price discrimination : a semiparametric hedonic analysis of the personal computer market”, Journal of Applied Econometrics, forthcoming.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1947-1 to 1996-4
number of observations : 200
observation : country
country : Canada
data(Consumption)
A time series containing :
personal disposable income, 1986 dollars
personal consumption expenditure, 1986 dollars
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 1, 3, 4, 6, 9, 10, 14 and 15.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Average surface temperature changes world wide and in the Northern Hemisphere 3 and 10 years after the injections of 5, 50 and 150 Tg (teragrams = millions of metric tons) of smoke into the upper troposphere, per Robock, Oman, and Stenchikov (2007).
These numbers are relative to the average for 1925-1975, which explains why the numbers are positive with smoke = 0.
data(coolingFromNuclearWar)
A dataframe containing :
teragrams = millions of metric tons
average change in surface temperature 3 and 10 years after injection of smoke into the upper troposphere globally (g) or in the Northern Hemisphere (n) in degrees Celsius.
Alan Robock, Luke Oman, and Georgiy L. Stenchikov (2007) Nuclear winter revisited with a modern climate model and current nuclear arsenals: Still catastrophic consequences, Journal of Geophysical Research, 112
data(coolingFromNuclearWar)
matplot(coolingFromNuclearWar[, 'smoke'],
        coolingFromNuclearWar[, 2:5], type='l')
(linFit <- lm(cbind(dC3g, dC10g, dC3n, dC10n)~smoke,
              coolingFromNuclearWar))
# total change
dC <- as.matrix(coolingFromNuclearWar[, 2:5] -
        rep(unlist(coolingFromNuclearWar[1, -1]), each=4))
(linFit0 <- lm(dC~smoke, coolingFromNuclearWar))
summary(linFit0)
a cross-section from 1998
number of observations : 11130
observation : individuals
country : United States
data(CPSch3)
A dataframe containing :
survey year
average hourly earnings
a factor with levels (male,female)
Bureau of Labor Statistics, U.S. Department of Labor https://www.bls.gov.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 3.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 3292
observation : individuals
country : United States
data(Cracker)
A dataframe containing :
individual identifiers
one of sunshine, kleebler, nabisco, private
is there a display for brand z ?
is there a newspaper feature advertisement for brand z ?
price of brand z
Jain, Dipak C., Naufel J. Vilcassim and Pradeep K. Chintagunta (1994) “A random–coefficients logit brand–choice model applied to panel data”, Journal of Business and Economic Statistics, 12(3), 317.
Paap, R. and Philip Hans Franses (2000) “A dynamic multinomial probit model for brand choices with different short–run effects of marketing mix variables”, Journal of Applied Econometrics, 15(6), 717–744.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Data casually collected on the number of packages on the Comprehensive R Archive Network (CRAN) at different dates.
NOTE: This could change in the future. See Details below.
data(CRANpackages)
A data.frame containing:
an ordered factor of the R version number primarily in use at the time. This was taken from archives of the major releases at https://svn.r-project.org/R/branches/R-1-3-patches/tests/internet.Rout.save, ... https://svn.r-project.org/R/branches/R-3-1-branch/tests/internet.Rout.save
an object of class Date giving the date on which the count of the number of CRAN packages was determined.
an integer number of packages on the CRAN mirror checked on the indicated Date.
A factor giving the source (person) who collected the data.
This seems to provide the most widely available source for data on the growth of CRAN, manually recorded by John Fox and Spencer Graves. For a discussion of these and related data, see Fox (2009).
For more detail, see the CRAN packages data on GitHub maintained by Hadley Wickham. This contains the description file of every package uploaded to CRAN prior to the date of Hadley's most recent update. The current maintainer of the Ecdat and Ecfun packages would consider contributions along the following lines:
1. It might be nice to have a more complete dataset or datasets showing CRAN growth. This might include code fitting multiple models and predicting future growth with error bounds computed using Bayesian Model Averaging. These model fits might make an interesting addition to the examples in this help file. With a little more effort, it might make an interesting note for R Journal. Functions written to fit those models might be added to the Ecfun package.
2. It might be nice to have a function in Ecfun to download the CRAN packages data from GitHub and convert it to a format suitable for updating this dataset.
The current maintainer for Ecdat and Ecfun (Spencer Graves) might be willing to accept code and documentation for this but is not ready to do it himself at the present time.
John Fox, "Aspects of the Social Organization and Trajectory of the R Project", R Journal, 1(2), Dec. 2009, 5-13. https://journal.r-project.org/archive/2009-2/RJournal_2009-2_Fox.pdf, accessed 2014-04-13.
plot(Packages~Date, CRANpackages, log='y') # almost exponential growth
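The near-straight line on the log scale suggests a log-linear fit as a first, very simple growth model. The following is a minimal sketch only: fit_log_growth is a hypothetical helper (it is not part of Ecdat or Ecfun), it assumes a data frame with the Date and Packages columns used in the example above, and it is demonstrated here on synthetic data, not on the real CRAN counts.

```r
# Hypothetical helper: regress log(Packages) on time and report the
# implied doubling time in years. Not part of Ecdat or Ecfun.
fit_log_growth <- function(df) {
  d <- data.frame(logN = log(df$Packages),
                  days = as.numeric(df$Date))  # days since 1970-01-01
  fit <- lm(logN ~ days, data = d)
  rate_per_day <- unname(coef(fit)["days"])
  list(fit = fit,
       doubling_years = log(2) / rate_per_day / 365.25)
}

# Synthetic illustration (NOT the real counts): a series constructed
# to double every 2 years.
toy_date <- as.Date("2001-01-01") + round(365.25 * (0:9))
toy <- data.frame(
  Date = toy_date,
  Packages = 100 * 2^(as.numeric(toy_date - toy_date[1]) / (2 * 365.25))
)
growth <- fit_log_growth(toy)
growth$doubling_years
```

With Ecdat loaded, fit_log_growth(CRANpackages) should apply the same fit to the recorded counts; a single exponential is only a starting point compared with the model averaging suggested in the Details.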
a panel of 90 observations from 1981 to 1987
number of observations : 630
observation : regional
country : United States
data(Crime)
A dataframe containing :
county identifier
year from 1981 to 1987
crimes committed per person
'probability' of arrest
'probability' of conviction
'probability' of prison sentence
average sentence, days
police per capita
hundreds of people per square mile
tax revenue per capita
one of 'other', 'west' or 'central'
'yes' or 'no' if in SMSA
percentage minority in 1980
weekly wage in construction
weekly wage in transportation, utilities and communications
weekly wage in wholesale and retail trade
weekly wage in finance, insurance and real estate
weekly wage in service industry
weekly wage in manufacturing
weekly wage of federal employees
weekly wage of state employees
weekly wage of local government employees
offense mix: face-to-face/other
percentage of young males
Thanks to Yungfong "Frank" Tang for identifying an error in the description of "density", previously documented erroneously as only "people per square mile".
Cornwell, C. and W.N. Trumbull (1994) “Estimating the economic model of crime with panel data”, Review of Economics and Statistics, 76, 360–366.
Baltagi, B. H. (2006) “Estimating an economic model of crime using panel data from North Carolina”, Journal of Applied Econometrics, 21(4), May/June 2006, pp. 543-547.
See also: CRIME4.DES and Baltagi in JAE Data Archive.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series, Crime
daily observations from 1969-1-03 to 1998-12-31
number of observations : 2528
observation : production units
country : United States
data(CRSPday)
A dataframe containing :
the year
the month
the day
the return for General Electric, PERMNO 12060
the return for IBM, PERMNO 12490
the return for Mobil Corporation, PERMNO 15966
the return for the CRSP value-weighted index, including dividends
Center for Research in Security Prices, Graduate School of Business, University of Chicago, 725 South Wells - Suite 800, Chicago, Illinois 60607, https://www.crsp.org.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapters 7, 9 and 15.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1969-1 to 1998-12
number of observations : 360
observation : production units
country : United States
data(CRSPmon)
A time series containing :
the return for General Electric, PERMNO 12060
the return for IBM, PERMNO 12490
the return for Mobil Corporation, PERMNO 15966
the return for the CRSP value-weighted index, including dividends
Center for Research in Security Prices, Graduate School of Business, University of Chicago, 725 South Wells - Suite 800, Chicago, Illinois 60607, https://www.crsp.org.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 13.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 2000
number of observations : 308
observation : goods
country : Singapore
data(Diamond)
A dataframe containing :
weight of the diamond stone in carats
a factor with levels (D,E,F,G,H,I)
a factor with levels (IF,VVS1,VVS2,VS1,VS2)
certification body, a factor with levels (GIA, IGI, HRD)
price in Singapore $
Chu, Singfat (2001) “Pricing the C's of Diamond Stones”, Journal of Statistics Education, 9(2).
Journal of Statistics Education's data archive : http://jse.amstat.org/jse_data_archive.htm.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
weekly observations from 1975 to 1989
number of observations : 778
observation : country
country : Germany
data(DM)
A dataframe containing :
the date of the observation (19850104 is January 4, 1985)
the ask price of the dollar in units of DM in the spot market on Friday of the current week
the ask price of the dollar in units of DM in the 30-day forward market on Friday of the current week
the bid price of the dollar in units of DM in the spot market on the delivery date on a current forward contract
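Because the date is stored as a number of the form yyyymmdd, it usually needs converting before it can be used as a time index. A minimal sketch, assuming nothing beyond the documented encoding: the first value is the documented example, the second is an illustrative value one week later, and raw_date and obs_date are names introduced here.

```r
# yyyymmdd-coded dates: 19850104 is the documented example,
# 19850111 is an illustrative value one week later.
raw_date <- c(19850104, 19850111)

# Convert via character so that "%Y%m%d" can parse the digits.
obs_date <- as.Date(as.character(raw_date), format = "%Y%m%d")

format(obs_date, "%Y-%m-%d")
diff(obs_date)           # spacing between weekly observations, in days
format(obs_date, "%u")   # ISO weekday number, Monday = 1
```

Applied to the data, the same conversion would take the date column of DM as raw_date.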
Bekaert, G. and R. Hodrick (1993) “On biases in the measurement of foreign exchange risk premiums”, Journal of International Money and Finance, 12, 115-138.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 6, 438-443.
Pound, Yen, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
a cross-section from 1986
number of observations : 485
observation : individuals
country : United States
data(Doctor)
A dataframe containing :
the number of doctor visits
the number of children in the household
a measure of access to health care
a measure of health status (larger positive numbers are associated with poorer health)
Gurmu, Shiferaw (1997) “Semiparametric estimation of hurdle regression models with an application to Medicaid utilization”, Journal of Applied Econometrics, 12(3), 225-242.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 11.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
DoctorContacts, DoctorAUS, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1977–1978
number of observations : 5190
observation : individuals
country : Australia
data(DoctorAUS)
A dataframe containing :
sex
age
annual income in tens of thousands of dollars
insurance contract (medlevy: Medibank levy, levyplus: private health insurance, freepoor: government insurance due to low income, freerepa: government insurance due to old age, disability or veteran status)
number of illnesses in past 2 weeks
number of days of reduced activity in past 2 weeks due to illness or injury
general health score using Goldberg's method (from 0 to 12)
chronic condition (np: no problem, la: limiting activity, nla: not limiting activity)
number of consultations with a doctor or specialist in the past 2 weeks
number of consultations with non-doctor health professionals (chemist, optician, physiotherapist, social worker, district community nurse, chiropodist or chiropractor) in the past 2 weeks
number of admissions to a hospital, psychiatric hospital, nursing or convalescent home in the past 12 months (up to 5 or more admissions which is coded as 5)
number of nights in a hospital, etc. during most recent admission: taken, where appropriate, as the mid-point of the intervals 1, 2, 3, 4, 5, 6, 7, 8-14, 15-30, 31-60, 61-79 with 80 or more admissions coded as 80. If no admission in past 12 months then equals zero.
total number of prescribed and nonprescribed medications used in past 2 days
total number of prescribed medications used in past 2 days
total number of nonprescribed medications used in past 2 days
Cameron, A.C. and P.K. Trivedi (1986) “Econometric Models Based on Count Data: Comparisons and Applications of Some Estimators and Tests”, Journal of Applied Econometrics, 1, 29-54.
Cameron, A.C. and P.K. Trivedi (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html, chapter 3.
Doctor, DoctorContacts, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1977–1978
number of observations : 20186
data(DoctorContacts)
A dataframe containing :
number of outpatient visits to a medical doctor
log(coinsrate+1) where coinsurance rate is 0 to 100
individual deductible plan ?
log(annual participation incentive payment) or 0 if no payment
log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise
physical limitation ?
number of chronic diseases
self-rated health (excellent, good, fair, poor)
log of annual family income (in $)
log of family size
years of schooling of household head
exact age
sex (male,female)
age less than 18 ?
is household head black ?
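Several of these variables are stored on a log scale, so recovering the original units takes an inverse transform. The sketch below illustrates this for the log(coinsrate+1) coding only; the rates are illustrative values within the documented 0 to 100 range, and coins_rate, coded and recovered are names introduced here rather than columns of the dataset.

```r
# Illustrative coinsurance rates in percent (not taken from the data).
coins_rate <- c(0, 25, 50, 95, 100)

# Forward transform as documented: log(coinsrate + 1).
# log1p(x) computes log(1 + x) accurately for small x.
coded <- log1p(coins_rate)

# Inverse transform: expm1(x) computes exp(x) - 1.
recovered <- expm1(coded)

cbind(coins_rate, coded, recovered)
```

The +1 inside the log keeps a coinsurance rate of 0 representable (it maps to 0 rather than to -Inf).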
Deb, P. and P.K. Trivedi (2002) “The Structure of Demand for Medical Care: Latent Class versus Two-Part Models”, Journal of Health Economics, 21, 601–625.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 553–556 and 565.
Doctor, MedExp, DoctorAUS, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
a cross-section from 1988-1989
number of observations : 4266
observation : individuals
country : United States
data(Earnings)
A dataframe containing :
age groups, a factor with levels (g1,g2,g3)
average annual earnings, in 1982 US dollars
Mills, Jeffery A. and Sourushe Zandvakili (1997) “Statistical Inference via Bootstrapping for Measures of Inequality”, Journal of Applied Econometrics, 12(2), pp. 133-150.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapters 5 and 7.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1970
number of observations : 158
observation : production units
country : United States
data(Electricity)
A dataframe containing :
total cost
total output
wage rate
cost share for labor
capital price index
cost share for capital
fuel price
cost share for fuel
Christensen, L. and W. H. Greene (1976) “Economies of scale in U.S. electric power generation”, Journal of Political Economy, 84, 655-676.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, chapter 4, 317-320.
Hayashi, F. (2000) Econometrics, Princeton University Press, https://archive.org/details/econometrics0000haya, chapter 1, 76-84.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 601
observation : individuals
country : United States
data(Fair)
A dataframe containing :
a factor with levels (male,female)
age
number of years married
children ? a factor
how religious, from 1 (anti) to 5 (very)
education
occupation, from 1 to 7, according to Hollingshead's classification (reverse numbering)
self rating of marriage, from 1 (very unhappy) to 5 (very happy)
number of affairs in past year
Fair, R. (1977) “A note on the computation of the tobit estimator”, Econometrica, 45, 1723-1727.
https://fairmodel.econ.yale.edu/rayfair/pdf/1978A200.PDF.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F22.2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 48 observations from 1982 to 1988
number of observations : 336
observation : regional
country : United States
data(Fatality)
A dataframe containing :
state ID code
year
traffic fatality rate (deaths per 10000)
tax on case of beer
minimum legal drinking age
mandatory jail sentence ?
mandatory community service ?
average miles per driver
unemployment rate
per capita personal income
Prof. Christopher J. Ruhm, Department of Economics, University of North Carolina.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 8.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
FinancialCrisisFiles is an object of class financialCrisisFiles created by the financialCrisisFiles function in Ecfun. It describes files containing data on financial crises downloadable from https://web.archive.org/web/20150419090824/http://www.reinhartandrogoff.com/data/browse-by-topic/topics/7.
NOTE: When this dataset was created it was downloaded from http://www.reinhartandrogoff.com/data/browse-by-topic/topics/7. However, it was "Not Found" in testing on 2020-02-09. Fortunately the data are still available on the Internet Archive.
data(FinancialCrisisFiles)
Reinhart and Rogoff (http://www.reinhartandrogoff.com) provide numerous data sets analyzed in their book, "This Time Is Different: Eight Centuries of Financial Folly". Of interest here are data on financial crises of various types for 70 countries spanning the years 1800 - 2010, downloadable from http://www.reinhartandrogoff.com/data/browse-by-topic/topics/7/.
Version 0.2-3 of the Ecfun package included a function financialCrisisFiles that produced a list of class financialCrisisFiles describing four different Excel files in very similar formats with one sheet per Country and a few extra descriptor sheets. This data object FinancialCrisisFiles was produced by that function. That function required the gdata package, and users of that package were advised to terminate use of it, because it was scheduled to be removed from CRAN along with all packages that used it. Since Reinhart and Rogoff seemed not to be actively maintaining that dataset, there seemed little need to do the work required to make Ecfun::financialCrisisFiles work without gdata, so it was removed from Ecfun version 2.0-4.
FinancialCrisisFiles is a list with components carrying the names of files to be read. Each component is a list of optional arguments to pass to do.call(read.xls, ...) to read the sheet with name = name of that component. (This read.xls was part of the gdata package, which may no longer be available on CRAN.)
This corresponds to the files downloaded from http://www.reinhartandrogoff.com/data/browse-by-topic/topics/7/ in January 2013 (except for the fourth, which was not available there because of an error with the web site but instead was obtained directly from Prof. Reinhart).
Spencer Graves
http://www.reinhartandrogoff.com
Carmen M. Reinhart and Kenneth S. Rogoff (2009) This Time Is Different: Eight Centuries of Financial Folly, Princeton U. Pr.
a cross-section
number of observations : 1182
observation : individuals
country : United States
data(Fishing)
A dataframe containing :
recreation mode choice, one of : beach, pier, boat and charter
price for chosen alternative
catch rate for chosen alternative
price for beach mode
price for pier mode
price for private boat mode
price for charter boat mode
catch rate for beach mode
catch rate for pier mode
catch rate for private boat mode
catch rate for charter boat mode
monthly income
Herriges, J. A. and C. L. Kling (1999) “Nonlinear Income Effects in Random Utility Models”, Review of Economics and Statistics, 81, 62-72.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 463–466, 486 and 491–495.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1979–01 to 2001–12
number of observations : 276
data(Forward)
A time series containing :
exchange rate USD/British Pound Sterling
exchange rate USD/Euro
exchange rate Euro/Pound
1 month forward rate USD/Pound
1 month forward rate USD/Euro
1 month forward rate Euro/Pound
3 month forward rate USD/Pound
3 month forward rate USD/Euro
3 month forward rate Euro/Pound
Datastream
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 4.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 2002–03
number of observations : 227
observation : individuals
country : United States
data(FriendFoe)
A dataframe containing :
contestant's sex
is contestant white ?
contestant's age in years
contestant's choice : a factor with levels "foe" and "friend". If both players play "friend", they share the trust box; if both play "foe", both players receive zero prize; if one of them plays "foe" and the other "friend", the "foe" player receives the entire trust box and the "friend" player nothing
round in which contestant is eliminated, a factor with levels ("1","2","3")
season of the show, a factor with levels ("1","2")
the amount of cash in the trust box
partner's sex
is partner white ?
partner's age in years
partner's choice : a factor with levels "foe" and "friend"
money won by contestant
money won by partner
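The payoff rule described for the contestant's choice can be written as a small function. This is a sketch under one stated assumption: "share the trust box" is taken to mean an equal split, which the description does not say explicitly. trust_box_payoff is a hypothetical name introduced here; it is not part of Ecdat.

```r
# Hypothetical helper encoding the documented payoff rule.
# ASSUMPTION: when both play "friend" the trust box is split equally.
trust_box_payoff <- function(choice, partner_choice, cash) {
  stopifnot(all(choice %in% c("friend", "foe")),
            all(partner_choice %in% c("friend", "foe")))
  ifelse(choice == "friend" & partner_choice == "friend",
         cash / 2,                                  # both cooperate
         ifelse(choice == "foe" & partner_choice == "friend",
                cash,                               # lone "foe" takes all
                0))                                 # all other cases
}

# All four combinations for an illustrative trust box of 1000.
outcomes <- expand.grid(choice = c("friend", "foe"),
                        partner = c("friend", "foe"),
                        stringsAsFactors = FALSE)
outcomes$payoff <- trust_box_payoff(outcomes$choice, outcomes$partner, 1000)
outcomes
```

Comparing such computed payoffs with the recorded winnings is one way to check the equal-split assumption against the data.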
Kalist, David E. (2004) “Data from the Television Game Show "Friend or Foe?"”, Journal of Statistics Education, 12(3).
Journal of Statistics Education's data archive : http://jse.amstat.org/jse_data_archive.htm.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
daily observations from 1980–01 to 1987–05–21
number of observations : 1867
observation : country
country : World
data(Garch)
A dataframe containing :
date : date of observation (yymmdd)
day : day of the week (a factor)
dm : exchange rate of Dollar/Deutsche Mark
ddm : dm-dm(-1), the first difference of dm
bp : exchange rate of Dollar/British Pound
cd : exchange rate of Dollar/Canadian Dollar
dy : exchange rate of Dollar/Yen
sf : exchange rate of Dollar/Swiss Franc
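Two of these encodings can be reproduced directly: the yymmdd date and the first difference ddm = dm - dm(-1). The following is a minimal sketch on illustrative values only. The dates lie within the documented sample range, the exchange rates are made up, and how the dataset fills the first element of ddm is not stated in this documentation, so NA is used here as an assumption.

```r
# Illustrative yymmdd-coded dates within the documented sample range.
raw_date <- c(800102, 800103, 870521)

# Pad to six digits, then parse; "%y" maps 69-99 to 1969-1999.
obs_date <- as.Date(sprintf("%06d", raw_date), format = "%y%m%d")
format(obs_date)

# Made-up exchange rates, used only to show the differencing.
dm <- c(0.5850, 0.5845, 0.5861)

# dm - dm(-1): current value minus previous value.
# ASSUMPTION: the first element has no predecessor and is set to NA.
ddm <- c(NA, diff(dm))
ddm
```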
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 8.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 18 observations from 1960 to 1978
number of observations : 342
observation : country
country : OECD
data(Gasoline)
A dataframe containing :
a factor with 18 levels
the year
logarithm of motor gasoline consumption per auto
logarithm of real per-capita income
logarithm of real motor gasoline price
logarithm of the stock of cars per capita
Baltagi, B.H. and J.M. Griffin (1983) “Gasoline demand in the OECD: an application of pooling and testing procedures”, European Economic Review, 22.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1980
number of observations : 758
observation : individuals
country : United States
data(Griliches)
A dataframe containing :
residency in the southern states (first observation) ?
same variable for 1980
married (first observation) ?
same variable for 1980
residency in metropolitan areas (first observation) ?
same variable for 1980
mother's education in years
IQ score
score on the “knowledge of the world of work” test
year of the observation
age (first observation)
same variable for 1980
completed years of schooling (first observation)
same variable for 1980
experience in years (first observation)
same variable for 1980
tenure in years (first observation)
same variable for 1980
log wage (first observation)
same variable for 1980
Blackburn, M. and D. Neumark (1992) “Unobserved ability, efficiency wages, and interindustry wage differentials”, Quarterly Journal of Economics, 107, 1421-1436.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 3, 250-256.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 20 annual observations from 1935 to 1954 on each of 10 firms.
number of observations : 200
observation : production units
country : United States
data(Grunfeld)
A dataframe containing :
observation
date
gross investment
value of the firm
stock of plant and equipment
There are several versions of these data. GrunfeldGreene is "A data frame containing 20 annual observations on 3 variables for 5 firms." That dataset reportedly contains errors but is maintained in that way to avoid breaking the code of others who use it. That help file also provides a link to the corrected version.
See also GrunfeldGreene for a version with only 5 firms.
Moody's Industrial Manual, Survey of Current Business.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, Table F13.1.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations, GrunfeldGreene
a cross-section
number of observations : 250
observation : households
country : California
data(HC)
A dataframe containing :
heating system, one of gcc (gas central heat with cooling), ecc (electric central resistance heat with cooling), erc (electric room resistance heat with cooling), hpc (electric heat pump which provides cooling also), gc (gas central heat without cooling), ec (electric central resistance heat without cooling), er (electric room resistance heat without cooling)
installation cost of the heating portion of the system
installation cost for cooling
operating cost for the heating portion of the system
operating cost for cooling
annual income of the household
Kenneth Train's home page : https://eml.berkeley.edu/~train/.
Heating, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 900
observation : households
country : California
data(Heating)
A dataframe containing :
id
heating system, one of gc (gas central), gr (gas room), ec (electric central), er (electric room), hp (heat pump)
installation cost for heating system z (defined for the 5 heating systems)
annual operating cost for heating system z (defined for the 5 heating systems)
ratio oc.z/ic.z
annual income of the household
age of the household head
number of rooms in the house
Kenneth Train's home page : https://eml.berkeley.edu/~train/.
HC, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 506
observation : regional
country : United States
data(Hedonic)
A dataframe containing :
median value of owner–occupied homes
crime rate
proportion of 25,000 square feet residential lots
proportion of nonretail business acres
does the tract bound the Charles River ?
annual average nitrogen oxide concentration in parts per hundred million
average number of rooms
proportion of owner units built prior to 1940
weighted distances to five employment centers in the Boston area
index of accessibility to radial highways
full value property tax rate ($ / $10,000)
pupil/teacher ratio
proportion of blacks in the population
proportion of population that is lower status
town identifier
Harrison, D. and D.L. Rubinfeld (1978) “Hedonic housing prices and the demand for clean air”, Journal of Environmental Economics and Management, 5, 81–102.
Belsley, D.A., E. Kuh and R. E. Welsch (1980) Regression diagnostics: identifying influential data and sources of collinearity, John Wiley, New York.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Since October 2009 organizations in the U.S. that store data on human health are required to report any incident that compromises the confidentiality of 500 or more patients / human subjects (45 C.F.R. 164.408). These reports are publicly available. HHSCyberSecurityBreaches was downloaded from the Office for Civil Rights of the U.S. Department of Health and Human Services, 2015-02-26.
data(HHSCyberSecurityBreaches)
A dataframe containing 1151 observations of 9 variables:
A character vector identifying the organization involved in the breach.
A factor giving the two-letter abbreviation of the US state or territory where the breach occurred. This has 52 levels for the 50 states plus the District of Columbia (DC) and Puerto Rico (PR).
A factor giving the organization type of the covered entity with levels "Business Associate", "Health Plan", "Healthcare Clearing House", and "Healthcare Provider"
An integer giving the number of humans whose records were compromised in the breach. This is 500 or greater; U.S. law requires reports of breaches involving 500 or more records but not of breaches involving fewer.
Date when the breach was reported.
A factor giving one of 29 different combinations of 7 different breach types, separated by ", ": "Hacking/IT Incident", "Improper Disposal", "Loss", "Other", "Theft", "Unauthorized Access/Disclosure", and "Unknown"
A factor giving one of 47 different combinations of 8 different location categories: "Desktop Computer", "Electronic Medical Record", "Email", "Laptop", "Network Server", "Other", "Other Portable Electronic Device", "Paper/Films"
Logical = (Covered.Entity.Type == "Business Associate")
A character vector giving a narrative description of the incident.
This contains the breach report data downloaded 2015-02-26 from the US Health and Human Services. This catalogs reports starting 2009-10-21. Earlier downloads included a few breaches prior to 2009 when the law was enacted (inconsistently reported), and a date for breach occurrence in addition to the date of the report.
The following corrections were made to the file:
UCLA Health System, breach date 11/4/2011, had covered entity added as "Healthcare Provider"
Wyoming Department of Health, breach date 3/2/2010 had breach type changed to "Unauthorized Access / Disclosure"
Computer Program and Systems, Inc. (CPSI), breach date 3/30/2010 had breach type changed to "Unauthorized Access / Disclosure"
Aetna, breach date 7/27/2010 had breach type changed to "Improper Disposal" (see explanation below), breach date 5/24/2010 name changed to City of Charlotte, NC (Health Plan) and state changed to NC
Mercer, breach date 7/30/2010 state changed to MI
Not applicable, breach date 11/2/2011 name changed to Northridge Hospital Medical Center and state changed to CA
na, breach date 4/4/2011 name changed to Brian J Daniels DDS, Paul R Daniels DDS, and state changed to AZ
NA, breach date 5/27/2011 name changed to Spartanburg Regional Healthcare System and state changed to SC
NA, breach date 7/4/2011 name changed to Yanz Dental Corporation and state changed to CA
breaches for an earlier download of these data. The exact reporting requirements and even the number and definitions of variables included in the data.frame have changed.
##
## 1.  mean(Individuals.Affected)
##
mean(HHSCyberSecurityBreaches$Individuals.Affected)
##
## 2.  Basic Breach Types
##
tb <- as.character(HHSCyberSecurityBreaches$Type.of.Breach)
tb. <- strsplit(tb, ', ')
table(unlist(tb.))
# 8 levels, but two are the same apart from
# a trailing blank.
##
## 3.  Location.of.Breached.Information
##
lb <- as.character(HHSCyberSecurityBreaches[[
    'Location.of.Breached.Information']])
table(lb)
lb. <- strsplit(lb, ', ')
table(unlist(lb.))
# 8 levels
table(sapply(lb., length))
#   1    2    3    4    5    6    7    8
#1007  119   13    8    1    1    1    1
# all 8 levels together observed once
# There are 256 = 2^8 possible combinations
# of which 47 actually occur in these data.
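The example notes that two of the tabulated breach types differ only by a trailing blank. One way to merge them, sketched here on a few illustrative strings and not on the real data, is to trim whitespace after splitting; type_of_breach, raw_counts and clean_counts are names introduced for this sketch.

```r
# Illustrative breach-type strings; "Other " carries a trailing blank,
# mimicking the issue noted in the example above.
type_of_breach <- c("Theft", "Loss, Theft", "Other ", "Other",
                    "Hacking/IT Incident, Other")

# Split combined entries on the documented ", " separator.
parts <- strsplit(type_of_breach, ", ", fixed = TRUE)

# Without trimming, "Other" and "Other " are counted separately.
raw_counts <- table(unlist(parts))

# trimws() removes leading and trailing whitespace before tabulating.
clean_counts <- table(trimws(unlist(parts)))

raw_counts
clean_counts
```

With the data loaded, replacing type_of_breach by as.character(HHSCyberSecurityBreaches$Type.of.Breach) should apply the same cleanup to the real column.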
a cross-section from 1993
number of observations : 22272
observation : individuals
country : United States
data(HI)
A dataframe containing :
hours worked per week by wife
wife covered by husband's HI ?
wife has HI thru her job ?
husband has HI thru own job ?
a factor with levels, "<9years", "9-11years", "12years", "13-15years", "16years", ">16years"
one of white, black, other
Hispanic ?
years of potential work experience
number of kids under age of 6
number of kids 6–18 years old
husband's income in thousands of dollars
one of other, northcentral, south, west
sampling weight
Olson, Craig A. (1998) “A comparison of parametric and semiparametric estimates of the effect of spousal health insurance coverage on weekly hours worked by wives”, Journal of Applied Econometrics, 13(5), September–October, 543–565.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1997-1998
number of observations : 2381
observation : individuals
country : United States
In package version 0.2-9 and earlier this dataset was called Hdma.
data(Hmda)
A dataframe containing :
debt payments to total income ratio
housing expenses to income ratio
ratio of size of loan to assessed value of property
consumer credit score from 1 to 6 (a low value being a good score)
mortgage credit score from 1 to 4 (a low value being a good score)
public bad credit record ?
denied mortgage insurance ?
self employed ?
is the applicant single ?
1989 Massachusetts unemployment rate in the applicant's industry
is unit a condominium ? (was called comdominiom in version 0.2-9 and earlier versions of the package)
is the applicant black ?
mortgage application denied ?
Federal Reserve Bank of Boston.
Munnell, Alicia H., Geoffrey M.B. Tootell, Lynne E. Browne and James McEneaney (1996) “Mortgage lending in Boston: Interpreting HMDA data”, American Economic Review, 25-53.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 9.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1987
number of observations : 546
observation : goods
country : Canada
data(Housing)
A dataframe containing :
sale price of a house
the lot size of a property in square feet
number of bedrooms
number of full bathrooms
number of stories excluding basement
does the house have a driveway ?
does the house have a recreational room ?
does the house have a full finished basement ?
does the house use gas for hot water heating ?
does the house have central air conditioning ?
number of garage places
is the house located in the preferred neighbourhood of the city ?
Anglin, P.M. and R. Gencay (1996) “Semiparametric estimation of a hedonic price function”, Journal of Applied Econometrics, 11(6), 633-648.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 3.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1960-1 to 2001-4
number of observations : 168
observation : country
country : Canada
data(Hstarts)
A time series containing :
the log of urban housing starts in Canada, not seasonally adjusted, CANSIM series J6001, converted to quarterly
the log of urban housing starts in Canada, seasonally adjusted, CANSIM series J9001, converted to quarterly. Observations prior to 1966:1 are missing
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 13.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
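The Hstarts entry above describes 168 quarterly observations from 1960-1 to 2001-4, with the seasonally adjusted series missing before 1966:1. The sketch below checks that arithmetic on a placeholder series. The object `hs_toy` is hypothetical and holds no real data; it only mirrors the stated time span.

```r
# Placeholder quarterly series with the span stated above:
# 1960 Q1 through 2001 Q4. The NA values are stand-ins, not data.
hs_toy <- ts(rep(NA_real_, 168), start = c(1960, 1), frequency = 4)

# Last time point implied by 168 quarters starting in 1960 Q1
print(end(hs_toy))

# Quarters before 1966 Q1, the stretch over which the seasonally
# adjusted series is described as missing
n_before_1966 <- sum(time(hs_toy) < 1966)
print(n_before_1966)

# Restrict to 1966 Q1 onward, as one might before using that series
hs_from_1966 <- window(hs_toy, start = c(1966, 1))
print(length(hs_from_1966))
```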
four–weekly observations from 1951–03–18 to 1953–07–11
number of observations : 30
observation : country
country : United States
data(Icecream)
A time series containing :
consumption of ice cream per head (in pints);
average family income per week (in US Dollars);
price of ice cream (per pint);
average temperature (in Fahrenheit);
Hildreth, C. and J. Lu (1960) Demand relations with autocorrelated disturbances, Technical Bulletin No 2765, Michigan State University.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 4.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Data on quantiles of the distributions of family incomes in the United States. This combines three data sources:
(1) US Census Table F-1 for the central quantiles
(2) Piketty and Saez for the 95th and higher quantiles
(3) Gross Domestic Product and implicit price deflators from Measuring Worth. (NOTE: The Measuring Worth Web site, https://MeasuringWorth.com, often gives security warnings. The desired data still seem to be available and not corrupted, however.)
data(incomeInequality)
A data.frame containing:
numeric year 1947:2012
number of families in the US
quintile1, quintile2, quintile3, quintile4, and p95 are the indicated quantiles of the distribution of family income from US Census Table F-1. The median is computed as the geometric mean of quintile2 and quintile3. This is accurate to the extent that the lognormal distribution adequately approximates the central 20 percent of the income distribution, which it should for most practical purposes.
The indicated quantiles of family income per Piketty and Saez
real GDP in millions, GDP implicit price deflators, US population in thousands, and real GDP per capita, according to MeasuringWorth.com. (NOTE: The web address for this, https://MeasuringWorth.com, seems to be functional but may not be maintained to current internet security standards. It is therefore given here as text rather than a hot link.)
ratio of the estimates of the 95th percentile of distributions of family income from the Piketty and Saez analysis of data from the Internal Revenue Service (IRS) and from the US Census Bureau.
The IRS-based estimate has ranged between 72 and 98 percent of the Census Bureau figure for the 95th percentile of the distribution, with this ratio averaging around 75 percent since the late 1980s. However, this systematic bias is modest relative to the differences between the different quantiles of interest in this combined dataset.
average number of persons per family using the number of families from US Census Table F-1 and the population from MeasuringWorth. (Note: The web site for Measuring Worth, https://MeasuringWorth.com, often gives security warnings. It still seems to work. It seems that the web site is not maintained to current internet security standards.)
personsPerFamily * realGDPperCap
ratio of realGDPperFamily to the median. This is a measure of skewness and income inequality.
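The derived quantities described above (the median as a geometric mean of two quintile limits, GDP per family, and the mean-to-median ratio) can be sketched as follows. All numeric values are hypothetical round numbers chosen for illustration, and the result names `median_income` and `mean_to_median` are assumptions rather than the column names used in incomeInequality.

```r
# Hypothetical values for a single year (illustration only, not real data)
quintile2        <- 40000   # upper limit of the second fifth
quintile3        <- 62500   # upper limit of the third fifth
personsPerFamily <- 4       # persons per family
realGDPperCap    <- 25000   # real GDP per capita

# Median approximated by the geometric mean of quintile2 and quintile3,
# which is the same as averaging the two limits on the log scale.
median_income <- sqrt(quintile2 * quintile3)

# Average income per family and its ratio to the median
realGDPperFamily <- personsPerFamily * realGDPperCap
mean_to_median   <- realGDPperFamily / median_income

print(c(median = median_income,
        realGDPperFamily = realGDPperFamily,
        mean_to_median = mean_to_median))
```

A ratio well above 1 indicates that the mean sits far above the median, which is the sense in which this ratio measures skewness.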
For details on how this data.frame was created, see "F1.PikettySaez.R" in system.file('scripts', package='fda'). This provides links for files to download and R commands to read those files and convert them into an updated version of incomeInequality. This is a reasonable thing to do if it is more than 2 years since max(incomeInequality$Year). All data are in constant 2012 dollars.
Spencer Graves
United States Census Bureau, Table F-1. Income Limits for Each Fifth and Top 5 Percent of Families, All Races, https://www.census.gov/data/tables/time-series/demo/income-poverty/historical-income-inequality.html, accessed 2016-12-09.
Thomas Piketty and Emmanuel Saez (2003) "Income Inequality in the United States, 1913-1998", Quarterly Journal of Economics, 118(1) 1-39, https://eml.berkeley.edu/~saez/, update accessed February 28, 2014.
Louis Johnston and Samuel H. Williamson (2011) "What Was the U.S. GDP Then?" MeasuringWorth. (Note: Their web address, https://www.measuringworth.org/usgdp, often gives security warnings. The desired data still seem to be available there. However, it seems that the site is not maintained to current internet security standards. The data used in the current USGDPpresidents data set was extracted February 28, 2014.)
##
## Ratio of IRS to census estimates for the 95th percentile
##
data(incomeInequality)
plot(P95IRSvsCensus~Year, incomeInequality, type='b')
# starts ~0.74, trends rapidly up to ~0.97,
# then drifts back to ~0.75
abline(h=0.75)
abline(v=1989)
# check
sum(is.na(incomeInequality$P95IRSvsCensus))
# The Census data runs to 2011; Piketty and Saez runs to 2010.
quantile(incomeInequality$P95IRSvsCensus, na.rm=TRUE)
# 0.72 ... 0.98

##
## Persons per Family
##
plot(personsPerFamily~Year, incomeInequality, type='b')
quantile(incomeInequality$personsPerFamily)
# ranges from 3.72 to 4.01 with median 3.84
#  -- almost 4

##
## GDP per family
##
plot(realGDPperFamily~Year, incomeInequality, type='b', log='y')

##
## Plot the mean then the first quintile, then the median,
##            99th, 99.9th and 99.99th percentiles
##
plotCols <- c(21, 3, 5, 11, 13:14)
kcols <- length(plotCols)
plotColors <- c(1:6, 8:13)[1:kcols] # omit 7=yellow
plotLty <- 1:kcols

matplot(incomeInequality$Year, incomeInequality[plotCols]/1000,
        log='y', type='l', col=plotColors, lty=plotLty)

#*** Growth broadly shared 1947 - 1970, then began diverging
#*** The divergence has been most pronounced among the top 1%
#***    and especially the top 0.01%

##
## Growth rate by quantile 1947-1970 and 1970 - present
##
keyYears <- c(1947, 1970, 2010)
(iYears <- which(is.element(incomeInequality$Year, keyYears)))

(dYears <- diff(keyYears))
kk <- length(keyYears)
(lblYrs <- paste(keyYears[-kk], keyYears[-1], sep='-'))

(growth <- sapply(incomeInequality[iYears,], function(x, labels=lblYrs){
    dxi <- exp(diff(log(x)))
    names(dxi) <- labels
    dxi
} ))

# as percent
(gr <- round(100*(growth-1), 1))

# The average annual income (realGDPperFamily) doubled between
# 1970 and 2010 (increased by 101 percent), while the median household
# income increased only 23 percent.

##
## Income lost by each quantile 1970-2010
## relative to the broadly shared growth 1947-1970
##
(lostGrowth <- (growth[, 'realGDPperFamily']-growth[, plotCols]))
# 1947-1970:  The median gained 20% relative to the mean,
#           while the top 1% lost ground
# 1970-2010:  The median lost 79%, the 99th percentile lost 29%,
#           while the top 0.1% gained

(lostIncome <- (lostGrowth[2, ] *
                incomeInequality[iYears[2], plotCols]))
# The median family lost $39,000 per year in income
# relative to what they would have with the same economic growth
# broadly shared as during 1947-1970.
# That's slightly over $36,500 per year = $100 per day

(grYr <- growth^(1/dYears))
(grYr. <- round(100*(grYr-1), 1))

##
## Regression line:  linear spline
##
(varyg <- c(3:14, 21))
Varyg <- names(incomeInequality)[varyg]
# ids are taken from the Year column of the data itself
str(F01ps <- reshape(incomeInequality[c(1, varyg)], idvar='Year',
                     ids=incomeInequality$Year,
                     times=Varyg, timevar='pctile',
                     varying=list(Varyg), direction='long'))
names(F01ps)[2:3] <- c('variable', 'value')
F01ps$variable <- factor(F01ps$variable)

# linear spline basis function with knot at 1970
F01ps$t1970p <- pmax(0, F01ps$Year-1970)

table(nas <- is.na(F01ps$value))
# 6 NAs, one each of the Piketty-Saez variables in 2011
F01i <- F01ps[!nas, ]

# formula:
# log(value/1000) ~ b*Year + (for each variable:
#     different intercept + (different slope after 1970))

Fit <- lm(log(value/1000)~Year+variable*t1970p, F01i)
anova(Fit)
# all highly significant
# The residuals may show problems with the model,
# but we will ignore those for now.

# Model predictions
str(Pred <- predict(Fit))

##
## Combined plot
##
#  Plot to a file?  Wikimedia Commons prefers svg format.
## Not run: 
if(FALSE){
  svg('incomeInequality8.svg')
#  If you want software to convert svg to another format
#  such as png, consider GIMP (www.gimp.org).

#  Base plot

# Leave extra space on the right to label
# with growth since 1970
  op <- par(mar=c(5, 4, 4, 5)+0.1)

  matplot(incomeInequality$Year, incomeInequality[plotCols]/1000,
          log='y', type='l', col=plotColors, lty=plotLty,
          xlab='', ylab='', las=1, axes=FALSE, lwd=3)
  axis(1, at=seq(1950, 2010, 10),
       labels=c(1950, NA, 1970, NA, 1990, NA, 2010), cex.axis=1.5)
  yat <- c(10, 50, 100, 500, 1000, 5000, 10000)
  axis(2, yat, labels=c('$10K', '$50K', '$100K', '$500K',
                        '$1M', '$5M', '$10M'), las=1, cex.axis=1.2)

#  Label the lines
  pctls <- paste(c(20, 40, 50, 60, 80, 90, 95, 99, 99.5, 99.9, 99.99),
                 '%', sep='')
  lineLbl0 <- c('Year', 'families K', pctls,
       'realGDP.M', 'GDP deflator', 'pop-K', 'realGDPperFamily',
       '95 pct(IRS / Census)', 'size of household',
       'average family income', 'mean/median')
  (lineLbls <- lineLbl0[plotCols])
  sel75 <- (incomeInequality$Year==1975)

  laby <- incomeInequality[sel75, plotCols]/1000

  text(1973.5, c(1.2, 1.2, 1.3, 1.5, 1.9)*laby[-1],
       lineLbls[-1], cex=1.2)
  text(1973.5, 1.2*laby[1], lineLbls[1], cex=1.2, srt=10)

##
## Add lines + points for the knots in 1970
##
  End <- numeric(kcols)
  F01names <- names(incomeInequality)
  for(i in seq(length=kcols)){
    seli <- (as.character(F01i$variable) ==
             F01names[plotCols[i]])
#   with(F01i[seli, ], lines(Year, exp(Pred[seli]),
#            col=plotColors[i]))
    yri <- F01i$Year[seli]
    predi <- exp(Pred[seli])
    lines(yri, predi, col=plotColors[i])
    End[i] <- predi[length(predi)]
    sel70i <- (yri==1970)
    points(yri[sel70i], predi[sel70i],
           col=plotColors[i])
  }

##
##  label growth rates
##
  table(sel70. <- (incomeInequality$Year>1969))
  (lastYrs <- incomeInequality[sel70., 'Year'])
  (lastYr. <- max(lastYrs)+4)
  #text(lastYr., End, gR., xpd=NA)
  text(lastYr., End, paste(gr[2, plotCols], '%', sep=''),
       xpd=NA)
  text(lastYr.+7, End, paste(grYr.[2, plotCols], '%', sep=''),
       xpd=NA)

##
## Label the presidents
##
  abline(v=c(1953, 1961, 1969, 1977, 1981, 1989, 1993, 2001, 2009))
  (m99.95 <- with(incomeInequality, sqrt(P99.9*P99.99))/1000)

  text(1949, 5000, 'Truman')
  text(1956.8, 5000, 'Eisenhower', srt=90)
  text(1963, 5000, 'Kennedy', srt=90)
  text(1966.8, 5000, 'Johnson', srt=90)
  text(1971, 5*m99.95[24], 'Nixon', srt=90)
  text(1975, 5*m99.95[28], 'Ford', srt=90)
  text(1978.5, 5*m99.95[32], 'Carter', srt=90)
  text(1985.1, m99.95[38], 'Reagan' )
  text(1991, 0.94*m99.95[44], 'GHW Bush', srt=90)
  text(1997, m99.95[50], 'Clinton')
  text(2005, 1.1*m99.95[58], 'GW Bush', srt=90)
  text(2010, 1.2*m99.95[62], 'Obama', srt=90)

##
## Done
##
  par(op) # reset margins

  dev.off() # for plot to a file
}

## End(Not run)
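The growth calculations in the example above (a growth factor per period, then a total percentage change and an equivalent annual rate) can be illustrated in isolation. This is a minimal sketch with hypothetical income levels; the vector `income` and the names `pct_total` and `pct_year` are assumptions introduced here and do not come from incomeInequality.

```r
# Hypothetical income levels at three key years (not real data)
keyYears <- c(1947, 1970, 2010)
income   <- c(20000, 40000, 50000)

# Growth factor over each period: ratio of end value to start value,
# computed on the log scale as in the example above
growth <- exp(diff(log(income)))
names(growth) <- paste(keyYears[-length(keyYears)], keyYears[-1], sep = "-")

# Total percentage change and the equivalent annual rate per period
dYears    <- diff(keyYears)
pct_total <- 100 * (growth - 1)
pct_year  <- 100 * (growth^(1 / dYears) - 1)

print(round(rbind(pct_total, pct_year), 2))
```

Compounding the annual rate over the length of each period recovers the growth factor, which is a convenient check when periods have different lengths.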
quarterly observations from 1971–1 to 1985–2
number of observations : 58
observation : country
country : United Kingdom
data(IncomeUK)
A time series containing :
total disposable income (million Pounds, current prices)
consumer expenditure (million Pounds, current prices)
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapters 8 and 9.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
binomial model
Benefits : Unemployment of Blue Collar Workers
Hmda : The Boston HMDA Data Set
Mroz : Labor Supply Data
Participation : Labor Force Participation
Train : Stated Preferences for Train Traveling
censored and truncated model
Fair : Extramarital Affairs Data
HI : Health Insurance and Hours Worked By Wives
Mofa : International Expansion of U.S. MOFAs (majority–owned Foreign Affiliates in Fire (finance, Insurance and Real Estate))
Tobacco : Households Tobacco Budget Share
Workinghours : Wife Working Hours
count data
Accident : Ship Accidents
Bids : Bids Received By U.S. Firms
Doctor : Number of Doctor Visits
DoctorAUS : Doctor Visits in Australia
DoctorContacts : Contacts With Medical Doctor
OFP : Visits to Physician Office
PatentsHGH : Dynamic Relation Between Patents and R&D
PatentsRD : Patents, R&D and Technological Spillovers for a Panel of Firms
Somerville : Visits to Lake Somerville
StrikeNb : Number of Strikes in US Manufacturing
duration model
Oil : Oil Investment
Strike : Strike Duration Data
StrikeDur : Strikes Duration
UnempDur : Unemployment Duration
Unemployment : Unemployment Duration
multinomial model
Car : Stated Preferences for Car Choice
Catsup : Choice of Brand for Catsup
Cracker : Choice of Brand for Crackers
Fishing : Choice of Fishing Mode
HC : Heating and Cooling System Choice in Newly Built Houses in California
Heating : Heating System Choice in California Houses
Ketchup : Choice of Brand for Ketchup
Mode : Mode Choice
ModeChoice : Data to Study Travel Mode Choice
Tuna : Choice of Brand for Tuna
Yogurt : Choice of Brand for Yogurts
ordered model
Kakadu : Willingness to Pay for the Preservation of the Kakadu National Park
Mathlevel : Level of Calculus Attained for Students Taking Advanced Micro–economics
NaturalPark : Willingness to Pay for the Preservation of the Alentejo Natural Park
panel
Airline : Cost for U.S. Airlines
Cigar : Cigarette Consumption
Cigarette : The Cigarette Consumption Panel Data Set
Crime : Crime in North Carolina
Fatality : Drunk Driving Laws and Traffic Deaths
Gasoline : Gasoline Consumption
Grunfeld : Grunfeld Investment Data
LaborSupply : Wages and Hours Worked
Males : Wages and Education of Young Males
MunExp : Municipal Expenditure Data
Produc : US States Production
SumHes : The Penn Table
Wages : Panel Data of Individual Wages
system of equations
BudgetItaly : Budget Shares for Italian Households
BudgetUK : Budget Shares of British Households
Electricity : Cost Function for Electricity Producers
Klein : Klein's Model I
ManufCost : Manufacturing Costs
Nerlove : Cost Function for Electricity Producers, 1955
University : Provision of University Teaching and Research
time–series
CRSPday : Daily Returns from the CRSP Database
CRSPmon : Monthly Returns from the CRSP Database
Capm : Stock Market Data
Consumption : Quarterly Data on Consumption and Expenditure
DM : DM Dollar Exchange Rate
Forward : Exchange Rates of US Dollar Against Other Currencies
Garch : Daily Observations on Exchange Rates of the US Dollar Against Other Currencies
Hstarts : Housing Starts
Icecream : Ice Cream Consumption
IncomeUK : Seasonally Unadjusted Quarterly Data on Disposable Income and Expenditure
Irates : Monthly Interest Rates
LT : Dollar Sterling Exchange Rate
MW : Growth of Disposable Income and Treasury Bill Rate
Macrodat : Macroeconomic Time Series for the United States
Mishkin : Inflation and Interest Rates
MoneyUS : Macroeconomic Series for the United States
Mpyr : Money, National Product and Interest Rate
Orange : The Orange Juice Data Set
PE : Price and Earnings Index
PPP : Exchange Rates and Price Indices for France and Italy
Pound : Pound-dollar Exchange Rate
Pricing : Returns of Size-based Portfolios
Solow : Solow's Technological Change Data
Tbrate : Interest Rate, GDP and Inflation
Yen : Yen-dollar Exchange Rate
consumer behavior
BudgetFood : Budget Share of Food for Spanish Households
BudgetItaly : Budget Shares for Italian Households
BudgetUK : Budget Shares of British Households
Car : Stated Preferences for Car Choice
Cigar : Cigarette Consumption
Cigarette : The Cigarette Consumption Panel Data Set
Doctor : Number of Doctor Visits
Fishing : Choice of Fishing Mode
Gasoline : Gasoline Consumption
HC : Heating and Cooling System Choice in Newly Built Houses in California
Heating : Heating System Choice in California Houses
Icecream : Ice Cream Consumption
Mode : Mode Choice
ModeChoice : Data to Study Travel Mode Choice
Somerville : Visits to Lake Somerville
Tobacco : Households Tobacco Budget Share
Train : Stated Preferences for Train Traveling
economics of education
environmental economics
Airq : Air Quality for Californian Metropolitan Areas
Kakadu : Willingness to Pay for the Preservation of the Kakadu National Park
NaturalPark : Willingness to Pay for the Preservation of the Alentejo Natural Park
finance
CRSPday : Daily Returns from the CRSP Database
CRSPmon : Monthly Returns from the CRSP Database
Capm : Stock Market Data
DM : DM Dollar Exchange Rate
Forward : Exchange Rates of US Dollar Against Other Currencies
Garch : Daily Observations on Exchange Rates of the US Dollar Against Other Currencies
Irates : Monthly Interest Rates
LT : Dollar Sterling Exchange Rate
PPP : Exchange Rates and Price Indices for France and Italy
Pound : Pound-dollar Exchange Rate
Pricing : Returns of Size-based Portfolios
Yen : Yen-dollar Exchange Rate
game theory
FriendFoe : Data from the Television Game Show Friend Or Foe ?
health economics
hedonic prices
labor economics
Benefits : Unemployment of Blue Collar Workers
Bwages : Wages in Belgium
CPSch3 : Earnings from the Current Population Survey
Earnings : Earnings for Three Age Groups
Griliches : Wage Data
HI : Health Insurance and Hours Worked By Wives
LaborSupply : Wages and Hours Worked
Labour : Belgian Firms
Males : Wages and Education of Young Males
Mroz : Labor Supply Data
PSID : Panel Survey of Income Dynamics
Participation : Labor Force Participation
RetSchool : Return to Schooling
Schooling : Wages and Schooling
Strike : Strike Duration Data
StrikeDur : Strikes Duration
StrikeNb : Number of Strikes in US Manufacturing
Treatment : Evaluating Treatment Effect of Training on Earnings
UnempDur : Unemployment Duration
Unemployment : Unemployment Duration
Wages : Panel Data of Individual Wages
Wages1 : Wages, Experience and Schooling
Workinghours : Wife Working Hours
macroeconomics
Consumption : Quarterly Data on Consumption and Expenditure
Hstarts : Housing Starts
IncomeUK : Seasonally Unadjusted Quarterly Data on Disposable Income and Expenditure
Klein : Klein's Model I
Longley : The Longley Data
MW : Growth of Disposable Income and Treasury Bill Rate
Macrodat : Macroeconomic Time Series for the United States
Mishkin : Inflation and Interest Rates
Money : Money, GDP and Interest Rate in Canada
MoneyUS : Macroeconomic Series for the United States
Mpyr : Money, National Product and Interest Rate
PE : Price and Earnings Index
Produc : US States Production
Solow : Solow's Technological Change Data
SumHes : The Penn Table
Tbrate : Interest Rate, GDP and Inflation
marketing
producer behavior
Accident : Ship Accidents
Airline : Cost for U.S. Airlines
Bids : Bids Received By U.S. Firms
Clothing : Sales Data of Men's Fashion Stores
Electricity : Cost Function for Electricity Producers
Grunfeld : Grunfeld Investment Data
Hmda : The Boston HMDA Data Set
ManufCost : Manufacturing Costs
Metal : Production for SIC 33
Mofa : International Expansion of U.S. MOFAs (majority–owned Foreign Affiliates in Fire (finance, Insurance and Real Estate))
Nerlove : Cost Function for Electricity Producers, 1955
Oil : Oil Investment
Orange : The Orange Juice Data Set
PatentsHGH : Dynamic Relation Between Patents and R&D
PatentsRD : Patents, R&D and Technological Spillovers for a Panel of Firms
TranspEq : Statewide Data on Transportation Equipment Manufacturing
University : Provision of University Teaching and Research
socioeconomics
country
Consumption : Quarterly Data on Consumption and Expenditure
DM : DM Dollar Exchange Rate
Garch : Daily Observations on Exchange Rates of the US Dollar Against Other Currencies
Gasoline : Gasoline Consumption
Hstarts : Housing Starts
Icecream : Ice Cream Consumption
IncomeUK : Seasonally Unadjusted Quarterly Data on Disposable Income and Expenditure
Irates : Monthly Interest Rates
Klein : Klein's Model I
LT : Dollar Sterling Exchange Rate
Longley : The Longley Data
MW : Growth of Disposable Income and Treasury Bill Rate
Macrodat : Macroeconomic Time Series for the United States
ManufCost : Manufacturing Costs
Mishkin : Inflation and Interest Rates
Mofa : International Expansion of U.S. MOFAs (majority–owned Foreign Affiliates in Fire (finance, Insurance and Real Estate))
Money : Money, GDP and Interest Rate in Canada
Mpyr : Money, National Product and Interest Rate
Orange : The Orange Juice Data Set
PE : Price and Earnings Index
PPP : Exchange Rates and Price Indices for France and Italy
Pound : Pound-dollar Exchange Rate
Solow : Solow's Technological Change Data
StrikeNb : Number of Strikes in US Manufacturing
SumHes : The Penn Table
Tbrate : Interest Rate, GDP and Inflation
Yen : Yen-dollar Exchange Rate
goods
households
BudgetFood : Budget Share of Food for Spanish Households
BudgetItaly : Budget Shares for Italian Households
BudgetUK : Budget Shares of British Households
HC : Heating and Cooling System Choice in Newly Built Houses in California
Heating : Heating System Choice in California Houses
VietNamH : Medical Expenses in Vietnam (household Level)
individuals
Benefits : Unemployment of Blue Collar Workers
Bwages : Wages in Belgium
CPSch3 : Earnings from the Current Population Survey
Car : Stated Preferences for Car Choice
Catsup : Choice of Brand for Catsup
Cracker : Choice of Brand for Crackers
Doctor : Number of Doctor Visits
DoctorAUS : Doctor Visits in Australia
Earnings : Earnings for Three Age Groups
Fair : Extramarital Affairs Data
Fishing : Choice of Fishing Mode
FriendFoe : Data from the Television Game Show Friend Or Foe ?
Griliches : Wage Data
HI : Health Insurance and Hours Worked By Wives
Hmda : The Boston HMDA Data Set
Kakadu : Willingness to Pay for the Preservation of the Kakadu National Park
Ketchup : Choice of Brand for Ketchup
Males : Wages and Education of Young Males
Mathlevel : Level of Calculus Attained for Students Taking Advanced Micro–economics
Mode : Mode Choice
ModeChoice : Data to Study Travel Mode Choice
Mroz : Labor Supply Data
NaturalPark : Willingness to Pay for the Preservation of the Alentejo Natural Park
OFP : Visits to Physician Office
PSID : Panel Survey of Income Dynamics
Participation : Labor Force Participation
RetSchool : Return to Schooling
Schooling : Wages and Schooling
Somerville : Visits to Lake Somerville
Star : Effects on Learning of Small Class Sizes
Tobacco : Households Tobacco Budget Share
Train : Stated Preferences for Train Traveling
Tuna : Choice of Brand for Tuna
Unemployment : Unemployment Duration
VietNamI : Medical Expenses in Vietnam (individual Level)
Wages : Panel Data of Individual Wages
Wages1 : Wages, Experience and Schooling
Workinghours : Wife Working Hours
Yogurt : Choice of Brand for Yogurts
production units
Airline : Cost for U.S. Airlines
Bids : Bids Received By U.S. Firms
CRSPday : Daily Returns from the CRSP Database
CRSPmon : Monthly Returns from the CRSP Database
Clothing : Sales Data of Men's Fashion Stores
Electricity : Cost Function for Electricity Producers
Grunfeld : Grunfeld Investment Data
Labour : Belgian Firms
Nerlove : Cost Function for Electricity Producers, 1955
Oil : Oil Investment
PatentsHGH : Dynamic Relation Between Patents and R&D
PatentsRD : Patents, R&D and Technological Spillovers for a Panel of Firms
regional
Airq
: Air Quality for Californian Metropolitan Areas
Cigar
: Cigarette Consumption
Cigarette
: The Cigarette Consumption Panel Data Set
Crime
: Crime in North Carolina
Fatality
: Drunk Driving Laws and Traffic Deaths
Hedonic
: Hedonic Prices of Census Tracts in Boston
Metal
: Production for SIC 33
MunExp
: Municipal Expenditure Data
Produc
: US States Production
TranspEq
: Statewide Data on Transportation Equipment Manufacturing
schools
Caschool
: The California Test Score Data Set
MCAS
: The Massachusetts Test Score Data Set
University
: Provision of University Teaching and Research
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/
Bids
: Bids Received By U.S. Firms
BudgetFood
: Budget Share of Food for Spanish Households
BudgetItaly
: Budget Shares for Italian Households
BudgetUK
: Budget Shares of British Households
Car
: Stated Preferences for Car Choice
Computers
: Prices of Personal Computers
Crime
: Crime in North Carolina
Doctor
: Number of Doctor Visits
Earnings
: Earnings for Three Age Groups
HI
: Health Insurance and Hours Worked By Wives
Housing
: Sales Prices of Houses in the City of Windsor
Males
: Wages and Education of Young Males
Mathlevel
: Level of Calculus Attained for Students Taking Advanced Micro–economics
MoneyUS
: Macroeconomic Series for the United States
MunExp
: Municipal Expenditure Data
OFP
: Visits to Physician Office
Oil
: Oil Investment
Participation
: Labor Force Participation
PatentsRD
: Patents, R&D and Technological Spillovers for a Panel of Firms
Train
: Stated Preferences for Train Traveling
Unemployment
: Unemployment Duration
University
: Provision of University Teaching and Research
Workinghours
: Wife Working Hours
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20
Benefits
: Unemployment of Blue Collar Workers
Catsup
: Choice of Brand for Catsup
Cracker
: Choice of Brand for Crackers
Kakadu
: Willingness to Pay for the Preservation of the Kakadu National Park
Ketchup
: Choice of Brand for Ketchup
LaborSupply
: Wages and Hours Worked
Mofa
: International Expansion of U.S. MOFAs (Majority–owned Foreign Affiliates) in FIRE (Finance, Insurance and Real Estate)
Somerville
: Visits to Lake Somerville
Tuna
: Choice of Brand for Tuna
Yogurt
: Choice of Brand for Yogurts
Journal of Statistics Education's data archive : http://jse.amstat.org/jse_data_archive.htm
Kenneth Train's home page : https://eml.berkeley.edu/~train/
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and sons, https://www.wiley.com/legacy/wileychi/baltagi/
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge University Press
DoctorContacts
: Contacts With Medical Doctor
Fishing
: Choice of Fishing Mode
LaborSupply
: Wages and Hours Worked
MedExp
: Structure of Demand for Medical Care
PSID
: Panel Survey of Income Dynamics
PatentsHGH
: Dynamic Relation Between Patents and R&D
RetSchool
: Return to Schooling
StrikeDur
: Strikes Duration
Treatment
: Evaluating Treatment Effect of Training on Earnings
UnempDur
: Unemployment Duration
VietNamH
: Medical Expenses in Vietnam (household Level)
VietNamI
: Medical Expenses in Vietnam (individual Level)
Cameron, A.C. and Trivedi P.K. (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html
Bids
: Bids Received By U.S. Firms
DoctorAUS
: Doctor Visits in Australia
OFP
: Visits to Physician Office
PatentsHGH
: Dynamic Relation Between Patents and R&D
Somerville
: Visits to Lake Somerville
StrikeNb
: Number of Strikes in US Manufacturing
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press
CRSPday
: Daily Returns from the CRSP Database
CRSPmon
: Monthly Returns from the CRSP Database
Consumption
: Quarterly Data on Consumption and Expenditure
Doctor
: Number of Doctor Visits
Earnings
: Earnings for Three Age Groups
Hstarts
: Housing Starts
MW
: Growth of Disposable Income and Treasury Bill Rate
Money
: Money, GDP and Interest Rate in Canada
Participation
: Labor Force Participation
Tbrate
: Interest Rate, GDP and Inflation
Greene, W.H. (2003) Econometric Analysis, Prentice Hall
Accident
: Ship Accidents
Airline
: Cost for U.S. Airlines
Electricity
: Cost Function for Electricity Producers
Fair
: Extramarital Affairs Data
Grunfeld
: Grunfeld Investment Data
Klein
: Klein's Model I
Longley
: The Longley Data
ManufCost
: Manufacturing Costs
Metal
: Production for SIC 33
ModeChoice
: Data to Study Travel Mode Choice
Mroz
: Labor Supply Data
MunExp
: Municipal Expenditure Data
Nerlove
: Cost Function for Electricity Producers, 1955
Solow
: Solow's Technological Change Data
Strike
: Strike Duration Data
TranspEq
: Statewide Data on Transportation Equipment Manufacturing
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm
DM
: DM Dollar Exchange Rate
Electricity
: Cost Function for Electricity Producers
Griliches
: Wage Data
LT
: Dollar Sterling Exchange Rate
Mishkin
: Inflation and Interest Rates
Mpyr
: Money, National Product and Interest Rate
Nerlove
: Cost Function for Electricity Producers, 1955
Pound
: Pound-dollar Exchange Rate
SumHes
: The Penn Table
Yen
: Yen-dollar Exchange Rate
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers
CPSch3
: Earnings from the Current Population Survey
Caschool
: The California Test Score Data Set
Cigarette
: The Cigarette Consumption Panel Data Set
Fatality
: Drunk Driving Laws and Traffic Deaths
Hmda
: The Boston HMDA Data Set
Journals
: Economic Journals Data Set
MCAS
: The Massachusetts Test Score Data Set
Macrodat
: Macroeconomic Time Series for the United States
Orange
: The Orange Juice Data Set
Star
: Effects on Learning of Small Class Sizes
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons
Airq
: Air Quality for Californian Metropolitan Areas
Benefits
: Unemployment of Blue Collar Workers
Bwages
: Wages in Belgium
Capm
: Stock Market Data
Clothing
: Sales Data of Men's Fashion Stores
Forward
: Exchange Rates of US Dollar Against Other Currencies
Garch
: Daily Observations on Exchange Rates of the US Dollar Against Other Currencies
Housing
: Sales Prices of Houses in the City of Windsor
Icecream
: Ice Cream Consumption
IncomeUK
: Seasonally Unadjusted Quarterly Data on Disposable Income and Expenditure
Irates
: Monthly Interest Rates
Labour
: Belgian Firms
Males
: Wages and Education of Young Males
MoneyUS
: Macroeconomic Series for the United States
NaturalPark
: Willingness to Pay for the Preservation of the Alentejo Natural Park
PE
: Price and Earnings Index
PPP
: Exchange Rates and Price Indices for France and Italy
PatentsRD
: Patents, R&D and Technological Spillovers for a Panel of Firms
Pricing
: Returns of Size-based Portfolios
SP500
: Returns on Standard & Poor's 500 Index
Schooling
: Wages and Schooling
Tobacco
: Households Tobacco Budget Share
Wages1
: Wages, Experience and Schooling
annual
daily
four–weekly
Icecream
: Ice Cream Consumption
monthly
CRSPmon
: Monthly Returns from the CRSP Database
Capm
: Stock Market Data
Forward
: Exchange Rates of US Dollar Against Other Currencies
Irates
: Monthly Interest Rates
Mishkin
: Inflation and Interest Rates
Orange
: The Orange Juice Data Set
PPP
: Exchange Rates and Price Indices for France and Italy
Pricing
: Returns of Size-based Portfolios
StrikeNb
: Number of Strikes in US Manufacturing
quarterly
Consumption
: Quarterly Data on Consumption and Expenditure
Hstarts
: Housing Starts
IncomeUK
: Seasonally Unadjusted Quarterly Data on Disposable Income and Expenditure
MW
: Growth of Disposable Income and Treasury Bill Rate
Macrodat
: Macroeconomic Time Series for the United States
Money
: Money, GDP and Interest Rate in Canada
MoneyUS
: Macroeconomic Series for the United States
Tbrate
: Interest Rate, GDP and Inflation
weekly
monthly observations from 1946–12 to 1991–02
number of observations : 531
observation : country
country : United States
data(Irates)
A time series containing :
interest rate for a maturity of 1 month (% per year).
interest rate for a maturity of 2 months (% per year).
interest rate for a maturity of 3 months (% per year).
interest rate for a maturity of 5 months (% per year).
interest rate for a maturity of 6 months (% per year).
interest rate for a maturity of 11 months (% per year).
interest rate for a maturity of 12 months (% per year).
interest rate for a maturity of 36 months (% per year).
interest rate for a maturity of 60 months (% per year).
interest rate for a maturity of 120 months (% per year).
McCulloch, J.H. and H.C. Kwon (1993) U.S. term structure data, 1947–1991, Ohio State Working Paper 93-6, Ohio State University, Columbus.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 8.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
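As a rough consistency check on the Irates header, the sketch below builds a monthly `ts` object with 531 placeholder values starting in December 1946 and inspects where it ends. The object name `rate` and the `NA` values are illustrative assumptions; no actual interest-rate data is used.

```r
# Placeholder series standing in for one Irates column (values are NA on purpose).
n_obs <- 531
rate <- ts(rep(NA_real_, n_obs), start = c(1946, 12), frequency = 12)

start(rate)   # year and month of the first observation
end(rate)     # year and month of the last observation
length(rate)  # number of monthly observations
```

If the header is consistent, `end(rate)` should correspond to February 1991.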
a cross-section from 2000
number of observations : 180
observation : goods
data(Journals)
A dataframe containing :
journal title
publisher
scholarly society ?
library subscription price
number of pages
characters per page
total number of citations
year journal was founded
number of library subscriptions
field description
Professor Theodore Bergstrom of the Department of Economics at the University of California, San Diego.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 6.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 1827
observation : individuals
country : Australia
data(Kakadu)
A dataframe containing :
lower bound of willingness to pay, 0 if observation is left censored
upper bound of willingness to pay, 999 if observation is right censored
an ordered factor with levels nn (respondent answers no, no), ny (respondent answers no, yes or yes, no), yy (respondent answers yes, yes)
the greatest value of national parks and nature reserves is in recreation activities (from 1 to 5)
jobs are the most important thing in deciding how to use our natural resources (from 1 to 5)
development should be allowed to proceed where environmental damage from activities such as mining is possible but very unlikely (from 1 to 5)
it's important to have places where wildlife is preserved (from 1 to 5)
it's important to consider future generations (from 1 to 5)
in deciding how to use areas such as Kakadu national park, their importance to the local aboriginal people should be a major factor (from 1 to 5)
in deciding how to use our natural resources such as mineral deposits and forests, the most important thing is the financial benefits for Australia (from 1 to 5)
if areas within natural parks are set aside for development projects such as mining, the value of the parks is greatly reduced (from 1 to 5)
there should be more national parks created from state forests (from 1 to 5)
the government pays little attention to the people in making decisions (from 1 to 4)
the respondent recycles things such as paper or glass and regularly buys unbleached toilet paper or environmentally friendly products?
the respondent has visited a national park or bushland recreation area in the previous 12 months?
the respondent watches TV programs about the environment? (from 1 to 9)
the respondent is a member of a conservation organization?
male,female
age
years of schooling
respondent's income in thousands of dollars
the respondent received the major–impact scenario of the Kakadu conservation zone survey ?
Werner, Megan (1999) “Allowing for zeros in dichotomous–choice contingent–valuation models”, Journal of Business and Economic Statistics, 17(4), October, 479–486.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
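The Kakadu bounds use 0 and 999 as sentinels for left and right censoring. A minimal sketch of how such sentinels might be recoded before fitting an interval-censored model follows; the data frame `wtp`, its rows, and the column names `lower` and `upper` are hypothetical and are not taken from the dataset.

```r
# Hypothetical rows; the column names `lower` and `upper` are assumed.
wtp <- data.frame(lower = c(0, 20, 50), upper = c(20, 50, 999))

# Recode the sentinel 999 as an open (infinite) upper bound.
wtp$upper[wtp$upper == 999] <- Inf

# Label each interval by the kind of censoring it represents.
wtp$censoring <- ifelse(wtp$lower == 0 & is.infinite(wtp$upper), "uninformative",
                 ifelse(wtp$lower == 0, "left",
                 ifelse(is.infinite(wtp$upper), "right", "interval")))
wtp
```

Using `Inf` instead of 999 avoids treating the sentinel as a real willingness-to-pay value in later arithmetic.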
a cross-section
number of observations : 4956
observation : individuals
country : United States
data(Ketchup)
A dataframe containing :
individuals identifiers
purchase identifiers
one of heinz, hunts, delmonte, stb (store brand)
price of brand z
Kim, Byong–Do, Robert C. Blattberg and Peter E. Rossi (1995) “Modeling the distribution of price sensitivity and implications for optimal retail pricing”, Journal of Business and Economic Statistics, 13(3), 291.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Catsup, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
annual observations from 1920 to 1941
number of observations : 22
observation : country
country : United States
data(Klein)
A time series containing :
consumption
corporate profits
private wage bill
investment
previous year's capital stock
GNP
government wage bill
government spending
taxes
Klein, L. (1950) Economic fluctuations in the United States, 1921-1941, New York, John Wiley and Sons.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F15.1.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 532 observations from 1979 to 1988
number of observations : 5320
data(LaborSupply)
A dataframe containing :
log of annual hours worked
log of hourly wage
number of children
age
bad health
id
year
Ziliak, Jim (1997) “Efficient Estimation With Panel Data when Instruments are Predetermined: An Empirical Comparison of Moment-Condition Estimators”, Journal of Business and Economic Statistics, 419–431.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge University Press, pp. 708–15, 754–6.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
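The LaborSupply header implies a balanced panel: 532 units observed in each of the 10 years from 1979 to 1988 gives 5320 rows. The sketch below shows one way to check that structure; it uses a synthetic `panel` data frame with assumed column names `id` and `year` rather than the dataset itself.

```r
# Synthetic identifiers: 532 units, each observed in every year 1979-1988.
panel <- expand.grid(year = 1979:1988, id = seq_len(532))

n_ids   <- length(unique(panel$id))
n_years <- length(unique(panel$year))

# A panel is balanced when every id appears exactly once per year.
balanced <- all(table(panel$id, panel$year) == 1)
c(ids = n_ids, years = n_years, rows = nrow(panel))
```

The same check applies to the other panels in this manual whose row count equals units times years.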
a cross-section from 1996
number of observations : 569
observation : production units
country : Belgium
data(Labour)
A dataframe containing :
total fixed assets, end of 1995 (in 1000000 euro)
number of workers (employment)
value added (in 1000000 euro)
wage costs per worker (in 1000 euro)
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 4.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
annual observations from 1947 to 1962
number of observations : 16
observation : country
country : United States
data(Longley)
A time series containing :
employment (1,000s)
GNP deflator
nominal GNP (millions)
armed forces
Longley, J. (1967) “An appraisal of least squares programs from the point of view of the user”, Journal of the American Statistical Association, 62, 819-841.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F4.2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
annual observations from 1791 to 1990
number of observations : 200
observation : country
country : United Kingdom
data(LT)
A time series containing :
US dollar / pound exchange rate
US wholesale price index, normalized to 100 for 1914
UK wholesale price index, normalized to 100 for 1914
Lothian, J. and M. Taylor (1996) “Real exchange rate behavior: the recent float from the perspective of the past two centuries”, Journal of Political Economy, 104, 488-509.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 9, 613-621.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1959-1 to 2000-4
number of observations : 168
observation : country
country : United States
data(Macrodat)
A time series containing :
unemployment rate (average of months in quarter)
CPI (average of months in quarter)
federal funds interest rate (last month in quarter)
3 month treasury bill interest rate (last month in quarter)
1 year treasury bond interest rate (last month in quarter)
dollar / Pound exchange rate (last month in quarter)
real GDP for Japan
Bureau of Labor Statistics, OECD, Federal Reserve.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 12 and 14.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 545 observations from 1980 to 1987
number of observations : 4360
observation : individuals
country : United States
data(Males)
A dataframe containing :
identifier
year
years of schooling
years of experience (=age-6-school)
wage set by collective bargaining ?
a factor with levels (black, hisp, other)
married ?
health problem ?
log of hourly wage
a factor with 12 levels
a factor with 9 levels
a factor with levels (rural area, north east, northern central, south)
National Longitudinal Survey (NLS Youth Sample).
Vella, F. and M. Verbeek (1998) “Whose wages do unions raise ? A dynamic model of unionism and wage”, Journal of Applied Econometrics, 13, 163–183.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 10.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
annual observations from 1947 to 1971
number of observations : 25
observation : country
country : United States
data(ManufCost)
A time series containing :
cost index
capital cost share
labor cost share
energy cost share
materials cost share
capital price
labor price
energy price
materials price
Berndt, E. and D. Wood (1975) “Technology, prices and the derived demand for energy”, Review of Economics and Statistics, 57, 376-384.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F14.1.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1983 to 1986
number of observations : 609
observation : individuals
country : United States
data(Mathlevel)
A dataframe containing :
highest level of math attained, an ordered factor with levels 170, 171a, 172, 171b, 172b, 221a, 221b
SAT math score
foreign language proficiency ?
male, female
one of other, eco, oss (other social sciences), ns (natural sciences), hum (humanities)
number of courses in advanced math (0 to 3)
number of courses in physics (0 to 2)
number of courses in chemistry (0 to 2)
Butler, J.S., T. Aldrich Finegan and John J. Siegfried (1998) “Does more calculus improve student learning in intermediate micro and macroeconomic theory ?”, Journal of Applied Econometrics, 13(2), April, 185–202.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1997-1998
number of observations : 220
observation : schools
country : United States
data(MCAS)
A dataframe containing :
district code (numerical)
municipality (name)
district name
spending per pupil, regular
spending per pupil, special needs
spending per pupil, bilingual
spending per pupil, occupational
spending per pupil, total
students per computer
special education students
eligible for free or reduced price lunch
students per teacher
per capita income
4th grade score (math+english+science)
8th grade score (math+english+science)
average teacher salary
percent English learners
Massachusetts Comprehensive Assessment System (MCAS), Massachusetts Department of Education, 1990 U.S. Census.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 7.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/
number of observations : 5574
data(MedExp)
A time series containing :
annual medical expenditures in constant dollars excluding dental and outpatient mental
log(coinsrate+1) where coinsurance rate is 0 to 100
individual deductible plan ?
log(annual participation incentive payment) or 0 if no payment
log(max(medical deductible expenditure)) if IDP=1 and MDE>1 or 0 otherwise
physical limitation ?
number of chronic diseases
self–rated health (excellent,good,fair,poor)
log of annual family income (in $)
log of family size
years of schooling of household head
exact age
sex (male,female)
age less than 18 ?
is household head black ?
Deb, P. and P.K. Trivedi (2002) “The Structure of Demand for Medical Care: Latent Class versus Two-Part Models”, Journal of Health Economics, 21, 601–625.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge University Press.
DoctorContacts, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
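Several MedExp variables are stored on a log scale, for example log(coinsrate+1). The sketch below illustrates that transformation and how it might be inverted; the vector `coinsrate` holds hypothetical percentages, and the names `lc` and `recovered` are illustrative only.

```r
coinsrate <- c(0, 25, 50, 95, 100)  # hypothetical coinsurance rates in percent
lc <- log(coinsrate + 1)            # the log(coinsrate+1) transformation

# expm1(x) computes exp(x) - 1, which inverts log(x + 1).
recovered <- expm1(lc)
all.equal(recovered, coinsrate)
```

Adding 1 before taking logs keeps a zero coinsurance rate finite, since it maps to log(1) = 0.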
a cross-section
number of observations : 27
observation : regional
country : United States
data(Metal)
A dataframe containing :
output
labor input
capital input
Aigner, D., K. Lovell and P. Schmidt (1977) “Formulation and estimation of stochastic frontier production models”, Journal of Econometrics, 6, 21-37.
Hildebrand, G. and T. Liu (1957) Manufacturing production functions in the United States, Ithaca, N.Y.: Cornell University Press.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F6.1.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1950-2 to 1990-12
number of observations : 491
observation : country
country : United States
data(Mishkin)
A time series containing :
one-month inflation rate (in percent, annual rate)
three-month inflation rate (in percent, annual rate)
one-month T-bill rate (in percent, annual rate)
three-month T-bill rate (in percent, annual rate)
CPI for urban consumers, all items (the 1982-1984 average is set to 100)
Mishkin, F. (1992) “Is the Fisher effect for real ?”, Journal of Monetary Economics, 30, 195-215.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 2, 176-184.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 453
observation : individuals
data(Mode)
A dataframe containing :
one of car, carpool, bus or rail
cost of mode z
time of mode z
Kenneth Train's home page : https://eml.berkeley.edu/~train/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 840
observation : individuals
country : Australia
data(ModeChoice)
A dataframe containing :
choice : air, train, bus or car
terminal waiting cost time, 0 for car
in vehicle cost-cost component
travel time in vehicle
generalized cost measure
household income
party size in mode chosen
Greene, W.H. and D. Hensher (1997) Multinomial logit and discrete choice models in Greene, W. H. (1997) LIMDEP version 7.0 user's manual revised, Plainview, New York, Econometric Software, Inc.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F21.2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1982
number of observations : 50
observation : country
country : United States
data(Mofa)
A dataframe containing :
capital expenditures made by the MOFAs of nonbank U.S. corporations in finance, insurance and real estate. Source: "U.S. Direct Investment Abroad: 1982 Benchmark Survey data." Table III.C 6.
gross domestic product. Source: "World Bank, World Development Report 1984." Table 3. (This variable is scaled by a factor of 1/100,000)
sales made by the majority owned foreign affiliates of nonbank U.S. parents in finance, insurance and real estate. Source: "U.S. Direct Investment Abroad: 1982 Benchmark Survey Data." Table III.D 3. (This variable is scaled by a factor of 1/100)
the number of U.S. affiliates in the host country. Source: "U.S. Direct Investment Abroad: 1982 Benchmark Survey Data." Table 5. (This variable is scaled by a factor of 1/100)
net income earned by MOFAs of nonbank U.S. corporations operating in the nonbanking financial sector of the host country. Source: "U.S. Direct Investment Abroad: 1982 Benchmark Survey Data." Table III.D 6. (This variable is scaled by a factor of 1/10)
Ioannatos, Petros E. (1995) “Censored regression estimation under unobserved heterogeneity : a stochastic parameter approach”, Journal of Business and Economic Statistics, 13(3), July, 327–335.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1967-1 to 1998-4
number of observations : 128
observation : country
country : Canada
data(Money)
A time series containing :
log of the real money supply
the log of GDP, in 1992 dollars, seasonally adjusted
the log of the price level
the 3-month treasury bill rate
CANSIM Database of Statistics Canada.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 7 and 8.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1954–01 to 1994–12
number of observations : 164
country : United States
data(MoneyUS)
A time series containing :
log of real M1 money stock
quarterly inflation rate (change in log prices), % per year
commercial paper rate, % per year
log real GDP (in billions of 1987 dollars)
treasury bill rate
Hoffman, D.L. and R.H. Rasche (1996) “Assessing forecast performance in a cointegrated system”, Journal of Applied Econometrics, 11, 495–517.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 9.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
annual observations from 1900 to 1989
number of observations : 90
observation : country
country : United States
data(Mpyr)
A time series containing :
natural log of M1
natural log of the net national product price deflator
natural log of the net national product
the commercial paper rate in percent at an annual rate
Stock, J. and M. Watson (1988) “Testing for common trends”, Journal of the American Statistical Association, 83, 1097-1107.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 10, 665-667.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 753
observation : individuals
country : United States
data(Mroz)
A dataframe containing :
work at home in 1975? (Same as carData::Mroz[['lfp']] = labor force participation.)
wife's hours of work in 1975
number of children less than 6 years old in household (Same as carData::Mroz['k5'].)
number of children between ages 6 and 18 in household (Same as carData::Mroz['k618'].)
wife's age
wife's educational attainment, in years
wife's average hourly earnings, in 1975 dollars
wife's wage reported at the time of the 1976 interview (not= 1975 estimated wage)
husband's hours worked in 1975
husband's age
husband's educational attainment, in years
husband's wage, in 1975 dollars
family income, in 1975 dollars
wife's mother's educational attainment, in years
wife's father's educational attainment, in years
unemployment rate in county of residence, in percentage points
lives in large city (SMSA) ?
actual years of wife's previous labor market experience
These data seem to have come from the same source as carData::Mroz, though each data set has variables not in the other. The variables that are shared have different names.
On 2019-11-04 Bruno Rodrigues explained that Ecdat::Mroz['work'] had the two labels incorrectly swapped, and wooldridge::mroz['inlf'] was correct; wooldridge matches carData::Mroz['lfp'].
Mroz, T. (1987) “The sensitivity of an empirical model of married women's hours of work to economic and statistical assumptions”, Econometrica, 55, 765-799.
1976 Panel Study of Income Dynamics.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F4.1.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Mroz, mroz
head(Mroz)
# If 'car' and / or 'carData' is also in the path,
# then use the following to be clear that
# you want this version:
head(Ecdat::Mroz)
a panel of 265 observations from 1979 to 1987
number of observations : 2385
observation : regional
country : Sweden
data(MunExp)
A dataframe containing :
identification
date
expenditure
revenue from taxes and fees
grants from Central Government
Dahlberg, M. and E. Johansson (2000) “An examination of the dynamic behavior of local government using GMM boot-strapping methods”, Journal of Applied Econometrics, 21, 333-355.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F18.1.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1963-3 to 1975-4
number of observations : 50
observation : country
country : United States
data(MW)
A time series containing :
the rate of growth of real U.S. disposable income, seasonally adjusted
the U.S. treasury bill rate
MacKinnon, J. G. and H. T. White (1985) “Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties”, Journal of Econometrics, 29, 305-325.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 5.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1987
number of observations : 312
observation : individuals
country : Portugal
data(NaturalPark)
A dataframe containing :
initial bid, in euro
higher bid
lower bid
a factor with levels (nn, ny, yn, yy)
age in 6 classes
a factor with levels (male,female)
income in 8 classes
Nunes, Paulo (2000) Contingent Valuation of the Benefits of natural areas and its warmglow component, PhD thesis 133, FETEW, KU Leuven.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 7.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
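The NaturalPark variables describe a double-bounded survey design: an initial bid, a follow-up bid (higher or lower), and a two-letter answer code. The sketch below shows how such answers could be turned into willingness-to-pay intervals. It assumes that the first letter is the answer to the initial bid and the second the answer to the follow-up, and that willingness to pay is non-negative; the column names `bid1`, `bidh`, `bidl`, `answers` and all bid values are hypothetical.

```r
# Hypothetical bids and answers; column names are assumed for illustration.
cv <- data.frame(
  bid1    = c(12, 12, 24, 48),
  bidh    = c(24, 24, 48, 120),
  bidl    = c(6, 6, 12, 24),
  answers = c("nn", "ny", "yn", "yy")
)

# Lower bound: the largest bid the respondent accepted (0 if none).
cv$wtp_lower <- with(cv, ifelse(answers == "yy", bidh,
                         ifelse(answers == "yn", bid1,
                         ifelse(answers == "ny", bidl, 0))))

# Upper bound: the smallest bid the respondent refused (Inf if none).
cv$wtp_upper <- with(cv, ifelse(answers == "yy", Inf,
                         ifelse(answers == "yn", bidh,
                         ifelse(answers == "ny", bid1, bidl))))
cv
```

Intervals of this form are the usual input to an interval-censored regression of willingness to pay.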
a cross-section from 1955
number of observations : 159
observation : production units
country : United States
data(Nerlove)
A dataframe containing :
total cost
total output
wage rate
cost share for labor
capital price index
cost share for capital
fuel price
cost share for fuel
Nerlove, M. (1963) Returns to scale in electricity industry in Christ, C. ed. (1963) Measurement in Economics: Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld, Stanford, California, Stanford University Press.
Christensen, L. and W. H. Greene (1976) “Economies of scale in U.S. electric power generation”, Journal of Political Economy, 84, 655-676.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F14.2.
Hayashi, F. (2000) Econometrics, Princeton University Press, https://archive.org/details/econometrics0000haya, chapter 1, 76-84.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
A data.frame describing names containing character codes rare or non-existent in standard English text, e.g., with various accent marks that may not be coded consistently in different locales or by different software.
data(nonEnglishNames)
A data.frame with two columns:
a character vector containing names that often have non-standard characters with the non-standard characters replaced by "_"
a character vector containing a standard English-character
translation of nonEnglish
grepNonStandardCharacters, subNonStandardCharacters
data(nonEnglishNames)
all.equal(ncol(nonEnglishNames), 2)
Data on the 9 nuclear-weapon states as of April 2019.
data(nuclearWeaponStates)
A dataframe containing :
The name of the country (character). The former USSR is listed here as Russia.
ISO 3166-1 alpha-2 two-letter country codes (character).
Date of first test of a nuclear weapon.
For Israel, which has not publicly acknowledged that it has nuclear weapons, this uses the Date of the Vela Incident.
lubridate::decimal_date(firstTest)
c(NA, diff(firstTestYr))
number of nuclear weapons
number of weapons for which the yield is: (nYieldNA) = unknown or variable, (nLowYield) = at most 15 kt (kilotons), the size of the Hiroshima bomb, (nMidYield) = greater than 15 but less than 50 kt, and (nHighYield) = at least 50 kt.
popM = estimated population in millions for year popYr, per the Wikipedia article for the indicated country on 2020-02-05.
GDP_B = nominal Gross Domestic Product in billions of US dollars for year GDPyr, per the Wikipedia article for the indicated country on 2020-02-05.
Country code used by the Maddison Project.
Estimated date of the substantive commitment of the country to obtain nuclear weapons. See 'Details' below.
lubridate::decimal_date(startNucPgm)
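The two derived columns defined above as `lubridate::decimal_date(firstTest)` and `c(NA, diff(firstTestYr))` can be sketched in base R as follows. This is only an illustration under stated assumptions: `decimal_date_base` is a hypothetical stand-in for `lubridate::decimal_date()` applied to `Date` input, and the three dates are well-known first-test dates typed in by hand, not read from the dataset.

```r
# Illustrative first-test dates (US, USSR, UK); typed by hand, not taken
# from data(nuclearWeaponStates).
ctry <- c("US", "RU", "GB")
firstTest <- as.Date(c("1945-07-16", "1949-08-29", "1952-10-03"))

# Hypothetical base-R stand-in for lubridate::decimal_date() on Date input:
# year plus the fraction of the year elapsed at the start of the day.
decimal_date_base <- function(d) {
  yr <- as.integer(format(d, "%Y"))
  start <- as.Date(paste0(yr, "-01-01"))
  end <- as.Date(paste0(yr + 1L, "-01-01"))
  yr + as.numeric(d - start) / as.numeric(end - start)
}

firstTestYr <- decimal_date_base(firstTest)

# Years between successive first tests; NA for the first country.
yearsSinceLastFirstTest <- c(NA, diff(firstTestYr))

data.frame(ctry, firstTest, firstTestYr, yearsSinceLastFirstTest)
```

The same pattern would apply to `startNucPgm` and `startNucPgmYr`.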
Most of the contents of this dataset are easily defined and not controversial. That's not true for the date upon which each country started its nuclear program, coded in startNucPgm and startNucPgmYr.
The following summarizes the rationale behind the selection of the date for each country in this dataset.
US
The Manhattan Project started in stages. The possibility of a nuclear weapon was first brought to the attention of the US government by a letter, officially from Albert Einstein, to US President Roosevelt dated 1939-08-02. The project was officially authorized 1942-01-19. We use this later date as the date of the start of the US nuclear-weapons program.
RU
Russian scientists were studying uranium before the First World War, but the work didn't get much official attention until the atomic bombing of Hiroshima, 1945-08-06. Shortly thereafter, on 1945-08-22, Stalin appointed Lavrentiy Beria to head the project. Beria was an able administrator and guided the project to fruition in four years.
GB
British scientists were among the leaders in nuclear technology in the late nineteenth century. They welcomed German-Jewish physicists Otto Frisch and Rudolf Peierls, who estimated in 1939 that only a few pounds or kilograms of uranium-235 might be enough to achieve a critical mass, whereas several tonnes of natural uranium would likely be required. Because of the war, this information was passed to scientists in the United States, who developed it into the bomb dropped on Hiroshima 1945-08-06, with help from British and Canadian scientists and Canadian industry. After the war, the US refused to share much of the information developed in the Manhattan Project with the British. British elites felt disrespected by the US. On 1947-01-08, the British government decided to initiate its own nuclear-weapons program.
FR
France was one of the nuclear pioneers, going back to the work of Marie Curie and Henri Becquerel in the 1890s. In 1956 the French were deeply offended by the refusal of the US to support them in the Suez Crisis. On France and Israel secretly agreed to collaborate in the development of nuclear weapons.
CN
Mao Zedong reportedly decided to begin a Chinese nuclear-weapons program during the First Taiwan Strait Crisis of 1954–1955. That crisis was resolved shortly after 1955-04-23, when China stated it was willing to negotiate. We use this as the date of the start of China's nuclear weapons program.
IN
Indian scientists started research on nuclear weapons before Indian independence but didn't make a substantive commitment to actually making a nuclear weapon until they lost territory to China in the Sino-Indian War that ended 1962-11-21. We use that date as the date for the initiation of India's nuclear-weapons program.
IL
Israel's first Prime Minister David Ben-Gurion was reportedly "nearly obsessed" with obtaining nuclear weapons to prevent the Holocaust from recurring. For present purposes, we use 1949-03-10, the date of the end of the 1948 Arab–Israeli War, as the beginning of Israel's nuclear-weapons program.
PK
Pakistan's elite were totally humiliated by their defeat in the Indo-Pakistani War of 1971 (1971-12-03 to 1971-12-16). That war ended the Bangladesh Liberation War, by which Pakistan lost over half its population and 14 percent of its land area. Prime Minister Zulfiqar Ali Bhutto compared Pakistan's surrender to the Treaty of Versailles, which Germany was forced to sign in 1919. Bhutto observed 1972-01-20 that a Pakistani scientist had been part of the Manhattan Project, and Pakistani scientists could do the same in Pakistan. While significant funding seems not to have come until later, 1972-01-20 is the date we use here for the beginning of Pakistan's nuclear-weapons program.
KP
The 1950-1953 Korean War ended with a cease-fire, not an official end to hostilities. Since then North Korea has perceived nuclear threats from the US. In 1956 the Soviet Union began giving North Korean scientists and engineers "basic knowledge" to help them initiate a nuclear program. About 1962, North Korea committed itself to what it called "all-fortressization", which was the beginning of the hyper-militarized North Korea of today. North Korea reportedly asked the Soviet Union for help with a nuclear weapons program in 1963 and was turned down. China turned down similar requests in 1964 and 1974. Around 1980 North Korea began mining its own supplies of uranium and building its own factory to produce yellowcake. (See also Bolton, 2012.) For lack of something better, we use 1980-01-01 as the start of North Korea's nuclear weapons program. They clearly wanted nuclear weapons much earlier but didn't seem to move seriously in the direction of developing nuclear weapons until around
Overview from World Nuclear Weapon Stockpile
firstTest from Wikipedia, "List of states with nuclear weapons"
US from Hans M. Kristensen & Robert S. Norris (2018) United States nuclear forces, 2018, Bulletin of the Atomic Scientists, 74:2, 120-131, doi:10.1080/00963402.2018.1438219
Russia from Hans M. Kristensen & Matt Korda (2019) Russian nuclear forces, 2019, Bulletin of the Atomic Scientists, 75:2, 73-84, doi:10.1080/00963402.2019.1580891
UK from Robert S. Norris & Hans M. Kristensen (2013) The British nuclear stockpile, 1953-2013, Bulletin of the Atomic Scientists, 69:4, 69-75, doi:10.1177/0096340213493260
France from Robert S. Norris & Hans M. Kristensen (2008) French nuclear forces, 2008, Bulletin of the Atomic Scientists, 64:4, 52-54, 57, doi:10.2968/064004012
China from Hans M. Kristensen & Robert S. Norris (2018) Chinese nuclear forces, 2018, Bulletin of the Atomic Scientists, 74:4, 289-295, doi:10.1080/00963402.2018.1486620
India from Hans M. Kristensen & Robert S. Norris (2017) Indian nuclear forces, 2017, Bulletin of the Atomic Scientists, 73:4, 205-209, doi:10.1080/00963402.2017.1337998
Israel from Hans M. Kristensen & Robert S. Norris (2014) Israeli nuclear weapons, 2014, Bulletin of the Atomic Scientists, 70:6, 97-115, doi:10.1177/0096340214555409
Pakistan from Hans M. Kristensen, Robert S. Norris & Julia Diamond (2018) Pakistani nuclear forces, 2018, Bulletin of the Atomic Scientists, 74:5, 348-358, doi:10.1080/00963402.2018.1507796
North Korea from Hans M. Kristensen & Robert S. Norris (2018) North Korean nuclear capabilities, 2018, Bulletin of the Atomic Scientists, 74:1, 41-51, doi:10.1080/00963402.2017.1413062
Derek Bolton (2012) North Korea's Nuclear Program (2012-08, American Security Project, accessed 2020-07-15) https://www.americansecurityproject.org/ASP%20Reports/Ref%200072%20-%20North%20Korea%E2%80%99s%20Nuclear%20Program%20.pdf
data(nuclearWeaponStates)
plot(yearsSinceLastFirstTest~firstTest, nuclearWeaponStates, type='h', xlab='', ylab='')
with(nuclearWeaponStates, text(firstTest, yearsSinceLastFirstTest, ctry))
Proportion of the US population in each of the 283 OCC1950 occupation codes for each year in the Integrated Public Use Microdata Series (IPUMS) - US database.
data("OCC1950")
A matrix with one row for each of 281 OCC1950 occupation codes in IPUMS-US and one column for each year in their dataset as of 2020-03-17, being c(1850:1880, 1900:2000, 2001:2016).
This dataset was created using the code in the IPUMS vignette in the Ecfun package using tapply(HHWT, IPUMSdata[c("OCC1950", "YEAR")], sum), then normalizing so the total for each year was 1.
In fact, a plot showed that the sums of HHWT for each year were close to USGDPpresidents$population.K*1000 except for 1970, when they were double.
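The aggregation and normalization described above can be sketched on a toy extract. This is a minimal illustration, not the vignette's code: the `IPUMSdata` rows below are invented, and `sweep()` is one assumed way of rescaling each year's column to sum to 1.

```r
# Toy stand-in for an IPUMS extract; all values are invented.
IPUMSdata <- data.frame(
  OCC1950 = c(0, 0, 100, 100, 0, 100),
  YEAR    = c(1850, 1850, 1850, 1860, 1860, 1860),
  HHWT    = c(100, 50, 50, 300, 100, 100)
)

# Sum household weights by occupation code (rows) and year (columns).
wtSum <- with(IPUMSdata,
              tapply(HHWT, IPUMSdata[c("OCC1950", "YEAR")], sum))

# Normalize so that the total for each year is 1.
yearTotal <- colSums(wtSum, na.rm = TRUE)
propByYear <- sweep(wtSum, 2, yearTotal, "/")

colSums(propByYear, na.rm = TRUE)
```

With real data, `yearTotal` is the quantity that the text compares with `USGDPpresidents$population.K*1000`.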
Universe Note from the IPUMS documentation for their variable OCC1950: "New Workers" are persons seeking employment for the first time, who had not yet secured their first job.
OCC1950 applies the 1950 Census Bureau occupational classification system to occupational data, to enhance comparability across years. For pre-1940 samples created at the University of Minnesota, the alphabetic responses supplied by enumerators were directly coded into the 1950 classification. For other samples, the information in the variable OCC was recoded into the 1950 classification. Codes above 970 are non-occupational responses retained in the historical census samples or blank/unknown. The design of OCC1950 is described at length in "Integrated Occupation and Industry Codes and Occupational Standing Variables in the IPUMS." The composition of the 1950 occupation categories is described in detail in U.S. Bureau of the Census, Alphabetic Index of Occupations and Industries: 1950 (Washington D.C., 1950).
In 1850-1880, any laborer with no specified industry in a household with a farmer is recoded into farm labor. In 1860-1900, any woman with an occupational response of "housekeeper" enters the non-occupational category "keeping house" if she is related to the head of household. Cases affected by these imputation procedures are identified by an appropriate data quality flag (present in the raw IPUMS data but ignored for this summary).
A parallel variable called OCC1990, available for the samples from 1950 onward, codes occupations into a simplified version of the 1990 occupational coding scheme." [OCC1990 was ignored for the present purposes, because it is not coded for data prior to 1950.]
NOTE: In the 2020-03-17 extraction, there were 283 OCC1950 codes documented, but only 281 of them were actually in the data I got. The codes for "Not yet classified" and "New Workers" were not used.
Steven Ruggles, Sarah Flood, Ronald Goeken, Josiah Grover, Erin Meyer, Jose Pacas, and Matthew Sobek (2020) doi:10.18128/D010.V10.0 IPUMS USA: Version 10.0 [dataset]. Minneapolis, MN: IPUMS.
data(OCC1950)
a cross-section
number of observations : 4406
observation : individuals
country : United States
data(OFP)
A dataframe containing :
number of physician office visits
number of nonphysician office visits
number of physician outpatient visits
number of nonphysician outpatient visits
number of emergency room visits
number of hospitalizations
number of chronic conditions
the person has a condition that limits activities of daily living ?
age in years (divided by 10)
is the person African–American ?
is the person male ?
is the person married ?
number of years of education
family income in units of $10,000
is the person employed ?
is the person covered by private health insurance?
is the person covered by medicaid ?
the region (noreast, midwest, west)
self-perceived health (excellent, poor, other)
Deb, P. and P.K. Trivedi (1997) “Demand for Medical Care by the Elderly: A Finite Mixture Approach”, Journal of Applied Econometrics, 12, 313-326.
Cameron, A.C. and Trivedi P.K. (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html, chapter 6.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1969 to 1992
number of observations : 53
observation : production units
country : United Kingdom
data(Oil)
A dataframe containing :
duration of the appraisal lag in months (time span between discovery of an oil field and beginning of development, i.e. approval of annex B).
size of recoverable reserves in millions of barrels
depth of the sea in metres
size of recoverable gas reserves in billions of cubic feet
equity market value (in 1991 million pounds) of the company operating the oil field
real after–tax oil price measured at time of annex B approval
volatility of the real oil price process, measured as the squared recursive standard errors of the regression of $p_t - p_{t-1}$ on a constant
adaptive expectations (with parameter theta=0.97) for the real after–tax oil prices formed at the time of annex B approval
volatility of the adaptive expectations (with parameter theta=0.97) for real after–tax oil prices, measured as the squared recursive standard errors of the regression of $p_t$ on $p_t^e(\theta)$
adaptive expectations (with parameter theta=0.98) for the real after–tax oil prices formed at the time of annex B approval
volatility of the adaptive expectations (with parameter theta=0.98) for real after–tax oil prices, measured as the squared recursive standard errors of the regression of $p_t$ on $p_t^e(\theta)$
Favero, Carlo A., M. Hashem Pesaran and Sunil Sharma (1994) “A duration model of irreversible oil investment : theory and empirical evidence”, Journal of Applied Econometrics, 9(S), S95–S112.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1948-01 to 2001-06
number of observations : 642
observation : country
country : United States
data(Orange)
A time series containing :
producer price for frozen orange juice
producer price index for finished goods
freezing degree days (from daily minimum temperature recorded at Orlando area airports)
U.S. Bureau of Labor Statistics for PPIOJ and PWFSA, National Oceanic and Atmospheric Administration (NOAA) of the U.S. Department of Commerce for fdd.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 872
observation : individuals
country : Switzerland
data(Participation)
A dataframe containing :
labour force participation ?
the log of nonlabour income
age in years divided by 10
years of formal education
the number of young children (younger than 7)
number of older children
foreigner ?
Gerfin, Michael (1996) “Parametric and semiparametric estimation of the binary response”, Journal of Applied Econometrics, 11(3), 321-340.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 11.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 346 observations from 1975 to 1979
number of observations : 1730
observation : production units
country : United States
data(PatentsHGH)
A dataframe containing :
firm index
year
Compustat's identifying number for the firm (Committee on Uniform Security Identification Procedures number)
a two-digit code for the applied R&D industrial classification (roughly that in Bound, Cummins, Griliches, Hall, and Jaffe, in the Griliches R&D, Patents, and Productivity volume)
is the firm in the scientific sector ?
the logarithm of the book value of capital in 1972.
the sum of patents applied for between 1972-1979.
the logarithm of R&D spending during the year (in 1972 dollars)
the logarithm of R&D spending (one year lag)
the logarithm of R&D spending (two years lag)
the logarithm of R&D spending (three years lag)
the logarithm of R&D spending (four years lag)
the logarithm of R&D spending (five years lag)
the number of patents applied for during the year that were eventually granted
the number of patents (one year lag)
the number of patents (two years lag)
the number of patents (three years lag)
the number of patents (four years lag)
Hall, Bronwyn, Zvi Griliches and Jerry Hausman (1986) “Patents and R&D: Is There a Lag?”, International Economic Review, 27, 265-283.
Cameron, A.C. and Trivedi P.K. (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html, chapter 9.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 792–5.
PatentsRD, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
a panel of 181 observations from 1983 to 1991
number of observations : 1629
observation : production units
country : world
data(PatentsRD)
A dataframe containing :
year
firm's id
firm's main industry sector, one of aero (aerospace), chem (chemistry), comput (computer), drugs, elec (electricity), food, fuel (fuel and mining), glass, instr (instruments), machin (machinery), metals, other, paper, soft (software), motor (motor vehicles)
geographic area, one of eu (European Union), japan, usa, rotw (rest of the world)
numbers of European patent applications
log of R&D expenditures
log of spillovers
Cincera, Michele (1997) “Patents, R & D and technological spillovers at the firm level : some evidence from econometric count models for panel data”, Journal of Applied Econometrics, 12(3), May–June, 265–280.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 7.
PatentsHGH, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
annual observations from 1800 to 1931
number of observations : 132
observation : country
country : United States
data(PE)
A time series containing :
S&P composite stock price index
S&P composite earnings index
Robert Shiller.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 8.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Data from McChesney and Nichols (2010) on domestic and international knowledge in Denmark, Finland, the UK and the US among college graduates, people with some college, and roughly 12th grade only.
data(politicalKnowledge)
A data.frame containing 12 columns and 4 rows.
a character vector of Denmark, Finland, UK, and US, being the four countries compared in this data set.
percent correct answers to calibrated questions regarding knowledge of prominent items in domestic news in a survey of residents of the four countries among college graduates (ending ".c"), some college (".sc") and high school (".hs"). Source: McChesney and Nichols (2010, chapter 1, chart 8).
percent correct answers to calibrated questions regarding knowledge of prominent items in international news in a survey of residents of the four countries by education level as for DomesticKnowledge. Source: McChesney and Nichols (2010, chapter 1, chart 7).
average of domestic and international knowledge
Per capita spending on public media in 2007 in US dollars from McChesney and Nichols (2010, chapter 4, chart 1)
Spending on public media relative to the US, being PublicMediaPerCapita / PublicMediaPerCapita[4].
Spencer Graves
Robert W. McChesney and John Nichols (2010) The Death and Life of American Journalism (Nation Books)
##
## 1.  Combine first 2 rows
##
data(politicalKnowledge)
pk <- politicalKnowledge[-1,]
pk[1, -1] <- ((politicalKnowledge[1, -1] +
               politicalKnowledge[2, -1])/2)
pk[1, 'country'] <- 'DK-FI'

##
## 2.  plot
##
xlim <- range(pk[, 'PublicMediaPerCapita'])
ylim <- 100*range(pk[2:7])
text.cex <- 2
# to label the lines
(US.UK <- (pk[2, -1]+pk[3, -1])/2)

#png('Knowledge v. public media.png')
op <- par(mar=c(5, 7, 4, 2)+.1)
plot(c(0, 110), 100*ylim, type='n', axes=FALSE,
     xlab='public media $ per capita',
     ylab='Political Knowledge\n(% of standard questions)',
     cex.lab=2)
axis(1, cex.axis=2)
axis(2, las=2, cex.axis=2)
with(pk, text(PublicMediaPerCapita, 100*PoliticalKnowledge.hs, country,
              cex=text.cex, xpd=NA,
              col=c('forestgreen', 'orange', 'red')))
with(pk, text(PublicMediaPerCapita, 100*PoliticalKnowledge.sc, country,
              cex=text.cex, xpd=NA,
              col=c('forestgreen', 'orange', 'red')))
with(pk, text(PublicMediaPerCapita, 100*PoliticalKnowledge.c, country,
              cex=text.cex, xpd=NA,
              col=c('forestgreen', 'orange', 'red')))
with(pk, lines(PublicMediaPerCapita, 100*PoliticalKnowledge.hs,
               type='b', pch=' '))
with(pk, lines(PublicMediaPerCapita, 100*PoliticalKnowledge.sc,
               type='b', pch=' '))
with(pk, lines(PublicMediaPerCapita, 100*PoliticalKnowledge.c,
               type='b', pch=' '))
with(US.UK, text(PublicMediaPerCapita, 100*PoliticalKnowledge.hs,
                 'High School\nor less', srt=37, cex=1.5))
with(US.UK, text(PublicMediaPerCapita, 100*PoliticalKnowledge.sc,
                 'some\ncollege', srt=10.5, cex=1.5))
with(US.UK, text(PublicMediaPerCapita, 100*PoliticalKnowledge.c,
                 "Bachelor's\nor more", srt=-1, cex=1.5))
par(op)
#dev.off()

##
## redo for Wikimedia commons
## without English axis labels
## to facilitate multilingual use
##
#svg('Knowledge v. public media.svg')
op <- par(mar=c(3,3,2,2)+.1)
plot(c(0, 110), 100*ylim, type='n', axes=FALSE,
     xlab='', ylab='', cex.lab=2)
axis(1, cex.axis=2)
axis(2, las=2, cex.axis=2)
with(pk, text(PublicMediaPerCapita, 100*PoliticalKnowledge.hs, country,
              cex=text.cex, xpd=NA,
              col=c('forestgreen', 'orange', 'red')))
with(pk, text(PublicMediaPerCapita, 100*PoliticalKnowledge.sc, country,
              cex=text.cex, xpd=NA,
              col=c('forestgreen', 'orange', 'red')))
with(pk, text(PublicMediaPerCapita, 100*PoliticalKnowledge.c, country,
              cex=text.cex, xpd=NA,
              col=c('forestgreen', 'orange', 'red')))
with(pk, lines(PublicMediaPerCapita, 100*PoliticalKnowledge.hs,
               type='b', pch=' '))
with(pk, lines(PublicMediaPerCapita, 100*PoliticalKnowledge.sc,
               type='b', pch=' '))
with(pk, lines(PublicMediaPerCapita, 100*PoliticalKnowledge.c,
               type='b', pch=' '))
par(op)
#dev.off()
weekly observations from 1975 to 1989
number of observations : 778
observation : country
country : United Kingdom
data(Pound)
A dataframe containing :
the date of the observation (19850104 is January 4, 1985)
the ask price of the dollar in units of Pound in the spot market on Friday of the current week
the ask price of the dollar in units of Pound in the 30-day forward market on Friday of the current week
the bid price of the dollar in units of Pound in the spot market on the delivery date on a current forward contract
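Because the date is stored as a number of the form YYYYMMDD, it usually needs converting before any time-based work. A minimal sketch, assuming the values are stored as numbers like the example above; `date_int` is a hypothetical vector typed by hand, not read from the dataset:

```r
# Hypothetical date codes in the YYYYMMDD form described above.
date_int <- c(19850104, 19850111)

# Convert to Date via a character representation.
date <- as.Date(as.character(date_int), format = "%Y%m%d")

# Weekday as a number (0 = Sunday, ..., 5 = Friday), independent of locale.
wday <- as.POSIXlt(date)$wday

data.frame(date_int, date, wday)
```

Both example dates fall on a Friday, in line with the "Friday of the current week" wording of the price columns.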
Bekaert, G. and R. Hodrick (1993) “On biases in the measurement of foreign exchange risk premiums”, Journal of International Money and Finance, 12, 115-138.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 6, 438-443.
DM, Yen, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
monthly observations from 1981–01 to 1996–06
number of observations : 186
observation : country
country : France and Italy
data(PPP)
A time series containing :
log price index Italy
log price index France
log exchange rate France/Italy
consumer price index Italy
consumer price index France
Datastream.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapters 8 and 9.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1959–02 to 1993–11
number of observations : 418
data(Pricing)
A time series containing :
monthly return on portfolio 1 (small firms)
monthly return on portfolio 2
monthly return on portfolio 3
monthly return on portfolio 4
monthly return on portfolio 5
monthly return on portfolio 6
monthly return on portfolio 7
monthly return on portfolio 8
monthly return on portfolio 9
monthly return on portfolio 10 (large firms)
risk free rate (return on 3-month T-bill)
real per capita consumption growth based on total US personal consumption expenditures (nondurables and services)
Center for Research in Security Prices (CRSP).
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 5.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 48 observations from 1970 to 1986
number of observations : 816
observation : regional
country : United States
data(Produc)
A dataframe containing :
the state
the year
private capital stock
highways and streets
water and sewer facilities
other public buildings and structures
public capital
gross state product
labor input measured by the employment in non–agricultural payrolls
state unemployment rate
Munnell, A. (1990) “Why has productivity growth declined? Productivity and public investment”, New England Economic Review, 3–22.
Baltagi, B. H. and N. Pinnoi (1995) “Public capital stock and state productivity growth: further evidence”, Empirical Economics, 20, 351–359.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and Sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1993
number of observations : 4856
observation : individuals
country : United States
data(PSID)
A dataframe containing :
1968 interview number
person number
age of individual
highest grade completed
total labor income
annual work hours
live births to this individual
last known marital status (married, never married, widowed, divorced, separated, NA/DF, no histories)
Panel Study of Income Dynamics.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 295–300.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1976
number of observations : 5225
observation : individuals
country : United States
data(RetSchool)
A dataframe containing :
wage in 1976
grade level in 1976
experience in 1976
black ?
lived in south in 1976 ?
lived in SMSA in 1976 ?
region, a factor with levels (un, midatl, enc, wnc, sa, esc, wsc, m, p)
lived in SMSA in 1966 ?
lived with both parents at age 14 ?
lived with mother only at age 14 ?
father has no formal education ?
mother has no formal education ?
mean grade level of father
mean grade level of mother
father's and mother's education, a factor with 9 levels
age in 1976
is any 4-year college nearby ?
Kling, Jeffrey R. (2001) “Interpreting Instrumental Variables Estimates of the Return to Schooling”, Journal of Business and Economic Statistics, 19(3), July, 358–364.
Dehejia, R.H. and S. Wahba (2002) “Propensity-score Matching Methods for Nonexperimental Causal Studies”, Review of Economics and Statistics, 151–161.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge.
Schooling, Treatment, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
a cross-section from 1976
number of observations : 3010
observation : individuals
country : United States
data(Schooling)
A dataframe containing :
lived in SMSA in 1966 ?
lived in SMSA in 1976 ?
grew up near 2-yr college ?
grew up near 4-yr college ?
grew up near 4-year public college ?
grew up near 4-year private college ?
education in 1976
education in 1966
age in 1976
dad's education (imputed avg if missing)
dad's education imputed ?
mother's education
mom's education imputed ?
lived with mom and dad at age 14 ?
single mom at age 14 ?
step parent at age 14 ?
lived in south in 1966 ?
lived in south in 1976 ?
log wage in 1976 (outliers trimmed)
mom-dad education class (1-9)
black ?
wage in 1976 (raw, cents per hour)
enrolled in 1976 ?
the kww score
a normed IQ score
married in 1976 ?
library card in home at age 14 ?
experience in 1976
National Longitudinal Survey of Young Men (NLSYM).
Card, D. (1995) Using geographical variation in college proximity to estimate the return to schooling in Christofides, L.N., E.K. Grant and R. Swidinsky (1995) Aspects of labour market behaviour : essays in honour of John Vanderkamp, University of Toronto Press, Toronto.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 5.
RetSchool, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
annual observations from 1909 to 1949
number of observations : 41
observation : country
country : United States
data(Solow)
A time series containing :
output
capital/labor ratio
index of technology
Solow, R. (1957) “Technical change and the aggregate production function”, Review of Economics and Statistics, 39, 312-320.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F7.2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1980
number of observations : 659
observation : individuals
country : United States
data(Somerville)
A dataframe containing :
annual number of visits to lake Somerville
quality ranking score for lake Somerville
engaged in water-skiing at the lake ?
annual household income
annual user fee paid at lake Somerville ?
expenditures when visiting lake Conroe
expenditures when visiting lake Somerville
expenditures when visiting lake Houston
Sellar, Christine, John R. Stoll and Jean-Paul Chavas (1985) “Validation of empirical measures of welfare change : a comparison of nonmarket techniques”, Land Economics, 61(2), May, 156–175.
Gurmu, Shiferaw and Pravin K. Trivedi (1996) “Excess zeros in count models for recreational trips”, Journal of Business and Economic Statistics, 14(4), October, 469–477.
Santos Silva, João M. C. (2001) “A score test for non-nested hypotheses with applications to discrete data models”, Journal of Applied Econometrics, 16(5), 577–597.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Cameron, A.C. and Trivedi P.K. (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html, chapter 6.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
daily observations from 1981–01 to 1991–04
number of observations : 2783
data(SP500)
A dataframe containing :
daily return S&P500 (change in log index)
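As a minimal sketch of what "change in log index" means, the lines below compute log returns from a short, made-up vector of index levels. The values in `index` are hypothetical and are not taken from the SP500 data.

```r
# Hypothetical index levels (illustrative values, not taken from SP500)
index <- c(100, 101, 99.5, 102)

# Daily return defined as the change in the log index
r <- diff(log(index))

# Log returns add up over time: their sum equals the log of the
# ratio of the last level to the first
all.equal(sum(r), log(index[4] / index[1]))
```

One reason to work with log changes is this additivity, which simple percentage returns do not have.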
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1985-89
number of observations : 5748
observation : individuals
country : United States
data(Star)
A dataframe containing :
total math scaled score
total reading scaled score
type of class, a factor with levels (regular,small.class,regular.with.aide)
years of total teaching experience
a factor with levels (boy,girl)
qualified for free lunch ?
a factor with levels (white,black,other)
school indicator variable
Project STAR:
Description from 2001-06-02. Description from 2011-06-18.
Stock, James H. and Mark W. Watson (2003) Introduction to Econometrics, Addison-Wesley Educational Publishers, chapter 11.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1968 to 1976
number of observations : 62
country : United States
data(Strike)
A dataframe containing :
strike duration in days
unanticipated output
Kennan, J. (1985) “The duration of contract strikes in U.S. manufacturing”, Journal of Econometrics, 28, 5-28.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F22.1.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1968 to 1976
number of observations : 566
country : United States
data(StrikeDur)
A dataframe containing :
duration of the strike in days
measure of stage of business cycle (deviation of monthly log industrial production in manufacturing from prediction from OLS on time, time-squared and monthly dummies)
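The business cycle measure described above is a regression residual. A rough sketch of that construction on synthetic data follows; the series `log_ip` is simulated purely for illustration and the variable names are assumptions, not the ones used by Kennan (1985).

```r
set.seed(1)
# Synthetic monthly series (hypothetical; stands in for log industrial production)
n <- 108
time <- seq_len(n)
month <- factor(rep(1:12, length.out = n))
log_ip <- 4 + 0.003 * time + 0.05 * sin(2 * pi * time / 12) +
  rnorm(n, sd = 0.02)

# OLS on time, time-squared and monthly dummies
fit <- lm(log_ip ~ time + I(time^2) + month)

# Deviation of the observed series from the OLS prediction
cycle <- residuals(fit)
summary(cycle)
```

Because the regression includes an intercept, the residuals average to zero, so positive values mark months above the fitted trend and negative values months below it.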
Kennan, J. (1985) “The Duration of Contract strikes in U.S. Manufacturing”, Journal of Econometrics, 28, 5-28.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 574–5 and 582.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
monthly observations from 1968(1) to 1976(12)
number of observations : 108
observation : country
country : United States
data(StrikeNb)
A time series containing :
number of strikes (number of contract strikes in U.S. manufacturing beginning each month)
level of economic activity (measured as cyclical departure of aggregate production from its trend level)
a time trend from 1 to 108
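To see how a trend running from 1 to 108 lines up with monthly dates from 1968(1) to 1976(12), the following sketch builds such a series with base R's `ts()`. The object name `trend` is illustrative only and is not a column name from StrikeNb.

```r
# A time trend 1..108 as a monthly series starting in January 1968
trend <- ts(seq_len(108), start = c(1968, 1), frequency = 12)

start(trend)   # first period: year 1968, month 1
end(trend)     # last period: year 1976, month 12
length(trend)  # 9 years times 12 months
```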
Kennan, J. (1985) “The Duration of Contract strikes in U.S. Manufacturing”, Journal of Econometrics, 28, 5-28.
Cameron, A.C. and Trivedi P.K. (1990) “Regression Based Tests for Overdispersion in the Poisson Model”, Journal of Econometrics, December, 347-364.
Cameron, A.C. and Trivedi P.K. (1998) Regression analysis of count data, Cambridge University Press, http://cameron.econ.ucdavis.edu/racd/racddata.html, chapter 7.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 125 countries observed from 1960 to 1985
number of observations : 3250
observation : country
country : World
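The stated number of observations is consistent with a balanced panel, which can be checked with simple arithmetic. This is a sketch under the assumption that every country is observed in every year; the names below are illustrative.

```r
n_countries <- 125
years <- 1960:1985

# Balanced panel: one row per country per year
n_countries * length(years)  # 125 * 26 = 3250
```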
data(SumHes)
A dataframe containing :
the year
the country name (factor)
OPEC member ?
communist regime ?
country's population (in thousands)
real GDP per capita (in 1985 US dollars)
saving rate (in percent)
Summers, R. and A. Heston (1991) “The Penn world table (mark 5): an expanded set of international comparisons, 1950-1988”, Quarterly Journal of Economics, 29, 229-256.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 5, 358-363.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
quarterly observations from 1950-1 to 1996-4
number of observations : 188
observation : country
country : Canada
data(Tbrate)
A time series containing :
the 91-day treasury bill rate
the log of real GDP
the inflation rate
CANSIM database of Statistics Canada.
Davidson, R. and James G. MacKinnon (2004) Econometric Theory and Methods, New York, Oxford University Press, chapter 2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
The Global Terrorism Database (GTD) "is a database of incidents of terrorism from 1970 onward". Through 2020, this database contains information on 209,706 incidents.
terrorism provides a few summary statistics along with an ordered factor methodology, which Pape et al. insisted is necessary, because an increase of over 70 percent in suicide terrorism between 2007 and 2013 is best explained by a methodology change in GTD that occurred on 2011-11-01; Pape's own Suicide Attack Database showed a 19 percent decrease over the same period.
data(terrorism)
data(incidents.byCountryYr)
data(nkill.byCountryYr)
incidents.byCountryYr and nkill.byCountryYr are matrices giving the numbers of incidents and numbers of deaths by year and by location of the event for 204 countries (rows) and for all years between 1970 and 2020 (columns) except for 1993, for which the entries are all NA, because the raw data previously collected was lost (though the total for that year is available in the data.frame terrorism).
NOTES:
1. For nkill.byCountryYr and for terrorism[c('nkill', 'nkill.us')], NAs in GTD were treated as 0. Thus the actual number of deaths was likely higher, unless this was more than offset by incidents being classified as terrorism when they should not have been.
2. incidents.byCountryYr and nkill.byCountryYr are NA for 1993, because the GTD data for that year were lost.
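The effect of treating missing fatality counts as 0 can be illustrated with a toy vector. The values in `nkill` below are hypothetical per-incident counts, not GTD records.

```r
# Hypothetical per-incident fatality counts; NA means not reported
nkill <- c(3, NA, 0, 12, NA, 1)

# Summing with na.rm = TRUE is the same as treating each NA as 0
total <- sum(nkill, na.rm = TRUE)
total_zero_filled <- sum(ifelse(is.na(nkill), 0, nkill))
total == total_zero_filled

# Proportion of incidents with a missing count
p_na <- mean(is.na(nkill))
p_na
```

A proportion like `p_na` gives a rough sense of how much a yearly total could understate the true count.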
terrorism is a data.frame containing the following:
integer year, 1970:2020.
an ordered factor giving the methodology / organization responsible for the data collection for most of the given year. The Pinkerton Global Intelligence Service (PGIS) managed data collection from 1970-01-01 to 1997-12-31. The Center for Terrorism and Intelligence Studies (CETIS) managed the project from 1998-01-01 to 2008-03-31. The Institute for the Study of Violent Groups (ISVG) carried the project from 2008-04-01 to 2011-10-31. The National Consortium for the Study of Terrorism and Responses to Terrorism (START) has managed data collection since 2011-11-01. For this variable, partial years are ignored, so methodology = PGIS for 1970:1997, CETIS for 1998:2007, ISVG for 2008:2011, and START for more recent data.
a character vector consisting of the first character of the levels of methodology: c('p', 'c', 'i', 's')
integer number of incidents identified each year. NOTE: sum(terrorism[["incidents"]]) = 214660 = 209706 in the GTD database plus 4954 for 1993, for which the incident-level data were lost.
integer number of incidents identified each year with country_txt = "United States".
integer number of incidents classified as "suicide" by GTD variable suicide = 1. For 2007, this is 359, the number reported by Pape et al. For 2013, it is 624, which is 5 more than the 619 mentioned by Pape et al. Without checking with the START project administrators, one might suspect that 5 more suicide incidents from 2013 were found after the data Pape et al. analyzed but before the data used for this analysis.
Number of suicide incidents by year with country_txt = "United States".
number of confirmed fatalities for incidents in the given year, including attackers = sum(nkill, na.rm=TRUE) in the GTD incident data.
NOTE: nkill in the GTD incident data includes both perpetrators and victims when both are available. It includes one when only one is available and is NA when neither is available. However, in most cases, we might expect that the more spectacular and lethal incidents would likely be more accurately reported. To the extent that this is true, it means that when numbers are missing, they are usually zero or small. This further suggests that the summary numbers recorded here probably represent a slight but not substantive undercount.
number of U.S. citizens who died as a result of incidents for that year = sum(nkill.us, na.rm=TRUE) in the GTD incident data.
NOTES:
1. This is subject to the same likely modest undercount discussed with nkill.
2. These are U.S. citizens killed regardless of location. This explains at least part of the discrepancies between terrorism[, 'nkill.us'] and nkill.byCountryYr['United States', ].
number of people wounded. (This is subject to the same likely modest undercount discussed with nkill.)
Number of U.S. citizens wounded in terrorist incidents for that year = sum(nwound.us, na.rm=TRUE) in the GTD incident data. (This is subject to the same likely modest undercount discussed with nkill.)
proportion of observations by year with missing values. These numbers are higher for the early data than more recent numbers. This is particularly true for nkill.us and nwound.us, which exceed 90 percent for most of the period with methodology = PGIS, prior to 1998.
Estimated de facto population in thousands living in the world and in the US as of 1 July of the year indicated, according to the Population Division of the Department of Economic and Social Affairs of the United Nations; see "Sources" below.
Crude death rate (deaths per 1,000 population) worldwide and in the US, according to the World Bank; see "Sources" below. This World Bank data set includes USdeathRate for each year from 1900 to 2020. NOTE: USdeathRate to 2009 is to two significant digits only. Other death rates carry more significant digits.
number of deaths by year in the world and US. worldDeaths = worldPopulation * worldDeathRate. USdeaths were computed by summing across age groups in "Deaths_5x1.txt" for the United States, downloaded from https://www.mortality.org/Country/Country?cntr=USA from the Human Mortality Database; see sources below.
terrorism deaths per million population worldwide and in the US =
nkill / (0.001*worldPopulation)
nkill.us / (0.001*USpopulation)
terrorism deaths as a proportion of total deaths worldwide and in the US
pkill = nkill / worldDeaths
pkill.us = nkill.us / USdeaths
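The unit conversions behind these derived columns can be traced with a small worked example. All inputs below are hypothetical round numbers chosen for easy arithmetic; they are not values from GTD, the United Nations or the World Bank.

```r
# Hypothetical inputs for one year (illustrative only)
nkill <- 1000              # terrorism deaths
worldPopulation <- 8e6     # population in thousands
worldDeathRate <- 7.5      # deaths per 1,000 population

# Population in thousands times deaths per 1,000 gives deaths
worldDeaths <- worldPopulation * worldDeathRate

# 0.001 * (population in thousands) is population in millions
kill.pmp <- nkill / (0.001 * worldPopulation)

# Terrorism deaths as a proportion of all deaths
pkill <- nkill / worldDeaths

c(worldDeaths = worldDeaths, kill.pmp = kill.pmp, pkill = pkill)
```

The same steps apply to the US columns, with USpopulation and USdeaths in place of the world figures.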
As noted with the "description" above, Pape et al. noted that the GTD reported an increase in suicide terrorism of over 70 percent between 2007 and 2013, while their Suicide Attack Database showed a 19 percent decrease over the same period. Pape et al. insisted that the most likely explanation for this difference is the change in the organization responsible for managing that data collection from ISVG to START.
If the issue is restricted to how incidents are classified as "suicide terrorism", this concern does not affect the other variables in this summary.
However, if it also impacts what incidents are classified as "terrorism", it suggests larger problems.
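The size of the reported increase, and the fact that the two years fall under different collection regimes, can be reproduced from numbers quoted in this entry. The sketch below rebuilds the whole-year methodology assignment with `cut()`; it is one plausible reconstruction and not necessarily how the package created the column.

```r
year <- 1970:2020

# Whole-year assignment of the managing organization, following the
# date ranges given in this entry (partial years ignored)
methodology <- cut(year,
                   breaks = c(1969, 1997, 2007, 2011, 2020),
                   labels = c("PGIS", "CETIS", "ISVG", "START"),
                   ordered_result = TRUE)
table(methodology)

# Suicide incident counts for 2007 and 2013 as cited from Pape et al.
suicide_2007 <- 359
suicide_2013 <- 619
pct_change <- 100 * (suicide_2013 / suicide_2007 - 1)
round(pct_change, 1)

# The two years compared fall under different methodologies
methodology[year %in% c(2007, 2013)]
```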
Spencer Graves
START (National Consortium for the Study of Terrorism and Responses to Terrorism). (2022). Global Terrorism Database, 1970 - 2020 [data file]. Retrieved from https://www.start.umd.edu/gtd, 2024-10-17.
See also the Global Terrorism Database maintained by the National Consortium for the Study of Terrorism and Responses to Terrorism (START, 2022), https://www.start.umd.edu/gtd.
The world and US population figures came from "Total Population - Both Sexes", World Population Prospects 2022, published by the Population Division, World Population Prospects, of the United Nations, accessed 2022-10-09.
Human Mortality Database. University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany), accessed 2022-10-11.
Robert Pape, Keven Ruby, Vincent Bauer and Gentry Jenkins, "How to fix the flaws in the Global Terrorism Database and why it matters", The Washington Post, August 11, 2014 (accessed 2016-01-09).
data(terrorism)
##
## plot deaths per million population
##
plot(kill.pmp~year, terrorism,
     pch=method, type='b')
plot(kill.pmp.us~year, terrorism,
     pch=method, type='b',
     log='y', las=1)
# terrorism as parts per 10,000
# of all deaths
plot(pkill*1e4~year, terrorism,
     pch=method, type='b', las=1)
plot(pkill.us*1e4~year, terrorism,
     pch=method, type='b',
     log='y', las=1)
# plot number of incidents, number killed,
# and proportion NA
plot(incidents~year, terrorism, type='b',
     pch=method)
plot(nkill.us~year, terrorism, type='b',
     pch=method)
plot(nkill.us~year, terrorism, type='b',
     pch=method, log='y')
plot(pNA.nkill.us~year, terrorism, type='b',
     pch=method)
abline(v=1997.5, lty='dotted', col='red')
##
## by country by year
##
data(incidents.byCountryYr)
data(nkill.byCountryYr)
yr <- as.integer(colnames(
  incidents.byCountryYr))
str(maxDeaths <- apply(nkill.byCountryYr,
                       1, max) )
str(omax <- order(maxDeaths, decreasing=TRUE))
head(maxDeaths[omax], 8)
tolower(substring(
  names(maxDeaths[omax[1:8]]), 1, 2))
pch. <- c('i', 'g', 'f', 'l',
          's', 'c', 'u', 'p')
cols <- 1:4
matplot(yr, sqrt(t(
  nkill.byCountryYr[omax[1:8], ])),
  type='b', pch=pch., axes=FALSE,
  ylab='(square root scale) ', xlab='',
  col=cols,
  main='number of terrorism deaths\nby country')
axis(1)
(max.nk <- max(nkill.byCountryYr[omax[1:8], ]))
i.nk <- c(1, 100, 1000, 3000,
          5000, 7000, 10000)
cbind(i.nk, sqrt(i.nk))
axis(2, sqrt(i.nk), i.nk, las=1)
ip <- paste(pch., names(maxDeaths[omax[1:8]]))
legend('topleft', ip, cex=.55,
       col=cols, text.col=cols)
a cross-section from 1995-96
number of observations : 2724
observation : individuals
country : Belgium
data(Tobacco)
A dataframe containing :
a factor with levels (bluecol, whitecol, inactself), the last level being inactive and self-employed
a factor with levels (flanders, wallon, brussels)
number of kids of more than two years old
number of kids of less than two years old
number of adults in household
log of total expenditures
budget share of tobacco
budget share of alcohol
age in brackets (0-4)
National Institute of Statistics (NIS), Belgium.
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons, chapter 7.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1987
number of observations : 2929
observation : individuals
country : Netherlands
data(Train)
A dataframe containing :
individual identifier
choice identifier
one of choice1, choice2
price of proposition z (z=1,2) in cents of guilders
travel time of proposition z (z=1,2) in minutes
comfort of proposition z (z=1,2), 0, 1 or 2 in decreasing comfort order
number of changes for proposition z (z=1,2)
Meijer, Erik and Jan Rouwendal (2005) “Measuring welfare effects in models with random coefficients”, Journal of Applied Econometrics, forthcoming.
Ben-Akiva, M., D. Bolduc and M. Bradley (1993) “Estimation of travel choice models with randomly distributed values of time”, Transportation Research Record, 1413, 88–97.
Carson, R.T., L. Wilks and D. Imber (1994) “Valuing the preservation of Australia's Kakadu conservation zone”, Oxford Economic Papers, 46, 727–749.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 25
observation : regional
country : United States
data(TranspEq)
A dataframe containing :
state name
output
capital input
labor input
number of firms
Zellner, A. and N. Revankar (1970) “Generalized production functions”, Review of Economic Studies, 37, 241-250.
Greene, W.H. (2003) Econometric Analysis, Prentice Hall, https://archive.org/details/econometricanaly0000gree_f4x3, Table F9.2.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1974
number of observations : 2675
country : United States
data(Treatment)
A dataframe containing :
treated ?
age
education in years
a factor with levels ("other", "black", "hispanic")
married ?
real annual earnings in 1974 (pre-treatment)
real annual earnings in 1975 (pre-treatment)
real annual earnings in 1978 (post-treatment)
unemployed in 1974 ?
unemployed in 1975 ?
Lalonde, R. (1986) “Evaluating the Econometric Evaluations of Training Programs with Experimental Data”, American Economic Review, 604–620.
Dehejia, R.H. and S. Wahba (1999) “Causal Effects in Nonexperimental Studies: reevaluating the Evaluation of Training Programs”, JASA, 1053–1062.
Dehejia, R.H. and S. Wahba (2002) “Propensity-score Matching Methods for Nonexperimental Causal Studies”, Restat, 151–161.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 889–95.
RetSchool, Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section
number of observations : 13705
observation : individuals
country : United States
data(Tuna)
A dataframe containing :
individuals identifiers
purchase identifiers
one of skw (Starkist water), cosw (Chicken of the sea water), pw (store-specific private label water), sko (Starkist oil), coso (Chicken of the sea oil)
price of brand z
Kim, Byong-Do, Robert C. Blattberg and Peter E. Rossi (1995) “Modeling the distribution of price sensitivity and implications for optimal retail pricing”, Journal of Business and Economic Statistics, 13(3), 291.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20
number of observations : 3343
data(UnempDur)
A dataframe containing :
length of spell in number of two-week intervals
= 1 if re-employed at full-time job
= 1 if re-employed at part-time job
= 1 if re-employed but left job: pt-ft status unknown
= 1 if still jobless
age
= 1 if filed UI claim
eligible replacement rate
eligible disregard rate
log weekly earnings in lost job (1985$)
years tenure in lost job
McCall, B.P. (1996) “Unemployment Insurance Rules, Joblessness, and Part-time Work”, Econometrica, 64, 647–682.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp. 603–8, 632–6, 658–62, 671–4 and 692.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1993
number of observations : 452
observation : individuals
country : United States
data(Unemployment)
A dataframe containing :
duration of first spell of unemployment, t, in weeks
1 if spell is complete
one of nonwhite, white
one of male, female
reason for unemployment, one of new (new entrant), lose (job loser), leave (job leaver), reentr (labor force reentrant)
'yes' if (1) the unemployment spell is completed between the first and second surveys and number of methods used to search > average number of methods used across all records in the sample, or, (2) for individuals who remain unemployed for consecutive surveys, if the number of methods used is strictly nondecreasing at all survey points, and is strictly increasing at least at one survey point
'yes' if an individual used a public employment agency to search for work at any survey points relating to the individual's first unemployment spell
1 if an individual is searching for full time work at survey 1
1 if an individual is searching for full time work at survey 2
1 if an individual is searching for full time work at survey 3
1 if an individual is searching for full time work at survey 4
number of observations on the first spell of unemployment for the record
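Condition (2) in the search-intensity rule above (never decreasing across surveys and increasing at least once) can be written compactly. The function name `increasing_search` and its argument are hypothetical, introduced only to illustrate the logic; they are not part of the data set or of Romeo (1999).

```r
# TRUE when a sequence is nondecreasing at every step and
# strictly increasing at least once
increasing_search <- function(n_methods) {
  d <- diff(n_methods)
  length(d) > 0 && all(d >= 0) && any(d > 0)
}

increasing_search(c(1, 1, 2, 3))  # TRUE
increasing_search(c(2, 2, 2))     # FALSE: never increases
increasing_search(c(1, 3, 2))     # FALSE: decreases once
```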
Romeo, Charles J. (1999) “Conducting inference in semiparametric duration models under inequality restrictions on the shape of the hazard implied by the job search theory”, Journal of Applied Econometrics, 14(6), 587–605.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1988
number of observations : 62
observation : schools
country : United Kingdom
data(University)
A dataframe containing :
undergraduate students
postgraduate students
net assets
academic numbers
academic related numbers
clerical numbers
computer operators
technicians
student fees
academic pay
academic related pay
secretarial pay
admin pay
aggregate research rank
furniture and equipment
land and buildings
research grants
Glass, J.C., D.G. McKillop and N. Hyndman (1995) “Efficiency in the provision of university teaching and research : an empirical analysis of UK universities”, Journal of Applied Econometrics, 10(1), January–March, 61–72.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
Data on classification activity of the United States government.
Fitzpatrick (2013) notes that the dramatic jump in derivative classification activity (DerivClassActivity) that occurred in 2009 coincided with "New guidance issued to include electronic environment". Apart from the jump in 2009, DerivClassActivity tended to increase by roughly 12 percent per year (with a standard deviation of the increase in the natural logarithm of DerivClassActivity of 0.18).
data(USclassifiedDocuments)
A dataframe containing :
the calendar year
Number of people in the government designated as Original Classification Authorities for the indicated year.
Original classification activity for the indicated year: These are the number of documents created with an original classification, i.e., so designated by an official Original Classification Authority.
Percent of OCActivity covered by the 10 year declassification rules.
Derivative classification activity for the indicated year: These are the number of documents created that claim another document as the authority for classification.
The lag 1 autocorrelation of the first difference of the logarithms of DerivClassActivity through 2008 is -0.52. However, because there are only 13 numbers (12 differences), this negative correlation is not statistically significant.
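The significance claim can be checked against the usual large-sample bound for a white-noise autocorrelation, which is also the dashed line that `acf()` draws by default. This is only an approximation with so few observations; the numbers used are the ones quoted above.

```r
n_diff <- 12          # number of first differences through 2008
r1 <- -0.52           # lag 1 autocorrelation quoted above

# Approximate 95 percent bound for a white-noise autocorrelation
bound <- qnorm(0.975) / sqrt(n_diff)
bound
abs(r1) < bound       # TRUE means not significant at roughly 5 percent

# An average log difference of log(1.12) corresponds to
# growth of 12 percent per year
expm1(log(1.12))
```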
Fitzpatrick, John P. (2013) Annual Report to the President for 2012, United States Information Security Oversight Office, National Archives and Records Administration, June 20, 2013. Information Security Oversight Office (ISOO) of the National Archives.
##
## 1.  plot DerivClassActivity
##
plot(DerivClassActivity~year, USclassifiedDocuments)
# Exponential growth?
plot(DerivClassActivity~year, USclassifiedDocuments,
     log='y')
# A jump in 2009 as discussed by Fitzpatrick (2013).
# Otherwise plausibly a straight line.
##
## 2.  First difference?
##
plot(diff(log(DerivClassActivity))~year[-1],
     USclassifiedDocuments)
# Jump in 2009 but otherwise on distribution
##
## 3.  autocorrelation?
##
sel <- with(USclassifiedDocuments,
            (1995 < year) & (year < 2009) )
acf(diff(log(USclassifiedDocuments$
             DerivClassActivity[sel])))
# lag 1 autocorrelation = (-0.52).
# However, with only 12 numbers,
# this is not statistically significant.
A data.frame giving the profits of the finance industry in the United States as a proportion of total corporate domestic profits.
data(USFinanceIndustry)
A data.frame with the following columns:
integer year starting with 1929
Corporate profits with inventory valuation and capital consumption adjustments in billions of current (not adjusted for inflation) US dollars
Domestic industries profits in billions
Financial industries profits in billions
Nonfinancial industries profits in billions
Profits of the "Rest of the world" in their contribution to US Gross Domestic Product in billions
= Financial/Domestic
This is extracted from Table 6.16 of the National Income and Product Accounts (NIPA) compiled by the Bureau of Economic Analysis of the United States federal government. This table comes in four parts, A (1929-1947), B (1948-1987), C (1987-2000), and D (1998-present). Parts A, B, C and D contain different numbers of data elements, but the first five have the same names and are the only ones used here. The overlap between parts C and D (1998-2000) has a root mean square relative difference of 0.7 percent; there were no differences between the numbers in the overlap period between parts B and C (1987).
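The text does not spell out how the root mean square relative difference was computed. One plausible definition is sketched below on made-up numbers; `partC` and `partD` are hypothetical stand-ins for the overlapping values and do not contain actual NIPA figures.

```r
# Hypothetical values for three overlapping years (not NIPA data)
partC <- c(100, 200, 300)
partD <- c(101, 199, 300)

# Assumed definition: differences relative to part D,
# then the root of the mean of their squares
rel_diff <- (partC - partD) / partD
rms_rel_diff <- sqrt(mean(rel_diff^2))
100 * rms_rel_diff   # expressed in percent
```

Other choices of denominator (part C, or the average of the two parts) would give slightly different values.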
This was created using the following commands:
demoDir <- system.file('demoFiles', package='Ecdat')
demoCsv <- dir(demoDir, pattern='csv$', full.names=TRUE)
nipa6.16 <- Ecfun::readNIPA(demoCsv)
USFinanceIndustry <- as.data.frame(nipa6.16)
names(USFinanceIndustry) <- c('year',
'CorporateProfitsAdj', 'Domestic', 'Financial',
'Nonfinancial', 'restOfWorld')
USFinanceIndustry$FinanceProportion <-
with(USFinanceIndustry, Financial/Domestic)
https://www.bea.gov: Under "U.S. Economic Accounts", first
select "Corporate Profits" under "National". Then next to
"Interactive Tables", select, "National Income and Product Accounts
Tables". From there, select "Begin using the data...". Under
"Section 6 - income and employment by industry", select each of the
tables starting "Table 6.16". As of February 2013, there were 4 such
tables available: Table 6.16A, 6.16B, 6.16C and 6.16D. Each of the
last three is available in annual and quarterly summaries. The
USFinanceIndustry data combined the first 4 rows of the 4
annual summary tables.
data(USFinanceIndustry)
plot(FinanceProportion~year, USFinanceIndustry, type='b',
     ylim=c(0, max(FinanceProportion, na.rm=TRUE)),
     xlab='', ylab='', las=1, cex.axis=2, bty='n', lwd=2,
     col='blue')
# Write to a file for Wikimedia Commons
## Not run: 
if(FALSE){
svg('USFinanceIndustry.svg')
plot(FinanceProportion~year, USFinanceIndustry, type='b',
     ylim=c(0, max(FinanceProportion, na.rm=TRUE)),
     xlab='', ylab='', las=1, cex.axis=2, bty='n', lwd=2,
     col='blue')
dev.off()
}
## End(Not run)
It is commonly claimed that Franklin Roosevelt (FDR) did not end the Great Depression: World War II (WW2) did. This is supported by the 10.6 percent growth per year in real Gross Domestic Product (GDP) per capita seen in the standard GDP estimates from 1940 to 1945. It is also supported by the rapid decline in unemployment during the war.
However, no comparable growth spurts in GDP per capita catch the eye in a plot of log(GDP per capita) from 1790 to 2015, whether associated with a war or not, using data from Measuring Worth. The only other features of that plot that seem visually comparable are the economic disaster of Herbert Hoover's presidency (when GDP per capita fell by 10 percent per year, 1929-1932), the impressive growth of the US economy during the first seven years of Franklin Roosevelt's presidency (6.4 percent per year, 1933-1940), and the post-World War II recession (when GDP per capita fell by 7.9 percent per year, 1945-1947). (NOTE: The web site for Measuring Worth, https://measuringworth.com/, still works, but has not always been maintained to current internet security standards. Therefore, the address is given here as plain text rather than as a link.)
Closer inspection of this plot suggests that the US economy has generally grown faster after FDR than before. This might plausibly be attributed to "The Keynesian Ascendancy 1939-1979".
Unemployment dropped during the First World War as it did during WW2. Comparable unemployment data are not available for the U.S. during other major wars, most notably the American Civil War and the Mexican-American War.
This data set provides a platform for testing the effects of presidency, war, and Keynes. It does this by combining the numbers for US population and real GDP per capita from Measuring Worth with the presidency, a list of major wars, and an estimate of the battle deaths by year per million population. (As noted above, the web address for Measuring Worth, https://measuringworth.com/, often gives security warnings but still seems to provide the data as before.)
US unemployment is also considered.
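Growth figures such as "10 percent per year" are averages over several years. One common way to compute such an average is from the change in logs divided by the number of years, sketched below. The source does not say which convention it used, so this is an assumption, and the helper `annual_growth` and the index values are hypothetical.

```r
# Average annual growth rate between two levels observed `years` apart
annual_growth <- function(x_start, x_end, years) {
  exp((log(x_end) - log(x_start)) / years) - 1
}

# Hypothetical index: 100 falling to 72.9 over 3 years,
# which is a fall of 10 percent per year (0.9^3 = 0.729)
annual_growth(100, 72.9, 3)
```

Applied to realGDPperCapita for a chosen pair of years, the same function would give a rate comparable to those quoted above, up to the choice of convention.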
data(USGDPpresidents)
A data.frame
containing 259
observations on the following variables:
integer: the year,
c(seq(1610, 1770, 10), 1774:2015)
Numeric: U. S. Consumer Price Index per Officer and Williamson (2022), starting in 1774. Average 1982-84 = 100.
numeric: Implicit price deflators for Gross Domestic Product with 2012 = 100 per Johnston and Williamson.
integer: US population in thousands.
Population figures for 1610 to 1780 came from Springston (2013). The rest came from Johnston and Williamson. (The early population figures reflect only the European settlers in the British colonies that eventually became the US.)
numeric: real Gross Domestic Product (GDP) per capita in 2012 dollars since 1790.
Real GDP
=
population.K*realGDPperCapita
,
in thousands.
Current or nominal GDPperCapita
=
realGDPperCapita*GDPdeflator/100
.
ordered
: Crown of England
through 1774, followed by the
"ContinentalCongress"
and the
"ArticlesOfConfederation"
until
Washington, who became President under the
current base constitution in 1789. Two
nineteenth century presidents are not
listed here (William Henry Harrison and
James A. Garfield), because they died so
soon after inauguration that any
contribution they made to the economic
growth of the nation might seem too slight
to measure accurately in annual data like
this; their contributions therefore
appear combined with their replacements
(John Tyler and Chester A. Arthur,
respectively). The service of two other
presidents is officially combined here:
"Taylor-Fillmore" refers to the 16 months
served by Zachary Taylor with the 32 months
of Millard Fillmore. These modifications
make
Barack Obama
number 41 on this list, even though he's
the 44th president of the U.S.
ordered
: This lists the
major wars in US history by years
involving active hostilities. A war is
"major" for present purposes if it met
two criteria:
(1) It averaged at least 10 battle deaths per year per million US population.
(2) It was listed in one of two lists of wars: For wars since 1816, it must have appeared in the Correlates of War. For wars between 1790 and 1815, it must have appeared in the Wikipedia "List of wars involving the United States".
The resulting list includes a few adjustments to the list of wars that might come readily to mind for people moderately familiar with US history.
A traditional list might start with the American Revolution, the War of 1812, the Mexican-American war, the Civil War, the Spanish-American war, World Wars I and II, Korea, and Vietnam. In addition, the Northwest Indian War involved very roughly 30 battle deaths per year per million population 1785-1795. This compares with the roughly 100 battle deaths per year per million population 1812-1815 for the War of 1812.
For present purposes, the Spanish-American War is combined with the lesser-known American-Philippine War: The latter involved 50 percent more battle deaths but over a longer period of time and arguably with less impact on the stature of the US as a growing world power. However, its magnitude suggests it might have impacted the US economy in a way roughly comparable to the Spanish-American war. The two are therefore listed here together as "Spanish-American-Philippine" war.
The Correlates of
War (COW) data include multiple US uses
of military force during the Vietnam War
era. It starts with "Vietnam Phase 1",
1961-65, with 506 battle deaths in the COW
data base. It includes the "Second
Laotian" war phases 1 and 2, plus
engagement with a "Communist Coalition"
and Khmer Rouge as well as actions in the
Dominican Republic and Guatemala. The
current data.frame
includes
only "Vietnam", referring primarily to
COW's "Vietnam War, Phase 2", 1965-1973.
The associated battle deaths include
battle deaths from these other, lesser
concurrent conflicts.
The COW data currently ends in 2007. However, the post-2000 conflicts in Afghanistan and Iraq averaged less than 1,000 battle deaths per year or roughly 3 battle deaths per year per million population. This is below the threshold of 10 battle deaths per year per million population. This in turn suggests that any impact of those conflicts on the US economy might be small and difficult to estimate.
numeric: Numbers of battle deaths by year estimated by allocating to the different years the totals reported for each major war in proportion to the number of days officially in conflict each year. The totals were obtained (in August-September 2015) from The Correlates of War data for conflicts since 1816 and from Wikipedia for previous wars back to 1774, as noted above.
numeric: battle deaths per million
population =
1000*battleDeaths/population.K
.
integer taking the value 1 between 1939 and 1979 and 0 otherwise, as suggested by the section entitled "The Keynesian Ascendancy 1939-1979" in the Wikipedia article on John Maynard Keynes.
Estimated US unemployment rate
ordered
giving the source for
US unemployment:
<NA>
Lebergott
Romer
Coen
BLS
Clearly, the more recent numbers should be more accurate.
Receipts and Outlays of the US federal government in millions of current dollars.
For data beginning with 1901, these are from the US federal budget from The White House (2022). Earlier data are from series Y 335-337 in US Census Bureau (1975). As of 2022-02-22 the data from The White House included aggregations for 1789-1849 and 1850-1900, which matched the totals of Y 335-337 for those two sets of years. The numbers from 1901 to 1933 are the same in both sources.
We used The White House (2022) for the more recent numbers with one exception: Between 1976 and 1977 the fiscal year was changed from starting July 1 to October 1. July, August, and September, 1976, is called the "transitional quarter", and has been deleted from this dataset.
NOTES:
The numbers for 1843 are for only the first half of the year, January 1 through June 30. This explains why the numbers for 1843 are only roughly half of the corresponding values for 1844 and 1845.
Also, the numbers for 1791 are actually for 1789-1791. However, those numbers seem comparable to those for 1792 and 1793, so it is listed as only for one year rather than three.
US federal government debt in millions of
current dollars per FiscalData
(2022).
This matches Y 338 in United States
Census Bureau (1975) 1921-1939 but not
earlier, and Y 338 ends with 1939.
Between 1921 and 1939 these numbers are
as of June 30. Between 1843 and 1920
they are as of July 1. The earlier
numbers are as of January 1.
FiscalData
(2022) includes debt
for both January 1 (20 million) and
July 1 (33 million) for 1843. For
present purposes, we omit the
January 1 number. This overstates
the volatility of the national
debt during that period, showing
it rising from 14 million in
1842 (January 1) to 33 million
in 1843 (July 1), an interval of 18 rather than
12 months. The alternative would
be to delete the 33 million, but
that would understate the volatility
of the debt during that period.
numeric = fedReceipts
,
fedOutlays
, fedSurplus
,
and fedDebt
divided by
(population.K * realGDPperCapita / (GDPdeflator))
,
except for the single year 1843,
for which fedReceipts
,
fedOutlays
, and fedSurplus
were for only the first six months; to
compute *_pGDP
for these numbers
for 1843 only, the denominator in this
formula is cut in half to compensate.
rownames(USGDPpresidents) = Year
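Several of the variables above are simple functions of others. The following is a minimal sketch of those relationships on a toy data frame; every number in toy is hypothetical, and the 10-deaths threshold is applied per year here only for illustration, whereas the text applies it to the average over a war.

```r
# Toy data frame; all numbers are hypothetical, not from USGDPpresidents.
toy <- data.frame(
  Year             = c(1950, 2002),
  population.K     = c(200000, 202000),  # population in thousands
  realGDPperCapita = c(50000, 51000),    # 2012 dollars
  GDPdeflator      = c(90, 92),          # 2012 = 100
  battleDeaths     = c(0, 4040)
)
rownames(toy) <- toy$Year  # rows can then be selected as toy['1950', ]

# Real GDP in thousands of 2012 dollars
toy$realGDP.K <- with(toy, population.K * realGDPperCapita)
# Current (nominal) GDP per capita
toy$GDPperCapita <- with(toy, realGDPperCapita * GDPdeflator / 100)
# Battle deaths per million population
toy$battleDeathsPMP <- with(toy, 1000 * battleDeaths / population.K)
# Keynes indicator: 1 between 1939 and 1979, 0 otherwise
toy$Keynes <- as.integer(toy$Year >= 1939 & toy$Year <= 1979)
# Compare with the threshold of 10 battle deaths per year per million
toy$aboveThreshold <- toy$battleDeathsPMP >= 10
toy
```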
Spencer Graves
Robert M. Coen (1973) "Labor Force and Unemployment in the 1920's and 1930's: A Re-Examination Based on Postwar Experience", The Review of Economics and Statistics, 55(1): 46-55.
FiscalData
(2022)
"Historical Debt Outstanding", accessed 2022-04-11.
Louis Johnston and Samuel H. Williamson,
"What Was the U.S. GDP Then?", Measuring Worth,
accessed 2022-02-22. (NOTE: This came from
https://www.measuringworth.org/usgdp/
.
This web link generally works as of 2022-02-22.
However, in the past it has sometimes returned
a warning, e.g., "SSL certificate problem".
The web site seems to be good but not
maintained to current security standards.)
Stanley Lebergott (1964). Manpower in Economic Growth: The American Record since 1800. Pages 164-190. New York: McGraw-Hill. Cited from Wikipedia, "Unemployment in the United States", accessed 2016-07-08.
Lawrence H Officer and Samuel H. Williamson, 'The Annual Consumer Price Index for the United States, 1774-Present,' MeasuringWorth, 2022-02-22.
Christina Romer (1986). "Spurious Volatility in Historical Unemployment Data", The Journal of Political Economy, 94(1): 1-37.
Sarkees, Meredith Reid; Wayman, Frank (2010). "The Correlates of War Project: COW War Data, 1816 - 2007 (v4.0)", accessed 2015-09-02.
The White House (2022). Historical Tables: Spreadsheets: Table 1.1-Summary of Receipts, Outlays, and Surpluses or Deficits (-): 1789-2026, accessed 2022-02-22.
United States Census Bureau (1975) Bicentennial Edition: Historical Statistics of the United States, Colonial Times to 1970, Part 2. Chapter Y. Government, accessed 2022-02-22.
Wikipedia, "List of wars involving the United States", accessed 2015-09-13.
Wikipedia, "Unemployment in the United States". See also https://en.wikipedia.org/wiki/User_talk:Peace01234#Unemployment_Data. Accessed 2016-07-08.
The unemployment data since 1940 are from
series LNS14000000
from the Current
Population Survey. These data are available as
a monthly series from the
Current Population Survey of the Bureau of Labor Statistics.
Chuck Springston, "Population of the 13 Colonies 1610-1790", October 28, 2013
##
## GDP, Presidents and Wars
##
data(USGDPpresidents)
(wars <- levels(USGDPpresidents$war))
nWars <- length(wars)
plot(realGDPperCapita/1000~Year,
     USGDPpresidents, log='y', type='l',
     ylab='average annual income (K$)',
     las=1)
abline(v=c(1929, 1933, 1945), lty='dashed')
text(1930, 2.5, "Hoover", srt=90, cex=0.9)
text(1939.5, 30, 'FDR', srt=90, cex=1.1, col='blue')

# label wars
(logGDPrange <- log(range(USGDPpresidents$realGDPperCapita,
                          na.rm=TRUE)/1000))
(yrRange <- range(USGDPpresidents$Year))
(yrMid <- mean(yrRange))
for(i in 2:nWars){
  w <- wars[i]
  sel <- (USGDPpresidents$war==w)
  yrs <- range(USGDPpresidents$Year[sel])
  abline(v=yrs, lty='dotted', col='grey')
  yr. <- mean(yrs)
  w.adj <- (0.5 - 0.6*(yr.-yrMid)/diff(yrRange))
  logy <- (logGDPrange[1]+w.adj*diff(logGDPrange))
  y. <- exp(logy)
  text(yr., y., w, srt=90, col='red', cex=0.5)
}

##
## CPI v. GDPdeflator
##
plot(GDPdeflator~CPI, USGDPpresidents, type='l', log='xy')

##
## Unemployment
##
plot(unemployment~Year, USGDPpresidents, type='l')

##
## federal outlays, pct of GDP
##
sel <- !is.na(USGDPpresidents$fedOutlays_pGDP)
plot(100*fedOutlays_pGDP~Year, USGDPpresidents[sel,],
     type='l', log='y', xlab='',
     ylab='US federal outlays, pct of GDP')
abline(h=2:3)
war <- (USGDPpresidents$war !='')
abline(v=USGDPpresidents$Year[war], lty='dotted', col='light gray')
abline(v=c(1929, 1933), col='red', lty='dotted')
text(1931, 22, 'Hoover', srt=90, col='red')
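The battleDeaths variable in USGDPpresidents is described as allocating each war's total across years in proportion to the number of days in conflict each year. A minimal sketch of that kind of allocation follows; the helper allocateByYear, the dates, and the total are all hypothetical, and the computation actually used to build the data set may differ.

```r
# Hypothetical helper: split 'total' across calendar years in proportion
# to the number of days between 'start' and 'end' (inclusive) in each year.
allocateByYear <- function(start, end, total) {
  days <- seq(as.Date(start), as.Date(end), by = "day")
  daysPerYear <- table(format(days, "%Y"))
  alloc <- total * as.vector(daysPerYear) / length(days)
  names(alloc) <- names(daysPerYear)
  alloc
}

# Illustrative conflict lasting 365 days: 184 days in 2001, 181 in 2002
deaths <- allocateByYear("2001-07-01", "2002-06-30", total = 3650)
deaths
```

The allocated values sum to the reported total by construction, so dividing them by population.K/1000 gives per-million figures on the same basis as battleDeathsPMP.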
Counts of prisoners under the jurisdiction of state and federal correctional authorities in the US. This does not include jail inmates.
data("USincarcerations")
A data frame with 95 observations on the following 7 variables.
an integer vector giving the year
c(1925:2019)
.
Total number of incarcerees =
maleTotal + femaleTotal
.
incarceration rate =
stateFedIncarcerees
per
100,000 population.
Total number of male incarcerees.
male incarceration rate =
maleTotal
per 100,000 males
in the US population.
Total number of female incarcerees.
female incarceration rate =
femaleTotal
per 100,000
females in the US population.
This dataset began as an effort to update
File:U.S. incarceration rates 1925 onwards.png
on Wikimedia Commons.
Conveniently data on these variables
was provided in a table for 1925 to 2014.
And a description was given of how to update
that table using files p*t03.csv
and
p*t05.csv
from
Prisoners In 2019.
An initial rationality check was to compute
checkTot
<-
stateFedIncarcerees
-
stateFedMales
-
stateFedFemales
This was 0 except for 1927 and 1973, when it
was 637 and 684, respectively. The stateFedFemales
for 1972:1974 was 6269, 6004, 7389. We
replaced 6004 with 6688, which made the
checkTot
0 for 1973.
Similar checks for 1927 yielded nothing as
obvious. However, the
stateFedIncarcerees
increased 6.9
percent in 1926 over 1925, and 12.2 and
5.8 percent in the following two years.
Subtracting 637 from 109983 for 1927 gave us
109346, which reduced the increase to 11.6
percent for 1927. It's no longer the maximum
annual increase prior to 1975.
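The consistency check described above can be sketched as follows. Only the female counts for 1972:1974 and the discrepancy of 684 come from the text; the male counts and totals are hypothetical placeholders, and the variable names follow those used in this description.

```r
# Female counts for 1972:1974 are quoted in the text above;
# the male counts are hypothetical placeholders.
prisoners <- data.frame(
  year            = 1972:1974,
  stateFedMales   = c(190000, 198000, 211000),  # hypothetical
  stateFedFemales = c(6269, 6004, 7389)
)
# Hypothetical totals, built so that 1973 is off by 684 as in the text
prisoners$stateFedIncarcerees <-
  with(prisoners, stateFedMales + stateFedFemales + c(0, 684, 0))

# The check: total minus males minus females should be 0
checkTot <- with(prisoners,
                 stateFedIncarcerees - stateFedMales - stateFedFemales)
checkTot

# Attribute the discrepancy to the female count: 6004 + 684 = 6688
bad <- which(checkTot != 0)
prisoners$stateFedFemales[bad] <-
  prisoners$stateFedFemales[bad] + checkTot[bad]

# Year-over-year percent change in the total, used as a plausibility check
pctChange <- 100 * diff(prisoners$stateFedIncarcerees) /
  head(prisoners$stateFedIncarcerees, -1)
round(pctChange, 1)
```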
Next, these numbers were compared with those
in p19t03.csv
and p19t05.csv
,
which include numbers of incarcerees and
rates per 100,000 population for 2009:2019.
The numbers were identical for 2009:2011,
but there were several differences for the
more recent counts.
For USincarcerations
, we used the
numbers from p19t03.csv
and
p19t05.csv
, because they seem likely
to be more accurate.
However, these numbers include only people in state and federal prisons. They exclude jails.
Key Statistic: Total correctional population
includes a plot of "Total adult
correctional population 1980-2016", which
does include jails. The data there are
available as
Total_correctional_population_counts_by_status.csv
. Data on these variables covering
2008-2018 are available as
cpus1718.csv
from "Data tables" at
Publication Correctional Populations In The United States, 2017-2018.
The data in cpus1718.csv
is mostly
but not entirely identical to "Total adult
correctional population 1980-2016" for
2008-2016, the period of overlap. We
therefore used the older data up to 2007
and cpus1718.csv
for 2008-2018.
Actual analysis of the jail data is left for another project.
Data from 1925 to 2014 from
File:U.S. incarceration rates 1925 onwards.png
on Wikimedia Commons, accessed 2020-11-23.
The primary sources for the more recent data are
files p*t03.csv
and p*t05.csv
from
Prisoners In 2019, accessed 2020-11-23.
Data on jails and community supervision dating back to 1980 are available in Key Statistic: Total correctional population with data on the most recent years available from Publication Correctional Populations In The United States, 2017-2018.
Some time in 2021 or later more recent data should become available. When that happens, it may be desired to update this table to include those numbers – and check for any revisions of earlier numbers.
United States incarceration rate.
data(USincarcerations)
matplot(USincarcerations[1],
        0.001*USincarcerations[c(3, 5, 7)],
        type='l', xlab='',
        ylab='incarceration rate (%)')
abline(h=0.5, lty='dotted', col='gray')
lbl <- paste("US incarceration rate",
             '(percent of the population)', sep='\n')
text(1955, 0.75, lbl)
text(2007, 0.86, 'male', col=2)
text(2007, 0.15, 'female', col=3)
Advertising and circulation revenue for US newspapers since 1956 with GDP in billions of current dollars (i.e., not adjusted for inflation) plus ads as a proportion of revenue and revenue as a proportion of US Gross Domestic Product (GDP).
data("USnewspapers")
A data frame with 65 observations on the following 14 variables.
an integer vector giving the year
c(1956:2020)
.
Total newspaper revenue from advertising, circulation, and combined in billions of US dollars, both current and adjusted for inflation to 2012 dollars. The data were compiled from detailed reports until 2012 and estimated since.
Advertising as a proportion of total revenue.
US GDP in billions of dollars, both current and adjusted for inflation to constant 2012 dollars.
Newspaper advertising revenue as a percent of GDP.
Newspaper revenue as a proportion of GDP.
US population in millions
Newspaper revenue per person in current dollars.
Newspaper revenue per person in constant 2012 dollars.
Data used by McChesney and Nichols (2021-12-13) To Protect and Extend Democracy, Recreate Local News Media (Freepress.net, p. 6, note 10) to estimate that newspaper subsidies averaged roughly 0.216 percent of GDP between 1840 and 1844.
Newspaper data from "Newspapers fact sheet" published by the Pew Research Center, accessed 2021-12-18.
GDP data from Measuring Worth, accessed 2021-12-18.
McChesney and Nichols (2021-12-13) To Protect and Extend Democracy, Recreate Local News Media (Freepress.net, p. 6, note 10), accessed 2021-12-18.
Newspaper data from "Newspaper fact sheet" published by the Pew Research Center.
GDP data from Measuring Worth.
data(USnewspapers)
plotNewsRevenue <- function(ys=c(2, 4, 6)){
  ylim. <- range(USnewspapers[ys], na.rm=TRUE)
  xlim. <- range(USnewspapers$Year)
  to2013 <- (USnewspapers$Year<2013)
  matplot(USnewspapers$Year[to2013],
          USnewspapers[to2013, ys],
          type='l', log='y', xlim=xlim., ylim=ylim.,
          las=1, xlab='', ylab='')
  matlines(USnewspapers$Year[!to2013], col=4:6,
           USnewspapers[!to2013, ys])
  # label the columns actually plotted (ys)
  lnms <- outer(names(USnewspapers[ys]),
                c('', '-est'), paste0)
  legend('bottom', lnms, col=1:6, lty=1:6, cex=0.5)
}
plotNewsRevenue()
plotNewsRevenue(c(3, 5, 7))

plot(100*newspapers_p_GDP~Year, USnewspapers,
     type='l', las=1, xlab='',
     ylab='newspapers percent of GDP')
plot(RevenuePerCap_nominal~Year, USnewspapers,
     type='l', las=1, xlab='',
     ylab='Revenue per capita (nominal)')
plot(RevenuePerCap_2012~Year, USnewspapers,
     type='l', las=1, xlab='',
     ylab='Revenue per capita (2012$)')
Numbers of post offices in the US from 1789 to 2020 with their income and expenses in current dollars and proportion of the federal government and of Gross Domestic Product (GDP). Also includes the number of pieces of mail, numbers of periodicals, pieces and periodicals per person, and cost coverage of periodicals for selected years.
It would be interesting to find the total value of the subsidies for newspapers and other periodicals as a proportion of the budgets of the USPS and the federal government as well as of GDP. That is currently absent from the data consulted to produce this.
data(USPS)
A data.frame
containing 232
observations on the following variables:
integer: the year: 1789:2020
Income and expenses in millions of current dollars, per Historian (2022).
Income
and Expenses
as a
proportion of
USGDPpresidents[, 'fedReceipts']
and
USGDPpresidents[, 'fedOutlays']
,
respectively.
Income
and Expenses
as a
proportion of GDP
, per
MeasuringWorth
.
Income
and Expenses
per
capita in current dollars =
Income
and Expenses
divided by 1000 *
USGDPpresidents[, 'population.K']
.
Income
and Expenses
per
capita in constant 2012 dollars =
Income_cap
and
Expenses_cap
divided by
USGDPpresidents[, 'GDPdeflator']
.
Number of post offices per Historian (2022).
US population in thousands per post
office:
USGDPpresidents[, 'population.K']
divided by postOffices
.
numeric: Millions of pieces of mail handled and periodicals mailed. "Pieces of mail" are from Historian (2022). "Periodicals" are from Historian (2010).
piecesOfMail
and periodicals
handled per capita (per human in the US)
per year.
Cost coverage of periodicals, per Historian (2010). This is available here only since 1960, though Historian (2010) gave a general outline of these numbers. This included saying, "In 1966, the percentage of its own costs covered by second-class mail (or 'cost coverage'), including the subsidy, was 35 percent [reported as 36 percent here]. Its real coverage was 24 percent." The narrative noted that during parts of the nineteenth century the actual rate was zero. Sometimes it was zero only within county. Sometimes advertising was charged a higher rate than news.
Other than numbers for the period since 1960, we note the coverage in 1951 as 20 percent, based on the following comment:
"In February 1951, in a special message to Congress, President Harry S. Truman argued at length for a rate increase: 'In fiscal year 1952 . . . newspaper and magazine publishers will have 200 million dollars – or 80 percent – of their postal costs paid for them by the general public.'"
rownames(USPS) = year
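The per-capita variables above combine figures in different units, so the scaling is worth spelling out. The following is a minimal sketch, assuming Income is in millions of current dollars, population.K is in thousands, and GDPdeflator is scaled so that 2012 = 100; the numbers are hypothetical and the package's own computation may be organized differently.

```r
# Hypothetical values, not taken from USPS or USGDPpresidents
Income       <- 70000    # millions of current dollars
population.K <- 320000   # population in thousands
GDPdeflator  <- 110      # 2012 = 100

# Dollars per person: (1e6 * Income) / (1e3 * population.K)
Income_cap <- 1e6 * Income / (1e3 * population.K)

# Constant 2012 dollars per person, rescaling the deflator from 2012 = 100
Income_cap2012 <- Income_cap / (GDPdeflator / 100)

# Cost coverage implied by the Truman quote above: if 80 percent of
# postal costs were paid by the general public, 20 percent were covered.
coverage1951 <- 100 - 80

c(Income_cap = Income_cap, Income_cap2012 = Income_cap2012,
  coverage1951 = coverage1951)
```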
Data used by McChesney and Nichols (2021-12-13) To Protect and Extend Democracy, Recreate Local News Media (Freepress.net, p. 6, note 10) to estimate that newspaper subsidies averaged roughly 0.216 percent of GDP between 1840 and 1844.
Spencer Graves
Historian (2010-06) Postage Rates for Periodicals: A Narrative History, accessed 2022-04-29.
Historian (2022-02) Pieces of Mail Handled, Number of Post Offices, Income, and Expenses Since 1789.
Robert W. McChesney and John Nichols (2010) The Death and Life of American Journalism (Nation Books, pp. 310-311) describe how they computed 0.216 as an estimate of the percent of national income (Gross Domestic Product, GDP) devoted to newspaper subsidies, 1840-1844. The numbers in the current dataset seem essentially equivalent but new and therefore perhaps more accurate. With these numbers, we got 0.209 percent of GDP rather than their 0.216 percent.
##
## plot Expenses as a percent of the
## federal budget and of GDP
##
data(USPS)
plot(Expenses_pFed~Year, USPS, type='l')
plot(Expenses_pGDP~Year, USPS, type='l')
plot(100*periodicals/piecesOfMail~Year, USPS,
     type='l', ylab='',
     main='periodicals as percent of mail')

# Select a year
# as a character string not a number:
USPS['1850',]

##
## Plot Expenses_pGDP with
## USGDPpresidents[, 'fedOutlays_pGDP']
##
str(yrs2 <- intersect(USPS$Year, USGDPpresidents$Year))
yrs2a <- as.character(yrs2)
str(USPS_fed <- cbind(USPS[yrs2a, "Expenses_pGDP"],
        USGDPpresidents[yrs2a, "fedOutlays_pGDP"]))
matplot(yrs2, USPS_fed, log='y', ylab='', las=1,
        type='l', xlab='')
abline(v=c(1840, 1844), lty='dotted', col='grey')
text(1842, 6e-3, cex=.7,
     'McChesney & Nichols analysis', srt=90, col='grey')
abline(v=c(1861, 1865), lty='dotted', col='grey')
text(1863, 6e-3, 'Civil War', srt=90, col='grey')
sel1 <- (USGDPpresidents$war=='World War I')
(yr1 <- USGDPpresidents$Year[sel1])
abline(v=yr1, col='grey', lty='dotted')
text(mean(yr1), 2e-3, 'WWI', col='grey', srt=90)
sel2 <- (USGDPpresidents$war=='World War II')
(yr2 <- range(USGDPpresidents$Year[sel2]))
abline(v=yr2, col='grey', lty='dotted')
text(mean(yr2), 2e-3, 'WWII', col='grey', srt=90)
abline(h=c(.001, .01, .1), lty='dotted', col='grey')
legend("bottomright",
       c('USPS Expenses_pGDP', 'fedOutlays_pGDP'),
       col=1:2, lty=1:2, bty='n')
The object returned by Ecfun::readUSstateAbbreviations()
on May 20,
2013.
data(USstateAbbreviations)
A data.frame
containing 10 different character vectors of names
or codes for 76 different political entities including the United
States, the 50 states within the US, plus the District of Columbia, US
territories and other political designations, some of which are
obsolete but are included for historical reference.
The standard name of the entity.
description of status, e.g., state / commonwealth vs. island, territory, military mail code, etc.
Alternative abbreviations used per different standards. The most
commonly used among these may be the 2-letter codes officially
used by the US Postal Service (USPS
).
This was read from the Wikipedia article on "List of U.S. state abbreviations"
the Wikipedia article on "List of U.S. state abbreviations"
readUSstateAbbreviations
showNonASCII
grepNonStandardCharacters
subNonStandardCharacters
##
## to use
##
data(USstateAbbreviations)

##
## to update
##
## Not run: 
USstateAbb2 <- readUSstateAbbreviations()
## End(Not run)
Thousands of words in US tax law for 1955
to 2015 in 10 year intervals. This
includes income taxes and all taxes in the
code itself (written by congress) and
regulations (written by government
administrators). For 2015 only
EntireTaxCodeAndRegs
is given; for
other years, this number is broken down by
income tax vs. other taxes and code vs.
regulations.
data(UStaxWords)
A data.frame
containing:
tax year
number of words in thousands in the US income tax code
number of words in thousands in US tax code other than income tax
number of words in thousands in the US tax code
number of words in thousands in US income tax regulations
number of words in thousands in US tax regulations other than income tax
number of words in thousands in both the code and regulations for the US income tax
number of words in thousands in both code and regulations for US taxes apart from income taxes.
number of words in thousands in US tax code and regulations
Thousands of words in the US tax code and
federal tax regulations, 1955-2015. This
is based on data from the Tax Foundation
(taxfoundation.org
), adjusted to
eliminate an obvious questionable observation
in otherTaxRegulations
for 1965. The
number of words in
otherTaxRegulations
was not reported
directly by the Tax Foundation but is
easily computed as the difference between
their Income and Entire tax numbers. This
series shows the numbers falling by 48
percent between 1965 and 1975 and by 1.5
percent between 1995 and 2005. These are
the only declines seen in these numbers
and seem inconsistent with the common
concern (expressed e.g., in Moody,
Warcholik and Hodge, 2005) about the
difficulties of simplifying any
governmental program, because vested
interests appear to defend almost anything.
Lessig (2011) notes that virtually all
provisions of US law that favor certain
segments of society are set to expire after
a modest number of years. These sunset
provisions provide recurring opportunities
for incumbent politicians to extort
campaign contributions from those same
segments to ensure the continuation of the
favorable treatment.
The decline of 48 percent in
otherTaxRegulations
seems more
curious for two additional reasons: First,
it was preceded by a tripling of
otherTaxRegulations
between 1955 and
1965. Second, it was NOT accompanied by
any comparable behavior of
otherTaxCode
. Instead, the latter
grew each decade by between 17 and 53
percent, similar to but slower than the
growth in IncomeTaxCode
and
IncomeTaxRegulations
.
Accordingly, otherTaxRegulations
for
1965 is replaced by the average of the
numbers for 1955 and 1975, and
EntireTaxRegulations
for 1965 is
comparably adjusted. This replaces (1322,
2960) for those two variables for 1965
with (565, 2203). In addition,
otherTaxCodeAndRegs
and
EntireTaxCodeAndRegs
are also
changed from (1626, 3507) to (870, 2751).
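The arithmetic of this adjustment can be sketched as follows. The 1965 values are those quoted above; the 1955 and 1975 values are hypothetical placeholders chosen only so that their mean equals the adjusted 565, and are not the figures in UStaxWords.

```r
# 1965 values as originally reported (from the text above)
otherRegs1965  <- 1322
entireRegs1965 <- 2960

# Hypothetical neighbouring values, chosen so that their mean is 565
otherRegs1955 <- 443   # hypothetical
otherRegs1975 <- 687   # hypothetical

# Replace the 1965 value by the average of 1955 and 1975
otherRegsAdj <- mean(c(otherRegs1955, otherRegs1975))

# Reduce the total for all regulations by the same amount
shift <- otherRegs1965 - otherRegsAdj
entireRegsAdj <- entireRegs1965 - shift

c(otherTaxRegulations = otherRegsAdj,
  EntireTaxRegulations = entireRegsAdj)
```

The same shift would apply to the combined code-and-regulations totals; the adjusted values quoted above differ from that by one unit, presumably because of rounding.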
Independent of whether this adjustment is correct or not, it's clear that there have been roughly 3 words of regulations for each word in the tax code. Most of these are income tax regulations, which have recently contained 4.5 words for every word in code. The income tax code currently includes roughly 50 percent more words than other tax code.
Spencer Graves
Tax Foundation: Number of Words in Internal Revenue Code and Federal Tax Regulations, 1955-2005.
Scott Greenberg, "Federal Tax Laws and Regulations are Now Over 10 Million Words Long", October 08, 2015.
J. Scott Moody, Wendy P. Warcholik, and Scott A. Hodge (2005) "The Rising Cost of Complying with the Federal Income Tax", The Tax Foundation Special Report No. 138.
data(UStaxWords)
plot(EntireTaxCodeAndRegs/1000 ~ year, UStaxWords,
     type='b',
     ylab='Millions of words in US tax code & regs')

# Write to a file for Wikimedia Commons
## Not run: 
svg('UStaxWords.svg')
## End(Not run)
matplot(UStaxWords$year, UStaxWords[c(2:3, 5:6)]/1000,
        type='b', bty='n', ylab='',
        ylim=c(0, max(UStaxWords$EntireTaxCodeAndRegs)/1000),
        las=1, xlab="", cex.axis=2)
lines(EntireTaxCodeAndRegs/1000~year, UStaxWords, lwd=2)
## Not run: 
dev.off()
## End(Not run)
# lines 1:4 = IncomeTaxCode, otherTaxCode,
#   IncomeTaxRegulations,
#   and otherTaxRegulations, respectively

##
## Plotting the original numbers
## without the adjustment
##
UStax. <- UStaxWords
UStax.[2,c(6:7, 9:10)] <- c(1322, 2960, 1626, 3507)
matplot(UStax.$year, UStax.[c(2:3, 5:6)]/1000,
        type='b', bty='n', ylab='',
        ylim=c(0, max(UStax.$EntireTaxCodeAndRegs)/1000),
        las=1, xlab="", cex.axis=2)
lines(EntireTaxCodeAndRegs/1000~year, UStax., lwd=2)
# Note especially the anomalous behaviour of
# line 4 = otherTaxRegulations.  As noted with
# "details" above, otherTaxRegulations could have
# tripled between 1955 and 1965, then fallen by 48
# percent between 1965 and 1975.  However, that
# does not seem credible, especially since there
# was no corresponding behavior in otherTaxCode.

##
## linear trend
##
(newWdsPerYr <- lm(EntireTaxCodeAndRegs~year,
                   UStaxWords))
plot(UStaxWords$year, resid(newWdsPerYr))
# Roughly 150,000 additional words added each year
# since 1955.
# No indication of nonlinearity.
# adjusted R-squared exceeds 99 percent.

##
## linear trend with increased slope
## during the Reagan years
##
# linear spline with knots at
# 1981 and 1989
Reagan <- pmax(0, pmin(
    (UStaxWords$year-1981)/8, 1))
plot(Reagan~year, UStaxWords, type='b')
UStaxWords$Reagan <- Reagan
ReaganMdl <- EntireTaxCodeAndRegs~year + Reagan
fitReagan <- lm(ReaganMdl, UStaxWords )
summary(fitReagan)
a cross-section from 1997
number of observations : 5999
observation : households
country : Vietnam
data(VietNamH)
A dataframe containing :
gender of household head (male,female)
age of household head
schooling year of household head
farm household ?
urban household ?
household size
log household total expenditure
log household medical expenditure
log household food expenditure
log of total household health care expenditure for 12 months
commune
Vietnam World Bank Living Standards Survey.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp.88–90.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1997
number of observations : 27765
observation : individuals
country : Vietnam
data(VietNamI)
A dataframe containing :
number of direct pharmacy visits
log of total medical expenditure
age of household head
gender (male,female)
married ?
completed diploma level ?
number of illnesses experienced in past 12 months
injured during survey period ?
number of illness days
number of days of limited activity
respondent has health insurance coverage ?
commune
Vietnam World Bank Living Standards Survey.
Cameron, A.C. and P.K. Trivedi (2005) Microeconometrics : methods and applications, Cambridge, pp.848–853.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a panel of 595 individuals observed from 1976 to 1982
number of observations : 4165
observation : individuals
country : United States
data(Wages)
A dataframe containing :
years of full-time work experience
weeks worked
blue collar ?
works in a manufacturing industry ?
resides in the south ?
resides in a standard metropolitan statistical area ?
married ?
a factor with levels (male,female)
individual's wage set by a union contract ?
years of education
is the individual black ?
logarithm of wage
Cornwell, C. and P. Rupert (1988) “Efficient estimation with panel data: an empirical comparison of instrumental variables estimators”, Journal of Applied Econometrics, 3, 149–155.
Panel study of income dynamics.
Baltagi, Badi H. (2003) Econometric analysis of panel data, John Wiley and Sons, https://www.wiley.com/legacy/wileychi/baltagi/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
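The Wages data are a balanced panel: 595 individuals times 7 years (1976 to 1982) gives the 4165 rows. A minimal sketch in base R of how individual and year indices could be rebuilt for such stacked data. It assumes the rows are sorted by individual and then by year, which is not stated above and should be checked against the data; the names id, year and panel_index are hypothetical.

```r
# Sketch: rebuild panel indices for a balanced panel stored as stacked rows.
# Assumption (not confirmed by the documentation above): rows are ordered
# by individual first, then by year within each individual.
n_individuals <- 595
years <- 1976:1982

id   <- rep(seq_len(n_individuals), each = length(years))
year <- rep(years, times = n_individuals)

# One row per individual-year pair: 595 * 7 = 4165
panel_index <- data.frame(id = id, year = year)
nrow(panel_index)
head(panel_index, 8)
```

If the ordering assumption holds, cbind(panel_index, Wages) would attach the indices to the data frame after data(Wages).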
a panel of 595 observations from 1976 to 1982
number of observations : 3294
observation : individuals
country : United States
data(Wages1)
A time series containing :
experience in years
a factor with levels (male,female)
years of schooling
wage (in 1980 $) per hour
Verbeek, Marno (2004) A Guide to Modern Econometrics, John Wiley and Sons.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
a cross-section from 1987
number of observations : 3382
observation : individuals
country : United States
data(Workinghours)
A dataframe containing :
wife working hours per year
the other household income in hundreds of dollars
age of the wife
education years of the wife
number of children for ages 0 to 5
number of children for ages 6 to 13
number of children for ages 14 to 17
non–white ?
is the home owned by the household ?
is the home on mortgage ?
occupation of the husband, one of mp (manager or
local unemployment rate in %
Lee, Myoung–Jae (1995) “Semi–parametric estimation of simultaneous equations with limited dependent variables : a case study of female labour supply”, Journal of Applied Econometrics, 10(2), April–June, 187–200.
Journal of Applied Econometrics data archive : http://qed.econ.queensu.ca/jae/.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations
weekly observations from 1975 to 1989
number of observations : 778
observation : country
country : Japan
data(Yen)
A dataframe containing :
the date of the observation (19850104 is January 4, 1985)
the ask price of the dollar in units of Yen in the spot market on Friday of the current week
the ask price of the dollar in units of Yen in the 30-day forward market on Friday of the current week
the bid price of the dollar in units of Yen in the spot market on the delivery date on a current forward contract
Bekaert, G. and R. Hodrick (1993) “On biases in the measurement of foreign exchange risk premiums”, Journal of International Money and Finance, 12, 115-138.
Hayashi, F. (2000) Econometrics, Princeton University Press, http://fhayashi.fc2web.com/hayashi_econometrics.htm, chapter 6, 438-443.
DM, Pound, Index.Source, Index.Economics, Index.Econometrics, Index.Observations, Index.Time.Series
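The observation date is stored as a number of the form yyyymmdd. A minimal sketch in base R of converting such values to Date objects. The vector date_int is illustrative only and is not taken from the data set; the name of the date column in Yen is not given above, so none is assumed here.

```r
# Sketch: convert yyyymmdd numbers into Date objects.
# date_int holds illustrative values, not values read from the Yen data.
date_int <- c(19850104, 19850111)

date <- as.Date(as.character(date_int), format = "%Y%m%d")
format(date, "%Y-%m-%d")

# Weekly observations should be 7 days apart
diff(date)
```

Once converted, the dates can be used directly as the x axis in plot() or to check that successive observations are one week apart.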
a cross-section
number of observations : 2412
observation : individuals
country : United States
data(Yogurt)
A dataframe containing :
individuals identifiers
one of yoplait, dannon, hiland, weight (weight watcher)
is there a newspaper feature advertisement for brand z?
price of brand z
Jain, Dipak C., Naufel J. Vilcassim and Pradeep K. Chintagunta (1994) “A random–coefficients logit brand–choice model applied to panel data”, Journal of Business and Economic Statistics, 12(3), 317.
Journal of Business and Economic Statistics web site : https://amstat.tandfonline.com/loi/ubes20.
Index.Source, Index.Economics, Index.Econometrics, Index.Observations