Cross-National Data on the Web
- Widely-Used Compendia of Development Data
- Health and Health Care Data
- Data for Latin America and the Caribbean
- Infant, Child, and Maternal Mortality
- Family Planning
- Infant Immunization
- Economic Affluence
- Educational Attainment
- Income Poverty and Income Inequality
- Water and Sanitation
- Geographical Variables
- Democracy, Civil and Political Rights, Women in Parliament
- State Capacity
- Free-Market Orientation
As of July 28, 2011, all of these links worked.
1. Widely-Used Compendia of Development Data
The World Bank World Development Indicators (“WDI”) are the most widely cited data pertaining to economic and social development. The WDI include hundreds of variables pertaining to GDP per capita, income inequality, income poverty, mortality (adult, infant, maternal), life expectancy at birth, age at first marriage, fertility, population in different age groups, urban population, population density, contraceptive prevalence, birth attendance, doctors per capita, nurses per capita, hospital beds per capita, access to sanitation and safe water, immunization rates, adult illiteracy rates, HIV prevalence, etc. Some indicators are disaggregated by gender, urban/rural, and so on. To use the WDI statistical compiler, (1) select a country or countries, then click “next”; (2) select variables (series), then click “next”; and (3) select years, then click “apply changes.”
The Quality of Government datasets, housed at the University of Gothenburg, Sweden (but published in English), consist of about 2500 indicators on national quality of governance (as measured by such indicators as corruption, bureaucratic quality, democracy), some of its hypothesized causes (e.g., colonial origin, religion, and ethnolinguistic fractionalization), and some of its hypothesized consequences (e.g., GDP per capita, educational attainment, infant mortality, gender bias, environmental sustainability, life satisfaction, and trust). The indicators were compiled from about 100 sources. The Standard TS [time series] dataset has data on each indicator in each year from 1946 to 2016, provided a credible estimate can be found. The unit of analysis is the country-year: Sweden-1946, Sweden-1947, and so on.
UNdata includes 33 regularly updated databases with more than 60 million separate pieces of information on such topics as “Agriculture, Crime, Education, Employment, Energy, Environment, Health, HIV/AIDS, Human Development, Industry, Information and Communication Technology, National Accounts, Population, Refugees, Tourism, and Trade.”
The Human Development Reports published annually by the United Nations Development Programme (UNDP) include a wide range of statistical data related to people’s ability to live a long and healthy life, acquire knowledge, and achieve a decent standard of living. Besides the Human Development Report Office’s annual global Reports covering all countries, associated agencies in most countries of the world have published national and subnational Human Development Reports. The Human Development Report 2010, the 20th Anniversary Edition, reaffirms the benefits of the whole 20-year project and, with careful justification, significantly revises the algorithms for calculating the major indices, including the Human Development Index. By 2015, however, some of the 2010 revisions had been rethought. A comparison of the technical notes of the 2013 and 2015 reports shows that in 2015 the two education sub-indices were combined using the arithmetic rather than the geometric mean, as had been the case before 2010; and that the “goalposts” for the health, education, and standard of living indices had gone back to being stipulated a priori (e.g., 85 for life expectancy, 15 for average years of schooling, etc.) rather than being set, in accordance with the 2010 revisions, to the highest and lowest actual levels across all countries, which caused the “goalposts” to shift every year (not helpful for time-series analysis). You can access the Human Development Report data directly by visiting this website.
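The role of the goalposts and of the two kinds of mean can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the UNDP’s own procedure: the maxima of 85 (life expectancy) and 15 (average years of schooling) come from the paragraph above, while the minimum of 20 for life expectancy, the range of 0 to 18 for expected years of schooling, and the country values are assumptions made for the example. Consult the technical notes of the relevant report for the actual goalposts.

```python
from math import prod

def dimension_index(actual, minimum, maximum):
    """Rescale an indicator to the 0-1 range defined by its goalposts."""
    return (actual - minimum) / (maximum - minimum)

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return prod(values) ** (1 / len(values))

# Hypothetical country; these values are invented for illustration.
life_expectancy = 72.5
mean_years_schooling = 7.5
expected_years_schooling = 12.0

# Fixed, a priori goalposts. The maxima 85 and 15 appear in the text;
# the minima and the maximum of 18 are assumptions.
health = dimension_index(life_expectancy, 20, 85)
mean_school = dimension_index(mean_years_schooling, 0, 15)
expected_school = dimension_index(expected_years_schooling, 0, 18)

# The two education sub-indices, combined in the two ways discussed above.
education_arithmetic = arithmetic_mean([mean_school, expected_school])
education_geometric = geometric_mean([mean_school, expected_school])

print(round(health, 3), round(education_arithmetic, 3), round(education_geometric, 3))
```

Because the goalposts are constants, the same life expectancy yields the same health index in every year, which is what makes fixed goalposts convenient for time-series work. The geometric mean is never larger than the arithmetic mean, so the choice between them matters most for countries whose two education sub-indices differ widely.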
UNICEF has published State of the World’s Children annually since 1980. Each edition includes statistical data on indicators related to the well-being of children. UNICEF’s Multiple Indicator Cluster Survey website has data on children and women, including updated infant and under-5 mortality statistics and access to recent survey reports and data.
The United Nations Population Division has a wide range of useful demographic data, some of which are accessible through this user-friendly data query system.
The Norwegian Social Science Data Services keeps current a useful MacroData Guide (in English) with links to state-of-the-art sources of cross-national data on demography, economics, education, health, labor and employment, crime, corruption, natural resources, politics, conflict, human rights, inequality, gender, religion, and other topics.
2. Health and Health Care Data
The World Health Organization’s Global Health Observatory (GHO) has a large cross-national data repository, especially for years since 1990, on mortality, disease incidence, child nutrition, child health, maternal and reproductive health, immunization, HIV/AIDS, tuberculosis, malaria, water and sanitation, noncommunicable diseases and risk factors, health systems, environmental health, injuries, and violence.
Since the mid-1980s, some 300 Demographic and Health Surveys (DHS) have been conducted in 90 developing countries. The surveys produce highly regarded data on maternal and child health service delivery and related indicators. The DHS website includes a useful stat compiler.
3. Data for Latin America and the Caribbean
The Socio-Economic Database for Latin America and the Caribbean (SEDLAC), housed at the Center for Distributional, Labor and Social Studies (CEDLAS) at Argentina’s Universidad Nacional de La Plata, uses microdata from over 300 household surveys in 24 countries to produce comparable statistics on per capita income, income inequality, income poverty, household size, educational attainment, housing quality, durable goods ownership, access to electricity, safe water, and adequate sanitation, employment, and eligibility for disability and retirement pensions. Some of the variables are disaggregated by gender. The data pertain mostly to the period since 1990, but in some countries statistics go back as far as 1974. Country experts update the tables when microdata from a new survey become available, and new databases are published approximately twice per year. The website can be used in either English or Spanish.
The Pan American Health Organization (PAHO), an agency of the World Health Organization, provides health data on 48 countries and territories in the Western Hemisphere, from Canada to St. Lucia to Argentina. The site includes a statistical compiler called Core Indicators – Interactive Version.
4. Infant, Child, and Maternal Mortality
The infant mortality figures published in the World Bank’s World Development Indicators are based on estimates compiled by the Inter-agency Group on Child Mortality Estimation, which includes specialists from the World Bank, World Health Organization, UNICEF, and the United Nations Population Division. The census, survey, and vital registration data underlying these estimates are available at the child mortality website of the Interagency Group on Child Mortality Estimation. The Interagency Group has a similar site for maternal mortality data.
The method used to produce the Interagency Group infant and under-5 mortality estimates was described initially in Kenneth Hill et al., Trends in Child Mortality in the Developing World: 1960-1996. New York: UNICEF, 1999. “Levels and Trends of Child Mortality in 2006: Estimates Developed By the Inter-agency Group for Child Mortality Estimation” provides a more comprehensive account of the methodology employed, along with infant and under-5 mortality estimates for most of the world’s countries through 2005.
Since the mid-1980s, some 200 Demographic and Health Surveys (DHS) have been conducted in 75 developing countries. The surveys provide highly regarded data on infant and child mortality, among other indicators.
UNICEF’s Multiple Indicator Cluster Survey website has data on children and women, including updated infant and under-5 mortality statistics and access to recent survey reports and data. Designed and administered by UNICEF, other international organizations, and local government agencies, the MICS surveys are tailored to suit the particular informational needs of the host country.
You can query the United Nations Population Division’s World Population Prospects for infant and under-5 mortality estimates by country by five-year period (e.g., for 2010-2015). Data are available from 1950 to (worrisomely) 2100.
The UN Maternal Mortality Estimation Inter-Agency Group provides maternal mortality estimates for 171 countries in each year from 1985 to 2015. The data are available here and are described in a 2015 article in the journal The Lancet.
5. Family Planning
The Demographic and Health Surveys (DHS) have data on indicators of family planning.
Researchers associated with the Track20 project of Avenir Health (formerly the Futures Group) have used an expert rating system to measure family planning program effort in 60 poor developing countries. The researchers send questionnaires to country experts, aggregate the responses into component measures of different aspects of family planning effort, and assign each country an overall score equal to its achieved percentage of the maximum attainable score on the combined components. The results of surveys from 1972, 1982, 1989, 1994, 1999, 2004, 2009, and 2014 are available here.
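The final aggregation step (overall score as a percentage of the maximum attainable score) amounts to simple arithmetic. The sketch below assumes, purely for illustration, four equally weighted components each rated from 0 to 4; the project’s actual components, scales, and weights may differ.

```python
def effort_score(component_scores, max_per_component):
    """Overall score as a percentage of the maximum attainable total."""
    achieved = sum(component_scores)
    attainable = max_per_component * len(component_scores)
    return 100 * achieved / attainable

# Hypothetical component ratings on an assumed 0-4 scale.
example = effort_score([3, 2, 4, 1], max_per_component=4)
print(example)  # 10 of 16 attainable points
```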
6. Infant Immunization
In June 2000, researchers at UNICEF and the World Health Organization began a concerted effort to evaluate and reconcile data on immunization coverage around the world. Their goal was to produce, for as many countries as possible and for each year from 1980 onward, a “consensus estimate” of the share of a target population (usually children surviving to age 1) that had been immunized with a specific antigen. To produce these estimates, they reviewed and evaluated all available immunization coverage information for as many countries as possible for as many years as possible from 1980 onward. The first estimates were released in 2001; updated series are available here.
7. Nutrition
The World Health Organization (WHO) has data on infant and child undernourishment. Some countries have data spanning the early 1980s to the early 2000s. The sources of the data for each country are unusually well-described.
The Food and Agriculture Organization (FAO) has compiled data for about 190 countries on the proportion of the total, adult, and child populations that suffer from hunger (caloric intake below the requirement for an active and healthy life in a particular country), as well as on many other food and nutrition-related indicators (food needs; food, protein, and micronutrient availability; food trade; food aid). The data are described in an appendix to the annual report The State of Food Insecurity in the World.
8. Economic Affluence
The Penn World Table 9.0 has information on GDP in 182 countries for some or all of the years from 1950 to 2014. For studies comparing living standards across countries and over time, the recommended GDP measure is expenditure-side real GDP at chained PPPs (RGDPe). To get the per capita figure for a particular country-year, divide the variable rgdpe (in column “E” of the .xlsx version of the database) by the variable pop (in column “G” of the .xlsx version of the database).
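The division can also be done in a script once the data sheet has been saved as a comma-delimited file. The sketch below is a minimal illustration, not part of the Penn World Table documentation: the rows are invented, and the column names countrycode and year are assumed to match the headers of the exported sheet (rgdpe and pop are the names given above). My understanding is that rgdpe is expressed in millions of dollars and pop in millions of people, so the ratio is in dollars per person; check the user guide before relying on this.

```python
import csv
import io

# Invented rows mimicking the layout of the exported data sheet.
SAMPLE = """countrycode,year,rgdpe,pop
AAA,2013,200000.0,40.0
AAA,2014,210000.0,42.0
BBB,2014,,5.0
"""

def gdp_per_capita(rows):
    """Return {(countrycode, year): rgdpe / pop}, skipping missing values."""
    result = {}
    for row in rows:
        if not row["rgdpe"] or not row["pop"]:
            continue  # leave country-years without data out of the result
        key = (row["countrycode"], int(row["year"]))
        result[key] = float(row["rgdpe"]) / float(row["pop"])
    return result

per_capita = gdp_per_capita(csv.DictReader(io.StringIO(SAMPLE)))
print(per_capita[("AAA", 2014)])
```

To use a real export, replace `io.StringIO(SAMPLE)` with a file opened via `open(path, newline="")`.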
The Maddison Project Database provides statistics on population, total GDP, and GDP per capita (at PPP, in constant 1990 international Geary-Khamis dollars) for many countries and regional aggregates (e.g., Latin America as a whole) over the very long run, as well as the short run. GDP per capita estimates are available for 16 countries for 1 C.E., and for larger and larger numbers of countries for subsequent years. The data come up only to 2010, but they go back much farther than the Penn World Table data.
As of early 2017 the World Bank World Development Indicators included 17 measures of GDP (Gross Domestic Product) and 14 indicators of GNI (Gross National Income, formerly known as Gross National Product). The measures differ from one another on several dimensions, including (1) levels vs. growth rates, (2) total vs. per capita vs. per worker vs. per unit of energy use, (3) current US dollars vs. constant international dollars, and (4) market exchange rates vs. purchasing power parity. For studies comparing living standards across countries and over time, use GDP or GNI per capita figures in constant international dollars at purchasing power parity. As of early 2017 these variables were labeled “GDP per capita, PPP (constant 2011 international $)” and “GNI per capita, PPP (constant 2011 international $).” GDP is a measure of output; GNI is a measure of income. Among 149 countries with 2014 data on both of these indicators, the difference exceeded 10 percent in only 10 countries. In general, GNI is higher than GDP in countries that receive a lot of remittances from their citizens working abroad (e.g., Philippines or Bangladesh) or that have (usually oil-funded) sovereign wealth funds that receive a lot of interest and dividends from investments in foreign countries (Timor-Leste, Kuwait, Norway). Conversely, GDP is generally higher than GNI in countries that have high levels of foreign direct investment, often in oil or mineral extraction, and in which a large share of earnings flows back to the investors’ home countries in the form of repatriated profits (Equatorial Guinea, Ireland, Mongolia, etc.). For the variables “GDP per capita, PPP (constant 2011 international $)” and “GNI per capita, PPP (constant 2011 international $),” data as of early 2017 were available only for the years 1990-2015. For earlier years, use the Penn World Table or Maddison Project data on GDP per capita in constant dollars at purchasing power parity.
9. Educational Attainment
Data on average years of schooling and other measures of educational attainment, as well as the same indicators for females only and for males only, are available at www.barrolee.com for 146 countries at five-year intervals from 1950 to 2010. Download the “full dataset” in Excel format (if you want the full dataset for both males and females aged 25 and older, this will put the file BL2013_MF2599_v2.1.xls on your desktop). A particularly useful indicator of educational attainment is “average years of total schooling” in Column “L” of the database. A “Long-term Data” section of the Barro and Lee website includes “estimated school enrollment ratios from 1820 to 2010 and estimated educational attainment for the total, female and male populations from 1870 to 2010. The estimates are available in five-year intervals for 111 countries.”
For illiteracy, see the World Bank World Development Indicators and the UNDP Human Development Reports listed at the top of this page under “Widely-Used Compendia of Development Data.”
The UNESCO Institute for Statistics has recent data on enrollment ratios, repetition rates, and other educational indicators for most of the world’s countries.
10. Income Poverty and Income Inequality
A good source of national income inequality data for a large number of countries around the world is the World Income Inequality Database Version 3.4 (Jan 2017) hosted by the World Institute for Development Economics Research (WIDER) in Helsinki. The data cover 182 countries with 8,817 observations from as recently as 2015. You’ll need the .pdf user guide as well as the Excel spreadsheet file.
The Socio-Economic Database for Latin America and the Caribbean (SEDLAC), based at the Universidad Nacional de La Plata in Argentina, calculates income inequality measures and poverty headcounts directly from survey microdata in each of 25 countries in the region. Although data for the 1980s and earlier are sparse, SEDLAC is probably the best source for post-1990 income inequality and income poverty data from Latin America.
A useful database of Gini coefficients (measures of income inequality) is Branko Milanovic’s All the Ginis. Gini coefficients range from 0 (everyone has the same amount of money) to 1 (one person has all the money, everyone else has none). You’ll need the description as well as the data. The link provides the data in Stata (.dta) format; if you would prefer Excel (.xlsx) I have converted the .dta file to .xlsx and stored it here. In cases where more than one Gini estimate exists for a given country-year, the “Giniall” column (Col. BD) indicates which one Milanovic finds most credible. The All the Ginis database does not use interpolation or extrapolation to fill in missing values, specifies the source from which each Gini was obtained, and indicates whether each Gini pertains to income or consumption (consumption Ginis tend to be lower) and, if it pertains to income, whether it registers the distribution of pre-tax-and-transfer or post-tax-and-transfer income (post-tax-and-transfer Ginis tend to be lower).
All the Ginis also indicates whether a particular Gini pertains to distribution across households or individuals. In the USA, the rich are less likely than the poor to be single, or single parents. So, rich households tend to be larger than poor households, which means that their higher incomes are divided among a larger number of people. This depresses the income share of rich individuals relative to rich households, and makes Ginis adjusted for household size (“equivalence adjusted”) lower than Ginis not so adjusted. In poor countries, rich families tend to have fewer children than poor families. So, rich households tend to be smaller than poor households, which means that their higher incomes are divided among a smaller number of individuals. This increases the income share of rich individuals relative to rich households, and makes Ginis adjusted for household size (“equivalence adjusted”) higher than Ginis not so adjusted.
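The 0-to-1 range of the Gini coefficient described above can be checked with a small calculation. The sketch below uses one standard formulation (half the mean absolute difference between all pairs of incomes, divided by the mean income); it illustrates the definition and is not the procedure used by any of the databases listed here, which work from survey microdata or grouped data. The income lists are invented.

```python
def gini(incomes):
    """Gini coefficient via the mean absolute difference between all pairs."""
    n = len(incomes)
    mean = sum(incomes) / n
    total_diff = sum(abs(x - y) for x in incomes for y in incomes)
    return total_diff / (2 * n * n * mean)

equal = gini([10, 10, 10, 10])    # everyone has the same income
one_has_all = gini([0, 0, 0, 100])  # one person has everything
print(equal, one_has_all)
```

With only four people the “one person has everything” case yields (n - 1)/n = 0.75 rather than exactly 1; the value approaches 1 as the population grows.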
The Global Consumption and Income Project (GCIP), initiated in April 2016, provides annual Gini coefficients for 160 countries (Lahoti, Jayadev, and Reddy 2016). Its website has a data visualization application showing time series for up to four countries from 1960 to 2015, when it works, which was not always the case when I tried it on a couple of different browsers. A distinctive feature of this database is that it separates Gini estimates based on income from Gini estimates based on consumption. It uses interpolation and extrapolation to fill in missing values. As of February 2017 the project’s authors appeared still to be working out the kinks: the income series for South Korea is not plausible.
The Standardized World Income Inequality Database version 5.1 (SWIID) developed by Frederick Solt provides annual Gini coefficients for 176 countries for as many years as possible from 1960 to 2014. Drawing on data collected by more than a dozen well-known compendia, it uses statistical techniques (including extrapolation and interpolation) to provide Ginis based on both gross and net per capita income, providing 95 percent confidence intervals around each annual point estimate. The website has a data visualization application. The data are described in Frederick Solt, “The Standardized World Income Inequality Database,” Social Science Quarterly 97.5 (2016), 1267-1281.
The strengths and weaknesses of the major compendia of data on income inequality are discussed in “Appraising Cross-National Income Inequality Databases,” a special issue of the Journal of Economic Inequality 13 No. 4 (December 2015).
This webpage is the gateway to the World Bank’s data on income poverty and income inequality.
11. Water and Sanitation
The WHO/UNICEF Joint Monitoring Programme for Water Supply and Sanitation provides carefully collected data on the proportion of the population with access to safe water and adequate sanitation. The most detailed information is to be found in the country files, some of which contain estimates from as far back as 1980. In most cases, however, UNICEF considers data collected before 1990 as significantly lower in quality than data from 1990 forward.
12. Geographical Variables
Data on land area, proportion of the population near the coast, latitude, population, and other such variables have been assembled by John Gallup, Andrew Mellinger, and Jeffrey Sachs. To obtain them, go to a web page at the Harvard Center for International Development and:
1. Scroll down to Geography Data Sets and click on General Measures of Geography. You’ll see 1) Physical geography and population (Revised data 9/04/01).
2. On a Mac (sorry, don’t know what to do on a PC), hold down “option” and click on “ASCII file (comma delimited),” which downloads a .csv comma-delimited file to your desktop.
3. Open up a newish version of Excel, and from within Excel, open the comma-delimited .csv file (to do this, you may need to switch within the “open” dialog box in Excel from “all readable documents” to “all documents”). Then save the resulting document as an Excel workbook.
4. The variable names won’t make much sense without the “Description of Data,” which is available as a Word file just above the line that says “ASCII file (comma delimited).”
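For readers who would rather skip Excel (or who are on a PC), a comma-delimited file can also be read directly with Python’s standard csv module. The sketch below is a generic illustration: it writes a tiny stand-in file so that it runs on its own, and the column names and values in that file are invented, not taken from the actual geography dataset. The real file may also need a different text encoding.

```python
import csv
import os
import tempfile

def load_rows(path):
    """Read a comma-delimited file into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))

# Stand-in file so the sketch runs on its own; replace `path` with the
# location of the downloaded file. Column names here are invented.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 newline="", encoding="utf-8") as tmp:
    tmp.write("country,landarea,lat\nExampleland,1000,12.5\n")
    path = tmp.name

rows = load_rows(path)
os.remove(path)  # clean up the stand-in file
print(rows[0]["country"], float(rows[0]["lat"]))
```

Every value comes back as text, so numeric columns need an explicit `float(...)` conversion, and the “Description of Data” file is still needed to interpret the variable names.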
13. Democracy, Civil and Political Rights, Women in Parliament
The most comprehensive dataset of quantitative democracy indicators is probably Polity IV, compiled by Ted Robert Gurr, Keith Jaggers, Monty Marshall, and their collaborators. This dataset stands out for its long empirical time frame (data go all the way back to a country’s date of independence, with the cut-off at 1800), its transparent and detailed coding rules, and its use of multiple coders and of tests of inter-coder reliability. To create the database, coders drawing on secondary literature assigned each of the world’s independent nations, in each year from 1800 to 2008, scores on “democracy” and “autocracy.” The scores are based on three sets of criteria: (1) “openness and competitiveness of the recruitment of the chief executive”; (2) “constraints on the authority of the chief executive”; and (3) “political participation and opposition.” Each criterion has subcomponents. For example, political participation and opposition includes “regulation of participation” (how much factionalism and personalism there is in politics) and “competitiveness of participation” (how much incumbents restrict political opposition). The subcomponents and components are scored, weighted, and combined to form a democracy score ranging from 10 to 0, as well as an autocracy score ranging from 0 to -10 (10 is most democratic, -10 is most autocratic). The two scores are then combined to form a “Polity” score ranging from 10 (most democratic) to -10 (most autocratic). The coding is done transparently and systematically and is checked for inter-coder consistency.
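The last step, combining the two scores into a single Polity score, can be sketched as follows. The function follows the sign convention used in the paragraph above (autocracy running from 0 down to -10), so the combination is a simple sum. My understanding is that the distributed data files store the autocracy score as a positive number from 0 to 10 and obtain the Polity score by subtraction, which gives the same result; check the users’ manual before applying this to the actual data. The example scores are invented.

```python
def polity_score(democracy, autocracy):
    """Combine the two component scores as described in the text.

    democracy runs from 0 to 10; autocracy is given here, following the
    text's sign convention, on a scale from 0 down to -10.
    """
    if not 0 <= democracy <= 10:
        raise ValueError("democracy must be between 0 and 10")
    if not -10 <= autocracy <= 0:
        raise ValueError("autocracy must be between -10 and 0")
    return democracy + autocracy

# Hypothetical country-years.
mostly_democratic = polity_score(8, -1)
fully_autocratic = polity_score(0, -10)
print(mostly_democratic, fully_autocratic)
```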
The data and a guide are at the Polity IV Project gateway page. Go there, scroll down to Polity IV Data Series version 2010, click on the link, then scroll down to “Polity IV: Regime Authority Characteristics and Transitions Datasets,” and click on “Polity IV Users’ Manual pdf file” for the codebook (on the left side of the web page) and on “Excel times[sic]-series data” for the data (on the right side of the web page).
Freedom House since 1972 has rated countries annually on “political rights” and “civil liberties.” Go here and scroll down to “Freedom in the World Comparative and Historical Data.” One of the links under this heading allows you to download to your desktop an Excel spreadsheet titled “Individual country ratings and status,” which has the political rights and civil liberties time series for 195 countries and 15 territories from 1972 to present. The methodology by which the 2014 scores were produced is described here.
Freedom House since 2003 has rated countries annually on a finer 100-point scale, with political rights given 40 points (electoral process 12, political pluralism and participation 16, functioning of government 12) and civil liberties given 60 points (freedom of expression and belief 16, associational and organizational rights 12, rule of law 16, and personal autonomy and individual rights 16). These ratings (for the seven subcomponents, as well as in aggregate) are available here.
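The point allocation above can be written down as a small lookup table, which also makes it easy to confirm that the subcomponent maxima add up to 40, 60, and 100. The maxima are those listed in the paragraph; the country scores are invented, and the range check is a simplification that may not reflect every detail of the Freedom House methodology.

```python
# Maximum points per subcomponent, as listed in the text above.
POLITICAL_RIGHTS_MAX = {
    "electoral process": 12,
    "political pluralism and participation": 16,
    "functioning of government": 12,
}
CIVIL_LIBERTIES_MAX = {
    "freedom of expression and belief": 16,
    "associational and organizational rights": 12,
    "rule of law": 16,
    "personal autonomy and individual rights": 16,
}

def aggregate(scores, maxima):
    """Sum subcomponent scores after checking each against its maximum."""
    for name, value in scores.items():
        if not 0 <= value <= maxima[name]:
            raise ValueError(f"{name}: {value} outside 0-{maxima[name]}")
    return sum(scores.values())

# Hypothetical country scores, invented for illustration.
political_rights = aggregate(
    {"electoral process": 10,
     "political pluralism and participation": 12,
     "functioning of government": 8},
    POLITICAL_RIGHTS_MAX,
)
civil_liberties = aggregate(
    {"freedom of expression and belief": 14,
     "associational and organizational rights": 10,
     "rule of law": 11,
     "personal autonomy and individual rights": 12},
    CIVIL_LIBERTIES_MAX,
)
total = political_rights + civil_liberties
print(political_rights, civil_liberties, total)
```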
A useful description and critique of quantitative democracy indicators, including the Polity and Freedom House indicators, is Gerardo Munck and Jay Verkuilen, “Conceptualizing and Measuring Democracy: Evaluating Alternative Indices.” Comparative Political Studies 35 No. 1 (February 2002), 5-34. The article and commentary on it are available here if you or your institution subscribe to this journal. Another useful paper by Gerry Munck on this topic is, for the time being (April 2012), here.
The International Institute for Democracy and Electoral Assistance (IDEA) has useful databases on voter turnout, electoral systems, and gender quotas for national legislative seats. Click here and try the links under “Databases and Networks.”
14. State Capacity
A World Bank webpage provides access to aggregate governance indicators for 212 countries for 1996-2007 for six dimensions of governance: voice and accountability, political stability and absence of violence, government effectiveness, regulatory quality, rule of law, and control of corruption.
15. Free-Market Orientation
The Fraser Institute in Vancouver, BC, rates most countries of the world for 1970, 1975, 1980, 1985, 1990, 1995, and each year from 2000 to 2009 according to how closely each conforms to what the Institute defines as a free-market system. To make sense of the data you’ll need to consult the Report. The 2010 Report is downloadable in .pdf format, and the data in Excel format, at the Fraser Institute website.