24 November 2011

7 November 2011

Standardise Country Names For Stata Data

If you regularly put together data sets for cross-country analysis, you'll probably know that it's a real pain to standardise country names so that you can merge together files from different sources.

For example, suppose you want to merge two data sets, A and B. In data set A the country Bosnia and Herzegovina is referred to as "Bosnia-Hertz" and in B it is called "Bosnia-Herzegovina". To merge them into one file that you can use for data analysis, you have to find this discrepancy and then change at least one of the names so that the two match. This is really tedious to do across multiple data sets with tens or hundreds of countries.
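Done by hand, the fix for this one country might look roughly like the sketch below. The file names (dataset_a.dta, dataset_b.dta, dataset_a_clean.dta) are made up for illustration, and the sketch assumes each file holds a string variable called country with one observation per country.

```stata
* Hypothetical file names; adjust to your own data.
* Assumes a string variable "country", one observation per country per file.

* Fix the spelling used in data set A and save a cleaned copy
use "dataset_a.dta", clear
replace country = "Bosnia and Herzegovina" if country == "Bosnia-Hertz"
save "dataset_a_clean.dta", replace

* Fix the spelling used in data set B
use "dataset_b.dta", clear
replace country = "Bosnia and Herzegovina" if country == "Bosnia-Herzegovina"

* Merge on the now-consistent name (merge 1:1 needs Stata 11 or later)
merge 1:1 country using "dataset_a_clean.dta"

* _merge == 3 marks matched observations; anything else failed to match
list country _merge if _merge != 3
```

Any country that shows up in the final list still has a name that differs between the two files. For country-year panels the merge key would be something like country year instead.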

Over the years I've created a Stata do-file that standardises country names and attaches their IMF country codes. You can find the file here.

Obviously, it only standardises the country name variations that I've come across. An easy way to check whether a country name has been missed is to look for observations where the do-file did not attach an IMF country code, i.e. use the Stata code:

list country if imfcode == .
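With panel data the list command above prints every country-year row, so the same unmatched name can appear many times. A slightly more compact variant, assuming the do-file has already been run and that imfcode is a numeric variable, might be:

```stata
* Assumes the do-file has been run and imfcode is numeric.

* How many observations were left without an IMF code?
count if missing(imfcode)

* Show each unmatched spelling once, with the number of rows it affects
tabulate country if missing(imfcode)
```

missing() also catches Stata's extended missing values (.a, .b, and so on), which a comparison with . alone would skip.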

Hopefully this will save people some time.

If you use my do-file please cite this blog post. Also, feel free to suggest additions/changes.

2 November 2011

Reproducibility in Research

This post by Mario Pineda-Krch complains about the woeful lack of reproducibility in computational sciences.

This reminded me of Jake Bowers's good piece in The Political Methodologist from earlier this year about how to do reproducible computational political science. The article actually inspired me to switch all of my new writing over to Sweave. Sweave allows you to combine your R code and LaTeX markup in a single document. If you make your Sweave document and data available to readers, they can completely reproduce everything in your article: the models, the tables, the graphs, everything.

RStudio makes using Sweave really easy (though I still use a text editor for writing much of the code since RStudio doesn't do spellcheck). 

Political economy and political science journals don't seem to have been keeping up with these developments. In fact, poli sci journals often require MS Word documents and don't allow you to submit Sweave documents. Few journal submission systems even allow authors to submit data and code appendixes before the paper has been accepted or R&R-ed.