Showing posts with label data.

11 May 2014

Updates to repmis: caching downloaded data and Excel data downloading

Over the past few months I’ve added a few improvements to the repmis R package (miscellaneous functions for reproducible research). I just want to briefly highlight two of them:

  • Caching downloaded data sets.

  • source_XlsxData, for downloading data stored in Excel files.

Both of these capabilities are available in repmis version 0.2.9 and later.

Caching

Repeatedly downloading data directly from the internet can be time-consuming (and can annoy whoever hosts the data). So repmis’s source functions (source_data, source_DropboxData, and source_XlsxData) can now cache a downloaded data set when you set the argument cache = TRUE. For example:

library(repmis)

# Download the data set and keep a local copy for later runs
DisData <- source_data("http://bit.ly/156oQ7a", cache = TRUE)

When the function is run again, the data set at http://bit.ly/156oQ7a will be loaded from the local cache rather than downloaded again.

To delete the cached data set, simply run the function again with the argument clearCache = TRUE.
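To make the idea behind caching concrete, here is a minimal sketch of a download cache in plain R. It is only an illustration: make_cached_loader() is a hypothetical helper, not part of repmis, and this is not how repmis implements its cache. The cache here lives in memory for the current R session only, and in this sketch clearing the cache also triggers a fresh download. A counting stand-in replaces the real download so the example runs offline.

```r
# Minimal in-memory cache, for illustration only. This is NOT how repmis
# implements caching; make_cached_loader() is a hypothetical helper.
make_cached_loader <- function(loader) {
  cache <- new.env(parent = emptyenv())  # one entry per URL
  function(url, clear_cache = FALSE) {
    if (clear_cache && exists(url, envir = cache, inherits = FALSE)) {
      rm(list = url, envir = cache)  # forget the stored copy
    }
    if (!exists(url, envir = cache, inherits = FALSE)) {
      assign(url, loader(url), envir = cache)  # download once, then store
    }
    get(url, envir = cache, inherits = FALSE)  # return the stored copy
  }
}

# A stand-in "download" that counts how often it is called, so the
# example runs offline. With real CSV data you might pass read.csv instead.
n_downloads <- 0
fake_download <- function(url) {
  n_downloads <<- n_downloads + 1
  data.frame(value = 1:3)
}

load_data <- make_cached_loader(fake_download)
first  <- load_data("http://bit.ly/156oQ7a")
second <- load_data("http://bit.ly/156oQ7a")  # served from the cache
print(n_downloads)
# [1] 1
```

Keying the cache on the URL means two different data sets never collide, and the second call never touches the loader.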

source_XlsxData

I recently added the source_XlsxData function to download Excel data sets directly into R. It works much like the other source functions, with two differences:

  • You need to specify the sheet argument. This is either the name of one specific sheet in the downloaded Excel workbook or its number (e.g. the first sheet in the workbook would be sheet = 1).

  • You can pass other arguments to the read.xlsx function from the xlsx package.

Here’s a simple example:

RRurl <- 'http://www.carmenreinhart.com/user_uploads/data/22_data.xls'

RRData <- source_XlsxData(url = RRurl, sheet = 2, startRow = 5)

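startRow = 5 tells read.xlsx to begin reading at row 5 of the sheet, so the first 4 rows are dropped. With read.xlsx’s default header = TRUE, row 5 supplies the column names.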
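If you are more used to CSV files, the skip argument of base R’s read.csv plays a similar role. The sketch below is only an analogy (no xlsx or repmis code is involved, and the file contents are invented): four note lines sit above the real header, and skipping them makes line 5 the header.

```r
# Base-R illustration only: a small CSV whose first four lines are notes,
# with the real header on line 5. All contents are made up.
raw <- c("Title line",
         "Source: made-up example",
         "Notes: values are invented",
         "Units: percent",
         "year,value",
         "1900,1.5",
         "1901,2.0")

# skip = 4 drops the four note lines, so line 5 becomes the header,
# similar to startRow = 5 for a worksheet read with read.xlsx.
dat <- read.csv(text = raw, skip = 4)
str(dat)
```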

25 March 2012

Disproportionality Data

So I was hunting around for some data on disproportional electoral outcomes (when the proportion of votes cast for political parties is not close to the proportion of legislative seats that they win).

Michael Gallagher keeps an updated version of his Least Squares (or Gallagher) Index of electoral disproportionality on his website. However, it is in PDF format, which is very inconvenient for any stats project.
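For reference, the Least Squares index for one election is LSq = √(½ Σᵢ (vᵢ − sᵢ)²), where vᵢ and sᵢ are party i’s vote and seat percentages. Squaring the gaps means a few large mismatches count for more than many small ones. A minimal R sketch of the formula (gallagher_index() is a hypothetical name, and the shares in the example are made up, not real results):

```r
# Gallagher's Least Squares index (LSq) for a single election.
# votes, seats: each party's vote share and seat share, in percent,
# listed in the same party order. Parties with no seats get a seat share of 0.
gallagher_index <- function(votes, seats) {
  stopifnot(length(votes) == length(seats))
  sqrt(sum((votes - seats)^2) / 2)
}

# Toy example: gaps of -10, 0 and 10 points give sqrt(200 / 2) = 10
gallagher_index(votes = c(50, 30, 20), seats = c(60, 30, 10))
# [1] 10
```

A perfectly proportional result (every vᵢ equal to sᵢ) gives an index of 0.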

John Carey and Simon Hix have some nice data, including much of Gallagher's data plus some countries he doesn't cover, in easy-to-use Stata format (here). These are the data from their recent Electoral Sweet Spot paper (see here). However, their data only go up to 2003.

I combined the best of these two data sets into one .csv file and am making it available so that, hopefully, others can spend their research time on better things than copying and pasting data from a PDF file. You can easily import the data into R, Stata, or whatever software you use.

The data set can be downloaded HERE, where you can also find more details on how I combined the data.

I couldn't stop myself from making a few descriptive figures with the data. The first is a map of average disproportionality between 2000 and 2011. The second plots disproportionality over time (you can see there hasn't been much change).


Gallagher Electoral Disproportionality Averaged Over Elections from 2000 through 2011
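The averaging behind a map like this is straightforward. The sketch below is not the code I used for the figure; it assumes a hypothetical layout with one row per election and columns named country, year and lsq, and it uses invented toy values. It keeps elections from 2000 through 2011 and takes each country's mean.

```r
# Hypothetical layout: one row per election, with columns
# country, year and lsq (the Gallagher index for that election).
average_disproportionality <- function(df, from = 2000, to = 2011) {
  # Keep only elections held inside the window (inclusive).
  in_window <- df[df$year >= from & df$year <= to, ]
  # Mean LSq per country, averaged over that country's elections.
  aggregate(lsq ~ country, data = in_window, FUN = mean)
}

# Toy data (invented values, for illustration only)
toy <- data.frame(
  country = c("A", "A", "A", "B", "B"),
  year    = c(1998, 2002, 2006, 2001, 2010),
  lsq     = c(20, 4, 6, 10, 2)
)
average_disproportionality(toy)
# A averages to 5 (its 1998 election falls outside the window), B to 6
```

Averaging over elections, rather than over years, means countries that hold elections more often contribute more observations to their own mean, but not more weight relative to other countries.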



As always, the R code: