31 January 2013

repmis: misc. tools for reproducible research in R

I've started to put together an R package called repmis. It has miscellaneous tools for reproducible research with R. The idea behind the package is to collate commands that simplify some of the common R code used within knitr-type reproducible research papers.

It's still very much in the early stages of development and has two commands:

  • LoadandCite: a command to load all of the R packages used in a paper and create a BibTeX file containing citation information for them. It can also install the packages if they are on CRAN.
  • source_GitHubData: a command for downloading plain-text formatted data stored on GitHub or at any other secure (https) URL.
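To make the first command more concrete, here is a rough sketch of the kind of steps LoadandCite bundles together. It is not the package's actual source: the function name load_and_cite_sketch and its install argument are hypothetical, and it assumes knitr is installed.

```r
# Hypothetical stand-in for LoadandCite, written with base R and knitr only.
# The function name and the `install` argument are illustrative assumptions.
load_and_cite_sketch <- function(pkgs, file, install = FALSE) {
  if (install) {
    # Find packages that are not yet installed and fetch them from CRAN
    is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
    missing_pkgs <- pkgs[!is_installed]
    if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
  }

  # Attach each package by name
  for (p in pkgs) library(p, character.only = TRUE)

  # Write the BibTeX entries for all of the packages to `file`
  knitr::write_bib(pkgs, file = file)
  invisible(pkgs)
}

# Usage: a temporary file is used here so that no existing .bib file is touched
bib_file <- tempfile(fileext = ".bib")
load_and_cite_sketch(c("stats", "utils"), file = bib_file)
```

Wrapping these steps in one call means the package list only has to be written once, at the top of the document.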

I've written about why you might want to use source_GitHubData before (see here and here).

You can use LoadandCite in a code chunk near the beginning of a knitr reproducible research document to load all of the R packages you will use in the document and automatically generate a BibTeX file you can draw on to cite them. Here's an example:

# Create vector of package names
PackagesUsed <- c("knitr", "xtable")

# Load and Cite
repmis::LoadandCite(PackagesUsed, file = "PackageCitations.bib") 

LoadandCite draws on knitr's write_bib command to create the bibliography, so each citation is given a BibTeX key of the form R-package_name. For example, the key for the xtable package is R-xtable. Be careful to save the citations in a new .bib file, because LoadandCite overwrites any existing file with the same name.
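If you want to check which keys will be available before citing them, one option is to call write_bib directly and read the keys back from the file. This is only a sketch: it assumes knitr is installed, writes to a temporary file so that nothing is overwritten, and uses the base package as the second example so that it runs without extra installs. Depending on your knitr version, the file may also contain additional entries alongside the R-package_name ones.

```r
# Write citations for two packages to a temporary .bib file
bib_file <- tempfile(fileext = ".bib")
knitr::write_bib(c("knitr", "base"), file = bib_file)

# Each entry starts with a line such as "@Manual{R-knitr,"
bib_lines <- readLines(bib_file)
entry_lines <- grep("^@", bib_lines, value = TRUE)

# Keep only the text between the opening brace and the first comma
keys <- sub("^@[A-Za-z]+\\{([^,]+),.*$", "\\1", entry_lines)
print(keys)
```

The keys printed here are the ones you would then use in your citation commands.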

R packages are cited very inconsistently in academic publications. Hopefully, making it easier to cite packages will lead more people to do so.

Install/Contribute

Instructions for how to install repmis are available here.

Please feel free to fork the package and suggest additional commands that could be included.

2 comments:

Toby Dylan Hocking said...

Have you considered a method to specify package versions in LoadandCite? Since packages change over time, it is useful to specify which versions were used. I proposed to deal with this problem with a works_with_R() header in The difficulty of reproducible research using R.

Christopher Gandrud said...

Great idea! I just implemented it. Please download the newest version and let me know if there is anything else I can improve.